gxm.wrappers


Wrappers for gxm environments.

class AutoReset(wrapped, unwrap=True)#

Bases: EnvironmentWrapper[Any]

Wrapper that automatically resets an environment on episode end.

On each step, both the stepped state and a freshly reset state are computed and blended via jnp.where on done, keeping output shapes static under jit/vmap/scan.
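The blending described above can be sketched in plain JAX. This is an illustration of the technique only, not gxm's implementation: `toy_reset`, `toy_step`, and the dict-shaped state are hypothetical stand-ins for a real environment.

```python
import jax
import jax.numpy as jnp


def toy_reset(key):
    # Hypothetical reset: small random start position, step counter at zero.
    return {
        "pos": jax.random.uniform(key, (), minval=-0.1, maxval=0.1),
        "t": jnp.zeros((), dtype=jnp.int32),
    }


def toy_step(state, action):
    # Hypothetical dynamics: move by `action`; the episode ends after 3 steps.
    new_state = {"pos": state["pos"] + action, "t": state["t"] + 1}
    done = new_state["t"] >= 3
    return new_state, done


def auto_reset_step(key, state, action):
    stepped, done = toy_step(state, action)
    fresh = toy_reset(key)  # always computed, even when it is discarded
    # Select leaf by leaf; both candidates have identical shapes and dtypes,
    # so the output structure is static regardless of the value of `done`.
    blended = jax.tree_util.tree_map(
        lambda r, s: jnp.where(done, r, s), fresh, stepped
    )
    return blended, done


@jax.jit
def rollout(key):
    def body(state, step_key):
        state, done = auto_reset_step(step_key, state, jnp.float32(1.0))
        return state, (state["t"], done)

    init_key, scan_key = jax.random.split(key)
    keys = jax.random.split(scan_key, 6)
    _, (ts, dones) = jax.lax.scan(body, toy_reset(init_key), keys)
    return ts, dones


ts, dones = rollout(jax.random.PRNGKey(0))
# ts    -> [1, 2, 0, 1, 2, 0]  (counter snaps back to 0 when an episode ends)
# dones -> [False, False, True, False, False, True]
```

The cost of this design is that the reset is evaluated on every step. In exchange, no data-dependent Python branching is needed, which is what allows the step to be traced once and reused inside `jit`, `vmap`, and `scan`.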

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[DynamicsState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the new state and the resulting step output.
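The same `init` / `reset` / `step` signatures recur for every wrapper on this page. The sketch below shows how such a functional interface is typically driven, with explicit state and explicit key splitting. `ToyDynamics` and the dict used as step output are hypothetical stand-ins and are not part of gxm.

```python
import jax
import jax.numpy as jnp


class ToyDynamics:
    """Hypothetical stand-in exposing the same three method signatures."""

    def init(self, key):
        state = {"x": jax.random.normal(key, ())}
        return state, {"obs": state["x"]}

    def reset(self, key, state):
        # `state` is accepted for signature parity; this toy ignores it.
        return self.init(key)

    def step(self, key, state, action):
        noise = 0.01 * jax.random.normal(key, ())
        new_state = {"x": state["x"] + action + noise}
        return new_state, {"obs": new_state["x"]}


dyn = ToyDynamics()
key = jax.random.PRNGKey(0)
key, init_key = jax.random.split(key)
state, timestep = dyn.init(init_key)

for _ in range(3):
    # Split before every call so that no key is ever reused.
    key, step_key = jax.random.split(key)
    state, timestep = dyn.step(step_key, state, jnp.float32(0.5))

key, reset_key = jax.random.split(key)
state, timestep = dyn.reset(reset_key, state)
```

Because the state is passed in and returned rather than stored on the object, the same object can be stepped from many states at once, for example under `jax.vmap`.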

class ClipReward(wrapped, unwrap=True, min=-1.0, max=1.0)#

Bases: EnvironmentWrapper[Any]

Wrapper that clips the reward to a specified range.

clip(reward)#

Return type:: Array
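As a rough sketch of what clipping to a range means here, assuming it behaves like an element-wise `jnp.clip` (the helper name `clip_reward` and the parameter names `low` and `high` are illustrative; they mirror the `min` and `max` constructor arguments):

```python
import jax.numpy as jnp


def clip_reward(reward, low=-1.0, high=1.0):
    # Values below `low` become `low`, values above `high` become `high`.
    return jnp.clip(reward, low, high)


rewards = jnp.array([-3.0, -0.5, 0.0, 2.5])
clipped = clip_reward(rewards)
# clipped -> [-1.0, -0.5, 0.0, 1.0]
```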

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[DynamicsState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

wrapped: Environment#

class Discretize(wrapped, actions, unwrap=True)#

Bases: Wrapper[Any, TStep]

Wrapper that discretizes a continuous action space by mapping a finite set of discrete actions onto the continuous action space of the wrapped environment. The discrete actions are specified as a list of continuous actions \(A\). The action space of the resulting environment is then \(\{0, 1, \ldots, |A|-1\}\).

>>> import jax.numpy as jnp
>>> import gxm
>>> from gxm.wrappers import Discretize
>>> env = gxm.make("Gymnasium/Pendulum-v1")
>>> actions = jnp.array([[-2.0], [0.0], [2.0]])  # shape (|A|, D) = (3, 1)
>>> env = Discretize(env, actions)

The actions passed to the Discretize wrapper need to be of shape \((|A|, D)\), where \(|A|\) is the number of discrete actions and \(D\) is the dimensionality of the continuous action space of the wrapped environment.
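The shape requirement is easiest to see as a lookup table. The following sketch assumes the wrapper simply indexes the table with the discrete action; `to_continuous` is a hypothetical helper, not a gxm function.

```python
import jax.numpy as jnp

# Three discrete actions for a 1-dimensional continuous action space:
# shape (|A|, D) = (3, 1).
actions = jnp.array([[-2.0], [0.0], [2.0]])


def to_continuous(index):
    # Row `index` of the table is the continuous action passed on.
    return actions[index]


single = to_continuous(2)                     # shape (1,)
batch = to_continuous(jnp.array([0, 2, 1]))   # shape (3, 1)
```

Keeping the trailing dimension \(D\) even when it is 1 means a looked-up row has the shape of one continuous action rather than being a bare scalar.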

actions: Any#

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.

Return type:

tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the new state and the resulting step output.

wrapped: Dynamics[Any, TypeVar(TStep, bound= Step, covariant=True)]#

class EnvironmentWrapper(wrapped, unwrap=True)#

Bases: Generic[TWrapperState], Wrapper[TWrapperState, Timestep]

Base class for wrappers that operate only on Environments, because they need the reward, terminated, and truncated fields of a Timestep.

wrapped: Environment#

class EpisodeCounter(wrapped, unwrap=True)#

Bases: EnvironmentWrapper[EpisodeCounterState]

A wrapper that counts the number of episodes completed in the environment.

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[EpisodeCounterState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EpisodeCounterState) – The current state.

Return type:

tuple[EpisodeCounterState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EpisodeCounterState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[EpisodeCounterState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

class EpisodicLife(wrapped)#

Bases: EnvironmentWrapper[EpisodicLifeState]

A wrapper that treats the loss of a life (as in Atari games) as the end of an episode. It assumes that the info dictionary of the environment’s timestep contains a “lives” key holding the number of lives remaining.
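One way to detect a lost life is to compare the current "lives" value with the one seen on the previous step. The sketch below assumes exactly that; the helper names are hypothetical, and how gxm reports the resulting flag is not specified here.

```python
import jax.numpy as jnp


def life_lost(prev_lives, lives):
    # A life was lost if the counter went down since the previous step.
    return lives < prev_lives


def episodic_life_done(terminated, prev_lives, lives):
    # The episode counts as over on a real termination or on a lost life.
    return jnp.logical_or(terminated, life_lost(prev_lives, lives))


prev_lives = jnp.array([3, 3, 1])
lives = jnp.array([3, 2, 0])          # e.g. read from info["lives"]
terminated = jnp.array([False, False, True])
done = episodic_life_done(terminated, prev_lives, lives)
# done -> [False, True, True]
```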

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[EpisodicLifeState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EpisodicLifeState) – The current state.

Return type:

tuple[EpisodicLifeState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EpisodicLifeState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[EpisodicLifeState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

wrapped: Environment#

class Evaluate(wrapped, unwrap=True)#

Bases: EnvironmentWrapper[EvaluateState]

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[EvaluateState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EvaluateState) – The current state.

Return type:

tuple[EvaluateState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (EvaluateState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[EvaluateState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

wrapped: Environment#

class FlattenObservation(wrapped, unwrap=True)#

Bases: Wrapper[Any, TStep]

Wrapper that flattens the observation into a single flat array.

classmethod flatten(obs)#

Return type:: Array
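For a structured observation, one plausible way to obtain a single `Array` is `jax.flatten_util.ravel_pytree`. Whether `flatten` uses this utility is an assumption; the observation layout below is made up for illustration.

```python
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Hypothetical structured observation.
obs = {"position": jnp.zeros((2, 3)), "velocity": jnp.ones((4,))}

# `flat` is one 1-D array; `unravel` restores the original structure.
flat, unravel = ravel_pytree(obs)
restored = unravel(flat)
# flat.shape -> (10,)
```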

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.

Return type:

tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[DynamicsState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the new state and the resulting step output.

class IgnoreTruncation(wrapped)#

Bases: EnvironmentWrapper[Any]

A wrapper that treats truncation as termination.

Truncation is folded into the terminated flag and true_next_obs is set equal to next_obs, so downstream code sees a plain termination with no distinction between the two episode-ending conditions.
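The folding described above amounts to a logical OR plus an overwrite. A minimal sketch, using the field names from the description as plain arrays (the function `ignore_truncation` is illustrative, not the gxm method):

```python
import jax.numpy as jnp


def ignore_truncation(terminated, truncated, next_obs):
    # Either ending condition now counts as termination.
    terminated = jnp.logical_or(terminated, truncated)
    # No separate "true" next observation is kept.
    true_next_obs = next_obs
    return terminated, true_next_obs


terminated = jnp.array([False, True, False])
truncated = jnp.array([False, False, True])
next_obs = jnp.array([0.1, 0.2, 0.3])
merged, true_next_obs = ignore_truncation(terminated, truncated, next_obs)
# merged -> [False, True, True]
```

A learner that bootstraps only when `terminated` is False will therefore stop bootstrapping at time-limit endings as well.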

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[DynamicsState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (DynamicsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[DynamicsState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

class RecordEpisodeStatistics(wrapped, unwrap=True, gamma=1.0, n_episodes=1)#

Bases: EnvironmentWrapper[RecordEpisodeStatisticsState]

A wrapper that records the episode length \(T\), the episodic return \(J(\tau) = \sum_{t=0}^{T-1} r_t\), and the discounted episodic return \(G(\tau) = \sum_{t=0}^{T-1} \gamma^t r_t\) at the end of each episode. The statistics are available in the info field of the Timestep returned by the environment, which holds the stats of the most recently finished episode. By default, the discount factor \(\gamma\) is 1.0, so the episodic return and the discounted episodic return coincide.
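The three quantities can be accumulated one reward at a time, which is how a step-wise wrapper would naturally compute them. The sketch below uses plain Python for clarity; the tuple layout and the `update` helper are assumptions, not gxm's state definition.

```python
def update(stats, reward, gamma):
    # stats = (length, episodic return, discounted return, current discount)
    length, ret, disc_ret, discount = stats
    return (
        length + 1,
        ret + reward,
        disc_ret + discount * reward,  # adds gamma**t * r_t
        discount * gamma,              # becomes gamma**(t + 1)
    )


stats = (0, 0.0, 0.0, 1.0)
for r in [1.0, 0.0, 2.0]:
    stats = update(stats, r, gamma=0.5)

length, episodic_return, discounted_return, _ = stats
# length -> 3, episodic_return -> 3.0, discounted_return -> 1.5
```

With `gamma=1.0` the running discount stays at 1, so both returns are equal, matching the default behaviour described above.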

gamma: float#: The discount factor \(\gamma\) for calculating the discounted episodic return.

static get_averaged_stats(episode_stats)#

Return type:: dict[str, Array]

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[RecordEpisodeStatisticsState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

n_episodes: int#: The number of past episodes to record statistics for.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (RecordEpisodeStatisticsState) – The current state.

Return type:

tuple[RecordEpisodeStatisticsState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (RecordEpisodeStatisticsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[RecordEpisodeStatisticsState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

class StackObservations(wrapped, n_stack, padding='reset')#

Bases: EnvironmentWrapper[StackObservationsState]

Wrapper that stacks the most recent n_stack observations along a new axis.
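A rolling buffer is a common way to implement observation stacking. The sketch below makes two assumptions that the description does not confirm: the new axis is the leading one, and `padding='reset'` means every slot is filled with the first observation of the episode. `init_stack` and `push` are hypothetical helpers.

```python
import jax.numpy as jnp


def init_stack(obs, n_stack):
    # Assumed 'reset' padding: repeat the first observation in every slot.
    return jnp.stack([obs] * n_stack, axis=0)


def push(stack, obs):
    # Drop the oldest observation and append the newest at the end.
    return jnp.concatenate([stack[1:], obs[None]], axis=0)


first = jnp.zeros((2,))
stack = init_stack(first, n_stack=3)   # shape (3, 2)
stack = push(stack, jnp.ones((2,)))    # newest observation is stack[-1]
```

The buffer keeps a fixed shape across steps, so it can live in the wrapper state and be carried through `jax.lax.scan`.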

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[StackObservationsState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

num_stack: int#

padding: str#

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StackObservationsState) – The current state.

Return type:

tuple[StackObservationsState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StackObservationsState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[StackObservationsState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

class StepCounter(wrapped, unwrap=True)#

Bases: Wrapper[StepCounterState, TStep]

A wrapper that counts the number of steps taken in the environment.

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[StepCounterState, TypeVar(TStep, bound= Step, covariant=True)]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StepCounterState) – The current state.

Return type:

tuple[StepCounterState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StepCounterState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[StepCounterState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the new state and the resulting step output.

class StickyAction(wrapped, unwrap=True, stickiness=0.25)#

Bases: Wrapper[StickyActionState, TStep]

A wrapper that makes actions sticky: with probability stickiness, the previous action is repeated instead of the newly supplied one.
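Sticky actions are usually implemented as a Bernoulli draw per step that decides between the previous and the new action. The following sketch assumes that convention; `sticky_action` is an illustrative helper, not the wrapper's method.

```python
import jax
import jax.numpy as jnp


def sticky_action(key, prev_action, action, stickiness=0.25):
    # With probability `stickiness`, repeat the previous action.
    repeat = jax.random.bernoulli(key, stickiness)
    return jnp.where(repeat, prev_action, action)


key = jax.random.PRNGKey(0)
prev_action, action = jnp.int32(1), jnp.int32(3)

never_sticky = sticky_action(key, prev_action, action, stickiness=0.0)
always_sticky = sticky_action(key, prev_action, action, stickiness=1.0)
```

The previous action has to be remembered between steps, which is why such a wrapper needs its own state.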

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[StickyActionState, TypeVar(TStep, bound= Step, covariant=True)]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StickyActionState) – The current state.

Return type:

tuple[StickyActionState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (StickyActionState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[StickyActionState, TypeVar(TStep, bound= Step, covariant=True)]

Returns:

A tuple of the new state and the resulting step output.

class TimeLimit(wrapped, unwrap=True, time_limit=1000)#

Bases: EnvironmentWrapper[TimeLimitState]

Wrapper that terminates an episode after a fixed number of steps.
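The bookkeeping behind a step limit is a counter compared against `time_limit`. This sketch only shows that comparison; whether gxm reports the limit through the terminated or the truncated flag is not stated here, and `time_limit_step` is a hypothetical helper.

```python
import jax.numpy as jnp


def time_limit_step(t, time_limit=1000):
    # Count the step just taken and check whether the limit is reached.
    t = t + 1
    limit_reached = t >= time_limit
    return t, limit_reached


t = jnp.int32(998)
t, reached_a = time_limit_step(t)   # t == 999, limit not reached
t, reached_b = time_limit_step(t)   # t == 1000, limit reached
```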

init(key)#

Initialize the dynamics and return the initial state.

Parameters:: key (Array) – A JAX random key for any stochastic initialization.
Return type:: tuple[TimeLimitState, Timestep]
Returns:: A tuple of the initial state and the initial step output.

reset(key, state)#

Reset the dynamics to an initial state.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (TimeLimitState) – The current state.

Return type:

tuple[TimeLimitState, Timestep]

Returns:

A tuple of the reset state and the initial step output.

step(key, state, action)#

Advance the dynamics by one step given an action.

Parameters:

key (Array) – A JAX random key for any stochasticity.
state (TimeLimitState) – The current state.
action (Any) – The action to apply.

Return type:

tuple[TimeLimitState, Timestep]

Returns:

A tuple of the new state and the resulting step output.

wrapped: Environment#

class Wrapper(wrapped, unwrap=True)#

Bases: Generic[TWrapperState, TStep], Dynamics[TWrapperState, TStep]

Base class for wrappers in gxm, over either bare Dynamics or an Environment.

get_wrapper(wrapper_type)#

Retrieve the first wrapper of a specific type from the dynamics.

Parameters:: wrapper_type (type[Dynamics]) – The type of the wrapper to retrieve.
Return type:: Dynamics
Returns:: The first wrapper of the specified type.
Raises:: ValueError – If no wrapper of the specified type is found.

has_wrapper(wrapper_type)#

Check if the dynamics or any of its wrappers is of a specific type.

Parameters:: wrapper_type (type[Dynamics]) – The type to check for.
Return type:: bool
Returns:: True if the dynamics or any of its wrappers is of the specified type, False otherwise.

unwrap: bool = True#

property unwrapped: Dynamics#

Retrieve the base dynamics by unwrapping all wrappers.

Returns:: The base dynamics without any wrappers.
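The three lookups above can be pictured as walks along the chain of `wrapped` attributes. The classes below are hypothetical stand-ins written in plain Python; the outermost-first search order is an assumption about what "first" means.

```python
class Base:
    """Hypothetical innermost dynamics (has no `wrapped` attribute)."""


class Layer:
    """Hypothetical wrapper base holding a reference to what it wraps."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def _chain(self):
        # Yield this wrapper, then everything beneath it, outermost first.
        node = self
        while True:
            yield node
            if not hasattr(node, "wrapped"):
                return
            node = node.wrapped

    def has_wrapper(self, wrapper_type):
        return any(isinstance(n, wrapper_type) for n in self._chain())

    def get_wrapper(self, wrapper_type):
        for node in self._chain():
            if isinstance(node, wrapper_type):
                return node
        raise ValueError(f"no wrapper of type {wrapper_type.__name__}")

    @property
    def unwrapped(self):
        # The last element of the chain is the base dynamics.
        return list(self._chain())[-1]


class Clip(Layer):
    pass


class Limit(Layer):
    pass


class Unused(Layer):
    pass


env = Clip(Limit(Base()))

try:
    env.get_wrapper(Unused)
    found_unused = True
except ValueError:
    found_unused = False
```

Here `env.has_wrapper(Limit)` is true, `env.get_wrapper(Limit)` returns the inner `Limit` instance, and `env.unwrapped` is the `Base` object.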

wrapped: Dynamics[Any, TypeVar(TStep, bound= Step, covariant=True)]#
