gxm.wrappers.RecordEpisodeStatistics
- class RecordEpisodeStatistics(env, unwrap=True, gamma=1.0, n_episodes=1)
Bases: Wrapper[RecordEpisodeStatisticsState]
A wrapper that records the episode length \(T\), the episodic return \(J(\tau) = \sum_{t=0}^{T} r_t\), and the discounted episodic return \(G(\tau) = \sum_{t=0}^{T} \gamma^t r_t\) at the end of each episode. The statistics can be accessed from the info field of the Timestep returned by the environment, which contains the stats of the most recent finished episode. By default, the discount factor \(\gamma\) is set to 1.0, so the episodic return and the discounted episodic return are equal.
- __init__(env, unwrap=True, gamma=1.0, n_episodes=1)
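As a quick check of the definitions above, the following minimal sketch computes the three quantities from a list of per-step rewards. It is plain Python and does not call gxm; the helper name episode_statistics is hypothetical, and the episode length is assumed here to be the number of rewards received, which may differ by one from the indexing convention used in the formulas.

```python
def episode_statistics(rewards, gamma=1.0):
    """Return (length, episodic return, discounted return) for one episode.

    Hypothetical helper for illustration only; not part of gxm.
    """
    length = len(rewards)                       # assumed: one reward per step
    episodic_return = sum(rewards)              # J = sum_t r_t
    discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))  # G
    return length, episodic_return, discounted_return


rewards = [1.0, 0.0, 2.0]
stats_default = episode_statistics(rewards)          # gamma = 1.0, so J == G
stats_half = episode_statistics(rewards, gamma=0.5)  # later rewards count less
```

With gamma=1.0 the second and third entries coincide, matching the default behaviour described above; with gamma=0.5 the reward at step 2 is weighted by 0.25.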
Methods
__init__(env[, unwrap, gamma, n_episodes])
get_averaged_stats(episode_stats)
get_wrapper(wrapper_type) – Retrieve the first wrapper of a specific type from the environment.
has_wrapper(wrapper_type) – Check whether the environment or any of its wrappers is of a specific type.
init(key) – Initialize the environment and return the initial state.
reset(key, env_state) – Reset the environment to its initial state.
step(key, env_state, action) – Perform a step in the environment given an action.
Attributes
unwrap
unwrapped – Retrieve the base environment by unwrapping all wrappers.
gamma – The discount factor \(\gamma\) for calculating the discounted episodic return.
n_episodes – The number of past episodes to record statistics for.
id – The unique identifier of the environment.
action_space – The action space of the environment.
observation_space – The observation space of the environment.
- action_space: Space
The action space of the environment.
- env: Environment
- gamma: float
The discount factor \(\gamma\) for calculating the discounted episodic return.
- static get_averaged_stats(episode_stats)
- Return type:
dict[str, Array]
- id: str
The unique identifier of the environment.
- init(key)
Initialize the environment and return the initial state.
- Parameters:
key (Array) – A JAX random key for any stochastic initialization.
- Return type:
tuple[RecordEpisodeStatisticsState, Timestep]
- Returns:
A tuple containing the initial environment state and the initial timestep.
- n_episodes: int
The number of past episodes to record statistics for.
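This page does not describe how the last n_episodes episodes are stored or exactly what get_averaged_stats computes. One plausible reading, sketched below under that assumption, is a fixed-size window of per-episode statistics whose entries are averaged key by key. The function average_recent and the keys "length" and "return" are hypothetical and are not taken from gxm.

```python
from collections import deque


def average_recent(history):
    """Average each statistic over the episodes currently in the window.

    Hypothetical stand-in for illustration; not gxm's get_averaged_stats.
    """
    keys = history[0].keys()
    return {k: sum(ep[k] for ep in history) / len(history) for k in keys}


n_episodes = 2
history = deque(maxlen=n_episodes)  # the oldest episode is dropped automatically

# Three finished episodes; only the last two remain in the window.
for length, ret in [(10, 1.0), (20, 3.0), (30, 5.0)]:
    history.append({"length": length, "return": ret})

averaged = average_recent(history)
```

A larger window smooths the reported statistics at the cost of reacting more slowly to changes in the policy.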
- observation_space: Space
The observation space of the environment.
- reset(key, env_state)
Reset the environment to its initial state.
- Parameters:
key (Array) – A JAX random key for any stochasticity in the environment.
env_state (RecordEpisodeStatisticsState) – The current state of the environment.
- Return type:
tuple[RecordEpisodeStatisticsState, Timestep]
- Returns:
A tuple containing the reset environment state and the initial timestep.
- step(key, env_state, action)
Perform a step in the environment given an action.
- Parameters:
key (Array) – A JAX random key for any stochasticity in the environment.
env_state (RecordEpisodeStatisticsState) – The current state of the environment.
action (Any) – The action to take in the environment.
- Return type:
tuple[RecordEpisodeStatisticsState, Timestep]
- Returns:
A tuple containing the new environment state and the resulting timestep.
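Because step takes the current state and returns a new one, any running statistics have to be carried inside that state rather than mutated in place. The sketch below illustrates this pattern in plain Python, with no JAX arrays or random keys. StatsState and update_stats are hypothetical names, and this is not gxm's implementation. As a simplification, the info dictionary is empty on non-terminal steps, whereas the wrapper is documented to report the most recent finished episode.

```python
from typing import NamedTuple


class StatsState(NamedTuple):
    """Hypothetical running statistics carried alongside the environment state."""
    length: int = 0
    episodic_return: float = 0.0
    discounted_return: float = 0.0


def update_stats(state, reward, done, gamma=1.0):
    """Fold one reward into the running stats and return (next_state, info)."""
    updated = StatsState(
        length=state.length + 1,
        episodic_return=state.episodic_return + reward,
        # The exponent is the step index within the episode, i.e. the old length.
        discounted_return=state.discounted_return + gamma**state.length * reward,
    )
    if done:
        # Report the finished episode and start the next one from zero.
        return StatsState(), updated._asdict()
    return updated, {}


state = StatsState()
info = {}
for reward, done in [(1.0, False), (1.0, False), (1.0, True)]:
    state, info = update_stats(state, reward, done, gamma=0.5)
```

Returning a fresh state object on every call keeps the update free of side effects, which is what allows functions of this shape to be traced and compiled in a JAX setting.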