gxm.Timestep

class Timestep(reward, terminated, truncated, obs, true_obs, info)

Bases: object

Class representing a single timestep \((R_i, S_{i+1})\) in an environment, where \(R_i\) is the reward received after taking an action at timestep \(i\) and \(S_{i+1}\) is the observation at the next timestep. If the episode is truncated, `true_obs` holds the observation \(\hat{S}_{i+1}\) that would have been observed had the episode not been truncated.

__init__(reward, terminated, truncated, obs, true_obs, info)

Methods

`__init__`(reward, terminated, truncated, obs, ...)
`trajectory`(first_obs, action[, first_info])	Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).
`transition`(prev_obs, action[, prev_info])	Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).

Attributes

`done`	Whether the episode has terminated or been truncated.
`reward`	The reward \(R_i\) received at this timestep \(i\).
`terminated`	Whether the episode has terminated at this timestep.
`truncated`	Whether the episode has been truncated at this timestep.
`obs`	The observation \(S_{i+1}\) at this timestep \(i\).
`true_obs`	The true observation \(\hat{S}_{i+1}\) at this timestep.
`info`	Additional information about the timestep.

property done: Array

Whether the episode has terminated or been truncated.
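The description suggests that `done` is the logical OR of `terminated` and `truncated`. A minimal sketch of that reading, using a hypothetical stand-in dataclass (`SketchTimestep` is not the gxm class, and plain Python booleans stand in for arrays):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchTimestep:
    """Hypothetical stand-in mirroring the documented fields; not gxm.Timestep."""
    reward: float
    terminated: bool
    truncated: bool
    obs: Any
    true_obs: Any
    info: dict = field(default_factory=dict)

    @property
    def done(self) -> bool:
        # Assumed semantics: the episode ended for either reason.
        return self.terminated or self.truncated


running = SketchTimestep(1.0, False, False, obs=0, true_obs=0)
cut_off = SketchTimestep(0.0, False, True, obs=0, true_obs=5)
```

Here `running.done` is false and `cut_off.done` is true, even though `cut_off` did not terminate.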

info: dict[str, Any]

Additional information about the timestep.

obs: Any

The observation \(S_{i+1}\) at this timestep \(i\).

reward: Array

The reward \(R_i\) received at this timestep \(i\).

terminated: Array

Whether the episode has terminated at this timestep.

trajectory(first_obs, action, first_info={})

Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).

Parameters:

first_obs (Any) – The observation at the first timestep.
action (Array) – The action taken at each timestep.
first_info (dict[str, Any]) – The info at the first timestep.

Return type:

Trajectory

Returns:

A Trajectory object containing the sequence of timesteps.
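The interleaving described above can be sketched without gxm. The helper `build_trajectory` and the `SketchTrajectory` container are hypothetical names used only for illustration, and plain Python lists stand in for arrays; the actual `Trajectory` type may be laid out differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchTrajectory:
    """Hypothetical container; not the gxm Trajectory type."""
    obs: list     # S_0, S_1, ..., S_n  (n + 1 entries)
    action: list  # A_0, ..., A_{n-1}   (n entries)
    reward: list  # R_0, ..., R_{n-1}   (n entries)


def build_trajectory(first_obs: Any, next_obs: list, reward: list, action: list) -> SketchTrajectory:
    # Each timestep i contributes (R_i, S_{i+1}); the actions supply A_i.
    if not (len(next_obs) == len(reward) == len(action)):
        raise ValueError("need one action and one reward per timestep")
    # Prepend S_0 so that obs[i] is the observation in which action[i] was taken.
    return SketchTrajectory(obs=[first_obs, *next_obs], action=list(action), reward=list(reward))


traj = build_trajectory(first_obs="s0", next_obs=["s1", "s2"], reward=[0.0, 1.0], action=["a0", "a1"])
```

The result always holds one more observation than actions, because \(S_0\) comes from `first_obs` rather than from any timestep.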

transition(prev_obs, action, prev_info={})

Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).

Parameters:

prev_obs (Any) – The observation at the previous timestep.
action (Array) – The action taken at the current timestep.
prev_info (dict[str, Any]) – The info at the previous timestep.

Return type:

Transition

Returns:

A Transition object containing the current and next timesteps.
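As a rough illustration of how the pieces combine, the sketch below assembles \((S_t, A_t, R_t, S_{t+1})\) from a timestep plus the caller-supplied previous observation and action. `SketchTimestep`, `SketchTransition`, and `to_transition` are hypothetical stand-ins, not gxm types, and the info dictionaries are omitted for brevity.

```python
from typing import Any, NamedTuple


class SketchTimestep(NamedTuple):
    """Hypothetical stand-in holding only the fields used here."""
    reward: float
    obs: Any  # S_{t+1}


class SketchTransition(NamedTuple):
    """Hypothetical stand-in for (S_t, A_t, R_t, S_{t+1})."""
    prev_obs: Any
    action: Any
    reward: float
    obs: Any


def to_transition(timestep: SketchTimestep, prev_obs: Any, action: Any) -> SketchTransition:
    # The timestep supplies R_t and S_{t+1}; the caller supplies S_t and A_t.
    return SketchTransition(prev_obs, action, timestep.reward, timestep.obs)


step = SketchTimestep(reward=1.0, obs="s1")
transition = to_transition(step, prev_obs="s0", action="left")
```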

true_obs: Any

The true observation \(\hat{S}_{i+1}\) at this timestep. This may differ from `obs` only in environments that allow truncation, and only when `truncated` is True.
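One common reason to keep `true_obs` separately is value bootstrapping: a truncated episode did not really end, so a one-step target can still bootstrap from the observation that would have followed. The sketch below illustrates that assumption and is not part of gxm; `td_target` and `value_fn` are hypothetical names, and plain Python scalars stand in for arrays.

```python
from typing import Any, Callable


def td_target(reward: float, terminated: bool, true_obs: Any,
              value_fn: Callable[[Any], float], gamma: float = 0.99) -> float:
    # Bootstrap from true_obs unless the episode genuinely terminated.
    # Truncation alone does not zero out the future value.
    bootstrap = 0.0 if terminated else value_fn(true_obs)
    return reward + gamma * bootstrap


def value_fn(obs: Any) -> float:
    # Toy value function used only for this example.
    return 10.0


truncated_target = td_target(1.0, terminated=False, true_obs="s_hat", value_fn=value_fn, gamma=0.5)
terminal_target = td_target(1.0, terminated=True, true_obs="s_hat", value_fn=value_fn, gamma=0.5)
```

Because `true_obs` is documented to match `obs` whenever the step is not truncated, the same expression covers ordinary steps as well.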

truncated: Array

Whether the episode has been truncated at this timestep.
