gxm.Timestep

class Timestep(reward, terminated, truncated, obs, true_obs, info)

Bases: object

Class representing a single timestep \((R_i, S_{i+1})\) in an environment, where \(R_i\) is the reward received after taking the action at timestep \(i\) and \(S_{i+1}\) is the observation at the next timestep. If the episode is truncated, true_obs holds the observation \(\hat{S}_{i+1}\) that would have been observed had the episode not been truncated.
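To make the field layout concrete, here is a minimal sketch using a hypothetical stand-in dataclass, TimestepSketch, that mirrors the six constructor arguments of Timestep. It is not the gxm class: the real fields are Arrays, while plain Python values are used here for readability, and the values "reset_state" and "final_state" are made up. The obs/true_obs split assumes, for illustration only, an environment that reports a different observation in obs once the episode is truncated.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TimestepSketch:
    """Hypothetical stand-in mirroring the fields of gxm.Timestep (not the real class)."""

    reward: float          # R_i, received after taking the action at timestep i
    terminated: bool       # the episode reached a terminal state at this timestep
    truncated: bool        # the episode was cut off (e.g. by a time limit) at this timestep
    obs: Any               # S_{i+1}, the observation reported by the environment
    true_obs: Any          # \hat{S}_{i+1}; may differ from obs only when truncated is True
    info: dict[str, Any]   # additional per-timestep information

    @property
    def done(self) -> bool:
        # The episode is over if it either terminated or was truncated.
        return self.terminated or self.truncated


# A timestep that hit a time limit: here obs and true_obs are allowed to differ.
ts = TimestepSketch(
    reward=1.0,
    terminated=False,
    truncated=True,
    obs="reset_state",       # hypothetical value
    true_obs="final_state",  # hypothetical value
    info={},
)
print(ts.done)  # True
```

For an ordinary mid-episode step, terminated and truncated are both False, done is False, and obs and true_obs carry the same observation.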

__init__(reward, terminated, truncated, obs, true_obs, info)

Methods

__init__(reward, terminated, truncated, obs, ...)

trajectory(first_obs, action[, first_info])

Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).

transition(prev_obs, action[, prev_info])

Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).

Attributes

done

Whether the episode has terminated or been truncated.

reward

The reward \(R_i\) received at this timestep \(i\).

terminated

Whether the episode has terminated at this timestep.

truncated

Whether the episode has been truncated at this timestep.

obs

The observation \(S_{i+1}\) at this timestep \(i\).

true_obs

The true observation \(\hat{S}_{i+1}\) at this timestep.

info

Additional information about the timestep.

property done: Array

Whether the episode has terminated or been truncated.
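Because done is true on the last timestep of every episode, whether it ended by termination or by truncation, it is the natural signal for splitting a stream of timesteps into episodes. The sketch below sums rewards per episode; episode_returns is a hypothetical helper, and it assumes the rewards and done flags come from consecutive timesteps of a single, non-batched environment (that is, ts.reward and ts.done as plain Python values).

```python
from typing import Iterable


def episode_returns(rewards: Iterable[float], dones: Iterable[bool]) -> list[float]:
    """Sum rewards per episode, closing an episode whenever done is True.

    Hypothetical helper: `rewards` and `dones` are assumed to be ts.reward and
    ts.done from consecutive timesteps of one (non-batched) environment.
    """
    returns: list[float] = []
    running = 0.0
    for r, d in zip(rewards, dones):
        running += r
        if d:  # terminated or truncated: the episode ends at this timestep
            returns.append(running)
            running = 0.0
    return returns  # an unfinished trailing episode is not included


print(episode_returns([1.0, 1.0, 0.5, 2.0, 3.0], [False, True, False, True, False]))
# [2.0, 2.5]
```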

info: dict[str, Any]

Additional information about the timestep.

obs: Any

The observation \(S_{i+1}\) at this timestep \(i\).

reward: Array

The reward \(R_i\) received at this timestep \(i\).

terminated: Array

Whether the episode has terminated at this timestep.

trajectory(first_obs, action, first_info={})

Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).

Parameters:
  • first_obs (Any) – The observation at the first timestep.

  • action (Array) – The action taken at each timestep.

  • first_info (dict[str, Any]) – The info at the first timestep.

Return type:

Trajectory

Returns:

A Trajectory object containing the sequence \((S_0, A_0, R_0, S_1, ..., S_n)\).
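The ordering produced by trajectory can be illustrated with plain lists. This is only a sketch of the interleaving \((S_0, A_0, R_0, S_1, ..., S_n)\): interleave_trajectory is a hypothetical helper, and the real Trajectory object very likely stores stacked arrays per field rather than one flat interleaved list.

```python
from typing import Any, Sequence


def interleave_trajectory(
    first_obs: Any,
    actions: Sequence[Any],
    rewards: Sequence[float],
    next_obs: Sequence[Any],
) -> list[Any]:
    """Flatten S_0, (A_0, R_0, S_1), ..., (A_{n-1}, R_{n-1}, S_n) into one list.

    Hypothetical helper: rewards[i] and next_obs[i] play the role of the
    timestep (R_i, S_{i+1}); actions[i] is A_i.
    """
    if not (len(actions) == len(rewards) == len(next_obs)):
        raise ValueError("need exactly one action per timestep")
    out: list[Any] = [first_obs]
    for a, r, s in zip(actions, rewards, next_obs):
        out.extend([a, r, s])
    return out


print(interleave_trajectory("s0", ["a0", "a1"], [0.0, 1.0], ["s1", "s2"]))
# ['s0', 'a0', 0.0, 's1', 'a1', 1.0, 's2']
```

Note that n timesteps need exactly n actions, since \(A_i\) is the action that produced timestep \(i\); the sketch rejects mismatched lengths instead of silently truncating.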

transition(prev_obs, action, prev_info={})

Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).

Parameters:
  • prev_obs (Any) – The observation at the previous timestep.

  • action (Array) – The action taken at the current timestep.

  • prev_info (dict[str, Any]) – The info at the previous timestep.

Return type:

Transition

Returns:

A Transition object containing the previous observation, the action, the reward, and the next observation \((S_t, A_t, R_t, S_{t+1})\).
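In a rollout loop, the prev_obs passed to transition is simply the obs of the timestep before it (or \(S_0\) for the first step). The sketch below shows that bookkeeping with a hypothetical TransitionSketch named tuple and a hypothetical transitions_from_steps helper; each step is given as a (reward, obs) pair standing in for a Timestep. It covers a single episode only and makes no claim about which observation gxm uses as \(S_{t+1}\) on a truncated step.

```python
from typing import Any, NamedTuple, Sequence


class TransitionSketch(NamedTuple):
    """Hypothetical stand-in for a Transition (S_t, A_t, R_t, S_{t+1})."""

    prev_obs: Any
    action: Any
    reward: float
    obs: Any


def transitions_from_steps(
    first_obs: Any,
    actions: Sequence[Any],
    steps: Sequence[tuple[float, Any]],
) -> list[TransitionSketch]:
    """Pair each step (R_t, S_{t+1}) with S_t and A_t within one episode."""
    out = []
    prev_obs = first_obs  # S_0 for the first step
    for action, (reward, obs) in zip(actions, steps):
        out.append(TransitionSketch(prev_obs, action, reward, obs))
        prev_obs = obs  # S_{t+1} becomes the previous observation of the next step
    return out


for tr in transitions_from_steps("s0", ["a0", "a1"], [(0.0, "s1"), (1.0, "s2")]):
    print(tr)
# TransitionSketch(prev_obs='s0', action='a0', reward=0.0, obs='s1')
# TransitionSketch(prev_obs='s1', action='a1', reward=1.0, obs='s2')
```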

true_obs: Any

The true observation \(\hat{S}_{i+1}\) at this timestep. In environments that allow truncation, this may differ from obs, but only when truncated is True.
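One common reason to keep true_obs separate is value bootstrapping: a truncated episode did not really end, so a one-step target should still bootstrap from the value of the state actually reached, while a terminated episode should not bootstrap at all. The sketch below shows that standard rule; td_target, value_fn, and gamma are hypothetical names, the value table is made up, and nothing here is claimed to be how gxm itself consumes true_obs.

```python
from typing import Any, Callable


def td_target(
    reward: float,
    terminated: bool,
    true_obs: Any,
    value_fn: Callable[[Any], float],
    gamma: float = 0.99,
) -> float:
    """One-step TD target R_i + gamma * V(true_obs), cut only on termination.

    Truncation is not a true end of the episode, so the value of the state the
    agent actually reached (true_obs) is still bootstrapped from.
    """
    if terminated:
        return reward  # terminal state: no future return
    return reward + gamma * value_fn(true_obs)


values = {"final_state": 10.0, "reset_state": 0.0}  # hypothetical value estimates
# Truncated step: bootstrap from true_obs rather than obs.
print(td_target(1.0, False, "final_state", values.__getitem__, gamma=0.9))
# 10.0
```

Bootstrapping from obs instead would, under the hypothetical values above, use the value of an unrelated state and bias the target.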

truncated: Array

Whether the episode has been truncated at this timestep.