gxm.Timestep
- class Timestep(reward, terminated, truncated, obs, true_obs, info)
Bases: object
Class representing a single timestep \((R_i, S_{i+1})\) in an environment, where \(R_i\) is the reward received after taking an action at timestep \(i\) and \(S_{i+1}\) is the observation at the next timestep. In case of truncation, true_obs represents the observation \(\hat{S}_{i+1}\) that would have been observed if the episode had not been truncated.
- __init__(reward, terminated, truncated, obs, true_obs, info)
Methods
- __init__(reward, terminated, truncated, obs, ...)
- trajectory(first_obs, action[, first_info]): Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).
- transition(prev_obs, action[, prev_info]): Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).
Attributes
- done: Whether the episode has terminated or been truncated.
- reward: The reward \(R_i\) received at this timestep \(i\).
- terminated: Whether the episode has terminated at this timestep.
- truncated: Whether the episode has been truncated at this timestep.
- obs: The observation \(S_{i+1}\) at this timestep \(i\).
- true_obs: The true observation \(\hat{S}_{i+1}\) at this timestep.
- info: Additional information about the timestep.
- property done: Array
Whether the episode has terminated or been truncated.
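As a minimal sketch of how done relates to the two end-of-episode flags, the stand-in class below mirrors the documented fields with plain Python values. SketchTimestep is a hypothetical name for illustration and is not gxm's implementation; with Array-valued flags the same rule would be an elementwise logical OR.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchTimestep:
    """Hypothetical stand-in for gxm.Timestep; not the library's code."""

    reward: float
    terminated: bool
    truncated: bool
    obs: Any
    true_obs: Any
    info: dict

    @property
    def done(self) -> bool:
        # Documented meaning: the episode has terminated OR been truncated.
        return self.terminated or self.truncated


# A step cut off by a time limit: truncated but not terminated.
time_limit_hit = SketchTimestep(1.0, False, True, obs=0, true_obs=5, info={})
# An ordinary mid-episode step: neither flag is set.
mid_episode = SketchTimestep(0.5, False, False, obs=3, true_obs=3, info={})
```

Keeping terminated and truncated separate while also exposing done lets rollout code stop on either condition without losing the distinction between them.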
- info: dict[str, Any]
Additional information about the timestep.
- obs: Any
The observation \(S_{i+1}\) at this timestep \(i\).
- reward: Array
The reward \(R_i\) received at this timestep \(i\).
- terminated: Array
Whether the episode has terminated at this timestep.
- trajectory(first_obs, action, first_info={})
Convert a sequence of timesteps \((R_0, S_1, ..., S_n)\) with the first observation \(S_0\) and the actions \((A_0, A_1, ..., A_{n-1})\) into a trajectory \((S_0, A_0, R_0, S_1, ..., S_n)\).
- Parameters:
first_obs (Any) – The observation at the first timestep.
action (Array) – The action taken at each timestep.
- Return type:
Trajectory
- Returns:
A Trajectory object containing the sequence of timesteps.
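The index bookkeeping in this conversion can be illustrated with plain lists: n timesteps contribute n rewards and n next observations, and prepending \(S_0\) yields n + 1 observations. The helper sketch_trajectory and its dictionary layout are assumptions made for this sketch; they are not gxm's Trajectory type.

```python
from typing import Any


def sketch_trajectory(first_obs: Any, actions: list, rewards: list, next_obs: list) -> dict:
    """Assemble (S_0, A_0, R_0, S_1, ..., S_n) from per-step data (hypothetical helper)."""
    if not (len(actions) == len(rewards) == len(next_obs)):
        raise ValueError("actions, rewards and next_obs must have equal length")
    return {
        "obs": [first_obs, *next_obs],  # n + 1 observations: S_0 .. S_n
        "action": list(actions),        # n actions: A_0 .. A_{n-1}
        "reward": list(rewards),        # n rewards: R_0 .. R_{n-1}
    }


traj = sketch_trajectory("s0", ["a0", "a1"], [0.0, 1.0], ["s1", "s2"])
```

The observation sequence is one element longer than the action and reward sequences, which is why the first observation has to be passed in separately.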
- transition(prev_obs, action, prev_info={})
Convert the current timestep \((R_t, S_{t+1})\) into a transition \((S_t, A_t, R_t, S_{t+1})\) given the previous observation \(S_t\) and the action \(A_t\).
- Parameters:
prev_obs (Any) – The observation at the previous timestep.
action (Array) – The action taken at the current timestep.
prev_info (dict[str, Any]) – The info at the previous timestep.
- Return type:
Transition
- Returns:
A Transition object containing the current and next timesteps.
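Because a timestep only carries \((R_t, S_{t+1})\), the caller has to supply \(S_t\) and \(A_t\). The sketch below shows that pattern in a rollout loop over a toy counter environment. SketchTransition, toy_step, and rollout are hypothetical names introduced for illustration and do not reflect gxm's actual classes or environments.

```python
from typing import Any, NamedTuple


class SketchTransition(NamedTuple):
    """Hypothetical (S_t, A_t, R_t, S_{t+1}) record; not gxm's Transition."""

    prev_obs: Any
    action: Any
    reward: float
    obs: Any


def toy_step(obs: int, action: int) -> tuple[float, int]:
    """Toy environment step returning (R_t, S_{t+1}) for a counter state."""
    next_obs = obs + action
    return float(next_obs), next_obs


def rollout(first_obs: int, actions: list[int]) -> list[SketchTransition]:
    transitions = []
    prev_obs = first_obs
    for action in actions:
        reward, obs = toy_step(prev_obs, action)
        # The step output lacks S_t and A_t, so they are attached here.
        transitions.append(SketchTransition(prev_obs, action, reward, obs))
        prev_obs = obs  # the next observation becomes the next S_t
    return transitions


transitions = rollout(first_obs=0, actions=[1, 2, 3])
```

Each transition's obs equals the following transition's prev_obs, which is the chaining a loop like this has to maintain.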
- true_obs: Any
The true observation \(\hat{S}_{i+1}\) at this timestep. It can differ from obs only when truncated is True, which can happen in environments that allow truncation.
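One common reason to keep true_obs is value bootstrapping: a truncated episode has not really ended, so a one-step target can still bootstrap from the observation that would have followed. The sketch below illustrates that idea under stated assumptions; td_target, value_fn, and gamma are hypothetical names, and nothing here is taken from gxm itself.

```python
from typing import Any, Callable


def td_target(
    reward: float,
    terminated: bool,
    true_obs: Any,
    value_fn: Callable[[Any], float],
    gamma: float = 0.99,
) -> float:
    """One-step TD target that bootstraps from true_obs (illustrative only)."""
    # Only true termination zeroes the bootstrap term; truncation does not.
    bootstrap = 0.0 if terminated else value_fn(true_obs)
    return reward + gamma * bootstrap


def toy_value(state: float) -> float:
    # Toy value function for the example: the value of a state is the state itself.
    return float(state)


truncated_target = td_target(1.0, terminated=False, true_obs=5, value_fn=toy_value, gamma=0.5)
terminal_target = td_target(1.0, terminated=True, true_obs=5, value_fn=toy_value, gamma=0.5)
```

Using obs here instead of true_obs would give the same result on non-truncated steps, since the two can only differ when truncated is True.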
- truncated: Array
Whether the episode has been truncated at this timestep.