Skip to content

Observation Space

Below is the observations of an agent.

Observation Descriptions Details
image the egocentric view of the agent camera_resolution_width*camera_resolution_height numpy array
text the text received by the agent (i.e. what the player just send) string.
If it is empty, it means nothing has been sent by the user. Note that the environment does not maintain a chat history. If needed, it should be recorded by the agent itself.

You are only allowed to use image and chat as input for your agents. This is necessary to ensure the generalizability of the agent. However, during training or data generation you are allowed to use additional info from the environment. This information is returned along with the observation, with the content as follows.

Observation Descriptions Details
game_states all the game inner states json object(Dict).
api_returns the returns of the api_calls in the last action json object(Dict).

Below is the explanation for each field in game_states.

Key Descriptions Details
instances the information of all objects obs['instances'][i]['prefab'] is the prefab name of the object.
obs['instances'][i]['position'] is the position of the object.
obs['instances'][i]['forward'] the direction the object is facing.
player the information of the player obs['player']['position'] is the position of the player.
obs['player']['forward'] the direction the player is facing.
agent the information of the agent obs['agent']['position'] is the position of the agent.
obs['agent']['forward'] the direction the agent is facing
player_camera The position of the player's camera, from which the egocentric image is obtained. obs['player_camera']['position'] is the position of the camera.
obs['player_camera']['forward'] the direction the camera is facing.
agent_camera The position of the agent's camera, from which the egocentric image is obtained. obs['agent_camera']['position'] is the position of the camera.
obs['agent_camera']['forward'] the direction the camera is facing.
player_grab_instance The index of the object that the player has grabbed. "player_grab_instance": i means instances[i] is grabbed.
agent_grab_instance The index of the object that the agent has grabbed. "agent_grab_instance": i means instances[i] is grabbed.

This information is useful, for instance, for spatial calculations, determining task completion, or calculating rewards.