
Action Space

Actions can generally be divided into planning actions and control actions. Some environments offer actions like goto(object), which are planning actions. Because they rely on annotated objects in the scene, planning actions cannot operate in unannotated scenes. Control actions have significantly greater potential, both in operational capability and in generalization. LEGENT therefore adopts control actions. However, LEGENT does not currently use real robot controls, which would in principle require precise control of each joint's rotation and more, making them overly complex for researchers who do not work with physical robots. While keeping every action at the control level, we have simplified the control difficulty as much as possible.

Below are the actions of an agent. Note that, in addition to the basic move-forward action (equivalent to pressing the forward key on a keyboard), LEGENT also supports moving forward by an arbitrary distance in a single action (teleport_forward). During continuous forward movement, a sequence like "move forward, move forward, move forward" adds very little information per step. This is not a major issue for small models. For large models, however, where the computational cost of training is high, using "move forward by a distance" greatly increases the information contained in each sample. At deployment, the model infers the distance to move, and the agent waits until that distance is covered before the next inference, avoiding the heavy overhead of running inference on every frame. LEGENT, as a platform aimed at large models, is the first to employ this action design.
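A rough count makes the overhead argument concrete. The sketch below compares how many model inferences are needed to cover a straight-line distance with per-frame move_forward versus a single teleport_forward. The helper `inference_calls` and the per-frame stride of 0.05 m are hypothetical illustrative values, not LEGENT constants.

```python
import math


def inference_calls(distance_m: float, step_m: float, use_teleport: bool) -> int:
    """Count model inferences needed to cover distance_m meters in a straight line.

    Per-frame mode: one inference per frame, each frame advancing step_m meters.
    Teleport mode: a single inference outputs the whole distance.
    """
    if distance_m <= 0:
        return 0
    if use_teleport:
        return 1
    # Subtract a tiny tolerance so a quotient that lands a hair above a whole
    # number due to float rounding does not add an extra step.
    return math.ceil(distance_m / step_m - 1e-9)


STEP_M = 0.05  # hypothetical per-frame stride, for illustration only
print(inference_calls(2.4, STEP_M, use_teleport=False))  # 48
print(inference_calls(2.4, STEP_M, use_teleport=True))   # 1
```

Under these assumed numbers, covering 2.4 m costs 48 inferences with per-frame stepping but only one with teleport_forward; the gap grows linearly with distance.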

| Action | Description | Details |
|---|---|---|
| text | Text sent to the player | string. If empty, nothing is sent. |
| move_forward | Move forward in the next frame | bool. If True, move forward. Takes effect only when use_teleport == False. |
| teleport_forward | Move forward by a given distance | float. The number of meters to travel forward. Takes effect only when use_teleport == True. |
| rotate_right | Rotate the camera horizontally | float, in degrees, range [-180, 180). Positive values rotate right; negative values rotate left. |
| rotate_down | Rotate the camera vertically | float, in degrees, range [-90, 90). Positive values rotate downward; negative values rotate upward. |
| grab | Grab or put down an object | bool. If True and the agent is not holding an object, grab the object at the center of the image. If True and the agent is holding an object, put it on the surface at the center of the image. |
| api_calls | API calls to the environment | List[Callable]. The API return values are placed in the returned observations. |
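To make the field semantics in the table concrete, here is a minimal sketch of an action container with range checks. `ActionSketch`, its defaults, and the `step_m` stride are hypothetical illustrations, not LEGENT's own action class; only the field names, the use_teleport switch, and the value ranges come from the table.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ActionSketch:
    """Hypothetical container mirroring the action fields (not LEGENT's class)."""
    text: str = ""                  # empty string: nothing is sent to the player
    move_forward: bool = False      # per-frame step; used when use_teleport is False
    teleport_forward: float = 0.0   # meters; used when use_teleport is True
    rotate_right: float = 0.0       # degrees in [-180, 180); negative turns left
    rotate_down: float = 0.0        # degrees in [-90, 90); negative looks up
    grab: bool = False              # grab if empty-handed, otherwise put down
    use_teleport: bool = False      # selects which forward field takes effect
    api_calls: List[Callable[..., Any]] = field(default_factory=list)

    def validate(self) -> None:
        """Reject rotation values outside the documented half-open ranges."""
        if not -180 <= self.rotate_right < 180:
            raise ValueError("rotate_right must lie in [-180, 180)")
        if not -90 <= self.rotate_down < 90:
            raise ValueError("rotate_down must lie in [-90, 90)")

    def forward_distance(self, step_m: float) -> float:
        """Distance covered by this action; step_m is a hypothetical per-frame stride."""
        if self.use_teleport:
            return self.teleport_forward          # move_forward is ignored
        return step_m if self.move_forward else 0.0  # teleport_forward is ignored


action = ActionSketch(rotate_right=35, use_teleport=True, teleport_forward=2.4)
action.validate()
print(action.forward_distance(step_m=0.05))  # 2.4
```

Making use_teleport an explicit switch keeps the two forward fields mutually exclusive, which matches the "takes effect only when" wording in the table.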

The types of these actions vary, but they are all expressed to the model as code. For example:

```python
speak("OK")
move_forward(2.4)
rotate_right(35)
```
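Because actions are written as code, a controller has to translate each call into action fields. The sketch below does this with Python's `ast` module instead of `eval`, so model output is parsed but never executed. The mapping (speak to text, move_forward with a distance to teleport_forward with use_teleport enabled) is an assumption inferred from the example, and `parse_action_code` is a hypothetical helper, not part of LEGENT.

```python
import ast
from typing import Any, Dict


def parse_action_code(source: str) -> Dict[str, Any]:
    """Translate model-emitted action code into action fields (hypothetical mapping)."""
    fields: Dict[str, Any] = {}
    for stmt in ast.parse(source).body:
        # Accept only bare calls of the form name(literal).
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Name)):
            raise ValueError(f"unsupported statement: {ast.unparse(stmt)}")
        call = stmt.value
        if call.keywords or len(call.args) != 1:
            raise ValueError(f"expected one positional argument: {ast.unparse(stmt)}")
        name = call.func.id
        arg = ast.literal_eval(call.args[0])  # literals only; no code execution
        if name == "speak":
            fields["text"] = str(arg)
        elif name == "move_forward":
            # Assumed: a distance argument selects the teleport_forward action.
            fields["use_teleport"] = True
            fields["teleport_forward"] = float(arg)
        elif name in ("rotate_right", "rotate_down"):
            fields[name] = float(arg)
        else:
            raise ValueError(f"unknown action: {name}")
    return fields


print(parse_action_code('speak("OK")\nmove_forward(2.4)\nrotate_right(35)'))
# {'text': 'OK', 'use_teleport': True, 'teleport_forward': 2.4, 'rotate_right': 35.0}
```

Rejecting anything other than single-literal calls keeps the parser strict, so malformed model output fails loudly instead of producing a partial action.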

Below are the APIs that the environment provides to Python.

| API | Description | Params | Returns |
|---|---|---|---|
| PathToUser | Obtain the key points of a path to the player. The agent can reach the player by walking to the key points one by one, in a straight line with no obstacles between consecutive points. | None | api_returns['corners']: the list of key points. |
| PathToObject | Obtain the key points of a path to an object. | The index of the object in the scene config | api_returns['corners']: the list of key points. |
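The corners returned by PathToUser or PathToObject can be turned into control actions: at each key point, rotate toward the next one and then move forward by the straight-line distance. The sketch below assumes that corners are (x, z) ground-plane pairs in meters, that yaw 0 faces +z, and that positive yaw turns from +z toward +x (to the right). These conventions and the helper `corners_to_actions` are hypothetical, not taken from LEGENT.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, z) ground-plane coordinates in meters


def corners_to_actions(position: Point, yaw_deg: float,
                       corners: Sequence[Point]) -> List[Tuple[float, float]]:
    """Convert path key points into (rotate_right, teleport_forward) pairs."""
    actions: List[Tuple[float, float]] = []
    x, z = position
    for cx, cz in corners:
        dx, dz = cx - x, cz - z
        distance = math.hypot(dx, dz)
        if distance < 1e-6:
            continue  # already standing on this key point
        heading = math.degrees(math.atan2(dx, dz))  # bearing measured from +z toward +x
        # Wrap the turn into [-180, 180), the range accepted by rotate_right.
        turn = (heading - yaw_deg + 180.0) % 360.0 - 180.0
        actions.append((turn, distance))
        x, z, yaw_deg = cx, cz, heading  # the agent now stands at the corner, facing it
    return actions


for turn, dist in corners_to_actions((0.0, 0.0), 0.0, [(0.0, 2.0), (2.0, 2.0)]):
    print(f"rotate_right({turn:.1f}); move_forward({dist:.1f})")
```

Starting at (0, 0) facing +z, this prints a 0° turn with 2 m forward, then a 90° right turn with 2 m forward. Because consecutive key points have no obstacles between them, one rotation plus one teleport_forward per corner is enough under these assumptions.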