Pre-trained Policies

Pre-trained policies are available in agile/data/policy/.

Directory Structure

policy/
  velocity_g1/              # G1 - Velocity tracking (TorchScript)
  velocity_height_g1/       # G1 - Velocity + height (TorchScript + Checkpoint)
    exported/               # Exported student policy (TorchScript + ONNX)
    *_teacher.pt            # Teacher policy (TorchScript)
    *_student.pt            # Student policy (TorchScript)
    *_student_checkpoint.pt # Student training checkpoint (State dict)
  velocity_t1/              # T1 - Velocity tracking (TorchScript)

Available Policies

| Policy | Task | Commands | Format | Description |
|---|---|---|---|---|
| velocity_g1/unitree_g1_velocity_history.pt | Velocity-G1-History-v0 | v_x, v_y, w_z | TorchScript | History-based |
| velocity_height_g1/unitree_g1_velocity_height_teacher.pt | Velocity-Height-G1-v0 | v_x, v_y, w_z | TorchScript | Privileged teacher |
| velocity_height_g1/unitree_g1_velocity_height_recurrent_student.pt | Velocity-Height-G1-Distillation-Recurrent-v0 | v_x, v_y, w_z, h | TorchScript | Recurrent LSTM student |
| velocity_height_g1/unitree_g1_velocity_height_recurrent_student_checkpoint.pt | Velocity-Height-G1-Distillation-Recurrent-v0 | v_x, v_y, w_z, h | State dict | Training checkpoint |
| velocity_t1/booster_t1_velocity_v0.pt | Velocity-T1-v0 | v_x, v_y, w_z | TorchScript | History-based |

Note

Root linear velocity is treated as privileged information because estimating it accurately usually requires additional hardware during deployment. Only the velocity-height teacher policy observes it; all other policies are suitable for direct deployment on real robots. The velocity-height policies are tuned for improved command tracking. The teacher policy remains useful in simulation, where its access to privileged linear velocity gives it better velocity tracking.

Policy Formats

  • TorchScript (.pt + .yaml): Exported policies ready for deployment. Self-contained with normalizer included. Load with torch.jit.load().

  • State dict (.pt only): Training checkpoints containing model_state_dict, optimizer_state_dict, and iter. Load with torch.load(). Required for resuming training or batched evaluation.

  • ONNX (.onnx): For hardware inference engines.

  • YAML files: Required for TorchScript policy deployment in MuJoCo and on real hardware, containing task and architecture configs.
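The two .pt formats listed above are loaded through different PyTorch calls. The following is a minimal sketch of how a script might tell them apart. It is not the logic used by scripts/eval.py. The names load_policy and is_training_checkpoint are hypothetical, and the fallback assumes that torch.jit.load() raises RuntimeError when given a file that is not a TorchScript archive.

```python
from pathlib import Path

# Keys that the State dict format above is documented to contain.
CHECKPOINT_KEYS = ("model_state_dict", "optimizer_state_dict", "iter")


def is_training_checkpoint(obj) -> bool:
    """Return True if obj is a dict holding all documented checkpoint keys."""
    return isinstance(obj, dict) and all(key in obj for key in CHECKPOINT_KEYS)


def load_policy(path, device="cpu"):
    """Hypothetical helper: return (kind, loaded object) for a policy file."""
    import torch  # lazy import so is_training_checkpoint works without PyTorch

    path = Path(path)
    try:
        # A TorchScript export loads straight into a callable module.
        return "torchscript", torch.jit.load(str(path), map_location=device)
    except RuntimeError:
        # Assumption: non-TorchScript files make torch.jit.load raise RuntimeError.
        checkpoint = torch.load(str(path), map_location=device)
    if not is_training_checkpoint(checkpoint):
        raise ValueError(f"{path} is neither TorchScript nor a training checkpoint")
    return "state_dict", checkpoint
```

Calling load_policy("agile/data/policy/velocity_t1/booster_t1_velocity_v0.pt") would be expected to return the "torchscript" kind together with a callable module, while the *_student_checkpoint.pt file would come back as a plain dict whose model_state_dict still has to be loaded into a model built from the task configuration.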

Usage

# TorchScript policies (auto-detected)
python scripts/eval.py --task Velocity-G1-History-v0 \
    --checkpoint agile/data/policy/velocity_g1/unitree_g1_velocity_history.pt

# State dict checkpoint (for batched evaluation / resuming training)
python scripts/eval.py --task Velocity-Height-G1-Distillation-Recurrent-v0 \
    --checkpoint agile/data/policy/velocity_height_g1/unitree_g1_velocity_height_recurrent_student_checkpoint.pt

The evaluation script automatically detects the format, loads accordingly, and exports policies to exported/ (TorchScript + ONNX).
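Because each policy file must be paired with its matching task name, it can help to keep that pairing in one place when scripting evaluations. The sketch below copies the pairs from the Available Policies table and assembles the same command line shown above. POLICY_TASKS, POLICY_ROOT, and build_eval_command are illustrative names, not part of the repository.

```python
import shlex

# Policy file -> task name, copied from the Available Policies table.
POLICY_TASKS = {
    "velocity_g1/unitree_g1_velocity_history.pt":
        "Velocity-G1-History-v0",
    "velocity_height_g1/unitree_g1_velocity_height_teacher.pt":
        "Velocity-Height-G1-v0",
    "velocity_height_g1/unitree_g1_velocity_height_recurrent_student.pt":
        "Velocity-Height-G1-Distillation-Recurrent-v0",
    "velocity_height_g1/unitree_g1_velocity_height_recurrent_student_checkpoint.pt":
        "Velocity-Height-G1-Distillation-Recurrent-v0",
    "velocity_t1/booster_t1_velocity_v0.pt":
        "Velocity-T1-v0",
}

# Directory that holds the pre-trained policies, relative to the repo root.
POLICY_ROOT = "agile/data/policy"


def build_eval_command(policy: str) -> list:
    """Return the eval.py argument list for a policy path relative to POLICY_ROOT."""
    task = POLICY_TASKS[policy]  # raises KeyError for an unlisted policy
    return [
        "python", "scripts/eval.py",
        "--task", task,
        "--checkpoint", f"{POLICY_ROOT}/{policy}",
    ]


def format_eval_command(policy: str) -> str:
    """Return the same command as a single shell-quoted string."""
    return shlex.join(build_eval_command(policy))
```

The list returned by build_eval_command can be passed directly to subprocess.run from the repository root, and format_eval_command gives a string suitable for logging or pasting into a terminal.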

I/O Descriptor Export

Export observation and action space descriptors for deployment:

python scripts/export_IODescriptors.py --task Velocity-T1-v0 --output_dir .

This generates a YAML file describing the model's input and output spaces, which the sim-to-MuJoCo framework and deployment pipelines consume.