Training Guide#
AGILE uses the same training script interface as Isaac Lab. Run python scripts/train.py -h for all available arguments.
Training#
python scripts/train.py --task Velocity-T1-v0 --num_envs 4096 --headless --logger wandb
Resuming Training#
python scripts/train.py \
--task Velocity-T1-v0 \
--num_envs 4096 \
--headless \
--resume True \
--load_run "2024-01-15_.*" \
--checkpoint "model_5000.pt"
Hydra Overrides#
Environment and agent parameters can be overridden via Hydra-style arguments:
python scripts/train.py \
--task Velocity-T1-v0 \
--num_envs 4096 \
--headless \
env.rewards.track_lin_vel_xy_exp.weight=2.0 \
agent.algorithm.learning_rate=5e-4
Environment Validation (Play)#
Before training, validate your task setup (scene, actions, MDP functions) using sinusoidal test actions – no trained policy needed:
python scripts/play.py --task Velocity-T1-v0 --num_envs 2
For tasks with pre-collected fallen states (e.g., StandUp-T1-v0), validate the dataset:
python scripts/play.py --task StandUp-T1-v0 --num_envs 16 --validate-fallen-states
See also
See Evaluation for the full evaluation guide including deterministic scenarios, HTML reports, and trajectory analysis.
Teacher-Student Distillation#
The standard pipeline for sim-to-real: train a privileged teacher, then distill to a deployable student.
# 1. Train the teacher (with privileged observations)
python scripts/train.py \
--task Velocity-Height-G1-v0 \
--num_envs 4096 --headless --logger wandb
# 2. Export the teacher
python scripts/export_policy.py \
--task Velocity-Height-G1-v0 \
--checkpoint /path/to/teacher/model_30000.pt --headless
# 3. Train the student (distills from exported teacher)
python scripts/train.py \
--task Velocity-Height-G1-Distillation-Recurrent-v0 \
--num_envs 4096 --headless --logger wandb
# 4. Evaluate and export the student
python scripts/eval.py \
--task Velocity-Height-G1-Distillation-Recurrent-v0 \
--num_envs 32 \
--checkpoint /path/to/student/model_5000.pt \
--run_evaluation --save_trajectories --generate_report
python scripts/export_policy.py \
--task Velocity-Height-G1-Distillation-Recurrent-v0 \
--checkpoint /path/to/student/model_5000.pt --headless
The student task config references the exported teacher via teacher_path in the agent config. Students can be MLP or recurrent (LSTM/GRU).
Hyperparameter Sweeps#
AGILE includes a W&B sweep system for hyperparameter optimization. Files are in scripts/wandb_sweep/.
Quick Start#
# 1. Configure sweep.yaml with your parameters
# 2. Initialize (run once)
python scripts/wandb_sweep/init_sweep.py --project_name my-sweep
# 3. Start agents (run on each GPU/machine)
python scripts/wandb_sweep/run_sweep.py --project_name my-sweep --agent_count 10
The --agent_count parameter specifies how many experiments a single agent will run sequentially, not in parallel. To run in parallel, launch multiple instances across terminals or machines.
Scaled-Dictionary Parameters#
The sweep system supports a special pattern for scaling complex dictionary parameters, which is useful for robot actuator configurations. For each scaled-dictionary parameter, define:
Distribution and range (min/max as a scalar multiplier)
param_name_cli_path– the Hydra path to the parameterparam_name_base_dict– the baseline dictionary values
parameters:
# Standard parameter
agent.algorithm.learning_rate:
distribution: log_uniform_values
min: 1.0e-4
max: 1.0e-3
# Scaled-dictionary parameter: multiplies all values in the dict
p_gain_leg:
distribution: uniform
min: 0.5 # 50% of baseline
max: 2.0 # 200% of baseline
p_gain_leg_cli_path:
value: env.scene.robot.actuators.legs.stiffness
p_gain_leg_base_dict:
value: |
{
".*_hip_yaw_joint": 100.0,
".*_hip_roll_joint": 100.0,
".*_hip_pitch_joint": 100.0,
".*_knee_joint": 200.0,
"waist.*": 200.0
}
This generates Hydra override commands like:
env.scene.robot.actuators.legs.stiffness={.*_hip_yaw_joint:150.0,...}
Policy Export#
# Export to TorchScript (JIT)
python scripts/export_policy.py \
--task Velocity-T1-v0 \
--checkpoint /path/to/model.pt --headless
# Or export during evaluation
python scripts/eval.py \
--task Velocity-T1-v0 \
--checkpoint /path/to/model.pt \
--export_io