# Evaluation
AGILE provides two evaluation paths – Isaac Lab and Sim2MuJoCo – built on the same design. Both share an identical workflow: load a trained policy, apply commands (deterministic schedules, sweeps, or random), roll out the policy in simulation, and save trajectory data for analysis. They use the same YAML eval config format, produce the same Parquet output schema, and work with the same plotting and analysis tools.
The two paths differ in simulator backend and feature set:
| Aspect | Isaac Lab | Sim2MuJoCo |
|---|---|---|
| Script | `scripts/eval.py` | `scripts/sim2mujoco_eval.py` |
| Simulator | Isaac Sim (GPU) | MuJoCo (CPU) |
| Parallel envs | Yes (N envs) | Single env |
| Eval config | Shared YAML format | Shared YAML format |
| Output format | Parquet + `metadata.json` | Parquet + `metadata.json` |
| Metrics (`metrics.json`) | Yes | No |
| HTML reports | Yes | No |
| Interactive control | No | Keyboard teleop |
| Random commands | Yes | Yes |
| Observation noise | Yes | Yes |
## Isaac Lab Evaluation

```bash
# Evaluate a trained policy
python scripts/eval.py \
    --task Velocity-T1-v0 \
    --num_envs 32 \
    --checkpoint /path/to/model.pt \
    --run_evaluation

# With trajectory saving and HTML report
python scripts/eval.py \
    --task Velocity-T1-v0 \
    --num_envs 32 \
    --checkpoint /path/to/model.pt \
    --run_evaluation \
    --save_trajectories \
    --generate_report

# With a deterministic evaluation scenario
python scripts/eval.py \
    --task Velocity-Height-G1-v0 \
    --num_envs 16 \
    --checkpoint /path/to/model.pt \
    --run_evaluation \
    --eval_config agile/algorithms/evaluation/configs/examples/x_velocity_sweep.yaml
```
### CLI Options

| Option | Description |
|---|---|
| `--run_evaluation` | Enable the PolicyEvaluator |
| `--save_trajectories` | Save trajectory data to Parquet files |
| | Specific fields to save (default: all) |
| `--generate_report` | Generate an HTML report (requires `--save_trajectories`) |
| `--eval_config` | Path to a YAML scenario config for deterministic testing |
### Output Structure

```text
logs/rsl_rl/<experiment_name>/
    trajectories/
        episode_000.parquet
        episode_001.parquet
        ...
    metrics.json
    reports/            # if --generate_report
        index.html
        episodes/
            episode_000.html
            ...
```
## Sim2MuJoCo Evaluation
The Sim2MuJoCo path runs policies in MuJoCo for cross-simulator validation. See Sim-to-MuJoCo Transfer for setup instructions (policy export, MJCF acquisition).
```bash
# Interactive keyboard control
python scripts/sim2mujoco_eval.py \
    --checkpoint /path/to/policy.pt \
    --config /path/to/config.yaml \
    --mjcf /path/to/scene.xml \
    --duration 30.0

# Deterministic evaluation (same YAML config format as Isaac Lab)
python scripts/sim2mujoco_eval.py \
    --checkpoint /path/to/policy.pt \
    --config /path/to/config.yaml \
    --mjcf /path/to/scene.xml \
    --eval-config agile/sim2mujoco/configs/x_velocity_sweep.yaml \
    --save-data --no-viewer

# Random commands (reproducible with seed)
python scripts/sim2mujoco_eval.py \
    --checkpoint /path/to/policy.pt \
    --config /path/to/config.yaml \
    --mjcf /path/to/scene.xml \
    --random-commands all --random-interval 2.0 --random-seed 42 \
    --duration 50.0 --save-data --no-viewer
```
### CLI Options

| Option | Description |
|---|---|
| `--checkpoint` | Path to the policy checkpoint |
| `--config` | Path to the exported I/O descriptor YAML |
| `--mjcf` | Path to the MuJoCo MJCF file (overrides the config default) |
| `--duration` | Simulation duration in seconds |
| `--eval-config` | Path to a YAML eval config (deterministic command schedule) |
| `--save-data` | Save trajectory data to Parquet files |
| | Custom output directory for saved data |
| `--random-commands` | Randomize commands: field names (`vx`, `vy`, `wz`, `height`) or `all` |
| `--random-interval` | Seconds between random resamples (default: 2.0) |
| `--random-seed` | RNG seed for reproducible random commands |
| | Observation noise scale (0 = off, 1 = match training, >1 = stress test) |
| | Scale factor for PD gains (use 0.3–0.5 for stability) |
| `--no-viewer` | Disable the MuJoCo viewer (headless mode) |
| | Disable real-time pacing (runs as fast as possible) |
### Command Modes

Three mutually exclusive command modes are available:

1. **Keyboard control** (default): Interactive teleoperation via the MuJoCo viewer. Arrow keys for movement, U/O for turning, Page Up/Down for height.
2. **Eval config** (`--eval-config`): Deterministic command schedules from YAML files, using the same format as Isaac Lab evaluation. Duration is set from the config's `episode_length_s`.
3. **Random commands** (`--random-commands`): Uniform random resampling at a fixed interval. Specify individual fields (`vx`, `vy`, `wz`, `height`) or `all`. Use `--random-seed` for reproducibility.

> **Note:** `--eval-config` and `--random-commands` are mutually exclusive. Keyboard control is automatically disabled when either is active or when `--no-viewer` is set.
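The random-commands mode can be sketched in a few lines. This is an illustrative reimplementation, not AGILE's actual code; the field ranges and the function name `random_command_schedule` are assumptions:

```python
import random

# Illustrative command ranges; the real limits come from the task config.
RANGES = {"vx": (-1.0, 1.0), "vy": (-0.5, 0.5), "wz": (-1.0, 1.0), "height": (0.6, 0.8)}

def random_command_schedule(fields, interval, duration, seed):
    """Resample the selected fields uniformly every `interval` seconds."""
    rng = random.Random(seed)  # seeded RNG makes the run reproducible
    schedule, t = [], 0.0
    while t < duration:
        schedule.append((t, {f: rng.uniform(*RANGES[f]) for f in fields}))
        t += interval
    return schedule

# With seed 42 and a 2 s interval (cf. --random-seed / --random-interval),
# the same seed always yields the same command sequence.
a = random_command_schedule(["vx", "wz"], interval=2.0, duration=10.0, seed=42)
b = random_command_schedule(["vx", "wz"], interval=2.0, duration=10.0, seed=42)
assert a == b and len(a) == 5
```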
### Output Structure

```text
logs/sim2mujoco/<task>/<eval_config>_<timestamp>/
    trajectories/
        metadata.json
        episode_000.parquet
```

The Parquet schema matches the Isaac Lab output: `joint_pos_{i}`, `joint_vel_{i}`, `joint_acc_{i}`, `root_pos_{i}`, `root_lin_vel_robot_{i}`, `commands_{i}`, `actions_{i}`, plus metadata columns (`episode_id`, `env_id`, `frame_idx`, `timestep`).
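Because the schema is wide (one column per joint index), a common first step in analysis is regrouping indexed columns into arrays. A minimal sketch using plain dicts; the helper `gather` and the sample values are made up for illustration:

```python
# One trajectory row in the wide schema above (values invented for illustration).
row = {
    "episode_id": 0, "env_id": 0, "frame_idx": 3, "timestep": 0.06,
    "joint_pos_0": 0.10, "joint_pos_1": -0.25, "joint_pos_2": 0.40,
    "joint_vel_0": 1.10, "joint_vel_1": 0.00, "joint_vel_2": -0.30,
}

def gather(row, prefix):
    """Collect indexed columns (prefix_0, prefix_1, ...) into an ordered list."""
    cols = [k for k in row
            if k.startswith(prefix + "_") and k[len(prefix) + 1:].isdigit()]
    cols.sort(key=lambda k: int(k.rsplit("_", 1)[1]))
    return [row[c] for c in cols]

assert gather(row, "joint_pos") == [0.10, -0.25, 0.40]
```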
## Deterministic Scenario Configs
Both evaluation paths use the same YAML config format for deterministic testing. Configs define controlled command sequences instead of random commands, enabling systematic and reproducible evaluation.
Isaac Lab configs live in `agile/algorithms/evaluation/configs/examples/`; Sim2MuJoCo configs live in `agile/sim2mujoco/configs/`. The format is identical – only task-specific values (sweep ranges, durations) differ.
Two specification modes are available:
### Sweep Mode

Uniform time intervals cycling through a list of values:

```yaml
evaluation:
  task_name: "Velocity-Height-G1-Dev-v0"
  num_envs: 4
  episode_length_s: 50.0
  num_episodes: 1
  environments:
    - env_ids: [0]
      name: "x_velocity_test"
      sweep:
        interval: 5.0
        commands:
          base_velocity:
            lin_vel_x: [-1.0, 0.0, 1.0]
            lin_vel_y: 0.0
            ang_vel_z: 0.0
            base_height: 0.75
```
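Sweep semantics are simple: at time t, the active value is the list entry for the current interval. A sketch (the wrap-around after the last entry is an assumption, not documented behavior):

```python
def sweep_value(t, interval, values):
    """Value active at time t when cycling through `values` every `interval` s."""
    return values[int(t // interval) % len(values)]

# With interval: 5.0 and lin_vel_x: [-1.0, 0.0, 1.0] as in the config above
assert sweep_value(2.0, 5.0, [-1.0, 0.0, 1.0]) == -1.0   # 0-5 s
assert sweep_value(7.5, 5.0, [-1.0, 0.0, 1.0]) == 0.0    # 5-10 s
assert sweep_value(12.0, 5.0, [-1.0, 0.0, 1.0]) == 1.0   # 10-15 s
```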
### Schedule Mode

Explicit time-based command sequences for complex maneuvers:

```yaml
environments:
  - env_ids: [0]
    name: "complex_maneuver"
    schedule:
      - time: 0.0
        commands:
          base_velocity:
            lin_vel_x: 0.5
            lin_vel_y: 0.0
            ang_vel_z: 0.0
            base_height: 0.75
      - time: 10.0
        commands:
          base_velocity:
            lin_vel_x: 1.0
            lin_vel_y: 0.0
            ang_vel_z: 0.0
            base_height: 0.75
```
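A schedule resolves to "the most recent entry whose time has passed". An illustrative lookup (the function name is assumed, not AGILE's API):

```python
def active_commands(schedule, t):
    """Return the commands of the latest schedule entry with time <= t."""
    current = schedule[0]["commands"]
    for entry in schedule:
        if entry["time"] <= t:
            current = entry["commands"]
    return current

schedule = [
    {"time": 0.0,  "commands": {"lin_vel_x": 0.5}},
    {"time": 10.0, "commands": {"lin_vel_x": 1.0}},
]
assert active_commands(schedule, 4.0)["lin_vel_x"] == 0.5
assert active_commands(schedule, 10.0)["lin_vel_x"] == 1.0
```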
### Multi-Environment Testing

Assign different tests to different environments (Isaac Lab only – Sim2MuJoCo runs a single env):

```yaml
environments:
  - env_ids: [0, 1]
    name: "test_a"
    sweep: ...
  - env_ids: [2]
    name: "test_b"
    schedule: ...
```
Unassigned environments use random commands (training behavior).
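The env-to-test mapping can be pictured as follows; `assign_tests` is an illustrative sketch, not part of AGILE:

```python
def assign_tests(environments, num_envs):
    """Map env ids to named tests; unassigned envs keep random commands."""
    assignment = {i: "random" for i in range(num_envs)}
    for spec in environments:
        for env_id in spec["env_ids"]:
            assignment[env_id] = spec["name"]
    return assignment

envs = [{"env_ids": [0, 1], "name": "test_a"},
        {"env_ids": [2], "name": "test_b"}]
assert assign_tests(envs, 4) == {0: "test_a", 1: "test_a", 2: "test_b", 3: "random"}
```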
### Pre-built Scenarios

Isaac Lab examples in `agile/algorithms/evaluation/configs/examples/`:

| Config | Description |
|---|---|
| `x_velocity_sweep.yaml` | Forward/backward walking |
| | Lateral movement |
| | Turning |
| | Height control |
| | All capabilities in parallel (one per env) |
| | Complex maneuver sequence |
Sim2MuJoCo examples in `agile/sim2mujoco/configs/`:

| Config | Description |
|---|---|
| `x_velocity_sweep.yaml` | Forward/backward velocity sweep |
| | Lateral velocity sweep |
| | Turning rate sweep |
| | Base height sweep (velocity+height tasks) |
All `base_velocity` commands must specify all four fields (`lin_vel_x`, `lin_vel_y`, `ang_vel_z`, `base_height`). Commands are automatically clamped to the valid ranges defined in the task config.
> **Tip:** Start with `num_envs: 1` to validate configs. Use longer episodes than during training (e.g., 50 s vs. 30 s) for thorough testing.
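The clamping mentioned above amounts to a per-field min/max. A sketch with invented limits (the real ranges live in the task config):

```python
# Invented limits for illustration; real ranges are defined per task.
LIMITS = {
    "lin_vel_x": (-1.0, 1.0),
    "lin_vel_y": (-0.5, 0.5),
    "ang_vel_z": (-1.0, 1.0),
    "base_height": (0.55, 0.85),
}

def clamp_commands(cmd):
    """Clamp each command field into its [lo, hi] range."""
    return {k: min(max(v, LIMITS[k][0]), LIMITS[k][1]) for k, v in cmd.items()}

out = clamp_commands({"lin_vel_x": 2.0, "base_height": 0.75})
assert out == {"lin_vel_x": 1.0, "base_height": 0.75}
```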
## HTML Reports
Interactive HTML reports with tracking analysis and per-joint plots. Reports are generated by the Isaac Lab evaluation path. The Sim2MuJoCo path does not generate reports directly, but its Parquet output is compatible with the plotting API for custom analysis (see Analyzing Trajectories below).
### Generation

```bash
# Automatic (during evaluation)
python scripts/eval.py --task <task_name> --checkpoint path/to/model.pt \
    --run_evaluation --save_trajectories --generate_report

# Manual (after evaluation)
python agile/algorithms/evaluation/generate_report.py \
    --log_dir logs/evaluation/task_datetime

# Failed episodes only
python agile/algorithms/evaluation/generate_report.py \
    --log_dir logs/evaluation/task_datetime \
    --episodes failed

# Specific episode IDs
python agile/algorithms/evaluation/generate_report.py \
    --log_dir logs/evaluation/task_datetime \
    --episodes 0,3,5
```
### Report Contents

- **Summary Dashboard** (`index.html`): Success rate, sortable episode table with search/filter, tracking-error summary plots
- **Episode Pages** (`episodes/episode_XXX.html`): Tracking performance (`lin_vel_x`, `lin_vel_y`, `ang_vel_z`, height), all joints organized by body part (upper/lower) with collapsible sections, joint position and velocity limits shown, interactive Plotly plots (zoom, pan, hover)
## Analyzing Trajectories (Python/Jupyter)
The plotting API works with trajectory data from both evaluation paths, since they share the same Parquet schema and metadata format.
```python
import sys
sys.path.insert(0, "agile/algorithms/evaluation")

from plotting import load_episode, load_metadata, plot_joint_trajectories
import matplotlib.pyplot as plt

# Works with either Isaac Lab or Sim2MuJoCo output directories
metadata = load_metadata("logs/rsl_rl/experiment")
df = load_episode("logs/rsl_rl/experiment", episode_id=0)

fig, axes = plot_joint_trajectories(
    df,
    joint_names=['left_hip_yaw_joint', 'right_knee_joint'],
    metadata=metadata,
    show_limits=True,
)
plt.show()
```
## Evaluation Framework Internals

The evaluation framework lives in `agile/algorithms/evaluation/`.

### PolicyEvaluator

The main evaluation class (`evaluator.py`). It collects trajectory data from policy rollouts and computes metrics:

- Requires an `eval` observation group providing joint positions, velocities, accelerations, root state, commands, and actions
- Handles terminal-state observations correctly by using previous-frame data for terminated environments
- Supports configurable joint groups for per-body-part metrics
- Optionally saves trajectory data to Parquet files for offline analysis
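The terminal-state handling can be illustrated in isolation: for an env that terminated this step, the "current" observation already reflects the post-reset state, so the previous frame is logged instead. This is a sketch of the idea, not the evaluator's actual code:

```python
def frames_to_log(prev_obs, curr_obs, terminated):
    """Log previous-frame data for envs that terminated (and reset) this step."""
    return [prev if done else curr
            for prev, curr, done in zip(prev_obs, curr_obs, terminated)]

prev = [[0.1, 0.2], [0.3, 0.4]]
curr = [[0.0, 0.0], [0.5, 0.6]]   # env 0 was reset, so curr[0] is post-reset
assert frames_to_log(prev, curr, [True, False]) == [[0.1, 0.2], [0.5, 0.6]]
```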
### MotionMetricsAnalyzer

Computes and aggregates motion quality metrics (`motion_metrics_analyzer.py`):

- **Mean/max joint acceleration**: smoothness indicator (lower is better)
- **Mean/max acceleration rate (jerk)**: jerkiness indicator
- **Mean/max joint velocity**: activity level
- All metrics are computed for the whole body and per joint group
- Separate statistics are kept for all episodes vs. successful episodes only
- Results are saved as JSON with grouped metrics
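The acceleration and jerk metrics are plain finite differences of the logged joint velocities. A minimal sketch (function names and the backward-difference choice are assumptions):

```python
def finite_diff(series, dt):
    """Backward finite difference of a uniformly sampled series."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]

def motion_stats(joint_vel, dt):
    """Mean/max |acceleration| and |jerk| from one joint's velocity trace."""
    acc = [abs(x) for x in finite_diff(joint_vel, dt)]
    jerk = [abs(x) for x in finite_diff(finite_diff(joint_vel, dt), dt)]
    return {
        "acc_mean": sum(acc) / len(acc), "acc_max": max(acc),
        "jerk_mean": sum(jerk) / len(jerk), "jerk_max": max(jerk),
    }

stats = motion_stats([0.0, 0.1, 0.3, 0.3], dt=0.1)
assert abs(stats["acc_max"] - 2.0) < 1e-9    # (0.3 - 0.1) / 0.1
assert abs(stats["jerk_max"] - 20.0) < 1e-9  # |(0.0 - 2.0) / 0.1|
```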
### TrajectoryReportGenerator

Generates interactive HTML reports from saved trajectory data (`report_generator.py`):

- Uses Plotly for interactive, zoomable plots
- Supports filtering by success/failure status
- Works standalone without Isaac Sim (requires only pandas, plotly, and jinja2)