NVIDIA Isaac

Video to Data

An end-to-end pipeline that converts human demonstration videos into simulation-ready assets and physics-grounded robot training data.

Pipeline

From human video to robot data

Raw human video becomes video segments, reconstructed trajectories, simulation assets, grounded policies, and robot data.

Figure: Video to Data pipeline overview. Human demonstration video passes through ingestion, reconstruction, human-object trajectory and simulation environments, robotic grounding, and data augmentation in Isaac Lab, yielding a physics-grounded dataset, foundation models, and real-robot deployment.
1. Video Ingestion Agent

LangGraph workflow that segments demos into action clips, extracts an entity-relation scene graph, and stores SigLIP-2 frame embeddings.

video_ingestion_agent/
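
To make the ingestion outputs concrete, here is a minimal sketch of how action segments and an entity-relation scene graph could be represented. The class names, fields, and the kitchen example are hypothetical illustrations, not the schema used in video_ingestion_agent/.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActionSegment:
    """One action clip cut from a longer demonstration (times in seconds)."""
    label: str
    start_s: float
    end_s: float


@dataclass(frozen=True)
class Relation:
    """A directed scene-graph edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Entities are nodes; relations are labeled directed edges."""
    entities: set[str] = field(default_factory=set)
    relations: list[Relation] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Register both endpoints as entities, then record the edge.
        self.entities.update((subject, obj))
        self.relations.append(Relation(subject, predicate, obj))

    def about(self, entity: str) -> list[Relation]:
        """Return every relation that mentions the entity."""
        return [r for r in self.relations if entity in (r.subject, r.obj)]


# Hypothetical content for a short kitchen demonstration.
segments = [
    ActionSegment("pick up mug", 0.0, 2.5),
    ActionSegment("place mug on shelf", 2.5, 6.0),
]
graph = SceneGraph()
graph.add("right_hand", "grasps", "mug")
graph.add("mug", "on", "shelf")

mug_relations = graph.about("mug")
```

Keeping segments and relations as plain records makes them easy to serialize next to the frame embeddings and to filter when selecting clips for reconstruction.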
2. Reconstruction

Containerized vision modules turn selected clips into per-frame depth, masks, textured meshes, 6-DoF object poses, and parametric human hand and body models.

reconstruction/
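
A 6-DoF pose combines three rotational and three translational degrees of freedom. The sketch below shows what a per-frame pose trajectory means geometrically: each frame's rotation and translation map a mesh vertex from the object's own frame into the camera or world frame. The trajectory values, the helper names, and the choice of rotation matrices (rather than quaternions or another convention) are assumptions for illustration, not the output format of reconstruction/.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def rot_z(theta: float) -> Mat3:
    """Rotation matrix for a right-handed rotation of theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def transform(rotation: Mat3, translation: Vec3, point: Vec3) -> Vec3:
    """Apply a rigid transform to a point: R @ p + t."""
    return tuple(
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )


# Hypothetical per-frame poses (rotation, translation) for two frames.
trajectory = [
    (rot_z(0.0), (0.0, 0.0, 0.5)),
    (rot_z(math.pi / 2), (0.1, 0.0, 0.5)),
]

vertex = (1.0, 0.0, 0.0)  # one mesh vertex in the object's own frame
world_points = [transform(R, t, vertex) for R, t in trajectory]
```

In the second frame the object has turned a quarter turn about z and shifted 0.1 along x, so the vertex lands near (0.1, 1.0, 0.5).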
3. Robotic Grounding

Retarget human motion onto the target embodiment, then train policies with RL in Isaac Lab environments to produce deployable robot policies.

robotic_grounding/
Demos

See it run, stage by stage

Raw human demonstration clip
Grounded robot policies in Isaac Lab
Deploy to real robot
Packages

Explore the toolkit

video_ingestion_agent (docs available)

Video → action segments, an entity scene graph, and frame embeddings. LangGraph pipeline plus an EGAgent-style natural-language retrieval agent and an optional Gradio UI.
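
Because SigLIP-style encoders place images and text in a shared embedding space, stored frame embeddings can in principle be searched with an embedded text query. The following is a minimal sketch of that idea using cosine similarity; the toy 3-dimensional vectors, frame IDs, and function names are hypothetical, and the actual retrieval agent may work differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], frames: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k frames most similar to the query."""
    scored = sorted(
        frames.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [frame_id for frame_id, _ in scored[:k]]


# Toy 3-dimensional vectors standing in for real frame embeddings.
frame_embeddings = {
    "frame_0010": [0.9, 0.1, 0.0],
    "frame_0200": [0.1, 0.9, 0.1],
    "frame_0450": [0.7, 0.3, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]  # stand-in for an embedded text query

best = top_k(query_embedding, frame_embeddings, k=2)
```

A brute-force scan like this is enough for a handful of clips; a larger frame store would typically sit behind a vector index.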

reconstruction (docs available)

Video → depth, masks, object meshes, 6-DoF pose trajectories, and human body mesh and motion.

robotic_grounding (docs coming soon)

Motion retargeting and RL training in Isaac Lab with contact-wrench guidance from human demonstrations.

Get started

Clone the repo, follow the quickstart, and turn your first demonstration video into training data.