Video to Data — NVIDIA Isaac

Pipeline

From human video to robot data

Raw human video becomes video segments, reconstructed trajectories, simulation assets, grounded policies, and robot data.

Figure: Video to Data pipeline overview, from human demonstration video through ingestion, reconstruction, human-object trajectory and simulation environments, robotic grounding, and data augmentation in Isaac Lab to a physics-grounded dataset, foundation models, and real-robot deployment.

Video Ingestion Agent

LangGraph workflow that segments demos into action clips, extracts an entity-relation scene graph, and stores SigLIP-2 frame embeddings.

video_ingestion_agent/
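As a rough illustration of the kind of records this stage describes, the sketch below models action clips and an entity-relation scene graph with plain dataclasses. The names `ActionClip`, `SceneGraph`, and `clips_from_boundaries` are hypothetical and are not taken from the `video_ingestion_agent` package; the actual schema and the LangGraph workflow may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class ActionClip:
    """One action segment cut from a demonstration video (hypothetical schema)."""
    label: str
    start_s: float  # clip start, in seconds
    end_s: float    # clip end, in seconds

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass
class SceneGraph:
    """Entity-relation graph stored as (subject, relation, object) triples."""
    triples: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.append((subject, relation, obj))

    def entities(self) -> set[str]:
        # Every subject and object that appears in at least one triple.
        return {e for s, _, o in self.triples for e in (s, o)}

    def relations_of(self, entity: str) -> list[tuple[str, str, str]]:
        # All triples in which the entity is the subject or the object.
        return [t for t in self.triples if entity in (t[0], t[2])]


def clips_from_boundaries(boundaries: list[float], labels: list[str]) -> list[ActionClip]:
    """Turn sorted cut points [t0, t1, ..., tn] into n consecutive clips."""
    if len(boundaries) != len(labels) + 1:
        raise ValueError("need exactly one more boundary than labels")
    return [
        ActionClip(label, start, end)
        for label, start, end in zip(labels, boundaries, boundaries[1:])
    ]


clips = clips_from_boundaries([0.0, 2.5, 4.0], ["reach", "grasp"])
graph = SceneGraph()
graph.add("hand", "holds", "mug")
graph.add("mug", "on", "table")
```

Storing relations as triples keeps the graph easy to serialize and to query by entity, which is one plausible reason to pair it with per-frame embeddings for retrieval.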

Reconstruction

Containerized vision modules turn selected clips into per-frame depth, masks, textured meshes, 6-DoF object poses, and parametric human hand and body models.

reconstruction/
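To make "6-DoF object pose" concrete, here is a minimal sketch that represents a pose as a 3x3 rotation matrix plus a translation vector, applies it to a point, and composes two poses. The helper names (`pose_from_yaw`, `transform_point`, `compose`) are illustrative assumptions; the `reconstruction` package may use a different representation, such as quaternions or 4x4 homogeneous matrices.

```python
import math


def pose_from_yaw(yaw_rad, translation):
    """Build a pose (rotation, translation) that rotates about the z-axis.

    Only yaw is used here to keep the sketch short; a full 6-DoF pose would
    allow an arbitrary 3D rotation (3 rotational + 3 translational degrees).
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return rotation, list(translation)


def transform_point(pose, point):
    """Map a point from the object frame to the world frame: R @ p + t."""
    rotation, translation = pose
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def compose(a, b):
    """Return the pose equivalent to applying b first, then a."""
    ra, _ = a
    rb, tb = b
    rotation = [
        [sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
    # Translation of the composite is R_a @ t_b + t_a.
    translation = transform_point(a, tb)
    return rotation, translation


pose = pose_from_yaw(math.pi / 2, (1.0, 2.0, 3.0))
moved = transform_point(pose, [1.0, 0.0, 0.0])  # approximately [1.0, 3.0, 3.0]
```

A pose trajectory is then simply one such pose per frame, so the same `transform_point` call can place an object's mesh vertices in the world at each time step.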

Robotic Grounding

Retarget human motion onto the target embodiment, then train policies with RL in Isaac Lab environments to produce deployable robot policies.

robotic_grounding/

Demos

See it run, stage by stage

**Raw human demonstration**

**Grounded robot policies in Isaac Lab**

Packages

Explore the toolkit

video_ingestion_agent (docs available)

Video → action segments, an entity scene graph, and frame embeddings. LangGraph pipeline plus an EGAgent-style natural-language retrieval agent and an optional Gradio UI.
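One common way to retrieve frames from stored embeddings is to rank them by cosine similarity against a query embedding. The sketch below shows that idea with toy 3-dimensional vectors. It is an assumption about how such retrieval could work, not a description of the actual retrieval agent, and `top_k_frames` is a hypothetical helper; in practice the vectors would come from an image-text encoder rather than being written by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_frames(query, frame_embeddings, k=3):
    """Return up to k (frame_index, score) pairs, best match first.

    frame_embeddings maps a frame index to its embedding vector.
    Ties are broken by the lower frame index.
    """
    scored = [(cosine(query, emb), idx) for idx, emb in frame_embeddings.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


# Toy data: hand-written vectors standing in for real frame embeddings.
frames = {
    0: [1.0, 0.0, 0.0],
    1: [0.0, 1.0, 0.0],
    2: [0.9, 0.1, 0.0],
}
hits = top_k_frames([1.0, 0.0, 0.0], frames, k=2)
```

With these toy vectors, frames 0 and 2 rank ahead of frame 1 because they point in nearly the same direction as the query.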


reconstruction (docs available)

Video → depth, masks, object meshes, 6-DoF object pose trajectories, and human body mesh and motion.


robotic_grounding (docs coming soon)

Motion retargeting and RL training in Isaac Lab with contact-wrench guidance from human demonstrations.

GitHub → Webpage → Tech Report →

Get started

Clone the repo, follow the quickstart, and turn your first demonstration video into training data.

