SOTAVerified

Robot Manipulation

Papers

Showing 1–50 of 430 papers

| Title | Status | Hype |
|---|---|---|
| OpenVLA: An Open-Source Vision-Language-Action Model | Code | 9 |
| On the Vulnerability of LLM/VLM-Controlled Robotics | Code | 7 |
| 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Code | 5 |
| Magma: A Foundation Model for Multimodal AI Agents | Code | 5 |
| Evaluating Real-World Robot Manipulation Policies in Simulation | Code | 5 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | Code | 5 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Code | 3 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Code | 3 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Code | 3 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3 |
| PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos | Code | 3 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Code | 3 |
| RVT-2: Learning Precise Manipulation from Few Demonstrations | Code | 3 |
| RLVR-World: Training World Models with Reinforcement Learning | Code | 3 |
| Latent Action Pretraining from Videos | Code | 3 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Code | 3 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3 |
| RT-1: Robotics Transformer for Real-World Control at Scale | Code | 3 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Code | 3 |
| Affordance-based Robot Manipulation with Flow Matching | Code | 3 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Code | 2 |
| VIMA: General Robot Manipulation with Multimodal Prompts | Code | 2 |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Code | 2 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | Code | 2 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Code | 2 |
| Generative Image as Action Models | Code | 2 |
| Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models | Code | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Code | 2 |
| What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Code | 2 |
| Equivariant Diffusion Policy | Code | 2 |
| RVT: Robotic View Transformer for 3D Object Manipulation | Code | 2 |
| Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation | Code | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Code | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2 |
| SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion | Code | 2 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Code | 2 |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | Code | 2 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Code | 2 |
| R3M: A Universal Visual Representation for Robot Manipulation | Code | 2 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Code | 2 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2 |
| Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Code | 2 |
| An Embodied Generalist Agent in 3D World | Code | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2 |
| CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning | Code | 1 |
| ABNet: Attention BarrierNet for Safe and Scalable Robot Learning | Code | 1 |
| CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks | Code | 1 |
| Goal-Conditioned Imitation Learning using Score-based Diffusion Policies | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DreamVLA | avg. sequence length (D to D) | 4.44 | | Unverified |
| 2 | VPP | avg. sequence length (D to D) | 4.29 | | Unverified |
| 3 | RoboVLMs | avg. sequence length (D to D) | 4.25 | | Unverified |
| 4 | Openhelix | avg. sequence length (D to D) | 4.08 | | Unverified |
| 5 | UP-VLA | avg. sequence length (D to D) | 4.08 | | Unverified |
| 6 | GR-MG | avg. sequence length (D to D) | 4.04 | | Unverified |
| 7 | MoDE | avg. sequence length (D to D) | 4.01 | | Unverified |
| 8 | RoboUniView | avg. sequence length (D to D) | 3.86 | | Unverified |
| 9 | UniVLA | avg. sequence length (D to D) | 3.8 | | Unverified |
| 10 | RoboDual | avg. sequence length (D to D) | 3.66 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EquAct | Succ. Rate (18 tasks, 100 demo/task) | 89.4 | | Unverified |
| 2 | SAM2Act | Succ. Rate (18 tasks, 100 demo/task) | 86.8 | | Unverified |
| 3 | ARP+ | Succ. Rate (18 tasks, 100 demo/task) | 84.9 | | Unverified |
| 4 | 3D-LOTUS | Succ. Rate (18 tasks, 100 demo/task) | 83.1 | | Unverified |
| 5 | RVT-2 | Succ. Rate (18 tasks, 100 demo/task) | 81.4 | | Unverified |
| 6 | 3D Diffuser Actor | Succ. Rate (18 tasks, 100 demo/task) | 81.3 | | Unverified |
| 7 | Mini Diffuser | Succ. Rate (18 tasks, 100 demo/task) | 77.6 | | Unverified |
| 8 | SAM-E | Succ. Rate (18 tasks, 100 demo/task) | 70.6 | | Unverified |
| 9 | Auto-λ | Succ. Rate (10 tasks, 100 demos/task) | 69.3 | | Unverified |
| 10 | Act3D | Succ. Rate (18 tasks, 100 demo/task) | 65 | | Unverified |
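
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SoFar | Visual Matching | 0.75 | | Unverified |
| 2 | SpatialVLA | Visual Matching | 0.72 | | Unverified |
| 3 | Dita-300M | Visual Matching | 0.69 | | Unverified |
| 4 | RT-2-X | Visual Matching | 0.61 | | Unverified |
| 5 | RoboVLM | Visual Matching | 0.56 | | Unverified |
| 6 | RT-1-X | Visual Matching | 0.53 | | Unverified |
| 7 | TraceVLA | Visual Matching | 0.46 | | Unverified |
| 8 | OpenVLA | Visual Matching | 0.28 | | Unverified |
| 9 | Octo-Base | Visual Matching | 0.17 | | Unverified |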

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SDP | Succ. Rate (12 tasks, 100 demo/task) | 76 | | Unverified |
| 2 | EquiDiff (Voxel) | Succ. Rate (12 tasks, 100 demo/task) | 63.9 | | Unverified |
| 3 | EquiDiff (Image) | Succ. Rate (12 tasks, 100 demo/task) | 53.7 | | Unverified |
| 4 | DP (Evaluated in EquiDiff) | Succ. Rate (12 tasks, 100 demo/task) | 42 | | Unverified |
| 5 | DP3 (Evaluated in EquiDiff) | Succ. Rate (12 tasks, 100 demo/task) | 23.9 | | Unverified |
| 6 | BC RNN (Evaluated in EquiDiff) | Succ. Rate (12 tasks, 100 demo/task) | 22.9 | | Unverified |
| 7 | ACT (Evaluated in EquiDiff) | Succ. Rate (12 tasks, 100 demo/task) | 21.3 | | Unverified |
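
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SoFar | Average | 0.58 | | Unverified |
| 2 | SpatialVLA | Average | 0.34 | | Unverified |
| 3 | Octo-Small | Average | 0.3 | | Unverified |
| 4 | Octo-Base | Average | 0.16 | | Unverified |
| 5 | RoboVLM | Average | 0.14 | | Unverified |
| 6 | RT-1-X | Average | 0.01 | | Unverified |
| 7 | OpenVLA | Average | 0.01 | | Unverified |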