SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5126-5150 of 661,570 papers

Title | Status | Hype
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | Code | 2
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Code | 2
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting | Code | 2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Code | 2
TaskCraft: Automated Generation of Agentic Tasks | Code | 2
Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Code | 2
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Code | 2
Do MIL Models Transfer? | Code | 2
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation | Code | 2
FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems | Code | 2
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Code | 2
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Code | 2
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Code | 2
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Code | 2
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Code | 2
Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes | Code | 2
Play to Generalize: Learning to Reason Through Game Play | Code | 2
Snap-and-tune: combining deep learning and test-time optimization for high-fidelity cardiovascular volumetric meshing | Code | 2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Code | 2