SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14551-14600 of 474278 papers

Title | Status | Hype
Stereo World Model: Camera-Guided Stereo Video Generation | | 1
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models | | 1
Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale | | 1
Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation | | 1
InCoder-32B: Code Foundation Model for Industrial Scenarios | | 1
Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning | | 1
Demystifing Video Reasoning | | 1
COREA: Coupled Relightable 3D Gaussians and SDFs for Efficient Normal Alignment | | 1
AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model | | 1
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation | | 1
Block-Recurrent Dynamics in Vision Transformers | | 1
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models | | 1
M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM | | 1
Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium | | 1
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning | | 1
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions | | 1
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models | | 1
HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification | | 1
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | | 1
Sharing State Between Prompts and Programs | | 1
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition | | 1
Panoramic Affordance Prediction | | 1
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models | | 1
Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning | | 1
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling | | 1
SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation | | 1
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent | | 1
HEARTS: Benchmarking LLM Reasoning on Health Time Series | | 1
Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories | | 1
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning | | 1
Language Models are Injective and Hence Invertible | | 1
MIND-V: Hierarchical World Model for Long-Horizon Robotic Manipulation with RL-based Physical Alignment | | 1
Visual-ERM: Reward Modeling for Visual Equivalence | | 1
V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration | | 1
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation | | 1
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges | | 1
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing | | 1
Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings | | 1
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning | | 1
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback | | 1
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation | | 1
HoneyBee: Data Recipes for Vision-Language Reasoners | | 1
Coarse-Guided Visual Generation via Weighted h-Transform Sampling | | 1
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers | | 1
Toward Complex-Valued Neural Networks for Waveform Generation | | 1
Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration | | 1
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following | | 1
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance | | 1
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections | | 1
ResearchGym: Evaluating Language Model Agents on Real-World AI Research | | 1