SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14601 to 14650 of 474278 papers

Title | Status | Hype
OSF: On Pre-training and Scaling of Sleep Foundation Models | | 1
Revisiting Text Ranking in Deep Research | | 1
Large Multimodal Models as General In-Context Classifiers | | 1
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models | | 1
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding | | 1
Rethinking Global Text Conditioning in Diffusion Transformers | | 1
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation | | 1
Segment Any Events with Language | | 1
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training | | 1
Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion | | 1
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning | | 1
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models | | 1
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer | | 1
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching | | 1
Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space | | 1
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing | | 1
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios | | 1
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training | | 1
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL | | 1
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training | | 1
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking | | 1
Imagination Helps Visual Reasoning, But Not Yet in Latent Space | | 1
Reward Prediction with Factorized World States | | 1
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs | | 1
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL | | 1
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | | 1
Stereo World Model: Camera-Guided Stereo Video Generation | | 1
π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs | | 1
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning | | 1
SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs | | 1
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing | | 1
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment | | 1
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents | | 1
AgentOCR: Reimagining Agent History via Optical Self-Compression | | 1
Can Language Models Discover Scaling Laws? | | 1
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum | | 1
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning | | 1
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs | | 1
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale | | 1
CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion | | 1
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos | | 1
M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM | | 1
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | | 1
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections | | 1
Self-Improving World Modelling with Latent Actions | | 1
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models | | 1
ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization | | 1
HiconAgent: History Context-aware Policy Optimization for GUI Agents | | 1
Revisiting the Platonic Representation Hypothesis: An Aristotelian View | | 1
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise | | 1
Page 293 of 9486