SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14651–14700 of 474278 papers

Title | Status | Hype
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier | | 1
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors | | 1
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? | | 1
The Geometry of Reasoning: Flowing Logics in Representation Space | | 1
RubricBench: Aligning Model-Generated Rubrics with Human Standards | | 1
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward | | 1
DREAM: Where Visual Understanding Meets Text-to-Image Generation | | 1
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing | | 1
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs | | 1
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models | | 1
Next Embedding Prediction Makes World Models Stronger | | 1
Chain of World: World Model Thinking in Latent Motion | | 1
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? | | 1
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning | | 1
LLM Probability Concentration: How Alignment Shrinks the Generative Horizon | | 1
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos | | 1
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios | | 1
ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering | | 1
Steering Evaluation-Aware Language Models to Act Like They Are Deployed | | 1
OpenAutoNLU: Open Source AutoML Library for NLU | | 1
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization | | 1
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training | | 1
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training | | 1
Tracking Capabilities for Safer Agents | | 1
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos | | 1
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models | | 1
Next Visual Granularity Generation | | 1
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning | | 1
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition | | 1
AgentOCR: Reimagining Agent History via Optical Self-Compression | | 1
SleepLM: Natural-Language Intelligence for Human Sleep | | 1
OSF: On Pre-training and Scaling of Sleep Foundation Models | | 1
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale | | 1
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models | | 1
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks | | 1
MediX-R1: Open Ended Medical Reinforcement Learning | | 1
General Agent Evaluation | | 1
SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation | | 1
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering | | 1
Imagination Helps Visual Reasoning, But Not Yet in Latent Space | | 1
Large Multimodal Models as General In-Context Classifiers | | 1
Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation? | | 1
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL | | 1
V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval | | 1
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark | | 1
Enhancing Multi-Image Understanding through Delimiter Token Scaling | | 1
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration | | 1
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets | | 1
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs | | 1
Revisiting Text Ranking in Deep Research | | 1
Page 294 of 9486