The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 14,651–14,700 of 474,278 papers

Title	Date	Status	Hype
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier	Mar 4, 2026	Unverified	1
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors	Mar 4, 2026	Unverified	1
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?	Mar 3, 2026	Unverified	1
The Geometry of Reasoning: Flowing Logics in Representation Space	Mar 3, 2026	Unverified	1
RubricBench: Aligning Model-Generated Rubrics with Human Standards	Mar 3, 2026	Unverified	1
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward	Mar 3, 2026	Unverified	1
DREAM: Where Visual Understanding Meets Text-to-Image Generation	Mar 3, 2026	Unverified	1
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing	Mar 3, 2026	Unverified	1
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs	Mar 3, 2026	Unverified	1
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models	Mar 3, 2026	Unverified	1
Next Embedding Prediction Makes World Models Stronger	Mar 3, 2026	Unverified	1
Chain of World: World Model Thinking in Latent Motion	Mar 3, 2026	Unverified	1
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?	Mar 3, 2026	Unverified	1
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning	Mar 2, 2026	Unverified	1
LLM Probability Concentration: How Alignment Shrinks the Generative Horizon	Mar 2, 2026	Unverified	1
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos	Mar 2, 2026	Unverified	1
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios	Mar 2, 2026	Unverified	1
ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering	Mar 2, 2026	Unverified	1
Steering Evaluation-Aware Language Models to Act Like They Are Deployed	Mar 2, 2026	Unverified	1
OpenAutoNLU: Open Source AutoML Library for NLU	Mar 2, 2026	Unverified	1
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization	Mar 2, 2026	Unverified	1
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training	Mar 2, 2026	Unverified	1
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training	Mar 2, 2026	Unverified	1
Tracking Capabilities for Safer Agents	Mar 1, 2026	Unverified	1
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos	Mar 1, 2026	Unverified	1
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models	Mar 1, 2026	Unverified	1
Next Visual Granularity Generation	Feb 28, 2026	Unverified	1
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning	Feb 28, 2026	Unverified	1
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition	Feb 28, 2026	Unverified	1
AgentOCR: Reimagining Agent History via Optical Self-Compression	Feb 28, 2026	Unverified	1
SleepLM: Natural-Language Intelligence for Human Sleep	Feb 27, 2026	Unverified	1
OSF: On Pre-training and Scaling of Sleep Foundation Models	Feb 27, 2026	Unverified	1
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale	Feb 27, 2026	Unverified	1
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models	Feb 27, 2026	Unverified	1
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks	Feb 27, 2026	Unverified	1
MediX-R1: Open Ended Medical Reinforcement Learning	Feb 26, 2026	Unverified	1
General Agent Evaluation	Feb 26, 2026	Unverified	1
SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation	Feb 26, 2026	Unverified	1
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering	Feb 26, 2026	Unverified	1
Imagination Helps Visual Reasoning, But Not Yet in Latent Space	Feb 26, 2026	Unverified	1
Large Multimodal Models as General In-Context Classifiers	Feb 26, 2026	Unverified	1
Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?	Feb 26, 2026	Unverified	1
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL	Feb 25, 2026	Unverified	1
V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval	Feb 25, 2026	Unverified	1
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark	Feb 25, 2026	Unverified	1
Enhancing Multi-Image Understanding through Delimiter Token Scaling	Feb 25, 2026	Unverified	1
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration	Feb 25, 2026	Unverified	1
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets	Feb 25, 2026	Unverified	1
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs	Feb 25, 2026	Unverified	1
Revisiting Text Ranking in Deep Research	Feb 25, 2026	Unverified	1