The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 4,901–4,950 of 661,570 papers

Title	Date	Status	Hype
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs	Jan 27, 2026	—Unverified	2
OmniGAIA: Towards Native Omni-Modal AI Agents	Feb 28, 2026	—Unverified	2
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories	Feb 11, 2026	—Unverified	2
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering	Feb 28, 2026	—Unverified	2
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?	Mar 2, 2026	—Unverified	2
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation	Feb 26, 2026	—Unverified	2
AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories	Feb 16, 2026	—Unverified	2
Enhancing Spatial Understanding in Image Generation via Reward Modeling	Feb 27, 2026	—Unverified	2
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing	Mar 19, 2026	—Unverified	2
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion	Mar 18, 2026	—Unverified	2
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator	Mar 5, 2026	—Unverified	2
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding	Mar 5, 2026	—Unverified	2
Latent Denoising Makes Good Tokenizers	Feb 14, 2026	—Unverified	2
VLANeXt: Recipes for Building Strong VLA Models	Feb 20, 2026	—Unverified	2
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents	Feb 24, 2026	—Unverified	2
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation	Feb 2, 2026	—Unverified	2
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories	Mar 2, 2026	—Unverified	2
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?	Feb 28, 2026	—Unverified	2
How to Correctly Report LLM-as-a-Judge Evaluations	Feb 9, 2026	—Unverified	2
The Trinity of Consistency as a Defining Principle for General World Models	Feb 26, 2026	—Unverified	2
Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling	Jan 31, 2026	—Unverified	2
XSkill: Continual Learning from Experience and Skills in Multimodal Agents	Mar 13, 2026	—Unverified	2
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot	Feb 23, 2026	—Unverified	2
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention	Mar 11, 2026	—Unverified	2
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation	Feb 24, 2026	—Unverified	2
Unified Multimodal Models as Auto-Encoders	Feb 26, 2026	—Unverified	2
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams	Mar 12, 2026	—Unverified	2
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding	Jan 28, 2026	—Unverified	2
Streaming Autoregressive Video Generation via Diagonal Distillation	Mar 11, 2026	—Unverified	2
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs	Mar 5, 2026	—Unverified	2
Experiential Reinforcement Learning	Feb 15, 2026	—Unverified	2
SimVLA: A Simple VLA Baseline for Robotic Manipulation	Feb 20, 2026	—Unverified	2
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation	Mar 3, 2026	—Unverified	2
SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation	Mar 13, 2026	—Unverified	2
Efficient Reasoning with Balanced Thinking	Mar 19, 2026	—Unverified	2
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs	Feb 12, 2026	—Unverified	2
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion	Mar 6, 2026	—Unverified	2
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation	Feb 19, 2026	—Unverified	2
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings	Mar 13, 2026	—Unverified	2
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing	Jan 28, 2026	—Unverified	2
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation	Mar 5, 2026	—Unverified	2
RealWonder: Real-Time Physical Action-Conditioned Video Generation	Mar 5, 2026	—Unverified	2
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests	Feb 1, 2026	—Unverified	2
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors	Feb 27, 2026	—Unverified	2
Towards Pixel-Level VLM Perception via Simple Points Prediction	Jan 27, 2026	—Unverified	2
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs	Mar 13, 2026	—Unverified	2
Learning a Generative Meta-Model of LLM Activations	Feb 6, 2026	—Unverified	2
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?	Feb 4, 2026	—Unverified	2
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation	Mar 12, 2026	—Unverified	2
EEG Foundation Models: Progresses, Benchmarking, and Open Problems	Feb 5, 2026	—Unverified	2