The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14551–14600 of 474278 papers

Title	Date	Status	Hype
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR	Jan 26, 2026	—Unverified	1
Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models	Jan 28, 2026	—Unverified	1
SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation	Mar 14, 2026	—Unverified	1
DREAM: Where Visual Understanding Meets Text-to-Image Generation	Mar 3, 2026	—Unverified	1
Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory	Feb 5, 2026	—Unverified	1
WideSeek: Advancing Wide Research via Multi-Agent Scaling	Feb 2, 2026	—Unverified	1
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?	Jan 21, 2026	—Unverified	1
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network	Feb 3, 2026	—Unverified	1
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models	Feb 4, 2026	—Unverified	1
EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A	Feb 4, 2026	—Unverified	1
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing	Mar 3, 2026	—Unverified	1
Free(): Learning to Forget in Malloc-Only Reasoning Models	Feb 10, 2026	—Unverified	1
Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale	Mar 18, 2026	—Unverified	1
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions	Mar 16, 2026	—Unverified	1
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs	Feb 9, 2026	—Unverified	1
KLASS: KL-Guided Fast Inference in Masked Diffusion Models	Mar 5, 2026	—Unverified	1
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models	Feb 4, 2026	—Unverified	1
Toward Complex-Valued Neural Networks for Waveform Generation	Mar 12, 2026	—Unverified	1
Tracking Capabilities for Safer Agents	Mar 1, 2026	—Unverified	1
LatentMem: Customizing Latent Memory for Multi-Agent Systems	Mar 9, 2026	—Unverified	1
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding	Feb 23, 2026	—Unverified	1
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference	Mar 6, 2026	—Unverified	1
Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision	Feb 13, 2026	—Unverified	1
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents	Jan 22, 2026	—Unverified	1
Do Reasoning Models Enhance Embedding Models?	Jan 29, 2026	—Unverified	1
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment	Jan 26, 2026	—Unverified	1
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning	Mar 16, 2026	—Unverified	1
-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space	Mar 5, 2026	—Unverified	1
Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning	Mar 15, 2026	—Unverified	1
V_1: Unifying Generation and Self-Verification for Parallel Reasoners	Mar 4, 2026	—Unverified	1
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models	Jan 26, 2026	—Unverified	1
Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?	Feb 26, 2026	—Unverified	1
Multimodal Evaluation of Russian-language Architectures	Jan 26, 2026	—Unverified	1
EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing	Feb 16, 2026	—Unverified	1
Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes	Mar 19, 2026	—Unverified	1
Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models	Feb 13, 2026	—Unverified	1
CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance	Mar 11, 2026	—Unverified	1
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control	Jan 31, 2026	—Unverified	1
$OneMillion-Bench: How Far are Language Agents from Human Experts?	Mar 9, 2026	—Unverified	1
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts	Jan 29, 2026	—Unverified	1
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination	Mar 20, 2026	—Unverified	1
TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation	Jan 30, 2026	—Unverified	1
Safety Alignment of LMs via Non-cooperative Games	Feb 7, 2026	—Unverified	1
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards	Feb 9, 2026	—Unverified	1
A Mechanistic View on Video Generation as World Models: State and Dynamics	Jan 22, 2026	—Unverified	1
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach	Mar 9, 2026	—Unverified	1
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction	Mar 6, 2026	—Unverified	1
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving	Mar 7, 2026	—Unverified	1
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition	Feb 9, 2026	—Unverified	1
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning	Feb 21, 2026	—Unverified	1