The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 14,651–14,700 of 474,278 papers

Title	Date	Status	Hype
TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events	Mar 8, 2026	Unverified	1
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models	Feb 27, 2026	Unverified	1
VLS: Steering Pretrained Robot Policies via Vision-Language Models	Feb 3, 2026	Unverified	1
Privileged Information Distillation for Language Models	Feb 16, 2026	Unverified	1
Learning Self-Correction in Vision-Language Models via Rollout Augmentation	Feb 9, 2026	Unverified	1
Large Multimodal Models as General In-Context Classifiers	Feb 26, 2026	Unverified	1
Coarse-Guided Visual Generation via Weighted h-Transform Sampling	Mar 12, 2026	Unverified	1
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions	Mar 16, 2026	Unverified	1
DREAM: Where Visual Understanding Meets Text-to-Image Generation	Mar 3, 2026	Unverified	1
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing	Feb 2, 2026	Unverified	1
AlphaApollo: A System for Deep Agentic Reasoning	Mar 10, 2026	Unverified	1
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents	Feb 13, 2026	Unverified	1
Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding	Feb 24, 2026	Unverified	1
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models	Mar 16, 2026	Unverified	1
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems	Mar 4, 2026	Unverified	1
Stereo World Model: Camera-Guided Stereo Video Generation	Mar 18, 2026	Unverified	1
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants	Mar 16, 2026	Unverified	1
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning	Mar 2, 2026	Unverified	1
SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation	Mar 14, 2026	Unverified	1
Rethinking Selective Knowledge Distillation	Feb 1, 2026	Unverified	1
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry	Jan 30, 2026	Unverified	1
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning	Feb 12, 2026	Unverified	1
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models	Mar 18, 2026	Unverified	1
LatentMem: Customizing Latent Memory for Multi-Agent Systems	Mar 9, 2026	Unverified	1
Mano: Restriking Manifold Optimization for LLM Training	Jan 30, 2026	Unverified	1
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing	Feb 10, 2026	Unverified	1
Demystifing Video Reasoning	Mar 17, 2026	Unverified	1
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs	Feb 18, 2026	Unverified	1
MediX-R1: Open Ended Medical Reinforcement Learning	Feb 26, 2026	Unverified	1
Show, Don't Tell: Morphing Latent Reasoning into Image Generation	Feb 2, 2026	Unverified	1
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets	Feb 25, 2026	Unverified	1
Safety Alignment of LMs via Non-cooperative Games	Feb 7, 2026	Unverified	1
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening	Feb 6, 2026	Unverified	1
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data	Feb 24, 2026	Unverified	1
Same or Not? Enhancing Visual Perception in Vision-Language Models	Feb 4, 2026	Unverified	1
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models	Mar 17, 2026	Unverified	1
MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation	Feb 23, 2026	Unverified	1
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models	Feb 8, 2026	Unverified	1
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers	Mar 12, 2026	Unverified	1
Reinforced Fast Weights with Next-Sequence Prediction	Feb 18, 2026	Unverified	1
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining	Mar 19, 2026	Unverified	1
InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem	Feb 16, 2026	Unverified	1
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction	Mar 6, 2026	Unverified	1
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models	Mar 1, 2026	Unverified	1
General Agent Evaluation	Feb 26, 2026	Unverified	1
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG	Jan 24, 2026	Unverified	1
Glance and Focus Reinforcement for Pan-cancer Screening	Feb 2, 2026	Unverified	1
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following	Mar 12, 2026	Unverified	1
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts	Jan 30, 2026	Unverified	1
Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings	Mar 12, 2026	Unverified	1