The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 14,751–14,800 of 474,278 papers

Title	Date	Status	Hype
DeepSight: An All-in-One LM Safety Toolkit	Feb 12, 2026	Unverified	1
Learning to Configure Agentic AI Systems	Feb 12, 2026	Unverified	1
TADA! Tuning Audio Diffusion Models through Activation Steering	Feb 12, 2026	Unverified	1
GameDevBench: Evaluating Agentic Capabilities Through Game Development	Feb 11, 2026	Unverified	1
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development	Feb 11, 2026	Unverified	1
Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation	Feb 11, 2026	Unverified	1
ContextBench: A Benchmark for Context Retrieval in Coding Agents	Feb 11, 2026	Unverified	1
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads	Feb 10, 2026	Unverified	1
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing	Feb 10, 2026	Unverified	1
Free(): Learning to Forget in Malloc-Only Reasoning Models	Feb 10, 2026	Unverified	1
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning	Feb 10, 2026	Unverified	1
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models	Feb 10, 2026	Unverified	1
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization	Feb 10, 2026	Unverified	1
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding	Feb 10, 2026	Unverified	1
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning	Feb 10, 2026	Unverified	1
Prism: Spectral-Aware Block-Sparse Attention	Feb 9, 2026	Unverified	1
Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction	Feb 9, 2026	Unverified	1
Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models	Feb 9, 2026	Unverified	1
Data Science and Technology Towards AGI Part I: Tiered Data Management	Feb 9, 2026	Unverified	1
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards	Feb 9, 2026	Unverified	1
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design	Feb 9, 2026	Unverified	1
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition	Feb 9, 2026	Unverified	1
Luth: Efficient French Specialization for Small Language Models and Cross-Lingual Transfer	Feb 9, 2026	Unverified	1
Rethinking Global Text Conditioning in Diffusion Transformers	Feb 9, 2026	Unverified	1
Autoregressive Image Generation with Masked Bit Modeling	Feb 9, 2026	Unverified	1
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning	Feb 9, 2026	Unverified	1
Learning Self-Correction in Vision-Language Models via Rollout Augmentation	Feb 9, 2026	Unverified	1
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs	Feb 9, 2026	Unverified	1
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation	Feb 9, 2026	Unverified	1
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning	Feb 9, 2026	Unverified	1
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units	Feb 8, 2026	Unverified	1
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training	Feb 8, 2026	Unverified	1
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models	Feb 8, 2026	Unverified	1
TodoEvolve: Learning to Architect Agent Planning Systems	Feb 8, 2026	Unverified	1
Safety Alignment of LMs via Non-cooperative Games	Feb 7, 2026	Unverified	1
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks	Feb 6, 2026	Unverified	1
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening	Feb 6, 2026	Unverified	1
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning	Feb 6, 2026	Unverified	1
POINTS-GUI-G: GUI-Grounding Journey	Feb 6, 2026	Unverified	1
Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing	Feb 6, 2026	Unverified	1
Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory	Feb 5, 2026	Unverified	1
SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs	Feb 5, 2026	Unverified	1
Vector Quantization using Gaussian Variational Autoencoder	Feb 5, 2026	Unverified	1
RISE-Video: Can Video Generators Decode Implicit World Rules?	Feb 5, 2026	Unverified	1
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning	Feb 5, 2026	Unverified	1
EgoAVU: Egocentric Audio-Visual Understanding	Feb 5, 2026	Unverified	1
Horizon-LM: A RAM-Centric Architecture for LLM Training	Feb 5, 2026	Unverified	1
EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A	Feb 4, 2026	Unverified	1
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models	Feb 4, 2026	Unverified	1
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking	Feb 4, 2026	Unverified	1