The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers


Showing 14,701–14,750 of 474,278 papers

Title	Date	Status	Hype
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning	Jan 27, 2026	Unverified	1
Vector Quantization using Gaussian Variational Autoencoder	Feb 5, 2026	Unverified	1
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning	Feb 6, 2026	Unverified	1
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently	Feb 4, 2026	Unverified	1
SWE-Exp: Experience-Driven Software Issue Resolution	Feb 2, 2026	Unverified	1
LIVE: Long-horizon Interactive Video World Modeling	Feb 3, 2026	Unverified	1
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation	Jan 29, 2026	Unverified	1
OpenAutoNLU: Open Source AutoML Library for NLU	Mar 2, 2026	Unverified	1
Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models	Feb 9, 2026	Unverified	1
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models	Feb 18, 2026	Unverified	1
Evaluating and Steering Modality Preferences in Multimodal Large Language Model	Feb 4, 2026	Unverified	1
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing	Feb 2, 2026	Unverified	1
V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration	Mar 13, 2026	Unverified	1
Matryoshka Gaussian Splatting	Mar 19, 2026	Unverified	1
LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation	Feb 2, 2026	Unverified	1
Mano: Restriking Manifold Optimization for LLM Training	Jan 30, 2026	Unverified	1
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges	Mar 12, 2026	Unverified	1
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs	Feb 4, 2026	Unverified	1
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following	Mar 12, 2026	Unverified	1
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty	Jan 29, 2026	Unverified	1
Scaling Behavior of Discrete Diffusion Language Models	Feb 15, 2026	Unverified	1
MARS: Modular Agent with Reflective Search for Automated AI Research	Feb 17, 2026	Unverified	1
RISE-Video: Can Video Generators Decode Implicit World Rules?	Feb 5, 2026	Unverified	1
Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling	Feb 3, 2026	Unverified	1
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression	Jan 30, 2026	Unverified	1
PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature	Jan 30, 2026	Unverified	1
ObjEmbed: Towards Universal Multimodal Object Embeddings	Feb 3, 2026	Unverified	1
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models	Mar 19, 2026	Unverified	1
DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report	Jan 30, 2026	Unverified	1
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents	Feb 3, 2026	Unverified	1
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset	Jan 30, 2026	Unverified	1
Show, Don't Tell: Morphing Latent Reasoning into Image Generation	Feb 2, 2026	Unverified	1
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts	Jan 23, 2026	Unverified	1
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs	Mar 19, 2026	Unverified	1
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models	Mar 1, 2026	Unverified	1
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use	Mar 9, 2026	Unverified	1
Learning Self-Correction in Vision-Language Models via Rollout Augmentation	Feb 9, 2026	Unverified	1
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition	Mar 16, 2026	Unverified	1
Image Generation with a Sphere Encoder	Feb 16, 2026	Unverified	1
Can Vision-Language Models Solve the Shell Game?	Mar 9, 2026	Unverified	1
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening	Feb 6, 2026	Unverified	1
LLM Probability Concentration: How Alignment Shrinks the Generative Horizon	Mar 2, 2026	Unverified	1
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs	Feb 17, 2026	Unverified	1
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration	Feb 25, 2026	Unverified	1
WildOS: Open-Vocabulary Object Search in the Wild	Feb 22, 2026	Unverified	1
Chain of World: World Model Thinking in Latent Motion	Mar 3, 2026	Unverified	1
ContextBench: A Benchmark for Context Retrieval in Coding Agents	Feb 11, 2026	Unverified	1
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing	Feb 10, 2026	Unverified	1
Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing	Feb 6, 2026	Unverified	1
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning	Mar 13, 2026	Unverified	1