The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers


Showing 2351–2400 of 659,983 papers

Title	Date	Status	Hype
HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing	Mar 7, 2026	Unverified	3
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory	Mar 3, 2026	Unverified	3
Human3R: Everyone Everywhere All at Once	Mar 3, 2026	Unverified	3
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing	Mar 3, 2026	Unverified	3
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction	Mar 2, 2026	Unverified	3
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering	Mar 2, 2026	Unverified	3
FireRed-OCR Technical Report	Mar 2, 2026	Unverified	3
Latent Diffusion Model without Variational Autoencoder	Mar 2, 2026	Unverified	3
RLP: Reinforcement as a Pretraining Objective	Mar 1, 2026	Unverified	3
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision	Mar 1, 2026	Unverified	3
GEM: A Gym for Agentic LLMs	Mar 1, 2026	Unverified	3
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence	Feb 26, 2026	Unverified	3
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution	Feb 26, 2026	Unverified	3
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding	Feb 26, 2026	Unverified	3
EO-1: An Open Unified Embodied Foundation Model for General Robot Control	Feb 25, 2026	Unverified	3
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering	Feb 25, 2026	Unverified	3
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?	Feb 24, 2026	Unverified	3
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control	Feb 23, 2026	Unverified	3
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation	Feb 22, 2026	Unverified	3
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation	Feb 19, 2026	Unverified	3
PartUV: Part-Based UV Unwrapping of 3D Meshes	Feb 17, 2026	Unverified	3
AnyUp: Universal Feature Upsampling	Feb 16, 2026	Unverified	3
LLaDA2.1: Speeding Up Text Diffusion via Token Editing	Feb 13, 2026	Unverified	3
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing	Feb 13, 2026	Unverified	3
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution	Feb 13, 2026	Unverified	3
LLM-in-Sandbox Elicits General Agentic Intelligence	Feb 12, 2026	Unverified	3
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation	Feb 12, 2026	Unverified	3
SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes	Feb 9, 2026	Unverified	3
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making	Feb 6, 2026	Unverified	3
Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks	Feb 6, 2026	Unverified	3
Simulating the Visual World with Artificial Intelligence: A Roadmap	Feb 5, 2026	Unverified	3
Scaling Multiagent Systems with Process Rewards	Feb 4, 2026	Unverified	3
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents	Feb 4, 2026	Unverified	3
HY3D-Bench: Generation of 3D Assets	Feb 3, 2026	Unverified	3
CL-bench: A Benchmark for Context Learning	Feb 3, 2026	Unverified	3
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents	Feb 2, 2026	Unverified	3
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars	Feb 2, 2026	Unverified	3
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling	Feb 1, 2026	Unverified	3
A Survey of Token Compression for Efficient Multimodal Large Language Models	Feb 1, 2026	Unverified	3
LongCat-Flash-Thinking-2601 Technical Report	Feb 1, 2026	Unverified	3
MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources	Jan 29, 2026	Unverified	3
Deep Delta Learning	Jan 29, 2026	Unverified	3
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion	Jan 29, 2026	Unverified	3
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion	Jan 29, 2026	Unverified	3
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows	Jan 28, 2026	Unverified	3
Self-Distillation Enables Continual Learning	Jan 27, 2026	Unverified	3
Geometry-Grounded Gaussian Splatting	Jan 27, 2026	Unverified	3
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency	Jan 26, 2026	Unverified	3
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security	Jan 26, 2026	Unverified	3
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience	Jan 23, 2026	Unverified	3