The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 9551–9575 of 474278 papers

Title	Date	Status
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI	Sep 29, 2025	Code Available
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech	Sep 29, 2025	Unverified
Rethinking Entropy Regularization in Large Reasoning Models	Sep 29, 2025	Unverified
LayerD: Decomposing Raster Graphic Designs into Layers	Sep 29, 2025	Unverified
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning	Sep 29, 2025	Code Available
Who's Your Judge? On the Detectability of LLM-Generated Judgments	Sep 29, 2025	Unverified
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts	Sep 29, 2025	Unverified
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time	Sep 29, 2025	Unverified
Visual Jigsaw Post-Training Improves MLLMs	Sep 29, 2025	Unverified
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model	Sep 29, 2025	Unverified
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution	Sep 29, 2025	Unverified
Where LLM Agents Fail and How They can Learn From Failures	Sep 29, 2025	Code Available
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models	Sep 29, 2025	Code Available
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity	Sep 29, 2025	Unverified
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing	Sep 29, 2025	Unverified
Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding	Sep 29, 2025	Code Available
Efficiently Attacking Memorization Scores	Sep 29, 2025	Code Available
DiTraj: training-free trajectory control for video diffusion transformer	Sep 29, 2025	Unverified
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs	Sep 29, 2025	Unverified
Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis	Sep 29, 2025	Code Available
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning	Sep 29, 2025	Code Available
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games	Sep 29, 2025	Code Available
Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models	Sep 29, 2025	Code Available
OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing	Sep 29, 2025	Code Available
Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling	Sep 29, 2025	Code Available