The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 9,176–9,200 of 474,278 papers

Title	Date	Status
Training-Free Time Series Classification via In-Context Reasoning with LLM Agents	Oct 7, 2025	Code Available
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization	Oct 7, 2025	Code Available
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices	Oct 7, 2025	Code Available
Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks	Oct 7, 2025	Code Available
AutoEdit: Automatic Hyperparameter Tuning for Image Editing	Oct 7, 2025	Code Available
ASPO: Asymmetric Importance Sampling Policy Optimization	Oct 7, 2025	Code Available
Benchmark It Yourself (BIY): Preparing a Dataset and Benchmarking AI Models for Scatterplot-Related Tasks	Oct 7, 2025	Code Available
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models	Oct 7, 2025	Code Available
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures	Oct 7, 2025	Code Available
InstaGeo: Compute-Efficient Geospatial Machine Learning from Data to Deployment	Oct 7, 2025	Code Available
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning	Oct 7, 2025	Unverified
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding	Oct 7, 2025	Unverified
GeoRemover: Removing Objects and Their Causal Visual Artifacts	Oct 7, 2025	Code Available
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned	Oct 7, 2025	Unverified
NorMuon: Making Muon more efficient and scalable	Oct 7, 2025	Unverified
HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video	Oct 7, 2025	Unverified
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation	Oct 7, 2025	Unverified
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use	Oct 7, 2025	Unverified
Deformable Image Registration for Self-supervised Cardiac Phase Detection in Multi-View Multi-Disease Cardiac Magnetic Resonance Images	Oct 7, 2025	Code Available
AgeBooth: Controllable Facial Aging and Rejuvenation via Diffusion Models	Oct 7, 2025	Unverified
PLSemanticsBench: Large Language Models As Programming Language Interpreters	Oct 7, 2025	Code Available
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets	Oct 7, 2025	Code Available
QGraphLIME - Explaining Quantum Graph Neural Networks	Oct 7, 2025	Code Available
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision	Oct 7, 2025	Code Available
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection	Oct 7, 2025	Code Available