The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers


Showing 2,876–2,900 of 661,570 papers

Title	Date	Tasks	Status	Hype
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents	Feb 22, 2025	AI Agent	Code Available	3
MEMORYLLM: Towards Self-Updatable Large Language Models	Feb 7, 2024	Model Editing	Code Available	3
BatchTopK Sparse Autoencoders	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	3
On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning	Nov 26, 2024	Computational Efficiency, Deep Learning	Code Available	3
Large Language Models Are Human-Level Prompt Engineers	Nov 3, 2022	Few-Shot Learning, In-Context Learning	Code Available	3
Zero-Shot Text-to-Image Generation	Feb 24, 2021	Image Generation, Text to Image Generation	Code Available	3
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning	May 15, 2025	cross-modal alignment, Geometry Problem Solving	Code Available	3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Jun 28, 2024	Vision-Language-Action, World Knowledge	Code Available	3
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric	Jan 11, 2018	Image Quality Assessment, SSIM	Code Available	3
Cross-Modal Causal Intervention for Medical Report Generation	Mar 16, 2023	Medical Report Generation, object-detection	Code Available	3
Evaluating Large Language Models for Radiology Natural Language Processing	Jul 25, 2023		Code Available	3
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement	Jun 17, 2024	speech-recognition, Speech Recognition	Code Available	3
Neuron-Level Sequential Editing for Large Language Models	Oct 5, 2024	Model Editing	Code Available	3
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas	Jun 25, 2025		Code Available	3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model	Aug 30, 2024	Audio Compression, Audio Generation	Code Available	3
SALMONN: Towards Generic Hearing Abilities for Large Language Models	Oct 20, 2023	Audio captioning, Automatic Speech Recognition	Code Available	3
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model	Apr 24, 2023	AudioCaps, Audio Generation	Code Available	3
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization	Mar 3, 2025		Code Available	3
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer	Jul 15, 2024	Language Modeling, Language Modelling	Code Available	3
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking	Mar 27, 2022	CPU, Multi-Object Tracking	Code Available	3
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving	May 31, 2022	Autonomous Driving, CARLA longest6	Code Available	3
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba	Mar 15, 2024	Language Modeling, Language Modelling	Code Available	3
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities	Aug 8, 2024		Code Available	3
Accelerating Diffusion Transformers with Dual Feature Caching	Dec 25, 2024	Video Generation	Code Available	3