The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers


Showing 2801–2850 of 177339 papers

Title	Date	Tasks	Status	Hype	Score
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Apr 13, 2024	Data Augmentation, Key Point Matching	Code Available	3	5
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents	Jan 17, 2024	Natural Language Visual Grounding	Code Available	3	5
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents	Feb 22, 2025	AI Agent	Code Available	3	5
MEMORYLLM: Towards Self-Updatable Large Language Models	Feb 7, 2024	Model Editing	Code Available	3	5
BatchTopK Sparse Autoencoders	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	3	5
On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning	Nov 26, 2024	Computational Efficiency, Deep Learning	Code Available	3	5
Large Language Models Are Human-Level Prompt Engineers	Nov 3, 2022	Few-Shot Learning, In-Context Learning	Code Available	3	5
Zero-Shot Text-to-Image Generation	Feb 24, 2021	Image Generation, Text to Image Generation	Code Available	3	5
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3	5
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning	May 15, 2025	cross-modal alignment, Geometry Problem Solving	Code Available	3	5
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Jun 28, 2024	Vision-Language-Action, World Knowledge	Code Available	3	5
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric	Jan 11, 2018	Image Quality Assessment, SSIM	Code Available	3	5
Cross-Modal Causal Intervention for Medical Report Generation	Mar 16, 2023	Medical Report Generation, object-detection	Code Available	3	5
Evaluating Large Language Models for Radiology Natural Language Processing	Jul 25, 2023		Code Available	3	5
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement	Jun 17, 2024	speech-recognition, Speech Recognition	Code Available	3	5
Neuron-Level Sequential Editing for Large Language Models	Oct 5, 2024	Model Editing	Code Available	3	5
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas	Jun 25, 2025		Code Available	3	5
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model	Aug 30, 2024	Audio Compression, Audio Generation	Code Available	3	5
SALMONN: Towards Generic Hearing Abilities for Large Language Models	Oct 20, 2023	Audio captioning, Automatic Speech Recognition	Code Available	3	5
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model	Apr 24, 2023	AudioCaps, Audio Generation	Code Available	3	5
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization	Mar 3, 2025		Code Available	3	5
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer	Jul 15, 2024	Language Modeling, Language Modelling	Code Available	3	5
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking	Mar 27, 2022	CPU, Multi-Object Tracking	Code Available	3	5
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving	May 31, 2022	Autonomous Driving, CARLA longest6	Code Available	3	5
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba	Mar 15, 2024	Language Modeling, Language Modelling	Code Available	3	5
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities	Aug 8, 2024		Code Available	3	5
Accelerating Diffusion Transformers with Dual Feature Caching	Dec 25, 2024	Video Generation	Code Available	3	5
Keypoint Promptable Re-Identification	Jul 25, 2024	Metric Learning, Occluded Person Re-Identification	Code Available	3	5
Proteus: A Self-Designing Range Filter	Jun 30, 2022		Code Available	3	5
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition	May 15, 2024	2D Object Detection, Earth Observation	Code Available	3	5
AutoTimes: Autoregressive Time Series Forecasters via Large Language Models	Feb 4, 2024	Decoder, In-Context Learning	Code Available	3	5
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models	Mar 5, 2024	Knowledge Distillation, Prompt Engineering	Code Available	3	5
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions	Aug 28, 2023	Benchmarking, Formation Energy	Code Available	3	5
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models	Mar 10, 2024	Visual Question Answering	Code Available	3	5
SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation	Aug 16, 2024	Image Segmentation, Marine Animal Segmentation	Code Available	3	5
Multimodal Foundation Models: From Specialists to General-Purpose Assistants	Sep 18, 2023	Image Generation, Survey	Code Available	3	5
Aria-UI: Visual Grounding for GUI Instructions	Dec 20, 2024	Natural Language Visual Grounding, Visual Grounding	Code Available	3	5
Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations	Jan 15, 2025		Code Available	3	5
VRT: A Video Restoration Transformer	Jan 28, 2022	Deblurring, Denoising	Code Available	3	5
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making	Oct 31, 2024	Decision Making, Diagnostic	Code Available	3	5
TinyAgent: Function Calling at the Edge	Sep 1, 2024	Language Modelling, Quantization	Code Available	3	5
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models	Oct 16, 2024	Hallucination, Knowledge Graphs	Code Available	3	5
Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series	Feb 16, 2022	Anomaly Detection, Density Estimation	Code Available	3	5
Towards An End-to-End Framework for Flow-Guided Video Inpainting	Apr 6, 2022	Hallucination, Optical Flow Estimation	Code Available	3	5
Sintel: A Machine Learning Framework to Extract Insights from Signals	Apr 19, 2022	Anomaly Detection, BIG-bench Machine Learning	Code Available	3	5
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation	Aug 28, 2023	Instance Segmentation, Optical Flow Estimation	Code Available	3	5
TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement	Jun 14, 2023	GPU, Motion Estimation	Code Available	3	5
Playing Non-Embedded Card-Based Games with Reinforcement Learning	Apr 7, 2025	Board Games, Decision Making	Code Available	3	5
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation	Sep 20, 2018	Multi-task Audio Source Seperation, Music Source Separation	Code Available	3	5
DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models	Oct 26, 2022	Diversity, Misinformation	Code Available	3	5