The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,313 code links | 4,818 tasks

Papers

Showing 1,851–1,875 of 177,339 papers

Title	Date	Tasks	Status	Hype	Score
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers	Jul 14, 2022	Retrieval, Text Retrieval	Code Available	4	5
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model	Jun 9, 2024	GPU, Mamba	Code Available	4	5
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements	Sep 30, 2022		Code Available	4	5
Compressible-composable NeRF via Rank-residual Decomposition	May 30, 2022	NeRF	Code Available	4	5
Structured Pruning for Deep Convolutional Neural Networks: A survey	Mar 1, 2023	Network Pruning, Neural Architecture Search	Code Available	4	5
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge	Nov 25, 2024		Code Available	4	5
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks	Mar 21, 2024	Image to Video Generation, Style Transfer	Code Available	4	5
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance	Oct 16, 2024	Human Agent Collaboration	Code Available	4	5
Orb: A Fast, Scalable Neural Network Potential	Oct 29, 2024		Code Available	4	5
Spirit LM: Interleaved Spoken and Written Language Model	Feb 8, 2024	Language Modeling, Language Modelling	Code Available	4	5
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments	Jul 15, 2024	Language Modeling, Language Modelling	Code Available	4	5
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights	Oct 11, 2024	GSM8K, Math	Code Available	4	5
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench	Jan 31, 2024	Benchmarking, Multiple-choice	Code Available	4	5
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens	Jun 17, 2024		Code Available	4	5
Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later	Jul 3, 2024		Code Available	4	5
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection	Mar 7, 2022	Object Detection, Real-Time Object Detection	Code Available	4	5
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling	Oct 31, 2024	Deep Learning, Retrieval	Code Available	4	5
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation	Jun 13, 2023	Language Modeling, Language Modelling	Code Available	4	5
SegGPT: Segmenting Everything In Context	Apr 6, 2023	Few-Shot Semantic Segmentation, In-Context Learning	Code Available	4	5
TinyLLaVA: A Framework of Small-scale Large Multimodal Models	Feb 22, 2024	Visual Question Answering	Code Available	4	5
Building reliable sim driving agents by scaling self-play	Feb 20, 2025	Autonomous Vehicles, Benchmarking	Code Available	4	5
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts	Mar 13, 2024	Image Animation, Image to Video Generation	Code Available	4	5
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	May 27, 2022	Image Classification, Instance Segmentation	Code Available	4	5
SkyReels-A2: Compose Anything in Video Diffusion Transformers	Apr 3, 2025	Human-Domain Subject-to-Video, Open-Domain Subject-to-Video	Code Available	4	5
Croissant: A Metadata Format for ML-Ready Datasets	Mar 28, 2024	Friction, Management	Code Available	4	5