The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2001–2050 of 659,983 papers

Title	Date	Tasks	Status	Hype
TerraTorch: The Geospatial Foundation Models Toolkit	Mar 26, 2025	Benchmarking, Decoder	Code Available	4
Video-R1: Reinforcing Video Reasoning in MLLMs	Mar 27, 2025	MVBench, Reinforcement Learning (RL)	Code Available	4
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement	Jun 9, 2025	Music Generation	Code Available	4
SpatialTrackerV2: 3D Point Tracking Made Easy	Jul 16, 2025	3D Reconstruction, Camera Pose Estimation	Code Available	4
Proactive Detection of Voice Cloning with Localized Watermarking	Jan 30, 2024	Voice Cloning	Code Available	4
Eliciting Latent Predictions from Transformers with the Tuned Lens	Mar 14, 2023	Language Modelling	Code Available	4
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming	Feb 22, 2025	backdoor defense	Code Available	4
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web	Aug 26, 2024	Decision Making, Multi-class Classification	Code Available	4
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets	Mar 9, 2022	Benchmarking, Graph Regression	Code Available	4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	Oct 3, 2023	Audio Classification, Contrastive Learning	Code Available	4
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement	Oct 26, 2024	Large Language Model	Code Available	4
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization	Mar 13, 2025	Multimodal Reasoning	Code Available	4
Recurrent Partial Kernel Network for Efficient Optical Flow Estimation	Feb 1, 2024	Optical Flow Estimation	Code Available	4
DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality	Oct 25, 2022	Deep Reinforcement Learning, GPU	Code Available	4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning	Mar 20, 2025	Decision Making, Language Modeling	Code Available	4
Are Transformers Effective for Time Series Forecasting?	May 26, 2022	Anomaly Detection, Relation Extraction	Code Available	4
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models	May 8, 2025	Multimodal Reasoning	Code Available	4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Dec 4, 2023	Depth Estimation, GPU	Code Available	4
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot	Jan 2, 2023	Common Sense Reasoning, Language Modelling	Code Available	4
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function	May 26, 2023	Fact Verification, Information Retrieval	Code Available	4
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos	Mar 26, 2024	3D Human Pose Estimation	Code Available	4
TableGPT2: A Large Multimodal Model with Tabular Data Integration	Nov 4, 2024	Benchmarking, Data Integration	Code Available	4
Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration	Dec 19, 2024	Human-Object Interaction Detection, motion retargeting	Code Available	4
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds	Jul 1, 2024	Audio Generation, Video Alignment	Code Available	4
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering	Apr 26, 2024	2k, Question Answering	Code Available	4
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	Feb 25, 2024		Code Available	4
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning	May 22, 2025	Memorization, RAG	Code Available	4
The case for 4-bit precision: k-bit Inference Scaling Laws	Dec 19, 2022	Quantization	Code Available	4
ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection	Feb 5, 2024	3D Object Detection, Active Learning	Code Available	4
DepGraph: Towards Any Structural Pruning	Jan 30, 2023	Network Pruning, Neural Network Compression	Code Available	4
Improving Training Stability for Multitask Ranking Models in Recommender Systems	Feb 17, 2023	Recommendation Systems	Code Available	4
High-Resolution Image Synthesis with Latent Diffusion Models	Dec 20, 2021	Denoising, GPU	Code Available	4
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources	Jun 7, 2023	Instruction Following	Code Available	4
Decoder Tuning: Efficient Language Understanding as Decoding	Dec 16, 2022	Decoder, Natural Language Understanding	Code Available	4
Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation	Dec 2, 2022	Code Generation, Position	Code Available	4
The CLRS-Text Algorithmic Reasoning Language Benchmark	Jun 6, 2024		Code Available	4
PointMamba: A Simple State Space Model for Point Cloud Analysis	Feb 16, 2024	GPU, Mamba	Code Available	4
Reducing Activation Recomputation in Large Transformer Models	May 10, 2022		Code Available	4
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation	Feb 28, 2024	Attribute, Extractive Question-Answering	Code Available	4
ReChorus2.0: A Modular and Task-Flexible Recommendation Library	May 28, 2024	Click-Through Rate Prediction, Recommendation Systems	Code Available	4
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects	Dec 13, 2023	3D Object Detection, 3D Object Tracking	Code Available	4
ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model	Apr 4, 2024	2D Semantic Segmentation, Attribute	Code Available	4
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach	Feb 7, 2025	Language Modeling, Language Modelling	Code Available	4
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection	Mar 11, 2024	2D Object Detection, 2k	Code Available	4
SNAC: Multi-Scale Neural Audio Codec	Oct 18, 2024	Audio Compression, Audio Generation	Code Available	4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions	Jun 22, 2024	Benchmarking, Code Generation	Code Available	4
Boximator: Generating Rich and Controllable Motions for Video Synthesis	Feb 2, 2024		Code Available	4
Phoenix: Democratizing ChatGPT across Languages	Apr 20, 2023	Language Modeling, Language Modelling	Code Available	4
Blendify -- Python rendering framework for Blender	Oct 23, 2024	10-shot image generation	Code Available	4
Benchmarking Retrieval-Augmented Generation for Medicine	Feb 20, 2024	Benchmarking, Information Retrieval	Code Available	4