The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 1151–1200 of 177339 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval	May 20, 2025	Benchmarking, Information Retrieval	Code Available	5	5
Randomized Autoregressive Visual Generation	Nov 1, 2024	Image Generation, Language Modeling	Code Available	5	5
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Apr 30, 2025	Automated Theorem Proving, Large Language Model	Code Available	5	5
FlowTok: Flowing Seamlessly Across Text and Image Tokens	Mar 13, 2025	Denoising, Image to text	Code Available	5	5
Loki: An Open-Source Tool for Fact Verification	Oct 2, 2024	Claim Verification, Fact Checking	Code Available	5	5
NeuralSVG: An Implicit Representation for Text-to-Vector Generation	Jan 7, 2025	Vector Graphics	Code Available	5	5
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use	Nov 15, 2024		Code Available	5	5
Weakly Supervised Detection of Hallucinations in LLM Activations	Dec 5, 2023	Hallucination, Language Modeling	Code Available	5	5
Vectorized and performance-portable Quicksort	May 12, 2022	CPU	Code Available	5	5
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation	Apr 2, 2025	Conditional Image Generation, Image Generation	Code Available	5	5
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis	Sep 3, 2024	3D Generation, 3D Reconstruction	Code Available	5	5
PaSa: An LLM Agent for Comprehensive Academic Paper Search	Jan 17, 2025		Code Available	5	5
Voyager: An Open-Ended Embodied Agent with Large Language Models	May 25, 2023	Lifelong learning, Minecraft	Code Available	5	5
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs	Jul 31, 2023	Trajectory Planning, Zero-shot Generalization	Code Available	5	5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Jun 18, 2024	Language Modeling, Language Modelling	Code Available	5	5
On the Computation of the Fisher Information in Continual Learning	Feb 17, 2025	Continual Learning	Code Available	5	5
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention	May 23, 2025	3D Generation, 3D geometry	Code Available	5	5
How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey	Feb 20, 2024	3DGS, Simultaneous Localization and Mapping	Code Available	5	5
GRUtopia: Dream General Robots in a City at Scale	Jul 15, 2024	Language Modelling, Large Language Model	Code Available	5	5
Fractal Generative Models	Feb 24, 2025	Image Generation	Code Available	5	5
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations	Oct 10, 2024	Time Series Forecasting, Video Recognition	Code Available	5	5
Factuality Enhanced Language Models for Open-Ended Text Generation	Jun 9, 2022	Misconceptions, Sentence	Code Available	5	5
Tool Learning with Foundation Models	Apr 17, 2023		Code Available	5	5
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators	Apr 6, 2024	Chatbot, counterfactual	Code Available	5	5
Deep Lake: a Lakehouse for Deep Learning	Sep 22, 2022	Decision Making, Deep Learning	Code Available	5	5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit	Apr 22, 2024	Math	Code Available	5	5
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary	Oct 20, 2024	object-detection, Object Detection	Code Available	5	5
Efficient Diffusion Model for Image Restoration by Residual Shifting	Mar 12, 2024	Blind Face Restoration, Image Inpainting	Code Available	5	5
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment	Jun 9, 2025	AI Agent	Code Available	5	5
DUSt3R: Geometric 3D Vision Made Easy	Dec 21, 2023	3D Reconstruction, Camera Calibration	Code Available	5	5
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling	Jan 1, 2024	NeRF	Code Available	5	5
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine	Jun 21, 2022	MuJoCo, reinforcement-learning	Code Available	5	5
ProPainter: Improving Propagation and Transformer for Video Inpainting	Sep 7, 2023	Optical Flow Estimation, Video Inpainting	Code Available	5	5
MedRAX: Medical Reasoning Agent for Chest X-ray	Feb 4, 2025	AI Agent, Management	Code Available	5	5
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression	Oct 10, 2023	Code Completion, Few-Shot Learning	Code Available	5	5
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	Mar 7, 2024	document understanding, Key Information Extraction	Code Available	5	5
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases	Feb 22, 2024		Code Available	5	5
Self-Instruct: Aligning Language Models with Self-Generated Instructions	Dec 20, 2022	Instruction Following, Language Modelling	Code Available	5	5
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model	Mar 7, 2025	Multimodal Reasoning, reinforcement-learning	Code Available	4	5
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis	Feb 6, 2025	Speech Synthesis	Code Available	4	5
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications	Jan 11, 2024	image-classification, Image Classification	Code Available	4	5
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation	Jun 6, 2022	Image Segmentation, Instance Segmentation	Code Available	4	5
Baichuan 2: Open Large-scale Language Models	Sep 19, 2023	Feature Engineering, GSM8K	Code Available	4	5
SEED-Story: Multimodal Long Story Generation with Large Language Model	Jul 11, 2024	Image Generation, Language Modeling	Code Available	4	5
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents	Oct 14, 2024	RAG, Retrieval	Code Available	4	5
Otter: A Multi-Modal Model with In-Context Instruction Tuning	May 5, 2023	GPU, In-Context Learning	Code Available	4	5
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation	Feb 4, 2025	Benchmarking, Information Retrieval	Code Available	4	5
Safurai 001: New Qualitative Approach for Code LLM Evaluation	Sep 20, 2023	Language Modeling, Language Modelling	Code Available	4	5
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models	Mar 14, 2022	CPU, Quantization	Code Available	4	5
RePaint: Inpainting using Denoising Diffusion Probabilistic Models	Jan 24, 2022	Denoising, Image Inpainting	Code Available	4	5