The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers


Showing 2551–2600 of 659,983 papers

Title	Date	Tasks	Status	Hype
CoMotion: Concurrent Multi-person 3D Motion	Apr 16, 2025	3D Pose Estimation, Pose Estimation	Code Available	3
Elucidating the Design Space of Multimodal Protein Language Models	Apr 15, 2025	Diversity, Representation Learning	Code Available	3
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning	Apr 15, 2025	Mathematical Reasoning, Reinforcement Learning (RL)	Code Available	3
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL	Apr 15, 2025	Inference Optimization	Code Available	3
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers	Apr 15, 2025	Image Generation	Code Available	3
DataDecide: How to Predict Best Pretraining Data with Small Experiments	Apr 15, 2025	ARC, HellaSwag	Code Available	3
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning	Apr 15, 2025	Automated Theorem Proving, Large Language Model	Code Available	3
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites	Apr 15, 2025	Autonomous Web Navigation, Benchmarking	Code Available	3
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks	Apr 15, 2025		Code Available	3
Efficient Reasoning Models: A Survey	Apr 15, 2025	Knowledge Distillation, Model Compression	Code Available	3
A Clean Slate for Offline Reinforcement Learning	Apr 15, 2025	Offline RL, reinforcement-learning	Code Available	3
Evaluation Report on MCP Servers	Apr 15, 2025	Large Language Model	Code Available	3
Ai2 Scholar QA: Organized Literature Synthesis with Attribution	Apr 15, 2025	Question Answering, Retrieval	Code Available	3
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce	Apr 15, 2025	Reinforcement Learning (RL)	Code Available	3
RAKG:Document-level Retrieval Augmented Knowledge Graph Construction	Apr 14, 2025	coreference-resolution, Coreference Resolution	Code Available	3
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report	Apr 14, 2025	Super-Resolution, valid	Code Available	3
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers	Apr 14, 2025		Code Available	3
Deep Reasoning Translation via Reinforcement Learning	Apr 14, 2025	reinforcement-learning, Reinforcement Learning	Code Available	3
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents	Apr 14, 2025	Vision-Language-Action	Code Available	3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution	Apr 13, 2025	GSM8K, Math	Code Available	3
TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies	Apr 11, 2025	Computational Efficiency, GPU	Code Available	3
DocAgent: A Multi-Agent System for Automated Code Documentation Generation	Apr 11, 2025	Code Documentation Generation	Code Available	3
MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications	Apr 11, 2025	GPU	Code Available	3
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation	Apr 11, 2025	Decoder, Image Generation	Code Available	3
PixelFlow: Pixel-Space Generative Models with Flow	Apr 10, 2025	Conditional Image Generation, Image Generation	Code Available	3
Detect Anything 3D in the Wild	Apr 10, 2025	3D Object Detection, Autonomous Driving	Code Available	3
Perception-R1: Pioneering Perception Policy with Reinforcement Learning	Apr 10, 2025	reinforcement-learning, Reinforcement Learning	Code Available	3
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory	Apr 10, 2025	Math, MMLU	Code Available	3
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction	Apr 10, 2025	3D Reconstruction, 4D reconstruction	Code Available	3
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Apr 9, 2025	MVBench, Object Tracking	Code Available	3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	Apr 9, 2025	2k, Decision Making	Code Available	3
SEA-LION: Southeast Asian Languages in One Network	Apr 8, 2025		Code Available	3
GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III	Apr 8, 2025	Computational Efficiency, CPU	Code Available	3
DDT: Decoupled Diffusion Transformer	Apr 8, 2025	Denoising, Image Generation	Code Available	3
PromptHMR: Promptable Human Mesh Recovery	Apr 8, 2025	3D Human Pose Estimation, Human Mesh Recovery	Code Available	3
Playing Non-Embedded Card-Based Games with Reinforcement Learning	Apr 7, 2025	Board Games, Decision Making	Code Available	3
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation	Apr 7, 2025	3D geometry, RGBD Semantic Segmentation	Code Available	3
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization	Apr 5, 2025	3D Generation, Video Alignment	Code Available	3
TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation	Apr 5, 2025		Code Available	3
Scaling Analysis of Interleaved Speech-Text Language Models	Apr 3, 2025	Transfer Learning	Code Available	3
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Apr 3, 2025	Image Generation, World Knowledge	Code Available	3
Affordable AI Assistants with Knowledge Graph of Thoughts	Apr 3, 2025	Knowledge Graphs, LLM real-life tasks	Code Available	3
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving	Apr 3, 2025	Reinforcement Learning (RL)	Code Available	3
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning	Apr 3, 2025	Image Generation, Instruction Following	Code Available	3
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation	Apr 3, 2025	Mamba, Talking Head Generation	Code Available	3
End-to-End Driving with Online Trajectory Evaluation via BEV World Model	Apr 2, 2025	Autonomous Driving, Bench2Drive	Code Available	3
YourBench: Easy Custom Evaluation Sets for Everyone	Apr 2, 2025	MMLU	Code Available	3
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction	Apr 1, 2025	Image Generation	Code Available	3
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs	Apr 1, 2025	Knowledge Graphs, Mathematical Reasoning	Code Available	3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB	Apr 1, 2025	Decision Making, RAG	Code Available	3