The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Recently Added | Most Hyped | Most Active | Needs Verification | Most Verified

Showing 2,501–2,550 of 659,983 papers

Title	Date	Tasks	Status	Hype
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning	May 18, 2025	Reinforcement Learning (RL), Visual Grounding	Code Available	3
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward	May 18, 2025	GPU, Graph Matching	Code Available	3
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching	May 17, 2025	Denoising	Code Available	3
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis	May 16, 2025	Continual Learning, Representation Learning	Code Available	3
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation	May 16, 2025		Code Available	3
Visual Planning: Let's Think Only with Images	May 16, 2025	reinforcement-learning, Reinforcement Learning	Code Available	3
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking	May 16, 2025	Benchmarking, Management	Code Available	3
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation	May 15, 2025	Image Animation, Video Generation	Code Available	3
Parallel Scaling Law for Language Models	May 15, 2025		Code Available	3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning	May 15, 2025	cross-modal alignment, Geometry Problem Solving	Code Available	3
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning	May 13, 2025	Reinforcement Learning (RL), Visual Reasoning	Code Available	3
Generative AI for Autonomous Driving: Frontiers and Opportunities	May 13, 2025	Autonomous Driving, Video Generation	Code Available	3
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain	May 12, 2025	Multivariate Time Series Forecasting, Representation Learning	Code Available	3
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks	May 12, 2025	Code Generation	Code Available	3
CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments	May 10, 2025	Pose Estimation	Code Available	3
LLMs Get Lost In Multi-Turn Conversation	May 9, 2025		Code Available	3
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization	May 9, 2025	Benchmarking	Code Available	3
SOAP: Style-Omniscient Animatable Portraits	May 8, 2025	Image to 3D	Code Available	3
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation	May 8, 2025	Quantization	Code Available	3
A Common Interface for Automatic Differentiation	May 8, 2025		Code Available	3
FastMap: Revisiting Dense and Scalable Structure from Motion	May 7, 2025	GPU	Code Available	3
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation	May 6, 2025	Robot Manipulation, Vision-Language-Action	Code Available	3
LiftFeat: 3D Geometry-Aware Local Feature Matching	May 6, 2025	3D geometry, Depth Estimation	Code Available	3
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models	May 5, 2025	Policy Gradient Methods, RAG	Code Available	3
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning	May 5, 2025	Reinforcement Learning (RL)	Code Available	3
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis	May 5, 2025	Chatbot, Decoder	Code Available	3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	May 5, 2025	AI Agent, Automatic Speech Recognition	Code Available	3
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	May 4, 2025	Mixture-of-Experts, NeRF	Code Available	3
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions	May 1, 2025	Survey	Code Available	3
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models	May 1, 2025	Large Language Model	Code Available	3
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing	Apr 30, 2025	Image Generation	Code Available	3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example	Apr 29, 2025	Domain Generalization, Math	Code Available	3
PixelHacker: Image Inpainting with Structural and Semantic Consistency	Apr 29, 2025	Denoising, Image Generation	Code Available	3
ReasonIR: Training Retrievers for Reasoning Tasks	Apr 29, 2025	Information Retrieval, MMLU	Code Available	3
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video	Apr 28, 2025		Code Available	3
Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs	Apr 28, 2025	Synthetic Data Generation	Code Available	3
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion	Apr 28, 2025		Code Available	3
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos	Apr 24, 2025	MME, Video MME	Code Available	3
An Empirical Study on Prompt Compression for Large Language Models	Apr 24, 2025	Articles, Math	Code Available	3
Tina: Tiny Reasoning Models via LoRA	Apr 22, 2025	Reinforcement Learning (RL)	Code Available	3
Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection	Apr 22, 2025	Contrastive Learning, Fraud Detection	Code Available	3
Learning to Reason under Off-Policy Guidance	Apr 21, 2025	Math, Reinforcement Learning (RL)	Code Available	3
OmniAudio: Generating Spatial Audio from 360-Degree Video	Apr 21, 2025	Audio Generation	Code Available	3
TAPIP3D: Tracking Any Point in Persistent 3D Geometry	Apr 20, 2025	3D geometry, Depth And Camera Motion	Code Available	3
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D	Apr 19, 2025	Decoder, Object Localization	Code Available	3
Generative AI Act II: Test Time Scaling Drives Cognition Engineering	Apr 18, 2025	Prompt Engineering	Code Available	3
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models	Apr 18, 2025	Feature Upsampling	Code Available	3
Event-Enhanced Blurry Video Super-Resolution	Apr 17, 2025	Deblurring, Motion Estimation	Code Available	3
IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design	Apr 17, 2025		Code Available	3
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts	Apr 17, 2025	Denoising	Code Available	3