The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 15201–15250 of 474278 papers

Title	Date	Tasks	Status	Hype
Draft-based Approximate Inference for LLMs	Jun 10, 2025		Code Available	1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval	Jun 10, 2025	Image Captioning, Retrieval	Code Available	1
Intention-Conditioned Flow Occupancy Models	Jun 10, 2025	Reinforcement Learning (RL)	Code Available	1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	Benchmarking, Data Augmentation	Code Available	1
KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting	Jun 10, 2025	Computational Efficiency, Mamba	Code Available	1
RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation	Jun 10, 2025	Semantic Segmentation, Semi-Supervised Semantic Segmentation	Code Available	1
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling	Jun 10, 2025	Computational Efficiency, Reinforcement Learning (RL)	Code Available	1
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models	Jun 10, 2025		Code Available	1
SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models	Jun 10, 2025	EEG, Sleep Staging	Code Available	1
FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation	Jun 10, 2025	RAG, Retrieval	Code Available	1
Monocular 3D Hand Pose Estimation with Implicit Camera Alignment	Jun 10, 2025	3D Hand Pose Estimation, Hand Pose Estimation	Code Available	1
syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum	Jun 10, 2025	Symbolic Regression	Code Available	1
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner	Jun 10, 2025	test driven development	Code Available	1
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning	Jun 10, 2025	Knowledge Distillation, Math	Code Available	1
Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models	Jun 10, 2025		Code Available	1
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies	Jun 10, 2025	Adversarial Robustness, Anomaly Detection	Code Available	1
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion	Jun 10, 2025		Code Available	1
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs	Jun 10, 2025		Code Available	1
Ambient Diffusion Omni: Training Good Models with Bad Data	Jun 10, 2025		Code Available	1
ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization	Jun 10, 2025		Code Available	1
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench	Jun 10, 2025	Code Generation	Code Available	1
Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning	Jun 10, 2025	Decision Making, Deep Reinforcement Learning	Code Available	1
Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis	Jun 10, 2025	Domain Adaptation, Large Language Model	Code Available	1
mLaSDI: Multi-stage latent space dynamics identification	Jun 10, 2025		Code Available	1
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs	Jun 10, 2025	RAG, Retrieval-augmented Generation	Code Available	1
InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba	Jun 10, 2025	Computational Efficiency, image-classification	Code Available	1
Token Perturbation Guidance for Diffusion Models	Jun 10, 2025		Code Available	1
PairEdit: Learning Semantic Variations for Exemplar-based Image Editing	Jun 9, 2025	text-guided-image-editing	Code Available	1
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium	Jun 9, 2025	Hierarchical Reinforcement Learning	Code Available	1
Multiple Object Stitching for Unsupervised Representation Learning	Jun 9, 2025	Contrastive Learning, Object	Code Available	1
Improving large language models with concept-aware fine-tuning	Jun 9, 2025	Protein Design, Text Summarization	Code Available	1
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models	Jun 9, 2025	Multi-agent Reinforcement Learning, Safety Alignment	Code Available	1
Premise Selection for a Lean Hammer	Jun 9, 2025		Code Available	1
CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jun 9, 2025	Video Understanding	Code Available	1
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence	Jun 9, 2025		Code Available	1
SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design	Jun 9, 2025	Code Generation, RAG	Code Available	1
Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning	Jun 9, 2025	counterfactual, Multi-agent Reinforcement Learning	Code Available	1
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards	Jun 9, 2025	Safety Alignment	Code Available	1
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers	Jun 9, 2025	Attribute	Code Available	1
Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow	Jun 9, 2025	Computational Efficiency, Event-based Optical Flow	Code Available	1
Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning	Jun 9, 2025		Code Available	1
RADAR: Benchmarking Language Models on Imperfect Tabular Data	Jun 9, 2025	Benchmarking, Missing Values	Code Available	1
Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction	Jun 9, 2025	Event-based vision, Trajectory Forecasting	Code Available	1
C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation	Jun 9, 2025	Contrastive Learning, Diagnostic	Code Available	1
Generative Modeling of Weights: Generalization or Memorization?	Jun 9, 2025	Memorization, Video Generation	Code Available	1
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions	Jun 9, 2025	Reinforcement Learning (RL)	Code Available	1
Diffusion Sequence Models for Enhanced Protein Representation and Generation	Jun 9, 2025	Language Modeling, Language Modelling	Code Available	1
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows	Jun 9, 2025	Dataset Distillation, Domain Adaptation	Code Available	1
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding	Jun 9, 2025		Code Available	1
CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning	Jun 9, 2025	Information Retrieval	Code Available	1