The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5101–5150 of 661,570 papers

Title	Date	Tasks	Status	Hype
CGVQM+D: Computer Graphics Video Quality Metric and Dataset	Jun 13, 2025	Denoising, Novel View Synthesis	Code Available	2
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders	Jun 13, 2025	Speech Enhancement	Code Available	2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes	Jun 13, 2025	Linear evaluation, Self-Supervised Learning	Code Available	2
Execution Guided Line-by-Line Code Generation	Jun 12, 2025	Code Generation	Code Available	2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis	Jun 12, 2025	Benchmarking, Dialogue Generation	Code Available	2
ConTextTab: A Semantics-Aware Tabular In-Context Learner	Jun 12, 2025	In-Context Learning, World Knowledge	Code Available	2
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science	Jun 12, 2025	Code Generation, Large Language Model	Code Available	2
Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs	Jun 12, 2025	Philosophy, Prompt Engineering	Code Available	2
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation	Jun 12, 2025		Code Available	2
VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Jun 12, 2025	MME, Video MME	Code Available	2
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs	Jun 12, 2025	Diversity	Code Available	2
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning	Jun 12, 2025	Answer Generation, Chunking	Code Available	2
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction	Jun 12, 2025	3D Semantic Occupancy Prediction, Autonomous Driving	Code Available	2
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks	Jun 12, 2025	GitHub issue resolution, valid	Code Available	2
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems	Jun 12, 2025		Code Available	2
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark	Jun 12, 2025		Code Available	2
GLAP: General contrastive audio-text pretraining across domains and languages	Jun 12, 2025	AudioCaps, Keyword Spotting	Code Available	2
SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending	Jun 11, 2025	Hierarchical Reinforcement Learning, Humanoid Control	Code Available	2
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy	Jun 11, 2025		Code Available	2
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning	Jun 11, 2025	Medical Question Answering, Question Answering	Code Available	2
TaskCraft: Automated Generation of Agentic Tasks	Jun 11, 2025		Code Available	2
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model	Jun 11, 2025	cross-modal alignment, Descriptive	Code Available	2
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing	Jun 11, 2025	Multimodal Reasoning, Spatial Reasoning	Code Available	2
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression	Jun 11, 2025	Image Generation	Code Available	2
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following	Jun 11, 2025	Instruction Following, reinforcement-learning	Code Available	2
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments	Jun 11, 2025	Benchmarking	Code Available	2
Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information	Jun 11, 2025		Code Available	2
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries	Jun 11, 2025	Segmentation, Self-Supervised Learning	Code Available	2
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting	Jun 11, 2025	Diversity, Representation Learning	Code Available	2
CoRT: Code-integrated Reasoning within Thinking	Jun 11, 2025	Mathematical Reasoning	Code Available	2
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning	Jun 11, 2025	Image Captioning, Math	Code Available	2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models	Jun 11, 2025	counterfactual, Descriptive	Code Available	2
Do MIL Models Transfer?	Jun 10, 2025	Multiple Instance Learning, Transfer Learning	Code Available	2
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability	Jun 10, 2025	Optical Character Recognition (OCR)	Code Available	2
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning	Jun 10, 2025	Model Selection, Reinforcement Learning (RL)	Code Available	2
Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment	Jun 10, 2025	Combinatorial Optimization, Imitation Learning	Code Available	2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions	Jun 10, 2025	Math	Code Available	2
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better	Jun 10, 2025	Image Generation	Code Available	2
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams	Jun 10, 2025	3DGS, 3D Reconstruction	Code Available	2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Jun 10, 2025	Image-text Retrieval, Question Answering	Code Available	2
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation	Jun 10, 2025	Foveation, Image Segmentation	Code Available	2
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning	Jun 10, 2025	4k, GPU	Code Available	2
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering	Jun 10, 2025	Scheduling	Code Available	2
FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems	Jun 10, 2025	RAG, Retrieval	Code Available	2
Snap-and-tune: combining deep learning and test-time optimization for high-fidelity cardiovascular volumetric meshing	Jun 9, 2025		Code Available	2
Open World Scene Graph Generation using Vision Language Models	Jun 9, 2025	Graph Generation, Scene Graph Generation	Code Available	2
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning	Jun 9, 2025	Decision Making, Heterogeneous Treatment Effect Estimation	Code Available	2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions	Jun 9, 2025	Large Language Model, Reinforcement Learning (RL)	Code Available	2
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling	Jun 9, 2025	Density Estimation	Code Available	2
Play to Generalize: Learning to Reason Through Game Play	Jun 9, 2025	Domain Generalization, Math	Code Available	2