The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5,076–5,100 of 661,570 papers

Title	Date	Tasks	Status	Hype
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models	Jun 18, 2025	Audio captioning, Large Language Model	Code Available	2
Descriptor-based Foundation Models for Molecular Property Prediction	Jun 18, 2025	Molecular Property Prediction, Prediction	Code Available	2
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges	Jun 18, 2025	Combinatorial Optimization	Code Available	2
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree	Jun 18, 2025	Chunking, Code Generation	Code Available	2
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs	Jun 17, 2025		Code Available	2
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification	Jun 17, 2025	Code Generation	Code Available	2
Essential-Web v1.0: 24T tokens of organized web data	Jun 17, 2025	Math	Code Available	2
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents	Jun 17, 2025		Code Available	2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models	Jun 17, 2025	Benchmarking, Language Modeling	Code Available	2
Test3R: Learning to Reconstruct 3D at Test Time	Jun 16, 2025	3D Reconstruction, Depth Estimation	Code Available	2
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Jun 16, 2025	Reinforcement Learning (RL), Time Series	Code Available	2
LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation	Jun 16, 2025	Collaborative Filtering, Sequential Recommendation	Code Available	2
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure	Jun 16, 2025	Simultaneous Localization and Mapping	Code Available	2
A Comprehensive Survey on Continual Learning in Generative Models	Jun 16, 2025	Continual Learning, Survey	Code Available	2
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation	Jun 16, 2025	2D Pose Estimation, Decoder	Code Available	2
Focusing on Tracks for Online Multi-Object Tracking	Jun 15, 2025	global-optimization, Multi-Object Tracking	Code Available	2
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?	Jun 15, 2025	Code Generation	Code Available	2
Improving spliced alignment by modeling splice sites with deep learning	Jun 15, 2025		Code Available	2
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models	Jun 15, 2025	Reinforcement Learning (RL)	Code Available	2
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning	Jun 15, 2025		Code Available	2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes	Jun 13, 2025	Linear evaluation, Self-Supervised Learning	Code Available	2
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders	Jun 13, 2025	Speech Enhancement	Code Available	2
CGVQM+D: Computer Graphics Video Quality Metric and Dataset	Jun 13, 2025	Denoising, Novel View Synthesis	Code Available	2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks	Jun 13, 2025	Benchmarking, Large Language Model	Code Available	2
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search	Jun 13, 2025	Math, reinforcement-learning	Code Available	2