The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers


Showing 5051–5100 of 661,570 papers

Title	Date	Tasks	Status	Hype
Learning to See in the Extremely Dark	Jun 26, 2025	Denoising, Exposure Correction	Code Available	2
WAFT: Warping-Alone Field Transforms for Optical Flow	Jun 26, 2025	Optical Flow Estimation, Zero-shot Generalization	Code Available	2
Stochastic Parameter Decomposition	Jun 25, 2025		Code Available	2
Language Modeling by Language Models	Jun 25, 2025	Code Generation, Language Modeling	Code Available	2
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling	Jun 25, 2025	Language Modeling, Language Modelling	Code Available	2
Video Compression for Spatiotemporal Earth System Data	Jun 24, 2025	Earth Observation, Video Compression	Code Available	2
An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking	Jun 24, 2025		Code Available	2
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks	Jun 24, 2025		Code Available	2
MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models	Jun 24, 2025	GPU, Protein Folding	Code Available	2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning	Jun 24, 2025	Benchmarking, Drug Discovery	Code Available	2
AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing	Jun 23, 2025	Neural Architecture Search, Quantization	Code Available	2
Thought Anchors: Which LLM Reasoning Steps Matter?	Jun 23, 2025	counterfactual, Sentence	Code Available	2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning	Jun 23, 2025	GPU, Large Language Model	Code Available	2
Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster	Jun 22, 2025	Decoder, Image Segmentation	Code Available	2
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities	Jun 22, 2025	Reinforcement Learning (RL)	Code Available	2
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods	Jun 22, 2025	Anomaly Detection, Benchmarking	Code Available	2
From Tiny Machine Learning to Tiny Deep Learning: A Survey	Jun 21, 2025	AutoML, Model Optimization	Code Available	2
Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models	Jun 20, 2025		Code Available	2
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation	Jun 20, 2025	Scene Generation	Code Available	2
MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents	Jun 20, 2025	Diversity	Code Available	2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching	Jun 20, 2025	Scheduling, Speech Synthesis	Code Available	2
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking	Jun 20, 2025	6D Pose Estimation, Object	Code Available	2
Watermarking Autoregressive Image Generation	Jun 19, 2025	Image Generation, Language Modeling	Code Available	2
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement	Jun 18, 2025	Graph Generation, Hallucination	Code Available	2
Descriptor-based Foundation Models for Molecular Property Prediction	Jun 18, 2025	Molecular Property Prediction, Prediction	Code Available	2
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models	Jun 18, 2025	Audio captioning, Large Language Model	Code Available	2
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges	Jun 18, 2025	Combinatorial Optimization	Code Available	2
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning	Jun 18, 2025	Caption Generation, Descriptive	Code Available	2
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree	Jun 18, 2025	Chunking, Code Generation	Code Available	2
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs	Jun 17, 2025		Code Available	2
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification	Jun 17, 2025	Code Generation	Code Available	2
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents	Jun 17, 2025		Code Available	2
Essential-Web v1.0: 24T tokens of organized web data	Jun 17, 2025	Math	Code Available	2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models	Jun 17, 2025	Benchmarking, Language Modeling	Code Available	2
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Jun 16, 2025	Reinforcement Learning (RL), Time Series	Code Available	2
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation	Jun 16, 2025	2D Pose Estimation, Decoder	Code Available	2
A Comprehensive Survey on Continual Learning in Generative Models	Jun 16, 2025	Continual Learning, Survey	Code Available	2
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure	Jun 16, 2025	Simultaneous Localization and Mapping	Code Available	2
LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation	Jun 16, 2025	Collaborative Filtering, Sequential Recommendation	Code Available	2
Test3R: Learning to Reconstruct 3D at Test Time	Jun 16, 2025	3D Reconstruction, Depth Estimation	Code Available	2
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models	Jun 15, 2025	Reinforcement Learning (RL)	Code Available	2
Focusing on Tracks for Online Multi-Object Tracking	Jun 15, 2025	global-optimization, Multi-Object Tracking	Code Available	2
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?	Jun 15, 2025	Code Generation	Code Available	2
Improving spliced alignment by modeling splice sites with deep learning	Jun 15, 2025		Code Available	2
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning	Jun 15, 2025		Code Available	2
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis	Jun 13, 2025	Autonomous Driving, Autonomous Vehicles	Code Available	2
BraTS orchestrator : Democratizing and Disseminating state-of-the-art brain tumor image analysis	Jun 13, 2025	Brain Tumor Segmentation, Tumor Segmentation	Code Available	2
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search	Jun 13, 2025	Math, reinforcement-learning	Code Available	2
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes	Jun 13, 2025	Linear evaluation, Self-Supervised Learning	Code Available	2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks	Jun 13, 2025	Benchmarking, Large Language Model	Code Available	2