The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6,176–6,200 of 474,278 papers

Title	Date	Tasks	Status	Hype
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Feb 10, 2025	Math, Mathematical Reasoning	Code Available	2
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting	Feb 10, 2025	Representation Learning, Time Series	Code Available	2
Skill Expansion and Composition in Parameter Space	Feb 9, 2025	D4RL	Code Available	2
Saving 77% of the Parameters in Large Language Models Technical Report	Feb 9, 2025	GPU, Text Generation	Code Available	2
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly	Feb 9, 2025	Anomaly Detection, Unsupervised Anomaly Detection	Code Available	2
Knowledge Graph-Guided Retrieval Augmented Generation	Feb 8, 2025	Diversity, Hallucination	Code Available	2
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey	Feb 8, 2025	Fairness, RAG	Code Available	2
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code Generation, HumanEval	Code Available	2
Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark	Feb 8, 2025	Knowledge Distillation, Object Tracking	Code Available	2
Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model	Feb 8, 2025	Image Generation	Code Available	2
GaussRender: Learning 3D Occupancy with Gaussian Rendering	Feb 7, 2025	3D geometry, Autonomous Vehicles	Code Available	2
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations	Feb 7, 2025	GPU, Quantization	Code Available	2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Feb 7, 2025	Mathematical Problem-Solving, reinforcement-learning	Code Available	2
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning	Feb 7, 2025		Code Available	2
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Feb 7, 2025	8k, Information Retrieval	Code Available	2
MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection	Feb 7, 2025	object-detection, Object Detection	Code Available	2
NoLiMa: Long-Context Evaluation Beyond Literal Matching	Feb 7, 2025		Code Available	2
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Feb 6, 2025	Language Modeling, Language Modelling	Code Available	2
Training Language Models to Reason Efficiently	Feb 6, 2025	Reinforcement Learning (RL)	Code Available	2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Feb 6, 2025	image-classification, Image Classification	Code Available	2
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models	Feb 6, 2025		Code Available	2
SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning	Feb 6, 2025	Benchmarking, Data Poisoning	Code Available	2
WaferLLM: Large Language Model Inference at Wafer Scale	Feb 6, 2025	GPU, Language Modeling	Code Available	2
CTR-Driven Advertising Image Generation with Multimodal Large Language Models	Feb 5, 2025	Image Generation, Reinforcement Learning (RL)	Code Available	2
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Feb 5, 2025	Benchmarking, Large Language Model	Code Available	2