The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers


Showing 5251–5300 of 661,570 papers

Title	Date	Tasks	Status	Hype
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models	May 27, 2025	Concept Alignment, object-detection	Code Available	2
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation	May 27, 2025	Object Tracking, Segmentation	Code Available	2
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment	May 27, 2025	Adversarial Attack, Clustering	Code Available	2
TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state	May 27, 2025	Mamba, Time Series	Code Available	2
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing	May 27, 2025	Math	Code Available	2
Improved Representation Steering for Language Models	May 27, 2025	Language Modeling, Language Modelling	Code Available	2
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution	May 27, 2025	Reinforcement Learning (RL)	Code Available	2
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?	May 27, 2025	Multimodal Reasoning	Code Available	2
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue	May 26, 2025	Diagnostic, Question Answering	Code Available	2
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	May 26, 2025	Language Modeling, Language Modelling	Code Available	2
One-shot Entropy Minimization	May 26, 2025	reinforcement-learning, Reinforcement Learning	Code Available	2
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration	May 26, 2025	Domain Generalization, Hallucination	Code Available	2
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression	May 26, 2025	Zero-shot Generalization	Code Available	2
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field	May 26, 2025	3D Generation, 3DGS	Code Available	2
EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification	May 26, 2025	Emotion Recognition, regression	Code Available	2
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning	May 26, 2025	Decision Making, Hierarchical Reinforcement Learning	Code Available	2
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models	May 26, 2025		Code Available	2
The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project	May 26, 2025		Code Available	2
A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions	May 26, 2025	Speech Enhancement	Code Available	2
SAEs Are Good for Steering -- If You Select the Right Features	May 26, 2025		Code Available	2
CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features	May 26, 2025		Code Available	2
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects	May 26, 2025	Autonomous Driving, Logical Reasoning	Code Available	2
Training-Free Multi-Step Audio Source Separation	May 26, 2025	Audio Source Separation, Denoising	Code Available	2
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching	May 26, 2025	Quantization, Speech Enhancement	Code Available	2
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability	May 26, 2025	Multi-hop Question Answering, Question Answering	Code Available	2
DiSA: Diffusion Step Annealing in Autoregressive Image Generation	May 26, 2025	Denoising, Image Generation	Code Available	2
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities	May 26, 2025	Knowledge Graphs, Natural Language Understanding	Code Available	2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision	May 26, 2025	Math, Problem Decomposition	Code Available	2
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond	May 26, 2025	Logical Reasoning, Reinforcement Learning (RL)	Code Available	2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment	May 26, 2025	text-to-speech, Text to Speech	Code Available	2
The Missing Point in Vision Transformers for Universal Image Segmentation	May 26, 2025	Image Segmentation, Instance Segmentation	Code Available	2
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding	May 26, 2025	Keyword Spotting	Code Available	2
Jodi: Unification of Visual Generation and Understanding via Joint Modeling	May 25, 2025		Code Available	2
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems	May 25, 2025		Code Available	2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts	May 25, 2025	Mixture-of-Experts, multimodal interaction	Code Available	2
Benchmarking Laparoscopic Surgical Image Restoration and Beyond	May 25, 2025	Benchmarking, Image Restoration	Code Available	2
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use	May 25, 2025	Multimodal Reasoning, Question Answering	Code Available	2
VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes	May 25, 2025	3DGS	Code Available	2
Shifting AI Efficiency From Model-Centric to Data-Centric Compression	May 25, 2025	Position	Code Available	2
Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility	May 24, 2025	Denoising	Code Available	2
LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS	May 24, 2025		Code Available	2
Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey	May 24, 2025	Graph Learning	Code Available	2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions	May 24, 2025	Benchmarking	Code Available	2
Spiking Transformers Need High Frequency Information	May 24, 2025	Avg	Code Available	2
Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains	May 24, 2025	Computational Efficiency, Operator learning	Code Available	2
VeriThinker: Learning to Verify Makes Reasoning Model Efficient	May 23, 2025	model	Code Available	2
Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data	May 23, 2025	Knowledge Graphs, Management	Code Available	2
MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization	May 23, 2025	Meta-Learning	Code Available	2
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback	May 23, 2025		Code Available	2
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding	May 23, 2025	Language Modeling, Language Modelling	Code Available	2