The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6,126–6,150 of 474,278 papers

Title	Date	Tasks	Status	Hype
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems	Feb 16, 2025	Open-Domain Question Answering, Question Answering	Code Available	2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM	Feb 16, 2025	Navigate, RAG	Code Available	2
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training	Feb 16, 2025		Code Available	2
FinMTEB: Finance Massive Text Embedding Benchmark	Feb 16, 2025	Articles, Semantic Textual Similarity	Code Available	2
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation	Feb 16, 2025	graph construction, Knowledge Graphs	Code Available	2
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time	Feb 16, 2025	Decision Making, Language Modeling	Code Available	2
MasRouter: Learning to Route LLMs for Multi-Agent Systems	Feb 16, 2025	HumanEval, mbpp	Code Available	2
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding	Feb 15, 2025	Question Answering, Streaming video understanding	Code Available	2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security	Feb 15, 2025	Task Planning	Code Available	2
MonoForce: Learnable Image-conditioned Physics Engine	Feb 14, 2025	Model Predictive Control, Trajectory Prediction	Code Available	2
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations	Feb 14, 2025	Survey	Code Available	2
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning	Feb 14, 2025	Reinforcement Learning (RL), Skills Assessment	Code Available	2
Process Reward Models for LLM Agents: Practical Framework and Directions	Feb 14, 2025		Code Available	2
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal	Feb 14, 2025	Denoising, Image Restoration	Code Available	2
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles	Feb 13, 2025		Code Available	2
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG	Feb 13, 2025	Knowledge Graphs, Large Language Model	Code Available	2
Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence	Feb 13, 2025	Graph Classification, Graph Property Prediction	Code Available	2
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra	Feb 13, 2025	Decoder, De novo molecule generation from MS/MS spectrum (bonus chemical formulae)	Code Available	2
Diffusion Models for Molecules: A Survey of Methods and Tasks	Feb 13, 2025	Diversity, Drug Discovery	Code Available	2
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument	Feb 13, 2025	Audio Generation, Decoder	Code Available	2
A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis	Feb 13, 2025	Text Generation	Code Available	2
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents	Feb 13, 2025	Q-Learning, Reinforcement Learning (RL)	Code Available	2
Harnessing Vision Models for Time Series Analysis: A Survey	Feb 13, 2025	Survey, Time Series	Code Available	2
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References	Feb 13, 2025	Human-Object Interaction Detection, Imitation Learning	Code Available	2
CoT-Valve: Length-Compressible Chain-of-Thought Tuning	Feb 13, 2025	GSM8K	Code Available	2