The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 6,101–6,150 of 661,570 papers

Title	Date	Tasks	Status	Hype
Rethinking Diverse Human Preference Learning through Principal Component Analysis	Feb 18, 2025		Code Available	2
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning	Feb 18, 2025	Math	Code Available	2
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking	Feb 18, 2025		Code Available	2
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors	Feb 18, 2025	Code Generation, Knowledge Tracing	Code Available	2
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Feb 18, 2025	Image Retrieval, Question Answering	Code Available	2
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects	Feb 18, 2025	Machine Translation	Code Available	2
UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design	Feb 18, 2025	Language Modeling, Language Modelling	Code Available	2
A Survey of Personalized Large Language Models: Progress and Future Directions	Feb 17, 2025	Emotion Recognition, General Knowledge	Code Available	2
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Feb 17, 2025	parameter-efficient fine-tuning	Code Available	2
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Feb 17, 2025		Code Available	2
Continuous Diffusion Model for Language Modeling	Feb 17, 2025	Language Modeling, Language Modelling	Code Available	2
PUGS: Zero-shot Physical Understanding with Gaussian Splatting	Feb 17, 2025	Friction	Code Available	2
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL	Feb 17, 2025	Few-Shot Learning, Heuristic Search	Code Available	2
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages	Feb 17, 2025	Emotion Recognition	Code Available	2
JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs	Feb 17, 2025	Imputation, In-Context Learning	Code Available	2
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration	Feb 17, 2025		Code Available	2
Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization	Feb 17, 2025	Computational Efficiency, Contrastive Learning	Code Available	2
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More	Feb 17, 2025		Code Available	2
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment	Feb 17, 2025	Hallucination, Logical Reasoning	Code Available	2
Idiosyncrasies in Large Language Models	Feb 17, 2025		Code Available	2
Diffusion Models without Classifier-free Guidance	Feb 17, 2025	Conditional Image Generation, Image Generation	Code Available	2
LLM Agents Making Agent Tools	Feb 17, 2025		Code Available	2
X-IL: Exploring the Design Space of Imitation Learning Policies	Feb 17, 2025	Imitation Learning, Mamba	Code Available	2
Image Inversion: A Survey from GANs to Diffusion and Beyond	Feb 17, 2025	Generative Adversarial Network, Style Transfer	Code Available	2
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening	Feb 17, 2025	Denoising	Code Available	2
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems	Feb 16, 2025	Open-Domain Question Answering, Question Answering	Code Available	2
FinMTEB: Finance Massive Text Embedding Benchmark	Feb 16, 2025	Articles, Semantic Textual Similarity	Code Available	2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM	Feb 16, 2025	Navigate, RAG	Code Available	2
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training	Feb 16, 2025		Code Available	2
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time	Feb 16, 2025	Decision Making, Language Modeling	Code Available	2
MasRouter: Learning to Route LLMs for Multi-Agent Systems	Feb 16, 2025	HumanEval, mbpp	Code Available	2
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation	Feb 16, 2025	graph construction, Knowledge Graphs	Code Available	2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security	Feb 15, 2025	Task Planning	Code Available	2
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding	Feb 15, 2025	Question Answering, Streaming video understanding	Code Available	2
Process Reward Models for LLM Agents: Practical Framework and Directions	Feb 14, 2025		Code Available	2
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations	Feb 14, 2025	Survey	Code Available	2
MonoForce: Learnable Image-conditioned Physics Engine	Feb 14, 2025	Model Predictive Control, Trajectory Prediction	Code Available	2
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal	Feb 14, 2025	Denoising, Image Restoration	Code Available	2
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning	Feb 14, 2025	Reinforcement Learning (RL), Skills Assessment	Code Available	2
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References	Feb 13, 2025	Human-Object Interaction Detection, Imitation Learning	Code Available	2
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles	Feb 13, 2025		Code Available	2
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra	Feb 13, 2025	Decoder, De novo molecule generation from MS/MS spectrum (bonus chemical formulae)	Code Available	2
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents	Feb 13, 2025	Q-Learning, Reinforcement Learning (RL)	Code Available	2
Diffusion Models for Molecules: A Survey of Methods and Tasks	Feb 13, 2025	Diversity, Drug Discovery	Code Available	2
A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis	Feb 13, 2025	Text Generation	Code Available	2
CoT-Valve: Length-Compressible Chain-of-Thought Tuning	Feb 13, 2025	GSM8K	Code Available	2
Harnessing Vision Models for Time Series Analysis: A Survey	Feb 13, 2025	Survey, Time Series	Code Available	2
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG	Feb 13, 2025	Knowledge Graphs, Large Language Model	Code Available	2
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument	Feb 13, 2025	Audio Generation, Decoder	Code Available	2
Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence	Feb 13, 2025	Graph Classification, Graph Property Prediction	Code Available	2