The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6,101–6,125 of 474,278 papers

Title	Date	Tasks	Status	Hype
Rethinking Diverse Human Preference Learning through Principal Component Analysis	Feb 18, 2025		Code Available	2
UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design	Feb 18, 2025	Language Modeling, Language Modelling	Code Available	2
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking	Feb 18, 2025		Code Available	2
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects	Feb 18, 2025	Machine Translation	Code Available	2
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation	Feb 18, 2025	Image Generation, Text to Image Generation	Code Available	2
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection	Feb 18, 2025	Anomaly Detection, Information Retrieval	Code Available	2
Electron flow matching for generative reaction mechanism prediction obeying conservation laws	Feb 18, 2025	Prediction	Code Available	2
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages	Feb 17, 2025	Emotion Recognition	Code Available	2
X-IL: Exploring the Design Space of Imitation Learning Policies	Feb 17, 2025	Imitation Learning, Mamba	Code Available	2
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening	Feb 17, 2025	Denoising	Code Available	2
Image Inversion: A Survey from GANs to Diffusion and Beyond	Feb 17, 2025	Generative Adversarial Network, Style Transfer	Code Available	2
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL	Feb 17, 2025	Few-Shot Learning, Heuristic Search	Code Available	2
Idiosyncrasies in Large Language Models	Feb 17, 2025		Code Available	2
Diffusion Models without Classifier-free Guidance	Feb 17, 2025	Conditional Image Generation, Image Generation	Code Available	2
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Feb 17, 2025		Code Available	2
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Feb 17, 2025	parameter-efficient fine-tuning	Code Available	2
JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs	Feb 17, 2025	Imputation, In-Context Learning	Code Available	2
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration	Feb 17, 2025		Code Available	2
LLM Agents Making Agent Tools	Feb 17, 2025		Code Available	2
PUGS: Zero-shot Physical Understanding with Gaussian Splatting	Feb 17, 2025	Friction	Code Available	2
A Survey of Personalized Large Language Models: Progress and Future Directions	Feb 17, 2025	Emotion Recognition, General Knowledge	Code Available	2
Continuous Diffusion Model for Language Modeling	Feb 17, 2025	Language Modeling, Language Modelling	Code Available	2
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment	Feb 17, 2025	Hallucination, Logical Reasoning	Code Available	2
Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization	Feb 17, 2025	Computational Efficiency, Contrastive Learning	Code Available	2
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More	Feb 17, 2025		Code Available	2