SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6101-6125 of 474,278 papers

Title | Status | Hype
Rethinking Diverse Human Preference Learning through Principal Component Analysis | Code | 2
UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design | Code | 2
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking | Code | 2
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects | Code | 2
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation | Code | 2
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection | Code | 2
Electron flow matching for generative reaction mechanism prediction obeying conservation laws | Code | 2
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages | Code | 2
X-IL: Exploring the Design Space of Imitation Learning Policies | Code | 2
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening | Code | 2
Image Inversion: A Survey from GANs to Diffusion and Beyond | Code | 2
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL | Code | 2
Idiosyncrasies in Large Language Models | Code | 2
Diffusion Models without Classifier-free Guidance | Code | 2
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | Code | 2
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Code | 2
JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs | Code | 2
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | Code | 2
LLM Agents Making Agent Tools | Code | 2
PUGS: Zero-shot Physical Understanding with Gaussian Splatting | Code | 2
A Survey of Personalized Large Language Models: Progress and Future Directions | Code | 2
Continuous Diffusion Model for Language Modeling | Code | 2
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment | Code | 2
Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization | Code | 2
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More | Code | 2
Page 245 of 18972