SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 10,001–10,025 of 474,278 papers

Title | Status | Hype
Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers | Code | 2
Debating with More Persuasive LLMs Leads to More Truthful Answers | Code | 2
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following | Code | 2
CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes | Code | 2
On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference | Code | 2
DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Code | 2
Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers | Code | 2
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | Code | 2
Scalable Diffusion Models with State Space Backbone | Code | 2
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models | Code | 2
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention | Code | 2
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion | Code | 2
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Code | 2
PLAPT: Protein-Ligand Binding Affinity Prediction Using Pretrained Transformers | Code | 2
Let Your Graph Do the Talking: Encoding Structured Data for LLMs | Code | 2
Time Series Diffusion in the Frequency Domain | Code | 2
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation | Code | 2
Dirichlet Flow Matching with Applications to DNA Sequence Design | Code | 2
Learning to Route Among Specialized Experts for Zero-Shot Generalization | Code | 2
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Code | 2
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs | Code | 2
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior | Code | 2
Closing the Gap Between SGP4 and High-Precision Propagation via Differentiable Programming | Code | 2
A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents | Code | 2
Can Large Language Model Agents Simulate Human Trust Behavior? | Code | 2