SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3251–3275 of 661,570 papers

Title | Status | Hype
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 | Code | 3
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Code | 3
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Code | 3
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Code | 3
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws | Code | 3
Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2 | Code | 3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Code | 3
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Code | 3
multiGradICON: A Foundation Model for Multimodal Medical Image Registration | Code | 3
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Code | 3
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | Code | 3
Comgra: A Tool for Analyzing and Debugging Neural Networks | Code | 3
Beat this! Accurate beat tracking without DBN postprocessing | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Code | 3
Hyper-parameter tuning for text guided image editing | Code | 3
ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget | Code | 3
Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Code | 3
Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
rLLM: Relational Table Learning with LLMs | Code | 3