SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4026-4050 of 661,570 papers

Title | Status | Hype
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Code | 3
RoHM: Robust Human Motion Reconstruction via Diffusion | Code | 3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | Code | 3
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Code | 3
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Code | 3
GroundingGPT:Language Enhanced Multi-modal Grounding Model | Code | 3
Deep learning in motion deblurring: current status, benchmarks and future prospects | Code | 3
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | Code | 3
Improved motif-scaffolding with SE(3) flow matching | Code | 3
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Code | 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Code | 3
Denoising Vision Transformers | Code | 3
Pheme: Efficient and Conversational Speech Generation | Code | 3
The Rise of Diffusion Models in Time-Series Forecasting | Code | 3
Text2MDT: Extracting Medical Decision Trees from Medical Texts | Code | 3
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection | Code | 3
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Code | 3
Page 162 of 26463