SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 751–775 of 177,339 papers

Title | Status | Hype
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Code | 5
MMBench: Is Your Multi-modal Model an All-around Player? | Code | 5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Code | 5
Retrieval-Augmented Generation for AI-Generated Content: A Survey | Code | 5
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Code | 5
Improved Distribution Matching Distillation for Fast Image Synthesis | Code | 5
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Code | 5
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Code | 5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Code | 5
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Code | 5
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Code | 5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
Diffusion for World Modeling: Visual Details Matter in Atari | Code | 5
Flashlight: Enabling Innovation in Tools for Machine Learning | Code | 5
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | Code | 5
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Code | 5
BootsTAP: Bootstrapped Training for Tracking-Any-Point | Code | 5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | Code | 5
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Code | 5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Code | 5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Code | 5
Meta-World+: An Improved, Standardized, RL Benchmark | Code | 5
MONAI: An open-source framework for deep learning in healthcare | Code | 5