SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 851–900 of 659,983 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation | Code | 5 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Code | 5 |
| Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Code | 5 |
| 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion | Code | 5 |
| FuXi-2.0: Advancing machine learning weather forecasting model for practical applications | Code | 5 |
| SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | Code | 5 |
| MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model | Code | 5 |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Code | 5 |
| ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | Code | 5 |
| rerankers: A Lightweight Python Library to Unify Ranking Methods | Code | 5 |
| WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Code | 5 |
| OmniRe: Omni Urban Scene Reconstruction | Code | 5 |
| 3D Reconstruction with Spatial Memory | Code | 5 |
| Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Code | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Code | 5 |
| Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Code | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5 |
| The Vizier Gaussian Process Bandit Algorithm | Code | 5 |
| Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Code | 5 |
| Automated Design of Agentic Systems | Code | 5 |
| RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | Code | 5 |
| LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Code | 5 |
| ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Code | 5 |
| A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? | Code | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Code | 5 |
| Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Code | 5 |
| Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Code | 5 |
| Active Learning for Neural PDE Solvers | Code | 5 |
| Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data | Code | 5 |
| MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench | Code | 5 |
| Segment Anything for Videos: A Systematic Survey | Code | 5 |
| Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Code | 5 |
| Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Code | 5 |
| CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Code | 5 |
| Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Code | 5 |
| IMAGDressing-v1: Customizable Virtual Dressing | Code | 5 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Code | 5 |
| Semantic Operators: A Declarative Model for Rich, AI-based Data Processing | Code | 5 |
| BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Code | 5 |
| GRUtopia: Dream General Robots in a City at Scale | Code | 5 |
| Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | Code | 5 |
| OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Code | 5 |
| Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI | Code | 5 |
| TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Code | 5 |
| Fast On-device LLM Inference with NPUs | Code | 5 |
| Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning | Code | 5 |
| Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Code | 5 |
| BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Code | 5 |
| Fake News Detection: It's All in the Data! | Code | 5 |
Page 18 of 13,200