SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3151-3175 of 177340 papers

Title | Status | Hype
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation | Code | 3
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | Code | 3
AgentTuning: Enabling Generalized Agent Abilities for LLMs | Code | 3
Hawk: Learning to Understand Open-World Video Anomalies | Code | 3
PhoWhisper: Automatic Speech Recognition for Vietnamese | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model | Code | 3
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Code | 3
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | Code | 3
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs | Code | 3
DRCT: Saving Image Super-resolution away from Information Bottleneck | Code | 3
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains | Code | 3
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Code | 3
Emu3: Next-Token Prediction is All You Need | Code | 3
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Code | 3
MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Code | 3
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | Code | 3
NerfAcc: A General NeRF Acceleration Toolbox | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
Datasets: A Community Library for Natural Language Processing | Code | 3
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction | Code | 3
ResNeSt: Split-Attention Networks | Code | 3
MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer | Code | 3
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Code | 3
StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs | Code | 3