SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 10676-10700 of 177340 papers

Title | Status | Hype
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs | Code | 2
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding | Code | 2
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Code | 2
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly | Code | 2
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques | Code | 2
Scalable Spatiotemporal Prediction with Bayesian Neural Fields | Code | 2
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics | Code | 2
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network | Code | 2
Volumetric Environment Representation for Vision-Language Navigation | Code | 2
CoverUp: Effective High Coverage Test Generation for Python | Code | 2
MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion | Code | 2
FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining | Code | 2
Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond | Code | 2
Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration | Code | 2
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Code | 2
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices | Code | 2
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code | Code | 2
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | Code | 2
UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training | Code | 2
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model | Code | 2
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Code | 2
VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation | Code | 2
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Code | 2
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | Code | 2