SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2,876-2,900 of 661,570 papers

Title | Status | Hype
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents | Code | 3
MEMORYLLM: Towards Self-Updatable Large Language Models | Code | 3
BatchTopK Sparse Autoencoders | Code | 3
On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Code | 3
Large Language Models Are Human-Level Prompt Engineers | Code | 3
Zero-Shot Text-to-Image Generation | Code | 3
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Code | 3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Code | 3
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric | Code | 3
Cross-Modal Causal Intervention for Medical Report Generation | Code | 3
Evaluating Large Language Models for Radiology Natural Language Processing | Code | 3
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Code | 3
Neuron-Level Sequential Editing for Large Language Models | Code | 3
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas | Code | 3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Code | 3
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Code | 3
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Code | 3
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | Code | 3
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Code | 3
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Code | 3
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving | Code | 3
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba | Code | 3
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Code | 3
Accelerating Diffusion Transformers with Dual Feature Caching | Code | 3