SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8001-8025 of 177,340 papers

Title | Status | Hype
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | Code | 2
Evaluating Quantized Large Language Models | Code | 2
Edu-ConvoKit: An Open-Source Library for Education Conversation Data | Code | 2
Calibrated Self-Rewarding Vision Language Models | Code | 2
PERT: Pre-training BERT with Permuted Language Model | Code | 2
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement | Code | 2
Training Diffusion Models with Reinforcement Learning | Code | 2
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction | Code | 2
All in One: Exploring Unified Video-Language Pre-training | Code | 2
A Survey on Multimodal Large Language Models for Autonomous Driving | Code | 2
Towards A Unified Conformer Structure: from ASR to ASV Task | Code | 2
DocPrompting: Generating Code by Retrieving the Docs | Code | 2
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation | Code | 2
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | Code | 2
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives | Code | 2
Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models | Code | 2
TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs | Code | 2
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs | Code | 2
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study | Code | 2
REEF: Representation Encoding Fingerprints for Large Language Models | Code | 2
Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Code | 2
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | Code | 2
Large language models surpass human experts in predicting neuroscience results | Code | 2
Owl-1: Omni World Model for Consistent Long Video Generation | Code | 2
Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment | Code | 2