SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3926-3950 of 177340 papers

Title | Status | Hype
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Code | 3
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation | Code | 3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection | Code | 3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Code | 3
ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Code | 3
A new face swap method for image and video domains: a technical report | Code | 3
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Code | 3
Reinforcement Learning Enhanced LLMs: A Survey | Code | 3
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Code | 3
RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Code | 3
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Code | 3
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt | Code | 3
An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Code | 3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | Code | 3
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models | Code | 3
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning | Code | 3
From Panels to Prose: Generating Literary Narratives from Comics | Code | 3
TorchCP: A Python Library for Conformal Prediction | Code | 3
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Code | 3
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Code | 3
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | Code | 3
X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design | Code | 3
Segment Any Medical Model Extended | Code | 3
An Image is Worth 32 Tokens for Reconstruction and Generation | Code | 3
Page 158 of 7094