SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1,176–1,200 of 177,339 papers

Title | Status | Hype
Deep Lake: a Lakehouse for Deep Learning | Code | 5
MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Code | 5
Efficient Diffusion Model for Image Restoration by Residual Shifting | Code | 5
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment | Code | 5
DUSt3R: Geometric 3D Vision Made Easy | Code | 5
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling | Code | 5
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine | Code | 5
ProPainter: Improving Propagation and Transformer for Video Inpainting | Code | 5
MedRAX: Medical Reasoning Agent for Chest X-ray | Code | 5
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Code | 5
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | Code | 5
Self-Instruct: Aligning Language Models with Self-Generated Instructions | Code | 5
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | Code | 4
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Code | 4
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications | Code | 4
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
SEED-Story: Multimodal Long Story Generation with Large Language Model | Code | 4
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | Code | 4
Otter: A Multi-Modal Model with In-Context Instruction Tuning | Code | 4
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Code | 4
Safurai 001: New Qualitative Approach for Code LLM Evaluation | Code | 4
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Code | 4
Page 48 of 7,094