SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5101-5125 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders | Code | 2 |
| CGVQM+D: Computer Graphics Video Quality Metric and Dataset | Code | 2 |
| Statistical Machine Learning for Astronomy -- A Textbook | Code | 2 |
| CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation | Code | 2 |
| VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Code | 2 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2 |
| Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Code | 2 |
| Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Code | 2 |
| ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Code | 2 |
| GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2 |
| ConTextTab: A Semantics-Aware Tabular In-Context Learner | Code | 2 |
| SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks | Code | 2 |
| Execution Guided Line-by-Line Code Generation | Code | 2 |
| AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | Code | 2 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2 |
| OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems | Code | 2 |
| QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction | Code | 2 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Code | 2 |
| SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending | Code | 2 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Code | 2 |
| CoRT: Code-integrated Reasoning within Thinking | Code | 2 |
| CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2 |
| TaskCraft: Automated Generation of Agentic Tasks | Code | 2 |
| ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Code | 2 |
| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2 |
Page 205 of 26,463