SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 6801-6825 of 474,278 papers

Title | Status | Hype
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Code | 2
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality | Code | 2
Open-Vocabulary Online Semantic Mapping for SLAM | Code | 2
AnyText2: Visual Text Generation and Editing With Customizable Attributes | Code | 2
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Code | 2
Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data | Code | 2
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Code | 2
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Code | 2
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Code | 2
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Code | 2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2
Natural Language Reinforcement Learning | Code | 2
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Code | 2
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Code | 2
CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Code | 2
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Code | 2
FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs | Code | 2
Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks | Code | 2
Quantized symbolic time series approximation | Code | 2
Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Code | 2
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Code | 2
Find Any Part in 3D | Code | 2
SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System | Code | 2
Practical Compact Deep Compressed Sensing | Code | 2
Page 273 of 18,972