SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2,451-2,475 of 661,570 papers

Title | Status | Hype
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models | Code | 3
EXP-Bench: Can AI Conduct AI Research Experiments? | Code | 3
MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | Code | 3
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | Code | 3
MAGREF: Masked Guidance for Any-Reference Video Generation | Code | 3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | Code | 3
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | Code | 3
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | Code | 3
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | Code | 3
NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | Code | 3
syftr: Pareto-Optimal Generative AI | Code | 3
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | Code | 3
Learning to Reason without External Rewards | Code | 3
PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | Code | 3
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation | Code | 3
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | Code | 3
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields | Code | 3
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | Code | 3
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | Code | 3
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data | Code | 3
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | Code | 3
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | Code | 3
Distilling LLM Agent into Small Models with Retrieval and Code Tools | Code | 3
OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | Code | 3
Page 99 of 26,463