SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 4951-5000 of 661570 papers

Title | Status | Hype
A Survey on Efficient Vision-Language-Action Models | - | 2
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning | - | 2
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests | - | 2
On the Design of One-step Diffusion via Shortcutting Flow Paths | - | 2
Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling | - | 2
Residual Context Diffusion Language Models | - | 2
Shaping capabilities with token-level data filtering | - | 2
Exploring Reasoning Reward Model for Agents | - | 2
Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving | - | 2
Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models | - | 2
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation | - | 2
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models | - | 2
Efficient Autoregressive Video Diffusion with Dummy Head | - | 2
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models | - | 2
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning | - | 2
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing | - | 2
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding | - | 2
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery | - | 2
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision | - | 2
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models | - | 2
daVinci-Dev: Agent-native Mid-training for Software Engineering | - | 2
Towards Pixel-Level VLM Perception via Simple Points Prediction | - | 2
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs | - | 2
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism | - | 2
DeFM: Learning Foundation Representations from Depth for Robotics | - | 2
Self-Refining Video Sampling | - | 2
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding | - | 2
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control | - | 2
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning | - | 2
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction | - | 2
Q-learning with Adjoint Matching | - | 2
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding | - | 2
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model | - | 2
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents | - | 2
Boosting Generative Image Modeling via Joint Image-Feature Synthesis | - | 2
GutenOCR: A Grounded Vision-Language Front-End for Documents | - | 2
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks | - | 2
BPMN Assistant: An LLM-Based Approach to Business Process Modeling | - | 2
Rethinking Video Generation Model for the Embodied World | - | 2
Adaptive Multi-Agent Reasoning via Automated Workflow Generation | Code | 2
SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Code | 2
CharaConsist: Fine-Grained Consistent Character Generation | Code | 2
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation | Code | 2
Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2
I^2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | Code | 2
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards | Code | 2
Page 100 of 13232