SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 14601–14650 of 474,278 papers

Title | Status | Hype
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper | | 1
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference | | 1
ResearchGym: Evaluating Language Model Agents on Real-World AI Research | | 1
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents | | 1
World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty | | 1
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition | | 1
AlphaApollo: A System for Deep Agentic Reasoning | | 1
TinyNav: End-to-End TinyML for Real-Time Autonomous Navigation on Microcontrollers | | 1
Reward Prediction with Factorized World States | | 1
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors | | 1
Video-Based Reward Modeling for Computer-Use Agents | | 1
Monocular Normal Estimation via Shading Sequence Estimation | | 1
Can Vision-Language Models Solve the Shell Game? | | 1
Scale Space Diffusion | | 1
TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size | | 1
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer | | 1
Modular Neural Image Signal Processing | | 1
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use | | 1
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning? | | 1
CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing | | 1
RedSage: A Cybersecurity Generalist LLM | | 1
In-Context Reinforcement Learning for Tool Use in Large Language Models | | 1
LatentMem: Customizing Latent Memory for Multi-Agent Systems | | 1
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach | | 1
$OneMillion-Bench: How Far are Language Agents from Human Experts? | | 1
π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs | | 1
NaviDriveVLM: Decoupling High-Level Reasoning and Motion Planning for Autonomous Driving | | 1
HiconAgent: History Context-aware Policy Optimization for GUI Agents | | 1
TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events | | 1
Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning | | 1
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving | | 1
CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization | | 1
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction | | 1
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference | | 1
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling | | 1
UniVBench: Towards Unified Evaluation for Video Foundation Models | | 1
U6G XL-MIMO Radiomap Prediction: Multi-Config Dataset and Beam Map Approach | | 1
CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion | | 1
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | | 1
-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space | | 1
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | | 1
KLASS: KL-Guided Fast Inference in Masked Diffusion Models | | 1
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval | | 1
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs | | 1
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline | | 1
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL | | 1
LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery | | 1
Factuality Matters: When Image Generation and Editing Meet Structured Visuals | | 1
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems | | 1
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors | | 1
Page 293 of 9486