SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4801–4850 of 661570 papers

Title | Status | Hype
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings | | 2
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation | | 2
XSkill: Continual Learning from Experience and Skills in Multimodal Agents | | 2
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation | | 2
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse | | 2
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams | | 2
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models | | 2
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training | | 2
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices | | 2
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct | | 2
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation | | 2
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention | | 2
Streaming Autoregressive Video Generation via Diagonal Distillation | | 2
LLM2Vec-Gen: Generative Embeddings from Large Language Models | | 2
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data | | 2
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA | | 2
Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale | | 2
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports | | 2
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising | | 2
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding | | 2
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning | | 2
WildActor: Unconstrained Identity-Preserving Video Generation | | 2
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning | | 2
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion | | 2
Physical Simulator In-the-Loop Video Generation | | 2
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs | | 2
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation | | 2
MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing | | 2
From Word to World: Can Large Language Models be Implicit Text-based World Models? | | 2
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs | | 2
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator | | 2
Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels | | 2
RealWonder: Real-Time Physical Action-Conditioned Video Generation | | 2
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding | | 2
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs | | 2
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation | | 2
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding | | 2
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video | | 2
Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models | | 2
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies | | 2
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents | | 2
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model | | 2
Phi-4-reasoning-vision-15B Technical Report | | 2
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling | | 2
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos | | 2
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle | | 2
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation | | 2
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization | | 2
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing | | 2
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images | | 2
Page 97 of 13232