SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2351-2400 of 659,983 papers

Title | Status | Hype
HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing | | 3
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | | 3
Human3R: Everyone Everywhere All at Once | | 3
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing | | 3
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction | | 3
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering | | 3
FireRed-OCR Technical Report | | 3
Latent Diffusion Model without Variational Autoencoder | | 3
RLP: Reinforcement as a Pretraining Objective | | 3
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision | | 3
GEM: A Gym for Agentic LLMs | | 3
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence | | 3
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution | | 3
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding | | 3
EO-1: An Open Unified Embodied Foundation Model for General Robot Control | | 3
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering | | 3
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? | | 3
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control | | 3
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | | 3
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation | | 3
PartUV: Part-Based UV Unwrapping of 3D Meshes | | 3
AnyUp: Universal Feature Upsampling | | 3
LLaDA2.1: Speeding Up Text Diffusion via Token Editing | | 3
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing | | 3
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution | | 3
LLM-in-Sandbox Elicits General Agentic Intelligence | | 3
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation | | 3
SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes | | 3
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | | 3
Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks | | 3
Simulating the Visual World with Artificial Intelligence: A Roadmap | | 3
Scaling Multiagent Systems with Process Rewards | | 3
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents | | 3
HY3D-Bench: Generation of 3D Assets | | 3
CL-bench: A Benchmark for Context Learning | | 3
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents | | 3
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars | | 3
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling | | 3
A Survey of Token Compression for Efficient Multimodal Large Language Models | | 3
LongCat-Flash-Thinking-2601 Technical Report | | 3
MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources | | 3
Deep Delta Learning | | 3
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion | | 3
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion | | 3
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows | | 3
Self-Distillation Enables Continual Learning | | 3
Geometry-Grounded Gaussian Splatting | | 3
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency | | 3
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security | | 3
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience | | 3
Page 48 of 13200