SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 14751-14800 of 474278 papers

Title | Status | Hype
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning | | 1
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling | | 1
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent | | 1
Block-Recurrent Dynamics in Vision Transformers | | 1
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models | | 1
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry | | 1
How to Take a Memorable Picture? Empowering Users with Actionable Feedback | | 1
CooperBench: Why Coding Agents Cannot be Your Teammates Yet | | 1
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation | | 1
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model | | 1
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers | | 1
Next Embedding Prediction Makes World Models Stronger | | 1
Scale Space Diffusion | | 1
Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability | | 1
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts | | 1
Monocular Normal Estimation via Shading Sequence Estimation | | 1
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning | | 1
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads | | 1
Mixture of Style Experts for Diverse Image Stylization | | 1
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking | | 1
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development | | 1
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models | | 1
Factuality Matters: When Image Generation and Editing Meet Structured Visuals | | 1
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models | | 1
HalluHard: A Hard Multi-Turn Hallucination Benchmark | | 1
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors | | 1
TADA! Tuning Audio Diffusion Models through Activation Steering | | 1
Coarse-Guided Visual Generation via Weighted h-Transform Sampling | | 1
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation | | 1
HoneyBee: Data Recipes for Vision-Language Reasoners | | 1
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier | | 1
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems | | 1
MIND-V: Hierarchical World Model for Long-Horizon Robotic Manipulation with RL-based Physical Alignment | | 1
Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction | | 1
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors | | 1
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs | | 1
Language Models are Injective and Hence Invertible | | 1
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback | | 1
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval | | 1
DeepSight: An All-in-One LM Safety Toolkit | | 1
Privileged Information Distillation for Language Models | | 1
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting | | 1
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs | | 1
Enhancing Multi-Image Understanding through Delimiter Token Scaling | | 1
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models | | 1
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) | | 1
UniVBench: Towards Unified Evaluation for Video Foundation Models | | 1
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data | | 1
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos | | 1
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling | | 1
Page 296 of 9486