SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9151–9175 of 177340 papers

Title | Status | Hype
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models | Code | 2
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | Code | 2
Ranked Entropy Minimization for Continual Test-Time Adaptation | Code | 2
Training Long-Context LLMs Efficiently via Chunk-wise Optimization | Code | 2
Training-Free Multi-Step Audio Source Separation | Code | 2
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field | Code | 2
HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Code | 2
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | Code | 2
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | Code | 2
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | Code | 2
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Code | 2
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes | Code | 2
Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
ORV: 4D Occupancy-centric Robot Video Generation | Code | 2
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Code | 2
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Code | 2
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries | Code | 2
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting | Code | 2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
Language Modeling by Language Models | Code | 2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Code | 2
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | Code | 2