SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5,851–5,900 of 661,570 papers

Title | Status | Hype
VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Code | 2
3D Student Splatting and Scooping | Code | 2
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Code | 2
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Code | 2
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing | Code | 2
Multi-Modal Mamba Modeling for Survival Prediction (M4Survive): Adapting Joint Foundation Model Representations | Code | 2
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness | Code | 2
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Code | 2
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Code | 2
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 | Code | 2
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection | Code | 2
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Code | 2
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model | Code | 2
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors | Code | 2
Autoregressive Image Generation with Randomized Parallel Decoding | Code | 2
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Code | 2
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Code | 2
Neighboring Autoregressive Modeling for Efficient Visual Generation | Code | 2
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space | Code | 2
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Code | 2
Teaching LMMs for Image Quality Scoring and Interpreting | Code | 2
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Code | 2
Manify: A Python Library for Learning Non-Euclidean Representations | Code | 2
Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey | Code | 2
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Code | 2
KNighter: Transforming Static Analysis with LLM-Synthesized Checkers | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Code | 2
External Knowledge Injection for CLIP-Based Class-Incremental Learning | Code | 2
Mellow: a small audio language model for reasoning | Code | 2
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Code | 2
MMRL: Multi-Modal Representation Learning for Vision-Language Models | Code | 2
Referring to Any Person | Code | 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Code | 2
"Principal Components" Enable A New Language of Images | Code | 2
A Neural Symbolic Model for Space Physics | Code | 2
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
V-Max: A Reinforcement Learning Framework for Autonomous Driving | Code | 2
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder | Code | 2
Parametric Point Cloud Completion for Polygonal Surface Reconstruction | Code | 2
YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion | Code | 2
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning | Code | 2
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs | Code | 2
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Code | 2
Page 118 of 13,232