SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7051–7100 of 661570 papers

Title | Status | Hype
Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation | Code | 2
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry | Code | 2
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Relational Graph Transformer | Code | 2
AdaptThink: Reasoning Models Can Learn When to Think | Code | 2
AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection | Code | 2
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models | Code | 2
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | Code | 2
Ranked Entropy Minimization for Continual Test-Time Adaptation | Code | 2
Training Long-Context LLMs Efficiently via Chunk-wise Optimization | Code | 2
Training-Free Multi-Step Audio Source Separation | Code | 2
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2
WeatherEdit: Controllable Weather Editing with 4D Gaussian Field | Code | 2
HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Code | 2
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | Code | 2
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | Code | 2
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | Code | 2
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Code | 2
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes | Code | 2
Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
ORV: 4D Occupancy-centric Robot Video Generation | Code | 2
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better | Code | 2
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Code | 2
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20th century Urban Landscapes with Satellite Imageries | Code | 2
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting | Code | 2
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Code | 2
Language Modeling by Language Models | Code | 2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Code | 2
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | Code | 2
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Code | 2
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning | Code | 2
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration | Code | 2
Learning to See in the Extremely Dark | Code | 2
Closed-form Continuous-time Neural Models | Code | 2
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | Code | 2
When Language Model Meets Private Library | Code | 2
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Code | 2
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Code | 2
Visual Reinforcement Learning with Imagined Goals | Code | 2
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
The Replica Dataset: A Digital Replica of Indoor Spaces | Code | 2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
Multi-Objective Molecule Generation using Interpretable Substructures | Code | 2
Neural Network Compression Framework for fast model inference | Code | 2
Towards Backdoor Attacks and Defense in Robust Machine Learning Models | Code | 2
Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies | Code | 2
On the Planning Abilities of Large Language Models - A Critical Investigation | Code | 2