SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5651-5700 of 661,570 papers

Title | Status | Hype
LARGE: Legal Retrieval Augmented Generation Evaluation Tool | Code | 2
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment | Code | 2
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits | Code | 2
Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction | Code | 2
An Illusion of Progress? Assessing the Current State of Web Agents | Code | 2
shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python | Code | 2
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Code | 2
Z1: Efficient Test-time Scaling with Code | Code | 2
Learned Image Compression with Dictionary-based Entropy Model | Code | 2
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities | Code | 2
Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection | Code | 2
CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models | Code | 2
OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery | Code | 2
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space | Code | 2
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs | Code | 2
Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation | Code | 2
Force-Free Molecular Dynamics Through Autoregressive Equivariant Networks | Code | 2
Training-Free Text-Guided Image Editing with Visual Autoregressive Model | Code | 2
SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency | Code | 2
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Code | 2
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Code | 2
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? | Code | 2
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Code | 2
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | Code | 2
THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models | Code | 2
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Code | 2
Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement | Code | 2
Optimal Invariant Bases for Atomistic Machine Learning | Code | 2
RARE: Retrieval-Augmented Reasoning Modeling | Code | 2
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Code | 2
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Code | 2
Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks | Code | 2
OncoReg: Medical Image Registration for Oncological Challenges | Code | 2
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D | Code | 2
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection | Code | 2
FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Code | 2
Unicorn: Text-Only Data Synthesis for Vision Language Model Training | Code | 2
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models | Code | 2
Learning to Reason for Long-Form Story Generation | Code | 2
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Code | 2
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation | Code | 2
A Unified Image-Dense Annotation Generation Model for Underwater Scenes | Code | 2
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Code | 2
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning | Code | 2
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Code | 2
Datasets for Depression Modeling in Social Media: An Overview | Code | 2
Page 114 of 13232