SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5651-5675 of 474,278 papers

Title | Status | Hype
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment | Code | 2
LARGE: Legal Retrieval Augmented Generation Evaluation Tool | Code | 2
shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python | Code | 2
AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge | Code | 2
Scene-Centric Unsupervised Panoptic Segmentation | Code | 2
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits | Code | 2
Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection | Code | 2
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Code | 2
CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models | Code | 2
A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities | Code | 2
Z1: Efficient Test-time Scaling with Code | Code | 2
Learned Image Compression with Dictionary-based Entropy Model | Code | 2
OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery | Code | 2
Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement | Code | 2
Training-Free Text-Guided Image Editing with Visual Autoregressive Model | Code | 2
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? | Code | 2
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | Code | 2
SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency | Code | 2
Force-Free Molecular Dynamics Through Autoregressive Equivariant Networks | Code | 2
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space | Code | 2
THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models | Code | 2
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Code | 2
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs | Code | 2
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | Code | 2