SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 15001-15050 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | | 0 |
| Event-Driven Online Vertical Federated Learning | | 0 |
| Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching | | 0 |
| CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC | | 0 |
| FEAST: A Flexible Mealtime-Assistance System Towards In-the-Wild Personalization | | 0 |
| OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents | Code | 2 |
| A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments | | 0 |
| Efficient Serving of LLM Applications with Probabilistic Demand Modeling | | 0 |
| LLM Jailbreak Oracle | | 0 |
| Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis | | 0 |
| Preparing for the Intelligence Explosion | | 0 |
| Towards Perception-based Collision Avoidance for UAVs when Guiding the Visually Impaired | | 0 |
| Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection | Code | 0 |
| Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models | Code | 1 |
| Lightweight Relevance Grader in RAG | Code | 0 |
| SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification | Code | 0 |
| M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models | | 0 |
| From Bytes to Ideas: Language Modeling with Autoregressive U-Nets | Code | 7 |
| LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | Code | 2 |
| DreamLight: Towards Harmonious and Consistent Image Relighting | | 0 |
| I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | | 0 |
| A Variational Information Theoretic Approach to Out-of-Distribution Detection | | 0 |
| Less is More: Undertraining Experts Improves Model Upcycling | | 0 |
| A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs | Code | 0 |
| MAS-LitEval : Multi-Agent System for Literary Translation Quality Assessment | | 0 |
| Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution | | 0 |
| DDS-NAS: Dynamic Data Selection within Neural Architecture Search via On-line Hard Example Mining applied to Image Classification | | 0 |
| Exploring Diffusion with Test-Time Training on Efficient Image Restoration | | 0 |
| Align Your Flow: Scaling Continuous-Time Flow Map Distillation | | 0 |
| Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models | | 0 |
| Active InSAR monitoring of building damage in Gaza during the Israel-Hamas War | | 0 |
| Cost-Aware Routing for Efficient Text-To-Image Generation | | 0 |
| orGAN: A Synthetic Data Augmentation Pipeline for Simultaneous Generation of Surgical Images and Ground Truth Labels | | 0 |
| BRISC: Annotated Dataset for Brain Tumor Segmentation and Classification with Swin-HAFNet | | 0 |
| Compressed Video Super-Resolution based on Hierarchical Encoding | | 0 |
| Enclosing Prototypical Variational Autoencoder for Explainable Out-of-Distribution Detection | | 0 |
| A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | | 0 |
| Towards Reliable WMH Segmentation under Domain Shift: An Application Study using Maximum Entropy Regularization to Improve Uncertainty Estimation | | 0 |
| Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning | | 0 |
| Plug-and-Play with 2.5D Artifact Reduction Prior for Fast and Accurate Industrial Computed Tomography Reconstruction | | 0 |
| Explainable Detection of Implicit Influential Patterns in Conversations via Data Augmentation | | 0 |
| Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent | | 0 |
| Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality | | 0 |
| Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers | | 0 |
| Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | | 0 |
| Reasoning with Exploration: An Entropy Perspective | | 0 |
| FormGym: Doing Paperwork with Agents | | 0 |
| ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems | | 0 |
| AviationLLM: An LLM-based Knowledge System for Aviation Training | | 0 |
| Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning | | 0 |
Page 301 of 9486