SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 9,351–9,400 of 661,570 papers

Title | Status | Hype
Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety | | 0
Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction | | 0
Gated Adaptation for Continual Learning in Human Activity Recognition | | 0
Stable Multi-Drone GNSS Tracking System for Marine Robots | | 0
4DRC-OCC: Robust Semantic Occupancy Prediction Through Fusion of 4D Radar and Camera | | 0
CLAD-Net: Continual Activity Recognition in Multi-Sensor Wearable Systems | | 0
Dual Randomized Smoothing: Beyond Global Noise Variance | | 0
Robustness Verification of Graph Neural Networks Via Lightweight Satisfiability Testing | | 0
Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a $50,000 Kaggle Competition | | 0
UltraUPConvNet: A UPerNet- and ConvNeXt-Based Multi-Task Network for Ultrasound Tissue Segmentation and Disease Prediction | Code | 0
ABD: Default Exception Abduction in Finite First Order Worlds | | 0
A Lightweight MPC Bidding Framework for Brand Auction Ads | | 0
Using GPUs And LLMs Can Be Satisfying for Nonlinear Real Arithmetic Problems | | 0
QuadAI at SemEval-2026 Task 3: Ensemble Learning of Hybrid RoBERTa and LLMs for Dimensional Aspect-Based Sentiment Analysis | Code | 0
Go Beyond Your Means: Unlearning with Per-Sample Gradient Orthogonalization | | 0
CompanionCast: Toward Social Collaboration with Multi-Agent Systems in Shared Experiences | | 0
TokMem: One-Token Procedural Memory for Large Language Models | | 0
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! | | 0
Ego-Vision World Model for Humanoid Contact Planning | | 0
Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing | | 0
Meta-RL Induces Exploration in Language Agents | | 0
ReMeDI: Refined Memory for Disambiguation of Identities with SAM3 in Surgical Segmentation | | 0
Certifying the Right to Be Forgotten: Primal-Dual Optimization for Sample and Label Unlearning in Vertical Federated Learning | | 0
BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics | | 0
In-Run Data Shapley for Adam Optimizer | | 0
Learning Page Order in Shuffled WOO Releases | | 0
OVerSeeC: Open-Vocabulary Costmap Generation from Satellite Images and Natural Language | | 0
Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly | | 0
On Sample-Efficient Generalized Planning via Learned Transition Models | | 0
How Well Do Multimodal Models Reason on ECG Signals? | | 0
Test-Time Meta-Adaptation with Self-Synthesis | | 0
Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers | | 0
A Primer on Evolutionary Frameworks for Near-Field Multi-Source Localization | | 0
Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques | | 0
FrameVGGT: Frame Evidence Rolling Memory for streaming VGGT | | 0
RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation | | 0
PARSE: Part-Aware Relational Spatial Modeling | | 0
VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription | | 0
YAQIN: Culturally Sensitive, Agentic AI for Mental Healthcare Support Among Muslim Women in the UK | | 0
A Novel Multi-Agent Architecture to Reduce Hallucinations of Large Language Models in Multi-Step Structural Modeling | | 0
Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning | | 0
3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models | | 0
Uncertainty-Gated Generative Modeling | | 0
Whitening Reveals Cluster Commitment as the Geometric Separator of Hallucination Types | | 0
AR2-4FV: Anchored Referring and Re-identification for Long-Term Grounding in Fixed-View Videos | | 0
DECADE: A Temporally-Consistent Unsupervised Diffusion Model for Enhanced Rb-82 Dynamic Cardiac PET Image Denoising | | 0
MedQ-Deg: A Multidimensional Benchmark for Evaluating MLLMs Across Medical Image Quality Degradations | | 0
Geometric Knowledge-Assisted Federated Dual Knowledge Distillation Approach Towards Remote Sensing Satellite Imagery | | 0
Parameterized Brushstroke Style Transfer | | 0
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models | | 0

Page 188 of 13,232