SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9901–9950 of 661570 papers

Title | Status | Hype
From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight LLM | | 0
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages | | 0
Qworld: Question-Specific Evaluation Criteria for LLMs | | 0
Do 3D Large Language Models Really Understand 3D Spatial Relationships? | | 0
Navigating the Concept Space of Language Models | | 0
Prompt Compression in Production Task Orchestration: A Pre-Registered Randomized Trial | | 0
Plato's Cave: A Human-Centered Research Verification System | | 0
Compression Method Matters: Benchmark-Dependent Output Dynamics in LLM Prompt Compression | | 0
The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression | | 0
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens | | 0
Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation | | 0
Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents | | 0
Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data | | 0
Email in the Era of LLMs | | 0
Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues | | 0
Beyond Scalar Rewards: Distributional Reinforcement Learning with Preordered Objectives for Safe and Reliable Autonomous Driving | | 0
Automated Motif Indexing on the Arabian Nights | | 0
KD-EKF: Knowledge-Distilled Adaptive Covariance EKF for Robust UWB/PDR Indoor Localization | | 0
Clinically Meaningful Explainability for NeuroAI: An ethical, technical, and clinical perspective | | 0
WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection | | 0
Chaotic Oscillator Networks for Classification Tasks | | 0
TerraLingua: Emergence and Analysis of Open-endedness in LLM Ecologies | | 0
How to Achieve Prototypical Birth and Death for OOD Detection? | | 0
A federated learning framework with knowledge graph and temporal transformer for early sepsis prediction in multi-center ICUs | | 0
MindfulAgents: Personalizing Mindfulness Meditation via an Expert-Aligned Multi-Agent System | | 0
MultiSolSegment: Multi-channel segmentation of overlapping features in electroluminescence images of photovoltaic cells | | 0
AdaBox: Adaptive Density-Based Box Clustering with Parameter Generalization | | 0
Information-Theoretic Constraints for Continual Vision-Language-Action Alignment | | 0
OpenExtract: Automated Data Extraction for Systematic Reviews in Health | Code | 0
Supporting Artifact Evaluation with LLMs: A Study with Published Security Research Papers | | 0
Evaluating Generalization Mechanisms in Autonomous Cyber Attack Agents | | 0
Enhancing SHAP Explainability for Diagnostic and Prognostic ML Models in Alzheimer Disease | | 0
Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR | | 0
FTSplat: Feed-forward Triangle Splatting Network | | 0
A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention | | 0
HGT-Scheduler: Deep Reinforcement Learning for the Job Shop Scheduling Problem via Heterogeneous Graph Transformers | | 0
AI-Assisted Curation of Conference Scholarship: Compiling, Structuring, and Analyzing Two Decades of Presentations at the Society for Social Work and Research | | 0
Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation | | 0
A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment | Code | 0
A Hazard-Informed Data Pipeline for Robotics Physical Safety | | 0
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering | | 0
SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration | | 0
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques | | 0
Multi-Agent Reinforcement Learning with Submodular Reward | | 0
Making AI Evaluation Deployment Relevant Through Context Specification | | 0
Counting on Consensus: Selecting the Right Inter-annotator Agreement Metric for NLP Annotation and Evaluation | | 0
ContextBench: Modifying Contexts for Targeted Latent Activation | | 0
SPoT: Subpixel Placement of Tokens in Vision Transformers | | 0
Performance Assessment Strategies for Language Model Applications in Healthcare | | 0
SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning | | 0
Page 199 of 13232