SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7701–7750 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Ill-Conditioning in Dictionary-Based Dynamic-Equation Learning: A Systems Biology Case Study | | 0 |
| Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover | | 0 |
| On the Computational Hardness of Transformers | | 0 |
| LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms | | 0 |
| RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents | | 0 |
| DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding | | 0 |
| FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles | | 0 |
| Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation | | 0 |
| Spatially Robust Inference with Predicted and Missing at Random Labels | | 0 |
| Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning | | 0 |
| Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning | | 0 |
| TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting | | 0 |
| Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification | | 0 |
| How do AI agents talk about science and research? An exploration of scientific discussions on Moltbook using BERTopic | | 0 |
| Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics | | 0 |
| Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models | | 0 |
| Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI | | 0 |
| Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild | | 0 |
| The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning | | 0 |
| Towards Trustworthy Selective Generation: Reliability-Guided Diffusion for Ultra-Low-Field to High-Field MRI Synthesis | | 0 |
| POrTAL: Plan-Orchestrated Tree Assembly for Lookahead | | 0 |
| Busemann Functions in the Wasserstein Space: Existence, Closed-Forms, and Applications to Slicing | | 0 |
| Temporal Text Classification with Large Language Models | | 0 |
| abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance | | 0 |
| Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data | | 0 |
| On the Robustness of Langevin Dynamics to Score Function Error | | 0 |
| TRACE: AI-Assisted Assessment of Collaborative Projects in Computer Science Education | | 0 |
| Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling | | 0 |
| Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study | | 0 |
| ThReadMed-QA: A Multi-Turn Medical Dialogue Benchmark from Real Patient Questions | | 0 |
| Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval | Code | 0 |
| Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions | Code | 0 |
| Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction | Code | 0 |
| UNet-AF: An alias-free UNet for image restoration | Code | 0 |
| ResearchGym: Evaluating Language Model Agents on Real-World AI Research | | 1 |
| Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper | | 1 |
| A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms | | 0 |
| SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents | | 0 |
| Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches | | 0 |
| Copula-ResLogit: A Deep-Copula Framework for Unobserved Confounding Effects | | 0 |
| UrbanAlign: Post-hoc Semantic Calibration for VLM-Human Preference Alignment | | 0 |
| Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage | | 0 |
| Evaluation of LLMs in retrieving food and nutritional context for RAG systems | | 0 |
| Quantum entanglement provides a competitive advantage in adversarial games | | 0 |
| A New Tensor Network: Tubal Tensor Train and Its Applications | | 0 |
| MUNIChus: Multilingual News Image Captioning Benchmark | | 0 |
| Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases | | 0 |
| Novel Architecture of RPA In Oral Cancer Lesion Detection | | 0 |
| Interventional Time Series Priors for Causal Foundation Models | | 0 |
| The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey | | 0 |
Page 155 of 13,232