SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 501-550 of 659,983 papers

Title | Status | Hype
IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge | | 0
IJmond Industrial Smoke Segmentation Dataset | | 0
Self Paced Gaussian Contextual Reinforcement Learning | | 0
Learning Cross-Joint Attention for Generalizable Video-Based Seizure Detection | | 0
Towards a general-purpose foundation model for fMRI analysis | | 0
UniCA: Unified Covariate Adaptation for Time Series Foundation Model | | 0
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs | | 0
CRoCoDiL: Continuous and Robust Conditioned Diffusion for Language | | 0
An Industrial-Scale Retrieval-Augmented Generation Framework for Requirements Engineering: Empirical Evaluation with Automotive Manufacturing Data | | 0
GHOST: Ground-projected Hypotheses from Observed Structure-from-Motion Trajectories | | 0
MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning | | 0
ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework | | 0
Exponential Family Discriminant Analysis: Generalizing LDA-Style Generative Classification to Non-Gaussian Models | | 0
Towards Intelligent Geospatial Data Discovery: a knowledge graph-driven multi-agent framework powered by large language models | | 0
PiLoT: Neural Pixel-to-3D Registration for UAV-based Ego and Target Geo-localization | | 0
LPNSR: Prior-Enhanced Diffusion Image Super-Resolution via LR-Guided Noise Prediction | | 0
2Xplat: Two Experts Are Better Than One Generalist | | 0
Cerebra: A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment | | 0
Uncertainty Quantification for Distribution-to-Distribution Flow Matching in Scientific Imaging | | 0
CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning | | 0
BadminSense: Enabling Fine-Grained Badminton Stroke Evaluation on a Single Smartwatch | | 0
Generative Inversion of Spectroscopic Data for Amorphous Structure Elucidation | | 0
Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models | | 0
Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation | | 0
Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics? | | 0
Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies | | 0
Generalizing Dynamics Modeling More Easily from Representation Perspective | | 0
Large-Scale Avalanche Mapping from SAR Images with Deep Learning-based Change Detection | | 0
How Far Can VLMs Go for Visual Bug Detection? Studying 19,738 Keyframes from 41 Hours of Gameplay Videos | | 0
Detecting Non-Membership in LLM Training Data via Rank Correlations | | 0
Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics | | 0
Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints | | 0
HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment | | 0
Double Coupling Architecture and Training Method for Optimization Problems of Differential Algebraic Equations with Parameters | | 0
Spiking Personalized Federated Learning for Brain-Computer Interface-Enabled Immersive Communication | | 0
Behavioral Heterogeneity as Quantum-Inspired Representation | | 0
How Utilitarian Are OpenAI's Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025) | | 0
Reconstruction-Guided Slot Curriculum: Addressing Object Over-Fragmentation in Video Object-Centric Learning | | 0
ENC-Bench: A Benchmark for Evaluating Multimodal Large Language Models in Electronic Navigational Chart Understanding | | 0
DALDALL: Data Augmentation for Lexical and Semantic Diverse in Legal Domain by leveraging LLM-Persona | | 0
From Overload to Convergence: Supporting Multi-Issue Human-AI Negotiation with Bayesian Visualization | | 0
Can LLM Agents Generate Real-World Evidence? Evaluating Observational Studies in Medical Databases | | 0
From Pixels to Semantics: A Multi-Stage AI Framework for Structural Damage Detection in Satellite Imagery | | 0
From Arithmetic to Logic: The Resilience of Logic and Lookup-Based Neural Networks Under Parameter Bit-Flips | | 0
Explainable Threat Attribution for IoT Networks Using Conditional SHAP and Flow Behavior Modelling | | 0
Viewport-based Neural 360° Image Compression | | 0
AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model | | 0
Typography-Based Monocular Distance Estimation Framework for Vehicle Safety Systems | | 0
Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models | | 0
Caterpillar of Thoughts: The Optimal Test-Time Algorithm for Large Language Models | | 0
Page 11 of 13,200