SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 16151-16200 of 474278 papers

Title | Status | Hype
Beyond the Battlefield: Framing Analysis of Media Coverage in Conflict Reporting | - | 0
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models | - | 0
Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs | - | 0
Improving Named Entity Transcription with Contextual LLM-based Revision | - | 0
Slimming Down LLMs Without Losing Their Minds | - | 0
Magistral | - | 0
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series | - | 0
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | - | 0
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2
Probably Approximately Correct Labels | Code | 1
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | Code | 2
Geometric Jensen-Shannon Divergence Between Gaussian Measures On Hilbert Space | - | 0
Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning | - | 0
Computational Complexity of Statistics: New Insights from Low-Degree Polynomials | - | 0
Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data | Code | 0
PyLO: Towards Accessible Learned Optimizers in PyTorch | Code | 1
TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora | Code | 1
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? | Code | 0
Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints | Code | 0
Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code | Code | 0
MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning | Code | 0
GLD-Road:A global-local decoding road network extraction model for remote sensing images | Code | 0
Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning | - | 0
Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS | - | 0
Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models | - | 0
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | - | 0
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data | - | 0
ADAgent: LLM Agent for Alzheimer's Disease Analysis with Collaborative Coordinator | - | 0
HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration | - | 0
RePO: Replay-Enhanced Policy Optimization | Code | 1
SANGAM: SystemVerilog Assertion Generation via Monte Carlo Tree Self-Refine | Code | 0
The NordDRG AI Benchmark for Large Language Models | Code | 0
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Code | 0
Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction | - | 0
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution | - | 0
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios | Code | 1
LLM-Driven Data Generation and a Novel Soft Metric for Evaluating Text-to-SQL in Aviation MRO | - | 0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | - | 0
SLRNet: A Real-Time LSTM-Based Sign Language Recognition System | Code | 0
Autonomous Computer Vision Development with Agentic AI | Code | 0
Vector Representations of Vessel Trees | - | 0
Self-Calibrating BCIs: Ranking and Recovery of Mental Targets Without Labels | - | 0
Analysis of Anonymous User Interaction Relationships and Prediction of Advertising Feedback Based on Graph Neural Network | - | 0
Tracking of Intermittent and Moving Speakers : Dataset and Metrics | - | 0
FARCLUSS: Fuzzy Adaptive Rebalancing and Contrastive Uncertainty Learning for Semi-Supervised Semantic Segmentation | Code | 0
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs | - | 0
Mutual-Supervised Learning for Sequential-to-Parallel Code Translation | Code | 1
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression | Code | 2
Quantifying Data Requirements for EEG Independent Component Analysis Using AMICA | - | 0
Eigenvalue-Based Detection in MIMO Systems for Integrated Sensing and Communication | - | 0
Page 324 of 9486