SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 301–325 of 659,983 papers

Title | Status | Hype
Synthetic or Authentic? Building Mental Patient Simulators from Longitudinal Evidence | | 0
Explanation Generation for Contradiction Reconciliation with LLMs | | 0
Multitask-Informed Prior for In-Context Learning on Tabular Data: Application to Steel Property Prediction | | 0
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts | | 0
CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models | | 0
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought | | 0
UniQueR: Unified Query-based Feedforward 3D Reconstruction | | 0
Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction | | 0
Agent Audit: A Security Analysis System for LLM Agent Applications | | 0
Avoiding Over-smoothing in Social Media Rumor Detection with Pre-trained Propagation Tree Transformer | | 0
Agent-Sentry: Bounding LLM Agents via Execution Provenance | | 0
Chain-of-Authorization: Internalizing Authorization into Large Language Models via Reasoning Trajectories | | 0
Designing to Forget: Deep Semi-parametric Models for Unlearning | | 0
Dynamical Systems Theory Behind a Hierarchical Reasoning Model | | 0
ForeSea: AI Forensic Search with Multi-modal Queries for Video Surveillance | | 0
Template-Based Feature Aggregation Network for Industrial Anomaly Detection | | 0
VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents | | 0
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring | | 0
Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics | | 0
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction | | 0
Ran Score: a LLM-based Evaluation Score for Radiology Report Generation | | 0
FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification | | 0
Algorithmic warm starts for Hamiltonian Monte Carlo | | 0
Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks | | 0
REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees | | 0