SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5301-5350 of 661570 papers

Title | Status | Hype
Think Before You Lie: How Reasoning Leads to Honesty | | 0
When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents | | 0
SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving | | 0
Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models | | 0
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning | | 0
Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models | | 0
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions | | 0
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models | | 0
BLINK: Behavioral Latent Modeling of NK Cell Cytotoxicity | | 0
Latent-Mark: An Audio Watermark Robust to Neural Resynthesis | | 0
On the Statistical Optimality of Optimal Decision Trees | | 0
Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs | | 0
A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes | | 0
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks | | 0
Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules | | 0
Structural Causal Bottleneck Models | | 0
ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph | | 0
Open-World Motion Forecasting | | 0
GLM-OCR Technical Report | | 0
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | | 1
TOSSS: a CVE-based Software Security Benchmark for Large Language Models | | 0
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol | | 0
Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild | | 0
BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder | | 0
Truth as a Compression Artifact in Language Model Training | | 0
Mitigating the Multiplicity Burden: The Role of Calibration in Reducing Predictive Multiplicity of Classifiers | | 0
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models | | 0
The DIME Architecture: A Unified Operational Algorithm for Neural Representation, Dynamics, Control and Integration | | 0
Optimizing Task Completion Time Updates Using POMDPs | | 0
MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization | | 0
The COTe score: A decomposable framework for evaluating Document Layout Analysis models | | 0
A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators | | 0
Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models | | 0
BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator | | 0
Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation | | 0
Semantic Invariance in Agentic AI | | 0
MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model | | 0
Robust Building Damage Detection in Cross-Disaster Settings Using Domain Adaptation | | 0
Scaling Autoregressive Models for Lattice Thermodynamics | | 0
AURORA-KITTI: Any-Weather Depth Completion and Denoising in the Wild | | 0
Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization | | 0
Towards Next-Generation LLM Training: From the Data-Centric Perspective | | 0
Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention | | 0
Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes | | 0
GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation | | 0
Enhancing Hands in 3D Whole-Body Pose Estimation with Conditional Hands Modulator | | 0
Automated Diabetic Screening via Anterior Segment Ocular Imaging: A Deep Learning and Explainable AI Approach | | 0
A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding | | 0
Gauge-Equivariant Intrinsic Neural Operators for Geometry-Consistent Learning of Elliptic PDE Maps | | 0
Efficient Event Camera Volume System | | 0
Page 107 of 13232