SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2076–2100 of 661,570 papers

Title | Status | Hype
Confidence Calibration under Ambiguous Ground Truth | | 0
TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration | | 0
ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling | | 0
From the AI Act to a European AI Agency: Completing the Union's Regulatory Architecture | | 0
Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset | | 0
When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse | | 0
EVA: Efficient Reinforcement Learning for End-to-End Video Agent | | 0
The EU AI Act and the Rights-based Approach to Technological Governance | | 0
Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion | | 0
ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning | | 0
Caption Generation for Dongba Paintings via Prompt Learning and Semantic Fusion | | 0
Weak-PDE-Net: Discovering Open-Form PDEs via Differentiable Symbolic Networks and Weak Formulation | | 0
Cluster-Wise Spatio-Temporal Masking for Efficient Video-Language Pretraining | | 0
Privacy-Preserving EHR Data Transformation via Geometric Operators: A Human-AI Co-Design Technical Report | | 0
Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees | | 0
Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy | | 0
FCL-COD: Weakly Supervised Camouflaged Object Detection with Frequency-aware and Contrastive Learning | | 0
Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions | | 0
DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube | | 0
JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees | | 0
Can Graph Foundation Models Generalize Over Architecture? | | 0
Beyond Hate: Differentiating Uncivil and Intolerant Speech in Multimodal Content Moderation | | 0
VQ-Jarvis: Retrieval-Augmented Video Restoration Agent with Sharp Vision and Fast Thought | | 0
PaperVoyager : Building Interactive Web with Visual Language Models | | 0
On the use of Aggregation Operators to improve Human Identification using Dental Records | | 0
Page 84 of 26,463