SOTAVerified

Benchmarking

Papers

Showing 391-400 of 5548 papers

Title | Status | Hype
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Code | 2
Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Code | 2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Code | 2
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
Page 40 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified