SOTAVerified

Benchmarking

Papers

Showing 4761–4770 of 5548 papers

Title | Status | Hype
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages | Code | 0
The LOCATA Challenge: Acoustic Source Localization and Tracking | Code | 0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
A Meta-Analysis of the Anomaly Detection Problem | Code | 0
On the Measure of Intelligence | Code | 0
Generalization and Regularization in DQN | Code | 0
Automatic Resolution of Domain Name Disputes | Code | 0
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Code | 0
Automatic benchmarking of large multimodal models via iterative experiment programming | Code | 0
GenderBench: Evaluation Suite for Gender Biases in LLMs | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified