SOTAVerified

Benchmarking

Papers

Showing 4761–4770 of 5548 papers

Title	Date	Tasks	Status	Hype
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages	Mar 3, 2025	Benchmarking	Code Available	0
The LOCATA Challenge: Acoustic Source Localization and Tracking	Sep 3, 2019	Benchmarking, Sound Source Localization	Code Available	0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	0
A Meta-Analysis of the Anomaly Detection Problem	Mar 3, 2015	Anomaly Detection, Benchmarking	Code Available	0
On the Measure of Intelligence	Nov 5, 2019	ARC, Benchmarking	Code Available	0
Generalization and Regularization in DQN	Sep 29, 2018	Atari Games, Benchmarking	Code Available	0
Automatic Resolution of Domain Name Disputes	Nov 1, 2021	Benchmarking	Code Available	0
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI	Jun 13, 2025	Benchmarking, In-Context Learning	Code Available	0
Automatic benchmarking of large multimodal models via iterative experiment programming	Jun 18, 2024	Benchmarking, Language Modeling	Code Available	0
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available	0


Benchmark Results

#	Model	Metric	Claimed score	Verified score	Status
1	GPT-4 Turbo	Accuracy	0.56	—	Unverified