SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4701–4710 of 5548 papers

Title	Date	Tasks	Status	Hype
Hard-Label Cryptanalytic Extraction of Neural Network Models	Sep 18, 2024	Benchmarking	CodeCode Available	0
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces	May 31, 2023	BenchmarkingRecommendation Systems	CodeCode Available	0
Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2	Apr 20, 2018	Benchmarking	CodeCode Available	0
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios	Dec 21, 2024	Benchmarking	CodeCode Available	0
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators	May 28, 2025	BenchmarkingChatbot	CodeCode Available	0
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	BenchmarkingMultiple-choice	CodeCode Available	0
Benchmarking tools for a priori identifiability analysis	Jul 20, 2022	Benchmarking	CodeCode Available	0
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book	Jun 1, 2025	Benchmarking	CodeCode Available	0
Benchmarking time series classification -- Functional data vs machine learning approaches	Nov 18, 2019	Additive modelsBenchmarking	CodeCode Available	0
Benchmarking the Robustness of UAV Tracking Against Common Corruptions	Mar 18, 2024	Benchmarking	CodeCode Available	0

Show:10 25 50

← PrevPage 471 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified