SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2451–2460 of 5548 papers

Title	Date	Tasks	Status	Hype
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition	Feb 29, 2024	Action Unit DetectionArousal Estimation	—Unverified	0
Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks	Feb 29, 2024	BenchmarkingDisentanglement	CodeCode Available	2
Efficient Lifelong Model Evaluation in an Era of Rapid Progress	Feb 29, 2024	BenchmarkingGPU	CodeCode Available	1
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking	Feb 28, 2024	BenchmarkingInductive Learning	CodeCode Available	0
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	BenchmarkingMultiple-choice	CodeCode Available	1
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Feb 28, 2024	BenchmarkingHallucination	CodeCode Available	0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection	Feb 27, 2024	Benchmarking	—Unverified	0
Beacon, a lightweight deep reinforcement learning benchmark library for flow control	Feb 27, 2024	BenchmarkingCPU	CodeCode Available	1
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies	Feb 27, 2024	BenchmarkingSystematic Generalization	—Unverified	0
Benchmarking Data Science Agents	Feb 27, 2024	BenchmarkingCode Generation	CodeCode Available	1

Show:10 25 50

← PrevPage 246 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified