SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1751–1760 of 5548 papers

Title	Date	Tasks	Status	Hype
BLADE: Benchmarking Language Model Agents for Data-Driven Science	Aug 19, 2024	BenchmarkingDecision Making	CodeCode Available	1
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving	Aug 19, 2024	BenchmarkingMachine Translation	—Unverified	0
Benchmarking quantum machine learning kernel training for classification tasks	Aug 17, 2024	BenchmarkingQuantum Machine Learning	CodeCode Available	0
PADetBench: Towards Benchmarking Physical Attacks against Object Detection	Aug 17, 2024	Adversarial RobustnessBenchmarking	CodeCode Available	1
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors	Aug 15, 2024	BenchmarkingManagement	—Unverified	0
SustainDC: Benchmarking for Sustainable Data Center Control	Aug 14, 2024	BenchmarkingManagement	CodeCode Available	2
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition	Aug 14, 2024	Automatic Speech RecognitionBenchmarking	CodeCode Available	1
TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases	Aug 14, 2024	Adversarial RobustnessBenchmarking	CodeCode Available	1
XCompress: LLM assisted Python-based text compression toolkit	Aug 12, 2024	BenchmarkingLanguage Modeling	CodeCode Available	0
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	CodeCode Available	1

Show:10 25 50

← PrevPage 176 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified