Benchmarking

Papers

Showing 631–640 of 5548 papers

Title	Date	Tasks	Status	Hype
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models	Aug 28, 2024	Benchmarking, Logical Reasoning	Code Available	1
Variational Autoencoder for Anomaly Detection: A Comparative Study	Aug 24, 2024	Anomaly Detection, Benchmarking	Code Available	1
Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets	Aug 22, 2024	All, Benchmarking	Code Available	1
BLADE: Benchmarking Language Model Agents for Data-Driven Science	Aug 19, 2024	Benchmarking, Decision Making	Code Available	1
PADetBench: Towards Benchmarking Physical Attacks against Object Detection	Aug 17, 2024	Adversarial Robustness, Benchmarking	Code Available	1
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition	Aug 14, 2024	Automatic Speech Recognition, Benchmarking	Code Available	1
TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases	Aug 14, 2024	Adversarial Robustness, Benchmarking	Code Available	1
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	Code Available	1
The impact of internal variability on benchmarking deep learning climate emulators	Aug 9, 2024	Benchmarking, Deep Learning	Code Available	1
UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios	Aug 9, 2024	Benchmarking, Human Detection	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified