SOTAVerified

Benchmarking

Papers

Showing 3271–3280 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking the Robustness of UAV Tracking Against Common Corruptions	Mar 18, 2024	Benchmarking	Code Available	0
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety	Mar 18, 2024	Benchmarking, Mathematical Reasoning	Unverified	0
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	Benchmarking, Dialogue State Tracking	Unverified	0
FlowMind: Automatic Workflow Generation with LLMs	Mar 17, 2024	Benchmarking, Question Answering	Unverified	0
Depression Detection on Social Media with Large Language Models	Mar 16, 2024	Benchmarking, Depression Detection	Unverified	0
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks	Mar 15, 2024	Adversarial Attack, Adversarial Robustness	Unverified	0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Mar 15, 2024	Benchmarking	Code Available	0
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages	Mar 14, 2024	Benchmarking, Dimensionality Reduction	Code Available	0
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors	Mar 14, 2024	Benchmarking, Domain Adaptation	Code Available	0
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows	Mar 13, 2024	Anomaly Detection, Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified