
Benchmarking

Papers


Showing 3191–3200 of 5548 papers

Title	Date	Tasks	Status	Hype
FRED: The Florence RGB-Event Drone Dataset	Jun 5, 2025	Benchmarking, Trajectory Forecasting	Unverified	0
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification	May 24, 2024	Benchmarking, Data Augmentation	Unverified	0
From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction	Mar 15, 2022	3D geometry, Benchmarking	Unverified	0
From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano	Jul 5, 2024	Attribute, Benchmarking	Unverified	0
From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems	Oct 24, 2024	Benchmarking, Common Sense Reasoning	Unverified	0
From Code to Play: Benchmarking Program Search for Games Using Large Language Models	Dec 5, 2024	Atari Games, Benchmarking	Unverified	0
From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks	Apr 14, 2022	Adversarial Attack, Adversarial Robustness	Unverified	0
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT	May 17, 2024	Benchmarking, Multiple-choice	Unverified	0
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation	May 24, 2025	Articles, Benchmarking	Unverified	0
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents	Sep 3, 2024	Benchmarking	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified