Benchmarking

Papers

Showing 3571–3580 of 5548 papers

Title	Date	Tasks	Status	Hype
On the Performance of Multimodal Language Models	Oct 4, 2023	Benchmarking, Binary Classification	Unverified	0
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks	Apr 29, 2025	Anomaly Detection, Benchmarking	Unverified	0
On the project risk baseline: integrating aleatory uncertainty into project scheduling	May 31, 2024	Benchmarking, Scheduling	Unverified	0
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild	Jul 17, 2023	Benchmarking, Real-Time Semantic Segmentation	Unverified	0
On the reduction of Linear Parameter-Varying State-Space models	Apr 2, 2024	Benchmarking, Dimensionality Reduction	Unverified	0
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	Unverified	0
On the Reliability and Validity of Detecting Approval of Political Actors in Tweets	Nov 1, 2020	Benchmarking, Sentiment Analysis	Unverified	0
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	Benchmarking, Data Augmentation	Unverified	0
On the role of benchmarking data sets and simulations in method comparison studies	Aug 2, 2022	Benchmarking	Unverified	0
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning	Oct 25, 2019	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified