SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2521–2530 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation	Mar 2, 2022	BenchmarkingDeep Learning	—Unverified	0
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency	Sep 12, 2019	BenchmarkingGeneral Classification	—Unverified	0
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval	Jan 15, 2025	BenchmarkingContrastive Learning	—Unverified	0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities	Jun 6, 2023	BenchmarkingDepth Completion	—Unverified	0
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models	Jun 17, 2024	BenchmarkingSurvey	—Unverified	0
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models	Jun 3, 2023	Benchmarking	—Unverified	0
AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals	May 21, 2025	BenchmarkingChatbot	—Unverified	0
Benchmarking Robustness in Neural Radiance Fields	Jan 10, 2023	BenchmarkingCamera Calibration	—Unverified	0
A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data	Sep 29, 2021	BenchmarkingDomain Adaptation	—Unverified	0
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO	Aug 30, 2023	BenchmarkingReinforcement Learning (RL)	—Unverified	0

Show:10 25 50

← PrevPage 253 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified