
Benchmarking

Papers

Showing 2971–2980 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking the Robustness of Quantized Models	Apr 8, 2023	Benchmarking, Quantization	Unverified	0	0
Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins	Mar 24, 2023	Benchmarking, Face Recognition	Unverified	0	0
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference	May 19, 2025	Benchmarking, Causal Inference	Unverified	0	0
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution	Jun 11, 2025	Benchmarking	Unverified	0	0
ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection	Jun 7, 2023	Attribute, Autonomous Driving	Unverified	0	0
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving	Feb 23, 2024	Benchmarking, Decision Making	Unverified	0	0
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Oct 2, 2024	Benchmarking, Hallucination	Unverified	0	0
Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares	Jun 22, 2025	Benchmarking, regression	Unverified	0	0
Identification of vortex in unstructured mesh with graph neural networks	Nov 11, 2023	Benchmarking, Graph Generation	Unverified	0	0
The Leaderboard Illusion	Apr 29, 2025	Benchmarking, Chatbot	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified