SOTAVerified

Benchmarking

Papers


Showing 3221–3230 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
TIIF-Bench: How Does Your T2I Model Follow Your Instructions?	Jun 2, 2025	Benchmarking, Instruction Following	Unverified	0	0
Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking	Jan 10, 2024	Benchmarking, Information Retrieval	Unverified	0	0
3D Compositional Zero-shot Learning with DeCompositional Consensus	Nov 29, 2021	Benchmarking, Compositional Zero-Shot Learning	Unverified	0	0
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems	Jul 27, 2023	Benchmarking, GPU	Unverified	0	0
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges	Mar 6, 2025	Benchmarking, Language Modeling	Unverified	0	0
Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD)	Dec 24, 2021	Benchmarking, Position	Unverified	0	0
Benchmarking PathCLIP for Pathology Image Analysis	Jan 5, 2024	Benchmarking, Decision Making	Unverified	0	0
Kolmogorov-Arnold Network for Transistor Compact Modeling	Mar 19, 2025	Benchmarking	Unverified	0	0
Koopman Theory-Inspired Method for Learning Time Advancement Operators in Unstable Flame Front Evolution	Dec 11, 2024	Benchmarking	Unverified	0	0
Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex	Jun 16, 2024	Benchmarking, Object Recognition	Unverified	0	0


Page 323 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified