SOTAVerified

Benchmarking

Papers

Showing 1181–1190 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking saliency methods for chest X-ray interpretation	Oct 10, 2022	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Multi-Scene Fire and Smoke Detection	Oct 22, 2024	Benchmarking	Code Available	1	5
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Mar 2, 2024	Attribute, Benchmarking	Code Available	1	5
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets	Dec 9, 2022	Benchmarking, Classification	Code Available	1	5
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1	5
GraphWorld: Fake Graphs Bring Real Insights for GNNs	Feb 28, 2022	Benchmarking	Code Available	1	5
Benchmarking Natural Language Understanding Services for building Conversational Agents	Mar 13, 2019	Benchmarking, General Classification	Code Available	1	5
Boosting Neural Image Compression for Machines Using Latent Space Masking	Dec 15, 2021	Benchmarking, Image Compression	Code Available	1	5
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions	May 27, 2022	Benchmarking, Few-Shot Image Classification	Code Available	1	5
GNNs as Predictors of Agentic Workflow Performances	Mar 14, 2025	Benchmarking, Position	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified