SOTAVerified

Benchmarking

Papers

Showing 2491–2500 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	0	5
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	Benchmarking, Graph Mining	Code Available	0	5
Exploring Model-based Planning with Policy Networks	Jun 20, 2019	Benchmarking, model	Code Available	0	5
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available	0	5
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available	0	5
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms	Jul 1, 2022	Benchmarking, Classification	Code Available	0	5
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	Benchmarking, Dataset Generation	Code Available	0	5
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting	Apr 15, 2024	Benchmarking	Code Available	0	5
Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters	Jun 16, 2024	Benchmarking, Instance Segmentation	Code Available	0	5
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available	0	5

Page 250 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified