Benchmarking

Papers

Showing 2081–2090 of 5548 papers

Title	Date	Tasks	Status	Hype
How well it works: Benchmarking performance of GPT models on medical natural language processing tasks	Jun 12, 2024	Benchmarking	Unverified	0
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition	Jun 11, 2024	Benchmarking, Cross-corpus	Unverified	0
A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection	Jun 11, 2024	Benchmarking, Defect Detection	Unverified	0
Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing	Jun 11, 2024	Benchmarking, Stance Detection	Unverified	0
Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images	Jun 11, 2024	Benchmarking, GPU	Unverified	0
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection	Jun 11, 2024	Anomaly Detection, Benchmarking	Code Available	1
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning	Jun 11, 2024	Benchmarking, Contrastive Learning	Code Available	0
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models	Jun 11, 2024	Benchmarking, Fairness	Unverified	0
AudioMarkBench: Benchmarking Robustness of Audio Watermarking	Jun 11, 2024	Benchmarking, text-to-speech	Code Available	1
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified