SOTAVerified

Benchmarking

Papers

Showing 2081-2090 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| How well it works: Benchmarking performance of GPT models on medical natural language processing tasks | | 0 |
| DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | | 0 |
| A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection | | 0 |
| Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | | 0 |
| Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images | | 0 |
| RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Code | 1 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0 |
| MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | | 0 |
| AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |