Benchmarking

Papers

Showing 3861–3870 of 5548 papers

Title	Date	Tasks	Status	Hype
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	Benchmarking, Form	Code Available	0
Ahead-of-Time P-Tuning	May 18, 2023	Benchmarking, parameter-efficient fine-tuning	Unverified	0
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation	May 18, 2023	Benchmarking, Diagnostic	Unverified	0
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization	May 18, 2023	Benchmarking, GPU	Unverified	0
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models	May 18, 2023	Benchmarking	Unverified	0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified	0
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified	0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go	May 17, 2023	Benchmarking, Image Restoration	Unverified	0
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified	0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified