SOTAVerified

Benchmarking

Papers

Showing 2581–2590 of 5548 papers

Title	Date	Tasks	Status	Hype
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation	Jan 22, 2024	Benchmarking, Diagnostic	Code Available	3
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	Code Available	1
Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound	Jan 20, 2024	Benchmarking	Unverified	0
Data Augmentation for Traffic Classification	Jan 19, 2024	Benchmarking, Classification	Unverified	0
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents	Jan 18, 2024	Benchmarking	Code Available	2
Harnessing Orthogonality to Train Low-Rank Neural Networks	Jan 16, 2024	Benchmarking	Code Available	0
WAVES: Benchmarking the Robustness of Image Watermarks	Jan 16, 2024	Benchmarking	Code Available	2
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription	Jan 16, 2024	Automatic Speech Recognition, Benchmarking	Unverified	0
Large Language Models are Null-Shot Learners	Jan 16, 2024	Arithmetic Reasoning, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified