SOTAVerified

Benchmarking

Papers

Showing 3301-3310 of 5548 papers

Title | Status | Hype
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | Code | 1
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models | - | 0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI | - | 0
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Code | 1
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | - | 0
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go | - | 0
DLUE: Benchmarking Document Language Understanding | - | 0
An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1
Benchmarking the human brain against computational architectures | - | 0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified