SOTAVerified

Benchmarking

Papers

Showing 1001-1010 of 5548 papers

Title | Status | Hype
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | Code | 1
Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | Code | 1
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Code | 1
An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1
A Platform for the Biomedical Application of Large Language Models | Code | 1
Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Code | 1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified