SOTAVerified

Benchmarking

Papers

Showing 1001–1010 of 5548 papers

Title	Date	Tasks	Status	Hype
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment	May 23, 2023	Benchmarking, Cross-Lingual Transfer	Code Available	1
Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks	May 22, 2023	Adversarial Attack, Autonomous Driving	Code Available	1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method	May 22, 2023	Benchmarking, Hallucination	Code Available	1
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models	May 18, 2023	Benchmarking, Image Generation	Code Available	1
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering	May 17, 2023	Benchmarking, Diagnostic	Code Available	1
An Empirical Study on Google Research Football Multi-agent Scenarios	May 16, 2023	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1
A Platform for the Biomedical Application of Large Language Models	May 10, 2023	Benchmarking, Privacy Preserving	Code Available	1
Benchmarking large language models for biomedical natural language processing applications and recommendations	May 10, 2023	Benchmarking, Document Classification	Code Available	1
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation	May 10, 2023	Benchmarking, Image Captioning	Code Available	1
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects	May 9, 2023	Benchmarking, Decision Making	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified