| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | May 23, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | May 22, 2023 | Adversarial Attack, Autonomous Driving | Code Available | 1 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models | May 18, 2023 | Benchmarking, Image Generation | Code Available | 1 |
| PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | May 17, 2023 | Benchmarking, Diagnostic | Code Available | 1 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | May 16, 2023 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 1 |
| A Platform for the Biomedical Application of Large Language Models | May 10, 2023 | Benchmarking, Privacy Preserving | Code Available | 1 |
| Benchmarking large language models for biomedical natural language processing applications and recommendations | May 10, 2023 | Benchmarking, Document Classification | Code Available | 1 |
| InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | May 10, 2023 | Benchmarking, Image Captioning | Code Available | 1 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | May 9, 2023 | Benchmarking, Decision Making | Code Available | 1 |