| Fine-tuning Multimodal Large Language Models for Product Bundling | Jul 16, 2024 | In-Context Learning, Multiple-choice | Code Available | 1 |
| ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization | Sep 9, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |
| E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Jan 29, 2024 | Ethics, Multiple-choice | Code Available | 1 |
| EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Oct 12, 2022 | Distractor Generation, Multiple-choice | Code Available | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Aug 17, 2023 | Diagnostic, EgoSchema | Code Available | 1 |
| Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Apr 30, 2024 | Implicatures, Multiple-choice | Code Available | 1 |
| IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Nov 8, 2020 | Genre classification, Multiple-choice | Code Available | 1 |
| InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Jun 28, 2024 | Multiple-choice, Video Understanding | Code Available | 1 |
| Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Oct 24, 2024 | Multiple-choice | Code Available | 1 |
| Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Jul 24, 2023 | Contrastive Learning, Multimodal Reasoning | Code Available | 1 |