| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 | 5 |
| Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Oct 2, 2023 | In-Context Learning, Instruction Following | Code Available | 1 | 5 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Jun 16, 2024 | Diversity, Multiple-choice | Code Available | 1 | 5 |
| From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap | Apr 13, 2020 | Dialogue State Tracking, Machine Reading Comprehension | Code Available | 1 | 5 |
| Ranked Voting based Self-Consistency of Large Language Models | May 16, 2025 | Multiple-choice, Open-Ended Question Answering | Code Available | 1 | 5 |
| General-Purpose Question-Answering with Macaw | Sep 6, 2021 | Generative Question Answering, Multiple-choice | Code Available | 1 | 5 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwag, Language Modeling | Code Available | 1 | 5 |
| Annealed Winner-Takes-All for Motion Forecasting | Sep 17, 2024 | All, Autonomous Driving | Code Available | 1 | 5 |
| CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Jun 28, 2022 | General Knowledge, Language Modelling | Code Available | 1 | 5 |
| Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Jul 22, 2024 | All, Diversity | Code Available | 1 | 5 |