| Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | Aug 18, 2023 | Multiple-choice, Question Answering | Code Available | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Aug 17, 2023 | Diagnostic, EgoSchema | Code Available | 1 |
| A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | Aug 9, 2023 | Multiple-choice | Unverified | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Aug 1, 2023 | Intent Detection, Multiple-choice | Code Available | 0 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Jul 31, 2023 | Multiple-choice, Question Answering | Code Available | 2 |
| Distractor generation for multiple-choice questions with predictive prompting and large language models | Jul 30, 2023 | Distractor Generation, Multiple-choice | Code Available | 0 |
| SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | Jul 30, 2023 | Benchmarking, Multiple-choice | Code Available | 2 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |