| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | Apr 18, 2024 | Hallucination, Multiple-choice | Unverified | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | Apr 18, 2024 | Depth Estimation, Multiple-choice | Unverified | 0 |
| ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models | Apr 17, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Question Difficulty Ranking for Multiple-Choice Reading Comprehension | Apr 16, 2024 | Multiple-choice, Reading Comprehension | Unverified | 0 |
| Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Apr 12, 2024 | Multiple-choice | Code Available | 0 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Apr 11, 2024 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | Apr 9, 2024 | EgoSchema, Multiple-choice | Unverified | 0 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available | 3 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Apr 7, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Apr 5, 2024 | Multiple-choice, Navigate | Unverified | 0 |