| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| Humanity's Last Exam | Jan 24, 2025 | Humanity's Last Exam, Language Modeling | —Unverified | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | Oct 10, 2016 | Language Modeling, Language Modelling | —Unverified | 0 |
| Fundamental Limitations in Defending LLM Finetuning APIs | Feb 20, 2025 | Multiple-choice | —Unverified | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Apr 5, 2024 | Multiple-choice, Navigate | —Unverified | 0 |
| FusionMind -- Improving question and answering with external context fusion | Dec 31, 2023 | Knowledge Graphs, Multiple-choice | —Unverified | 0 |
| Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | Jan 16, 2025 | Multiple-choice, Question Generation | —Unverified | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | —Unverified | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 |
| ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge | Jun 1, 2018 | Multiple-choice, Question Answering | —Unverified | 0 |