| A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Jan 13, 2024 | Distractor Generation, Multiple-choice | Code Available | 0 |
| PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | Jan 13, 2024 | Instruction Following, Multiple-choice | Unverified | 0 |
| A Joint-Reasoning based Disease Q&A System | Jan 6, 2024 | Knowledge Graphs, Misinformation | Unverified | 0 |
| The Earth is Flat? Unveiling Factual Errors in Large Language Models | Jan 1, 2024 | In-Context Learning, Multiple-choice | Unverified | 0 |
| FusionMind -- Improving question and answering with external context fusion | Dec 31, 2023 | Knowledge Graphs, Multiple-choice | Unverified | 0 |
| SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | Dec 26, 2023 | Computer Security, Multiple-choice | Code Available | 0 |
| Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available | 0 |
| Perception Test 2023: A Summary of the First Challenge And Outcome | Dec 20, 2023 | Benchmarking, Grounded Video Question Answering | Unverified | 0 |
| BloomVQA: Assessing Hierarchical Multi-modal Comprehension | Dec 20, 2023 | Data Augmentation, Memorization | Unverified | 0 |
| Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions | Dec 18, 2023 | Multiple-choice, Pedestrian Trajectory Prediction | Code Available | 0 |