| Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | Oct 23, 2023 | counterfactual, Multiple-choice | Unverified | 0 |
| StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Oct 19, 2023 | Multiple-choice, Natural Language Understanding | Code Available | 0 |
| Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting | Oct 18, 2023 | Multiple-choice | Unverified | 0 |
| Field-testing items using artificial intelligence: Natural language processing with transformers | Oct 18, 2023 | Multiple-choice | Unverified | 0 |
| Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education | Oct 18, 2023 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Unverified | 0 |
| KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Oct 15, 2023 | Multiple-choice, Triplet | Code Available | 0 |
| Mitigating Bias for Question Answering Models by Tracking Bias Influence | Oct 13, 2023 | Multiple-choice, Multi-Task Learning | Unverified | 0 |
| Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Oct 7, 2023 | Action Recognition, Multiple-choice | Unverified | 0 |
| On the Performance of Multimodal Language Models | Oct 4, 2023 | Benchmarking, Binary Classification | Unverified | 0 |
| Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Oct 3, 2023 | Image Captioning, Multiple-choice | Code Available | 0 |
| AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Oct 3, 2023 | Articles, Decision Making | Code Available | 0 |
| Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Oct 3, 2023 | Misconceptions, Multiple-choice | Code Available | 0 |
| Fusing Models with Complementary Expertise | Oct 2, 2023 | Multiple-choice, text-classification | Code Available | 0 |
| Automating question generation from educational text | Sep 26, 2023 | Multiple-choice, Question Generation | Unverified | 0 |
| HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | Sep 21, 2023 | Decision Making, Multiple-choice | Unverified | 0 |
| Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | Sep 19, 2023 | Generative Question Answering, Information Retrieval | Unverified | 0 |
| Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models | Sep 19, 2023 | Explanation Generation, Language Modelling | Code Available | 0 |
| Language models are susceptible to incorrect patient self-diagnosis in medical applications | Sep 17, 2023 | Diagnostic, Multiple-choice | Unverified | 0 |
| Self-Assessment Tests are Unreliable Measures of LLM Personality | Sep 15, 2023 | Multiple-choice | Unverified | 0 |
| Use neural networks to recognize students' handwritten letters and incorrect symbols | Sep 12, 2023 | Multiple-choice | Unverified | 0 |
| Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions | Sep 12, 2023 | Multiple-choice, Sentence | Unverified | 0 |
| INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Sep 5, 2023 | Cell Detection, Lesion Segmentation | Code Available | 0 |
| An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | Sep 5, 2023 | Multiple-choice | Unverified | 0 |
| Generalised Winograd Schema and its Contextuality | Aug 31, 2023 | coreference-resolution, Coreference Resolution | Unverified | 0 |
| Spoken Language Intelligence of Large Language Models for Language Learning | Aug 28, 2023 | Language Acquisition, Multiple-choice | Code Available | 0 |
| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | Aug 22, 2023 | Multiple-choice, Sensitivity | Unverified | 0 |
| A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | Aug 9, 2023 | Multiple-choice | Unverified | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Aug 1, 2023 | Intent Detection, Multiple-choice | Code Available | 0 |
| Distractor generation for multiple-choice questions with predictive prompting and large language models | Jul 30, 2023 | Distractor Generation, Multiple-choice | Code Available | 0 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | Jul 18, 2023 | Multiple-choice, Question Answering | Unverified | 0 |
| Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Jul 16, 2023 | Multiple-choice | Code Available | 0 |
| Analyzing Multiple-Choice Reading and Listening Comprehension Tests | Jul 3, 2023 | Multiple-choice, Reading Comprehension | Unverified | 0 |
| Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Jun 26, 2023 | Combinatorial Optimization, Multiple-choice | Code Available | 0 |
| Structured Dialogue Discourse Parsing | Jun 26, 2023 | Discourse Parsing, Multiple-choice | Code Available | 0 |
| Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | Jun 22, 2023 | Multiple-choice | Unverified | 0 |
| Solving and Generating NPR Sunday Puzzles with Large Language Models | Jun 21, 2023 | Multiple-choice, Prompt Engineering | Code Available | 0 |
| RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | Jun 17, 2023 | Decision Making, graph construction | Unverified | 0 |
| Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | Jun 15, 2023 | Multiple-choice | Unverified | 0 |
| Can ChatGPT pass the Vietnamese National High School Graduation Examination? | Jun 15, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Questioning the Survey Responses of Large Language Models | Jun 13, 2023 | Multiple-choice, Survey | Code Available | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | Jun 10, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis | Jun 7, 2023 | Discrete Choice Models, Multiple-choice | Unverified | 0 |
| BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | May 25, 2023 | Binary Classification, Knowledge Graphs | Code Available | 0 |
| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | May 24, 2023 | In-Context Learning, Multiple-choice | Code Available | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | May 24, 2023 | Multiple-choice | Unverified | 0 |
| ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | May 24, 2023 | Multiple-choice, Question Answering | Code Available | 0 |
| This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models | May 24, 2023 | Language Modelling, Large Language Model | Code Available | 0 |