| Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | May 30, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension | May 29, 2024 | Distractor GenerationMultiple-choice | —Unverified | 0 |
| Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | May 28, 2024 | Multiple-choiceSentence | —Unverified | 0 |
| Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | May 27, 2024 | Multiple-choiceSentiment Analysis | —Unverified | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | May 25, 2024 | Common Sense ReasoningMultiple-choice | CodeCode Available | 0 |
| Eliciting Informative Text Evaluations with Large Language Models | May 23, 2024 | Multiple-choicePrediction | CodeCode Available | 0 |
| Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation | May 23, 2024 | Conversational RecommendationMultiple-choice | —Unverified | 0 |
| Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | May 22, 2024 | InformativenessLanguage Modeling | CodeCode Available | 2 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | May 22, 2024 | Mathematical ReasoningMultiple-choice | CodeCode Available | 1 |
| Robust portfolio optimization model for electronic coupon allocation | May 21, 2024 | Multiple-choicePortfolio Optimization | —Unverified | 0 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8KHumanEval | CodeCode Available | 1 |
| Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | May 19, 2024 | Multiple-choice | —Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | BenchmarkingMultiple-choice | —Unverified | 0 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16kBenchmarking | CodeCode Available | 3 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | May 17, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning | May 16, 2024 | Multiple-choiceQuestion Answering | —Unverified | 0 |
| CinePile: A Long Video Question Answering Dataset and Benchmark | May 14, 2024 | FormHuman-Object Interaction Detection | —Unverified | 0 |
| SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | May 14, 2024 | BenchmarkingMultiple-choice | CodeCode Available | 1 |
| MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | May 13, 2024 | In-Context LearningMultiple-choice | —Unverified | 0 |
| Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | May 12, 2024 | Multiple-choiceQuestion Answering | CodeCode Available | 0 |
| THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models | May 8, 2024 | AttributeData Augmentation | CodeCode Available | 1 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | May 6, 2024 | Multiple-choiceVideo Understanding | —Unverified | 0 |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | May 6, 2024 | Decision MakingMultiple-choice | CodeCode Available | 0 |
| Self-Reflection in LLM Agents: Effects on Problem-Solving Performance | May 5, 2024 | Multiple-choice | CodeCode Available | 2 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | May 1, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |