SOTAVerified

Multiple-choice

Papers

Showing 451-500 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| DE-COP: Detecting Copyrighted Content in Language Models Training Data | Code | 0 |
| An Automatic Question Usability Evaluation Toolkit | Code | 0 |
| Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0 |
| A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms | Code | 0 |
| Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0 |
| Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0 |
| Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0 |
| Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Code | 0 |
| Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0 |
| It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0 |
| DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0 |
| KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0 |
| Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0 |
| CSEPrompts: A Benchmark of Introductory Computer Science Prompts | Code | 0 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Code | 0 |
| Utilizing Background Knowledge for Robust Reasoning over Traffic Situations | Code | 0 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0 |
| AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0 |
| Introducing a framework to assess newly created questions with Natural Language Processing | Code | 0 |
| QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0 |
| Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Code | 0 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0 |
| CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models | Code | 0 |
| Improving Question Answering with External Knowledge | Code | 0 |
| Video Prediction via Selective Sampling | Code | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Code | 0 |
| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0 |
| INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Code | 0 |
| A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Code | 0 |
| IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs | Code | 0 |
| What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? | Code | 0 |
| What Makes Reading Comprehension Questions Easier? | Code | 0 |
| Improving Machine Reading Comprehension with General Reading Strategies | Code | 0 |
| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0 |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0 |
| Context Modeling with Evidence Filter for Multiple Choice Question Answering | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0 |
| Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | | 0 |
| HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| How well do LLMs reason over tabular data, really? | | 0 |
| How Susceptible are LLMs to Influence in Prompts? | | 0 |
| How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels | | 0 |
| A statistical model for aggregating judgments by incorporating peer predictions | | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |

No leaderboard results yet.