Multiple-choice

Papers


Showing 601–650 of 1107 papers

Title	Date	Tasks	Status	Hype
Instruction Fine-Tuning: Does Prompt Loss Matter?	Jan 24, 2024	Multiple-choice, token-classification	Unverified	0
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering	Jan 15, 2024	Multiple-choice, Question Answering	Code Available	0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings	Jan 15, 2024	Knowledge Graph Embeddings, Knowledge Graphs	Code Available	0
Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding	Jan 13, 2024	Multiple-choice, Prompt Engineering	Unverified	0
Automated Answer Validation using Text Similarity	Jan 13, 2024	Information Retrieval, Multiple-choice	Unverified	0
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities	Jan 13, 2024	Instruction Following, Multiple-choice	Unverified	0
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT	Jan 13, 2024	Distractor Generation, Multiple-choice	Code Available	0
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models	Jan 11, 2024	Math, Multiple-choice	Code Available	1
A Joint-Reasoning based Disease Q&A System	Jan 6, 2024	Knowledge Graphs, Misinformation	Unverified	0
SEED-Bench: Benchmarking Multimodal Large Language Models	Jan 1, 2024	Benchmarking, Image Generation	Code Available	3
The Earth is Flat? Unveiling Factual Errors in Large Language Models	Jan 1, 2024	In-Context Learning, Multiple-choice	Unverified	0
FusionMind -- Improving question and answering with external context fusion	Dec 31, 2023	Knowledge Graphs, Multiple-choice	Unverified	0
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security	Dec 26, 2023	Computer Security, Multiple-choice	Code Available	0
RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models	Dec 26, 2023	Memorization, Multiple-choice	Code Available	1
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses	Dec 26, 2023	Diversity, Knowledge Graphs	Code Available	1
Towards a Unified Multimodal Reasoning Framework	Dec 22, 2023	Multimodal Reasoning, Multiple-choice	Code Available	0
Perception Test 2023: A Summary of the First Challenge And Outcome	Dec 20, 2023	Benchmarking, Grounded Video Question Answering	Unverified	0
BloomVQA: Assessing Hierarchical Multi-modal Comprehension	Dec 20, 2023	Data Augmentation, Memorization	Unverified	0
Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions	Dec 18, 2023	Multiple-choice, Pedestrian Trajectory Prediction	Code Available	0
An In-depth Look at Gemini's Language Abilities	Dec 18, 2023	Instruction Following, Math	Code Available	1
Marathon: A Race Through the Realm of Long Context with Large Language Models	Dec 15, 2023	Long-Context Understanding, Multiple-choice	Code Available	1
Self-Evaluation Improves Selective Generation in Large Language Models	Dec 14, 2023	Multiple-choice, TruthfulQA	Unverified	0
A Foundational Multimodal Vision Language AI Assistant for Human Pathology	Dec 13, 2023	Decision Making, Diagnostic	Unverified	0
Steering Llama 2 via Contrastive Activation Addition	Dec 9, 2023	Multiple-choice	Code Available	2
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers	Dec 7, 2023	Math, Multiple-choice	Code Available	1
A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education	Dec 5, 2023	Multiple-choice	Unverified	0
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario	Dec 4, 2023	Language Modeling, Language Modelling	Unverified	0
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams	Dec 1, 2023	Multiple-choice	Code Available	0
Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension	Nov 30, 2023	Multiple-choice, Reading Comprehension	Unverified	0
Biomedical knowledge graph-optimized prompt generation for large language models	Nov 29, 2023	Benchmarking, Knowledge Graphs	Code Available	2
CLOMO: Counterfactual Logical Modification with Large Language Models	Nov 29, 2023	counterfactual, Counterfactual Reasoning	Code Available	0
SEED-Bench-2: Benchmarking Multimodal Large Language Models	Nov 28, 2023	Benchmarking, Image Generation	Code Available	2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA), Diagnostic	Code Available	2
GPQA: A Graduate-Level Google-Proof Q&A Benchmark	Nov 20, 2023	Multiple-choice	Code Available	2
Downstream Trade-offs of a Family of Text Watermarks	Nov 16, 2023	Form, Language Modelling	Code Available	0
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	Nov 16, 2023	Language Modeling, Language Modelling	Code Available	4
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology	Nov 16, 2023	MMLU, Multiple-choice	Unverified	0
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense Reasoning, MMLU	Unverified	0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset	Nov 14, 2023	Answer Selection, Information Retrieval	Unverified	0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning	Nov 13, 2023	Multiple-choice	Code Available	0
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models	Nov 10, 2023	GSM8K, Memorization	Code Available	1
Fake Alignment: Are LLMs Really Aligned Well?	Nov 10, 2023	Multiple-choice	Code Available	1
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choice, World Knowledge	Unverified	0
Assessing Distractors in Multiple-Choice Tests	Nov 8, 2023	Diversity, Multiple-choice	Unverified	0
Evaluating multiple large language models in pediatric ophthalmology	Nov 7, 2023	Multiple-choice	Unverified	0
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions	Nov 5, 2023	Logical Reasoning, Multiple-choice	Unverified	0
More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems	Nov 3, 2023	Multiple-choice	Unverified	0
CASE: Commonsense-Augmented Score with an Expanded Answer Space	Nov 3, 2023	Multiple-choice	Code Available	0
Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis	Nov 2, 2023	Density Estimation, Diversity	Code Available	1
An Open Source Data Contamination Report for Large Language Models	Oct 26, 2023	HellaSwag, Language Modeling	Code Available	1

