Multiple-choice

Papers

Showing 351–400 of 1107 papers

Title	Date	Tasks	Status	Hype
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	Chatbot, GSM8K	Unverified	0
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation	Sep 24, 2024	Multiple-choice, Sentence	Unverified	0
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	Benchmarking, Multiple-choice	Code Available	1
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation	Sep 23, 2024	Multiple-choice, Question Answering	Unverified	0
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions	Sep 22, 2024	Band Gap, In-Context Learning	Unverified	0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling	Sep 21, 2024	Multiple-choice, Prompt Engineering	Code Available	0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Sep 20, 2024	Multiple-choice, Question Answering	Unverified	0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination	Sep 19, 2024	General Knowledge, MMLU	Unverified	0
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights	Sep 19, 2024	Decision Making, Knowledge Distillation	Unverified	0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models	Sep 19, 2024	Ethics, Multiple-choice	Code Available	0
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do	Sep 17, 2024	Language Modeling, Language Modelling	Unverified	0
Annealed Winner-Takes-All for Motion Forecasting	Sep 17, 2024	All, Autonomous Driving	Code Available	1
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia	Sep 13, 2024	Math, Multiple-choice	Unverified	0
Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement	Sep 10, 2024	Multiple-choice, Sentence	Unverified	0
Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach	Sep 9, 2024	Computational Efficiency, Continual Pretraining	Code Available	0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes	Sep 6, 2024	Multiple-choice, Question Answering	Code Available	0
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models	Sep 5, 2024	Multiple-choice	Unverified	0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8K, Math	Code Available	2
Training on the Benchmark Is Not All You Need	Sep 3, 2024	All, Multiple-choice	Code Available	1
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?	Sep 3, 2024	Multiple-choice, Question Generation	Unverified	0
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning	Aug 30, 2024	Causal Language Modeling, Continual Learning	Unverified	0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options	Aug 27, 2024	Decision Making, Multiple-choice	Code Available	0
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering	Aug 27, 2024	Multiple-choice, Protein Folding	Code Available	1
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models	Aug 25, 2024	Language Modeling, Language Modelling	Code Available	0
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	Disentanglement, Knowledge Tracing	Code Available	1
Towards Evaluating and Building Versatile Large Language Models for Medicine	Aug 22, 2024	Multiple-choice, named-entity-recognition	Code Available	2
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations	Aug 22, 2024	Multiple-choice	Unverified	0
Differentiating Choices via Commonality for Multiple-Choice Question Answering	Aug 21, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Code Available	0
How Susceptible are LLMs to Influence in Prompts?	Aug 17, 2024	Multiple-choice, Question Answering	Unverified	0
Measuring Agreeableness Bias in Multimodal Models	Aug 17, 2024	Decision Making, Multiple-choice	Code Available	0
Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation	Aug 16, 2024	Distractor Generation, Multiple-choice	Code Available	0
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs	Aug 16, 2024	Instruction Following, Multiple-choice	Code Available	1
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil	Aug 9, 2024	Math, Multiple-choice	Unverified	0
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	0
Winning Amazon KDD Cup'24	Aug 5, 2024	Data Augmentation, Multiple-choice	Unverified	0
XMainframe: A Large Language Model for Mainframe Modernization	Aug 5, 2024	Code Summarization, Language Modeling	Code Available	2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models	Aug 5, 2024	Image Comprehension, Multiple-choice	Code Available	2
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets	Aug 4, 2024	Few-Shot Learning, Machine Reading Comprehension	Unverified	0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	Hallucination, Multiple-choice	Code Available	12
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models	Aug 2, 2024	Multimodal Reasoning, Multiple-choice	Code Available	3
Improved Few-Shot Image Classification Through Multiple-Choice Questions	Jul 23, 2024	Articles, Few-Shot Image Classification	Unverified	0
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models	Jul 23, 2024	Language Modelling, Large Language Model	Unverified	0
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity	Jul 22, 2024	Diversity, Multiple-choice	Code Available	2
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing	Jul 22, 2024	All, Diversity	Code Available	1
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding	Jul 22, 2024	Multiple-choice, Question Answering	Code Available	2
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions	Jul 21, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified	0
MIBench: Evaluating Multimodal Large Language Models over Multiple Images	Jul 21, 2024	In-Context Learning, Multiple-choice	Unverified	0
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment	Jul 20, 2024	Contrastive Learning, Multiple-choice	Code Available	0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data	Jul 20, 2024	Language Modelling, Machine Translation	Unverified	0
Evaluating language models as risk scores	Jul 19, 2024	Multiple-choice, Question Answering	Code Available	1
