Multiple-choice

Papers


Showing 351–375 of 1107 papers

Title	Date	Tasks	Status	Hype
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	Chatbot, GSM8K	Unverified	0
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation	Sep 24, 2024	Multiple-choice, Sentence	Unverified	0
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	Benchmarking, Multiple-choice	Code Available	1
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation	Sep 23, 2024	Multiple-choice, Question Answering	Unverified	0
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions	Sep 22, 2024	Band Gap, In-Context Learning	Unverified	0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling	Sep 21, 2024	Multiple-choice, Prompt Engineering	Code Available	0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Sep 20, 2024	Multiple-choice, Question Answering	Unverified	0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination	Sep 19, 2024	General Knowledge, MMLU	Unverified	0
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights	Sep 19, 2024	Decision Making, Knowledge Distillation	Unverified	0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models	Sep 19, 2024	Ethics, Multiple-choice	Code Available	0
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do	Sep 17, 2024	Language Modeling, Language Modelling	Unverified	0
Annealed Winner-Takes-All for Motion Forecasting	Sep 17, 2024	All, Autonomous Driving	Code Available	1
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia	Sep 13, 2024	Math, Multiple-choice	Unverified	0
Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement	Sep 10, 2024	Multiple-choice, Sentence	Unverified	0
Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach	Sep 9, 2024	Computational Efficiency, Continual Pretraining	Code Available	0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes	Sep 6, 2024	Multiple-choice, Question Answering	Code Available	0
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models	Sep 5, 2024	Multiple-choice	Unverified	0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8K, Math	Code Available	2
Training on the Benchmark Is Not All You Need	Sep 3, 2024	All, Multiple-choice	Code Available	1
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?	Sep 3, 2024	Multiple-choice, Question Generation	Unverified	0
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning	Aug 30, 2024	Causal Language Modeling, Continual Learning	Unverified	0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options	Aug 27, 2024	Decision Making, Multiple-choice	Code Available	0
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering	Aug 27, 2024	Multiple-choice, Protein Folding	Code Available	1
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models	Aug 25, 2024	Language Modeling, Language Modelling	Code Available	0
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	Disentanglement, Knowledge Tracing	Code Available	1

