Multiple-choice

Papers


Showing 401–425 of 1107 papers

Title	Date	Tasks	Status
LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Feb 16, 2025	Analogical questions, In-Context Learning	Unverified
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models	Feb 14, 2025	Image Captioning, Large Language Model	Unverified
Truth Knows No Language: Evaluating Truthfulness Beyond English	Feb 13, 2025	Informativeness, Machine Translation	Code Available
Objective quantification of mood states using large language models	Feb 13, 2025	Multiple-choice	Unverified
A Semantic Parsing Algorithm to Solve Linear Ordering Problems	Feb 12, 2025	Multiple-choice, Semantic Parsing	Unverified
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models	Feb 12, 2025	Fairness, Multiple-choice	Unverified
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs	Feb 12, 2025	Multiple-choice, Survey	Unverified
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian	Feb 11, 2025	Multiple-choice	Unverified
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark	Feb 10, 2025	MMLU, Morphological Analysis	Unverified
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models	Feb 9, 2025	Answer Generation, Language Modeling	Code Available
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning	Feb 8, 2025	Legal Reasoning, Multiple-choice	Code Available
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning	Feb 7, 2025	Multiple-choice, Question Answering	Code Available
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs	Feb 6, 2025	Multiple-choice, Sensitivity	Unverified
LLMs to Support a Domain Specific Knowledge Assistant	Feb 6, 2025	Chatbot, Multiple-choice	Unverified
Evalita-LLM: Benchmarking Large Language Models on Italian	Feb 4, 2025	Benchmarking, Multiple-choice	Unverified
The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts	Feb 3, 2025	Multiple-choice, Reading Comprehension	Unverified
CoddLLM: Empowering Large Language Models for Data Analytics	Feb 1, 2025	Multiple-choice, Synthetic Data Generation	Unverified
InnerThoughts: Disentangling Representations and Predictions in Large Language Models	Jan 29, 2025	Multiple-choice, Position	Unverified
Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction	Jan 28, 2025	Logical Reasoning, Multiple-choice	Unverified
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection	Jan 28, 2025	Multiple-choice	Unverified
Attribution analysis of legal language as used by LLM	Jan 28, 2025	Binary Classification, Multiple-choice	Unverified
Options-Aware Dense Retrieval for Multiple-Choice query Answering	Jan 27, 2025	Multiple-choice, Question Answering	Unverified
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI	Jan 26, 2025	MMLU, Multiple-choice	Unverified
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion	Jan 25, 2025	Multiple-choice, Reading Comprehension	Unverified
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Jan 25, 2025	Information Retrieval, Multiple-choice	Unverified

