Multiple-choice

Papers

Title	Date	Tasks	Status	Hype
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models	Feb 12, 2025	Fairness, Multiple-choice	Unverified	0
A Semantic Parsing Algorithm to Solve Linear Ordering Problems	Feb 12, 2025	Multiple-choice, Semantic Parsing	Unverified	0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs	Feb 12, 2025	Multiple-choice, Survey	Unverified	0
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian	Feb 11, 2025	Multiple-choice	Unverified	0
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark	Feb 10, 2025	MMLU, Morphological Analysis	Unverified	0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models	Feb 9, 2025	Answer Generation, Language Modeling	Code Available	0
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning	Feb 8, 2025	Legal Reasoning, Multiple-choice	Code Available	0
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning	Feb 7, 2025	Multiple-choice, Question Answering	Code Available	0
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs	Feb 6, 2025	Multiple-choice, Sensitivity	Unverified	0
LLMs to Support a Domain Specific Knowledge Assistant	Feb 6, 2025	Chatbot, Multiple-choice	Unverified	0
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes	Feb 4, 2025	Autonomous Driving, Multiple-choice	Code Available	1
Evalita-LLM: Benchmarking Large Language Models on Italian	Feb 4, 2025	Benchmarking, Multiple-choice	Unverified	0
The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts	Feb 3, 2025	Multiple-choice, Reading Comprehension	Unverified	0
CoddLLM: Empowering Large Language Models for Data Analytics	Feb 1, 2025	Multiple-choice, Synthetic Data Generation	Unverified	0
InnerThoughts: Disentangling Representations and Predictions in Large Language Models	Jan 29, 2025	Multiple-choice, Position	Unverified	0
Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction	Jan 28, 2025	Logical Reasoning, Multiple-choice	Unverified	0
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection	Jan 28, 2025	Multiple-choice	Unverified	0
Attribution analysis of legal language as used by LLM	Jan 28, 2025	Binary Classification, Multiple-choice	Unverified	0
Options-Aware Dense Retrieval for Multiple-Choice query Answering	Jan 27, 2025	Multiple-choice, Question Answering	Unverified	0
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI	Jan 26, 2025	MMLU, Multiple-choice	Unverified	0
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Jan 25, 2025	Information Retrieval, Multiple-choice	Unverified	0
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion	Jan 25, 2025	Multiple-choice, Reading Comprehension	Unverified	0
Option-ID Based Elimination For Multiple Choice Questions	Jan 25, 2025	Multiple-choice	Code Available	0
Humanity's Last Exam	Jan 24, 2025	Humanity's Last Exam, Language Modeling	Unverified	0
Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources	Jan 23, 2025	Multiple-choice	Unverified	0

