SOTAVerified

Multiple-choice

Papers

Showing 476–500 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | | 0 |
| HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | | 0 |
| A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | | 0 |
| Eliciting Categorical Data for Optimal Aggregation | | 0 |
| Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents | | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | | 0 |
| HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0 |
| Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | | 0 |
| Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | | 0 |
| HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | | 0 |
| Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | | 0 |
| HindiLLM: Large Language Model for Hindi | | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | | 0 |
| A Novel Approach for Constrained Optimization in Graphical Models | | 0 |
| AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark | | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |
| Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0 |
| Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | | 0 |
| How well do LLMs reason over tabular data, really? | | 0 |
| E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0 |
Page 20 of 45

No leaderboard results yet.