SOTAVerified

Multiple-choice

Papers

Showing 501–525 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| A statistical model for aggregating judgments by incorporating peer predictions | | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0 |
| Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | | 0 |
| Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | | 0 |
| Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0 |
| IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | | 0 |
| IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | | 0 |
| Confidence-Aware Learning Assistant | | 0 |
| HindiLLM: Large Language Model for Hindi | | 0 |
| Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | | 0 |
| Comparative Study of Learning Outcomes for Online Learning Platforms | | 0 |
| HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | | 0 |
| Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | | 0 |
| Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | | 0 |
| An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System | | 0 |
| Combining Multiple Cues for Visual Madlibs Question Answering | | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0 |
| HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | | 0 |
| Combinatorial framework for planning in geological exploration | | 0 |
| Assessing Distractors in Multiple-Choice Tests | | 0 |
| HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing | | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | | 0 |
Page 21 of 45

No leaderboard results yet.