SOTAVerified

Multiple-choice

Papers

Showing 501–550 of 1107 papers

Title | Status | Hype
A statistical model for aggregating judgments by incorporating peer predictions | | 0
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | | 0
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | | 0
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | | 0
Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0
IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | | 0
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | | 0
Confidence-Aware Learning Assistant | | 0
HindiLLM: Large Language Model for Hindi | | 0
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | | 0
Comparative Study of Learning Outcomes for Online Learning Platforms | | 0
HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | | 0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | | 0
Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | | 0
An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System | | 0
Combining Multiple Cues for Visual Madlibs Question Answering | | 0
Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | | 0
Combinatorial framework for planning in geological exploration | | 0
Assessing Distractors in Multiple-Choice Tests | | 0
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing | | 0
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | | 0
Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment | | 0
An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers | | 0
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | | 0
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | | 0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | | 0
Collaboration among Multiple Large Language Models for Medical Question Answering | | 0
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0
Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | | 0
Graph-Structured Representations for Visual Question Answering | | 0
GraphITE: Estimating Individual Effects of Graph-structured Treatments | | 0
COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | | 0
GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | | 0
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | | 0
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | | 0
GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | | 0
GPT-4o System Card | | 0
CoddLLM: Empowering Large Language Models for Data Analytics | | 0
A Semantic Parsing Algorithm to Solve Linear Ordering Problems | | 0
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | | 0
Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | | 0
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | | 0
A Semantic Feature-Wise Transformation Relation Network for Automatic Short Answer Grading | | 0
An Add-On for Empowering Google Forms to be an Automatic Question Generator in Online Assessments | | 0
Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | | 0
GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model | | 0
Page 11 of 23

No leaderboard results yet.