SOTAVerified

Multiple-choice

Papers

Showing 251-275 of 1107 papers

Title | Status | Hype
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing | - | 0
A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Code | 0
LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | - | 0
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Code | 0
Neptune: The Long Orbit to Benchmarking Long Video Understanding | Code | 2
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Code | 1
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0
Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Code | 0
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising | - | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | - | 0
Establishing Task Scaling Laws via Compute-Efficient Model Ladders | - | 0
GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | - | 0
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Code | 1
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages | Code | 1
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Code | 0
The use of large language models to enhance cancer clinical trial educational materials | - | 0
Unlocking Video-LLM via Agent-of-Thoughts Distillation | - | 0
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | - | 0
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information | Code | 1
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Code | 0
Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | - | 0
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | - | 0
Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | - | 0
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | - | 0

No leaderboard results yet.