SOTAVerified

Multiple-choice

Papers

Showing 676-700 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Rethinking AI Cultural Alignment | | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0 |
| Reusing Swedish FrameNet for training semantic roles | | 0 |
| Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | | 0 |
| RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge | | 0 |
| RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation | | 0 |
| R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | | 0 |
| Robust portfolio optimization model for electronic coupon allocation | | 0 |
| Visual Madlibs: Fill in the blank Image Generation and Question Answering | | 0 |
| SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | | 0 |
| Adversarial Training for Machine Reading Comprehension with Virtual Embeddings | | 0 |
| SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | | 0 |
| Visual Question Answering as Reading Comprehension | | 0 |
| Adversarial Databases Improve Success in Retrieval-based Large Language Models | | 0 |
| SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search | | 0 |
| Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | | 0 |
| SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning | | 0 |
| SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia | | 0 |
| SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | | 0 |
| SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | | 0 |
| Scene Restoring for Narrative Machine Reading Comprehension | | 0 |
| Scheduling Algorithms for Federated Learning with Minimal Energy Consumption | | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | | 0 |
| GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | | 0 |
Page 28 of 45

No leaderboard results yet.