SOTAVerified

Multiple-choice

Papers

Showing 601-625 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Narrative Embedding: Re-Contextualization Through Attention | | 0 |
| VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0 |
| NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | | 0 |
| AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark | | 0 |
| Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis | | 0 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | | 0 |
| VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | | 0 |
| Video Instruction Tuning With Synthetic Data | | 0 |
| None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering | | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | | 0 |
| No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment | | 0 |
| Note on Combinatorial Engineering Frameworks for Hierarchical Modular Systems | | 0 |
| Note on Evolution and Forecasting of Requirements: Communications Example | | 0 |
| Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | | 0 |
| Objective quantification of mood states using large language models | | 0 |
| OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | | 0 |
| OLMES: A Standard for Language Model Evaluations | | 0 |
| OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | | 0 |
| Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns | | 0 |
| On the application of Transformers for estimating the difficulty of Multiple-Choice Questions from text | | 0 |
| On the Performance of Multimodal Language Models | | 0 |
| On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models | | 0 |
| On the Reasoning Capacity of AI Models and How to Quantify It | | 0 |

No leaderboard results yet.