SOTAVerified

Multiple-choice

Papers

Showing 661-670 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | Code | 1 |
| BRAINTEASER: Lateral Thinking Puzzles for Large Language Models | Code | 1 |
| Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | | 0 |
| LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1 |
| On the Performance of Multimodal Language Models | | 0 |
| AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0 |
| Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0 |
| Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0 |
| Fusing Models with Complementary Expertise | Code | 0 |
| Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Code | 1 |

No leaderboard results yet.