SOTAVerified

Multiple-choice

Papers

Showing 151–175 of 1107 papers

Title | Status | Hype
FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Code | 1
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Code | 1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
An MRC Framework for Semantic Role Labeling | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Evaluating language models as risk scores | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models | Code | 1
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1
From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap | Code | 1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1
GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Code | 1
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
Page 7 of 45

No leaderboard results yet.