Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 1107 papers

Title	Date	Tasks	Status	Hype
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	HallucinationMultiple-choice	CodeCode Available	12
HealthBench: Evaluating Large Language Models Towards Improved Human Health	May 13, 2025	Instruction FollowingMultiple-choice	CodeCode Available	7
MMBench: Is Your Multi-modal Model an All-around Player?	Jul 12, 2023	AllInstruction Following	CodeCode Available	5
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks	Dec 19, 2024	8kIn-Context Learning	CodeCode Available	5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choiceQuestion Answering	CodeCode Available	5
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective	Oct 16, 2022	Coreference ResolutionMultiple-choice	CodeCode Available	4
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	May 20, 2025	MMEMultiple-choice	CodeCode Available	4
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning	Mar 5, 2024	Multiple-choice	CodeCode Available	4
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	Jan 30, 2023	Generative Visual Question AnsweringImage Captioning	CodeCode Available	4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench	Jan 31, 2024	BenchmarkingMultiple-choice	CodeCode Available	4
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text	Mar 27, 2024	ArticlesLanguage Modeling	CodeCode Available	4
Flamingo: a Visual Language Model for Few-Shot Learning	Apr 29, 2022	Few-Shot LearningGenerative Visual Question Answering	CodeCode Available	4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	Nov 16, 2023	Language ModelingLanguage Modelling	CodeCode Available	4
SEED-Bench: Benchmarking Multimodal Large Language Models	Jan 1, 2024	BenchmarkingImage Generation	CodeCode Available	3
Fine-Tuning Language Models with Just Forward Passes	May 27, 2023	GPUIn-Context Learning	CodeCode Available	3
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models	May 15, 2023	Multiple-choice	CodeCode Available	3
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Apr 8, 2024	GPUMultiple-choice	CodeCode Available	3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models	Mar 26, 2024	Code CompletionFew-Shot Learning	CodeCode Available	3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets	Oct 26, 2024	ChatbotDiversity	CodeCode Available	3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset	May 17, 2024	16kBenchmarking	CodeCode Available	3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models	Aug 2, 2024	Multimodal ReasoningMultiple-choice	CodeCode Available	3
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Apr 25, 2024	BenchmarkingMultiple-choice	CodeCode Available	3
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models	Aug 19, 2023	Multiple-choice	CodeCode Available	2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models	Aug 5, 2024	Image ComprehensionMultiple-choice	CodeCode Available	2

Show:10 25 50

← PrevPage 1 of 45Next →

No leaderboard results yet.