SOTAVerified

Multiple-choice

Papers

Showing 31-40 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Code | 0 |
| TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine | | 0 |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | | 0 |
| SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | Code | 0 |
| VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Code | 2 |
| Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs | | 0 |
| Large Language Models Often Know When They Are Being Evaluated | | 0 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0 |

No leaderboard results yet.