SOTAVerified

Benchmarking

Papers

Showing 151-160 of 5548 papers

Title | Status | Hype
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models | | 0
FRED: The Florence RGB-Event Drone Dataset | | 0
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation | Code | 0
Refer to Anything with Vision-Language Prompts | | 0
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | | 0
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2
CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems | | 0
HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model | | 0
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified