SOTAVerified

Benchmarking

Papers

Showing 1911–1920 of 5548 papers

Title | Status | Hype
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias | Code | 0
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Code | 2
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Code | 1
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset | Code | 1
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Code | 0
Open foundation models for Azerbaijani language | | 0
Occlusion-Aware Seamless Segmentation | Code | 1
MIRAI: Evaluating LLM Agents for Event Forecasting | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified