SOTAVerified

Benchmarking

Papers

Showing 691-700 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1 |
| BeHonest: Benchmarking Honesty in Large Language Models | Code | 1 |
| Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Code | 1 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Code | 1 |
| A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking | Code | 1 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Code | 1 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1 |
| LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data | Code | 1 |
| SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution | Code | 1 |
| SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |