SOTAVerified

Benchmarking

Papers

Showing 811-820 of 5548 papers

Title | Status | Hype
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets | Code | 1
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval | Code | 1
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving | Code | 1
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1
German Text Embedding Clustering Benchmark | Code | 1
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
Page 82 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified