SOTAVerified

Benchmarking

Papers

Showing 1461–1470 of 5548 papers

Title | Status | Hype
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
Benchmarking Graph Neural Networks for FMRI analysis | Code | 1
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified