SOTAVerified

Benchmarking

Papers

Showing 4771–4780 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| MineRL: A Large-Scale Dataset of Minecraft Demonstrations | Code | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Code | 0 |
| Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Code | 0 |
| Mirage: Model-Agnostic Graph Distillation for Graph Classification | Code | 0 |
| Benchmarking Subset Selection from Large Candidate Solution Sets in Evolutionary Multi-objective Optimization | Code | 0 |
| Sanity Simulations for Saliency Methods | Code | 0 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |