SOTAVerified

Benchmarking

Papers

Showing 4251–4260 of 5548 papers

Title | Status | Hype
Extensible Logging and Empirical Attainment Function for IOHexperimenter | - | 0
Context-guided Triple Matching for Multiple Choice Question Answering | - | 0
PASS: An ImageNet replacement for self-supervised pretraining without humans | Code | 1
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1
Disentangled Feature Representation for Few-shot Image Classification | Code | 1
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning | Code | 2
Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | - | 0
Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1
Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning | - | 0
Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified