SOTAVerified

Benchmarking

Papers

Showing 2511-2520 of 5548 papers

Title | Status | Hype
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
Privacy-Preserving Language Model Inference with Instance Obfuscation | - | 0
BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | - | 0
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | - | 0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | - | 0
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0
Explainable Global Wildfire Prediction Models using Graph Neural Networks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified