SOTAVerified

Benchmarking

Papers

Showing 3841-3850 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | Code | 0 |
| Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | Code | 0 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | | 0 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |