SOTAVerified

Benchmarking

Papers

Showing 3271-3280 of 5548 papers

Title | Status | Hype
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | Code | 0
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | Code | 0
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration | Code | 1
Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | | 0
Barkour: Benchmarking Animal-level Agility with Quadruped Robots | | 0
GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking | Code | 0
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | | 0
LAraBench: Benchmarking Arabic AI with Large Language Models | | 0
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | Code | 0
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified