SOTAVerified

Benchmarking

Papers

Showing 951-960 of 5548 papers

Title | Status | Hype
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Code | 1
Benchmarking Algorithms for Federated Domain Generalization | Code | 1
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Code | 1
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Code | 1
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Code | 1
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes | Code | 1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified