SOTAVerified

Benchmarking

Papers

Showing 2591-2600 of 5548 papers

Title | Status | Hype
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0
Aesthetic Image Captioning From Weakly-Labelled Photographs | Code | 0
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Code | 0
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
Fluorescence Reference Target Quantitative Analysis Library | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified