SOTAVerified

Benchmarking

Papers

Showing 4276–4300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | Code | 0 |
| MMCoQA: Conversational Question Answering over Text, Tables, and Images | Code | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0 |
| To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | Code | 0 |
| Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection | Code | 0 |
| Answer Consolidation: Formulation and Benchmarking | Code | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| Watts: Infrastructure for Open-Ended Learning | Code | 0 |
| A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0 |
| Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0 |
| Deeper Insights into the Robustness of ViTs towards Common Corruptions | | 0 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0 |
| Label Anchored Contrastive Learning for Language Understanding | | 0 |
| Transformation-Interaction-Rational Representation for Symbolic Regression | Code | 0 |
| MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0 |
| Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research | | 0 |
| Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations | | 0 |
| Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms | | 0 |
| Label Efficient Regularization and Propagation for Graph Node Classification | | 0 |
| Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Code | 0 |
| Benchmarking Domain Generalization on EEG-based Emotion Recognition | | 0 |
| SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0 |
| From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |