SOTAVerified

Benchmarking

Papers

Showing 4251–4300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis | | 0 |
| Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization | | 0 |
| Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0 |
| Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets | Code | 0 |
| Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | | 0 |
| Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking | | 0 |
| Deep Learning vs. Gradient Boosting: Benchmarking state-of-the-art machine learning algorithms for credit scoring | | 0 |
| Self-Supervised Speech Representation Learning: A Review | | 0 |
| SNaC: Coherence Error Detection for Narrative Summarization | Code | 0 |
| Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies | | 0 |
| Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data | | 0 |
| Uncertainty estimation for Cross-dataset performance in Trajectory prediction | | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0 |
| Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages | | 0 |
| Individual Fairness Guarantees for Neural Networks | Code | 0 |
| Subspace Learning Machine (SLM): Methodology and Performance | | 0 |
| Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation | Code | 0 |
| LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents | | 0 |
| Assigning Species Information to Corresponding Genes by a Sequence Labeling Framework | Code | 0 |
| Design Target Achievement Index: A Differentiable Metric to Enhance Deep Generative Models in Multi-Objective Inverse Design | | 0 |
| VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution | | 0 |
| Surface Reconstruction from Point Clouds: A Survey and a Benchmark | | 0 |
| Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing | | 0 |
| On Continual Model Refinement in Out-of-Distribution Data Streams | | 0 |
| Training Mixed-Domain Translation Models via Federated Learning | | 0 |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | Code | 0 |
| MMCoQA: Conversational Question Answering over Text, Tables, and Images | Code | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0 |
| To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | Code | 0 |
| Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection | Code | 0 |
| Answer Consolidation: Formulation and Benchmarking | Code | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| Watts: Infrastructure for Open-Ended Learning | Code | 0 |
| A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0 |
| Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0 |
| Deeper Insights into the Robustness of ViTs towards Common Corruptions | | 0 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0 |
| Label Anchored Contrastive Learning for Language Understanding | | 0 |
| Transformation-Interaction-Rational Representation for Symbolic Regression | Code | 0 |
| MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0 |
| Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research | | 0 |
| Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations | | 0 |
| Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms | | 0 |
| Label Efficient Regularization and Propagation for Graph Node Classification | | 0 |
| Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Code | 0 |
| Benchmarking Domain Generalization on EEG-based Emotion Recognition | | 0 |
| SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0 |
| From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |