SOTAVerified

Benchmarking

Papers

Showing 3901–3950 of 5548 papers

Title | Status | Hype
Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing | | 0
Surface Reconstruction from Point Clouds: A Survey and a Benchmark | | 0
Creating a Forensic Database of Shoeprints from Online Shoe Tread Photos | Code | 1
On Continual Model Refinement in Out-of-Distribution Data Streams | | 0
Training Mixed-Domain Translation Models via Federated Learning | | 0
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | Code | 0
MMCoQA: Conversational Question Answering over Text, Tables, and Images | Code | 0
MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | Code | 0
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection | Code | 0
Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Code | 1
Answer Consolidation: Formulation and Benchmarking | Code | 0
Watts: Infrastructure for Open-Ended Learning | Code | 0
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0
Foundations for learning from noisy quantum experiments | | 0
Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0
Label Anchored Contrastive Learning for Language Understanding | | 0
Deeper Insights into the Robustness of ViTs towards Common Corruptions | | 0
Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0
Transformation-Interaction-Rational Representation for Symbolic Regression | Code | 0
A global analysis of metrics used for measuring performance in natural language processing | Code | 1
MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0
Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research | | 0
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations | | 0
Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms | | 0
K-LITE: Learning Transferable Visual Models with External Knowledge | Code | 2
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Code | 0
Label Efficient Regularization and Propagation for Graph Node Classification | | 0
Benchmarking Domain Generalization on EEG-based Emotion Recognition | | 0
NICO++: Towards Better Benchmarking for Domain Generalization | Code | 1
Stress-Testing Point Cloud Registration on Automotive LiDAR | Code | 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | Code | 3
Deep learning model solves change point detection for multiple change types | Code | 1
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0
From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | | 0
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization | Code | 1
Benchmarking Active Learning Strategies for Materials Optimization and Discovery | | 0
EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR Data | | 0
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Code | 0
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1
Metaethical Perspectives on 'Benchmarking' AI Ethics | | 0
Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model | | 0
BioRED: A Rich Biomedical Relation Extraction Dataset | Code | 1
Disability prediction in multiple sclerosis using performance outcome measures and demographic data | | 0
tmVar 3.0: an improved variant concept recognition and normalization tool | | 0
Deep Visual Geo-localization Benchmark | Code | 2
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems | Code | 1
CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems | Code | 0
Page 79 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified