SOTAVerified

Benchmarking

Papers

Showing 3901–3925 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing | | 0 |
| Surface Reconstruction from Point Clouds: A Survey and a Benchmark | | 0 |
| Creating a Forensic Database of Shoeprints from Online Shoe Tread Photos | Code | 1 |
| On Continual Model Refinement in Out-of-Distribution Data Streams | | 0 |
| Training Mixed-Domain Translation Models via Federated Learning | | 0 |
| To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | Code | 0 |
| MMCoQA: Conversational Question Answering over Text, Tables, and Images | Code | 0 |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | Code | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0 |
| Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection | Code | 0 |
| Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Code | 1 |
| Answer Consolidation: Formulation and Benchmarking | Code | 0 |
| Watts: Infrastructure for Open-Ended Learning | Code | 0 |
| A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Code | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0 |
| Label Anchored Contrastive Learning for Language Understanding | | 0 |
| Deeper Insights into the Robustness of ViTs towards Common Corruptions | | 0 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0 |
| Transformation-Interaction-Rational Representation for Symbolic Regression | Code | 0 |
| A global analysis of metrics used for measuring performance in natural language processing | Code | 1 |
| MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0 |
| Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |