SOTAVerified

Benchmarking

Papers

Showing 4151-4200 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| CLMB: deep contrastive learning for robust metagenomic binning | Code | 0 |
| Benchmarking and scaling of deep learning models for land cover image classification | Code | 1 |
| Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms | | 0 |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | | 0 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | | 0 |
| Mukayese: Turkish NLP Strikes Back | | 0 |
| CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms | Code | 3 |
| Multiclass Optimal Classification Trees with SVM-splits | | 0 |
| Benchmarking deep generative models for diverse antibody sequence design | | 0 |
| ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects | | 0 |
| Bi-Discriminator Class-Conditional Tabular GAN | | 0 |
| MLHarness: A Scalable Benchmarking System for MLCommons | | 0 |
| Which priors matter? Benchmarking models for learning latent dynamics | Code | 1 |
| EvoLearner: Learning Description Logics with Evolutionary Algorithms | Code | 0 |
| Practical, Fast and Robust Point Cloud Registration for 3D Scene Stitching and Object Localization | | 0 |
| Characterizing the adversarial vulnerability of speech self-supervised learning | | 0 |
| Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning | Code | 1 |
| Personalized Benchmarking with the Ludwig Benchmarking Toolkit | Code | 3 |
| IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics | Code | 1 |
| Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Code | 1 |
| A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Code | 0 |
| Benchmarking Multimodal AutoML for Tabular Data with Text Fields | Code | 3 |
| B-Pref: Benchmarking Preference-Based Reinforcement Learning | Code | 1 |
| OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion | Code | 1 |
| Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies | | 0 |
| Virus-MNIST: Machine Learning Baseline Calculations for Image Classification | | 0 |
| Procedural Generalization by Planning with Self-Supervised World Models | | 0 |
| Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1 |
| Constructing a Psychometric Testbed for Fair Natural Language Processing | Code | 0 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1 |
| Automatic Resolution of Domain Name Disputes | Code | 0 |
| Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains | Code | 0 |
| OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets | Code | 1 |
| AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling | Code | 1 |
| Livestock Monitoring with Transformer | | 0 |
| Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation | Code | 0 |
| Towards a Taxonomy of Graph Learning Datasets | | 0 |
| FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation | Code | 1 |
| Quantum Boosting using Domain-Partitioning Hypotheses | Code | 0 |
| Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0 |
| Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0 |
| Scientific Machine Learning Benchmarks | | 0 |
| Benchmarking of Lightweight Deep Learning Architectures for Skin Cancer Classification using ISIC 2017 Dataset | | 0 |
| Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations | Code | 1 |
| MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | | 0 |
| OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis | Code | 1 |
| Text-Based Person Search with Limited Data | Code | 1 |
| Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0 |
| An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C) | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |