| Chaos as an interpretable benchmark for forecasting and data-driven modelling | Oct 11, 2021 | Benchmarking, Symbolic Regression | Code Available | 1 |
| SERAB: A multi-lingual benchmark for speech emotion recognition | Oct 7, 2021 | Benchmarking, Emotion Recognition | Code Available | 1 |
| EntQA: Entity Linking as Question Answering | Oct 5, 2021 | Benchmarking, Entity Linking | Code Available | 1 |
| Revisiting Self-Training for Few-Shot Learning of Language Model | Oct 4, 2021 | Benchmarking, Few-Shot Learning | Code Available | 1 |
| Machine Learning with Knowledge Constraints for Process Optimization of Open-Air Perovskite Solar Cell Manufacturing | Oct 1, 2021 | Bayesian Optimization, Benchmarking | Code Available | 1 |
| Phonetic Word Embeddings | Sep 30, 2021 | Benchmarking, Word Embeddings | Code Available | 1 |
| MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation | Sep 29, 2021 | Benchmarking, Philosophy | Code Available | 1 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Sep 29, 2021 | Benchmarking, Dynamic Link Prediction | Code Available | 1 |
| "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations | Sep 28, 2021 | Benchmarking, Dialogue State Tracking | Code Available | 1 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Sep 27, 2021 | Benchmarking, Natural Language Understanding | Code Available | 1 |
| PASS: An ImageNet replacement for self-supervised pretraining without humans | Sep 27, 2021 | Benchmarking, Ethics | Code Available | 1 |
| Disentangled Feature Representation for Few-shot Image Classification | Sep 26, 2021 | Benchmarking, Classification | Code Available | 1 |
| Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Sep 23, 2021 | Benchmarking, Response Generation | Code Available | 1 |
| SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking | Sep 21, 2021 | Benchmarking | Code Available | 1 |
| AI Accelerator Survey and Trends | Sep 18, 2021 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Sep 18, 2021 | Benchmarking, Complex Query Answering | Code Available | 1 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Sep 16, 2021 | Benchmarking, Knowledge Base Population | Code Available | 1 |
| OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Sep 16, 2021 | 3D Object Detection, Benchmarking | Code Available | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Sep 14, 2021 | Benchmarking | Code Available | 1 |
| RobustART: Benchmarking Robustness on Architecture Design and Training Techniques | Sep 11, 2021 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Sep 6, 2021 | Benchmarking | Code Available | 1 |
| Scikit-dimension: a Python package for intrinsic dimension estimation | Sep 6, 2021 | Benchmarking | Code Available | 1 |
| Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Sep 3, 2021 | Benchmarking, Data-to-Text Generation | Code Available | 1 |
| ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues | Sep 1, 2021 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Semi-Supervised Exaggeration Detection of Health Science Press Releases | Aug 30, 2021 | Articles, Benchmarking | Code Available | 1 |
| Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification | Aug 30, 2021 | Benchmarking, image-classification | Code Available | 1 |
| KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning | Aug 29, 2021 | Benchmarking, Decoder | Code Available | 1 |
| Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution | Aug 29, 2021 | Benchmarking | Code Available | 1 |
| Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing | Aug 27, 2021 | Benchmarking, Data Augmentation | Code Available | 1 |
| A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games | Aug 25, 2021 | Benchmarking, Video Classification | Code Available | 1 |
| Generative Wind Power Curve Modeling Via Machine Vision: A Self-learning Deep Convolutional Network Based Method | Aug 19, 2021 | Benchmarking, Synthetic Data Generation | Code Available | 1 |
| SSH: A Self-Supervised Framework for Image Harmonization | Aug 15, 2021 | Benchmarking, Data Augmentation | Code Available | 1 |
| A Dataset for Answering Time-Sensitive Questions | Aug 13, 2021 | Benchmarking | Code Available | 1 |
| Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate | Aug 12, 2021 | Benchmarking | Code Available | 1 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | Benchmarking, Medical Image Analysis | Code Available | 1 |
| Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach | Aug 5, 2021 | Benchmarking | Code Available | 1 |
| CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms | Aug 2, 2021 | Benchmarking, counterfactual | Code Available | 1 |
| Quantum machine learning of large datasets using randomized measurements | Aug 2, 2021 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| Benchmarking: Past, Present and Future | Aug 1, 2021 | Benchmarking, Reading Comprehension | Code Available | 1 |
| Contemporary Symbolic Regression Methods and their Relative Performance | Jul 29, 2021 | Benchmarking, parameter estimation | Code Available | 1 |
| A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Jul 15, 2021 | Benchmarking | Code Available | 1 |
| Hierarchical graph neural nets can capture long-range interactions | Jul 15, 2021 | Benchmarking, Molecular Property Prediction | Code Available | 1 |
| Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers | Jul 14, 2021 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition | Jul 12, 2021 | Benchmarking, Chinese Named Entity Recognition | Code Available | 1 |
| Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Jul 9, 2021 | Benchmarking, Document Classification | Code Available | 1 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 |
| The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification | Jul 5, 2021 | Benchmarking, Brain Tumor Segmentation | Code Available | 1 |
| Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning | Jul 2, 2021 | Benchmarking, Causal Discovery | Code Available | 1 |
| Benchmarking Knowledge-driven Zero-shot Learning | Jun 29, 2021 | Attribute, Benchmarking | Code Available | 1 |
| Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems | Jun 28, 2021 | 3D Reconstruction, Benchmarking | Code Available | 1 |