| Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | Jul 1, 2022 | Benchmarking, Natural Language Understanding | Unverified | 0 |
| HATE-ITA: New Baselines for Hate Speech Detection in Italian | Jul 1, 2022 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Benchmarking Intersectional Biases in NLP | Jul 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features | Jul 1, 2022 | Benchmarking, Sentence | Unverified | 0 |
| Local manifold learning and its link to domain-based physics knowledge | Jul 1, 2022 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | Jul 1, 2022 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Jul 1, 2022 | Benchmarking, Classification | Code Available | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | Jul 1, 2022 | Benchmarking | Unverified | 0 |
| Computer-aided diagnosis and prediction in brain disorders | Jun 29, 2022 | Benchmarking, Decision Making | Unverified | 0 |
| An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Jun 29, 2022 | Benchmarking | Code Available | 0 |
| Toward an ImageNet Library of Functions for Global Optimization Benchmarking | Jun 27, 2022 | Benchmarking, global-optimization | Unverified | 0 |
| VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse | Jun 23, 2022 | Benchmarking, Indoor Scene Synthesis | Code Available | 0 |
| Beyond Uniform Lipschitz Condition in Differentially Private Optimization | Jun 21, 2022 | Benchmarking, regression | Unverified | 0 |
| BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Jun 21, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Jun 20, 2022 | Benchmarking, Fraud Detection | Code Available | 0 |
| Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking | Jun 18, 2022 | Benchmarking, image-classification | Unverified | 0 |
| Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Jun 18, 2022 | Benchmarking, Fairness | Code Available | 0 |
| Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | Jun 17, 2022 | Benchmarking, Depth Estimation | Unverified | 0 |
| Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Jun 16, 2022 | Benchmarking, Clustering | Code Available | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | Jun 16, 2022 | Benchmarking, Density Estimation | Unverified | 0 |
| SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Jun 16, 2022 | Benchmarking, Dynamic neural networks | Code Available | 0 |
| Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | Jun 16, 2022 | Benchmarking, Feature Importance | Unverified | 0 |
| Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | Jun 16, 2022 | Benchmarking, Language Modeling | Unverified | 0 |
| BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | Jun 13, 2022 | Benchmarking | Unverified | 0 |
| EmProx: Neural Network Performance Estimation For Neural Architecture Search | Jun 13, 2022 | Benchmarking, Decoder | Code Available | 0 |
| CodeS: Towards Code Model Generalization Under Distribution Shift | Jun 11, 2022 | Benchmarking, Code Classification | Code Available | 0 |
| SAIBench: Benchmarking AI for Science | Jun 11, 2022 | Benchmarking, Friction | Unverified | 0 |
| Functional Code Building Genetic Programming | Jun 9, 2022 | Benchmarking, Program Synthesis | Unverified | 0 |
| FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | Jun 8, 2022 | Benchmarking, Federated Learning | Unverified | 0 |
| Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | Jun 8, 2022 | Benchmarking, Open-Ended Question Answering | Unverified | 0 |
| Scaling laws in global corporations as a benchmarking approach to assess environmental performance | Jun 7, 2022 | Benchmarking, Open-Ended Question Answering | Unverified | 0 |
| MorisienMT: A Dataset for Mauritian Creole Machine Translation | Jun 6, 2022 | Benchmarking, Machine Translation | Unverified | 0 |
| Which models are innately best at uncertainty estimation? | Jun 5, 2022 | Benchmarking, Out-of-Distribution Detection | Unverified | 0 |
| Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Jun 2, 2022 | Benchmarking | Code Available | 0 |
| Evaluation of Three Welsh Language POS Taggers | Jun 1, 2022 | Benchmarking, POS | Unverified | 0 |
| Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | Jun 1, 2022 | Benchmarking, Binary Classification | Unverified | 0 |
| Deep One-Class Hate Speech Detection Model | Jun 1, 2022 | Benchmarking, Binary Classification | Unverified | 0 |
| Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | Jun 1, 2022 | Benchmarking, Low Resource Neural Machine Translation | Unverified | 0 |
| A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | Jun 1, 2022 | 16k, Benchmarking | Unverified | 0 |
| MTLens: Machine Translation Output Debugging | Jun 1, 2022 | Benchmarking, Machine Translation | Unverified | 0 |
| Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems | May 31, 2022 | Benchmarking | Unverified | 0 |
| NEWTS: A Corpus for News Topic-Focused Summarization | May 31, 2022 | Benchmarking, Text Summarization | Unverified | 0 |
| bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | May 30, 2022 | Benchmarking | Code Available | 0 |
| AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | May 30, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Benchmarking Unsupervised Anomaly Detection and Localization | May 30, 2022 | Anomaly Detection, Benchmarking | Unverified | 0 |
| A Framework for Generating Informative Benchmark Instances | May 29, 2022 | Benchmarking | Code Available | 0 |
| Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | May 27, 2022 | Benchmarking, Dataset Generation | Code Available | 0 |
| Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder | May 26, 2022 | Benchmarking, Deep Learning | Unverified | 0 |
| Large Language Models are Few-Shot Clinical Information Extractors | May 25, 2022 | Benchmarking, coreference-resolution | Unverified | 0 |