| A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Sep 16, 2022 | Benchmarking, Deep Learning | Code Available | 1 |
| ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | Sep 16, 2022 | Benchmarking, Question Answering | Code Available | 1 |
| Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Sep 7, 2022 | Benchmarking | Code Available | 1 |
| nnOOD: A Framework for Benchmarking Self-supervised Anomaly Localisation Methods | Sep 2, 2022 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Structural Bias for Aspect Sentiment Triplet Extraction | Sep 2, 2022 | Aspect Sentiment Triplet Extraction, Benchmarking | Code Available | 1 |
| Benchmarking Compositionality with Formal Languages | Aug 17, 2022 | Benchmarking, Open-Ended Question Answering | Code Available | 1 |
| A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models | Aug 2, 2022 | Benchmarking, Synthetic Data Generation | Code Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | Benchmarking, Causal Discovery | Code Available | 1 |
| Accelerated and interpretable oblique random survival forests | Aug 1, 2022 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Tracking Every Thing in the Wild | Jul 26, 2022 | Benchmarking, Classification | Code Available | 1 |
| ArtFID: Quantitative Evaluation of Neural Style Transfer | Jul 25, 2022 | Benchmarking, Meta-Learning | Code Available | 1 |
| Physiology-based simulation of the retinal vasculature enables annotation-free segmentation of OCT angiographs | Jul 22, 2022 | Benchmarking, Retinal Vessel Segmentation | Code Available | 1 |
| ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Jul 19, 2022 | Benchmarking, Image Registration | Code Available | 1 |
| Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Jul 19, 2022 | Benchmarking, Photoplethysmography (PPG) beat detection | Code Available | 1 |
| Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Jul 19, 2022 | Benchmarking, Experimental Design | Code Available | 1 |
| Benchmarking Omni-Vision Representation through the Lens of Visual Realms | Jul 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 1 |
| TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs | Jul 11, 2022 | Benchmarking, Representation Learning | Code Available | 1 |
| Graph Generative Model for Benchmarking Graph Neural Networks | Jul 10, 2022 | Benchmarking, Graph Generation | Code Available | 1 |
| Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Jul 2, 2022 | Benchmarking, Machine Translation | Code Available | 1 |
| Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation | Jul 2, 2022 | Active Learning, Benchmarking | Code Available | 1 |
| DFGC 2022: The Second DeepFake Game Competition | Jun 30, 2022 | Benchmarking, Face Swapping | Code Available | 1 |
| Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Jun 30, 2022 | Benchmarking, Diagnostic | Code Available | 1 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Jun 29, 2022 | Benchmarking | Code Available | 1 |
| Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Jun 29, 2022 | Benchmarking, Diversity | Code Available | 1 |
| The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data | Jun 23, 2022 | Benchmarking | Code Available | 1 |
| GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Jun 22, 2022 | Benchmarking, Text Generation | Code Available | 1 |
| OpenXAI: Towards a Transparent Evaluation of Model Explanations | Jun 22, 2022 | Benchmarking, Explainable Artificial Intelligence (XAI) | Code Available | 1 |
| Benchmarking Constraint Inference in Inverse Reinforcement Learning | Jun 20, 2022 | Autonomous Driving, Benchmarking | Code Available | 1 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Jun 18, 2022 | Benchmarking, Graph Neural Network | Code Available | 1 |
| SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments | Jun 17, 2022 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| Long Range Graph Benchmark | Jun 16, 2022 | Benchmarking, Graph Classification | Code Available | 1 |
| Taxonomy of Benchmarks in Graph Representation Learning | Jun 15, 2022 | Benchmarking, Graph Representation Learning | Code Available | 1 |
| Evaluating histopathology transfer learning with ChampKit | Jun 14, 2022 | Benchmarking, Cell Detection | Code Available | 1 |
| ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Jun 14, 2022 | Benchmarking, Ischemic Stroke Lesion Segmentation | Code Available | 1 |
| Data-Driven Denoising of Stationary Accelerometer Signals | Jun 13, 2022 | Benchmarking, Denoising | Code Available | 1 |
| SwinCheX: Multi-label classification on chest X-ray images with transformers | Jun 9, 2022 | Benchmarking, Multi-Label Classification | Code Available | 1 |
| Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Jun 8, 2022 | Benchmarking, Explainable Artificial Intelligence (XAI) | Code Available | 1 |
| Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering | Jun 6, 2022 | Benchmarking, Clustering | Code Available | 1 |
| Revisiting the "Video" in Video-Language Understanding | Jun 3, 2022 | Benchmarking, Question Answering | Code Available | 1 |
| Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Jun 1, 2022 | Benchmarking | Code Available | 1 |
| Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Jun 1, 2022 | Benchmarking, Sentence | Code Available | 1 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Jun 1, 2022 | Benchmarking, Emotion Recognition | Code Available | 1 |
| Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | May 30, 2022 | 3D Object Detection, Autonomous Driving | Code Available | 1 |
| Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | May 27, 2022 | Benchmarking, Few-Shot Image Classification | Code Available | 1 |
| Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed | May 27, 2022 | Benchmarking, Binary Classification | Code Available | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | May 27, 2022 | Benchmarking, Domain Generalization | Code Available | 1 |
| GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | May 25, 2022 | Benchmarking, Event Argument Extraction | Code Available | 1 |
| Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis | May 24, 2022 | Benchmarking, Federated Learning | Code Available | 1 |
| PyRelationAL: a python library for active learning research and development | May 23, 2022 | Active Learning, Benchmarking | Code Available | 1 |