| UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | Apr 18, 2023 | Benchmarking, Instance Segmentation | Unverified | 0 |
| OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | Apr 17, 2023 | 3D Pose Estimation, Benchmarking | Unverified | 0 |
| Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Apr 17, 2023 | Benchmarking, Drift Detection | Code Available | 0 |
| Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | Apr 14, 2023 | Benchmarking | Unverified | 0 |
| Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation | Apr 11, 2023 | Benchmarking, Conversational Recommendation | Unverified | 0 |
| Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection | Apr 11, 2023 | Adversarial Attack, Adversarial Robustness | Unverified | 0 |
| OpenAGI: When LLM Meets Domain Experts | Apr 10, 2023 | Benchmarking, Natural Language Queries | Code Available | 4 |
| NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Apr 10, 2023 | Benchmarking | Code Available | 1 |
| Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Apr 10, 2023 | Benchmarking, speech-recognition | Code Available | 0 |
| On Evaluation of Bangla Word Analogies | Apr 10, 2023 | Benchmarking, Word Embeddings | Unverified | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Apr 10, 2023 | Benchmarking, Simultaneous Speech-to-Text Translation | Unverified | 0 |
| RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Apr 9, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 2 |
| ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | Apr 9, 2023 | Benchmarking, Deep Learning | Unverified | 0 |
| Benchmarking the Robustness of Quantized Models | Apr 8, 2023 | Benchmarking, Quantization | Unverified | 0 |
| SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data | Apr 8, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| Probing Conceptual Understanding of Large Visual-Language Models | Apr 7, 2023 | Benchmarking | Code Available | 0 |
| Interpretable statistical representations of neural population dynamics and geometry | Apr 6, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| Benchmarking Robustness to Text-Guided Corruptions | Apr 6, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images | Apr 5, 2023 | Benchmarking, Data Augmentation | Unverified | 0 |
| MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding | Apr 5, 2023 | Benchmarking, MS-SSIM | Code Available | 1 |
| LogoNet: a fine-grained network for instance-level logo sketch retrieval | Apr 5, 2023 | 2k, Benchmarking | Code Available | 0 |
| IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Apr 5, 2023 | Benchmarking | Code Available | 0 |
| The Saudi Privacy Policy Dataset | Apr 5, 2023 | Benchmarking | Code Available | 0 |
| OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI | Apr 4, 2023 | Benchmarking | Unverified | 0 |
| SLPerf: a Unified Framework for Benchmarking Split Learning | Apr 4, 2023 | Benchmarking, Diversity | Code Available | 1 |
| Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection | Apr 3, 2023 | Benchmarking, Sentence | Code Available | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Apr 3, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| Vision-Language Models for Vision Tasks: A Survey | Apr 3, 2023 | Benchmarking, Knowledge Distillation | Code Available | 4 |
| A Latent Fingerprint in the Wild Database | Apr 3, 2023 | Benchmarking | Unverified | 0 |
| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Apr 1, 2023 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 1 |
| A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Mar 31, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| What Makes for Effective Few-shot Point Cloud Classification? | Mar 31, 2023 | Benchmarking, Classification | Code Available | 1 |
| LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Mar 31, 2023 | Benchmarking, image-classification | Code Available | 0 |
| Benchmarking FedAvg and FedCurv for Image Classification Tasks | Mar 31, 2023 | Benchmarking, Classification | Unverified | 0 |
| Why is the winner the best? | Mar 30, 2023 | Benchmarking, Multi-Task Learning | Unverified | 0 |
| Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | Mar 30, 2023 | Benchmarking | Unverified | 0 |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Mar 30, 2023 | Attribute, Benchmarking | Code Available | 1 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Mar 30, 2023 | Benchmarking, Code Generation | Code Available | 5 |
| From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification | Mar 28, 2023 | Benchmarking, Privacy Preserving | Unverified | 0 |
| Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design | Mar 27, 2023 | Benchmarking, Edge-computing | Unverified | 0 |
| Hyperparameter optimization, quantum-assisted model performance prediction, and benchmarking of AI-based High Energy Physics workloads using HPC | Mar 27, 2023 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | Mar 27, 2023 | Benchmarking, Domain Adaptation | Unverified | 0 |
| Exploring Continual Learning of Diffusion Models | Mar 27, 2023 | Benchmarking, Continual Learning | Unverified | 0 |
| MGTBench: Benchmarking Machine-Generated Text Detection | Mar 26, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Mar 26, 2023 | Behavioural cloning, Benchmarking | Code Available | 0 |
| Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG | Mar 24, 2023 | Atrial Fibrillation Detection, Benchmarking | Unverified | 0 |
| Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins | Mar 24, 2023 | Benchmarking, Face Recognition | Unverified | 0 |
| Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance | Mar 23, 2023 | Benchmarking, Data Augmentation | Unverified | 0 |
| MEGA: Multilingual Evaluation of Generative AI | Mar 22, 2023 | Benchmarking | Code Available | 1 |
| Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Mar 21, 2023 | Anatomy, Benchmarking | Code Available | 0 |