| Datasets and Benchmarks for Offline Safe Reinforcement Learning | Jun 15, 2023 | Autonomous Driving, Benchmarking | Code Available | 2 |
| MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Jun 14, 2023 | Benchmarking, Drug Discovery | Code Available | 0 |
| RRSIS: Referring Remote Sensing Image Segmentation | Jun 14, 2023 | Benchmarking, Image Segmentation | Unverified | 0 |
| A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews | Jun 13, 2023 | Benchmarking, Keyword Extraction | Unverified | 0 |
| detrex: Benchmarking Detection Transformers | Jun 12, 2023 | Benchmarking, object-detection | Unverified | 0 |
| Benchmarking Neural Network Training Algorithms | Jun 12, 2023 | Benchmarking | Code Available | 4 |
| Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes | Jun 10, 2023 | Benchmarking, Diversity | Unverified | 0 |
| Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Jun 10, 2023 | 3D Object Detection, Benchmarking | Code Available | 2 |
| NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Jun 9, 2023 | Benchmarking, Dataset Generation | Code Available | 1 |
| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | Jun 9, 2023 | Benchmarking, Time Series | Unverified | 0 |
| A Large-Scale Analysis on Self-Supervised Video Representation Learning | Jun 9, 2023 | Benchmarking, Representation Learning | Unverified | 0 |
| DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Jun 8, 2023 | Benchmarking, Descriptive | Code Available | 0 |
| FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Jun 8, 2023 | Benchmarking, Federated Learning | Code Available | 0 |
| Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML | Jun 8, 2023 | Benchmarking, Kidney Function | Code Available | 1 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | Benchmarking, Fairness | Code Available | 0 |
| FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | Jun 8, 2023 | Benchmarking, Edge-computing | Unverified | 0 |
| Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Jun 8, 2023 | Benchmarking | Code Available | 0 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | Benchmarking, Prompt Engineering | Code Available | 1 |
| Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | Jun 7, 2023 | Benchmarking, Classification | Unverified | 0 |
| RD-Suite: A Benchmark for Ranking Distillation | Jun 7, 2023 | Benchmarking | Unverified | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Jun 7, 2023 | Benchmarking, Machine Reading Comprehension | Code Available | 0 |
| Benchmarking Foundation Models with Language-Model-as-an-Examiner | Jun 7, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Jun 7, 2023 | Bayesian Optimization, Benchmarking | Code Available | 0 |
| ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection | Jun 7, 2023 | Attribute, Autonomous Driving | Unverified | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | Jun 6, 2023 | Benchmarking, Depth Completion | Unverified | 0 |
| Explainable AI using expressive Boolean formulas | Jun 6, 2023 | Benchmarking, Explainable Artificial Intelligence (XAI) | Unverified | 0 |
| Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | Jun 6, 2023 | Benchmarking, Ethics | Unverified | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | Jun 6, 2023 | Benchmarking, Sentence | Unverified | 0 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Jun 5, 2023 | Benchmarking | Code Available | 3 |
| Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling | Jun 5, 2023 | Benchmarking, Denoising | Code Available | 1 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | Jun 5, 2023 | Arabic Speech Recognition, Benchmarking | Unverified | 0 |
| Benchmarking Middle-Trained Language Models for Neural Search | Jun 5, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | Benchmarking, Multiple-choice | Code Available | 1 |
| LibAUC: A Deep Learning Library for X-Risk Optimization | Jun 5, 2023 | Benchmarking, Classification | Code Available | 2 |
| RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Jun 5, 2023 | Benchmarking, C++ code | Code Available | 1 |
| EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | Jun 4, 2023 | Benchmarking, Face Detection | Unverified | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | Jun 4, 2023 | Benchmarking, Contrastive Learning | Unverified | 0 |
| TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Jun 3, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | Jun 3, 2023 | Benchmarking | Unverified | 0 |
| ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation | Jun 3, 2023 | Benchmarking | Unverified | 0 |
| Multilingual Conceptual Coverage in Text-to-Image Models | Jun 2, 2023 | Benchmarking | Code Available | 1 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Jun 2, 2023 | Benchmarking, Language Acquisition | Code Available | 1 |
| Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning | Jun 2, 2023 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | Jun 2, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | Jun 1, 2023 | Articles, Benchmarking | Unverified | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | Jun 1, 2023 | Benchmarking, Brain Tumor Segmentation | Unverified | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available | 1 |
| Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Jun 1, 2023 | Benchmarking, Decoder | Code Available | 0 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Jun 1, 2023 | Attribute, Benchmarking | Code Available | 1 |