| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | Jun 9, 2023 | Benchmarking, Time Series | Unverified | 0 |
| Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Jun 8, 2023 | Benchmarking | Code Available | 0 |
| FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Jun 8, 2023 | Benchmarking, Federated Learning | Code Available | 0 |
| DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Jun 8, 2023 | Benchmarking, Descriptive | Code Available | 0 |
| FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | Jun 8, 2023 | Benchmarking, Edge-computing | Unverified | 0 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | Benchmarking, Fairness | Code Available | 0 |
| RD-Suite: A Benchmark for Ranking Distillation | Jun 7, 2023 | Benchmarking | Unverified | 0 |
| Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Jun 7, 2023 | Bayesian Optimization, Benchmarking | Code Available | 0 |
| Benchmarking Foundation Models with Language-Model-as-an-Examiner | Jun 7, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection | Jun 7, 2023 | Attribute, Autonomous Driving | Unverified | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Jun 7, 2023 | Benchmarking, Machine Reading Comprehension | Code Available | 0 |
| Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | Jun 7, 2023 | Benchmarking, Classification | Unverified | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | Jun 6, 2023 | Benchmarking, Depth Completion | Unverified | 0 |
| Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | Jun 6, 2023 | Benchmarking, Ethics | Unverified | 0 |
| Explainable AI using expressive Boolean formulas | Jun 6, 2023 | Benchmarking, Explainable Artificial Intelligence (XAI) | Unverified | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | Jun 6, 2023 | Benchmarking, Sentence | Unverified | 0 |
| Benchmarking Middle-Trained Language Models for Neural Search | Jun 5, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | Jun 5, 2023 | Arabic Speech Recognition, Benchmarking | Unverified | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | Jun 4, 2023 | Benchmarking, Contrastive Learning | Unverified | 0 |
| EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | Jun 4, 2023 | Benchmarking, Face Detection | Unverified | 0 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | Jun 3, 2023 | Benchmarking | Unverified | 0 |
| ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation | Jun 3, 2023 | Benchmarking | Unverified | 0 |
| Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | Jun 2, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | Jun 1, 2023 | Articles, Benchmarking | Unverified | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | Jun 1, 2023 | Benchmarking, Brain Tumor Segmentation | Unverified | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Jun 1, 2023 | Benchmarking, Decoder | Code Available | 0 |
| HySpecNet-11k: A Large-Scale Hyperspectral Dataset for Benchmarking Learning-Based Hyperspectral Image Compression Methods | Jun 1, 2023 | Benchmarking, Hyperspectral image analysis | Unverified | 0 |
| The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | Jun 1, 2023 | Benchmarking, Object | Unverified | 0 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | May 31, 2023 | Benchmarking, Recommendation Systems | Code Available | 0 |
| ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning | May 30, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States | May 30, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | May 30, 2023 | Benchmarking | Code Available | 0 |
| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | May 30, 2023 | Benchmarking | Code Available | 0 |
| Human Body Shape Classification Based on a Single Image | May 29, 2023 | Benchmarking, Classification | Unverified | 0 |
| InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | May 28, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| Exploring the Practicality of Generative Retrieval on Dynamic Corpora | May 27, 2023 | Benchmarking, Information Retrieval | Unverified | 0 |
| BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | May 27, 2023 | Benchmarking, Deblurring | Code Available | 0 |
| Benchmarking Diverse-Modal Entity Linking with Generative Models | May 27, 2023 | Benchmarking, Decoder | Unverified | 0 |
| Learning from Integral Losses in Physics Informed Neural Networks | May 27, 2023 | Benchmarking | Code Available | 0 |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | May 26, 2023 | Bayesian Optimization, Benchmarking | Unverified | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | May 25, 2023 | Benchmarking, Text to SQL | Code Available | 0 |
| Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | May 25, 2023 | Benchmarking, object-detection | Code Available | 0 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | May 24, 2023 | Benchmarking | Unverified | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | May 24, 2023 | Benchmarking, Graph Mining | Code Available | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | May 24, 2023 | Benchmarking, Cross-Lingual Transfer | Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | May 24, 2023 | Benchmarking, Navigate | Unverified | 0 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | May 23, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | May 23, 2023 | Benchmarking, Language Modeling | Code Available | 0 |