| PhilHumans: Benchmarking Machine Learning for Personal Health | May 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | Unverified | 0 |
| Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo | May 3, 2024 | Benchmarking, Multi-hop Question Answering | Code Available | 0 |
| Toward end-to-end interpretable convolutional neural networks for waveform signals | May 3, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 |
| CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | May 2, 2024 | Benchmarking, Management | Unverified | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | May 2, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Representations for Speech, Music, and Acoustic Events | May 2, 2024 | Audio Classification, Benchmarking | Code Available | 2 |
| The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | May 2, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News | May 2, 2024 | Benchmarking, Sign Language Recognition | Unverified | 0 |
| HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond | May 1, 2024 | Benchmarking, High-Level Synthesis | Code Available | 2 |
| ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Apr 30, 2024 | Benchmarking, Image Reconstruction | Code Available | 1 |
| Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | Apr 30, 2024 | Benchmarking, Depth Completion | Unverified | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | Apr 29, 2024 | Benchmarking, Clustering | Unverified | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Apr 29, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | Unverified | 0 |
| SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods | Apr 29, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | Apr 29, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Apr 29, 2024 | Answer Generation, Benchmarking | Code Available | 1 |
| Detecting critical treatment effect bias in small subgroups | Apr 29, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Apr 28, 2024 | Benchmarking | Code Available | 1 |
| Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Apr 27, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models | Apr 26, 2024 | Attribute, Bayesian Optimization | Unverified | 0 |
| Stochastic Spiking Neural Networks with First-to-Spike Coding | Apr 26, 2024 | Benchmarking | Unverified | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Apr 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |