| Comparative analysis of neural network architectures for short-term FOREX forecasting | May 13, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | May 13, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | May 13, 2024 | Benchmarking, named-entity-recognition | Code Available | 0 |
| oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | May 13, 2024 | Attribute, Autonomous Driving | Unverified | 0 |
| Benchmarking Cross-Domain Audio-Visual Deception Detection | May 11, 2024 | Benchmarking, Deception Detection | Unverified | 0 |
| Replication Study and Benchmarking of Real-Time Object Detection Models | May 11, 2024 | Benchmarking, object-detection | Code Available | 0 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | May 10, 2024 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |
| Are EEG-to-Text Models Working? | May 10, 2024 | Benchmarking, EEG | Code Available | 3 |
| LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | May 9, 2024 | Benchmarking, Computational Efficiency | Code Available | 4 |
| Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | May 9, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| Aequitas Flow: Streamlining Fair ML Experimentation | May 9, 2024 | Benchmarking, Fairness | Code Available | 4 |
| OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs | May 9, 2024 | Benchmarking, Fact Checking | Code Available | 2 |
| Benchmarking Educational Program Repair | May 8, 2024 | Benchmarking, Program Repair | Code Available | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | May 7, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | May 7, 2024 | Benchmarking, Cancer Classification | Code Available | 1 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | May 7, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | May 7, 2024 | Benchmarking, Decision Making | Code Available | 3 |
| UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | May 6, 2024 | Benchmarking | Unverified | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | May 5, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | May 5, 2024 | Benchmarking | Code Available | 0 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | May 5, 2024 | Benchmarking, Composed Image Retrieval (CoIR) | Code Available | 2 |
| Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | May 4, 2024 | Anomaly Detection, Articles | Unverified | 0 |
| Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? | May 4, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | May 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | Unverified | 0 |
| Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo | May 3, 2024 | Benchmarking, Multi-hop Question Answering | Code Available | 0 |
| Toward end-to-end interpretable convolutional neural networks for waveform signals | May 3, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 |
| CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | May 2, 2024 | Benchmarking, Management | Unverified | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | May 2, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Representations for Speech, Music, and Acoustic Events | May 2, 2024 | Audio Classification, Benchmarking | Code Available | 2 |
| The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | May 2, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News | May 2, 2024 | Benchmarking, Sign Language Recognition | Unverified | 0 |
| HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond | May 1, 2024 | Benchmarking, High-Level Synthesis | Code Available | 2 |
| ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Apr 30, 2024 | Benchmarking, Image Reconstruction | Code Available | 1 |
| Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | Apr 30, 2024 | Benchmarking, Depth Completion | Unverified | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | Apr 29, 2024 | Benchmarking, Clustering | Unverified | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Apr 29, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | Unverified | 0 |
| SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods | Apr 29, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | Apr 29, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Apr 29, 2024 | Answer Generation, Benchmarking | Code Available | 1 |
| Detecting critical treatment effect bias in small subgroups | Apr 29, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Apr 28, 2024 | Benchmarking | Code Available | 1 |
| Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Apr 27, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models | Apr 26, 2024 | Attribute, Bayesian Optimization | Unverified | 0 |
| Stochastic Spiking Neural Networks with First-to-Spike Coding | Apr 26, 2024 | Benchmarking | Unverified | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Apr 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |