| Comparative analysis of neural network architectures for short-term FOREX forecasting | May 13, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | May 13, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | May 13, 2024 | Benchmarking, named-entity-recognition | Code Available | 0 |
| oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | May 13, 2024 | Attribute, Autonomous Driving | Unverified | 0 |
| Benchmarking Cross-Domain Audio-Visual Deception Detection | May 11, 2024 | Benchmarking, Deception Detection | Unverified | 0 |
| Replication Study and Benchmarking of Real-Time Object Detection Models | May 11, 2024 | Benchmarking, object-detection | Code Available | 0 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | May 10, 2024 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |
| Are EEG-to-Text Models Working? | May 10, 2024 | Benchmarking, EEG | Code Available | 3 |
| Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | May 9, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | May 9, 2024 | Benchmarking, Computational Efficiency | Code Available | 4 |
| Aequitas Flow: Streamlining Fair ML Experimentation | May 9, 2024 | Benchmarking, Fairness | Code Available | 4 |
| OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs | May 9, 2024 | Benchmarking, Fact Checking | Code Available | 2 |
| Benchmarking Educational Program Repair | May 8, 2024 | Benchmarking, Program Repair | Code Available | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | May 7, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | May 7, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | May 7, 2024 | Benchmarking, Cancer Classification | Code Available | 1 |
| ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | May 7, 2024 | Benchmarking, Decision Making | Code Available | 3 |
| UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | May 6, 2024 | Benchmarking | Unverified | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | May 5, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | May 5, 2024 | Benchmarking, Composed Image Retrieval (CoIR) | Code Available | 2 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | May 5, 2024 | Benchmarking | Code Available | 0 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | May 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | May 4, 2024 | Anomaly Detection, Articles | Unverified | 0 |