| NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition | May 13, 2024 | Benchmarking, named-entity-recognition | Code Available | 0 |
| Comparative analysis of neural network architectures for short-term FOREX forecasting | May 13, 2024 | Benchmarking | Unverified | 0 |
| UCCIX: Irish-eXcellence Large Language Model | May 13, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | Code Available | 0 |
| oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | May 13, 2024 | Attribute, Autonomous Driving | Unverified | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | May 13, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Benchmarking Cross-Domain Audio-Visual Deception Detection | May 11, 2024 | Benchmarking, Deception Detection | Unverified | 0 |
| Replication Study and Benchmarking of Real-Time Object Detection Models | May 11, 2024 | Benchmarking, object-detection | Code Available | 0 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | May 10, 2024 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |
| Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | May 9, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| Benchmarking Educational Program Repair | May 8, 2024 | Benchmarking, Program Repair | Code Available | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | May 7, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | May 7, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | May 6, 2024 | Benchmarking | Unverified | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | May 5, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | May 5, 2024 | Benchmarking | Code Available | 0 |
| Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | May 4, 2024 | Anomaly Detection, Articles | Unverified | 0 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | May 4, 2024 | Action Anticipation, Benchmarking | Unverified | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | Unverified | 0 |
| Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo | May 3, 2024 | Benchmarking, Multi-hop Question Answering | Code Available | 0 |
| Toward end-to-end interpretable convolutional neural networks for waveform signals | May 3, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 |
| CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | May 2, 2024 | Benchmarking, Management | Unverified | 0 |
| A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News | May 2, 2024 | Benchmarking, Sign Language Recognition | Unverified | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | May 2, 2024 | Benchmarking | Unverified | 0 |
| The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | May 2, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting | Apr 30, 2024 | Benchmarking, Depth Completion | Unverified | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | Apr 29, 2024 | Benchmarking, Clustering | Unverified | 0 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | Apr 29, 2024 | Benchmarking, Federated Learning | Unverified | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | Unverified | 0 |
| Detecting critical treatment effect bias in small subgroups | Apr 29, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Apr 29, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models | Apr 26, 2024 | Attribute, Bayesian Optimization | Unverified | 0 |
| Stochastic Spiking Neural Networks with First-to-Spike Coding | Apr 26, 2024 | Benchmarking | Unverified | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Apr 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| Benchmarking Mobile Device Control Agents across Diverse Configurations | Apr 25, 2024 | Benchmarking, Imitation Learning | Unverified | 0 |
| DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | Apr 24, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Apr 24, 2024 | Benchmarking, Molecular Property Prediction | Code Available | 0 |
| Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | Apr 24, 2024 | Benchmarking | Unverified | 0 |
| Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Apr 23, 2024 | Benchmarking, Hyperspectral Image Classification | Code Available | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | Apr 22, 2024 | Benchmarking, Misinformation | Unverified | 0 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | Apr 22, 2024 | Benchmarking, Diversity | Unverified | 0 |
| Open Datasets for Satellite Radio Resource Control | Apr 22, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Apr 22, 2024 | Benchmarking, Multi-Object Tracking | Unverified | 0 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | Apr 22, 2024 | Benchmarking, named-entity-recognition | Unverified | 0 |
| In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review | Apr 21, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Apr 21, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | Apr 20, 2024 | Benchmarking | Unverified | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection | Apr 19, 2024 | Benchmarking, Integrated sensing and communication | Unverified | 0 |