| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | Benchmarking, Decision Making | Code Available | 1 |
| Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving | Aug 19, 2024 | Benchmarking, Machine Translation | Unverified | 0 |
| Benchmarking quantum machine learning kernel training for classification tasks | Aug 17, 2024 | Benchmarking, Quantum Machine Learning | Code Available | 0 |
| PADetBench: Towards Benchmarking Physical Attacks against Object Detection | Aug 17, 2024 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Aug 15, 2024 | Benchmarking, Management | Unverified | 0 |
| SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition | Aug 14, 2024 | Automatic Speech Recognition, Benchmarking | Code Available | 1 |
| SustainDC: Benchmarking for Sustainable Data Center Control | Aug 14, 2024 | Benchmarking, Management | Code Available | 2 |
| TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases | Aug 14, 2024 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| XCompress: LLM assisted Python-based text compression toolkit | Aug 12, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Aug 12, 2024 | Benchmarking | Code Available | 1 |
| A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation | Aug 11, 2024 | Benchmarking, image-classification | Unverified | 0 |
| A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements | Aug 11, 2024 | Benchmarking, Motion Planning | Unverified | 0 |
| Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration | Aug 9, 2024 | Benchmarking, Video Compression | Unverified | 0 |
| Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy | Aug 9, 2024 | Benchmarking, Medical Image Analysis | Code Available | 0 |
| h4rm3l: A language for Composable Jailbreak Attack Synthesis | Aug 9, 2024 | Benchmarking, Program Synthesis | Unverified | 0 |
| UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios | Aug 9, 2024 | Benchmarking, Human Detection | Code Available | 1 |
| The impact of internal variability on benchmarking deep learning climate emulators | Aug 9, 2024 | Benchmarking, Deep Learning | Code Available | 1 |
| FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data | Aug 8, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| SegXAL: Explainable Active Learning for Semantic Segmentation in Driving Scene Scenarios | Aug 8, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| Towards Explainable Network Intrusion Detection using Large Language Models | Aug 8, 2024 | Benchmarking, Intrusion Detection | Unverified | 0 |
| WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Aug 7, 2024 | AI and Safety, Benchmarking | Code Available | 1 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | Aug 7, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | Aug 7, 2024 | Benchmarking, Hard Attention | Unverified | 0 |
| Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Aug 7, 2024 | Benchmarking, Language Identification | Code Available | 1 |
| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Aug 6, 2024 | Benchmarking, Segmentation | Code Available | 7 |