| Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Oct 3, 2024 | Autonomous Driving, Backdoor Attack | Code Available | 3 |
| A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning | Oct 2, 2024 | Benchmarking, Denoising | — Unverified | 0 |
| CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | Oct 2, 2024 | Benchmarking, Long Form Question Answering | — Unverified | 0 |
| MONICA: Benchmarking on Long-tailed Medical Image Classification | Oct 2, 2024 | Benchmarking, Classification | Code Available | 1 |
| Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | Oct 2, 2024 | Benchmarking, Facial expression generation | — Unverified | 0 |
| OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Oct 2, 2024 | Benchmarking | Code Available | 3 |
| StringLLM: Understanding the String Processing Capability of Large Language Models | Oct 2, 2024 | Benchmarking | Code Available | 1 |
| ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Oct 2, 2024 | Benchmarking, Document Summarization | — Unverified | 0 |
| shapiq: Shapley Interactions for Machine Learning | Oct 2, 2024 | Benchmarking, Data Valuation | Code Available | 4 |
| Deep Unlearn: Benchmarking Machine Unlearning | Oct 2, 2024 | Benchmarking, Machine Unlearning | — Unverified | 0 |