| Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | Jun 6, 2024 | Benchmarking, RAG | Unverified | 0 |
| BEADs: Bias Evaluation Across Domains | Jun 6, 2024 | Benchmarking, Bias Detection | Unverified | 0 |
| MLVU: Benchmarking Multi-task Long Video Understanding | Jun 6, 2024 | Benchmarking, Video Understanding | Code Available | 3 |
| TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising | Jun 5, 2024 | Benchmarking, Denoising | Code Available | 1 |
| Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation | Jun 5, 2024 | Benchmarking, Image Segmentation | Unverified | 0 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarking, energy management | Code Available | 1 |
| A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection | Jun 5, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Jun 5, 2024 | Benchmarking | Code Available | 1 |
| Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Jun 4, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset | Jun 4, 2024 | Benchmarking | Code Available | 0 |