| Ward: Provable RAG Dataset Inference via LLM Watermarks | Oct 4, 2024 | Benchmarking, RAG | Unverified | 0 |
| Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | Oct 4, 2024 | Benchmarking, Uncertainty Quantification | Unverified | 0 |
| AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Oct 4, 2024 | Benchmarking | Code Available | 2 |
| Towards a Benchmark for Large Language Models for Business Process Management Tasks | Oct 4, 2024 | Benchmarking, Management | Code Available | 0 |
| Repurposing Foundation Model for Generalizable Medical Time Series Classification | Oct 3, 2024 | Benchmarking, Diagnostic | Unverified | 0 |
| LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Oct 3, 2024 | Benchmarking, GPU | Code Available | 1 |
| DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Oct 3, 2024 | Benchmarking, Imitation Learning | Code Available | 1 |
| Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | Oct 3, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| MANTRA: The Manifold Triangulations Assemblage | Oct 3, 2024 | Benchmarking | Code Available | 0 |
| IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models | Oct 3, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |