| PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | Oct 5, 2024 | Benchmarking, GPU | Unverified | 0 |
| Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning | Oct 5, 2024 | Benchmarking, Drug Design | Code Available | 1 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | Oct 5, 2024 | Benchmarking | Unverified | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension | Oct 4, 2024 | Benchmarking, Computational chemistry | Unverified | 0 |
| ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | Oct 4, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | Oct 4, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Benchmarking the Fidelity and Utility of Synthetic Relational Data | Oct 4, 2024 | Benchmarking, Feature Importance | Unverified | 0 |
| PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Oct 4, 2024 | Benchmarking, Dialogue Generation | Code Available | 0 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | Oct 4, 2024 | Benchmarking, RAG | Unverified | 0 |