| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | Sep 5, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking Spurious Bias in Few-Shot Image Classifiers | Sep 4, 2024 | Attribute, Benchmarking | Code Available | 0 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | Sep 4, 2024 | Benchmarking | Unverified | 0 |
| NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks | Sep 4, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Sep 3, 2024 | Benchmarking, Mixed Reality | Unverified | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Sep 3, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture | Sep 3, 2024 | Benchmarking, RAG | Unverified | 0 |
| From Grounding to Planning: Benchmarking Bottlenecks in Web Agents | Sep 3, 2024 | Benchmarking | Unverified | 0 |
| Revisiting Safe Exploration in Safe Reinforcement learning | Sep 2, 2024 | Benchmarking, reinforcement-learning | Unverified | 0 |
| Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification | Sep 2, 2024 | Benchmarking | Unverified | 0 |