| Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features | Dec 8, 2024 | Benchmarking | —Unverified | 0 |
| Thermal Image-based Fault Diagnosis in Induction Machines via Self-Organized Operational Neural Networks | Dec 8, 2024 | Benchmarking, Diagnostic | —Unverified | 0 |
| A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing | Dec 7, 2024 | Benchmarking, Dimensionality Reduction | —Unverified | 0 |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| The BrowserGym Ecosystem for Web Agent Research | Dec 6, 2024 | Benchmarking | Code Available | 5 |
| ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | Dec 6, 2024 | Benchmarking, Retrieval | Code Available | 0 |
| An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Dec 6, 2024 | Benchmarking, Imputation | Code Available | 0 |
| MozzaVID: Mozzarella Volumetric Image Dataset | Dec 6, 2024 | Benchmarking, Computed Tomography (CT) | —Unverified | 0 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2k, Anomaly Detection | —Unverified | 0 |
| Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | Dec 6, 2024 | Benchmarking, Dialogue Understanding | —Unverified | 0 |