| Position: There are no Champions in Long-Term Time Series Forecasting | Feb 19, 2025 | Benchmarking, Position | —Unverified | 0 |
| Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification | Feb 19, 2025 | Benchmarking | —Unverified | 0 |
| GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | Feb 19, 2025 | Benchmarking | —Unverified | 0 |
| VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Feb 19, 2025 | Benchmarking, Diversity | —Unverified | 0 |
| STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | Feb 18, 2025 | Benchmarking, Large Language Model | —Unverified | 0 |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Feb 18, 2025 | Benchmarking, Text Generation | —Unverified | 0 |
| EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | Feb 18, 2025 | Benchmarking, Binary Classification | —Unverified | 0 |
| Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Feb 18, 2025 | Benchmarking | —Unverified | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | Feb 18, 2025 | Benchmarking | —Unverified | 0 |