| Benchmarking MedMNIST dataset on real quantum hardware | Feb 18, 2025 | Benchmarking, image-classification | Unverified | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 |
| Positional Encoding in Transformer-Based Time Series Models: A Survey | Feb 17, 2025 | Anomaly Detection, Benchmarking | Code Available | 1 |
| ILIAS: Instance-Level Image retrieval At Scale | Feb 17, 2025 | Benchmarking, Image Retrieval | Code Available | 1 |
| Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | Feb 17, 2025 | Benchmarking | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| Knowledge-aware contrastive heterogeneous molecular graph learning | Feb 17, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics | Feb 17, 2025 | Benchmarking, Diagnostic | Unverified | 0 |
| Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | Feb 17, 2025 | Benchmarking, Dependency Parsing | Unverified | 0 |
| Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | Feb 17, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |