| Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Feb 18, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Feb 18, 2025 | Benchmarking, Mamba | Code Available | 0 |
| Benchmarking MedMNIST dataset on real quantum hardware | Feb 18, 2025 | Benchmarking, image-classification | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| Ansatz-free Hamiltonian learning with Heisenberg-limited scaling | Feb 17, 2025 | Benchmarking | Unverified | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 |
| Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | Feb 17, 2025 | Benchmarking | Unverified | 0 |
| Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | Feb 17, 2025 | Benchmarking, Dependency Parsing | Unverified | 0 |
| Knowledge-aware contrastive heterogeneous molecular graph learning | Feb 17, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | Feb 17, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |