| LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Feb 14, 2025 | Benchmarking, RAG | Code Available | 0 |
| Standardisation of Convex Ultrasound Data Through Geometric Analysis and Augmentation | Feb 13, 2025 | Benchmarking | Unverified | 0 |
| AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit | Feb 13, 2025 | Benchmarking, Edge-computing | Unverified | 0 |
| Machine learning for modelling unstructured grid data in computational physics: a review | Feb 13, 2025 | Benchmarking | Unverified | 0 |
| Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis | Feb 13, 2025 | Benchmarking | Unverified | 0 |
| SkyRover: A Modular Simulator for Cross-Domain Pathfinding | Feb 13, 2025 | Benchmarking | Unverified | 0 |
| A Survey on LLM-based News Recommender Systems | Feb 13, 2025 | Benchmarking, Fairness | Unverified | 0 |
| EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Feb 13, 2025 | Benchmarking | Unverified | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | Unverified | 0 |
| Zero-shot generation of synthetic neurosurgical data with large language models | Feb 13, 2025 | Benchmarking, De-identification | Code Available | 0 |