| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | May 26, 2025 | Benchmarking, Fault localization | Code Available | 0 |
| PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology | May 26, 2025 | Benchmarking, Prognosis | Unverified | 0 |
| Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | May 26, 2025 | Benchmarking | Code Available | 0 |
| Transformers in Protein: A Survey | May 26, 2025 | Benchmarking, Drug Discovery | Unverified | 0 |
| StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs | May 26, 2025 | Benchmarking | Unverified | 0 |
| AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | May 26, 2025 | Benchmarking, Medical Diagnosis | Code Available | 0 |
| Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | May 26, 2025 | Benchmarking, Transliteration | Unverified | 0 |
| Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | May 26, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | May 26, 2025 | Benchmarking, Optical Flow Estimation | Unverified | 0 |
| FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets | May 26, 2025 | Benchmarking, GPU | Unverified | 0 |