| MedBrowseComp: Benchmarking Medical Deep Research and Computer Use | May 20, 2025 | Benchmarking | —Unverified | 0 |
| DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis | May 20, 2025 | Benchmarking, Fairness | —Unverified | 0 |
| Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach | May 20, 2025 | Benchmarking | —Unverified | 0 |
| LLM-based Evaluation Policy Extraction for Ecological Modeling | May 20, 2025 | Benchmarking, Large Language Model | —Unverified | 0 |
| SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis | May 20, 2025 | Benchmarking, Model Optimization | Code Available | 0 |
| TransBench: Benchmarking Machine Translation for Industrial-Scale Applications | May 20, 2025 | Benchmarking, Machine Translation | —Unverified | 0 |
| OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking | May 20, 2025 | Benchmarking | Code Available | 3 |
| A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations | May 20, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking data encoding methods in Quantum Machine Learning | May 20, 2025 | Benchmarking, Quantum Machine Learning | —Unverified | 0 |
| ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations | May 20, 2025 | Benchmarking | —Unverified | 0 |