| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes | May 27, 2025 | Benchmarking, Denoising | Unverified | 0 |
| Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | May 27, 2025 | Benchmarking | Code Available | 1 |
| AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | May 27, 2025 | Benchmarking, Question Selection | Code Available | 0 |
| LLaMEA-BO: A Large Language Model Evolutionary Algorithm for Automatically Generating Bayesian Optimization Algorithms | May 27, 2025 | Bayesian Optimization, Benchmarking | Code Available | 2 |
| DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding | May 27, 2025 | Benchmarking, Change Detection | Unverified | 0 |
| FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | May 27, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | May 27, 2025 | Benchmarking, Multiple-choice | Unverified | 0 |
| Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | May 27, 2025 | Benchmarking | Code Available | 0 |
| Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | May 27, 2025 | Benchmarking | Code Available | 0 |