| Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | May 1, 2025 | Benchmarking, Change Detection | Code Available | 2 |
| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | Benchmarking, Temporal Localization | Code Available | 2 |
| GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Apr 30, 2025 | 3D Molecule Generation, Benchmarking | Code Available | 1 |
| Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework | Apr 30, 2025 | Benchmarking, Learning Theory | —Unverified | 0 |
| From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising | Apr 30, 2025 | Benchmarking, Computational Efficiency | —Unverified | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | Apr 30, 2025 | Arabic Text Diacritization, Benchmarking | —Unverified | 0 |
| Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | Apr 30, 2025 | Benchmarking | —Unverified | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | Apr 29, 2025 | Benchmarking, Intrusion Detection | —Unverified | 0 |
| OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Apr 29, 2025 | Benchmarking, Code Generation | Code Available | 1 |
| The Leaderboard Illusion | Apr 29, 2025 | Benchmarking, Chatbot | —Unverified | 0 |