| Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey | May 3, 2025 | Autonomous Driving, Benchmarking | Unverified | 0 |
| PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach | May 3, 2025 | Benchmarking, Image-to-Image Translation | Unverified | 0 |
| EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling | May 2, 2025 | Benchmarking | Unverified | 0 |
| Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | May 2, 2025 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation | May 1, 2025 | Benchmarking, Position | Unverified | 0 |
| EnronQA: Towards Personalized RAG over Private Documents | May 1, 2025 | Benchmarking, Memorization | Unverified | 0 |
| InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | May 1, 2025 | Benchmarking, Motion Planning | Unverified | 0 |
| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | Benchmarking, Temporal Localization | Code Available | 2 |
| AI-ready Snow Radar Echogram Dataset (SRED) for climate change monitoring | May 1, 2025 | Benchmarking, Deep Learning | Unverified | 0 |
| Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | May 1, 2025 | Benchmarking, Change Detection | Code Available | 2 |
| GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Apr 30, 2025 | 3D Molecule Generation, Benchmarking | Code Available | 1 |
| From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising | Apr 30, 2025 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework | Apr 30, 2025 | Benchmarking, Learning Theory | Unverified | 0 |
| Sadeed: Advancing Arabic Diacritization Through Small Language Model | Apr 30, 2025 | Arabic Text Diacritization, Benchmarking | Unverified | 0 |
| Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | Apr 30, 2025 | Benchmarking | Unverified | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | Apr 29, 2025 | Benchmarking, Intrusion Detection | Unverified | 0 |
| Hydra: Marker-Free RGB-D Hand-Eye Calibration | Apr 29, 2025 | Benchmarking | Unverified | 0 |
| OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Apr 29, 2025 | Benchmarking, Code Generation | Code Available | 1 |
| The Leaderboard Illusion | Apr 29, 2025 | Benchmarking, Chatbot | Unverified | 0 |
| TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks | Apr 29, 2025 | Benchmarking, Misinformation | Code Available | 1 |
| On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | Apr 29, 2025 | Anomaly Detection, Benchmarking | Unverified | 0 |
| LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | Apr 29, 2025 | Benchmarking, Face Generation | Unverified | 0 |
| SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories | Apr 29, 2025 | Benchmarking, Code Generation | Unverified | 0 |