| MIRAI: Evaluating LLM Agents for Event Forecasting | Jul 1, 2024 | Articles, Benchmarking | —Unverified | 0 |
| Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy | Jul 1, 2024 | Benchmarking | —Unverified | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | Jun 30, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| Commute Graph Neural Networks | Jun 30, 2024 | Benchmarking | —Unverified | 0 |
| PerSEval: Assessing Personalization in Text Summarizers | Jun 29, 2024 | Benchmarking, Human Judgment Correlation | —Unverified | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | Jun 27, 2024 | Benchmarking | —Unverified | 0 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | Jun 27, 2024 | Benchmarking, Clinical Knowledge | —Unverified | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | Jun 26, 2024 | Benchmarking, Crop Type Mapping | —Unverified | 0 |
| Quantum-tunnelling deep neural network for optical illusion recognition | Jun 26, 2024 | Autonomous Vehicles, Benchmarking | —Unverified | 0 |
| XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | Jun 26, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |