| Automatic detection of passable roads after floods in remote sensed and social media data | Jan 10, 2019 | Benchmarking, Transfer Learning | Unverified | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| A Line-of-Sight Channel Model for the 100-450 Gigahertz Frequency Band | Feb 12, 2020 | Benchmarking | Unverified | 0 |
| A Continuously Growing Dataset of Sentential Paraphrases | Aug 1, 2017 | Benchmarking, Paraphrase Identification | Unverified | 0 |
| From Code to Play: Benchmarking Program Search for Games Using Large Language Models | Dec 5, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation | May 24, 2025 | Articles, Benchmarking | Unverified | 0 |
| Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | Feb 5, 2025 | Benchmarking, Feature Engineering | Unverified | 0 |
| Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | May 15, 2025 | Benchmarking, Clustering | Unverified | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | May 24, 2024 | Benchmarking, Data Augmentation | Unverified | 0 |
| Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | Mar 5, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |