| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | Jan 14, 2025 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Jan 14, 2025 | Benchmarking, Graph Representation Learning | Code Available | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Jan 13, 2025 | Articles, Benchmarking | Unverified | 0 |
| Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Jan 13, 2025 | Benchmarking | Code Available | 0 |
| The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways | Jan 13, 2025 | Benchmarking, Metaheuristic Optimization | Unverified | 0 |
| Lessons From Red Teaming 100 Generative AI Products | Jan 13, 2025 | Benchmarking, Red Teaming | Unverified | 0 |
| TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations | Jan 13, 2025 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI | Jan 13, 2025 | ARC, Benchmarking | Unverified | 0 |
| WebWalker: Benchmarking LLMs in Web Traversal | Jan 13, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 11 |
| Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | Jan 12, 2025 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |