| A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | Feb 28, 2023 | Adversarial Robustness, Benchmarking | —Unverified | 0 |
| Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification | Sep 4, 2018 | Benchmarking, General Classification | —Unverified | 0 |
| GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra | Mar 5, 2021 | Benchmarking, Graph Mining | —Unverified | 0 |
| Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation | Dec 16, 2021 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 |
| Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | Jan 10, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| 7th AI Driving Olympics: 1st Place Report for Panoptic Tracking | Dec 9, 2021 | Benchmarking, Panoptic Segmentation | —Unverified | 0 |
| A Theory of Dynamic Benchmarks | Oct 6, 2022 | Benchmarking | —Unverified | 0 |
| Variational Laplace for Bayesian neural networks | Nov 20, 2020 | Benchmarking, Variational Inference | —Unverified | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | —Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | —Unverified | 0 |
| A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness | May 5, 2023 | Benchmarking, Dataset Distillation | —Unverified | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | Mar 26, 2024 | Articles, Benchmarking | —Unverified | 0 |
| Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities | May 13, 2025 | automatic-speech-translation, Benchmarking | —Unverified | 0 |
| Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | Jun 19, 2023 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 |
| Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | Mar 2, 2022 | Benchmarking, Deep Learning | —Unverified | 0 |
| A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency | Sep 12, 2019 | Benchmarking, General Classification | —Unverified | 0 |
| Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | Jan 15, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | Jun 6, 2023 | Benchmarking, Depth Completion | —Unverified | 0 |
| A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | Jun 17, 2024 | Benchmarking, Survey | —Unverified | 0 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | Jun 3, 2023 | Benchmarking | —Unverified | 0 |
| AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals | May 21, 2025 | Benchmarking, Chatbot | —Unverified | 0 |
| GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method | Dec 8, 2021 | Benchmarking, Object | —Unverified | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | Mar 17, 2024 | Benchmarking, Dialogue State Tracking | —Unverified | 0 |
| Benchmarking Robustness in Neural Radiance Fields | Jan 10, 2023 | Benchmarking, Camera Calibration | —Unverified | 0 |
| A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data | Sep 29, 2021 | Benchmarking, Domain Adaptation | —Unverified | 0 |
| Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | Aug 30, 2023 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 |
| Benchmarking Robot Manipulation with the Rubik's Cube | Feb 14, 2022 | Benchmarking, Robot Manipulation | —Unverified | 0 |
| A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | Dec 1, 2015 | Benchmarking, Image Generation | —Unverified | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | May 13, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| A Systematic Analysis of Hybrid Linear Attention | Jul 8, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Benchmarking Retrieval-Augmented Generation for Chemistry | May 12, 2025 | Benchmarking, RAG | —Unverified | 0 |
| Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences | Nov 14, 2022 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Airport Capacity and Performance in Europe -- A study of transport economics, service quality and sustainability | Feb 4, 2021 | Benchmarking | —Unverified | 0 |
| Benchmarking Resource Usage for Efficient Distributed Deep Learning | Jan 28, 2022 | Benchmarking, Deep Learning | —Unverified | 0 |
| Goal-Driven Sequential Data Abstraction | Jul 29, 2019 | Benchmarking, General Reinforcement Learning | —Unverified | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | —Unverified | 0 |
| A Survey on Temporal Sentence Grounding in Videos | Sep 16, 2021 | Action Localization, Benchmarking | —Unverified | 0 |
| Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | Aug 27, 2024 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 |
| 4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | Dec 31, 2022 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD) | Jun 15, 2020 | Benchmarking, Domain Adaptation | —Unverified | 0 |
| GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | Apr 10, 2024 | Benchmarking, Denoising | —Unverified | 0 |
| Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings | May 19, 2025 | Benchmarking, Combinatorial Optimization | —Unverified | 0 |
| Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices | Jun 2, 2025 | Benchmarking | —Unverified | 0 |
| Helsinki Deblur Challenge 2021: description of photographic data | May 21, 2021 | Benchmarking, Deblurring | —Unverified | 0 |
| A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams | Jun 16, 2021 | Active Learning, Benchmarking | —Unverified | 0 |
| A Survey on Preserving Fairness Guarantees in Changing Environments | Nov 14, 2022 | Benchmarking, Decision Making | —Unverified | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | —Unverified | 0 |
| Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass | Jan 29, 2021 | Benchmarking | —Unverified | 0 |
| Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods | May 17, 2021 | Benchmarking, Diversity | —Unverified | 0 |
| Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining | Jan 16, 2022 | Benchmarking, Language Modelling | —Unverified | 0 |