| Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI | Jan 13, 2025 | ARC, Benchmarking | —Unverified | 0 | 0 |
| Quantifying Social Biases Using Templates is Unreliable | Oct 9, 2022 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction | May 28, 2020 | Benchmarking, Prediction | —Unverified | 0 | 0 |
| Quantifying the Impact of Boundary Constraint Handling Methods on Differential Evolution | May 14, 2021 | Benchmarking | —Unverified | 0 | 0 |
| A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification | Feb 14, 2020 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology | Jun 24, 2025 | Anomaly Detection, Artifact Detection | —Unverified | 0 | 0 |
| A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | May 26, 2025 | Benchmarking, Optical Flow Estimation | —Unverified | 0 | 0 |
| Quantitative evaluation of brain-inspired vision sensors in high-speed robotic perception | Apr 27, 2025 | Benchmarking, Event-based vision | —Unverified | 0 | 0 |
| A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values | Jun 5, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Understanding Foundation Models: Are We Back in 1924? | Sep 11, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Quantitative Metrics for Benchmarking Medical Image Harmonization | Feb 6, 2024 | Anatomy, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | Jun 8, 2022 | Benchmarking, Open-Ended Question Answering | —Unverified | 0 | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | —Unverified | 0 | 0 |
| Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models | Sep 8, 2016 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems | Oct 11, 2022 | Benchmarking, Recommendation Systems | —Unverified | 0 | 0 |
| Quantum classification of the MNIST dataset with Slow Feature Analysis | May 22, 2018 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis | Jan 12, 2021 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| A Comparison of Directional Distances for Hand Pose Estimation | Apr 3, 2017 | Benchmarking, Hand Pose Estimation | —Unverified | 0 | 0 |
| Quantum Kernel Methods under Scrutiny: A Benchmarking Study | Sep 6, 2024 | Benchmarking, Quantum Machine Learning | —Unverified | 0 | 0 |
| Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting | Oct 25, 2023 | Benchmarking, Hyperparameter Optimization | —Unverified | 0 | 0 |
| Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact | Sep 17, 2024 | Benchmarking, Quantum Machine Learning | —Unverified | 0 | 0 |
| Quantum-tunnelling deep neural network for optical illusion recognition | Jun 26, 2024 | Autonomous Vehicles, Benchmarking | —Unverified | 0 | 0 |
| QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | Jan 3, 2025 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | Apr 2, 2024 | Benchmarking, Common Sense Reasoning | —Unverified | 0 | 0 |
| Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets | Oct 6, 2018 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease | Sep 21, 2017 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Understanding the Limits of Lifelong Knowledge Editing in LLMs | Mar 7, 2025 | Benchmarking, knowledge editing | —Unverified | 0 | 0 |
| Who Wins the Game of Thrones? How Sentiments Improve the Prediction of Candidate Choice | Feb 29, 2020 | Benchmarking, Holdout Set | —Unverified | 0 | 0 |
| Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective | Jun 19, 2024 | Benchmarking, Continual Pretraining | —Unverified | 0 | 0 |
| Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | Apr 21, 2025 | Benchmarking, class-incremental learning | —Unverified | 0 | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | Oct 31, 2023 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models | Jun 3, 2024 | Benchmarking, Code Completion | —Unverified | 0 | 0 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | May 23, 2023 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation | May 29, 2025 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery | Jul 12, 2021 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking | Sep 29, 2021 | Benchmarking, Multi-Task Learning | —Unverified | 0 | 0 |
| RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR | Nov 23, 2021 | Benchmarking, Computed Tomography (CT) | —Unverified | 0 | 0 |
| A tutorial on multi-view autoencoders using the multi-view-AE library | Mar 12, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Understanding the User: An Intent-Based Ranking Dataset | Aug 30, 2024 | Benchmarking, Information Retrieval | —Unverified | 0 | 0 |
| RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | Jun 25, 2024 | Benchmarking, RAG | —Unverified | 0 | 0 |
| Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking | Jan 8, 2024 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| A Theory of Dynamic Benchmarks | Oct 6, 2022 | Benchmarking | —Unverified | 0 | 0 |
| RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | Jan 22, 2025 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| Rail-5k: a Real-World Dataset for Rail Surface Defects Detection | Jun 28, 2021 | 4k, Benchmarking | —Unverified | 0 | 0 |
| On the Evaluation of Engineering Artificial General Intelligence | May 15, 2025 | Benchmarking | —Unverified | 0 | 0 |
| A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality | Apr 5, 2022 | Benchmarking, Self-Supervised Learning | —Unverified | 0 | 0 |
| RAN-GNNs: breaking the capacity limits of graph neural networks | Mar 29, 2021 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | —Unverified | 0 | 0 |
| A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes | Apr 7, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | —Unverified | 0 | 0 |