| AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals | May 21, 2025 | Benchmarking, Chatbot | —Unverified | 0 | 0 |
| Exploration of TPUs for AI Applications | Sep 16, 2023 | Benchmarking, Edge-computing | —Unverified | 0 | 0 |
| Exploring and Benchmarking the Planning Capabilities of Large Language Models | Jun 18, 2024 | Benchmarking, In-Context Learning | —Unverified | 0 | 0 |
| Exploring Capabilities of Time Series Foundation Models in Building Analytics | Oct 28, 2024 | Benchmarking, energy management | —Unverified | 0 | 0 |
| A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | Nov 29, 2017 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Exploring Continual Learning of Diffusion Models | Mar 27, 2023 | Benchmarking, Continual Learning | —Unverified | 0 | 0 |
| Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks | Aug 1, 2023 | Benchmarking | —Unverified | 0 | 0 |
| CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era | Mar 16, 2025 | Benchmarking, Image Captioning | —Unverified | 0 | 0 |
| Airport Capacity and Performance in Europe -- A study of transport economics, service quality and sustainability | Feb 4, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | Feb 10, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | Oct 11, 2024 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | Mar 8, 2024 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | Jun 18, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Visual Attention on the Sun: What Do Existing Models Actually Predict? | Nov 25, 2018 | Benchmarking, Deep Attention | —Unverified | 0 | 0 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | May 28, 2024 | Benchmarking, Emotion Recognition | —Unverified | 0 | 0 |
| Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | Feb 16, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Can time series forecasting be automated? A benchmark and analysis | Jul 23, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 | 0 |
| Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features? | May 26, 2020 | Benchmarking | —Unverified | 0 | 0 |
| Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | Jun 24, 2020 | Benchmarking, Data Augmentation | —Unverified | 0 | 0 |
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | Sep 28, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Extraction of clinical information from the non-invasive fetal electrocardiogram | May 27, 2016 | Benchmarking, Heart Rate Variability | —Unverified | 0 | 0 |
| Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | Aug 22, 2024 | Benchmarking | —Unverified | 0 | 0 |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | Mar 13, 2025 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | —Unverified | 0 | 0 |
| Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | May 28, 2025 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| Face Detection on Surveillance Images | Oct 22, 2019 | Benchmarking, Face Detection | —Unverified | 0 | 0 |
| Face Morphing Attack Generation & Detection: A Comprehensive Survey | Nov 3, 2020 | Benchmarking, Face Recognition | —Unverified | 0 | 0 |
| FACT: Learning Governing Abstractions Behind Integer Sequences | Sep 20, 2022 | Benchmarking | —Unverified | 0 | 0 |
| FactLens: Benchmarking Fine-Grained Fact Verification | Nov 8, 2024 | Benchmarking, Fact Verification | —Unverified | 0 | 0 |
| Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | Dec 23, 2024 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets | Apr 28, 2025 | Articles, Benchmarking | —Unverified | 0 | 0 |
| TDDBench: A Benchmark for Training data detection | Nov 5, 2024 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | —Unverified | 0 | 0 |
| FAIRification of MLC data | Nov 23, 2022 | Benchmarking, Management | —Unverified | 0 | 0 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | May 18, 2025 | Benchmarking, Scene Understanding | —Unverified | 0 | 0 |
| FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs | Oct 25, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Fairness-Aware Graph Neural Networks: A Survey | Jul 8, 2023 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Fairness Index Measures to Evaluate Bias in Biometric Recognition | Jun 19, 2023 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | May 26, 2025 | Benchmarking, Large Language Model | —Unverified | 0 | 0 |
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | Nov 27, 2023 | Articles, Benchmarking | —Unverified | 0 | 0 |
| TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Apr 22, 2024 | Benchmarking, Multi-Object Tracking | —Unverified | 0 | 0 |
| Teaspoon: A comprehensive python package for topological signal processing | Oct 10, 2020 | Benchmarking, Topological Data Analysis | —Unverified | 0 | 0 |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | May 12, 2025 | 16k, Benchmarking | —Unverified | 0 | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | Nov 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Can Language Models Serve as Text-Based World Simulators? | Jun 10, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | May 1, 2022 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph | May 4, 2020 | Benchmarking, Entity Linking | —Unverified | 0 | 0 |
| Can humans help BERT gain "confidence"? | Aug 31, 2023 | Benchmarking, EEG | —Unverified | 0 | 0 |
| Technical report of a DMD-based Characterization Method for Vision Sensors | Mar 4, 2025 | Benchmarking, Dataset Generation | —Unverified | 0 | 0 |