| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | Benchmarking, NetHack | —Unverified | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | —Unverified | 0 |
| A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements | Aug 11, 2024 | Benchmarking, Motion Planning | —Unverified | 0 |
| BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer | Jan 11, 2021 | Benchmarking, Binary Relation Extraction | —Unverified | 0 |
| A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection | Nov 1, 2016 | Benchmarking, Object | —Unverified | 0 |
| BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | Apr 9, 2021 | Benchmarking, General Classification | —Unverified | 0 |
| Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data | Mar 24, 2018 | Benchmarking, Prediction | —Unverified | 0 |
| Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | May 16, 2025 | Benchmarking, Knowledge Graphs | —Unverified | 0 |
| Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | Jul 7, 2022 | Benchmarking | —Unverified | 0 |
| BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving | Mar 6, 2024 | Automated Theorem Proving, Benchmarking | —Unverified | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | Jun 11, 2025 | Benchmarking | —Unverified | 0 |
| A Metadata-Driven Approach to Understand Graph Neural Networks | Oct 30, 2023 | Benchmarking, Graph Learning | —Unverified | 0 |
| Foundations for learning from noisy quantum experiments | Apr 28, 2022 | Benchmarking | —Unverified | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | Dec 3, 2023 | Benchmarking, Multi-agent Reinforcement Learning | —Unverified | 0 |
| BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text | May 22, 2025 | Benchmarking, RAG | —Unverified | 0 |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | —Unverified | 0 |
| Formal Covariate Benchmarking to Bound Omitted Variable Bias | Jun 18, 2023 | Benchmarking, Sensitivity | —Unverified | 0 |
| Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | Apr 3, 2025 | Anatomy, Benchmarking | —Unverified | 0 |
| FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents | Jun 2, 2025 | Benchmarking, Form | —Unverified | 0 |
| Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | May 2, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | Mar 1, 2024 | Benchmarking, Stance Detection | —Unverified | 0 |
| ALT: A Python Package for Lightweight Feature Representation in Time Series Classification | Apr 17, 2025 | Benchmarking, Time Series | —Unverified | 0 |
| FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees | Sep 3, 2023 | Benchmarking, Instance Segmentation | —Unverified | 0 |
| Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text | Aug 3, 2022 | Benchmarking, Data Augmentation | —Unverified | 0 |
| Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | Jan 12, 2025 | Benchmarking, Hyperparameter Optimization | —Unverified | 0 |
| AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Jun 5, 2025 | Benchmarking, Video Understanding | —Unverified | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | Nov 4, 2024 | Benchmarking | —Unverified | 0 |
| A critical look at the current train/test split in machine learning | Jun 8, 2021 | Active Learning, Benchmarking | —Unverified | 0 |
| Forecasting NIFTY 50 benchmark Index using Seasonal ARIMA time series models | Jan 9, 2020 | Benchmarking, Time Series | —Unverified | 0 |
| FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | Jan 17, 2025 | Benchmarking, Data Augmentation | —Unverified | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | May 28, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset | Jan 27, 2024 | Benchmarking, Time Series | —Unverified | 0 |
| A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | Nov 30, 2023 | Benchmarking, Retrieval | —Unverified | 0 |
| Alpha Excel Benchmark | May 7, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset | Jul 9, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | Sep 16, 2024 | Benchmarking, Object Recognition | —Unverified | 0 |
| A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds | Mar 2, 2024 | Benchmarking, Position | —Unverified | 0 |
| Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression | Aug 1, 2022 | Benchmarking, regression | —Unverified | 0 |
| AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels | Aug 30, 2022 | Benchmarking | —Unverified | 0 |
| Benchmarking Vision Language Models on German Factual Data | Apr 15, 2025 | Benchmarking | —Unverified | 0 |
| Auto-tuning TensorFlow Threading Model for CPU Backend | Dec 4, 2018 | Benchmarking, CPU | —Unverified | 0 |
| ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | Apr 9, 2023 | Benchmarking, Deep Learning | —Unverified | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | Jul 15, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| ALP: Action-Aware Embodied Learning for Perception | Jun 16, 2023 | Benchmarking, object-detection | —Unverified | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | May 8, 2025 | Benchmarking | —Unverified | 0 |
| A critical analysis of metrics used for measuring progress in artificial intelligence | Aug 6, 2020 | Benchmarking | —Unverified | 0 |
| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | Jan 14, 2025 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | Dec 10, 2024 | Benchmarking, object-detection | —Unverified | 0 |
| Benchmarking Video Frame Interpolation | Mar 25, 2024 | Benchmarking, Computational Efficiency | —Unverified | 0 |