| Bayesian Multi-type Mean Field Multi-agent Imitation Learning | Dec 1, 2020 | Benchmarking, Imitation Learning | —Unverified | 0 |
| A Bayesian Model for Bivariate Causal Inference | Dec 24, 2018 | Benchmarking, Causal Inference | —Unverified | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | Jul 1, 2022 | Benchmarking | —Unverified | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | Jan 16, 2022 | Benchmarking | —Unverified | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | Jun 6, 2023 | Benchmarking, Sentence | —Unverified | 0 |
| Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada | Apr 1, 2021 | Benchmarking, Language Identification | —Unverified | 0 |
| AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | Sep 12, 2023 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | Apr 14, 2025 | Benchmarking, Descriptive | —Unverified | 0 |
| Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems | Mar 9, 2025 | Benchmarking | —Unverified | 0 |
| Finance Language Model Evaluation (FLaME) | Jun 18, 2025 | Benchmarking, Language Model Evaluation | —Unverified | 0 |
| Beyond Benchmarks: On The False Promise of AI Regulation | Jan 26, 2025 | Benchmarking | —Unverified | 0 |
| Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | Jul 10, 2024 | Benchmarking | —Unverified | 0 |
| Active Learning for Community Detection in Stochastic Block Models | May 8, 2016 | Active Learning, Benchmarking | —Unverified | 0 |
| Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark | Nov 23, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art | May 20, 2016 | Benchmarking, General Classification | —Unverified | 0 |
| FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | Jan 1, 2024 | Benchmarking, Instance Segmentation | —Unverified | 0 |
| Better Practices for Domain Adaptation | Sep 7, 2023 | Benchmarking, Domain Adaptation | —Unverified | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | May 24, 2023 | Benchmarking, Navigate | —Unverified | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | Oct 8, 2024 | Benchmarking | —Unverified | 0 |
| AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering | Jun 3, 2025 | Benchmarking | —Unverified | 0 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Nov 16, 2021 | Benchmarking, Natural Language Understanding | —Unverified | 0 |
| Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation | Jul 18, 2020 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers | Apr 2, 2025 | Benchmarking, Management | —Unverified | 0 |
| BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures | Jun 6, 2025 | Benchmarking, CPU | —Unverified | 0 |
| BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali | Oct 16, 2023 | Benchmarking, Data Augmentation | —Unverified | 0 |