| Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | Jan 20, 2025 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| A CUDA-Based Real Parameter Optimization Benchmark | Jul 29, 2014 | Benchmarking, CPU | Unverified | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | Oct 7, 2023 | Benchmarking | Unverified | 0 |
| BEADs: Bias Evaluation Across Domains | Jun 6, 2024 | Benchmarking, Bias Detection | Unverified | 0 |
| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Jan 30, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Oct 7, 2023 | Benchmarking, named-entity-recognition | Unverified | 0 |
| FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance | Mar 7, 2025 | Articles, Benchmarking | Unverified | 0 |
| Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy | Jun 20, 2019 | Benchmarking, Multi-class Classification | Unverified | 0 |
| Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages | May 12, 2022 | Benchmarking, Diversity | Unverified | 0 |
| Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | May 26, 2025 | Benchmarking, Transliteration | Unverified | 0 |
| BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs | Apr 15, 2025 | Benchmarking, Subgraph Counting | Unverified | 0 |
| FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks | Oct 17, 2022 | Benchmarking, Graph Neural Network | Unverified | 0 |
| FineText: Text Classification via Attention-based Language Model Fine-tuning | Oct 25, 2019 | Benchmarking, Classification | Unverified | 0 |
| Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms | Mar 1, 2024 | Benchmarking, Stochastic Optimization | Unverified | 0 |
| Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | Feb 20, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities | Oct 4, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| BBOB Instance Analysis: Landscape Properties and Algorithm Performance across Problem Instances | Nov 29, 2022 | Benchmarking | Unverified | 0 |
| A Benchmark for Multi-speaker Anonymization | Jul 8, 2024 | Benchmarking, Disentanglement | Unverified | 0 |
| FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | Apr 2, 2025 | 3D Scene Reconstruction, Benchmarking | Unverified | 0 |
| A Modular Framework for Centrality and Clustering in Complex Networks | Nov 23, 2021 | Benchmarking, Clustering | Unverified | 0 |
| Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding | Aug 1, 2020 | Benchmarking, Rain Removal | Unverified | 0 |
| Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior | May 9, 2021 | Benchmarking, Rain Removal | Unverified | 0 |
| Bayesian Neural Networks at Scale: A Performance Analysis and Pruning Study | May 23, 2020 | Benchmarking, Network Pruning | Unverified | 0 |
| SPINEX-TimeSeries: Similarity-based Predictions with Explainable Neighbors Exploration for Time Series and Forecasting Problems | Aug 4, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | Jul 29, 2024 | Benchmarking, Language Model Evaluation | Unverified | 0 |
| Bayesian Multi-type Mean Field Multi-agent Imitation Learning | Dec 1, 2020 | Benchmarking, Imitation Learning | Unverified | 0 |
| A Bayesian Model for Bivariate Causal Inference | Dec 24, 2018 | Benchmarking, Causal Inference | Unverified | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | Jul 1, 2022 | Benchmarking | Unverified | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | Jan 16, 2022 | Benchmarking | Unverified | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | Jun 6, 2023 | Benchmarking, Sentence | Unverified | 0 |
| Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada | Apr 1, 2021 | Benchmarking, Language Identification | Unverified | 0 |
| AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | Sep 12, 2023 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | Apr 14, 2025 | Benchmarking, Descriptive | Unverified | 0 |
| Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems | Mar 9, 2025 | Benchmarking | Unverified | 0 |
| Finance Language Model Evaluation (FLaME) | Jun 18, 2025 | Benchmarking, Language Model Evaluation | Unverified | 0 |
| Beyond Benchmarks: On The False Promise of AI Regulation | Jan 26, 2025 | Benchmarking | Unverified | 0 |
| Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | Jul 10, 2024 | Benchmarking | Unverified | 0 |
| Active Learning for Community Detection in Stochastic Block Models | May 8, 2016 | Active Learning, Benchmarking | Unverified | 0 |
| Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark | Nov 23, 2021 | Benchmarking, BIG-bench Machine Learning | Unverified | 0 |
| Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art | May 20, 2016 | Benchmarking, General Classification | Unverified | 0 |
| FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | Jan 1, 2024 | Benchmarking, Instance Segmentation | Unverified | 0 |
| Better Practices for Domain Adaptation | Sep 7, 2023 | Benchmarking, Domain Adaptation | Unverified | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | May 24, 2023 | Benchmarking, Navigate | Unverified | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | Oct 8, 2024 | Benchmarking | Unverified | 0 |
| AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering | Jun 3, 2025 | Benchmarking | Unverified | 0 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Nov 16, 2021 | Benchmarking, Natural Language Understanding | Unverified | 0 |
| Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation | Jul 18, 2020 | Anomaly Detection, Benchmarking | Unverified | 0 |
| Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers | Apr 2, 2025 | Benchmarking, Management | Unverified | 0 |
| BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures | Jun 6, 2025 | Benchmarking, CPU | Unverified | 0 |
| BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali | Oct 16, 2023 | Benchmarking, Data Augmentation | Unverified | 0 |