| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | Mar 26, 2024 | Articles, Benchmarking | —Unverified | 0 | 0 |
| Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | Apr 14, 2025 | Benchmarking, Descriptive | —Unverified | 0 | 0 |
| Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems | Mar 9, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Variational Laplace for Bayesian neural networks | Nov 20, 2020 | Benchmarking, Variational Inference | —Unverified | 0 | 0 |
| Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities | May 13, 2025 | automatic-speech-translation, Benchmarking | —Unverified | 0 | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | Mar 17, 2024 | Benchmarking, Dialogue State Tracking | —Unverified | 0 | 0 |
| Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings | May 19, 2025 | Benchmarking, Combinatorial Optimization | —Unverified | 0 | 0 |
| Beyond Benchmarks: On The False Promise of AI Regulation | Jan 26, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms | Jun 10, 2025 | Benchmarking, Graph Attention | —Unverified | 0 | 0 |
| Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification | Sep 4, 2018 | Benchmarking, General Classification | —Unverified | 0 | 0 |
| Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning | Dec 10, 2023 | Autonomous Vehicles, Benchmarking | —Unverified | 0 | 0 |
| Graph clustering with Boltzmann machines | Mar 4, 2022 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection | Nov 1, 2016 | Benchmarking, Object | —Unverified | 0 | 0 |
| GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets | Jun 23, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | Jul 10, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Label Efficient Regularization and Propagation for Graph Node Classification | Apr 19, 2022 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| Graph Joint Attention Networks | Sep 28, 2020 | Benchmarking, Graph Attention | —Unverified | 0 | 0 |
| A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds | Mar 2, 2024 | Benchmarking, Position | —Unverified | 0 | 0 |
| GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra | Mar 5, 2021 | Benchmarking, Graph Mining | —Unverified | 0 | 0 |
| Better Practices for Domain Adaptation | Sep 7, 2023 | Benchmarking, Domain Adaptation | —Unverified | 0 | 0 |
| 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Jun 8, 2024 | Benchmarking, Instance Segmentation | —Unverified | 0 | 0 |
| Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers | Apr 2, 2025 | Benchmarking, Management | —Unverified | 0 | 0 |
| BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures | Jun 6, 2025 | Benchmarking, CPU | —Unverified | 0 | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Sep 12, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Best Practices in Pool-based Active Learning for Image Classification | Sep 29, 2021 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization | May 5, 2020 | Benchmarking | —Unverified | 0 | 0 |
| The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach | Apr 27, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer | Jan 11, 2021 | Benchmarking, Binary Relation Extraction | —Unverified | 0 | 0 |
| Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices | Jun 2, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC | Jan 15, 2021 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | Apr 9, 2021 | Benchmarking, General Classification | —Unverified | 0 | 0 |
| AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI | Jan 9, 2025 | Benchmarking, named-entity-recognition | —Unverified | 0 | 0 |
| Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | Jul 7, 2022 | Benchmarking | —Unverified | 0 | 0 |
| AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models | Sep 5, 2023 | Benchmarking, Zero-Shot Learning | —Unverified | 0 | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | Jun 11, 2025 | Benchmarking | —Unverified | 0 | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | Dec 3, 2023 | Benchmarking, Multi-agent Reinforcement Learning | —Unverified | 0 | 0 |
| gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs | Oct 20, 2022 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation | Jul 8, 2024 | Benchmarking, Graph Embedding | —Unverified | 0 | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | —Unverified | 0 | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | Feb 23, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Guidelines for Fine-grained Sentence-level Arabic Readability Annotation | Oct 11, 2024 | Benchmarking, Sentence | —Unverified | 0 | 0 |
| Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks | May 21, 2025 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | Apr 3, 2025 | Anatomy, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | Mar 1, 2024 | Benchmarking, Stance Detection | —Unverified | 0 | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | —Unverified | 0 | 0 |
| h4rm3l: A language for Composable Jailbreak Attack Synthesis | Aug 9, 2024 | Benchmarking, Program Synthesis | —Unverified | 0 | 0 |
| Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text | Aug 3, 2022 | Benchmarking, Data Augmentation | —Unverified | 0 | 0 |
| Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | Jan 12, 2025 | Benchmarking, Hyperparameter Optimization | —Unverified | 0 | 0 |
| AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems | May 26, 2025 | Benchmarking, Recommendation Systems | —Unverified | 0 | 0 |
| HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images | Nov 7, 2024 | Anatomy, Benchmarking | —Unverified | 0 | 0 |