| Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data | Oct 9, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Oct 8, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Oct 8, 2023 | Benchmarking | Code Available | 0 |
| Simple GNNs with Low Rank Non-parametric Aggregators | Oct 8, 2023 | Benchmarking, Node Classification | Code Available | 0 |
| Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction | Oct 8, 2023 | Benchmarking, Decoder | Unverified | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Oct 7, 2023 | Benchmarking, named-entity-recognition | Unverified | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | Oct 7, 2023 | Benchmarking | Unverified | 0 |
| Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | Oct 6, 2023 | Benchmarking, Experimental Design | Unverified | 0 |
| CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | Oct 6, 2023 | Benchmarking, Domain Generalization | Unverified | 0 |
| LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | Oct 6, 2023 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Oct 6, 2023 | Benchmarking | Code Available | 0 |
| Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | Oct 6, 2023 | AutoML, Benchmarking | Unverified | 0 |
| Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | Oct 6, 2023 | Benchmarking, Federated Learning | Unverified | 0 |
| Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report | Oct 5, 2023 | Benchmarking | Unverified | 0 |
| A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling | Oct 5, 2023 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | Oct 4, 2023 | Benchmarking, GPU | Unverified | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | Benchmarking, Segmentation | Code Available | 0 |
| Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study | Oct 4, 2023 | Autonomous Vehicles, Benchmarking | Unverified | 0 |
| On the Performance of Multimodal Language Models | Oct 4, 2023 | Benchmarking, Binary Classification | Unverified | 0 |
| EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations | Oct 3, 2023 | Atomic Forces, Benchmarking | Unverified | 0 |
| Learning Quantum Processes with Quantum Statistical Queries | Oct 3, 2023 | Benchmarking, Cryptanalysis | Code Available | 0 |
| EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods | Oct 3, 2023 | Benchmarking, text-guided-image-editing | Unverified | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | Oct 3, 2023 | Benchmarking, Instruction Following | Unverified | 0 |
| CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems | Oct 2, 2023 | Benchmarking, Computational Efficiency | Unverified | 0 |
| A New Real-World Video Dataset for the Comparison of Defogging Algorithms | Oct 2, 2023 | Benchmarking, Deblurring | Unverified | 0 |
| TRAM: Benchmarking Temporal Reasoning for Large Language Models | Oct 2, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Oct 2, 2023 | Benchmarking, Continual Learning | Code Available | 0 |
| The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | Sep 30, 2023 | Benchmarking | Unverified | 0 |
| Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method | Sep 30, 2023 | Benchmarking, Reinforcement Learning (RL) | Unverified | 0 |
| Benchmarking Collaborative Learning Methods Cost-Effectiveness for Prostate Segmentation | Sep 29, 2023 | Benchmarking, Federated Learning | Unverified | 0 |
| A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater | Sep 29, 2023 | Benchmarking | Unverified | 0 |
| Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts | Sep 29, 2023 | Benchmarking, Decision Making | Unverified | 0 |
| Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection | Sep 29, 2023 | Benchmarking, Diversity | Unverified | 0 |
| Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | Sep 29, 2023 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | Sep 29, 2023 | Bayesian Optimization, Benchmarking | Unverified | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | Sep 28, 2023 | Benchmarking | Unverified | 0 |
| Demographic Parity: Mitigating Biases in Real-World Data | Sep 27, 2023 | Benchmarking | Unverified | 0 |
| On quantifying and improving realism of images generated with diffusion | Sep 26, 2023 | Attribute, Benchmarking | Unverified | 0 |
| Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression | Sep 26, 2023 | Benchmarking, Image Compression | Unverified | 0 |
| Thalamic nuclei segmentation from T1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts | Sep 26, 2023 | Benchmarking, Segmentation | Unverified | 0 |
| Optimization Techniques for a Physical Model of Human Vocalisation | Sep 26, 2023 | Benchmarking | Unverified | 0 |
| Efficient Pauli channel estimation with logarithmic quantum memory | Sep 25, 2023 | Benchmarking | Unverified | 0 |
| VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph | Sep 24, 2023 | Benchmarking, Knowledge Graphs | Unverified | 0 |
| Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data | Sep 24, 2023 | Benchmarking | Unverified | 0 |
| Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence | Sep 24, 2023 | Benchmarking, Change Detection | Code Available | 0 |
| Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data | Sep 23, 2023 | Benchmarking, Super-Resolution | Unverified | 0 |
| Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts | Sep 22, 2023 | Articles, Benchmarking | Unverified | 0 |
| Multimodal Deep Learning for Scientific Imaging Interpretation | Sep 21, 2023 | Articles, Benchmarking | Unverified | 0 |
| Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam | Sep 21, 2023 | Benchmarking, Computational Efficiency | Unverified | 0 |
| On the relationship between Benchmarking, Standards and Certification in Robotics and AI | Sep 21, 2023 | Benchmarking | Unverified | 0 |