| CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Oct 11, 2023 | Autonomous Driving, Benchmarking | Code Available | 3 |
| Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey | Oct 11, 2023 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning | Oct 11, 2023 | Benchmarking, Diversity | Unverified | 0 |
| Transformers for Green Semantic Communication: Less Energy, More Semantics | Oct 11, 2023 | Benchmarking, CPU | Code Available | 0 |
| Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design | Oct 11, 2023 | Benchmarking, Representation Learning | Unverified | 0 |
| Risk Aware Benchmarking of Large Language Models | Oct 11, 2023 | Benchmarking, Econometrics | Unverified | 0 |
| Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | Oct 11, 2023 | Benchmarking, Denoising | Unverified | 0 |
| ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons | Oct 11, 2023 | Benchmarking, Position | Code Available | 2 |
| BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Oct 10, 2023 | Acute Stroke Lesion Segmentation, Benchmarking | Code Available | 0 |
| CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods | Oct 10, 2023 | Benchmarking, Prediction | Unverified | 0 |
| What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Oct 10, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Oct 10, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets | Oct 10, 2023 | All, Benchmarking | Unverified | 0 |
| Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization | Oct 9, 2023 | Benchmarking | Unverified | 0 |
| Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Oct 9, 2023 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 3 |
| Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data | Oct 9, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| Simple GNNs with Low Rank Non-parametric Aggregators | Oct 8, 2023 | Benchmarking, Node Classification | Code Available | 0 |
| Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Oct 8, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Oct 8, 2023 | Benchmarking | Code Available | 0 |
| Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction | Oct 8, 2023 | Benchmarking, Decoder | Unverified | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Oct 7, 2023 | Benchmarking, named-entity-recognition | Unverified | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | Oct 7, 2023 | Benchmarking | Unverified | 0 |
| AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Oct 6, 2023 | Benchmarking | Code Available | 0 |
| Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | Oct 6, 2023 | Benchmarking, Experimental Design | Unverified | 0 |
| LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | Oct 6, 2023 | Benchmarking, Mathematical Reasoning | Unverified | 0 |