| Large Language Models as Automated Aligners for benchmarking Vision-Language Models | Nov 24, 2023 | Benchmarking, World Knowledge | Unverified | 0 |
| An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification | Nov 24, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Nov 23, 2023 | Benchmarking, Diversity | Code Available | 0 |
| Learning Dynamic Selection and Pricing of Out-of-Home Deliveries | Nov 23, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | Nov 23, 2023 | Benchmarking, Brain Tumor Segmentation | Unverified | 0 |
| Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI | Nov 23, 2023 | Benchmarking, Cloud Detection | Code Available | 0 |
| A projected nonlinear state-space model for forecasting time series signals | Nov 22, 2023 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | Nov 22, 2023 | Benchmarking, Drug Discovery | Unverified | 0 |
| Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | Nov 21, 2023 | Benchmarking | Unverified | 0 |
| Deep State-Space Model for Predicting Cryptocurrency Price | Nov 21, 2023 | Benchmarking, Uncertainty Quantification | Unverified | 0 |
| Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image Segmentation | Nov 20, 2023 | Benchmarking, Image Segmentation | Unverified | 0 |
| Demonstrating Almost Linear Time Complexity of Bus Admittance Matrix-Based Distribution Network Power Flow: An Empirical Approach | Nov 20, 2023 | Benchmarking | Unverified | 0 |
| Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | Nov 20, 2023 | Benchmarking, Inverse Rendering | Unverified | 0 |
| LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Nov 19, 2023 | Bayesian Optimization, Benchmarking | Code Available | 0 |
| Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | Nov 18, 2023 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Benchmarking Machine Learning Models for Quantum Error Correction | Nov 18, 2023 | Benchmarking | Unverified | 0 |
| Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | Nov 17, 2023 | Benchmarking, Collision Avoidance | Unverified | 0 |
| Social Bias Probing: Fairness Benchmarking for Language Models | Nov 15, 2023 | Benchmarking, Fairness | Unverified | 0 |
| Domain Aligned CLIP for Few-shot Classification | Nov 15, 2023 | Benchmarking, Classification | Unverified | 0 |
| Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Nov 15, 2023 | Benchmarking, Network Pruning | Code Available | 0 |
| Model Agnostic Explainable Selective Regression via Uncertainty Estimation | Nov 15, 2023 | Benchmarking, model | Unverified | 0 |
| Benchmarking Individual Tree Mapping with Sub-meter Imagery | Nov 14, 2023 | Benchmarking, Segmentation | Unverified | 0 |
| On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Nov 14, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| The Disagreement Problem in Faithfulness Metrics | Nov 13, 2023 | Benchmarking, Explainable artificial intelligence | Unverified | 0 |
| Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | Nov 13, 2023 | Benchmarking, Feature Importance | Unverified | 0 |
| MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | Nov 13, 2023 | Benchmarking | Unverified | 0 |
| Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Nov 13, 2023 | Benchmarking, Classification | Code Available | 0 |
| Identification of vortex in unstructured mesh with graph neural networks | Nov 11, 2023 | Benchmarking, Graph Generation | Unverified | 0 |
| SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | Nov 9, 2023 | Benchmarking, Instance Segmentation | Unverified | 0 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning, Benchmarking | Unverified | 0 |
| An efficiency analysis of Spanish airports | Nov 8, 2023 | Benchmarking | Unverified | 0 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Nov 8, 2023 | Benchmarking, Click-Through Rate Prediction | Code Available | 0 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Nov 7, 2023 | 3D Reconstruction, Benchmarking | Code Available | 0 |
| Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild | Nov 6, 2023 | Benchmarking, Facial Expression Recognition | Unverified | 0 |
| Benchmarking Differential Evolution on a Quantum Simulator | Nov 6, 2023 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| Exploitation-Guided Exploration for Semantic Embodied Navigation | Nov 6, 2023 | Benchmarking | Unverified | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Learning Disentangled Speech Representations | Nov 4, 2023 | Benchmarking, Disentanglement | Unverified | 0 |
| Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval | Nov 3, 2023 | Benchmarking, Fairness | Code Available | 0 |
| Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Nov 3, 2023 | Benchmarking, counterfactual | Code Available | 0 |
| An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | Nov 3, 2023 | Benchmarking, Sentence | Unverified | 0 |
| Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature | Nov 3, 2023 | Benchmarking | Unverified | 0 |
| Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics | Nov 3, 2023 | Benchmarking, quantile regression | Unverified | 0 |
| Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia | Nov 2, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Decentralized Federated Learning on the Edge over Wireless Mesh Networks | Nov 2, 2023 | Benchmarking, Federated Learning | Unverified | 0 |
| Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs | Nov 1, 2023 | Benchmarking, Question Answering | Unverified | 0 |
| SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization | Nov 1, 2023 | Benchmarking, reinforcement-learning | Unverified | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | Oct 31, 2023 | Benchmarking, Diagnostic | Unverified | 0 |
| Next-generation MRD assays: do we have the tools to evaluate them properly? | Oct 31, 2023 | Benchmarking, Sensitivity | Unverified | 0 |
| UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges | Oct 31, 2023 | Benchmarking | Unverified | 0 |