| Benchmarking Pathology Feature Extractors for Whole Slide Image Classification | Nov 20, 2023 | Benchmarking, image-classification | Code Available | 1 |
| LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Nov 19, 2023 | Bayesian Optimization, Benchmarking | Code Available | 0 |
| Benchmarking Machine Learning Models for Quantum Error Correction | Nov 18, 2023 | Benchmarking | Unverified | 0 |
| Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | Nov 18, 2023 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | Nov 17, 2023 | Benchmarking, Collision Avoidance | Unverified | 0 |
| TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Nov 16, 2023 | Benchmarking, Event Extraction | Code Available | 1 |
| Exponentially Faster Language Modelling | Nov 15, 2023 | Benchmarking, CPU | Code Available | 2 |
| Domain Aligned CLIP for Few-shot Classification | Nov 15, 2023 | Benchmarking, Classification | Unverified | 0 |
| Social Bias Probing: Fairness Benchmarking for Language Models | Nov 15, 2023 | Benchmarking, Fairness | Unverified | 0 |
| AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Nov 15, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Nov 15, 2023 | Benchmarking, Instruction Following | Code Available | 1 |
| Model Agnostic Explainable Selective Regression via Uncertainty Estimation | Nov 15, 2023 | Benchmarking, model | Unverified | 0 |
| Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Nov 15, 2023 | Benchmarking, Network Pruning | Code Available | 0 |
| On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Nov 14, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Benchmarking Individual Tree Mapping with Sub-meter Imagery | Nov 14, 2023 | Benchmarking, Segmentation | Unverified | 0 |
| MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Nov 14, 2023 | Benchmarking, Language Modeling | Code Available | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Nov 13, 2023 | Benchmarking, Classification | Code Available | 0 |
| MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | Nov 13, 2023 | Benchmarking | Unverified | 0 |
| Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | Nov 13, 2023 | Benchmarking, Feature Importance | Unverified | 0 |
| The Disagreement Problem in Faithfulness Metrics | Nov 13, 2023 | Benchmarking, Explainable artificial intelligence | Unverified | 0 |
| WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Nov 13, 2023 | Benchmarking, Instruction Following | Code Available | 1 |
| Flames: Benchmarking Value Alignment of LLMs in Chinese | Nov 12, 2023 | Benchmarking, Fairness | Code Available | 1 |
| Identification of vortex in unstructured mesh with graph neural networks | Nov 11, 2023 | Benchmarking, Graph Generation | Unverified | 0 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| MultiIoT: Benchmarking Machine Learning for the Internet of Things | Nov 10, 2023 | Benchmarking, Representation Learning | Code Available | 1 |
| SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | Nov 9, 2023 | Benchmarking, Instance Segmentation | Unverified | 0 |
| TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs | Nov 9, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| An efficiency analysis of Spanish airports | Nov 8, 2023 | Benchmarking | Unverified | 0 |
| The voraus-AD Dataset for Anomaly Detection in Robot Applications | Nov 8, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning, Benchmarking | Unverified | 0 |
| The PetShop Dataset -- Finding Causes of Performance Issues across Microservices | Nov 8, 2023 | Benchmarking | Code Available | 1 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Nov 8, 2023 | Benchmarking, Click-Through Rate Prediction | Code Available | 0 |
| Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts | Nov 7, 2023 | Benchmarking, Machine Translation | Code Available | 1 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Nov 7, 2023 | 3D Reconstruction, Benchmarking | Code Available | 0 |
| Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Nov 6, 2023 | Benchmarking, Knowledge Base Question Answering | Code Available | 1 |
| Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Nov 6, 2023 | Benchmarking, Data Compression | Code Available | 1 |
| Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild | Nov 6, 2023 | Benchmarking, Facial Expression Recognition | Unverified | 0 |
| Benchmarking Differential Evolution on a Quantum Simulator | Nov 6, 2023 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| Exploitation-Guided Exploration for Semantic Embodied Navigation | Nov 6, 2023 | Benchmarking | Unverified | 0 |
| Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Nov 5, 2023 | Benchmarking | Code Available | 1 |
| JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds | Nov 5, 2023 | Autonomous Navigation, Autonomous Vehicles | Code Available | 1 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | Unverified | 0 |
| Learning Disentangled Speech Representations | Nov 4, 2023 | Benchmarking, Disentanglement | Unverified | 0 |
| NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications | Nov 4, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion | Nov 4, 2023 | Benchmarking, Imitation Learning | Code Available | 3 |
| FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Nov 4, 2023 | Benchmarking, Drug Discovery | Code Available | 1 |
| Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics | Nov 3, 2023 | Benchmarking, quantile regression | Unverified | 0 |
| Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature | Nov 3, 2023 | Benchmarking | Unverified | 0 |