| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 0 |
| Benchmarking Machine Translation with Cultural Awareness | May 23, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | May 23, 2023 | Benchmarking, Language Identification | —Unverified | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | May 23, 2023 | Benchmarking, model | Code Available | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | —Unverified | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | May 22, 2023 | Benchmarking, Causal Inference | Code Available | 0 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | May 22, 2023 | Benchmarking, Classification | Code Available | 0 |
| Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | May 21, 2023 | Benchmarking, Financial Analysis | —Unverified | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | —Unverified | 0 |
| TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | May 19, 2023 | Benchmarking | —Unverified | 0 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | May 19, 2023 | Benchmarking, Form | Code Available | 0 |
| Ahead-of-Time P-Tuning | May 18, 2023 | Benchmarking, parameter-efficient fine-tuning | —Unverified | 0 |
| Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | May 18, 2023 | Benchmarking, Diagnostic | —Unverified | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | May 18, 2023 | Benchmarking, GPU | —Unverified | 0 |
| Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models | May 18, 2023 | Benchmarking | —Unverified | 0 |
| Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI | May 17, 2023 | Benchmarking | —Unverified | 0 |
| Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | May 17, 2023 | Benchmarking | —Unverified | 0 |
| Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go | May 17, 2023 | Benchmarking, Image Restoration | —Unverified | 0 |
| DLUE: Benchmarking Document Language Understanding | May 16, 2023 | Benchmarking, Document Classification | —Unverified | 0 |
| OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | May 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Predictive Models from Quantum Computer Benchmarks | May 15, 2023 | Benchmarking, image-classification | —Unverified | 0 |
| Benchmarking the human brain against computational architectures | May 15, 2023 | Benchmarking, Computational Efficiency | —Unverified | 0 |
| A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems | May 13, 2023 | Benchmarking | —Unverified | 0 |
| MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine | May 12, 2023 | Benchmarking | —Unverified | 0 |
| Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection | May 10, 2023 | Benchmarking, Community Detection | —Unverified | 0 |
| Comparing Foundation Models using Data Kernels | May 9, 2023 | Benchmarking, Self-Supervised Learning | —Unverified | 0 |
| Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey | May 5, 2023 | Benchmarking, Image Generation | Code Available | 0 |
| A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness | May 5, 2023 | Benchmarking, Dataset Distillation | —Unverified | 0 |
| Semantic Segmentation using Vision Transformers: A survey | May 5, 2023 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Can LLMs Capture Human Preferences? | May 4, 2023 | Benchmarking | —Unverified | 0 |
| Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view | May 4, 2023 | Benchmarking, Graph Generation | —Unverified | 0 |
| A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images | Apr 30, 2023 | Benchmarking, object-detection | —Unverified | 0 |
| Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | Apr 28, 2023 | AutoML, Benchmarking | —Unverified | 0 |
| ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | Apr 27, 2023 | Articles, Benchmarking | —Unverified | 0 |
| On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective | Apr 26, 2023 | Benchmarking, Feature Importance | Code Available | 0 |
| Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency | Apr 26, 2023 | Benchmarking, Cloud Computing | —Unverified | 0 |
| CIMLA: Interpretable AI for inference of differential causal networks | Apr 25, 2023 | Benchmarking | —Unverified | 0 |
| Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints | Apr 25, 2023 | Benchmarking, Contrastive Learning | —Unverified | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Apr 24, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| A Framework for Benchmarking Real-Time Embedded Object Detection | Apr 23, 2023 | Benchmarking, Object | —Unverified | 0 |
| Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification | Apr 23, 2023 | Benchmarking, Data Augmentation | —Unverified | 0 |
| Learning a quantum computer's capability | Apr 20, 2023 | Benchmarking | —Unverified | 0 |
| Towards a Benchmark for Scientific Understanding in Humans and Machines | Apr 20, 2023 | Benchmarking, Information Retrieval | —Unverified | 0 |
| Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Apr 19, 2023 | Benchmarking, Descriptive | Code Available | 0 |
| The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages | Apr 19, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | Apr 18, 2023 | Benchmarking, Instance Segmentation | —Unverified | 0 |
| Computational and Exploratory Landscape Analysis of the GKLS Generator | Apr 18, 2023 | Benchmarking, global-optimization | —Unverified | 0 |
| OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | Apr 17, 2023 | 3D Pose Estimation, Benchmarking | —Unverified | 0 |
| Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Apr 17, 2023 | Benchmarking, Drift Detection | Code Available | 0 |
| Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | Apr 14, 2023 | Benchmarking | —Unverified | 0 |