| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | May 24, 2023 | Benchmarking, Graph Mining | Code Available | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | May 24, 2023 | Benchmarking, Cross-Lingual Transfer | Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 0 |
| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | May 23, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | May 23, 2023 | Benchmarking | Code Available | 1 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | May 23, 2023 | Benchmarking, Language Modeling | Unverified | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | May 23, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| Benchmarking Machine Translation with Cultural Awareness | May 23, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | May 23, 2023 | Benchmarking, model | Code Available | 0 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | May 23, 2023 | Benchmarking, Language Identification | Unverified | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | May 22, 2023 | Benchmarking, Causal Inference | Code Available | 0 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | May 22, 2023 | Benchmarking, Classification | Code Available | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | Unverified | 0 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | May 22, 2023 | Adversarial Attack, Autonomous Driving | Code Available | 1 |
| Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | May 21, 2023 | Benchmarking, Financial Analysis | Unverified | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | Unverified | 0 |
| Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models | May 19, 2023 | Benchmarking, Diversity | Code Available | 2 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | May 19, 2023 | Benchmarking, Form | Code Available | 0 |
| TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | May 19, 2023 | Benchmarking | Unverified | 0 |
| Ahead-of-Time P-Tuning | May 18, 2023 | Benchmarking, parameter-efficient fine-tuning | Unverified | 0 |
| Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | May 18, 2023 | Benchmarking, Diagnostic | Unverified | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | May 18, 2023 | Benchmarking, GPU | Unverified | 0 |