| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | Unverified | 0 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | May 22, 2023 | Adversarial Attack, Autonomous Driving | Code Available | 1 |
| Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | May 21, 2023 | Benchmarking, Financial Analysis | Unverified | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | Unverified | 0 |
| Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models | May 19, 2023 | Benchmarking, Diversity | Code Available | 2 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | May 19, 2023 | Benchmarking, Form | Code Available | 0 |
| TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | May 19, 2023 | Benchmarking | Unverified | 0 |
| Ahead-of-Time P-Tuning | May 18, 2023 | Benchmarking, parameter-efficient fine-tuning | Unverified | 0 |
| Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | May 18, 2023 | Benchmarking, Diagnostic | Unverified | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | May 18, 2023 | Benchmarking, GPU | Unverified | 0 |