| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | Mar 1, 2024 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | Mar 1, 2024 | BenchmarkingStance Detection | —Unverified | 0 |
| The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition | Feb 29, 2024 | Action Unit DetectionArousal Estimation | —Unverified | 0 |
| FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Feb 28, 2024 | BenchmarkingInductive Learning | CodeCode Available | 0 |
| Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Feb 28, 2024 | BenchmarkingHallucination | CodeCode Available | 0 |
| Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | Feb 27, 2024 | BenchmarkingSystematic Generalization | —Unverified | 0 |
| The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Feb 27, 2024 | BenchmarkingBinary Classification | CodeCode Available | 0 |
| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | Feb 27, 2024 | BenchmarkingDefect Detection | —Unverified | 0 |
| The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | Feb 27, 2024 | Benchmarking | —Unverified | 0 |
| Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | Feb 26, 2024 | BenchmarkingEvolutionary Algorithms | —Unverified | 0 |