| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | Feb 27, 2024 | BenchmarkingDefect Detection | —Unverified | 0 |
| The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Feb 27, 2024 | BenchmarkingBinary Classification | CodeCode Available | 0 |
| Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Feb 27, 2024 | Benchmarking | CodeCode Available | 1 |
| Partial Rankings of Optimizers | Feb 26, 2024 | Benchmarking | CodeCode Available | 0 |
| Benchmarking LLMs on the Semantic Overlap Summarization Task | Feb 26, 2024 | BenchmarkingDocument Summarization | —Unverified | 0 |
| Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset | Feb 26, 2024 | BenchmarkingCross-Lingual Transfer | —Unverified | 0 |
| Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | Feb 26, 2024 | BenchmarkingEvolutionary Algorithms | —Unverified | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | BenchmarkingChatbot | CodeCode Available | 0 |
| PST-Bench: Tracing and Benchmarking the Source of Publications | Feb 25, 2024 | Benchmarking | CodeCode Available | 1 |
| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | Feb 24, 2024 | BenchmarkingKnowledge Graphs | —Unverified | 0 |