| Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset | Feb 26, 2024 | Benchmarking, Cross-Lingual Transfer | Unverified | 0 |
| Benchmarking LLMs on the Semantic Overlap Summarization Task | Feb 26, 2024 | Benchmarking, Document Summarization | Unverified | 0 |
| Partial Rankings of Optimizers | Feb 26, 2024 | Benchmarking | Code Available | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Feb 25, 2024 | Benchmarking, Chatbot | Code Available | 0 |
| E(3)-equivariant models cannot learn chirality: Field-based molecular generation | Feb 24, 2024 | Benchmarking, Graph Neural Network | Unverified | 0 |
| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | Feb 24, 2024 | Benchmarking, Knowledge Graphs | Unverified | 0 |
| Benchmarking Observational Studies with Experimental Data under Right-Censoring | Feb 23, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking the Robustness of Panoptic Segmentation for Automated Driving | Feb 23, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 |
| PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models | Feb 21, 2024 | Benchmarking, Form | Code Available | 0 |