| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Oct 13, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| LoRE: Logit-Ranked Retriever Ensemble for Enhancing Open-Domain Question Answering | Oct 13, 2024 | Answer Generation, Language Modeling | Unverified | 0 |
| Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation | Oct 12, 2024 | Code Generation, Language Modeling | Unverified | 0 |
| DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning | Oct 12, 2024 | Audio captioning, Large Language Model | Unverified | 0 |
| Debiasing Vison-Language Models with Text-Only Training | Oct 12, 2024 | Large Language Model | Unverified | 0 |
| LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning | Oct 12, 2024 | Knowledge Graphs, Language Modeling | Code Available | 0 |
| Extended Japanese Commonsense Morality Dataset with Masked Token and Label Enhancement | Oct 12, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Enterprise Benchmarks for Large Language Model Evaluation | Oct 11, 2024 | Benchmarking, Language Model Evaluation | Code Available | 0 |
| LLMD: A Large Language Model for Interpreting Longitudinal Medical Records | Oct 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains | Oct 11, 2024 | Large Language Model, Logical Reasoning | Unverified | 0 |