| Adversarial Databases Improve Success in Retrieval-based Large Language Models | Jul 19, 2024 | Multiple-choice, RAG | Unverified | 0 |
| TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Jul 17, 2024 | Math, Multiple-choice | Code Available | 1 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | Jul 16, 2024 | GPU, Multiple-choice | Unverified | 0 |
| Fine-tuning Multimodal Large Language Models for Product Bundling | Jul 16, 2024 | In-Context Learning, Multiple-choice | Code Available | 1 |
| Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Jul 15, 2024 | Backdoor Attack, Multiple-choice | Code Available | 1 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | Jul 15, 2024 | Astronomy, Benchmarking | Unverified | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | Unverified | 0 |
| LAB-Bench: Measuring Capabilities of Language Models for Biology Research | Jul 14, 2024 | Language Modelling, Multiple-choice | Unverified | 0 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Jul 12, 2024 | Logical Reasoning, Multiple-choice | Code Available | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | Unverified | 0 |