| Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Jul 20, 2024 | Contrastive Learning, Multiple-choice | Code Available | 0 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Jul 20, 2024 | Language Modelling, Machine Translation | —Unverified | 0 |
| Adversarial Databases Improve Success in Retrieval-based Large Language Models | Jul 19, 2024 | Multiple-choice, RAG | —Unverified | 0 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | Jul 16, 2024 | GPU, Multiple-choice | —Unverified | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | Jul 15, 2024 | Astronomy, Benchmarking | —Unverified | 0 |
| LAB-Bench: Measuring Capabilities of Language Models for Biology Research | Jul 14, 2024 | Language Modelling, Multiple-choice | —Unverified | 0 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Jul 12, 2024 | Logical Reasoning, Multiple-choice | Code Available | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Self-Recognition in Language Models | Jul 9, 2024 | Multiple-choice | Code Available | 0 |