| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Sep 3, 2024 | MMLU | Code Available | 0 |
| Performance Law of Large Language Models | Aug 19, 2024 | MMLU | Code Available | 0 |
| Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning | Aug 16, 2024 | counterfactual, MMLU | Unverified | 0 |
| SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | Aug 16, 2024 | GSM8K, MMLU | Unverified | 0 |
| ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models | Aug 15, 2024 | In-Context Learning, MMLU | Code Available | 1 |
| BOTS-LM: Training Large Language Models for Setswana | Aug 5, 2024 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Aug 1, 2024 | Medical Question Answering, MedQA | Code Available | 4 |
| A deeper look at depth pruning of LLMs | Jul 23, 2024 | MMLU | Code Available | 1 |
| Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design | Jul 23, 2024 | Formal Logic, Language Modelling | Unverified | 0 |
| ALLaM: Large Language Models for Arabic and English | Jul 22, 2024 | Decoder, Language Acquisition | Unverified | 0 |
| Compact Language Models via Pruning and Knowledge Distillation | Jul 19, 2024 | Knowledge Distillation, Language Modeling | Code Available | 3 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Jul 5, 2024 | General Knowledge, Instruction Following | Code Available | 1 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| AgentInstruct: Toward Generative Teaching with Agentic Flows | Jul 3, 2024 | GSM8K, MMLU | Unverified | 0 |
| Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | Jul 2, 2024 | Active Learning, Language Modelling | Unverified | 0 |
| Changing Answer Order Can Decrease MMLU Accuracy | Jun 27, 2024 | MMLU, Multiple-choice | Unverified | 0 |
| EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization | Jun 27, 2024 | Diversity, Empathetic Response Generation | Code Available | 0 |
| The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | Jun 25, 2024 | ARC, Language Modeling | Code Available | 1 |
| Training-Free Exponential Context Extension via Cascading KV Cache | Jun 24, 2024 | Book summarization, Computational Efficiency | Code Available | 0 |
| Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging | Jun 24, 2024 | MMLU, Model Compression | Code Available | 1 |
| Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models | Jun 23, 2024 | Machine Translation, MMLU | Code Available | 1 |