| Large Language Model Compression with Neural Architecture Search | Oct 9, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | Oct 7, 2024 | Attribute, GSM8K | Unverified | 0 |
| Continuous Approximations for Improving Quantization Aware Training of LLMs | Oct 6, 2024 | MMLU, Model Compression | Unverified | 0 |
| CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions | Oct 4, 2024 | Instruction Following, MMLU | Code Available | 0 |
| LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Oct 4, 2024 | Diversity, Ensemble Pruning | Code Available | 0 |
| BrainTransformers: SNN-LLM | Oct 3, 2024 | ARC, GSM8K | Unverified | 0 |
| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | Unverified | 0 |
| Instance-adaptive Zero-shot Chain-of-Thought Prompting | Sep 30, 2024 | GSM8K, Math | Unverified | 0 |
| SSR: Alignment-Aware Modality Connector for Speech Language Models | Sep 30, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARC, GSM8K | Unverified | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | Sep 19, 2024 | General Knowledge, MMLU | Unverified | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 |
| Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models | Sep 7, 2024 | MMLU, TruthfulQA | Unverified | 0 |
| MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Sep 3, 2024 | MMLU | Code Available | 0 |
| Performance Law of Large Language Models | Aug 19, 2024 | MMLU | Code Available | 0 |
| SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | Aug 16, 2024 | GSM8K, MMLU | Unverified | 0 |
| Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning | Aug 16, 2024 | counterfactual, MMLU | Unverified | 0 |
| BOTS-LM: Training Large Language Models for Setswana | Aug 5, 2024 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design | Jul 23, 2024 | Formal Logic, Language Modelling | Unverified | 0 |
| ALLaM: Large Language Models for Arabic and English | Jul 22, 2024 | Decoder, Language Acquisition | Unverified | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| AgentInstruct: Toward Generative Teaching with Agentic Flows | Jul 3, 2024 | GSM8K, MMLU | Unverified | 0 |
| Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | Jul 2, 2024 | Active Learning, Language Modelling | Unverified | 0 |
| Changing Answer Order Can Decrease MMLU Accuracy | Jun 27, 2024 | MMLU, Multiple-choice | Unverified | 0 |
| EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization | Jun 27, 2024 | Diversity, Empathetic Response Generation | Code Available | 0 |
| Training-Free Exponential Context Extension via Cascading KV Cache | Jun 24, 2024 | Book summarization, Computational Efficiency | Code Available | 0 |
| Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling | Jun 21, 2024 | Clustering, MMLU | Unverified | 0 |
| DEM: Distribution Edited Model for Training with Mixed Data Distributions | Jun 21, 2024 | Diversity, Instruction Following | Unverified | 0 |
| Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback | Jun 21, 2024 | Information Retrieval, Learning-To-Rank | Unverified | 0 |
| Optimised Grouped-Query Attention Mechanism for Transformers | Jun 21, 2024 | MMLU | Unverified | 0 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Jun 20, 2024 | GSM8K, Language Model Evaluation | Code Available | 0 |
| Understanding Finetuning for Factual Knowledge Extraction | Jun 20, 2024 | MMLU, Question Answering | Unverified | 0 |
| Input Conditioned Graph Generation for Language Agents | Jun 17, 2024 | Graph Generation, MMLU | Code Available | 0 |
| The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance | Jun 17, 2024 | counterfactual, MMLU | Unverified | 0 |
| Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting | Jun 17, 2024 | Ethics, MMLU | Unverified | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | Benchmarking, HumanEval | Unverified | 0 |
| MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models | Jun 15, 2024 | Mathematical Reasoning, MMLU | Unverified | 0 |
| Quantifying Variance in Evaluation Benchmarks | Jun 14, 2024 | MMLU | Unverified | 0 |
| GEB-1.3B: Open Lightweight Large Language Model | Jun 14, 2024 | CPU, Language Modeling | Unverified | 0 |
| An Empirical Study of Mamba-based Language Models | Jun 12, 2024 | 16k, In-Context Learning | Unverified | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8K, HumanEval | Unverified | 0 |
| Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function | Jun 3, 2024 | Diversity, MMLU | Code Available | 0 |
| MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures | Jun 3, 2024 | Chatbot, MMLU | Unverified | 0 |
| Spanish and LLM Benchmarks: is MMLU Lost in Translation? | May 28, 2024 | MMLU, Translation | Unverified | 0 |
| GECKO: Generative Language Model for English, Code and Korean | May 24, 2024 | kmmlu, Language Modeling | Unverified | 0 |