| An Assessment of Model-On-Model Deception | May 10, 2024 | MMLU; model | Unverified | 0 |
| SUTRA: Scalable Multilingual Language Model Architecture | May 7, 2024 | Computational Efficiency; Hallucination | Unverified | 0 |
| Octopus v4: Graph of language models | Apr 30, 2024 | MMLU | Unverified | 0 |
| Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | Apr 22, 2024 | Language Modeling; Language Modelling | Unverified | 0 |
| Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Apr 18, 2024 | GSM8K; MMLU | Unverified | 0 |
| Post-Hoc Reversal: Are We Selecting Models Prematurely? | Apr 11, 2024 | Language Modelling; MMLU | Code Available | 0 |
| LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction | Apr 1, 2024 | Image Captioning; Instruction Following | Unverified | 0 |
| NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning | Mar 30, 2024 | Language Modeling; Language Modelling | Unverified | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math; MMLU | Unverified | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation; HumanEval | Unverified | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | Unverified | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA; MMLU | Unverified | 0 |
| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering; MedQA | Unverified | 0 |
| Do Large Language Models Mirror Cognitive Language Processing? | Feb 28, 2024 | Chatbot; Logical Reasoning | Unverified | 0 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k; Language Modeling | Code Available | 0 |
| ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | Feb 21, 2024 | MMLU; Retrieval | Code Available | 0 |
| Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models | Feb 19, 2024 | MMLU | Unverified | 0 |
| When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Feb 1, 2024 | Answer Selection; Language Modeling | Code Available | 0 |
| Towards Uncertainty-Aware Language Agent | Jan 25, 2024 | MMLU; StrategyQA | Unverified | 0 |
| LLaMA Beyond English: An Empirical Study on Language Capability Transfer | Jan 2, 2024 | GPU; Informativeness | Unverified | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot; GSM8K | Unverified | 0 |
| YAYI 2: Multilingual Open-Source Large Language Models | Dec 22, 2023 | MMLU | Unverified | 0 |
| LM-Cocktail: Resilient Tuning of Language Models via Model Merging | Nov 22, 2023 | Language Modeling; Language Modelling | Unverified | 0 |
| AcademicGPT: Empowering Academic Research | Nov 21, 2023 | Abstract generation; General Knowledge | Unverified | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | Nov 16, 2023 | Common Sense Reasoning; MMLU | Unverified | 0 |
| ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | Nov 16, 2023 | MMLU; Multiple-choice | Unverified | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K; MMLU | Unverified | 0 |
| TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | Oct 29, 2023 | Data Augmentation; Language Modeling | Unverified | 0 |
| Evaluation of large language models using an Indian language LGBTI+ lexicon | Oct 26, 2023 | Machine Translation; MMLU | Unverified | 0 |
| Irreducible Curriculum for Language Model Pretraining | Oct 23, 2023 | Language Modeling; Language Modelling | Unverified | 0 |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARC; MMLU | Unverified | 0 |
| Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | Oct 9, 2023 | MMLU | Unverified | 0 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval; Language Modeling | Code Available | 0 |
| Pruning Large Language Models via Accuracy Predictor | Sep 18, 2023 | MMLU; Model Compression | Unverified | 0 |
| Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations | Aug 27, 2023 | Instruction Following; MMLU | Code Available | 0 |
| The Poison of Alignment | Aug 25, 2023 | MMLU | Unverified | 0 |
| Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | Jun 25, 2023 | counterfactual; Math | Unverified | 0 |
| Inconsistencies in Masked Language Models | Dec 30, 2022 | LAMBADA; MMLU | Code Available | 0 |
| Measuring Progress on Scalable Oversight for Large Language Models | Nov 4, 2022 | Experimental Design; Language Modelling | Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning; Cross-Lingual Question Answering | Unverified | 0 |