Mistral 7B

2023-10-10

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

Code

github.com/mistralai/mistral-src (official, in paper; PyTorch; ★ 10,731)
github.com/facebookresearch/fairseq2 (PyTorch; ★ 1,122)
github.com/mgmalek/efficient_cross_entropy (PyTorch; ★ 124)
github.com/ninglab/ecellm (PyTorch; ★ 55)
github.com/pwc-1/Paper-9/tree/main/2/mistral (MindSpore; ★ 0)

Abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B Instruct, that surpasses the Llama 2 13B Chat model on both human and automated benchmarks. Our models are released under the Apache 2.0 license.
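
As a rough illustration of the two attention mechanisms named in the abstract, the sketch below builds a causal sliding-window mask and maps query heads to shared key/value heads. It is not taken from the official code: the function names, the toy sizes, and the convention that the window counts the current token are all assumptions made for the example.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Assumption: attention is causal (j <= i) and the window of size `window`
    includes the current token, so i attends to positions i-window+1 .. i.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head shared by query head `q_head` under GQA.

    Assumption: query heads are split into contiguous, equally sized groups,
    one group per key/value head.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes chosen for readability, not the model's real configuration.
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

With a fixed window, each row of the mask has at most `window` allowed positions, which is why per-token attention cost stops growing with sequence length. Sharing key/value heads across a group of query heads shrinks the key/value cache that must be kept during inference.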

Tasks

Answerability Prediction, Arithmetic Reasoning, Chatbot, Code Generation, Common Sense Reasoning, Language Modeling, Language Modelling, Math, Mathematical Reasoning, Math Word Problem Solving, Multi-task Language Understanding, Question Answering, Sentence Completion, World Knowledge, Zero-Shot Video Question Answer
