# Gemma
Google DeepMind releases Gemma, a series of open language models inspired by the same research and technology used to create Gemini. The release includes 2B (trained on 2T tokens) and 7B (trained on 6T tokens) models, each with base and instruction tuned checkpoints. The models are trained with a context length of 8192 tokens and generally outperform Llama 2 7B and Mistral 7B on several benchmarks.
The Gemma model architecture is based on the transformer decoder with improvements including [multi-query attention](https://arxiv.org/abs/1911.02150) (used by the 2B model), multi-head attention (used by the 7B model), [RoPE embeddings](https://arxiv.org/abs/2104.09864), [GeGLU activations](https://arxiv.org/abs/2002.05202), and a revised [normalizer location](https://arxiv.org/abs/1910.07467) using RMSNorm.
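To make the feed-forward change concrete, below is a minimal PyTorch sketch of a GeGLU layer in the style of the GLU-variants paper; the module name, dimensions, and bias choices are illustrative assumptions, not Gemma's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """GeGLU feed-forward block: GELU(x W) * (x V), projected back to d_model."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)  # GELU-gated branch
        self.up = nn.Linear(d_model, d_ff, bias=False)    # linear branch
        self.down = nn.Linear(d_ff, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.gate(x)) * self.up(x))

# Example: a (batch, sequence, d_model) activation passes through unchanged in shape.
ffn = GeGLUFeedForward(d_model=256, d_ff=1024)
y = ffn(torch.randn(2, 8, 256))
```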
According to the [technical report](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf), Gemma 2B and 7B are trained on 2T and 6T tokens, respectively, consisting mainly of web documents, mathematics, and code. Unlike Gemini, these models are not explicitly trained for multilingual or multimodal capabilities. The tokenizer is a subset of the SentencePiece tokenizer used for Gemini, with a vocabulary size of 256K tokens; it preserves whitespace, splits digits, and relies on byte-level encodings for unknown tokens.
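Assuming access to the Hugging Face checkpoints (the weights are gated behind the Gemma terms), a short sketch makes this tokenizer behavior visible; the example strings below are only illustrative.

```python
from transformers import AutoTokenizer

# Assumes you have accepted the Gemma terms on Hugging Face and are authenticated.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# Digits are split into individual tokens rather than merged into larger numbers.
print(tokenizer.tokenize("trained on 6000000000000 tokens"))

# Whitespace is preserved in the splits (visible as SentencePiece '▁' markers).
print(tokenizer.tokenize("def  f(x):  return x"))
```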
The instruction tuned models are trained with supervised fine-tuning on a mix of text-only, synthetic, and human-generated prompt-response pairs, followed by reinforcement learning from human feedback (RLHF), with the reward model trained on labelled preference data and the policy trained against a set of high-quality prompts. Note that all the datasets used are English only. As shown in the table below, the instruction tuned models also use specific formatting control tokens to indicate roles and turns in a conversation.
!["Gemma Control Tokens"](../../img/gemma/control-tokens.png)
## Results
As shown in the figure below, the Gemma 7B model demonstrates strong performance on math, science, and code-related tasks. The scores are averages over academic benchmark evaluations, grouped by capability.
!["Gemma Capabilities"](../../img/gemma/capabilities.png)
Gemma 7B outperforms Llama 2 7B and Mistral 7B on various academic benchmarks with notable performance on HumanEval, GSM8K, MATH, and AGIEval and improved performance on reasoning, dialogue, mathematics, and code.
!["Gemma Safety"](../../img/gemma/safety.png)
The Gemma 7B instruction tuned models also outperform the Mistral-7B v0.2 Instruct model on safety and instruction following as evaluated by humans.
!["Gemma Safety"](../../img/gemma/safety.png)
Gemma is also evaluated on several safety academic benchmarks and compared with Mistral. The technical report also mentions the use of debiasing techniques and red-teaming to potentially mitigate common risks associated with large language models (LLMs). You can find more information on how to responsibly develop with Gemma in the [model card](https://ai.google.dev/gemma/docs/model_card) and [Responsible Generative AI toolkit](https://ai.google.dev/responsible).
!["Gemma Safety"](../../img/gemma/safety-2.png)
## Resources and Integrations
Here are several resources and integrations that were part of the Gemma release:
- [Colab](https://ai.google.dev/gemma/docs/get_started) and [Kaggle](https://www.kaggle.com/models/google/gemma/code) notebooks
- [Hugging Face models](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) (see the loading sketch after this list)
- [MaxText](https://github.com/google/maxtext)
- [NVIDIA NeMo](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/models/Gemma)
- [TensorRT-LLM](https://developer.nvidia.com/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/)
- Gemma 7B is available in the [NVIDIA AI Playground](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b)
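To try the Hugging Face integration end to end, here is a minimal loading-and-generation sketch. It assumes the gated `google/gemma-7b-it` checkpoint, a GPU with enough memory for bfloat16 weights, and the `accelerate` package for `device_map="auto"`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # instruction tuned 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Gemma architecture in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Decode only the newly generated tokens, skipping the prompt and control tokens.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```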
According to the official [blog release](https://blog.google/technology/developers/gemma-open-models/), the [Terms of Use](https://www.kaggle.com/models/google/gemma/license/consent) permit responsible commercial usage and distribution for all organizations, regardless of size.
## References
- [Gemma: Introducing new state-of-the-art open models](https://blog.google/technology/developers/gemma-open-models/)
- [Gemma: Open Models Based on Gemini Research and Technology](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)
- [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
- [Fast Transformer Decoding: One Write-Head is All You Need](https://arxiv.org/abs/1911.02150)
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)
- [GLU Variants Improve Transformer](https://arxiv.org/abs/2002.05202)
- [Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467)