Authors: Gemma Team
Year: 2024
Published in: arXiv
Institution: Google DeepMind, Google
Research Area: LLM, Model Efficiency, Architecture
Discipline: Artificial Intelligence
Gemma 2 introduces a family of Transformer-based language models (2B, 9B, and 27B parameters) enhanced with interleaved local-global attention and grouped-query attention, delivering the best performance for their size and offering competitive alternatives to models 2-3x larger.
Methods: The study applied modifications to the Transformer architecture, interleaving local sliding-window and global attention layers and using grouped-query attention (GQA), and trained the 2B and 9B models with knowledge distillation from a larger teacher rather than next-token prediction alone. Illustrative sketches of both techniques follow.
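A minimal sketch of grouped-query attention with an optional sliding window, to illustrate the two attention modifications named above. Shapes, the `window` parameter, and the function name are illustrative assumptions, not the paper's code; the only facts taken from the paper are that groups of query heads share key/value heads and that local and global layers are interleaved.

```python
import torch

def grouped_query_attention(q, k, v, window=None):
    """Grouped-query attention with an optional sliding window.

    Sketch under assumed shapes: q is (seq, num_q_heads, head_dim),
    k and v are (seq, num_kv_heads, head_dim), with num_q_heads a
    multiple of num_kv_heads. window=None gives a global causal layer;
    an integer window emulates a local sliding-window layer.
    """
    seq, num_q_heads, head_dim = q.shape
    group = num_q_heads // k.shape[1]
    # GQA: each group of query heads shares a single key/value head.
    k = k.repeat_interleave(group, dim=1)  # -> (seq, num_q_heads, head_dim)
    v = v.repeat_interleave(group, dim=1)
    scores = torch.einsum("qhd,khd->hqk", q, k) / head_dim**0.5
    # Causal mask; local layers further restrict to the last `window` tokens.
    i = torch.arange(seq)[:, None]
    j = torch.arange(seq)[None, :]
    mask = j <= i
    if window is not None:
        mask &= (i - j) < window
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), v)
```

Interleaving then amounts to alternating calls with a finite `window` (local layers) and `window=None` (global layers) across the model's depth.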
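And a minimal sketch of the knowledge-distillation objective: the student is trained to match the teacher's soft next-token distribution instead of one-hot labels. The `temperature` knob and all names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence from the teacher's token distribution to the student's.

    Assumed shapes: (num_positions, vocab_size), i.e. batch and sequence
    dimensions flattened together.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over positions; the temperature**2
    # factor is the standard gradient-scale correction when temperature != 1.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature**2

# Usage with hypothetical sizes: 8 positions, vocab of 32k tokens.
student = torch.randn(8, 32_000)
teacher = torch.randn(8, 32_000)
loss = distillation_loss(student, teacher)
```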
Key Findings: The models deliver the best performance for their size, and the distilled 2B and 9B models offer competitive alternatives to open models 2-3x larger, showing that distillation and attention modifications let lightweight models rival much bigger ones.
Citations: 1649