Moving Generative AI Past Transformers
for Efficient Language Models with Lower Compute Needs

The development of LLMs and multimodal AI models (like Gemini) has led to the belief that GPUs are foundational to the growth of AI. While GPUs are important, they are a stepping stone rather than the ultimate currency of AI progress.

Here at Google, the birthplace of the transformer, Google DeepMind published research on a new architecture that can train LLMs on far fewer tokens and, therefore, with fewer GPUs. You can read the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"

In the research, Griffin matched the performance of Llama-2 despite being trained on over 6 times fewer tokens. Griffin combines gated linear recurrences with local attention, achieving performance comparable to transformers on certain tasks while requiring fewer tokens for training. This translates to reduced computational needs and less reliance on GPUs.
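
To make the recurrence idea concrete, here is a minimal sketch of a gated linear recurrence in Python/NumPy. It is a simplified, hypothetical illustration rather than the exact RG-LRU layer from the Griffin paper (whose gating formula differs and which is interleaved with local-attention blocks); the point is that the per-step state has a fixed size, no matter how long the sequence grows.

```python
import numpy as np

def gated_linear_recurrence(x, w_a, w_i):
    """Run a simplified gated linear recurrence over a sequence.

    x:   (seq_len, dim) input activations
    w_a: (dim, dim) weights producing the forget gate a_t
    w_i: (dim, dim) weights producing the input gate i_t
    """
    seq_len, dim = x.shape
    h = np.zeros(dim)            # fixed-size state, independent of seq_len
    outputs = np.empty_like(x)
    for t in range(seq_len):
        a = 1.0 / (1.0 + np.exp(-x[t] @ w_a))   # forget gate in (0, 1)
        i = 1.0 / (1.0 + np.exp(-x[t] @ w_i))   # input gate in (0, 1)
        # Element-wise linear update: no attention over all past positions,
        # so each step costs O(dim) compute and O(dim) memory.
        h = a * h + (1.0 - a) * (i * x[t])
        outputs[t] = h
    return outputs

# Example: a 16-step sequence with a 4-dimensional state.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))
y = gated_linear_recurrence(x, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
print(y.shape)  # (16, 4)
```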

Griffin performs better than transformers when evaluated on sequences longer than those seen during training, and can also efficiently learn copying and retrieval tasks from its training data.

Building on Griffin's foundation, we introduced RecurrentGemma, a state-of-the-art open model that exemplifies the ability of RNNs to deliver high performance while training on fewer tokens. You can read the paper: "RecurrentGemma: Moving Past Transformers for Efficient Open Language Models"

RecurrentGemma combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences; this is in contrast to transformers, whose memory usage grows with sequence length. RecurrentGemma-2B also achieves performance similar to Gemma-2B while being trained on fewer tokens.
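
As a rough illustration of that contrast, the back-of-the-envelope sketch below compares a transformer's KV cache, which grows linearly with sequence length, against a Griffin-style fixed recurrence state plus a bounded local-attention window. The model dimensions are made-up placeholders, not RecurrentGemma's published configuration.

```python
def kv_cache_bytes(seq_len, layers, heads, head_dim, bytes_per_value=2):
    # Transformer inference caches keys and values for every past token,
    # at every layer and head, so memory grows with seq_len.
    return 2 * layers * heads * head_dim * seq_len * bytes_per_value

def recurrent_state_bytes(layers, state_dim, window, heads, head_dim, bytes_per_value=2):
    # A recurrence-plus-local-attention model keeps a fixed-size state per layer
    # and keys/values only for a bounded local window, independent of seq_len.
    recurrence = layers * state_dim
    local_kv = 2 * layers * heads * head_dim * window
    return (recurrence + local_kv) * bytes_per_value

# Placeholder shape: 26 layers, 8 heads of width 256, 2,560-dim state, 2,048-token window.
for seq_len in (2_048, 32_768, 262_144):
    print(
        f"{seq_len:>7} tokens | "
        f"KV cache: {kv_cache_bytes(seq_len, 26, 8, 256) / 2**20:8.1f} MiB | "
        f"recurrent state: {recurrent_state_bytes(26, 2560, 2048, 8, 256) / 2**20:6.1f} MiB"
    )
```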

You can download RecurrentGemma on Hugging Face and deploy it on Vertex AI.
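
If you want to experiment locally before deploying, a minimal Hugging Face Transformers snippet along these lines should work. Treat it as a sketch: the checkpoint id "google/recurrentgemma-2b" and the exact library version requirements are assumptions to confirm on the model card.

```python
# pip install -U transformers accelerate torch
# Assumes the checkpoint id "google/recurrentgemma-2b" (verify on the model card)
# and a transformers release recent enough to include RecurrentGemma support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to fit on a single accelerator
    device_map="auto",
)

prompt = "Recurrent architectures handle long sequences by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```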

Just as Moore's Law predicted the exponential growth in transistor density that made integrated circuits smaller yet more powerful, Griffin lays the groundwork for future LLM architectures that require fewer tokens and less computational power. This opens up exciting possibilities for building more capable models at lower cost, without being constrained by compute availability.
