The Irony of "Small" Large Language Models
In the world of artificial intelligence, describing something as "small" has become quite ironic.
Not long ago, any AI model with a billion parameters was considered enormous, a true giant of the field. Now we find ourselves calling models with 11 billion parameters "small." This shift in terminology is more than semantics; it reflects a major change in the direction of AI research.
The release of models like the Technology Innovation Institute's new Falcon 2 (11B), Meta's Llama 3 (8B), and Google's Gemma (7B) marks this change. Each of these models may be "small" by today's standards, but they would have been giants just a few years ago. Their release by major technology organizations reflects a race to find the right balance between power and practicality, pushing the limits of what counts as "powerful enough" among AI language models.
This race is not just about doing more with less but about reshaping what AI can do, making powerful tools more usable and fitting them into more places. As we explore what these smaller LLMs mean, it's clear they're not just about being smaller. They're about rethinking what it means to be big and powerful in the world of generative AI.
For the purposes of this article, the author considers a language model with fewer than 13 billion parameters to be a "small" model.
Local Deployment and Model Efficiency
One of the biggest advantages of smaller Large Language Models (LLMs) is their potential for local deployment. This means these models can run on less powerful machines that aren't part of huge data centers, a big change from the traditional setup in which large AI models required massive computing infrastructure to function.
Practical Implications:
The Trade-Off:
Architectural Innovations Needed?
This discussion about efficiency versus power in smaller LLMs is crucial as it shapes how and where AI can be integrated into our daily lives and businesses. The ability to run powerful AI applications locally without massive infrastructure could democratize access to AI, but it remains to be seen if these smaller models can truly stand up to the challenge without significant breakthroughs in how they are built.
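To make the idea of local deployment concrete, here is a minimal sketch of loading a "small" model on a single consumer machine with 4-bit quantization. The model ID, library choices, and settings below are illustrative assumptions, not recommendations from this article.

```python
# Minimal sketch: running a "small" LLM locally with 4-bit quantization.
# Assumes a single consumer GPU and the transformers + bitsandbytes libraries;
# the model ID and settings below are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # an example of an ~8B "small" model

# 4-bit quantization shrinks memory use so the model can fit on one local GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever local hardware is available
)

prompt = "Summarize the benefits of running language models locally."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The key detail is that quantization and automatic device placement let an 8B-class model fit in the memory of a single desktop GPU rather than requiring a data-center cluster.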
The Role of Data Quality
The accessibility of smaller Large Language Models (LLMs) is indeed a significant advancement, but the quality of data they are trained on remains a pivotal factor. This aspect of AI development is critical because no matter how advanced a model's architecture might be, its performance heavily depends on the quality and breadth of its training data.
Data Quality Impact:
Continued Dominance of Larger Models:
Potential for a Digital Divide:
This situation underscores the importance of breakthroughs not just in how models are built but also in how they are trained. Innovations that allow smaller models to learn more effectively from smaller or less diverse datasets could help bridge the gap, democratizing access to powerful AI capabilities. The future of AI could hinge on our ability to make high-level AI tools not just more efficient but also more universally accessible.
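One established technique in this spirit is knowledge distillation, in which a smaller "student" model learns from a larger "teacher" model's output distribution rather than from raw labels alone. The sketch below is a simplified, generic illustration of the core loss; it is not a method proposed in this article, and the temperature and weighting values are arbitrary assumptions.

```python
# Simplified sketch of a knowledge-distillation loss: the student matches the
# teacher's softened output distribution in addition to the ground-truth labels.
# Temperature and alpha values are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student distributions.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature**2

    # Hard targets: standard cross-entropy against the true labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Blend the two signals; alpha controls how much the student trusts the teacher.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```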
Seeking the Next Breakthrough
As we continue to push the boundaries of what's possible with AI, there's a growing recognition that the field might be on the cusp of another major breakthrough, reminiscent of the "Attention Is All You Need" paper that introduced transformers. This kind of revolutionary development is crucial for smaller Large Language Models (LLMs), which face inherent limitations due to their size. A new breakthrough could enable these smaller models to perform at, or even surpass, the levels achieved by today's larger models.
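For context, the core of that earlier breakthrough fits in a few lines: scaled dot-product attention, the mechanism at the heart of the transformer. The sketch below is a textbook illustration of the published formula, not code from any of the models mentioned here.

```python
# Textbook sketch of scaled dot-product attention from "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    # Similarity of every query against every key, scaled to stabilize gradients.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 over the keys
    return weights @ v                   # weighted combination of the values
```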
Why We Need a Breakthrough:
Practical Implications for End Users
Despite the focus on academic metrics like model parameters and benchmarks, the ultimate measure of a model's success is its utility to end users. For most people and businesses, the technical specifications of a model are less important than how well it meets their needs.
User-Centric Performance Metrics:
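As one illustration of measuring what users actually experience, the sketch below times a single request and reports latency and a rough tokens-per-second figure. The generate_reply function is a hypothetical placeholder for whichever local or hosted model is being evaluated.

```python
# Sketch: measuring user-facing latency and throughput around a model call.
# `generate_reply` is a hypothetical placeholder for the model actually in use.
import time

def measure_response(generate_reply, prompt):
    start = time.perf_counter()
    reply = generate_reply(prompt)  # the call the end user actually waits on
    elapsed = time.perf_counter() - start

    tokens = len(reply.split())     # rough proxy for generated tokens
    return {
        "latency_seconds": round(elapsed, 3),
        "approx_tokens_per_second": round(tokens / elapsed, 1) if elapsed > 0 else None,
        "reply": reply,
    }
```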
Conclusion: The Ongoing Dominance of Large Models
Despite the exciting advancements and potential of smaller Large Language Models (LLMs), it's important to acknowledge that, as of now, larger models still hold a significant advantage in many areas of artificial intelligence. The sheer scale of their training data and computational power allows these giants to perform complex tasks with a level of nuance and depth that smaller models struggle to match.
As things stand, the performance gap between large and smaller models is clear, with larger models leading on most benchmarks. They handle more complex queries and deliver more accurate responses across a broader range of tasks, which keeps them the preferred choice for most high-stakes applications.
However, the AI community remains hopeful. While we may seem to be hitting a wall in terms of balancing size with efficiency and cost, the history of technological breakthroughs gives us reason to believe that we will find a way to overcome these barriers. The quest for more manageable yet powerful AI models is ongoing, and with continued innovation and research, smaller models may soon close the gap with their larger counterparts.
In this journey, the ultimate goal remains clear: to develop AI technologies that are not only powerful and efficient but also accessible and useful across a wide range of applications. As we look to the future, the potential for breakthroughs that reshape the landscape of AI technology is both a challenge and a promise, keeping the field dynamic and ever-evolving.