Musings… According to rumors that people better informed than I am consider probably not far off the mark, OpenAI's GPT-4 is a "mixture of experts" (MoE) model with an estimated total of about 1.7 trillion parameters. Each generation of these models (GPT-2, GPT-3, GPT-3.5) has proven more capable than the last.

Researchers in recent years appear to have been surprised that only two tuning knobs, parameter count and training token count, seem to drive the increase in capability. (The at-least-two other tuning knobs they thought should matter… don't matter to us today.) The widely held expectation that "the only thing that matters in the next several years of AI technology is the data" has a certain grounding. It sure looks like the model technology doesn't need to improve for rapid progress in the field to continue. In fact, in theory the training data doesn't even need to expand or get cleaned up: the models could likely get quite a bit better for a generation or two merely by doubling the number of parameters a couple of times. That would hold even if Moore's Law hit the wall tomorrow, because more GPUs can be thrown at the problem for probably a few more cycles.

The transformer-based LLMs by themselves, though, will probably struggle to overcome the (poorly named) "hallucination" problem: all of their output is produced in the same way, so it's all "hallucinations". Progress along the axis of reduced noise in the output will likely require improvements to the models, or combinations with new types of models in an MoE-style network.

The successes of the transformer architecture have ignited new thinking and provide a baseline against which the performance of new ideas can be tested and measured. New ideas are being generated at a rapid clip, and probably more of them are quietly discarded than we realize. Every few weeks something shiny turns up.

What do you think the next year in AI will be like? https://lnkd.in/gR83wTpc
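For readers new to the term, here is a minimal sketch of what "mixture of experts" means mechanically: a small router picks a few expert sub-networks per token, so only a fraction of the total parameters is active on any one forward pass. This is a generic top-k MoE layer in Python/NumPy with toy dimensions I chose for illustration; GPT-4's actual design is unpublished, so none of the names or sizes here reflect it.

```python
import numpy as np

# Generic top-k mixture-of-experts layer. A sketch of the general technique,
# not a claim about GPT-4's internals, which are unpublished.

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 8, 4, 2   # toy sizes, purely illustrative

# Each "expert" is just a small weight matrix here; in a real model each
# would be a full feed-forward sub-network.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1   # learned in practice

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through the top-k experts, mixed by router weight."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen experts only
    # Only TOP_K of N_EXPERTS matrices are touched: most parameters stay inactive.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token))
```

The appeal of the design is in that last routing step: a model can carry an enormous total parameter count while paying compute for only the few experts selected per token.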
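To make the "two tuning knobs" observation concrete, here is a hedged sketch of a Chinchilla-style scaling law (Hoffmann et al., 2022), in which predicted loss falls off as a power law in parameter count N and training token count D. The fitted constants are the commonly cited published values, and the 1.7e12-parameter / 13e12-token starting points are just rumored GPT-4-scale figures used as placeholders, not facts.

```python
# Sketch of a Chinchilla-style scaling law: L(N, D) = E + A/N**alpha + B/D**beta,
# where N = parameter count and D = training-token count (Hoffmann et al., 2022).
# Constants are the commonly cited published fits; treat everything here as
# illustrative rather than as a claim about any real model.

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Pretraining loss predicted from the two 'tuning knobs' alone."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# "Doubling the number of parameters a couple of times," holding data fixed.
# 1.7e12 params and 13e12 tokens are rumored GPT-4-scale placeholders.
for doublings in range(3):
    n = 1.7e12 * 2**doublings
    print(f"{n:9.2e} params -> predicted loss {predicted_loss(n, 13e12):.4f}")
```

Even with placeholder numbers, the diminishing-returns shape is visible: each doubling of N buys a smaller loss reduction, which is part of why "the data is what matters" has traction.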
A commenter replied: I think we're going to see a whole lot more use cases functionally explored around AI agents like Copilot. I imagine we'll see more focus on utilizing NPUs to balance out GPU usage and gain some efficiency/cost savings on compute.