Last night I was invited to another great session with Marian Hurducaș on UPGRADE 100 Live Talks @ Radio Guerrilla, on the topic of DeepSeek, the Chinese company that has pulled the AI rug out from under the feet of the US giants over the last couple of days.
In short, the main points were 👇:
1) via Yann LeCun: the DeepSeek team stood on the shoulders of open-source research / models / code to achieve something that is indeed an impressive feat of engineering, matching o1 / Sonnet 3.5 on some benchmarks. They also did some proper innovation around the training recipe (Group Relative Policy Optimisation, GRPO). Hats off!
2) The DeepSeek system is not fully open-source! We should talk about systems, not models. Everything is a system, and there are various degrees of "open source", including open weights, open code, a released paper / recipe, licences, etc. We should not lump everything together as "100% completely open to the public".
3) It is still under debate whether their entire cost was ~$5M. This seems feasible for the training run itself, corroborated by external parties running the numbers, but it is much cheaper to be the follow-up team than the leading team, and such an effort is not solely about training costs. Multiple companies are currently trying to replicate the results with even less money ($2–3M). Wow. How? They already have the recipe, so it's cheaper!
4) It seems China is making a statement, challenging the US leaders in AI, and succeeding to some extent (stock market volatility, -17%, etc. PS: Nvidia should not have slumped; they will likely bounce back, especially after the Trump Stargate partnerships). This is the strongest argument for not fully trusting the DeepSeek numbers and the approach presented: they stand to gain far more than whatever money they may have invested. Rumours have magical effects.
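For the curious, the core trick in GRPO from point 1 can be sketched in a few lines: instead of a learned value-function baseline, each sampled completion's reward is normalised against the mean and standard deviation of its own group of completions for the same prompt. This is a minimal illustrative sketch (function name and reward numbers are my own, not from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """Group-relative advantage estimate, GRPO-style (sketch).

    Each completion's advantage is its reward normalised against the
    mean/std of the group sampled for the same prompt, replacing a
    learned value baseline.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Example: four completions for one prompt, scored by a reward model
# (scores are made up for illustration).
advs = group_relative_advantages([0.1, 0.4, 0.9, 0.2])
```

The appeal is that the "baseline" comes for free from the sampled group, which is one reason the recipe is cheap to run.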
I'll be talking more about this at the RO AI Alliance meet-ups, starting on the 13th of Feb.
Get your thinking caps on! We have cookies. 🥂🧠👽
#machinelearning #deepseek #radiotalks #nerd