“Training models is a cost center, but inference is a profit center, and unless you make money on inference, ubiquitous AI is not going to happen.” Here's how we deliver data center scale GenAI inference: https://lnkd.in/dnNAG3bg
Recogni’s Post
More Relevant Posts
-
Buckle up. Recogni's simulations show its single-rack system delivering 320,000 tokens/s on Llama 3.1-405B (2,304 concurrent users), with 3× faster time to first token (TTFT) and 6.4× faster time per output token (TPOT) than an equivalently sized NVIDIA H200-based system. Real competition to NVIDIA's dominance is coming!!! #genai #gpu Celesta Capital
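For a rough sense of what those numbers imply per user, here is a quick sanity check (a minimal sketch in Python using only the figures quoted above; the system itself is not independently benchmarked here):

```python
# Back-of-the-envelope check of the claimed single-rack figures.
# Both inputs come straight from the post; nothing else is assumed.
total_tokens_per_s = 320_000   # claimed aggregate throughput, Llama 3.1-405B
concurrent_users = 2_304       # claimed concurrency

per_user_tokens_per_s = total_tokens_per_s / concurrent_users
print(f"~{per_user_tokens_per_s:.0f} tokens/s per user")  # ~139

# Implied per-user time per output token, if all users decode simultaneously:
tpot_ms = 1_000 / per_user_tokens_per_s
print(f"~{tpot_ms:.1f} ms per output token")              # ~7.2 ms
```

At roughly 139 tokens/s per stream, each user would see output far faster than human reading speed, which is the bar most interactive serving systems aim for.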
-
The economics of multi-modal #GenerativeAI inference will matter most as the industry strives to make AI more ubiquitous - and it is essential for broad-based use of AI across enterprises too.

With the recent chain-of-thought model release from OpenAI, we just saw a major jump in the amount of time the model runs before generating a response. That is potentially an order-of-magnitude increase in inference costs, which adds to the pressure to make #Inference more cost-effective from both a #CAPEX and an #OPEX perspective.

We also saw news of Microsoft looking to reactivate the Unit-1 reactor at Three Mile Island with a major investment and commitment. This points to the big challenge around power generation capacity for #AIDataCenters across the whole world - which again pushes all of us to think about the energy consumption of #GenAI #Inference systems.

At Recogni we are singularly focused on making #GenAI #Inference optimal from #LogMath to #Silicon to #Systems. This requires first-principles thinking and an approach that can support the widest set of #AIModels - across all modalities and, most importantly, at all scales and sizes.
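To make the cost pressure concrete, here is a minimal sketch of how per-request cost scales with reasoning tokens. The price and token counts below are hypothetical, chosen purely for illustration:

```python
# Hypothetical illustration: chain-of-thought models emit many extra tokens
# before the final answer, and cost scales roughly linearly with them.
# All numbers below are assumptions for the example, not quoted prices.
price_per_1k_output_tokens = 0.01  # USD, assumed

direct_answer_tokens = 300         # assumed: short direct response
reasoning_tokens = 3_000           # assumed: ~10x hidden reasoning tokens

cost_direct = direct_answer_tokens / 1_000 * price_per_1k_output_tokens
cost_with_reasoning = (direct_answer_tokens + reasoning_tokens) / 1_000 \
    * price_per_1k_output_tokens

print(f"direct answer:  ${cost_direct:.4f} per request")          # $0.0030
print(f"with reasoning: ${cost_with_reasoning:.4f} per request")  # $0.0330
```

An order-of-magnitude jump in generated tokens is an order-of-magnitude jump in compute per request, which is exactly why inference efficiency now dominates the economics.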
-
Updates on #AI: MinIO Dives into AI with AIStore Launch. Full article link 👇🏻👇🏻 https://lnkd.in/dXCfVhuE #artificialintelligence #machinelearning #ML
-
In today's AI-focused world... selling dumb storage boxes is not going to cut it. A data platform like #VAST is the next leap forward. Check out this short article! #VAST #AI #Innovation
-
Our weekly update of AI news is here! Check out what happened below ⤵
▪ Timescale launched pgai Vectorizer, an open-source and cloud-hosted tool integrating advanced AI capabilities into PostgreSQL.
▪ Nexusflow unveiled Athene-V2, a "suite of fine-tuned 72B AI models designed to compete with GPT-4o across specialized use cases."
▪ Eviden, part of the Atos Group, introduced BXI v3, the latest gen of its BullSequana eXascale Interconnect technology.
▪ Alif Semiconductor and Edge Impulse announced full integration of NVIDIA's TAO model training toolkit into Alif's Ensemble and Balletto MCUs.
▪ ...and more from Ironclad, Qwen, and Fixie.ai.
https://lnkd.in/e6yAGAuh
#AINews #AI #TechNews
-
How can your organization build effective data pipelines for #AI? VAST Data explains how its all-flash storage optimizes data flow and unlocks the full potential of AI initiatives: https://ow.ly/faCJ50SOAyp
-
At Celona we have had this vision since our founding in 2019, but sometimes you need to wait for the right time to talk about it. We are seeing a new #IndustrialIntelligence stack developing. The #WirelessEdge is not the only innovation needed, and it might not even be the one that creates the most hype, but I have no doubt that it is a critical layer in this stack. I would love to hear your opinions as we learn more and get ready for the new age of AI. https://lnkd.in/gKSP8-yi
-
The AI Data Processing Paradigm Shift: What You Need to Know #ai #llm #generativeAI #dataengineering #machinelearning 👀 https://lnkd.in/gRcasVR2
-
Let's break down why DeepSeek's AI innovations are blowing people's minds...

First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It's like needing a whole power plant to run a factory.

DeepSeek just showed up and said "LOL what if we did this for $5M instead?" And they didn't just talk - they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is shook.

How? They rethought everything from the ground up.

Traditional AI stores every number in 32 bits of precision. DeepSeek asked "what if we just used 8 bits? It's still accurate enough!" Boom - 75% less memory needed.

Then there's their multi-token prediction. Normal AI generates like a first-grader reads: "The... cat... sat..." DeepSeek predicts whole phrases at once. 2x faster, 90% as accurate. When you're processing billions of words, this MATTERS.

But here's the really clever bit: they built a mixture-of-experts architecture. Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.

Traditional models? All 1.8 trillion parameters active ALL THE TIME. DeepSeek? 671B total, but only 37B active at once. It's like having a huge team but only calling in the experts you actually need for each task.

The results are mind-blowing:
- Training cost: $100M → $5M
- GPUs needed: 100,000 → 2,000
- API costs: 95% cheaper
- Can run on gaming GPUs instead of data center hardware

But wait, you might say, "there must be a catch!" That's the wild part - it's all open source. Anyone can check their work. The code is public. The technical papers explain everything. It's not magic, just incredibly clever engineering.

Why does this matter? Because it breaks the model of "only huge tech companies can play in AI." You don't need a billion-dollar data center anymore. A few good GPUs might do it.

And here's the kicker: DeepSeek did this with a team of <200 people. Meanwhile, Meta has teams where the compensation alone exceeds DeepSeek's entire training budget... and their models aren't as good.

The implications are huge:
- AI development becomes more accessible
- Competition increases dramatically
- The "moats" of big tech companies look more like puddles
- Hardware requirements (and costs) plummet

Of course, giants like OpenAI and Anthropic won't stand still. They're probably already implementing these innovations. But the efficiency genie is out of the bottle - there's no going back to the "just throw more GPUs at it" approach.

AI is about to become a lot more accessible, and a lot less expensive. The question isn't if this will disrupt the current players, but how fast.
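The parameter and memory arithmetic above is easy to verify. Here is a minimal sketch using only the numbers quoted in the post (the 4-byte and 1-byte figures are the standard widths of FP32 and FP8):

```python
# 1) Lower-precision weights: FP32 (4 bytes/param) vs FP8 (1 byte/param).
total_params = 671e9                 # quoted total parameter count
fp32_bytes = total_params * 4
fp8_bytes = total_params * 1
print(f"FP32 weights: {fp32_bytes / 1e12:.1f} TB")  # ~2.7 TB
print(f"FP8 weights:  {fp8_bytes / 1e12:.1f} TB")   # ~0.7 TB, i.e. 75% less

# 2) Mixture-of-experts: only a fraction of parameters fire per token.
active_params = 37e9                 # quoted active parameters per token
print(f"Active per token: {active_params / total_params:.1%}")  # ~5.5%
```

So even before any other optimization, an FP8 mixture-of-experts model touches a small fraction of the bytes per token that a dense FP32 model of the same nominal size would.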