Lightscape Partners’ Post

Lightscape Partners

425 followers

4mo

We are thrilled to be following and supporting the team at Etched!

Etched

10,328 followers

4mo Edited

Meet Sohu, the fastest AI chip of all time. With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs.

Sohu is the world’s first specialized chip (ASIC) for transformers (the “T” in ChatGPT). By burning the transformer architecture into our chip, we can’t run most traditional AI models. But for generative AI models, like ChatGPT (text), SD3 (images), and Sora (video), Sohu has unparalleled performance.

One Sohu server runs over 500,000 Llama 70B tokens per second: >20x more than an H100 server (23,000 tokens/sec), and >10x more than a B200 server.

We recently raised $120M from Primary Venture Partners and Positive Sum, with participation from Two Sigma Ventures, Skybox Datacenters, Hummingbird Ventures, Oceans, Fundomo, Velvet Sea Ventures, Fontinalis Partners, Galaxy, Earthshot Ventures, Max Ventures and Lightscape Partners.

We’re grateful for the support of industry leaders, including Peter Thiel, David Siegel, Thomas Dohmke, Jason Warner, Amjad Masad, Kyle Vogt, Stanley Freeman Druckenmiller, and many more.

We’re on track for one of the fastest chip launches in history:

- Top hardware engineers and AI researchers have left every major AI chip project to join us.
- We’ve partnered directly with TSMC on their 4nm process. We’ve secured HBM and server supply from top vendors and can quickly ramp our first year of production.
- Our early customers have reserved tens of millions of dollars of our hardware.

As we hit the limits of speed, cost, and scale on GPUs, specialized chips are inevitable. If you want to change the future of AI compute, please join us at www.etched.com/careers.

(Benchmarks are from running in FP8 without sparsity at 8x model parallelism with 2048 input/128 output lengths. 8xH100s figures are from TensorRT-LLM 0.10.08 (latest version), and 8xB200 figures are estimated. This is the same benchmark NVIDIA and AMD use.)
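As a rough sanity check on the throughput comparison in the post, the quoted figures can be plugged into a few lines of Python. This is only a sketch: the constant and function names are illustrative, the 500,000 figure is treated as a floor because the post says "over 500,000", and the B200 number below is just what the ">10x" claim would imply, since the post gives no B200 throughput and notes that its B200 figures are estimated.

```python
# Throughput figures quoted in the post (tokens/sec, Llama 70B, per server).
SOHU_TOKENS_PER_SEC = 500_000  # "over 500,000": treated here as a lower bound
H100_TOKENS_PER_SEC = 23_000   # 8xH100 figure quoted in the post


def speedup(new_tps: float, baseline_tps: float) -> float:
    """Return how many times faster new_tps is than baseline_tps."""
    return new_tps / baseline_tps


h100_ratio = speedup(SOHU_TOKENS_PER_SEC, H100_TOKENS_PER_SEC)

# Assumption: taking 500,000 at face value, this is the B200 server throughput
# at which the ratio would be exactly 10x. The post does not state this number.
b200_breakeven_tps = SOHU_TOKENS_PER_SEC / 10

print(f"Sohu vs 8xH100: {h100_ratio:.1f}x")
print(f"B200 throughput at exactly 10x: {b200_breakeven_tps:,.0f} tokens/sec")
```

On these inputs the H100 ratio comes out at about 21.7x, which is consistent with the ">20x" wording, and the ">10x" claim corresponds to an estimated B200 server throughput somewhere below roughly 50,000 tokens/sec.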


More Relevant Posts

Darsh Patel

Cybersecurity | NetOps | Machine Learning
4mo
This is an absolute game-changer. While GPUs and TPUs offer flexibility over various models like CNNs, RNNs, and GANs, Etched is creating faster chips that can ONLY run Transformers, BURNING them into the chip itself! Wowww. This is insane. Etched 🔥 #AI #Startups
Rana el Kaliouby, Ph.D.
4mo Edited
The AI chip landscape, currently dominated by NVIDIA, is SO ripe for disruption. Congratulations Etched on your $120M Series A round. I am so proud to be an investor.

Training AI models today costs billions of dollars (not to mention that training these models consumes as much power as a country like Costa Rica consumes in a whole year!). At the current pace, our hardware, our power grids, and our pocketbooks can’t keep up. Etched is changing this.

In 2022, they made a bet that transformers (the “T” in ChatGPT) would take over the world. The team made that bet even before OpenAI released ChatGPT to the public - how cool! Today, every state-of-the-art AI model is a transformer: ChatGPT, Sora, Gemini, Stable Diffusion 3, and more.

The team at Etched spent the past two years building Sohu, the world’s first specialized chip (ASIC) for transformers. Sohu is an order of magnitude faster and cheaper than NVIDIA’s next generation of Blackwell (GB200) GPUs when running text, image, and video transformers.

Etched is making one of the biggest bets in #AI right now and they are on track for one of the fastest chip launches in history. If they pull this off, every AI product will run on their chips.

Excited to be on this journey with the team. Gavin Uberti, Robert W.

P.S. Thank you Taryn Southern, Ocean Ventures for bringing me into this opportunity.

Primary Venture Partners, Positive Sum, Two Sigma Ventures, Skybox Datacenters, Hummingbird Ventures, Oceans, Fundomo, Velvet Sea Ventures, Fontinalis Partners, Galaxy, Earthshot Ventures, Lightscape Partners
Dr. Joyjit Chatterjee

Forbes 30U30 Europe | Data Scientist/Sr. BA @ Reckitt, UK| Green Talents Awardee (German Govt.) | PhD & Postdoc in AI (UofHull) | E&C Engineer (Gold Medalist)|Global Talent (UKRI) | Visiting Academic |CEng(I)| AI4Good
4mo
Etched has just raised $120M to bring to market Sohu, the world's fastest ASIC AI chip, built specifically for Transformer models. The Transformer is the latest and greatest model architecture behind ChatGPT, Gemini, Stable Diffusion, Sora, etc. Transformers are particularly well known for their parallel processing and multi-head attention capabilities, which would mean almost everything in the GenAI space today can be trained and deployed via these chips. As long as Transformers continue to be prominent in the AI community (it doesn't seem Transformers are going away anytime soon!), Sohu will likely continue to thrive and can be highly transformative for the AI industry. #artificialintelligence #transformers #deeplearning #ai #ml #industry #innovation
Parag Paul

Co-Founder(AI Startup)| Post Doc (SBGCE) | Auburn (PhD) | Univ. of Washington(M.S) | Anna Uni | ServiceNow | Microsoft(SQL Server Engine)| Synopsys(VLSI) | Patents | Books | Conf. Panelist | Community Editor | IEEE pubs
4mo
Keeping this post handy and in my timeline. Will update a few interesting findings in the subsequent articles. #AIStartup #AI #Founders #AIFOUNDERS #startups
Rajgopal A S

Managing Director & Chief Executive Officer | Business Administration
4mo
Over the next couple of years, a lot will change in the AI space. For us, this is not the time to commit to one GPU platform, but to build capabilities to help fine-tune models on any platform and help customers get the best value. A novel approach, Sohu, will bring efficiency over the current way of leveraging GPUs by using ASICs for running specific models. We should closely follow the progress being made by such innovations.
Dawn Lippert

Elemental Impact + Earthshot Ventures + Emerson Collective
4mo Edited
Ahh, the intersection of AI + climate. These two mega trends are already impacting each other in enormous ways, which is why I'm so excited to share that Earthshot Ventures is investing in Etched, launching today with $120M to scale Sohu, the world’s first specialized -- and most efficient -- chip for transformers (the “T” in ChatGPT).

The International Energy Agency estimates that by 2026, AI processing demand could use 1,000 TWh (!). “This demand is roughly equivalent to the electricity consumption of Japan,” says the IEA.

Why did we invest?

- The chip has demonstrated huge efficiency gains: a 15x increase in throughput and a 15x improvement in energy consumption.
- Etched is already seeing significant demand, with tens of millions of $ in pre-orders.
- We believe in the brilliant founding team including CEO Gavin Uberti, a world math champion who identified the Etched opportunity while working for AI startups, and COO Robert W., who previously founded an AI startup accelerator. They have also recruited outstanding industry veterans from every major chip manufacturer.

Mike Jackson, Matt Logan, Ramsay Siegal, Austin Blackmon, Garrett Apel, Brianna Rodrigues
Tan Huynh

Metadata and Ontology
4mo
I predict that inference token cost will be cut 4x by 2025. More specialized hardware will come (soon) to capture the value that shifts away from NVIDIA (the $$$ remains in the AI bucket, but in a different stock). A general-purpose GPU that does both model building and inference will be similar to the Xerox print/scan multifunction machines of the '70s: you are paying for the whole baggage, which is translated into the cloud billing. Embedded decentralized AI appliances will soon be available, once again manufactured by the Chinese, as an alternative to the US CHIPS Act. Before we know it, the entire world will be flooded with AI smart devices that can host an internal, embedded, offline transformer model. GenAI is now just a story in your company's journey, whether it's a startup, mid-market, or enterprise.
Shawn Wilson

Unix Dev*Ops Engineer
4mo
There are lots of markets that need disruption (or just more investment) here. Like energy (I bet people are upset right now that they didn’t try to build more nuclear plants around datacenters), cooling (kinda surprised there aren’t more industrial-scale mineral oil setups now), PaaS with GPUs, VRAM dumping apps/processes, etc. There are obvious bottlenecks here, NVIDIA being only the most obvious, that need to be removed before this AI can mature much more. Also, blockchain keeps going through bubbles. What’s going to happen the next time there’s a big cryptocurrency push? There are no more ASICs and everything wants PoW (or ownership). So if a bunch of bankers are willing to pay for GPU time, how many other industries are going to want to outbid them?
Two Sigma Ventures

28,973 followers
4mo
Announcing our investment in Etched's Series A! 🎯 Their mission: Solve AI's compute crunch. GPUs are hitting a wall; they’re getting bigger, but not necessarily better. 🗡 Their weapon: Sohu, a transformer-specific chip. 🚀 The potential: 20x faster AI at 1/20th the cost. That means real-time video gen, instant agents, & more. We're excited to be betting big on specialized AI hardware alongside Etched! Read more about their specialized chip (ASIC), Sohu, below.
