TensorOpera AI’s Post

🔥 How to Create Your Scalable and Dedicated Qualcomm-TensorOpera AI Endpoint?

Last week: a demo of the Qualcomm-TensorOpera dedicated endpoint in action.
This week: how to create your own endpoints.

Deployment steps on the TensorOpera AI Platform (https://lnkd.in/end_FWiD):
1. Go to Deploy > Endpoints > Create Endpoint.
2. Select a model (e.g., SDXL, Llama3-8B) and version, and name your endpoint.
3. Select the deployment method: dedicated on the TensorOpera cloud or on your own on-premise servers.
4. Set the number of GPUs per replica (we recommend 1x AI 100 per Llama3 replica and 2x AI 100 per SDXL replica).
5. Set the number of replicas to meet your average traffic demand.
6. Set the autoscale limit to absorb peak traffic variations (see the sizing sketch below).

Customized auto-scaling:
1. Customize the auto-scaling conditions and speed so that replicas scale with your traffic.
2. Automatically balance high SLA and cost efficiency.

Result:
1. Your own dedicated endpoint running on Qualcomm AI 100.
2. Advanced features from TensorOpera AI: Playground, API Access, System Monitoring, Prediction Logs, and User Statistics.

Get early access at https://lnkd.in/eJKVMB9D

#TensorOpera #QualcommCloud #GenAIPlatform #ScalableAPIs
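To make steps 4–6 concrete, here is a minimal Python sketch of the replica-sizing arithmetic plus a request to a deployed endpoint over HTTP. The per-replica throughput figure, endpoint URL, auth header, and request payload are illustrative assumptions, not TensorOpera's documented API; copy the real URL and schema from your endpoint's API Access tab.

```python
import math
import os

import requests  # third-party: pip install requests


def plan_replicas(avg_rps: float, peak_rps: float, per_replica_rps: float):
    """Rough sizing for steps 5-6: base replicas cover average traffic,
    the autoscale limit covers peak bursts."""
    base = max(1, math.ceil(avg_rps / per_replica_rps))
    autoscale_limit = max(base, math.ceil(peak_rps / per_replica_rps))
    return base, autoscale_limit


# Example: a Llama3-8B endpoint (1x AI 100 per replica) averaging ~12 req/s
# with bursts to ~30 req/s; ~5 req/s per replica is a hypothetical figure.
base, limit = plan_replicas(avg_rps=12, peak_rps=30, per_replica_rps=5)
print(f"replicas: {base}, autoscale limit: {limit}")  # replicas: 3, autoscale limit: 6

# Hypothetical request to a deployed endpoint. The URL, header, and JSON body
# below are placeholders, not the platform's documented request format.
ENDPOINT_URL = os.environ.get("ENDPOINT_URL", "https://example.com/v1/chat/completions")
API_KEY = os.environ.get("API_KEY", "YOUR_API_KEY")

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3-8b",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Hello from my dedicated endpoint!"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

The same sizing logic applies to an SDXL endpoint; only the GPU count per replica (2x AI 100) and the assumed per-replica throughput change.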
