It's that time again! We're back with the latest updates here at #VastAI, aimed at bringing you the best possible GPU rental platform experience. Last month we rolled out numerous template updates and added a new guide to our Docs on serving Infinity Embeddings. https://lnkd.in/grZzpS-i
Vast.ai
Software Development
Los Angeles, California 1,478 followers
Peer GPU rental: One simple interface to search, compare and utilize GPU computing at the best prices.
About us
Vast.ai is the market leader for low-cost GPU rentals. The service connects data centers and professionals running the Vast hosting software with users who can quickly find the best deals for compute according to their specific requirements. Vast.ai GPU rentals are ~3-5X cheaper than current alternatives. Consumer computers, and consumer GPUs in particular, are considerably more cost-effective than equivalent enterprise hardware. We are helping the millions of underutilized consumer GPUs around the world enter the cloud computing market for the first time.
- Website: https://vast.ai
- Industry: Software Development
- Company size: 2-10 employees
- Headquarters: Los Angeles, California
- Type: Privately Held
- Founded: 2018
Locations
- Primary: 6600 W Sunset Blvd, STE 256, Los Angeles, California 90028, US
Updates
-
Medusa is slightly different than other types of speculative decoding in that it adds a piece of the original model to do the speculation. TGI is the first major serving framework for large language models that enables Medusa-style speculative decoding.
Serving Online Inference with TGI and Medusa on Vast.ai
vast.ai
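As a rough illustration of the serving setup the post above refers to, the sketch below queries a running TGI endpoint with the huggingface_hub client. The URL, prompt, and generation settings are placeholder assumptions, not values from the guide.

```python
# Minimal sketch: querying a running text-generation-inference (TGI) endpoint.
# The URL, prompt, and generation settings below are placeholder assumptions;
# substitute the address of your own Vast.ai instance and deployed model.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed TGI address

# Medusa-style speculation, if enabled on the server, is transparent here:
# the client just sends a prompt and reads back generated text.
output = client.text_generation(
    "Explain speculative decoding in one sentence.",
    max_new_tokens=64,
)
print(output)
```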
-
As the year winds down, rumors are intensifying around NVIDIA's highly anticipated GeForce RTX 5090 GPU. Industry insiders are divided on the release date, with some sources suggesting a launch just in time for Christmas, while other reports point to a formal announcement at CES 2025 in the new year.
NVIDIA RTX 5090: Out by Christmas? A Look at the Latest Rumors
vast.ai
-
In the complex landscape of data center operations, understanding and adhering to various compliance standards is crucial.
Navigating Data Center Compliance: Understanding Tier 2/3 and HIPAA/ISO 27001 Standards
vast.ai
-
Medusa is a method of speculative decoding. Speculative decoding speeds up inference of large language models by having a smaller model draft multiple tokens and letting the larger model simply verify them. Verification by the large model is cheaper than generating the tokens itself, so if the smaller model is accurate enough, the overall cost of generating tokens goes down. Medusa is slightly different than other types of speculative decoding in that it adds a piece of the original model to do the speculation.
Serving Online Inference with TGI and Medusa on Vast.ai
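To make the draft-and-verify idea concrete, here is a toy, model-free sketch of speculative decoding: a cheap draft function proposes a few tokens and an expensive target function only checks them, keeping the longest prefix it agrees with plus one corrected token. It is purely illustrative; the token functions are made up and it does not reflect TGI's or Medusa's actual implementation, where the target model verifies all drafted positions in a single forward pass.

```python
# Toy illustration of speculative decoding (not TGI's or Medusa's real code).
# A cheap draft proposes k tokens; the expensive target model then checks them
# and keeps the longest prefix it agrees with, plus one corrected token.

def target_model_next(context):
    """Expensive 'target' model: the single true next token for a context."""
    return (context[-1] * 3 + 1) % 10

def draft_model(context, k):
    """Cheap 'draft' model: imitates the target but errs at step 2."""
    out, ctx = [], list(context)
    for i in range(k):
        tok = (ctx[-1] * 3 + 1) % 10
        if i == 2:                     # deliberately wrong, to show a rejection
            tok = (tok + 5) % 10
        out.append(tok)
        ctx.append(tok)
    return out

def speculative_step(context, k=4):
    """One draft-and-verify round; returns the tokens accepted this round."""
    proposal = draft_model(context, k)
    # A real serving stack verifies all k positions in ONE target forward pass;
    # here we compare token by token for clarity.
    accepted = []
    for tok in proposal:
        true_tok = target_model_next(context + accepted)
        if tok == true_tok:
            accepted.append(tok)       # draft agreed with the target: keep it
        else:
            accepted.append(true_tok)  # disagreement: take the target's token, stop
            break
    return accepted

sequence = [7]
for _ in range(4):
    step = speculative_step(sequence)
    print("accepted this round:", step)
    sequence += step
print("final sequence:", sequence)
```

In this toy run each round accepts two drafted tokens plus one correction from the target, which is where the speedup comes from when drafting is much cheaper than target generation.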
-
This guide will show you how to set up SGLang to serve a language model on Vast.
Serving sglang on Vast
vast.ai
-
The L40S was developed to meet the surging demand for GPUs that can handle the intense computational requirements of machine learning training and inference. How does it stack up against the L40 -- and which one do you need?
Comparing NVIDIA L40 vs. L40s – and More
vast.ai
-
SGLang provides an OpenAI-compatible server, allowing you to easily integrate it into chatbots and other applications. As companies develop their AI products, they often face challenges like rate limits and high costs when using these models. With SGLang on Vast, you can run your own models in the form factor you need, at a much more affordable price point. As inference demand grows with agents and complex workflows, SGLang on Vast excels in performance and affordability where it matters most.
Serving sglang on Vast
vast.ai
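Because the server speaks the OpenAI API, an existing client can be pointed at it by swapping the base URL. The sketch below assumes a placeholder address, API key, and model name; substitute the values of your own SGLang instance on Vast.

```python
# Minimal sketch: calling an SGLang OpenAI-compatible endpoint with the
# standard openai client. base_url, api_key, and model are assumed
# placeholders; use the address and model of your own Vast instance.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",  # assumed SGLang server address
    api_key="EMPTY",                        # self-hosted servers typically ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize why dynamic batching helps throughput."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```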
-
When deciding between the A100 and H100, consider your specific workload requirements. If you need top-tier double-precision performance and superior memory bandwidth, or you're dealing with next-gen HPC at datacenter scale and trillion-parameter AI, the H100 is the clear winner. For a more versatile and cost-effective solution that still delivers powerful AI performance, the A100 is a solid choice.
H100 vs A100: Comparing Two Powerhouse GPUs
vast.ai
-
vLLM is now more flexible than ever, as it also supports embedding models. This brings vLLM's dynamic batching and PagedAttention to embedding models for much faster throughput, all from the Docker image that developers are used to. This guide will show you how to set up vLLM to serve embedding models on Vast.
Serving vLLM Embeddings on Vast.ai
vast.ai
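As a minimal sketch of what the guide covers, the snippet below requests embeddings from a vLLM server through its OpenAI-compatible /v1/embeddings route. The base URL and model name are assumptions for illustration; point them at the instance and embedding model you actually deploy.

```python
# Minimal sketch: requesting embeddings from a vLLM server via its
# OpenAI-compatible /v1/embeddings route. base_url and model are placeholder
# assumptions; point them at your own Vast.ai instance and embedding model.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed vLLM server address
    api_key="EMPTY",
)

result = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",  # example embedding model name
    input=[
        "Vast.ai makes GPU compute affordable.",
        "vLLM batches embedding requests dynamically.",
    ],
)
print(len(result.data), "embeddings, dimension", len(result.data[0].embedding))
```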