Baseten

Software Development

San Francisco, CA · 4,151 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes without getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank: run your models on the best infrastructure without running up costs with our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

  • 🎉 We’re excited to announce Baseten Self-hosted for unparalleled control over AI model deployments! 👉🏻 Check out our announcement blog to learn more: https://lnkd.in/gVR6GhQ6

    After working with countless AI builders across different industries, we consistently heard the need for a high-performance inference solution running in their VPC to:
    • Meet strict data residency requirements
    • Align with organizational and industry compliance standards
    • Leverage existing cloud commitments and resources
    • Customize hardware and GPU usage

    Both Baseten Cloud and Baseten Self-hosted offer enterprise-grade security, performance, and reliability. Baseten Self-hosted is designed for companies and enterprises that need enhanced control over infrastructure and data while gaining the performance, reliability, and scale we specialize in.

    🥇 Baseten Self-hosted enables you to run inference in your own VPC with the same user experience as our Cloud offering. Model inference inputs and outputs go directly to your compute—they never touch our premises.

    💚 We love to support our customers with state-of-the-art AI inference. If Baseten Self-hosted can help you meet your security and compliance needs, provide necessary control over hardware, or leverage your existing resources, get in touch! https://lnkd.in/gSQWwH5m

  • Bland AI announced $22M of funding and launched on Product Hunt today! 🎊 We’re so pumped to support the future of AI phone calling that we decided to throw an end-of-summer party with them. 🍸 Check the comments for the registration link. With Baseten, Bland reduced end-to-end call latency from 3 seconds to under 400 milliseconds and gained seamless traffic-based autoscaling to meet customer demands—with 50x growth in usage and 100% uptime to date. Check out the story, support their Product Hunt launch, and come celebrate with us!

  • toby founders Lucas Campa 🤌 and Vincent Wilmet 🤌 came to Baseten one week away from their startup’s Product Hunt launch. Their AI-powered real-time translation service allows people to have a live video call while speaking different languages. After working with our engineers, Vincent and Lucas migrated from their development infrastructure to an ultra-low-latency production-ready deployment on Baseten—and reached #3 on Product Hunt on launch day, with zero minutes of downtime. 🔥 Read their story: https://lnkd.in/efz2_DKb

  • You love building robust systems and processes? Join us as an SRE. ⚙ Optimizing AI models? Join our model performance team. 🚀 Engaging with potential customers? Join our sales team as an SDR! 💪 We're thrilled to welcome many new team members, but we're not stopping there! We're hiring for 9 open roles; take a look: https://lnkd.in/eMHByrHz 📣 Share or tag someone you know would be a great fit!

  • Using open-source ML models offers a few advantages: 🎛️ Control (over model inputs, outputs, and environment) 📊 Custom optimizations 💰 Predictable spend 👉 With so many open-source models to pick from, Philip Kiely put together a guide on how to choose the right model for your use case—take a look: https://lnkd.in/eduzEivF And as always, you can launch all of these models from our model library. 🔥 https://lnkd.in/eKJebzGs

  • Philip Kiely got tired of waiting 8-10 seconds for Stable Diffusion XL to generate images on an A100, so he set out to make it faster. 🏎 Using 5 different optimizations, he first made it 5x faster: SDXL inference took only 1.92 seconds 💪 (see how: https://lnkd.in/e2ABQxX8). Then, by adding TensorRT to the mix, Philip Kiely and Pankaj Gupta decreased latency by another 40%! Take a look: https://lnkd.in/ePqpa6Hj 🏅 Optimizing model performance is one of our specialties. If you're looking to optimize your own models in production, give us a shout!
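
    As a quick back-of-the-envelope check of how those speedups compound (the measured numbers come from the linked posts; the arithmetic here is just illustrative):

        # Compounding the reported SDXL speedups; measured values are from
        # the linked benchmark posts, the math here is only a sanity check.
        baseline_s = 1.92 * 5                    # ~9.6 s, within the original 8-10 s range
        after_five_opts_s = 1.92                 # 5x faster after the first five optimizations
        after_tensorrt_s = after_five_opts_s * (1 - 0.40)  # TensorRT cuts latency another 40%
        print(f"{baseline_s:.2f} s -> {after_five_opts_s:.2f} s -> {after_tensorrt_s:.2f} s")
        # 9.60 s -> 1.92 s -> 1.15 s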

  • What precision format do you use for LLM serving? 🤔 LLMs have billions of parameters that translate to billions of numbers needing to be stored, read, and processed when they're run. FP16 has been a common default format, but it's increasingly common to serve LLMs using FP8—and for good reasons. FP8 can massively improve inference speed and decrease operational costs, with less output quality degradation compared to other techniques. 💡 Learn more about FP8 quantization in Philip Kiely's article: https://lnkd.in/eKvQzsni Tell us: what precision formats do you use for your models? 🧮
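
    A rough sketch of why FP8 helps (the 70B parameter count below is a hypothetical example, not a Baseten benchmark): halving the bytes per parameter halves the weights that must be stored and read on every forward pass.

        # FP16 stores each parameter in 2 bytes; FP8 stores it in 1 byte,
        # halving the memory (and memory bandwidth) needed to serve the model.
        params = 70e9                       # hypothetical 70B-parameter LLM
        fp16_gb = params * 2 / 1e9          # 140 GB of weights in FP16
        fp8_gb = params * 1 / 1e9           # 70 GB of weights in FP8
        print(f"FP16: {fp16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB")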

  • 🛠 We built Truss, an open-source model packaging framework, to give developers unparalleled control and simplicity for serving ML models. Model serving requires iterative development; Truss addresses this need with live reload. With Truss, the upload-build-deploy loop is practically instantaneous ⚡️; without it, each iteration can take anywhere from 3 to 30 minutes! 🐌 🧠 Our Co-Founder Pankaj Gupta wrote a technical deep-dive on Truss' live reload feature on our blog; check it out: https://lnkd.in/e6XasSbc ⭐ Or take a look at Truss on GitHub: https://lnkd.in/gAivnGWz
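
    For a feel of the packaging format, here's a minimal sketch of a Truss model based on its documented load/predict interface (the "model" itself is a stand-in stub; see the docs for real examples):

        # model/model.py in a Truss — load() runs once at startup (and again
        # on live reload), predict() runs per request. The model is a stub.
        class Model:
            def __init__(self, **kwargs):
                self._model = None

            def load(self):
                # Load weights here; live reload re-runs this on code changes.
                self._model = lambda text: text[::-1]  # placeholder "model"

            def predict(self, model_input):
                # model_input is the deserialized request body.
                return {"output": self._model(model_input["text"])}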

  • Using Medusa, we achieved a 94% to 122% increase in tokens per second for Llama 3! 🤯 Medusa is a method for generating multiple tokens per forward pass during LLM inference. Once more fundamental optimizations (like quantization, H100 GPUs, and TensorRT-LLM) are in place, further speedups require cutting-edge inference techniques like Medusa. Check out Philip Kiely and Abu Qader's new article to learn how Medusa works, how it performs on different benchmarks, and how you can use a Medusa-optimized LLM in production! 💪 https://lnkd.in/eK9i3hTu
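
    A toy sketch of the draft-and-verify idea behind Medusa (stub functions, not our implementation; in real Medusa, all candidate tokens are verified in a single batched forward pass with tree attention):

        # Extra "Medusa heads" guess several future tokens per step; the base
        # model keeps the longest prefix of guesses it agrees with.
        def base_model_next(tokens):
            return (tokens[-1] + 1) % 50          # stub for the base LLM

        def draft_heads(tokens, k=4):
            return [(tokens[-1] + i) % 50 for i in range(1, k + 1)]  # stub heads

        def medusa_step(tokens):
            accepted = []
            for guess in draft_heads(tokens):
                if base_model_next(tokens + accepted) == guess:
                    accepted.append(guess)        # guess verified, keep going
                else:
                    break
            if not accepted:                      # always emit at least one token
                accepted = [base_model_next(tokens)]
            return tokens + accepted

        seq = [0]
        for _ in range(3):
            seq = medusa_step(seq)
        print(seq)  # several tokens accepted per step instead of one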
