Baseten’s Post

What precision format do you use for LLM serving? 🤔

LLMs have billions of parameters, which means billions of numbers that must be stored, read, and processed every time the model runs. FP16 has long been the default format, but serving LLMs in FP8 is increasingly popular, and for good reason: FP8 can substantially improve inference speed and reduce operational costs, with less output quality degradation than other quantization techniques.

💡 Learn more about FP8 quantization in Philip Kiely's article: https://lnkd.in/eKvQzsni

Tell us: what precision formats do you use for your models? 🧮
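
To make the memory argument concrete, here's a minimal sketch of per-tensor FP8 (E4M3) weight quantization. It is an illustration, not Baseten's implementation: it assumes PyTorch 2.1+ for the torch.float8_e4m3fn dtype, and real serving stacks typically use calibrated (often per-channel) scales and fused FP8 matmul kernels rather than this simple round-trip.

```python
# A minimal sketch of per-tensor FP8 (E4M3) weight quantization,
# assuming PyTorch >= 2.1 for the torch.float8_e4m3fn dtype.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8(w: torch.Tensor):
    """Quantize a weight tensor to FP8 E4M3 with a single per-tensor scale."""
    # Scale so the largest magnitude maps near FP8's max representable value.
    scale = w.abs().max().float() / FP8_E4M3_MAX
    w_fp8 = (w.float() / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation of the original weights."""
    return (w_fp8.float() * scale).to(torch.float16)

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8, scale = quantize_fp8(w)

# FP8 halves weight memory relative to FP16: 1 byte vs. 2 bytes per value.
print(w.element_size(), "->", w_fp8.element_size())  # 2 -> 1
err = (dequantize_fp8(w_fp8, scale).float() - w.float()).abs().mean()
print(f"mean abs round-trip error: {err.item():.5f}")
```

Halving the bytes per weight means half the memory bandwidth per forward pass, which is where much of the inference speedup comes from; the per-tensor scale keeps the quantization error small.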
