Trelis Research’s Post

++ Build an Auto-scaling Inference Service ++

This video is for those of you who want to set up APIs for custom models (text or multi-modal).

- I walk through the steps involved in setting up inference endpoints.
- I weigh the options of a) renting GPUs, b) using a serverless service, or c) building an auto-scaling service yourself.
- Then, I build out an auto-scaling service that can be served through a single OpenAI-style endpoint.

I show how to set up a scaling service for SmolLM and also for Qwen multi-modal text-plus-image models. Find the video over on Trelis Research on YouTube.
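The post doesn't include the setup itself, but a service exposed through a single OpenAI-style endpoint accepts standard chat-completion requests. A minimal sketch of building such a request body — the base URL and model id below are hypothetical placeholders, not from the video; substitute your own deployment's values:

```python
import json

# Hypothetical values for an auto-scaled deployment (replace with your own).
BASE_URL = "https://my-inference-service.example.com/v1"  # assumed endpoint
MODEL = "HuggingFaceTB/SmolLM-1.7B-Instruct"  # assumed model id

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

body = build_chat_request("Hello!")
# This body would be POSTed to f"{BASE_URL}/chat/completions"
# with an Authorization: Bearer <api-key> header.
print(json.dumps(body, indent=2))
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI client libraries can point at it by overriding their base URL, so no bespoke client code is needed.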
