Deploying open-source models like Llama3, Mixtral, or Gemma in production, but struggling to set up Kubernetes, Karpenter, CI/CD, and the rest for GPU nodes? At Tensorfuse, we've solved most of these issues.
We're building a serverless runtime that operates on your own cloud (AWS/Azure/GCP). It offers the ease and speed of serverless, along with the flexibility and control of your own infra. Here are some of our most loved features:
1. Ability to customize your environment: Specify container images and hardware specifications using simple Python, no YAML required.
2. Autoscaling: Scale GPU workers from zero to hundreds in seconds to meet user demand in real-time.
3. OpenAI compatibility: Query your deployment through an OpenAI-compatible endpoint (see the sketch after this list).
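
To give a feel for point 3, here's a minimal sketch of calling an OpenAI-compatible endpoint with the official openai Python client. The base_url, api_key, and model name are placeholders, not Tensorfuse-specific values; substitute whatever your own deployment exposes.

# Minimal sketch: point the official `openai` client at an OpenAI-compatible endpoint.
# The URL, key, and model name below are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.example.com/v1",  # placeholder endpoint URL
    api_key="not-needed-for-self-hosted",                # placeholder key
)

response = client.chat.completions.create(
    model="llama3-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(response.choices[0].message.content)

Because the endpoint speaks the OpenAI API shape, existing tooling built on that client should work with only the base_url swapped.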
It took us just 30 minutes to deploy Llama3 on our own AWS account using Tensorfuse. The best part: all of this happens directly from your CLI, with no context switching.
Check out our website for more details. Link is in the comments!