About a week ago I set up a frankenstein 🧟‍♂️ of a Kubernetes cluster (AMD, Intel, ARM) for my homelab. Since then I've added Nvidia into the mix. Being GPU Poor has its challenges, but since my old laptop node in the cluster already had an Nvidia GPU, why not utilize it? An Nvidia GTX 1050 Mobile with 4GB of VRAM is good enough for small models.

I'm Malaysian, so naturally I wanted to try out some of the small Malaysian LLM models from Mesolitica. Using Open-WebUI for a ChatGPT-like interface and the NVIDIA GPU Operator to enable GPU workloads in my Kubernetes cluster, I've managed to run Mesolitica's malaysian-Llama-3.2-3B-Instruct model, outputting at a nice rate of 23.56 tokens/s. Pretty fast considering it's a 7-year-old laptop.

It was quite the work getting this all to run in my frankenstein of a cluster, and the troubleshooting it took to get the GPU working was quite the learning experience. Thanks to Mesolitica for the models on Hugging Face. https://lnkd.in/gQxi4Pm2

My frankenstein cluster. I'll use whatever I can find. https://lnkd.in/gwf3Cnk3

Anybody willing to pass me a high-memory GPU to play with? 😂 Would love to play around with how one scales up serving LLMs to multiple users.

#malaysian #llm #ai #gpupoor #kubernetes #openwebui #nvidia #grafana
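For anyone curious how the GPU Operator part works: once it's installed, it deploys the NVIDIA device plugin so the GPU shows up as a schedulable resource, and a pod grabs it with a resource limit. A minimal sketch of such a pod spec — the pod name, container image, and args here are illustrative placeholders, not my exact setup:

```yaml
# Sketch only: a pod requesting one GPU via the NVIDIA device plugin
# (deployed by the NVIDIA GPU Operator). Image and args are examples.
apiVersion: v1
kind: Pod
metadata:
  name: llm-server          # hypothetical name
spec:
  runtimeClassName: nvidia  # RuntimeClass created by the GPU Operator
  containers:
    - name: llm
      image: example.com/llm-serving:cuda   # placeholder serving image
      resources:
        limits:
          nvidia.com/gpu: 1 # schedules the pod onto a GPU node
```

The `nvidia.com/gpu` limit is what keeps the scheduler from landing the pod on one of the non-Nvidia nodes in a mixed cluster like this one.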