Bin Tan’s Post

To support various AI software such as LangChain, to manage 1000+ GPUs, and to serve many customers running critical business tasks, the amount of work grows exponentially. With very limited resources, how can I handle it all? My eyes turn to the tiny AI cluster I built with just under $3,000 worth of GPUs. Can the AI cluster help me develop itself, with a 1.5X, 2X, 3X, 5X, or even 10X productivity improvement? It will be good to find out.

A small cluster with 2 Linux nodes and 7 GPU graphics cards (4 RTX 3060, 2 RTX 4060 Ti Super, and 1 RTX 4070 Ti Super). In the bottom node, I had to lift all 4 GPU cards and connect them to the motherboard via PCIe cables due to space limitations.

Total GPU VRAM: 96GB
Total cost of GPU graphics cards: < $3,000

It can run Llama2-7B, Llama2-13B, Llama3-8B, and CodeLlama-34B-Python concurrently in float16. Making that happen took software development work, since conventional approaches require more than (7 + 13 + 8 + 34) * 2 GB = 124GB of GPU VRAM (arithmetic sketched below).

With Device Management and AI Service Management software in place, this small cluster can be turned into an Enterprise GenAI System.
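For the curious, here is the arithmetic behind the 124GB figure, followed by a minimal sketch of one common technique for fitting a big model onto several small GPUs: sharding its layers across them with Hugging Face transformers. The model ID and per-GPU memory caps below are illustrative assumptions, not my actual software.

```python
# Back-of-the-envelope check of the 124GB figure: float16 stores
# 2 bytes per parameter, so model weights alone need roughly
# (parameters in billions) * 2 GB.
models_b_params = {
    "Llama2-7B": 7,
    "Llama2-13B": 13,
    "Llama3-8B": 8,
    "CodeLlama-34B-Python": 34,
}
total_gb = sum(b * 2 for b in models_b_params.values())
print(f"float16 weights: {total_gb} GB")  # 124 GB vs. 96 GB of total VRAM

# One way to fit a large model onto several small GPUs is to shard
# its layers across them. A minimal sketch with Hugging Face
# transformers (model ID and memory caps are example values):
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-Python-hf",
    torch_dtype=torch.float16,
    device_map="auto",  # spread layers across all visible GPUs
    max_memory={i: "11GiB" for i in range(4)},  # leave headroom on 12GB cards
)
```

Sharding like this only handles placing one model; running all four models concurrently within the shared 96GB still takes careful allocation and scheduling, which is where the Device Management and AI Service Management software comes in.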
