AIKit is a comprehensive platform to quickly get started with hosting, deploying, building, and fine-tuning large language models (LLMs).
AIKit offers two main capabilities:
- Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open LLMs!
- Fine-Tuning: AIKit offers an extensible fine-tuning interface. It supports Unsloth for a fast, memory-efficient, and easy fine-tuning experience.
👉 For full documentation, please see the AIKit website!
- 🐳 No GPU, Internet access or additional tools needed except for Docker!
- 🤏 Minimal image size, resulting in fewer vulnerabilities and a smaller attack surface with a custom distroless-based image
- 🎵 Fine-tune support
- 🚀 Easy to use declarative configuration for inference and fine-tuning
- ✨ OpenAI API compatible to use with any OpenAI API compatible client
- 📸 Multi-modal model support
- 🖼️ Image generation support
- 🦙 Support for GGUF (`llama`), GPTQ or EXL2 (`exllama2`), GGML (`llama-ggml`), and Mamba models
- 🚢 Kubernetes deployment ready (see the minimal sketch after this list)
- 📦 Supports multiple models with a single image
- 🖥️ Supports AMD64 and ARM64 CPUs and GPU-accelerated inferencing with NVIDIA GPUs
- 🔐 Ensure supply chain security with SBOMs, Provenance attestations, and signed images
- 🌈 Supports air-gapped environments with self-hosted, local, or any remote container registries to store model images for inference on the edge.
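Since the pre-made images are plain OCI images, a Kubernetes deployment can be as small as the following sketch (generic kubectl commands with an illustrative deployment name; see the AIKit website for recommended manifests):

```bash
# Run a pre-made AIKit image on a Kubernetes cluster (names are illustrative)
kubectl create deployment llama-3-1 --image=ghcr.io/sozercan/llama3.1:8b
kubectl expose deployment llama-3-1 --port=8080 --target-port=8080

# Forward the service locally to try the OpenAI-compatible API
kubectl port-forward service/llama-3-1 8080:8080
```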
You can get started with AIKit quickly on your local machine without a GPU!
```bash
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b
```
After running this, navigate to http://localhost:8080/chat to access the WebUI!
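You can also confirm the server is up from the command line by listing the models it serves (assuming the default port mapping above):

```bash
# List the models served by the running container
curl http://localhost:8080/v1/models
```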
AIKit provides an OpenAI API compatible endpoint, so you can use any OpenAI API compatible client to send requests to open LLMs!
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llama-3.1-8b-instruct",
  "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
}'
```
Output should be similar to:
```json
{
  // ...
  "model": "llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
      }
    }
  ],
  // ...
}
```
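Because the response follows the standard OpenAI schema, it is easy to script against; for example, the assistant's reply can be extracted with jq (jq is only a convenience here, not something AIKit requires):

```bash
# Ask a question and print only the assistant's message content
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]}' \
  | jq -r '.choices[0].message.content'
```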
That's it! 🎉 The API is OpenAI compatible, so this is a drop-in replacement for any OpenAI API compatible client.
AIKit comes with pre-made models that you can use out-of-the-box!
If it doesn't include a specific model, you can always create your own images and host them in a container registry of your choice!
Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
🦙 Llama 3.1 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:70b` | `llama-3.1-70b-instruct` | Llama |
Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | MIT |
🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
Note
To enable GPU acceleration, please see GPU Acceleration.
The only difference between the CPU and GPU sections is the `--gpus all` flag in the command, which enables GPU acceleration.
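If you are unsure whether Docker can see your GPU at all, a common sanity check is to run nvidia-smi through the NVIDIA Container Toolkit before starting an AIKit image (the CUDA base image tag below is illustrative):

```bash
# Confirm the NVIDIA Container Toolkit exposes the GPU to containers
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```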
Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
🦙 Llama 3.1 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:70b` | `llama-3.1-70b-instruct` | Llama |
Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | MIT |
🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
📸 Flux 1 Dev | Text to image | 12B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/flux1:dev` | `flux-1-dev` | FLUX.1 [dev] Non-Commercial License |
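The Flux 1 Dev image serves text-to-image generation through the OpenAI-compatible images endpoint; a request looks roughly like the sketch below (the exact set of supported payload fields is documented on the AIKit website, so treat the options shown as assumptions):

```bash
# Generate an image with the flux-1-dev model via the OpenAI-compatible endpoint
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-1-dev", "prompt": "a cat reading a book", "size": "512x512"}'
```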
👉 For more information on how to fine-tune models or create your own images, please see the AIKit website!
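As a quick orientation, building and publishing your own model image is a BuildKit-based flow roughly along these lines (a minimal sketch; the registry and file names are illustrative, and the full aikitfile format is covered in that documentation):

```bash
# Build a model image from an aikitfile and load it into the local Docker daemon
docker buildx build . -f aikitfile.yaml -t ghcr.io/myorg/my-model:latest --load

# Push it to a container registry of your choice for use on other machines or clusters
docker push ghcr.io/myorg/my-model:latest
```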