Self-deployment guide
This guide provides detailed instructions on how to self-deploy the Jamba Mini and Jamba Large models.
Self-Deployment Options
Option 1: Direct Download from HuggingFace
- Download the base Docker image.
- Download the model weights and the tokenizer from Hugging Face.
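The download step can be sketched in Python as follows. This is a minimal sketch, not the official procedure: it assumes the `huggingface_hub` package is installed, takes the repository ID from the Jamba Large link in this guide, and reuses the host directory from the docker command in this guide. The helper name `download_model` is hypothetical, and a Hugging Face access token may be needed if the repository requires authentication.

```python
from pathlib import Path

# Repository ID taken from the Jamba Large link in this guide.
JAMBA_LARGE_REPO = "ai21labs/AI21-Jamba-Large-1.6"


def download_model(repo_id, local_dir):
    """Download every file in a Hugging Face repo (weights and tokenizer) to local_dir.

    Hypothetical helper for illustration; assumes `pip install huggingface_hub`.
    """
    # Imported lazily so this helper can be defined without the package installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the path of the folder holding the downloaded files.
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

For example, `download_model(JAMBA_LARGE_REPO, "/root/models/my-local-model-dir/")` would place the files in the directory that the container later mounts at `/mnt/model/`.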
Option 2: Using a Specific Platform
- For platforms like SageMaker, we provide tailored guides.
You can find the SageMaker guide here.
Option 3: Running vLLM docker (On-Premises)
We recommend using vLLM versions v0.6.5 to v0.7.3.
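As a rough illustration, a version string can be checked against the recommended range before deployment. This is a sketch under assumptions: the helper names `parse_version` and `is_recommended` are hypothetical, the range is treated as inclusive on both ends, and any suffix after the third number (such as `.post1`) is ignored.

```python
import re

# Range recommended by this guide, treated here as inclusive on both ends.
MIN_VLLM = (0, 6, 5)
MAX_VLLM = (0, 7, 3)


def parse_version(text):
    """Turn 'v0.6.5' or '0.7.3' into a tuple of ints, ignoring any trailing suffix."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_recommended(text):
    """Return True when the version falls inside the recommended range."""
    return MIN_VLLM <= parse_version(text) <= MAX_VLLM
```

If vLLM is installed as a Python package, the installed version string could be obtained with `importlib.metadata.version("vllm")` and passed to `is_recommended`.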
NVIDIA Stack
- NVIDIA Driver Version: 535.x.y
- CUDA 12.1
While earlier versions may be compatible, they have not been tested and may result in less optimized performance or lack support for certain features.
When deploying an LLM, you need the following components:
- Hardware (Compute):
Jamba Mini
Jamba Large
- Runtime Environment (Execution Environment)
This includes the OS, frameworks, libraries, and any dependencies required for the model to function. See this Docker image. We support the latest version of vLLM.
- Prepare the host environment for vLLM deployment
Pull the Docker image and prepare a working directory.
- Download the model directly from Hugging Face
Jamba Large: https://huggingface.co/ai21labs/AI21-Jamba-Large-1.6
Jamba Mini: https://huggingface.co/ai21labs/AI21-Jamba-Mini-
- Run the vLLM Docker container in ‘offline’ mode
- Set OFFLINE mode
- Map the local model directory to the container path ‘/mnt/model/’
- Set quantization to ‘experts_int8’ for better throughput and to fit on smaller instances
- Use the container model path as the name of the model:
docker run --gpus all \
    -v /root/models/my-local-model-dir/:/mnt/model/ \
    -p 8000:8000 \
    --env "TRANSFORMERS_OFFLINE=1" \
    --env "HF_DATASETS_OFFLINE=1" \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model="/mnt/model/" \
    --quantization="experts_int8"
- Call the API with the model name set to ‘/mnt/model/’:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "/mnt/model/",
        "messages": [
            {"role": "user", "content": "Hello!"}
        ]
    }'
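The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:8000` and returns an OpenAI-style chat completion body (a `choices` list whose first entry holds a `message`). The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, model="/mnt/model/", base_url="http://localhost:8000"):
    """Build the POST request for the chat completions endpoint.

    The model name must match the value passed to --model when the container started.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return the text of the first choice (assumed response shape)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the container running, `chat("Hello!")` would return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.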