Self-deployment guide
This guide provides detailed instructions on how to self-deploy the Jamba 1.5 Mini and Jamba 1.5 Large models.
Self-Deployment Options
Option 1: Direct Download from HuggingFace
- Download the base Docker image.
- Download the model from HuggingFace.
- Download the tokenizer (see the sketch after this list).
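A minimal sketch of these download steps using the huggingface_hub CLI; the target directory ~/models/jamba-1.5-mini is an example path of ours, not one mandated by the guide:

```bash
# Install the HuggingFace Hub CLI if it is not already available:
pip install -U "huggingface_hub[cli]"

# Download the model weights; the tokenizer files live in the same
# HuggingFace repo, so this single command covers both steps:
huggingface-cli download ai21labs/AI21-Jamba-1.5-Mini --local-dir ~/models/jamba-1.5-mini
```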
Option 2: Using a Specific Platform
- For platforms like SageMaker, we provide tailored guides.
You can find the SageMaker guide here.
Option 3: Running the vLLM Docker Container (On-Premises)
When deploying an LLM, you need the following components:
- Hardware (Compute), depending on which model you deploy:
  - Jamba 1.5 Mini
  - Jamba 1.5 Large
- Runtime Environment (Execution Environment): the OS, frameworks, libraries, and any other dependencies required for the model to function. See this Docker image.

Deployment steps:

- Prepare the host environment for vLLM deployment: pull the Docker image and prepare a working directory, as sketched below.
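A minimal sketch of this preparation step, assuming the image tag and local model directory used later in this guide:

```bash
# Pull the vLLM OpenAI-compatible server image:
docker pull vllm/vllm-openai:latest

# Create a working directory that will hold the downloaded model:
mkdir -p /root/models/my-local-model-dir
```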
- Download the model directly from HuggingFace. For example, clone the HuggingFace model repo into a local folder (this requires Git Large File Storage):

```bash
# Install Git LFS (e.g. via your package manager), then enable it:
git lfs install
git clone https://huggingface.co/ai21labs/AI21-Jamba-1.5-Mini
```

- Run the vLLM Docker container in 'offline' mode:
  - Set OFFLINE mode.
  - Map the local model directory to the container path '/mnt/model/'.
  - Set quantization to 'experts_int8' for better throughput and for use on smaller instances.
  - Use the container model path as the model name:
```bash
# Note: options before the image name go to Docker; arguments after the
# image name (--model, --quantization) go to the vLLM server inside it.
docker run --gpus all \
  -v /root/models/my-local-model-dir/:/mnt/model/ \
  -p 8000:8000 \
  --env "TRANSFORMERS_OFFLINE=1" \
  --env "HF_DATASETS_OFFLINE=1" \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model="/mnt/model/" \
  --quantization="experts_int8"
```
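Once the container is up, you can confirm the server is serving the model. This check is our addition rather than part of the original steps; /v1/models is a standard endpoint of the OpenAI-compatible API that vLLM exposes:

```bash
# Should list a single model whose id is "/mnt/model/":
curl http://localhost:8000/v1/models
```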
- Call the API with the model name set to '/mnt/model/':
```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "/mnt/model/",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
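If jq is installed, you can extract just the assistant's reply; this is a convenience sketch that assumes the standard OpenAI chat completions response shape:

```bash
curl -s -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "/mnt/model/",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }' | jq -r '.choices[0].message.content'
```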