Self-deployment guide

This guide provides detailed instructions on how to self-deploy the Jamba 1.5 Mini and Jamba 1.5 Large models.

Self-Deployment Options

Option 1: Direct Download from HuggingFace

  1. Download the base Docker image.
  2. Download the model from HuggingFace (see the sketch below).
  3. Download the tokenizer.
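A minimal sketch of steps 2 and 3, assuming the huggingface_hub CLI is installed and using the same model repository and local path as the vLLM example later in this guide; the tokenizer files ship inside the model repository, so one download covers both steps:

    # Requires: pip install -U "huggingface_hub[cli]"
    huggingface-cli download ai21labs/AI21-Jamba-1.5-Mini \
        --local-dir /root/models/my-local-model-dir/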

Option 2: Using a Specific Platform

  • For specific platforms, we provide tailored guides; for example,
    you can find the SageMaker guide here.

Option 3: Running vLLM docker (On-Premises)

When deploying an LLM, you need the following components:

  1. Hardware (Compute):
    Refer to the published system requirements for each model:
    • Jamba 1.5 Mini: see the Jamba 1.5 Mini System Requirements
    • Jamba 1.5 Large: see the Jamba 1.5 Large System Requirements

  2. Runtime Environment (Execution Environment):
    This includes the OS, frameworks, libraries, and any dependencies
    required for the model to function. See the vLLM Docker image used below.
  3. Prepare the host environment for vLLM deployment:
    Pull the Docker image and prepare a working directory, as sketched below.
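    A minimal sketch of this step, assuming the vLLM image and the local
    model path used in the commands below:

    # Pull the vLLM OpenAI-compatible server image
    docker pull vllm/vllm-openai:latest
    # Create the directory the model weights will be downloaded into
    mkdir -p /root/models/my-local-model-dir/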
  4. Download the model directly from HuggingFace.
    Example: download the model to a local folder by cloning the
    HuggingFace model repo (requires Git Large File Storage):

    # Initialize Git LFS, then clone the model repository
    git lfs install
    git clone https://huggingface.co/ai21labs/AI21-Jamba-1.5-Mini
  5. Run the vLLM Docker container in 'offline' mode:
  • Set offline mode via the TRANSFORMERS_OFFLINE and HF_DATASETS_OFFLINE
    environment variables
  • Map the local model directory to the container path '/mnt/model/'
  • Set quantization to 'experts_int8' for better throughput and for
    running on smaller instances
  • Use the container model path as the name of the model:

    docker run --gpus all \
        -v /root/models/my-local-model-dir/:/mnt/model/ \
        -p 8000:8000 \
        --env "TRANSFORMERS_OFFLINE=1" \
        --env "HF_DATASETS_OFFLINE=1" \
        --ipc=host \
        vllm/vllm-openai:latest \
        --model="/mnt/model/" \
        --quantization="experts_int8"
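    Optionally, verify the server is up before calling the chat endpoint;
    this assumes the default port mapping from the command above:

    # List the models served by the running container;
    # '/mnt/model/' should appear in the response
    curl http://localhost:8000/v1/models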
  6. Call the API with the model name set to '/mnt/model/':

    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "/mnt/model/",
            "messages": [
                {"role": "user", "content": "Hello!"}
            ]
        }'
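    The server also supports OpenAI-style streaming; a minimal sketch,
    assuming the same endpoint and model path as above:

    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "/mnt/model/",
            "messages": [
                {"role": "user", "content": "Hello!"}
            ],
            "stream": true
        }'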
