Deploy AI21’s Jamba models using vLLM in your own environment. vLLM is an open-source library for high-throughput LLM inference and serving.
We recommend vLLM versions v0.6.5 through v0.8.5.post1 (≥0.6.5, ≤0.8.5.post1) for optimal performance and to ensure maximum compatibility with all Jamba models.
Set your Hugging Face access token in the $HF_TOKEN environment variable:
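For example, in a bash-compatible shell (substitute your own token; the placeholder below is not a real value):

```shell
# Export your Hugging Face token so the container can download gated model weights
export HF_TOKEN="<your-huggingface-token>"
```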
Pull the Docker image
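A minimal sketch of this step, assuming the official vLLM OpenAI-compatible image on Docker Hub (`vllm/vllm-openai`) and a tag within the recommended version range:

```shell
# Pull a vLLM image pinned to a version in the supported range
docker pull vllm/vllm-openai:v0.8.5.post1
```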
Run the container
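The step above might look like the following; the port mapping and the Jamba model identifier (`ai21labs/AI21-Jamba-Mini-1.6`) are assumptions you should adjust for your deployment:

```shell
# Start the vLLM OpenAI-compatible server with GPU access,
# passing the Hugging Face token through to the container
docker run --gpus all \
  -e HF_TOKEN="$HF_TOKEN" \
  -p 8000:8000 \
  vllm/vllm-openai:v0.8.5.post1 \
  --model ai21labs/AI21-Jamba-Mini-1.6
```

Once the server is up, it exposes an OpenAI-compatible API on the mapped port (8000 in this sketch).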
To serve a locally downloaded model instead, mount the model directory into the container with -v /path/to/model:/mnt/model/ and pass --model="/mnt/model/" instead of the HuggingFace model identifier.
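Putting the local-model variant together, a sketch might look like this (image tag and port are assumptions; replace /path/to/model with the directory holding your downloaded weights):

```shell
# Mount a locally downloaded model and point vLLM at the mounted path
# instead of a Hugging Face model identifier
docker run --gpus all \
  -v /path/to/model:/mnt/model/ \
  -p 8000:8000 \
  vllm/vllm-openai:v0.8.5.post1 \
  --model="/mnt/model/"
```

With a mounted model, no Hugging Face token is needed, since nothing is downloaded at startup.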