For fully local execution, llama.cpp enables running compatible open models in the GGUF format, with optional GPU acceleration.

AI21 publishes official Jamba model weights on the Hugging Face Hub, and community contributors may provide GGUF-format conversions (e.g., Jamba Mini 1.7) for use with llama.cpp.
Note: AI21 does not distribute or support GGUF builds and cannot verify the accuracy of third-party conversions. Be sure to review the model’s license terms and consult the llama.cpp documentation before use.
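
As an illustration only, below is a minimal sketch of loading a local GGUF file through the llama-cpp-python bindings (one of several ways to drive llama.cpp). The bindings, the filename, and the parameter values shown are assumptions for the example, not an AI21-provided or AI21-verified recipe; substitute the community GGUF conversion you have actually downloaded and reviewed.

```python
from llama_cpp import Llama

# Hypothetical path to a community GGUF conversion you have downloaded locally.
MODEL_PATH = "jamba-mini-1.7-Q4_K_M.gguf"

# Load the model; n_gpu_layers=-1 offloads all layers to the GPU when llama.cpp
# was built with GPU support, while 0 keeps inference fully on the CPU.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,       # context window size for this session
    n_gpu_layers=-1,  # optional GPU acceleration; set to 0 for CPU-only
)

# Run a simple chat completion against the locally loaded model.
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Summarize what the GGUF format is in one sentence."}
    ],
    max_tokens=128,
)

print(output["choices"][0]["message"]["content"])
```

The same model file can also be served directly with the llama.cpp command-line tools; the Python bindings are used here only to keep the example self-contained.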