# Troubleshooting & Performance Optimization

> Resolve common issues and optimize performance for AI21's Jamba model deployments

## Overview

This guide helps you troubleshoot common deployment issues and optimize performance for AI21's Jamba models across different deployment scenarios.

Before troubleshooting, make sure you are running a recommended vLLM version, `v0.6.5` through `v0.8.5.post1`.

## Troubleshooting

### Memory Issues

**Symptoms:**

* CUDA out of memory errors
* Process killed by system

**Solutions:**

```bash
--quantization="experts_int8"   # Reduces memory usage (recommended)
--tensor-parallel-size=8        # Number of GPUs to use (1-8)
--max-model-len=128000          # Reduce context length if needed (max 256K)
--gpu-memory-utilization=0.8    # Limits GPU memory usage
```

**Symptoms:**

* Inconsistent OOM errors
* Memory usage appears lower than expected

**Solutions:**

```bash
--max-num-seqs=50   # Max concurrent sequences; lower to reduce memory use, raise if memory allows
```

### Model Loading Issues

**Storage Recommendations:**

* Network Storage: >1 GB/s bandwidth
* RAM Disk: Load model from RAM if possible

### Performance Issues

**When to Use Chunked Prefill:**

* Long input sequences (>8K tokens)
* High memory pressure during prefill
* Mixed sequence lengths in batches

**Configuration:**

```bash
--enable-chunked-prefill        # Enables chunked prefill
--max-num-batched-tokens=8192   # Adjust based on GPU memory
```

## Getting Help

If you need support, please contact our team at **[support@ai21.com](mailto:support@ai21.com)** with the following information:

**Environment Details:**

* Hardware specifications (GPU model, memory, CPU)
* Software versions (vLLM, CUDA, drivers)
* Full vLLM command

**Diagnostics:**

* Full error messages and stack traces
* GPU utilization logs (`nvidia-smi` output)
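
Most of the environment details and diagnostics listed above can be gathered in one pass. The script below is a minimal sketch, assuming a Linux host with `bash`; the function name `collect_diagnostics` and the output filename are hypothetical, and any tool that is not installed is reported in the output rather than treated as an error. It does not capture your vLLM command or stack traces, so attach those separately.

```bash
#!/usr/bin/env bash
# Sketch: gather environment details for a support request.
# Assumption: Linux host; collect_diagnostics and the filename are illustrative.

collect_diagnostics() {
  local out="${1:-vllm-diagnostics.txt}"
  {
    echo "== GPU and driver =="
    # nvidia-smi reports GPU model, memory, driver version, and CUDA version
    if command -v nvidia-smi >/dev/null 2>&1; then
      nvidia-smi
    else
      echo "nvidia-smi not found"
    fi

    echo "== vLLM version =="
    if command -v python3 >/dev/null 2>&1; then
      python3 -c 'import vllm; print(vllm.__version__)' 2>/dev/null \
        || echo "vllm not importable"
    else
      echo "python3 not found"
    fi

    echo "== CPU and memory =="
    if command -v lscpu >/dev/null 2>&1; then lscpu; else echo "lscpu not found"; fi
    if command -v free >/dev/null 2>&1; then free -h; else echo "free not found"; fi
  } > "$out" 2>&1
  # Print the path of the report so callers can attach it
  echo "$out"
}

collect_diagnostics "vllm-diagnostics.txt"
```

Run the script on the machine that hosts the deployment and attach the resulting file to your email.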