# Troubleshooting & Performance Optimization

> Resolve common issues and optimize performance for AI21's Jamba model deployments

## Overview

This guide covers common issues when serving AI21's Jamba models with vLLM (out-of-memory errors, slow model loading, and long-prompt performance) and the flags that address them.

<Note>
  Before troubleshooting, confirm that your vLLM version falls within the recommended range, `v0.6.5` through `v0.8.5.post1`.
</Note>
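
As a quick pre-flight check, the installed version can be compared against that range. The snippet below is a sketch rather than an official tool: `version_in_range` is a hypothetical helper, it assumes vLLM was installed with `pip`, and it relies on GNU `sort -V` for ordering, which may not rank development or pre-release builds the way Python packaging does.

```bash theme={"system"}
# Hypothetical helper: succeeds if $1 lies within [$2, $3] under `sort -V` ordering.
version_in_range() {
  local v="${1#v}" lo="${2#v}" hi="${3#v}"
  [ "$(printf '%s\n' "$lo" "$v" | sort -V | head -n1)" = "$lo" ] &&
  [ "$(printf '%s\n' "$v" "$hi" | sort -V | tail -n1)" = "$hi" ]
}

# Read the installed version from pip metadata (empty if vLLM is not installed).
installed="$(pip show vllm 2>/dev/null | awk '/^Version:/ {print $2}')"

if [ -z "$installed" ]; then
  echo "vLLM is not installed in this environment"
elif version_in_range "$installed" 0.6.5 0.8.5.post1; then
  echo "vLLM $installed is within the recommended range"
else
  echo "vLLM $installed is outside the recommended range (0.6.5 to 0.8.5.post1)"
fi
```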

## Troubleshooting

### Memory Issues

<AccordionGroup>
  <Accordion title="Out of Memory (OOM) Errors">
    **Symptoms:**

    * CUDA out of memory errors
    * Process killed by system

    **Solutions:**

    ```bash theme={"system"}
    --quantization="experts_int8"    # Reduces memory usage (recommended)
    --tensor-parallel-size=8         # Number of GPUs to shard the model across (1-8)
    --max-model-len=128000           # Maximum context length; lower it if needed (max 256K)
    --gpu-memory-utilization=0.8     # Fraction of each GPU's memory that vLLM may use
    ```
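
    Put together, these flags form a single launch command. The script below is a sketch, not an official command: the `MODEL` path is a placeholder, the `RUN` dry-run switch is an assumption added for illustration, and the values should be adjusted to your hardware.

    ```bash theme={"system"}
    # Placeholder: point MODEL at your Jamba checkpoint (local path or model ID).
    MODEL="${MODEL:-/path/to/jamba-checkpoint}"

    args=(
      serve "$MODEL"
      --quantization=experts_int8
      --tensor-parallel-size=8
      --max-model-len=128000
      --gpu-memory-utilization=0.8
    )

    # Dry run by default: print the command. Set RUN=1 to launch the server.
    if [ "${RUN:-0}" = "1" ]; then
      vllm "${args[@]}"
    else
      echo "vllm ${args[*]}"
    fi
    ```

    Printing the command first makes it easy to copy the exact invocation into a support request.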
  </Accordion>

  <Accordion title="Memory Fragmentation">
    **Symptoms:**

    * Intermittent OOM errors
    * OOM errors even though reported memory usage appears lower than expected

    **Solutions:**

    ```bash theme={"system"}
    --max-num-seqs=50    # Caps concurrent sequences per step; lower it to reduce memory pressure, raise it for throughput
    ```
  </Accordion>
</AccordionGroup>

### Model Loading Issues

<AccordionGroup>
  <Accordion title="Slow Model Loading">
    **Storage Recommendations:**

    * Network storage: aim for more than 1 GB/s of bandwidth
    * RAM disk: load the model from RAM if possible
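
    One way to follow the RAM disk recommendation on Linux is to copy the weights into a RAM-backed mount such as `/dev/shm` before starting the server. The sketch below is illustrative: `stage_model` is a hypothetical helper and both paths in the example call are placeholders. It refuses to copy when the target lacks free space, because filling a RAM-backed filesystem consumes system memory.

    ```bash theme={"system"}
    # Hypothetical helper: copy a model directory to a RAM-backed path if it fits.
    stage_model() {
      local src="$1" dst="$2"
      local need avail
      mkdir -p "$dst" || return 1
      need="$(du -sk "$src" | awk '{print $1}')"           # model size in KiB
      avail="$(df -Pk "$dst" | awk 'NR==2 {print $4}')"    # free space in KiB
      if [ "$need" -ge "$avail" ]; then
        echo "Not enough space in $dst: need ${need} KiB, have ${avail} KiB" >&2
        return 1
      fi
      cp -R "$src"/. "$dst"/
    }

    # Example call with placeholder paths; adjust to your environment.
    # stage_model /path/to/jamba-checkpoint /dev/shm/jamba-checkpoint
    ```

    After staging, pass the RAM-backed directory to vLLM as the model path.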
  </Accordion>
</AccordionGroup>

### Performance Issues

<AccordionGroup>
  <Accordion title="Chunked Prefill Optimization">
    **When to Use:**

    * Long input sequences (>8K tokens)
    * High memory pressure during prefill
    * Mixed sequence lengths in batches

    **Configuration:**

    ```bash theme={"system"}
    --enable-chunked-prefill          # Enables chunked prefill
    --max-num-batched-tokens=8192     # Token budget per scheduling step; adjust based on GPU memory
    ```
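
    As a rough illustration of what the token budget means: with chunked prefill, a long prompt is processed in pieces of at most `--max-num-batched-tokens` tokens per step, so the number of prefill steps is approximately the prompt length divided by the budget, rounded up. The calculation below is a simplification that ignores decode tokens sharing the same budget, and the variable names are illustrative.

    ```bash theme={"system"}
    prompt_tokens=128000        # example prompt length
    max_batched_tokens=8192     # value passed to --max-num-batched-tokens

    # Ceiling division: prefill steps needed for one prompt of this length
    chunks=$(( (prompt_tokens + max_batched_tokens - 1) / max_batched_tokens ))
    echo "$chunks prefill chunks"   # prints "16 prefill chunks"
    ```

    A smaller budget generally lowers peak memory during prefill at the cost of more steps per prompt.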
  </Accordion>
</AccordionGroup>

## Getting Help

<Card title="Need Support?" icon="headset">
  If you need support, please contact our team at **[support@ai21.com](mailto:support@ai21.com)** with the following information:

  **Environment Details:**

  * Hardware specifications (GPU model, memory, CPU)
  * Software versions (vLLM, CUDA, drivers)
  * Full vLLM command

  **Diagnostics:**

  * Full error messages and stack traces
  * GPU utilization logs (`nvidia-smi` output)
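
  Most of these details can be gathered in one pass. The script below is a minimal sketch under stated assumptions: the report file name is hypothetical, vLLM is assumed to be installed with `pip`, and the full vLLM command and stack traces still need to be added by hand.

  ```bash theme={"system"}
  # Collect environment details into one file to attach to a support request.
  report="${REPORT:-vllm-diagnostics.txt}"   # hypothetical file name
  {
    echo "== Date =="
    date
    echo "== vLLM version =="
    pip show vllm 2>/dev/null | grep -E '^(Name|Version):' || echo "vLLM not found via pip"
    echo "== GPU =="
    if command -v nvidia-smi >/dev/null 2>&1; then
      nvidia-smi    # GPU model, memory, utilization, driver and CUDA versions
    else
      echo "nvidia-smi not found"
    fi
  } > "$report" 2>&1
  echo "Wrote $report"
  ```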
</Card>
