# Chat response

## Response details

### Non-streaming results

A successful non-streamed response includes the following members:

<ParamField body="id" type="string">
  Unique ID for the request (not for each message). In a streaming response, every chunk carries the same ID.
</ParamField>

<ParamField body="choices" type="object[]">
  One or more responses, depending on the `n` parameter from the request.
  Each response includes the following members.
</ParamField>

<Expandable title="properties">
  <ParamField body="index" type="integer">
    Zero-based index of the message in the list of messages. Note that this might not correspond to its position in the response list.
  </ParamField>

  <ParamField body="message" type="object">
    The message generated by the model. Includes two fields: `role` and `content`.
  </ParamField>

  <ParamField body="tool_calls" type="object[]">
    Tool calls occur only if a `tools` parameter was specified in the request. They apply solely to the current message. Add the returned values to the message thread in both the assistant message's `tool_calls` field and the tool message.

    <Expandable title="properties">
      <ParamField body="id" type="string">
        ID of the tool call, generated by the model.
      </ParamField>

      <ParamField body="type" type="string">
        The type of tool called. Currently the only possible value is "function".
      </ParamField>

      <ParamField body="function" type="object">
        The invoked function.

        <Expandable title="properties">
          <ParamField body="name" type="string">
            The name of the function, which you specified in your request.
          </ParamField>

          <ParamField body="arguments" type="object">
            A JSON object containing the function's parameters and values.
          </ParamField>
        </Expandable>
      </ParamField>
    </Expandable>
  </ParamField>
</Expandable>
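As a rough sketch of how the `tool_calls` fields above might be consumed, the example below runs each requested function and builds the tool messages to append to the thread. The `get_weather` function, the sample payload, and the `tool_call_id` key on the tool message are illustrative assumptions, not part of the documented response.

```python
import json


# Hypothetical local function exposed to the model as a tool.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Maps the function names sent in the request to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}

# Illustrative assistant message shaped like the fields described above.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": {"city": "Paris"}},
        }
    ],
}


def run_tool_calls(message: dict) -> list:
    """Execute each tool call and return the tool messages for the thread."""
    tool_messages = []
    for call in message.get("tool_calls") or []:
        if call["type"] != "function":
            continue  # "function" is currently the only documented type
        fn = call["function"]
        args = fn["arguments"]
        # Some clients deliver arguments as a JSON-encoded string; accept both.
        if isinstance(args, str):
            args = json.loads(args)
        result = TOOL_REGISTRY[fn["name"]](**args)
        # `tool_call_id` is an assumed key linking the result to the call.
        tool_messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": result}
        )
    return tool_messages


# The assistant message (with its tool_calls) and the tool results both
# go back into the message thread.
thread = [assistant_message] + run_tool_calls(assistant_message)
```

Keeping the dispatch in a registry means an unknown function name fails loudly with a `KeyError` rather than being silently ignored.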

<ParamField body="finish_reason" type="string">
  Why the message ended.

  <Expandable title="properties">
    <ParamField body="stop" type="string">
      The response ended naturally as a complete answer (due to end-of-sequence token) or because the model generated a stop sequence provided in the request.
    </ParamField>

    <ParamField body="length" type="string">
      The response ended because it reached `max_tokens`.
    </ParamField>
  </Expandable>
</ParamField>

<ParamField body="usage" type="object">
  The token counts for this request.
  Per-token billing is based on the prompt token and completion token counts and rates.

  <Expandable title="properties">
    <ParamField body="prompt_tokens" type="integer">
      Number of tokens in the prompt for this request.
      The prompt token count covers the entire message history plus the extra tokens needed to combine messages, which are proportional to the number of messages.
    </ParamField>

    <ParamField body="completion_tokens" type="integer">
      Number of tokens in the response message.
    </ParamField>

    <ParamField body="total_tokens" type="integer">
      Sum of `prompt_tokens` and `completion_tokens`.
    </ParamField>
  </Expandable>
</ParamField>
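The fields above could be read along these lines. This is a minimal sketch over a plain dictionary: the payload values are invented, and placing `finish_reason` on each entry of `choices` is an assumption about the response layout.

```python
# Illustrative payload; the IDs, text, and token counts are made up.
response = {
    "id": "chat-abc",
    "choices": [
        {
            "index": 1,
            "message": {"role": "assistant", "content": "A longer answer that was cut"},
            "finish_reason": "length",
        },
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        },
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
}


def summarize(resp: dict) -> list:
    """Return one summary per choice, ordered by its `index` field."""
    summaries = []
    # `index` may differ from the position in the list, so sort explicitly.
    for choice in sorted(resp["choices"], key=lambda c: c["index"]):
        summaries.append(
            {
                "index": choice["index"],
                "text": choice["message"]["content"],
                # "length" means the reply stopped at max_tokens.
                "truncated": choice["finish_reason"] == "length",
            }
        )
    return summaries


def usage_is_consistent(usage: dict) -> bool:
    """Check that total_tokens equals prompt_tokens + completion_tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


summaries = summarize(response)
```

Checking `truncated` before showing a reply is one way to decide whether to retry with a larger `max_tokens`.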

### Streamed results

Setting `stream` to `true` in the request returns a stream of messages, each containing one token. You can read more about streaming calls using the [SDK](https://github.com/AI21Labs/ai21-python/blob/main/README.md#Streaming).

The final message will be `data: [DONE]`. All other messages will have `data` set to a JSON object with the following fields:

<ParamField body="data" type="object">
  Either a JSON object with the following members, or the literal `[DONE]` for the last message.
</ParamField>

<ParamField body="id" type="string">
  Unique ID for each request (not message). Same ID for all streaming responses.
</ParamField>

<ParamField body="choices" type="object[]">
  An array with one object containing the following fields:
</ParamField>

<ParamField body="index" type="integer">
  Always zero.
</ParamField>

<ParamField body="delta" type="object">
  * The first message in the stream will be an object set to `{"role":"assistant"}`.
  * Subsequent messages will have an object `{"content": "<token>"}`, where `<token>` is the generated token.
</ParamField>

<ParamField body="finish_reason" type="string">
  Why the message ended.
</ParamField>

<ParamField body="usage" type="object">
  The token counts for the entire request. Per-token billing is based on the prompt token and completion token counts and rates.

  <Expandable title="properties">
    <ParamField body="prompt_tokens" type="integer">
      Number of tokens in the prompt for this request.
      The prompt token count covers the entire message history plus the extra tokens needed to combine messages, which are proportional to the number of messages.
    </ParamField>

    <ParamField body="completion_tokens" type="integer">
      Number of tokens in the response message.
    </ParamField>

    <ParamField body="total_tokens" type="integer">
      Sum of `prompt_tokens` and `completion_tokens`.
    </ParamField>
  </Expandable>
</ParamField>

`usage` is `null` in every chunk except the last, which contains the token usage statistics for the entire request.
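For readers consuming the raw stream rather than the SDK, the following sketch shows one way to reassemble the message from `data:` lines. The sample lines are invented for illustration, and the helper name `collect_stream` is hypothetical.

```python
import json


def collect_stream(lines) -> dict:
    """Assemble role, text, finish_reason, and usage from streamed `data:` lines."""
    role, parts, finish_reason, usage = None, [], None, None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # final message of the stream
        event = json.loads(payload)
        choice = event["choices"][0]  # the array holds a single object
        delta = choice.get("delta") or {}
        role = delta.get("role", role)  # only the first chunk sets the role
        if delta.get("content"):
            parts.append(delta["content"])
        finish_reason = choice.get("finish_reason") or finish_reason
        usage = event.get("usage") or usage  # null until the last chunk
    return {
        "role": role,
        "content": "".join(parts),
        "finish_reason": finish_reason,
        "usage": usage,
    }


# Illustrative stream; IDs and token counts are made up.
sample_lines = [
    'data: {"id": "chat-abc", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-abc", "choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": null}], "usage": null}',
    'data: {"id": "chat-abc", "choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}}',
    "data: [DONE]",
]

result = collect_stream(sample_lines)
```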

<ResponseExample>
  ```python Python (Non-streaming results) theme={"system"}
  import asyncio

  from ai21 import AsyncAI21Client
  from ai21.models.chat import ChatMessage

  messages = [ChatMessage(content="What is the meaning of life?", role="user")]

  client = AsyncAI21Client()


  async def main():
      # Without `stream=True`, the call returns the complete response at once.
      response = await client.chat.completions.create(
          messages=messages,
          model="jamba-large",
      )
      print(response.choices[0].message.content)


  asyncio.run(main())
  ```

  ```python Python (Streamed results) theme={"system"}
  from ai21 import AI21Client
  from ai21.models.chat import ChatMessage

  messages = [ChatMessage(content="What is the meaning of life?", role="user")]

  client = AI21Client()

  response = client.chat.completions.create(
      messages=messages,
      model="jamba-large",
      stream=True,
  )
  for chunk in response:
      # The first chunk carries only the role, so `content` may be empty.
      print(chunk.choices[0].delta.content or "", end="")
  ```
</ResponseExample>

***

## Error Codes

500 - Internal Server Error\
429 - Too Many Requests (You are sending requests too quickly.)\
503 - Service Unavailable (The engine is currently overloaded, please try again later)\
401 - Unauthorized (Incorrect API key provided/Invalid Authentication)\
403 - Access Denied\
422 - Unprocessable Entity (Request body is malformed)
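One possible way to act on these codes is to retry only the ones that signal a transient condition. The sketch below treats 429, 500, and 503 as retryable with exponential backoff; that grouping, the delay values, and the `retry_delay` helper are assumptions for illustration, not documented behavior.

```python
from typing import Optional

# Assumed split: rate limiting and server-side failures may succeed on retry,
# while 401, 403, and 422 need the request itself to be fixed.
RETRYABLE_STATUS_CODES = {429, 500, 503}


def retry_delay(
    status_code: int, attempt: int, base: float = 1.0, cap: float = 30.0
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status_code not in RETRYABLE_STATUS_CODES:
        return None
    # Exponential backoff, capped so waits do not grow without bound.
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for the returned delay and resend the request, giving up after a fixed number of attempts or as soon as the helper returns `None`.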

***
