This model is in private preview. Sign up for the private preview waitlist here.

Sending a Request

To get a response to a set of messages, send an HTTP request. The request should include:

  1. A sequence of text messages.
  2. Optional parameters that control text generation.

To authenticate, include your API key in the request headers.

After you submit the request, the response will contain the generated message.
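For example, assuming a bearer-token scheme (`YOUR_API_KEY` stands in for your own key), the headers might look like this in Python:

headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder; substitute your own API key
    "Content-Type": "application/json",
}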

Request Parameters

model | str

: The name of the model to use. Required.

Currently, the only option is `jamba-instruct-preview`.

messages | List[ChatMessage]

: A list of messages that make up the conversation so far. Required.

The ChatMessage class has the following attributes:

  • role | str
    • The role of the message author. Can be one of `user`, `assistant`, or `system`.
  • content | str
    • The content of the message. For all roles except `system`, this cannot be an empty string. For the `system` role, an empty string is ignored.

In the payload, both role and content are mandatory for every message.

The messages list must contain at least one `user` or `assistant` message.
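For illustration, a valid `messages` list with an optional system instruction followed by a user turn might look like this (the content strings are placeholders):

messages = [
    # Optional system message that steers the assistant's behavior.
    {"role": "system", "content": "You are a concise assistant."},
    # At least one user or assistant message is required.
    {"role": "user", "content": "Summarize Hamlet in one sentence."},
]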

max_tokens | int

: The maximum number of tokens to generate for each completion. Optional, default = 4096.

Must be under 64K.

temperature | float

: Modifies the distribution from which tokens are sampled. Optional, default = 1.0.

Setting temperature to 1.0 samples directly from the model distribution. Lower values increase the chance of sampling higher-probability tokens, while higher values increase the chance of sampling lower-probability tokens. A value of 0 effectively disables sampling and results in greedy decoding, where the most likely token is chosen at every step.

top_p | float

: Sample tokens only from the top percentile of the probability mass. Optional, default = 1.0.

For example, a value of 0.9 will only consider tokens comprising the top 90% probability mass.
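As a sketch (the values here are illustrative, not recommendations), temperature and top_p can be combined in the request payload:

sampling_params = {
    "temperature": 0.7,  # below 1.0: favors higher-probability tokens
    "top_p": 0.9,        # sample only from the top 90% of probability mass
}

# With temperature 0 the most likely token is always chosen,
# so top_p has no practical effect.
greedy_params = {"temperature": 0.0}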

stop | Union[str, List[str]]

: Stop the generation when the model generates one of these strings. Optional.

For example, to stop at a period or a new line, use `[".", "\n"]`.
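Because the type is a union, `stop` accepts either a single string or a list of alternatives. A small sketch, assuming a `payload` dict like the one in the example request below (the stop strings are arbitrary):

payload["stop"] = "\n"             # stop at the first newline
payload["stop"] = ["\n", "User:"]  # or at whichever of several strings appears first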

Response Parameters

The response will have the following properties:

id | str

: A unique string id for the processed request. Repeated identical requests get different ids.


choices | List[ChatCompletionResponseChoice]

: A list representing the completions generated by the model.

The ChatCompletionResponseChoice class has the following attributes:

  • index | int
    • The index of the completion. For parsing a single completion, use index=0.
  • message | ChatMessage
    • The message generated by the model.
  • finish_reason | str
    • The reason generation stopped. The options:
      • stop - the generation stopped naturally (an end-of-sequence token was generated) or because one of the provided stop sequences was generated.
      • length - the generation stopped because max_tokens was reached.

usage | UsageInfo

: Usage statistics for the request, including prompt_tokens, completion_tokens, and total_tokens counts (see the example response below).
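Putting the response fields together, here is a minimal parsing sketch, assuming `response` is the requests response object produced by the example request below:

data = response.json()

# First (and here only) completion.
choice = data["choices"][0]
print(choice["message"]["content"])

# Detect truncation caused by reaching max_tokens.
if choice["finish_reason"] == "length":
    print("Output was cut off; consider raising max_tokens.")

# Token accounting for the request.
print(data["usage"]["total_tokens"])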

Example Request

import requests

url = "https://api.ai21.com/studio/v1/chat/completions"

# Replace YOUR_API_KEY with your own key before running.
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

payload = {
    "model": "jamba-instruct-preview",
    "messages": [
        {
            "role": "user",
            "content": "Tell me something I don't know"
        }
    ],
    "max_tokens": 200,
    "temperature": 1,
    "top_p": 1,
    "stop": None,
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

Example Response

{
  "id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 20,
    "total_tokens": 46
  }
}