Jamba-based chat and text completion model

This is the endpoint for the Jamba Instruct model. The model can be used for either a multi-message chat or a single-message contextual answer (without RAG Engine support).

Chat implementation

Use this endpoint to implement a chatbot based on the Jamba Instruct model. The model does not store state between calls, so you must provide the full message history with each request. Pass in the message history as a list of messages from oldest to newest.

Message ordering and call sequence

Each call includes a message list, starting with a system message that sets the tone, voice, and context of the discussion. If your UI shows an initial prompt that wasn't generated by the model, include it in the initial system message. After that, messages alternate between user and assistant, from oldest to newest. Each succeeding call should include the entire message history, with the assistant reply from the previous call and the newest user message appended to the end of the list, so the list always ends with the user message to be answered. For example:

Call 1 message list:

  1. System prompt: You are a magic genie just released from a bottle. Your first message is "thank you for releasing me from that bottle!"
  2. User reply 1

Call 2 message list:

  1. System prompt: You are a magic genie just released from a bottle. Your first message is "thank you for releasing me from that bottle!"
  2. User reply 1
  3. Assistant reply 1
  4. User reply 2

Call 3 message list:

  1. System prompt: You are a magic genie just released from a bottle. Your first message is "thank you for releasing me from that bottle!"
  2. User reply 1
  3. Assistant reply 1
  4. User reply 2
  5. Assistant reply 2
  6. User reply 3

Here is an exchange in JSON format, slightly trimmed for clarity. Specifics of each request and response are given below.

{
  "model": "jamba-instruct",
  "messages": [
     {"role": "system",
      "content": "You are a helpful genie just released from a bottle. You start the conversation with 'Thank you for freeing me! I grant you one wish.'"},
     {"role":"user",
      "content":"I want a new car"}
  ],
  "n":3 // Send three possible responses
}
// Slightly edited for simplicity
{
    "id": "...", // Every response message has a unique ID
    "choices": [
        {
            "index": 2,
            "message": {
                "role": "assistant",
                "content": "Great! What kind of car are you interested in?"
            },
            "finish_reason": "stop" // Seems to be a natural place to end this response
        },
        {
            "index": 1,
            "message": {
                "role": "assistant",
                "content": "Sure! What kind of car would you like?"
            },
            "finish_reason": "stop"
        },
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "🚗 Great choice, I can definitely help you with that! Before I grant your wish, can you tell me what kind of car you're looking for?"
            },
            "finish_reason": "stop"
        }
    ],
...
}
// We chose the last answer from the previous response list and showed it to the user, who provided a new message that we now send back to the assistant.

{
  "model": "jamba-instruct",
  "messages": [
     {"role": "system",
      "content": "You are a helpful genie just released from a bottle. You start the conversation with 'Thank you for freeing me! I grant you one wish.'"},
     {"role":"user",
      "content":"I want a new car"},
     {"role":"assistant",
      "content":"🚗 Great choice, I can definitely help you with that! Before I grant your wish, can you tell me what kind of car you're looking for?"},
     {"role":"user",
      "content":"A corvette"}
  ],
  "n":3
}
{
    "id": "...",
    "choices": [
        {  // Note how the index numbers aren't necessarily in order.
           // There is no implied quality ranking in the index or ordering.
            "index": 2,
            "message": {
                "role": "assistant",
                "content": "Great choice! What color would you like your Corvette to be, and are there any specific features you would like it to have?"
            },
            "finish_reason": "stop"
        },
        {
            "index": 1,
            "message": {
                "role": "assistant",
                "content": "Great choice! What color and year?"
            },
            "finish_reason": "stop"
        },
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Great choice! Do you have a specific model year in mind for your Corvette, and what color would you like it to be?"
            },
            "finish_reason": "stop"
        }
    ],
...
}
{
  "model": "jamba-instruct",
  "messages": [
     {"role": "system",
      "content": "You are a helpful genie just released from a bottle. You start the conversation with 'Thank you for freeing me! I grant you one wish.'"},
     {"role":"user",
      "content":"I want a new car"},
     {"role":"assistant",
      "content":"🚗 Great choice, I can definitely help you with that! Before I grant your wish, can you tell me what kind of car you're looking for?"},
     {"role":"user",
      "content":"A corvette"},
     {"role":"assistant",
      "content":"Great choice! What color and year?"},
     {"role":"user",
      "content":"1963 black split window Corvette"}
  ],
  "n":3
}
{
    "id": "...",
    "choices": [
        {  // Note how the index numbers aren't necessarily in order.
           // There is no implied quality ranking in the index or ordering.
            "index": 1,
            "message": {
                "role": "assistant",
                "content": "- Duuuuuude! 🤘"
            },
            "finish_reason": "stop"
        },
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Awesome!"
            },
            "finish_reason": "stop"
        },
        {
            "index": 2,
            "message": {
                "role": "assistant",
                "content": "Your wish is my command!"
            },
            "finish_reason": "stop"
        }
    ],
...
}
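
A minimal Python sketch of this pattern, using the same endpoint and Bearer-token header shown in the Python example at the end of this page (here the key is assumed to be in an AI21_API_KEY environment variable), might look like this:

import os
import requests

URL = "https://api.ai21.com/studio/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"}

# The model keeps no state, so the full history is resent with every call,
# starting with the system message that frames the conversation.
messages = [
    {"role": "system",
     "content": "You are a helpful genie just released from a bottle. You start the "
                "conversation with 'Thank you for freeing me! I grant you one wish.'"}
]

def send_user_message(text):
    # Append the newest user message, call the model, then append the chosen
    # assistant reply so the next call sees the complete history.
    messages.append({"role": "user", "content": text})
    res = requests.post(URL, headers=HEADERS,
                        json={"model": "jamba-instruct", "messages": messages, "n": 1})
    reply = res.json()["choices"][0]["message"]
    messages.append(reply)
    return reply["content"]

print(send_user_message("I want a new car"))
print(send_user_message("A corvette"))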

Contextual answer implementation

To implement contextual answers, where the model answers a question, provides examples, or continues a prompt, send a single call containing an optional system message to provide context and a user message to answer or complete.

Example

{
  "model": "jamba-instruct",
  "messages": [
    {
      "role":"system", // Non-printing contextual information for the model
      "content":"You are a helpful history teacher. You are kind and you respond with helpful content in a professional manner. Limit your answers to three sentences. Your listener is a high school student."},
	  {
      "role":"user", // The question we want answered.
      "content":"Who was the first emperor of rome?"}
  ],
  "n":1 // Limit response to one answer
}
{
    "id": "cmpl-4e8f6ae429494df39080ce4f7386569e",
    "choices": [
        {
            "index": 0, // Can be > 1 when multiple answers requested
            "message": {
                "role": "assistant", // 'assistant' means the model
                "content": "The first emperor of Rome was Augustus Caesar, who ruled from 27 BC to 14 AD. He was the adopted heir of Julius Caesar and is known for transforming Rome from a republic into an empire, ushering in a period of relative peace and stability known as the Pax Romana."
            },
            "finish_reason": "stop" // Reached a natural answer length.
        }
    ],
    "usage": {
        "prompt_tokens": 77,
        "completion_tokens": 65,
        "total_tokens": 142
    }
}
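
The same single-turn request can be sent from Python with the requests library; a short sketch (again assuming the key is in an AI21_API_KEY environment variable):

import os
import requests

res = requests.post(
    "https://api.ai21.com/studio/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
    json={
        "model": "jamba-instruct",
        "messages": [
            {"role": "system",   # non-printing context for the model
             "content": "You are a helpful history teacher. Limit your answers to three sentences."},
            {"role": "user",     # the question to answer
             "content": "Who was the first emperor of rome?"},
        ],
        "n": 1,                  # one answer is enough here
    },
)

print(res.json()["choices"][0]["message"]["content"])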

Request details

Endpoint: POST https://api.ai21.com/studio/v1/chat/completions

Request Parameters

Header parameters

Authorization: [string, required] Bearer token authorization is required for all requests. Use your API key. Example: Authorization: Bearer asdfASDF5433
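
For example, in Python the header can be built from your key (AI21_API_KEY below is a placeholder variable holding the key):

headers = {"Authorization": f"Bearer {AI21_API_KEY}"}  # sent with every request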

Body parameters

  • model: [string, required] The name of the model to use. Choose one of the following values:
    • jamba-instruct-preview.
  • messages: [list[object], required] The previous messages in this chat, from oldest (index 0) to newest. The list must contain at least one user or assistant message. Include both user inputs and assistant (model) responses. Maximum total size for the list is about 256K tokens. Each message includes the following members:
    • role: [string, required] The role of the message author. One of the following values:
      • user: Input provided by the user. Any instructions given here that conflict with instructions given in the system prompt take precedence over the system prompt instructions.
      • assistant: Response generated by the model.
      • system: Initial instructions given to the model to set general guidance on the tone and voice of the generated messages. An initial system message is recommended to establish the tone of the chat, but it is not required. (This guidance can also be part of the initial user prompt, but typically is not.) For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."
    • content: [string, required] The content of the message.
  • max_tokens: [integer, optional] The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). Default: 4096, Range: 0 – 4096
  • temperature: [float, optional] How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. Default: 1.0, Range: 0.0 – 2.0
  • top_p: [float, optional] Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens and 0.01 means the pool of only the most likely next tokens. Default: 1.0, Range: 0 < value <= 1.0
  • stop: [string | list[string], optional] End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K long, and can contain newlines as \n characters. Examples:
    • Single stop string with a word and a period: "monkeys."
    • Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
  • n: [integer, optional] How many chat responses to generate. Note: if n is larger than 1, setting temperature=0 will always fail, because all answers are guaranteed to be duplicates. Default: 1, Range: 1 – 16
  • frequency_penalty: [float, optional] Reduce the frequency of repeated words within a single response message by increasing this number. The penalty grows the more times a word appears during response generation. Setting this to 2.0 will produce a string with few, if any, repeated words. Default: 0, Range: -2.0 – 2.0
  • presence_penalty: [float, optional] Reduce the frequency of repeated words within a single message by increasing this number. Unlike frequency penalty, the presence penalty is the same no matter how many times a word appears. Default: 0, Range: -2.0 – 2.0
  • logprobs: Legacy, ignored
  • top_logprobs: Legacy, ignored
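
As a reference point, a request body that combines several of the optional parameters above might look like the following Python dictionary (the values are illustrative only, not recommendations):

payload = {
    "model": "jamba-instruct",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three dog breeds"},
    ],
    "max_tokens": 200,          # hard cap on the length of each generated response
    "temperature": 0.7,         # some variation between answers
    "top_p": 1.0,               # sample from the full token pool
    "stop": ["####", "\n\n"],   # end generation if either string is produced
    "n": 2,                     # return two candidate responses
}
# Sent as the JSON body of the POST request (see the Python example below).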

Response details

A successful response includes the following members:

  • id: [string] A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.
  • choices: [list[object]] One or more responses, depending on the n parameter from the request. Each response includes the following members:
    • index: [integer] Zero-based index of this response among the generated responses. Note that this might not correspond to the position in the choices list.
    • message: [object] The message generated by the model. Same structure as the request message, with role and content members.
    • finish_reason: [string] Why the message ended. Possible reasons:
      • stop: The response ended naturally as a complete answer (due to end-of-sequence token) or because the model generated a stop sequence provided in the request.
      • length: The response ended by reaching max_tokens.
  • usage: [object] Token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.
    • prompt_tokens: [integer] Number of tokens in the prompt for this request. Note that the token count includes extra tokens added by the system to format the input message list into the single string prompt required by the model. The number of extra tokens is typically proportional to the number of messages in the thread, and should be relatively small.
    • completion_tokens: [integer] Number of tokens in the response message.
    • total_tokens: [integer] prompt_tokens + completion_tokens.
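
A short Python sketch of reading these fields, assuming res is the requests response object from one of the calls above:

data = res.json()                       # decoded response body
print(data["id"])                       # unique per request

for choice in data["choices"]:
    # index identifies the response; it may not match the position in the list
    print(choice["index"], choice["finish_reason"])
    if choice["finish_reason"] == "length":
        print("Response was truncated at max_tokens")
    print(choice["message"]["content"])

# Token counts used for per-token billing
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])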

Python example

Example request

import os
import requests

url = "https://api.ai21.com/studio/v1/chat/completions"

# Read the API key from an environment variable rather than hard-coding it.
api_key = os.environ["AI21_API_KEY"]

res = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "jamba-instruct",
        "messages": [
            {
                "role": "user",
                "content": "Tell me something I don't know"
            }
        ],
        "max_tokens": 200,
        "temperature": 1,
        "top_p": 1,
        "stop": None,
    },
)

print(res.json())

Example Response

{
  "id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 20,
    "total_tokens": 46
  }
}