POST /studio/v1/chat/completions
from ai21 import AI21Client
from ai21.models.chat import ChatMessage

messages = [
    ChatMessage(role="user", content="Hello, how are you?"),
]

# Reads the AI21_API_KEY environment variable by default
client = AI21Client()

response = client.chat.completions.create(
    messages=messages,
    model="jamba-large",
    max_tokens=1024,
)
print(response.choices[0].message.content)

Overview

The Jamba API provides access to a set of instruction-following chat models. This page describes how to interact with the chat models via the API endpoint and specifies the request and response structures.


Request body

model
string
required

The name of the model to use.
You can call our model without specifying a version by using the following model names:

  • jamba-large
  • jamba-mini

For more information on the available model versions, see the model versions documentation.

messages
object[]
required

A list of messages representing the conversation history. The structure of each message object depends on its role.
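For example, a multi-turn conversation can be sketched as the following request-body array (an optional leading system message sets the model's behavior; the exact set of supported roles should be checked against the current reference):

```python
# Sketch of the messages array as it appears in the request body.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is Jamba?"},
    {"role": "assistant", "content": "Jamba is a family of chat models."},
    {"role": "user", "content": "Which sizes are available?"},
]
```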

tools
object[]

A list of tools that the model can use when generating a response.
Currently, only function-type tools are supported.
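A minimal function-type tool definition might look as follows. The field layout shown here (a `function` object with a JSON Schema `parameters` block) follows the common function-calling convention; the function name is hypothetical, and the exact schema should be verified against the current API reference.

```python
# One function-type tool with JSON Schema parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                },
                "required": ["city"],
            },
        },
    }
]
```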

documents
object[]

The documents parameter accepts a list of objects, each containing multiple fields.
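As a sketch, each entry carries the document text plus optional descriptive fields; `content` holds the text, while the metadata shown here is illustrative and should be checked against the current API reference.

```python
# Sketch of a documents entry: text content plus hypothetical metadata.
documents = [
    {
        "content": "Example document text to ground the model's answer.",
        "metadata": {"source": "product-notes"},  # illustrative field
    }
]
```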

response_format
object

An object defining the output format required from the model.
Setting it to { "type": "json_object" } activates JSON mode, ensuring the generated message adheres to valid JSON structure.
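For instance, JSON mode can be requested by including the setting described above in the request body (the model name here is one of the names listed earlier; the prompt is illustrative):

```python
# Request body with JSON mode enabled via response_format.
request_body = {
    "model": "jamba-mini",
    "messages": [
        {"role": "user", "content": "List three colors as a JSON array."},
    ],
    "response_format": {"type": "json_object"},
}
```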

max_tokens
integer

The maximum number of tokens the model can generate in its response.
For Jamba models, the maximum allowed value is 4096 tokens.

temperature
float

Controls the variety of responses: a higher value results in more diverse answers.
Default: 0.4, Range: 0.0 – 2.0

top_p
float

Limits the pool of next tokens at each step to the top percentile of probability mass: 1.0 means the pool of all possible tokens, while 0.01 restricts it to only the most likely next tokens.
Default: 1.0, Range: 0.0 – 1.0

stop
string[]

End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K characters long and can contain newlines as \n characters.
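As a small illustration, a stop list may mix a plain marker with a sequence containing a newline (both values here are arbitrary examples):

```python
# Generation halts when either sequence is produced; the sequence
# itself is excluded from the returned message.
stop = ["<END>", "\nUser:"]
```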

n
integer

How many chat responses to generate. Default: 1, Range: 1 – 16.
Notes:

  • If n > 1, setting temperature = 0 will fail because all answers are guaranteed to be duplicates.
  • n must be 1 when stream = True

stream
boolean

Stream results one token at a time using server-sent events. Useful for long results to avoid long wait times. If True, n must be 1. Must be False if using tools.
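Consuming a streamed response with the Python SDK can be sketched as below; this is a sketch assuming the SDK's streaming chunks expose incremental text at `choices[0].delta.content`, which should be verified against the current SDK documentation.

```python
from ai21 import AI21Client
from ai21.models.chat import ChatMessage


def stream_reply(prompt: str) -> str:
    """Stream a chat completion and return the assembled text."""
    client = AI21Client()  # reads AI21_API_KEY from the environment
    parts = []
    # stream=True delivers the answer as server-sent events; n must be 1.
    for chunk in client.chat.completions.create(
        messages=[ChatMessage(role="user", content=prompt)],
        model="jamba-mini",
        stream=True,
    ):
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            print(text, end="")
    return "".join(parts)
```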
