Overview
The Jamba API provides access to a set of instruction-following chat models.
This document describes how to interact with the chat models via the API endpoint and specifies the request and response structures.
Authentication
All API requests must include an Authorization header containing a Bearer token. Use your API key as the token.
Format:
Authorization: Bearer <your-api-key>
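For example, a raw HTTP request with the header set (a minimal sketch using the requests library, assuming the api.ai21.com/studio base URL and an API key stored in the AI21_API_KEY environment variable):

import os
import requests

# Send a minimal chat request with the Authorization header.
response = requests.post(
    "https://api.ai21.com/studio/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
    json={
        "model": "jamba-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(response.json())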
Create chat completion
POST /v1/chat/completions
Request body
model string Required
The name of the model to use.
You can call our models without specifying a version by using the following model names:
jamba-large
jamba-mini
For more information, see the documentation on available model versions.
messages array of objects Required
A list of messages representing the conversation history.
The structure of the message object depends on the type:
system message object
An initial system message is optional but recommended to set the tone of the chat.
role string Required
The role of the entity that is creating the message, in this case the system.
content string Required
The content of the message.
user message object
Input provided by the user.
role string Required
The role of the entity that is creating the message, in this case the user.
content string Required
The content of the message.
assistant message object
Response generated by the model. Include this in your request to provide context for future answers.
role string Required
The role of the entity that is creating the message, in this case the assistant.
content string Required
The content of the message.
tool_calls array of objects
The tool calls generated by the model, such as function calls.
id string
The id of the tool call.
type string
The type of tool.
function object
The function invoked by the model.
name string
The name of the function.
arguments JSON string
The arguments to pass to the function, serialized as a JSON string.
tool message object
Contains the output of a tool. Add the output of user-implemented tools here to enable a user-friendly model response. If included, the conversation must also contain an assistant message with a tool_calls entry whose id matches tool_call_id.
role string Required
The role of the entity that is creating the message, in this case the tool.
content string Required
The content of the message.
tool_call_id string Required
The id of the tool call that this message responds to.
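Putting the assistant and tool message types together, a tool round trip might look like this (an illustrative sketch; get_weather and the call id are hypothetical):

# Illustrative conversation history for a single tool round trip.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": "",  # empty here because the model only returned a tool call
        "tool_calls": [
            {
                "id": "call_abc123",  # hypothetical id generated by the model
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"city": "Paris"}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # must match the id above
        "content": '{"temperature_c": 18, "condition": "cloudy"}',
    },
]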
tools array of objects
A list of tools that the model can use when generating a response.
Currently, only function-type tools are supported.
type string Required
The type of tool. Currently, the only supported value is "function".
function object Required
Describes a function to call. Currently, all functions must be described by the user; there are no built-in functions. An example function template is given below.
name string Required
The name of the function.
description string Required
Provide a complete description of what the function does, what it returns, and any limitations.
parameters object
Each function parameter has a name, a type ("string", "integer", "float", "array", "boolean", or "enum"), and a description.
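For example, a tools list with a single entry might look like this (an illustrative sketch; get_order_status is a hypothetical user-defined function, and the parameters block assumes the standard JSON Schema shape):

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order by its ID. "
                           "Returns a status string. Only supports orders placed "
                           "in the last 90 days.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique order identifier.",
                    },
                },
                "required": ["order_id"],
            },
        },
    },
]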
documents array of objects
The documents parameter accepts a list of objects, each containing multiple fields.
content string Required
The content of this "document".
metadata array of objects
Key-value pairs describing the document:
key string Required
Type of metadata, such as "author", "date", or "url". Use keys the model understands.
value string Required
Value of the metadata.
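For example (illustrative content and metadata):

documents = [
    {
        "content": "Q2 revenue grew 12% year over year, driven by new signups.",
        "metadata": [
            {"key": "author", "value": "Finance Team"},
            {"key": "date", "value": "2024-07-15"},
        ],
    },
]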
response_format object
An object defining the output format required from the model.
Setting it to { "type": "json_object" } activates JSON mode, ensuring the generated message is valid JSON.
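For example, a request-body fragment that enables JSON mode (a sketch; pair it with a prompt that explicitly asks for JSON output):

payload = {
    "model": "jamba-mini",
    "messages": [
        {"role": "user", "content": "Return three colors as JSON under the key 'colors'."}
    ],
    "response_format": {"type": "json_object"},
}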
max_tokens integer
The maximum number of tokens the model can generate in its response.
temperature float
Controls the variety of responses. Higher values produce more diverse answers. Default: 0.4, Range: 0.0 – 2.0
top_p float
Limits the pool of next tokens at each step to the top percentile of probability mass, where 1.0 means the pool of all possible tokens and 0.01 means the pool of only the most likely next tokens. Default: 1.0, Range: 0.0 – 1.0
stop array of strings
End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K long and can contain newlines as \n characters.
Examples:
- Single stop string with a word and a period: "monkeys."
- Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
n integer
How many chat responses to generate. Default: 1, Range: 1 – 16
Notes:
- If n > 1, setting temperature = 0 will fail because all answers are guaranteed to be duplicates.
- n must be 1 when stream = True.
stream boolean
Stream results one token at a time using server-sent events. Useful for long results to avoid long wait times. If True, n must be 1. Must be False if using tools.
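For example, a minimal streaming loop (a sketch; the chunk shape choices[0].delta.content is an assumption based on the common server-sent-events streaming format):

from ai21 import AI21Client
from ai21.models.chat.chat_message import UserMessage

client = AI21Client()
# Stream the answer token by token; n must be 1 and no tools may be used.
stream = client.chat.completions.create(
    messages=[UserMessage(content="Write a haiku about the sea.", role="user")],
    model="jamba-mini",
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")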
Request Example
from ai21 import AI21Client
from ai21.models.chat.chat_message import SystemMessage, UserMessage, AssistantMessage
system = "You're a support engineer in a SaaS company"
messages = [
SystemMessage(content=system, role="system"),
UserMessage(content="Hello, I need help with a signup process.", role="user"),
AssistantMessage(content="Hi Alice, I can help you with that. What seems to be the problem?", role="assistant"),
UserMessage(content="I am having trouble signing up for your product with my Google account.", role="user"),
]
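# AI21Client reads the API key from the AI21_API_KEY environment variable by default.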
client = AI21Client()
response = client.chat.completions.create(
messages=messages,
model="jamba-mini",
max_tokens=100,
temperature=0.7,
top_p=1.0,
stop=["\n"],
)
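# Print the generated reply (assumes the standard choices/message response shape).
print(response.choices[0].message.content)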