Jamba-1.5 instruction-following chat models
Request details
Endpoint: POST v1/chat/completions
Request Parameters
Header parameters
Header: [required] Bearer token authorization required for all requests. Use your API key. Example: Authorization: Bearer asdfASDF5433
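As a rough illustration of the header requirement, the sketch below builds (but does not send) a request with Python's standard library. The base URL and the helper name build_request are assumptions that do not appear in this reference; substitute the host you actually use.

```python
import json
import urllib.request

# Assumption: the host is not given in this reference; replace it with your own.
BASE_URL = "https://api.ai21.com/studio/"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer token header."""
    return urllib.request.Request(
        url=BASE_URL + "v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "asdfASDF5433",  # placeholder key taken from the example above
    {"model": "jamba-1.5-mini", "messages": [{"role": "user", "content": "Hello"}]},
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access or a real key.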
Body parameters
model
: [string, required] The name of the model to use. Choose one of the following values:
- jamba-1.5-mini
- jamba-1.5-large
messages
: [array of objects, required] The previous messages in this chat, from oldest (index 0) to newest. For single-turn interactions, this should be an optional system message followed by a single user message. The maximum total size of the list is about 256K tokens. The structure of the message object depends on the type:
system message
Description
Initial instructions provided to the system to provide general guidance on the tone and voice of the generated message. An initial system message is optional but recommended to provide guidance on the tone of the chat. For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."
Structure
role
: [string, required] system
content
: [string, required] The content of the message.
user message
Description
Input provided by the user.
Structure
role
: [string, required] user
content
: [string, required] The content of the message.
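The single-turn case described above (an optional system message followed by one user message) might be assembled as in the following minimal sketch. The helper name single_turn and the sample texts are illustrative only.

```python
def single_turn(user_text, system_text=None):
    """Return a messages list: an optional system message, then one user message."""
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


messages = single_turn(
    "What causes tides?",
    system_text="You are a helpful chatbot with a background in earth sciences.",
)
print(messages)
```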
assistant message
Description
Response generated by the model. Include this in your request in order to provide context for future answers. The values here should be copied from the response.
Structure
role
: [string, required] assistant
content
: [string, required] The content of the message.
tool_calls
: [array of objects, optional] If the assistant called a tool as requested and the tool successfully returned a result, include the tool calls here to give the model context for future responses. Tool calls are requested using the tools parameter. You can simply copy the object returned in the message.tool_calls field of the response. If included, the next message in the history must be a tool role message with a matching ID that provides the result of running the function with the parameters described here. Each tool call object has the following members:
id
: [string] The ID of the tool call, returned by message.tool_calls.id in the response.
type
: [string] The type of tool, returned by message.tool_calls.type in the response.
function
: [object] Details about the function that the model called, returned by message.tool_calls.function in the response. It has the following members:
name
: [string] The name of the function. Look for your function by name in the responses to determine whether the model could generate calling information for that function.
arguments
: [JSON string] A JSON representation of the parameter values used when calling this function. These parameters might be the ones generated by the model, or they might be any values you chose to use instead of the model-generated values.
tool message
Description
Holds the response generated by running a tool. For user-implemented tools, such as functions, put the output of running the function here. This lets you pass a function's result back to the model so that it can generate an appropriate user-friendly response. If included, the message history must also contain an assistant message with a tool_calls entry with a matching id.
Structure
role
: [string, required] tool
content
: [string, required] The output generated by running the function. For example, if it was a weather report application, and the weather on the requested day returned "Sunny, 23°C", you would enter that here. Enter the value in plain string format, rather than as a JSON string.
tool_call_id
: [string, required] The ID of the tool call. This originally came from the response field message.tool_calls.id and must match a tool_calls entry in an assistant role message in the message history.
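The pairing rule between assistant tool_calls entries and tool messages can be sketched as follows. This is only an illustration: get_weather, append_tool_round, the ID call_001, and the sample assistant message are all hypothetical, and real code should validate the arguments before calling anything.

```python
import json


def get_weather(city: str, date: str) -> str:
    """Hypothetical user-implemented tool; a real one would query a weather service."""
    return "Sunny, 23°C"


def append_tool_round(history, assistant_message, registry):
    """Append the assistant message, then one tool message per tool call."""
    history.append({
        "role": "assistant",
        "content": assistant_message.get("content") or "",
        "tool_calls": assistant_message["tool_calls"],
    })
    for call in assistant_message["tool_calls"]:
        func = registry[call["function"]["name"]]  # KeyError for unknown functions
        args = call["function"]["arguments"]
        if isinstance(args, str):  # arguments may arrive as a JSON string
            args = json.loads(args)
        history.append({
            "role": "tool",
            "content": str(func(**args)),  # plain string, not a JSON string
            "tool_call_id": call["id"],    # must match the assistant tool_calls id
        })
    return history


# Hypothetical assistant message, shaped like the message field of a response.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "id": "call_001",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"city": "Paris", "date": "2024-09-01"}),
        },
    }],
}
history = [{"role": "user", "content": "What is the weather in Paris on 2024-09-01?"}]
append_tool_round(history, assistant_message, {"get_weather": get_weather})
print(json.dumps(history, indent=2, ensure_ascii=False))
```

The resulting history can be sent as the messages value of the next request so that the model can phrase the tool output for the user.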
tools
: [array of objects, optional] A list of tools that the model can use when generating a response. Currently only function-type tools are supported. Each tool has the following members:
type
: [string, required] The type of tool. Currently the only supported value is "function".
function
: [object, required] Describes a function to call. Currently all functions must be described by the user; there are no built-in functions. An example function template is given below.
- The function name is the name of the function.
- The function description is a complete description of the function. Describe what the function does, what it returns, and any limitations. The more complete the description, the better the model can extract required parameters from the chat or other sources.
- If a function has parameters, each function parameter has a name, a type ("string", "integer", "float", "array", "boolean", or "enum"), and a description. Provide as much information as you can for each parameter, including what it is and the range of possible values.
- If a parameter is required, include its name in the "required" list.
- Ideally you should provide both a function description and a parameter list for best results, but the model can try to produce a result if you include only one or the other.
{
  "type": "function",
  "function": {
    "name": "UNIQUE_FUNCTION_NAME",
    "description": "FUNCTION DESCRIPTION",
    "parameters": {
      "type": "object",
      "properties": {
        "PARAMETER_NAME": {
          "type": "VALID_PARAM_TYPE",
          "enum": [
            "VAL 1",
            "VAL 2",
            "VAL 3"
          ],
          "description": "PARAMETER DESCRIPTION"
        }
      },
      "required": [
        "REQUIRED_PARAM_NAME"
      ]
    }
  }
}
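To make the template more concrete, here is a minimal sketch of a filled-in tool definition placed in a request body. The function get_weather, its parameters, and the helper undefined_required are hypothetical and used only for illustration.

```python
import json

# Hypothetical tool definition following the template above.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": (
            "Returns a short plain-text weather report for a city on a given date. "
            "Only dates within the next seven days are supported."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, for example Paris."},
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format."},
            },
            "required": ["city", "date"],
        },
    },
}


def undefined_required(tool: dict) -> list:
    """Return names listed under required that have no entry under properties."""
    params = tool["function"].get("parameters", {})
    return [name for name in params.get("required", [])
            if name not in params.get("properties", {})]


request_body = {
    "model": "jamba-1.5-mini",
    "messages": [{"role": "user", "content": "Will it rain in Paris tomorrow?"}],
    "tools": [weather_tool],
}
print(undefined_required(weather_tool))
print(json.dumps(request_body, indent=2))
```

The small consistency check is a design convenience, not something the API requires; it catches a required name that was never described before the request is sent.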
response_format
: [object, optional] If left blank, the response is plain text. If set to {"type":"json_object"}, the model will try to return the entire response in valid JSON format. For JSON formatting to succeed, you must include a description of the desired format in the prompt.
documents
: [array of objects, optional] If present, provides extra context for the answers. You can also provide this information directly in the message content. Providing it here instead has several benefits:
1) You can tell the model to generate its answer entirely based on the provided documents, similar to a RAG engine. If you need this, you must specify so in the prompt ("Limit your answer to the information included in the attached documents.").
2) You can provide arbitrary metadata about each document, which might be useful for generating a response.
Each document has the following elements:
content
: [string, required] The content of this "document".
metadata
: [array of objects, optional] Arbitrary key/value metadata pairs describing this document:
key
: [string, required] The type of metadata, like ‘author’, ‘date’, ‘url’, etc. Should be things the model understands.
value
: [string, required] The value of the metadata.
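A request that combines documents with response_format might look like the sketch below. The helper make_document, the document text, and the metadata values are placeholders invented for the example; note that the prompt both restricts the answer to the documents and describes the desired JSON shape.

```python
import json


def make_document(content, **metadata):
    """Build one documents entry; keyword arguments become key/value string pairs."""
    doc = {"content": content}
    if metadata:
        doc["metadata"] = [{"key": k, "value": str(v)} for k, v in metadata.items()]
    return doc


request_body = {
    "model": "jamba-1.5-mini",
    "messages": [
        {
            "role": "system",
            "content": (
                "Limit your answer to the information included in the attached documents. "
                'Reply with a JSON object that has the keys "answer" and "source".'
            ),
        },
        {"role": "user", "content": "When is the office closed?"},
    ],
    # Placeholder document text and metadata, for illustration only.
    "documents": [
        make_document(
            "The office is closed on public holidays.",
            author="HR team",
            url="https://example.com/handbook",
        )
    ],
    "response_format": {"type": "json_object"},
}
print(json.dumps(request_body, indent=2))
```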
max_tokens
: [integer, optional] The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). Default: 4096, Range: 0 – 4096
temperature
: [float, optional] How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. Default: 0.4, Range: 0.0 – 2.0
top_p
: [float, optional] Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens, and 0.01 means the pool of only the most likely next tokens. Default: 1.0, Range: 0 <= value <= 1.0
stop
: [string | array of strings, optional] End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K long, and can contain newlines as \n characters. Examples:
- Single stop string with a word and a period: "monkeys."
- Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
n
: [integer, optional] How many chat responses to generate. Default: 1, Range: 1 – 16. Notes:
- If n > 1, setting temperature=0 will fail because all answers are guaranteed to be duplicates.
- n must be 1 when stream = True.
stream
: [boolean, optional] Whether or not to stream the result one token at a time using server-sent events. This can be useful when a long wait for a complete answer would be a problem, such as in a chatbot. If set to True, then n must be 1. A streaming response differs from a non-streaming response. Must be False if requesting tool use (if tools is specified).
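The ranges and cross-parameter rules listed above can be checked on the client before sending a request. The following is a minimal sketch under the defaults and limits stated in this reference; the function name validate_request is hypothetical, and the server remains the final authority.

```python
def validate_request(body: dict) -> list:
    """Return a list of problems found; an empty list means no rule was violated."""
    problems = []
    max_tokens = body.get("max_tokens", 4096)
    temperature = body.get("temperature", 0.4)
    top_p = body.get("top_p", 1.0)
    n = body.get("n", 1)
    stream = body.get("stream", False)

    if not 0 <= max_tokens <= 4096:
        problems.append("max_tokens must be between 0 and 4096")
    if not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0 and 1.0")
    if not 1 <= n <= 16:
        problems.append("n must be between 1 and 16")
    if n > 1 and temperature == 0:
        problems.append("temperature=0 is not allowed when n > 1")
    if stream and n != 1:
        problems.append("n must be 1 when stream is True")
    if stream and body.get("tools"):
        problems.append("stream must be False when tools is specified")
    return problems


print(validate_request({"model": "jamba-1.5-mini", "n": 2, "temperature": 0}))
```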
Response details
Non-streaming results
A successful non-streamed response includes the following members:
id
: [string] A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.
model
: [string] The model used to generate the response.
choices
: [list[object]] One or more responses, depending on the n parameter from the request. Each response includes the following members:
index
: [integer] Zero-based index of the message in the list of messages. Note that this might not correspond with the position in the response list.
message
: [object] The message generated by the model. It has the following members:
role
: [string] The role of the message; assistant for messages generated by the model.
content
: [string] The content of the message.
tool_calls
: [array of objects or null] If present, this will be any tool calls generated by the model. Tool calls are possible only if you specified a tools parameter in the request. The tool calls made or generated apply only to the current message, and the values returned here should be added to the message thread in both the assistant message tool_calls field and the tool message. Each tool call has the following members:
id
: [string] ID of the tool call, generated by the model.
type
: [string] The type of tool called. Currently the only possible value is "function".
function
: [object] The function called. It has the following members:
name
: [string] The name of the function, which you specified in your request.
arguments
: [object] A JSON object describing all the parameter names and values used to call the function. For a non-built-in function you must call the function yourself; use the values here, but always validate these parameters. These values are parsed from the user input, and might be invalid or malicious, so always check these values carefully.
finish_reason
: [string] Why the message ended. Possible reasons:
stop
: The response ended naturally as a complete answer (due to end-of-sequence token) or because the model generated a stop sequence provided in the request.
length
: The response ended by reaching max_tokens.
usage
: [object] The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.
prompt_tokens
: [integer] Number of tokens in the prompt for this request. Note that the prompt token count includes the entire message history, plus extra tokens needed by the system when combining the list of prompt messages into a single message, as required by the model. The number of extra tokens is typically proportional to the number of messages in the thread, and should be relatively small.
completion_tokens
: [integer] Number of tokens in the response message.
total_tokens
: [integer] prompt_tokens + completion_tokens.
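Reading the fields described above from a parsed response could look like the following sketch. The helper names are hypothetical, and the sample dictionary reuses the values of the trimmed example response shown at the end of this page.

```python
def summarize_choices(response: dict) -> list:
    """Return (index, content, finish_reason) for each generated choice."""
    return [
        (choice["index"], choice["message"]["content"], choice["finish_reason"])
        for choice in response["choices"]
    ]


def usage_is_consistent(response: dict) -> bool:
    """Check that total_tokens equals prompt_tokens + completion_tokens."""
    usage = response["usage"]
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


sample = {
    "id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 26, "completion_tokens": 20, "total_tokens": 46},
}

for index, content, reason in summarize_choices(sample):
    # A finish_reason of "length" means the answer was cut off at max_tokens.
    note = " (truncated)" if reason == "length" else ""
    print(f"[{index}] {content}{note}")
print(usage_is_consistent(sample))
```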
Streamed results
When you set stream=true in the request, you will get a sequence of messages, each with one token generated by the model. The last message will be data: [DONE]. The other messages will have data set to a JSON object with the following fields:
data
: [object] Either an object with the following members, or the string [DONE] for the last message.
id
: [string] A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.
choices
: [array of objects] An array with one object containing the following fields:
index
: [integer] Always zero.
delta
: [object]
- The first message in the stream will be an object set to {"role":"assistant"}.
- Subsequent messages will have an object {"content": __token__} with the generated token.
finish_reason
: [string] One of the following values:
null
: All messages but the last will return null for finish_reason.
stop
: The response ended naturally as a complete answer (due to end-of-sequence token) or because the model generated a stop sequence provided in the request.
length
: The response ended by reaching max_tokens.
usage
: [object] The last message includes this field, which shows the total token counts for the request. Per-token billing is based on the prompt token and completion token counts and rates.
prompt_tokens
: [integer] Number of tokens in the prompt for this request. Note that the token count includes extra tokens added by the system to format the input message list into the single string prompt required by the model. The number of extra tokens is typically proportional to the number of messages in the thread, and should be relatively small.
completion_tokens
: [integer] Number of tokens in the response message.
total_tokens
: [integer] prompt_tokens + completion_tokens.
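When reading the raw event stream without the SDK, the message sequence described above could be consumed roughly as follows. This is a sketch under the stated format only: collect_stream is a hypothetical helper and the sample lines are invented for the example, not captured from the service.

```python
import json


def collect_stream(lines):
    """Join streamed tokens from data: lines; return (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        choice = event["choices"][0]
        # The first delta carries only the role, so content may be missing.
        parts.append((choice.get("delta") or {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
        usage = event.get("usage") or usage
    return "".join(parts), finish_reason, usage


# Invented sample events that follow the field layout described above.
sample_lines = [
    'data: {"id": "abc", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}',
    "",
    'data: {"id": "abc", "choices": [{"index": 0, "delta": {"content": "Aug"}, "finish_reason": null}]}',
    "",
    'data: {"id": "abc", "choices": [{"index": 0, "delta": {"content": "ustus"}, "finish_reason": "stop"}], '
    '"usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}}',
    "",
    "data: [DONE]",
]
text, reason, usage = collect_stream(sample_lines)
print(text, reason, usage)
```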
Streaming example
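Here is a streaming request made with the Python SDK, followed by a (trimmed) example response:
from ai21 import AI21Client
from ai21.models.chat import ChatMessage

messages = [ChatMessage(content="Who was the first emperor of rome", role="user")]
client = AI21Client()  # the API key is read from the environment when not passed explicitly
response = client.chat.completions.create(
    messages=messages,
    model="jamba-1.5-mini",
    stream=True
)
for chunk in response:
    # The first chunk carries only the role, so its content may be empty.
    print(chunk.choices[0].delta.content or "", end="")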
{
  "id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 20,
    "total_tokens": 46
  }
}