This model is in private preview. Sign up for the private preview waitlist here.
Sending a Request
To get a response to a set of messages, send an HTTP request that contains:
- A sequence of text messages.
- Parameters that control how the text is generated.
To authenticate, include your API key in the request headers.
The response to your request contains the generated text message.
Request Parameters
model
| str
: The name of the model to use. Required.
Currently, the only option is `jamba-instruct-preview`.
messages
| List[ChatMessage]
: A list of messages that make up the conversation so far. Required.
The ChatMessage class has the following attributes:
- `role` | str: The role of the message author. Can be one of `user`, `assistant`, or `system`.
- `content` | str: The content of the message. For all roles except `system`, this can't be an empty string. For the `system` role, an empty string is ignored.
In the payload, both `role` and `content` are mandatory for every message.
The messages list should contain at least one `user` or `assistant` message.
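The message rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_messages` and the choice to drop empty system messages locally (rather than letting the service ignore them) are assumptions made for illustration.

```python
from typing import Dict, List

ALLOWED_ROLES = {"user", "assistant", "system"}


def validate_messages(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical client-side check that mirrors the documented message rules."""
    cleaned = []
    for i, msg in enumerate(messages):
        # Both fields are mandatory for every message.
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"message {i}: both 'role' and 'content' are required")
        if msg["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {msg['role']!r}")
        if msg["content"] == "":
            if msg["role"] == "system":
                # Assumption: drop it locally, since an empty system message is ignored.
                continue
            raise ValueError(f"message {i}: content can't be empty for role {msg['role']!r}")
        cleaned.append(msg)
    # At least one user or assistant message must remain.
    if not any(m["role"] in ("user", "assistant") for m in cleaned):
        raise ValueError("at least one user or assistant message is required")
    return cleaned
```

Calling `validate_messages` on a list that contains only a system message raises a `ValueError`, which surfaces the problem before any network call is made.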
max_tokens
| int
: The maximum number of tokens to generate for each completion. Optional, default = 4096.
Must be under 64K.
temperature
| float
: Modifies the distribution from which tokens are sampled. Optional, default = 1.0.
Setting temperature to 1.0 samples directly from the model distribution. Lower (higher) values increase the chance of sampling higher (lower) probability tokens. A value of 0 essentially disables sampling and results in greedy decoding, where the most likely token is chosen at every step.
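To make the effect of temperature concrete, here is a small sketch of the standard temperature-scaled softmax. It assumes the service follows this common formulation, which the text above does not confirm. The function name `apply_temperature` and the toy logits are illustrative only.

```python
import math
from typing import List


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [1.0, 2.0, 3.0]
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(toy_logits, t)])
```

In this toy example, lowering the temperature concentrates probability on the highest-scoring token, and raising it spreads probability more evenly.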
top_p
| float
: Sample tokens from the corresponding top percentile of probability mass. Optional, default = 1.0.
For example, a value of 0.9 will only consider tokens comprising the top 90% probability mass.
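The following sketch shows one common way to implement this kind of filtering (often called nucleus sampling): keep the most likely tokens until their cumulative probability reaches `top_p`, then renormalize. Whether the service implements it exactly this way is an assumption, and `top_p_filter` is a hypothetical helper name.

```python
from typing import List


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Zero out tokens outside the top-p probability mass and renormalize the rest."""
    # Visit token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# With top_p = 0.9, the least likely token (probability 0.05) is excluded.
print(top_p_filter([0.5, 0.3, 0.15, 0.05], 0.9))
```

With `top_p = 1.0` every token is kept, which matches the default of sampling from the full distribution.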
stop
| Union[str, List[str]]
: Stop the generation when the model generates one of these strings. Optional.
For example, to stop at a period or a new line, use `[".", "\n"]`.
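As an illustration of how stop strings behave, the sketch below cuts a piece of text at the earliest occurrence of any stop string. It runs locally on a plain string and is not the service's implementation. The name `truncate_at_stop` is hypothetical, and excluding the stop string from the returned text is an assumption.

```python
from typing import List, Optional, Tuple, Union


def truncate_at_stop(
    text: str, stop: Union[str, List[str], None]
) -> Tuple[str, Optional[str]]:
    """Cut text at the earliest stop string; return the text and the matched string."""
    if stop is None:
        return text, None
    # Accept either a single string or a list of strings, like the stop parameter.
    sequences = [stop] if isinstance(stop, str) else list(stop)
    earliest = None
    matched = None
    for seq in sequences:
        if not seq:
            continue  # skip empty strings, which would match at position 0
        pos = text.find(seq)
        if pos != -1 and (earliest is None or pos < earliest):
            earliest, matched = pos, seq
    if earliest is None:
        return text, None
    # Assumption: the stop string itself is not part of the returned text.
    return text[:earliest], matched


print(truncate_at_stop("Hello world. Next\nline", [".", "\n"]))
```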
Response Parameters
The response will have the following properties:
id
| str
: A unique string id for the processed request. Repeated identical requests get different ids.
choices
| List[ChatCompletionResponseChoice]
: A list representing the completions generated by the model.
The ChatCompletionResponseChoice class has the following attributes:
- `index` | int: The index of the completion. For parsing a single completion, use `index=0`.
- `message` | ChatMessage: The message generated by the model.
- `finish_reason` | str: Why the generation stopped. The options:
  - `stop`: the generation stopped naturally (due to an end-of-sequence token) or by generating a stop sequence.
  - `length`: the generation stopped because it reached `max_tokens`.
usage
| UsageInfo
: Usage statistics for the request.
Example Request
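import requests

url = "https://api.ai21.com/studio/v1/chat/completions"
# Assumption: the API key is sent as a Bearer token. Replace the placeholder with your key.
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
    "model": "jamba-instruct-preview",
    "messages": [
        {
            "role": "user",
            "content": "Tell me something I don't know"
        }
    ],
    "max_tokens": 200,
    "temperature": 1,
    "top_p": 1,
    "stop": None,
}
# json= serializes the payload and turns None into JSON null.
response = requests.post(url, json=payload, headers=headers)
print(response.json())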
Example Response
{
"id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have"
},
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 26,
"completion_tokens": 20,
"total_tokens": 46
}
}
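A response shaped like the example above can be read as follows. This is a minimal sketch that works on an already-decoded dictionary. The helper name `first_completion` is hypothetical, and the dictionary is a compact copy of the example response.

```python
from typing import Any, Dict, Tuple


def first_completion(response: Dict[str, Any]) -> Tuple[str, bool]:
    """Return the text of the completion with index 0 and whether it was cut off."""
    choice = next(c for c in response["choices"] if c["index"] == 0)
    text = choice["message"]["content"]
    # A finish_reason of "length" means generation stopped at max_tokens.
    truncated = choice["finish_reason"] == "length"
    return text, truncated


# Compact copy of the example response.
example_response = {
    "id": "chatcmpl-8zLI4FFBAAApK2mGJ1BJOrMrPZQ8N",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Sure! Here's an interesting fact: Did you know that honey never spoils? Archaeologists have",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 26, "completion_tokens": 20, "total_tokens": 46},
}

text, truncated = first_completion(example_response)
print(text)
print("truncated:", truncated)
```

When `truncated` is true, the text ends mid-thought, as it does in the example, and a caller may want to retry with a larger `max_tokens`.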