Chat Request
Overview
The Jamba API provides access to a set of instruction-following chat models. This page describes how to interact with the chat models via the API endpoint and specifies the request and response structures.
Request body
The name of the model to use.
You can call our model without specifying a version by using the following model names:
jamba-large
jamba-mini
For more information, see the documentation on available model versions.
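A minimal request needs only the model name and the conversation. The sketch below builds such a request body; the endpoint URL is an assumption for illustration and should be verified against the official documentation.

```python
import json

# Assumed endpoint URL, shown only for illustration; confirm the actual
# chat endpoint in the official AI21 documentation.
ENDPOINT = "https://api.ai21.com/studio/v1/chat/completions"

# Minimal request body: model name plus conversation history.
payload = {
    "model": "jamba-mini",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

body = json.dumps(payload)  # serialized JSON request body
```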
A list of messages representing the conversation history. The structure of the message object depends on the type:
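As a sketch, a conversation history might look like the following, assuming the common role/content message shape (system, user, assistant); message objects of other types (such as tool results) may carry additional fields.

```python
# Illustrative conversation history; the content strings are made up.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the Jamba API in one sentence."},
    {"role": "assistant", "content": "It is a chat endpoint for Jamba models."},
    {"role": "user", "content": "Now list the two model names."},
]
```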
A list of tools that the model can use when generating a response.
Currently, only function type tools are supported.
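A function tool declaration might look like the sketch below, assuming a JSON Schema parameter description similar to other chat APIs; the function name and parameters here are hypothetical.

```python
# Hypothetical function tool; only the "function" tool type is supported.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                },
                "required": ["city"],
            },
        },
    }
]
```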
The document parameter accepts a list of objects, each containing multiple fields.
An object defining the output format required from the model.
Setting it to { "type": "json_object" } activates JSON mode, ensuring the generated message adheres to valid JSON structure.
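A request enabling JSON mode might look like the following sketch; the example response content is fabricated to show that a JSON-mode message can be parsed directly.

```python
import json

payload = {
    "model": "jamba-large",
    "messages": [
        {
            "role": "user",
            "content": "Return a JSON object with keys 'city' and 'country' for Paris.",
        },
    ],
    "response_format": {"type": "json_object"},  # activates JSON mode
}

# In JSON mode the message content is valid JSON, so it parses directly.
example_content = '{"city": "Paris", "country": "France"}'  # illustrative response
parsed = json.loads(example_content)
```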
The maximum number of tokens the model can generate in its response.
For Jamba models, the maximum allowed value is 4096 tokens.
Controls the variety of responses: a higher value results in more diverse answers.
Default: 0.4, Range: 0.0–2.0
Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens, and 0.01 means the pool of only the most likely next tokens.
Default: 1.0, Range: 0.0 <= value <= 1.0
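The documented defaults and ranges for the sampling parameters can be sanity-checked client-side. The helper below is our own sketch, not part of the API.

```python
# Client-side validation of the documented sampling ranges:
# temperature in [0.0, 2.0] (default 0.4), top_p in [0.0, 1.0] (default 1.0).
def validate_sampling(temperature: float = 0.4, top_p: float = 1.0) -> dict:
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    return {"temperature": temperature, "top_p": top_p}
```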
End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K characters long and can contain newlines as \n characters.
- Single stop string with a word and a period: "monkeys."
- Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
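Both documented forms of the stop parameter are sketched below, along with a small helper of our own that mimics the documented truncation behavior (the stop sequence itself is excluded from the output).

```python
# The two documented forms: a single string, or a list of strings.
payload_single = {"stop": "monkeys."}
payload_multi = {"stop": ["cat", "dog", " .", "####", "\n"]}

# Client-side illustration of the behavior: generation ends at the first
# stop sequence, which is not included in the returned text.
def truncate_at_stop(text: str, stops: list[str]) -> str:
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```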
How many chat responses to generate. Default: 1, Range: 1–16.
Notes:
- If n > 1, setting temperature = 0 will fail because all answers are guaranteed to be duplicates.
- n must be 1 when stream = True.
Stream results one token at a time using server-sent events. Useful for long results to avoid long wait times. If True, n must be 1. Must be False if using tools.
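A minimal sketch of consuming a server-sent-events stream is shown below, assuming each event arrives as a "data: <json>" line and the stream ends with "data: [DONE]" (a common SSE convention; verify the exact framing and event schema in the official documentation — the sample lines here are fabricated).

```python
import json

# Parse SSE lines of the form "data: <json>", stopping at "data: [DONE]".
def parse_sse_lines(lines):
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        yield json.loads(data)

# Fabricated sample stream for illustration.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    ev["choices"][0]["delta"]["content"] for ev in parse_sse_lines(sample)
)
```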