Conversational RAG

Conversational RAG will be deprecated by the end of JulyWe’re migrating RAG capabilities to the AI21 Maestro, which offers richer interaction and tool use. Conversational RAG will continue to work until the deprecation date.

Overview

The Conversational RAG endpoint enables you to build conversational experiences that interact with your organizational data. Using a chat interface allows your users to refine or ask follow-up questions, with context retained between questions. The solution leverages AI21’s RAG Engine, which ensures that answers are based solely on information from your documents. This system is designed to effortlessly extract the right information from your organizational data.
Just ask your question (and any follow-ups), and get clear, accurate answers—no need for prompt engineering or detailed system messages.

Request body

messages

ChatMessage[]

required

The message history of this chat, from oldest (index 0) to newest. Messages must be alternating user/assistant messages, starting with a user message. Maximum total size for the array is about 256K tokens. Each message includes the following fields.

Show properties

role

string

required

The role of the message author. One of the following values:user: This message is a user question.assistant: This message is a response generated by the model.

content

string

required

The content of the message.

path

string

If specified, only looks in documents with a path metadata that matches this path or path prefix. That is, specifying “/pets/” would match documents with the path “/pets/” or “/pets/dogs/”. Use to focus the question on specific path in your library. See filtering documents.

labels

string[]

Specify labels to restrict answers to documents with any of these labels. Labels are exact, case-sensitive matches, no substring matches. See filtering documents.

file_ids

string[]

Specify which files should be included in the results. See filtering documents.

max_segments

integer

The maximum total number of document segments to use in formulating the answer. More segments means greater potential accuracy at the cost of speed and more tokens. If not specified, the system uses an optimal default.

retrievalsimilarity_threshold

float

Range: 0.5 – 1.5 How “similar” a source segment should be to the query in order to be added to the context used to answer the question. Similarity is judged by the RAG engine’s embedding values of the question and the source. If not specified, the system uses an optimal default.

retrieval_strategy

enum

Determines the scope of text segments added to the context during retrieval.

Show child attributes

segments [default]: Retrieve segments within the retrieval_similarity_threshold.add_neighbors: Retrieve segments within the retrieval_similarity_threshold, plus neighboring segments. This helps provide more context to the model in order to provide a better answer, though the neighbors may be unrelated to the user query. Requires more memory than “segments” and may slow results. If specified, you can also set the max_neighbors value.full_doc: Use the entire document if it contains any information pertinent to the query. Highest potential accuracy, although extra information might move the answer more off topic than “add_neighbors”, and also might result in the slowest response speed.

max_neighbors

integer

Used only when retrieval_strategy = add_neighbors. Specifies how many neighbor segments to combine with each candidate segment when generating the context during the retrieval step. Neighbors have a different topic then the candidate segment, but including them can add more context to the LLM, and potentially provide a more coherent answer. The actual number of neighbors added might be less if the segment is close to the beginning or end of the document.

hybridsearch_alpha

float

Defines the ratio between dense and sparse retrieval values used when evaluating segments in the library for eligibility for the context. Dense values are the embedding value, a conceptual or topical representation of the segment, as represented by a large vector. Sparse values is more like keyword search within the segment. 1.0 means using only dense embeddings; 0.0 means using only sparse embeddings. If you want to limit your sources to those that use specific terms, and your answers seem too broad, you might lower this value slightly. If not specified, the system uses an optimal default. Range: 0.0 – 1.0

Filtering documents
You can filter the pool of potential documents by document ID, label, or path. Note that these are intersection filters — that is, if you specify both a label value and a path value, only documents with both the label and the path will be matched. The only variation is the labels parameter, where any label in the list can be matched.

Filters	Matching docs
`labels=["red", "green", "blue"]`	matches label “red” OR “green” OR “blue”
`labels=["red", "green", "blue"]` AND `path="/colors/"`	matches (label “red” OR “green” OR “blue”) AND path=“/colors/any/other/suffix”

Response details

A successful response includes the following fields:

answer_in_context

boolean

True if an answer was found in the provided documents, False if an answer could not be found. It can be simpler to check this value rather than to look at the response text and evaluate if it includes an answer.

context_retrieved

boolean

True if the RAG engine was able to find segments related to the user’s query.

choices

array of objects

An array with one object that holds the generated response.

message

object

Contains the following fields.

Show properties

role

string

Always assistant

content

string

The generated answer. If an answer cannot be found, it will say so in natural language.

string

A unique ID for the request (not the message). Repeated identical requests get different IDs. However, for a streaming response, the ID will be the same for all responses in the stream.

search_queries

string[] | null

The questions that the model extracted from the user input thread. The model extracts the question from the most recent user message, taking into account the entire message history. If there isn’t a relevant query in the message, this will be null and nothing will be retrieved.

sources

object[]

Each object represents a segment used to generate the answer. Each source object contains the following fields.

Show child attributes

file_id: The ID of the file in the RAG library that contains this segment.
file_name: The name of the source document in the RAG library that contains this segment.
text: The full text of the retrieved segment.
score: The similarity score between this segment and the user’s query.
public_url: A URL pointing to the source document (if available). This is the publicUrl metadata provided (if any) by the caller when they uploaded the file to the library.

usage

object

The token counts for this request. Per-token billing is based on the prompt token and completion token counts and rates.

Show properties

prompt_tokens

integer

Number of tokens in the prompt for this request. The prompt token contains the entire message history and extra tokens for combining messages, proportional to the number of messages.

completion_tokens

integer

Number of tokens in the response message.

total_tokens

integer

prompt_tokens and completion_tokens.

from ai21 import AI21Client
from ai21.models.chat import ChatMessage

messages = [
    ChatMessage(content="Ask a question about your files", role="user"),
]

client = AI21Client()

client.library.files.create(
  file_path="path/to/file",
  path="path/to/file/in/library",
  labels=["my_file_label"],
)
chat_response = client.beta.conversational_rag.create(
    messages=messages,
    labels=["my_file_label"],
)

Using the APIs

Foundation Models

AI21 Maestro [Beta]

File Library Management

Conversational RAG

Overview

Request body

Response details

Using the APIs

Foundation Models

AI21 Maestro [Beta]

File Library Management

Conversational RAG

​Overview

​Request body

​Response details

Overview

Request body

Response details