Rate limits

Rate limits cap the number of requests a user or application can make within a given time window, which keeps the system stable and usage fair. They are expressed in RPM (requests per minute) or RPS (requests per second), each giving the maximum number of requests allowed in that window.

Below are the rate limits for all of our models. If your use case requires a higher limit, contact us at [email protected].

Foundation models

Foundation models have usage limits per second (RPS) and per minute (RPM):

| Foundation Model | RPS | RPM |
|------------------|-----|-----|
| Jamba            | 2   | 60  |
| Jurassic-2 Light | 20  | 480 |
| Jurassic-2 Mid   | 20  | 480 |
| Jurassic-2 Ultra | 5   | 180 |
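
Because foundation models carry both a per-second and a per-minute limit, a client has to stay under both at once: short bursts are bounded by RPS, while sustained traffic is bounded by RPM. The following is a minimal client-side sketch of one way to do that with a sliding window. The class name `DualWindowLimiter` and its methods are hypothetical and not part of any SDK, and the sketch assumes the server counts requests over rolling 1-second and 60-second windows, which this page does not specify.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Hypothetical client-side limiter enforcing a per-second and a per-minute cap."""

    def __init__(self, rps, rpm, clock=time.monotonic):
        # (window length in seconds, max requests allowed in that window)
        self.windows = [(1.0, rps), (60.0, rpm)]
        self.sent = deque()  # timestamps of requests sent in the last 60 seconds
        self.clock = clock   # injectable so the logic can be tested without sleeping

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Forget requests that are older than the longest window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        wait = 0.0
        for length, limit in self.windows:
            recent = [t for t in self.sent if now - t < length]
            if len(recent) >= limit:
                # A slot frees up once the limit-th most recent request
                # falls out of this window.
                wait = max(wait, recent[-limit] + length - now)
        return wait

    def record(self):
        """Register that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self):
        """Block until a request is allowed under both limits, then record it."""
        while (delay := self.wait_time()) > 0:
            time.sleep(delay)
        self.record()
```

As a usage example, `limiter = DualWindowLimiter(rps=5, rpm=180)` matches the Jurassic-2 Ultra row above; calling `limiter.acquire()` before each request keeps the client within both limits under the stated assumption. Server-side counting may differ, so handling rejected requests is still advisable.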

Task-specific models

Task-specific models have usage limits per minute (RPM):

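| Task-specific Model                | RPM |
|------------------------------------|-----|
| Paraphrase                         | 30  |
| Grammatical Error Correction (GEC) | 100 |
| Text Improvements                  | 30  |
| Summarize                          | 30  |
| Summarize by Segment               | 30  |
| Text Segmentation                  | 200 |
| Contextual Answers                 | 100 |
| Semantic Search                    | 100 |
| Embeddings                         | 30  |
| Document Library (upload)          | 100 |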
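
For models with only a per-minute limit, the simplest way to stay within it is to space requests evenly. The sketch below converts an RPM value into a minimum interval between requests and builds a send schedule from it. The function names are hypothetical, and the sketch assumes the limit is counted over a rolling 60-second window; in practice a small safety margin on top of the computed interval may be sensible.

```python
def min_interval_seconds(rpm):
    """Spacing between requests that yields at most `rpm` requests per minute."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


def schedule(n_requests, rpm, start=0.0):
    """Return evenly spaced send times, in seconds, for `n_requests` requests."""
    step = min_interval_seconds(rpm)
    return [start + i * step for i in range(n_requests)]
```

With the limits above, a 30 RPM model such as Paraphrase works out to one request every 2 seconds, and the 200 RPM Text Segmentation model to one request every 0.3 seconds.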

Custom models (not currently available)

Custom models have usage limits per second (RPS) and per minute (RPM) that depend on the base model:

| Custom Model (Based on) | RPS | RPM |
|-------------------------|-----|-----|
| Jurassic-2 Light        | 20  | 480 |
| Jurassic-2 Mid          | 20  | 480 |
| Jurassic-2 Ultra        | 5   | 180 |