Rate limits

Rate limits control the flow of requests to a system, keeping it stable and ensuring fair usage. They cap the number of requests that can be made within a given time frame. Rate limits are commonly expressed as RPM (requests per minute) or RPS (requests per second): the maximum number of requests a user or application can make during that period.

Below are the rate limits for all of our models. If you need a higher limit for your use case, please get in touch with us at [email protected].

Foundation models - Jurassic-2

Our foundation models have both per-second (RPS) and per-minute (RPM) limits:

| Foundation Model | RPS | RPM |
|------------------|-----|-----|
| Jurassic-2 Light | 20  | 480 |
| Jurassic-2 Mid   | 20  | 480 |
| Jurassic-2 Ultra | 5   | 180 |

Custom models also have both per-second (RPS) and per-minute (RPM) limits, which depend on the base model:

| Custom Model (Based on) | RPS | RPM |
|-------------------------|-----|-----|
| Jurassic-2 Light        | 20  | 480 |
| Jurassic-2 Mid          | 20  | 480 |
| Jurassic-2 Ultra        | 5   | 180 |
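
Because these models carry two caps at once, a client may want to throttle itself against both. The following is a minimal sketch rather than part of any official SDK: the class `SlidingWindowLimiter` and its methods are hypothetical names, and the code assumes the limits are enforced over sliding windows, which this page does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter enforcing several caps at once.

    Each cap is a (max_requests, window_seconds) pair, e.g. (5, 1.0) for
    5 RPS and (180, 60.0) for 180 RPM.
    """

    def __init__(self, limits, clock=time.monotonic):
        self.limits = list(limits)
        self.clock = clock  # injectable so the limiter can be tested
        self.longest = max(window for _, window in self.limits)
        self.sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Return seconds to wait before the next request (0.0 if none)."""
        now = self.clock()
        # Forget requests that are older than the longest window.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # This request must leave the window before a new one fits.
                blocking = recent[-max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self):
        """Note that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self, sleep=time.sleep):
        """Block until a request is allowed under every cap, then record it."""
        while True:
            wait = self.wait_time()
            if wait <= 0:
                break
            sleep(wait)
        self.record()
```

With the Jurassic-2 Ultra values from the table, one would construct `SlidingWindowLimiter([(5, 1.0), (180, 60.0)])` and call `acquire()` before each request. Note that the per-minute cap is the tighter one over time: 5 RPS sustained would be 300 requests per minute, well above 180.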

Task-specific Models

Task-specific models have per-minute limits only:

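| Task-specific Model                | RPM |
|------------------------------------|-----|
| Paraphrase                         | 30  |
| Grammatical Error Correction (GEC) | 100 |
| Text Improvements                  | 30  |
| Summarize                          | 30  |
| Summarize by Segment               | 30  |
| Text Segmentation                  | 200 |
| Contextual Answers                 | 100 |
| Semantic Search                    | 100 |
| Embeddings                         | 30  |
| Document Library (upload)          | 100 |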
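
A per-minute limit translates into a minimum spacing between evenly paced requests: 60 divided by the RPM value. The sketch below works this out for the limits listed above. The names `TASK_RPM`, `min_interval_seconds`, and `pacing_table` are illustrative only, and even pacing is just one simple way to stay under a cap, not a requirement stated on this page.

```python
# Per-minute limits copied from the task-specific table on this page.
TASK_RPM = {
    "Paraphrase": 30,
    "Grammatical Error Correction (GEC)": 100,
    "Text Improvements": 30,
    "Summarize": 30,
    "Summarize by Segment": 30,
    "Text Segmentation": 200,
    "Contextual Answers": 100,
    "Semantic Search": 100,
    "Embeddings": 30,
    "Document Library (upload)": 100,
}


def min_interval_seconds(rpm: int) -> float:
    """Spacing in seconds between evenly paced requests that respects `rpm`."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


def pacing_table(limits: dict) -> dict:
    """Map each model name to its minimum spacing in seconds."""
    return {name: min_interval_seconds(rpm) for name, rpm in limits.items()}
```

For example, `min_interval_seconds(30)` gives 2.0, so a client calling Paraphrase at an even pace would wait at least two seconds between requests.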