Rate limits

Rate limits control the flow of requests to a system, keeping it stable and ensuring fair usage. A rate limit caps the number of requests that can be made within a given time window. Limits are commonly expressed in RPM (requests per minute) or RPS (requests per second): the maximum number of requests a user or application may make in that interval.
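To illustrate how an RPS cap and an RPM cap interact, here is a minimal client-side sketch in Python. It is an illustration only, not a description of how the AI21 Platform enforces its limits: the class name `DualWindowLimiter` is hypothetical, and it assumes both limits behave as sliding windows over the timestamps of recently admitted requests.

```python
import collections
import time
from typing import Callable, Deque


class DualWindowLimiter:
    """Hypothetical client-side limiter enforcing both an RPS and an RPM cap.

    Assumption: both caps are sliding windows (1 s and 60 s) over the
    timestamps of requests this client has already sent.
    """

    def __init__(self, rps: int, rpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rps = rps
        self.rpm = rpm
        self.clock = clock
        # Timestamps of requests admitted within the last 60 seconds.
        self.sent: Deque[float] = collections.deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if both caps allow it right now."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        # Count requests admitted during the last second.
        in_last_second = sum(1 for t in self.sent if now - t < 1.0)
        if len(self.sent) >= self.rpm or in_last_second >= self.rps:
            return False
        self.sent.append(now)
        return True
```

A caller would check `try_acquire()` before each request and sleep briefly and retry when it returns `False`. Injecting the clock keeps the class testable without real waiting.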

Below are the rate limits for all of our models when accessed on the AI21 Platform via our SDK or REST endpoints. If you require a higher limit for your particular use case, please don't hesitate to get in touch with us at [email protected].


Cloud-based services

Cloud providers have their own rate limits. For instance, Amazon SageMaker rate limits are determined by the instance you deploy the model on, and Amazon Bedrock has its own pricing and rate limits for AI21 model usage. See your cloud provider's documentation for details.

Foundation models

Foundation models have usage limits per second (RPS) and per minute (RPM):

Foundation model | RPS | RPM
Jamba | 2 | 60
Jurassic-2 Light | 20 | 480
Jurassic-2 Mid | 20 | 480
Jurassic-2 Ultra | 5 | 180
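Because each foundation model has two caps, the sustained rate you can actually reach is the lower of the RPM value and the RPS value multiplied by 60. The short Python sketch below works through that arithmetic; the dictionary restates the table above, and the helper names `sustained_rpm` and `min_interval_seconds` are hypothetical.

```python
# Foundation-model limits restated from the table: model -> (RPS, RPM).
FOUNDATION_LIMITS = {
    "Jamba": (2, 60),
    "Jurassic-2 Light": (20, 480),
    "Jurassic-2 Mid": (20, 480),
    "Jurassic-2 Ultra": (5, 180),
}


def sustained_rpm(model: str) -> int:
    """Highest per-minute rate that respects both caps over a full minute."""
    rps, rpm = FOUNDATION_LIMITS[model]
    # The per-second cap alone would allow rps * 60 requests per minute;
    # whichever cap is lower is the one that binds.
    return min(rps * 60, rpm)


def min_interval_seconds(model: str) -> float:
    """Even spacing between requests that keeps the average rate within both caps."""
    return 60.0 / sustained_rpm(model)
```

For every model in the table the RPM cap is the binding one. For example, Jamba's 2 RPS would permit 120 requests per minute, but the RPM cap holds it to 60, which averages out to one request per second.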

Task-specific models

Task-specific models have usage limits per minute (RPM):

Task-specific model | RPM
Paraphrase | 30
Grammatical Error Correction (GEC) | 100
Text Improvements | 30
Summarize | 30
Summarize by Segment | 30
Text Segmentation | 200
Contextual Answers | 100
Semantic Search | 100
Embeddings | 30
Document Library (upload) | 100
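The task-specific models list only an RPM cap, so a batch job can stay within it by spacing requests evenly, 60 / RPM seconds apart. The following is a minimal sketch under that assumption: the generator name `paced` and the `TASK_RPM` dictionary are hypothetical, and even pacing is a conservative choice intended to stay within the cap whether the platform counts requests in fixed or sliding one-minute windows.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

# A few task-specific limits restated from the table (requests per minute).
TASK_RPM = {
    "Paraphrase": 30,
    "Text Segmentation": 200,
    "Document Library (upload)": 100,
}


def paced(items: Iterable[T], rpm: int,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Hypothetical helper: yield items no faster than `rpm` per minute.

    The caller sends one request per yielded item.
    """
    interval = 60.0 / rpm
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            # Too early: wait until this item's slot opens.
            sleep(next_slot - now)
            now = clock()
        # The following request may go out one interval after this one.
        next_slot = now + interval
        yield item
```

For example, at the 30 RPM listed for Paraphrase, four requests go out at t = 0, 2, 4 and 6 seconds; in general, N evenly paced requests span (N - 1) * 60 / RPM seconds.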

Custom models (not currently available)

Custom models have limits per second (RPS) and per minute (RPM) that depend on the base model:

Custom model (base model) | RPS | RPM
Jurassic-2 Light | 20 | 480
Jurassic-2 Mid | 20 | 480
Jurassic-2 Ultra | 5 | 180