Context size on different instance types
Each of our models can run on several instance types. Once you have chosen a model, picking an instance is mainly a matter of economics: in most cases you want the most cost-effective instance that meets your needs.
Note: Not all instances are available in all regions. Also, ml.p4de is in preview, so you need to request access to it from Amazon SageMaker.
The recommended instances for each model are listed below, based on your input size and desired throughput. They are divided into Foundation models (our Jurassic-2 series of large language models) and Task-specific models.
Foundation models
Our large language models support a context window of up to 8191 tokens. In large language models, the context window is the maximum number of tokens the model considers when generating a response. It caps the combined number of tokens in the prompt and the completion:
prompt + completion <= context window
As an estimate for your use case, the average token in our tokenizer is six characters long. You can also check any text you want using the tokenizer in our SDK.
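The rule above can be sketched as a quick pre-flight check. This is a rough estimate only, built on the six-characters-per-token average; the function names `estimate_tokens` and `fits_context` are illustrative and not part of the SDK. For exact counts, use the SDK tokenizer instead.

```python
import math

CHARS_PER_TOKEN = 6        # average token size stated above; actual counts vary by text
MAX_CONTEXT_WINDOW = 8191  # largest context window stated above


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Rough token estimate: character count divided by the average token size, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt: str, max_completion_tokens: int,
                 context_window: int = MAX_CONTEXT_WINDOW) -> bool:
    """Check the rule prompt + completion <= context window, using the estimate."""
    return estimate_tokens(prompt) + max_completion_tokens <= context_window


prompt = "Summarize the following report in three sentences. " * 100
print(estimate_tokens(prompt))
print(fits_context(prompt, max_completion_tokens=200, context_window=2048))
```

Because the estimate is an average, leave some headroom when a prompt lands close to the limit.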
Jurassic-2 Ultra (formerly Jumbo Instruct)
Recommended instances based on maximum context window:
Instance / Context window | 2048 | 4096 | 8191 |
---|---|---|---|
ml.g5.48xlarge | ✅ | ❌ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ | ✅ |
ml.p4de.24xlarge | ✅ | ✅ | ✅ |
Jurassic-2 Mid (formerly Grande Instruct)
Recommended instances based on maximum context window:
Instance / Context window | 2048 | 4096 | 8191 |
---|---|---|---|
ml.g4dn.12xlarge | ✅ | ❌ | ❌ |
ml.g5.12xlarge | ✅ | ✅ | ✅ |
ml.g5.48xlarge | ✅ | ✅ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ | ✅ |
Jurassic-2 Light
Recommended instances based on maximum context window:
Instance / Context window | 4096 | 8191 |
---|---|---|
ml.g5.12xlarge | ✅ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ |
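The three tables above can be turned into a small lookup, shown here as a sketch. The dictionary records, for each instance, the largest context window marked ✅ in the tables; the names `MAX_CONTEXT` and `instances_for` are illustrative. The helper does not rank by price, since prices are not listed on this page.

```python
# Largest context window marked ✅ for each instance, copied from the tables above.
MAX_CONTEXT = {
    "Jurassic-2 Ultra": {
        "ml.g5.48xlarge": 2048,
        "ml.p4d.24xlarge": 8191,
        "ml.p4de.24xlarge": 8191,
    },
    "Jurassic-2 Mid": {
        "ml.g4dn.12xlarge": 2048,
        "ml.g5.12xlarge": 8191,
        "ml.g5.48xlarge": 4096,
        "ml.p4d.24xlarge": 8191,
    },
    "Jurassic-2 Light": {
        "ml.g5.12xlarge": 4096,
        "ml.p4d.24xlarge": 8191,
    },
}


def instances_for(model: str, context_window: int) -> list[str]:
    """Return the instances (in table order) that support the requested context window."""
    return [
        instance
        for instance, max_context in MAX_CONTEXT[model].items()
        if max_context >= context_window
    ]


print(instances_for("Jurassic-2 Ultra", 4096))
print(instances_for("Jurassic-2 Light", 8191))
```

From the returned list, pick the instance that is available in your region and cheapest for your workload.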
Deprecated models
Jurassic-2 Jumbo (Deprecated - use Ultra instead)
Recommended instances based on maximum context window:
Instance / Context window | 2048 | 4096 | 8191 |
---|---|---|---|
ml.g5.48xlarge | ✅ | ❌ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ | ❌ |
ml.p4de.24xlarge | ✅ | ✅ | ✅ |
Jurassic-2 Grande (Deprecated - use Mid instead)
Recommended instances based on maximum context window:
Instance / Context window | 2048 | 4096 | 8191 |
---|---|---|---|
ml.g5.12xlarge | ✅ | ❌ | ❌ |
ml.g5.48xlarge | ✅ | ✅ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ | ✅ |
Jurassic-2 Large (Deprecated - use Light instead)
Recommended instances based on maximum context window:
Instance / Context window | 4096 | 8191 |
---|---|---|
ml.g5.12xlarge | ✅ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ |
Task-specific models
Our task-specific models are plug-and-play, easy-to-use APIs that work with specified inputs and outputs. Their input limits are therefore expressed in characters rather than tokens.
AI21 Contextual-answers
Recommended instances based on maximum characters in the context:
Instance / Input characters | 10K | 20K | 40K |
---|---|---|---|
ml.g5.12xlarge | ✅ | ❌ | ❌ |
ml.g5.48xlarge | ✅ | ✅ | ❌ |
ml.p4d.24xlarge | ✅ | ✅ | ✅ |
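As a sketch, the Contextual-answers table can be applied to a concrete context string like this. It assumes that "10K" means 10,000 characters and that Python's `len` matches how the service counts characters; both are assumptions, and the names used are illustrative.

```python
# Largest input size marked ✅ per instance, from the table above.
# Assumption: "10K" means 10,000 characters.
CONTEXTUAL_ANSWERS_MAX_CHARS = {
    "ml.g5.12xlarge": 10_000,
    "ml.g5.48xlarge": 20_000,
    "ml.p4d.24xlarge": 40_000,
}


def instances_for_context(context: str) -> list[str]:
    """Return the instances (in table order) whose limit covers this context."""
    length = len(context)
    return [
        instance
        for instance, max_chars in CONTEXTUAL_ANSWERS_MAX_CHARS.items()
        if length <= max_chars
    ]


context = "x" * 15_000  # stand-in for a real document
print(instances_for_context(context))
```

An empty list means the context exceeds the largest size covered by the table.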
AI21 Summarize
Recommended instances based on the number of context characters:
Instance / Input characters | 0-10K | 10-50K |
---|---|---|
ml.g4dn.4xlarge | ❌ | ✅ |
ml.g4dn.12xlarge | ✅ | ✅ |
AI21 Paraphrase
Currently, the input text is limited to 500 characters. The recommended instance is ml.g4dn.2xlarge.
AI21 Grammatical Error Correction (GEC)
Currently, the input text is limited to 500 characters. The recommended instances are ml.g4dn.2xlarge (cheaper) or ml.g5.2xlarge (faster).
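Since Paraphrase and GEC share the same 500-character limit, a simple client-side check can catch oversized inputs before a request is sent. This is a minimal sketch; `within_limit` is an illustrative name, and it assumes Python's `len` counts characters the same way the service does.

```python
TASK_INPUT_LIMIT_CHARS = 500  # limit stated above for Paraphrase and GEC


def within_limit(text: str, limit: int = TASK_INPUT_LIMIT_CHARS) -> bool:
    """Return True if the text is short enough to send to the model."""
    return len(text) <= limit


sentence = "Their going to the store tomorrow."
print(within_limit(sentence))
```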