Choosing the Right Instance Type

How to choose the right instance type in Amazon SageMaker

Each of our models can run on several instance types. Once you have chosen a model, choosing the right instance is mainly a matter of economics: depending on your use case, you will usually want the most cost-effective instance that still meets your input-size and throughput needs.

Note: Not all instances are available in all regions. Also, ml.p4de is in preview, so you need to request access to it from Amazon SageMaker.

Below are the recommended instances for each model, based on your input size and desired throughput. They are divided into foundation models (our Jurassic-2 series of large language models) and task-specific models.

Foundation models

Our large language models support a context window of up to 8191 tokens. In a large language model, the context window is the maximum number of tokens the model considers when generating a response. It caps the combined length of the prompt and the completion: prompt + completion <= context window. As a rough estimate for your use case, the average token in our tokenizer is about six characters long. To get an exact count for any text, use our /tokenize API.
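The context-window rule and the six-characters-per-token estimate can be combined into a quick sizing check that tells you which column of the tables below applies to your workload. The following is a minimal sketch, assuming a character-based estimate is accurate enough for picking a column; the function names are hypothetical, and the /tokenize API remains the way to get exact counts.

```python
import math

# Maximum context-window columns used in the Jurassic-2 tables (tokens).
CONTEXT_WINDOW_TIERS = (2048, 4096, 8191)

# Rough average token length in characters, per the estimate above.
AVG_CHARS_PER_TOKEN = 6


def estimate_tokens(text: str) -> int:
    """Approximate the token count: characters divided by 6, rounded up."""
    return math.ceil(len(text) / AVG_CHARS_PER_TOKEN)


def required_context_window(prompt: str, max_completion_tokens: int,
                            tiers: tuple = CONTEXT_WINDOW_TIERS) -> int:
    """Return the smallest context-window column that fits prompt + completion.

    Enforces prompt + completion <= context window and raises ValueError
    if even the largest window is too small.
    """
    needed = estimate_tokens(prompt) + max_completion_tokens
    for window in sorted(tiers):
        if needed <= window:
            return window
    raise ValueError(
        f"about {needed} tokens needed, but the largest context window is {max(tiers)}"
    )


if __name__ == "__main__":
    prompt = "x" * 12_000  # roughly 2,000 tokens
    # 2,000 prompt tokens + 200 completion tokens -> 4096 column
    print(required_context_window(prompt, max_completion_tokens=200))
    # Jurassic-2 Light's table only has 4096 and 8191 columns -> 4096
    print(required_context_window("x" * 600, 100, tiers=(4096, 8191)))
```

Passing a custom `tiers` tuple covers tables that omit the 2048 column, such as Jurassic-2 Light.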

Jurassic-2 Ultra (formerly Jumbo Instruct)

Recommended instances based on maximum context window:

Instance / Context window (tokens): 2048 | 4096 | 8191
ml.g5.48xlarge
ml.p4d.24xlarge
ml.p4de.24xlarge

Jurassic-2 Mid (formerly Grande Instruct)

Recommended instances based on maximum context window:

Instance / Context window (tokens): 2048 | 4096 | 8191
ml.g4dn.12xlarge
ml.g5.12xlarge
ml.g5.48xlarge
ml.p4d.24xlarge

Jurassic-2 Light

Recommended instances based on maximum context window:

Instance / Context window (tokens): 4096 | 8191
ml.g5.12xlarge
ml.p4d.24xlarge

Deprecated models

Jurassic-2 Jumbo (Deprecated - use Ultra instead)

Recommended instances based on maximum context window:

Instance / Context window (tokens): 2048 | 4096 | 8191
ml.g5.48xlarge
ml.p4d.24xlarge
ml.p4de.24xlarge

Jurassic-2 Grande (Deprecated - use Mid instead)

Recommended instances based on maximum context window:

Instance / Context window (tokens): 2048 | 4096 | 8191
ml.g5.12xlarge
ml.g5.48xlarge
ml.p4d.24xlarge

Jurassic-2 Large (Deprecated - use Light instead)

Recommended instances based on maximum context window:

Instance / Context window (tokens): 4096 | 8191
ml.g5.12xlarge
ml.p4d.24xlarge

Task-specific models

As plug-and-play, easy-to-use APIs, our task-specific models work with predefined inputs and outputs, so their input limits are expressed in characters rather than tokens.

AI21 Contextual-answers

Recommended instances based on maximum characters in the context:

Instance / Input characters: 10K | 20K | 40K
ml.g5.12xlarge
ml.g5.48xlarge
ml.p4d.24xlarge

AI21 Summarize

Recommended instances based on the number of context characters:

Instance / Input characters: 0-10K | 10-50K
ml.g4dn.4xlarge
ml.g4dn.12xlarge
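Both task-specific tables are keyed by input length in characters, so picking the relevant column only needs len(). The following is a minimal sketch, assuming "10K" means 10,000 characters and that each column's upper bound is inclusive (the tables do not say which column an input of exactly 10,000 characters belongs to); the constant and function names are hypothetical.

```python
# Column upper bounds (characters) and labels from the two tables above.
# Assumption: "10K" = 10,000 characters and upper bounds are inclusive.
CONTEXTUAL_ANSWERS_COLUMNS = [(10_000, "10K"), (20_000, "20K"), (40_000, "40K")]
SUMMARIZE_COLUMNS = [(10_000, "0-10K"), (50_000, "10-50K")]


def pick_column(text: str, columns: list) -> str:
    """Return the label of the first column whose upper bound covers len(text)."""
    n = len(text)
    for upper_bound, label in columns:
        if n <= upper_bound:
            return label
    raise ValueError(
        f"{n} characters exceeds the largest column ({columns[-1][1]})"
    )


if __name__ == "__main__":
    context = "a" * 15_000
    print(pick_column(context, CONTEXTUAL_ANSWERS_COLUMNS))  # 20K
    print(pick_column(context, SUMMARIZE_COLUMNS))           # 10-50K
```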

AI21 Paraphrase

Currently, the input text is limited to 500 characters. The recommended instance is ml.g4dn.2xlarge.

AI21 Grammatical Error Correction (GEC)

Currently, the input text is limited to 500 characters. The recommended instances are ml.g4dn.2xlarge (cheaper) or ml.g5.2xlarge (faster).
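For Paraphrase and GEC, the only sizing inputs are the 500-character limit and, for GEC, a cost-versus-speed preference. The following is a minimal sketch of a client-side pre-check, assuming you want to reject oversized inputs before calling the endpoint; the names are hypothetical, and because the limit is described as current, keep it in one constant so it is easy to update.

```python
# Current input limit for Paraphrase and GEC, per the text above.
MAX_INPUT_CHARS = 500

# Recommended instances, per the text above.
PARAPHRASE_INSTANCE = "ml.g4dn.2xlarge"
GEC_INSTANCES = {"cheaper": "ml.g4dn.2xlarge", "faster": "ml.g5.2xlarge"}


def check_input_length(text: str) -> str:
    """Return text unchanged, or raise ValueError if it exceeds the character limit."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}"
        )
    return text


def gec_instance(prefer: str = "cheaper") -> str:
    """Pick the GEC instance type by preference: 'cheaper' or 'faster'."""
    if prefer not in GEC_INSTANCES:
        raise ValueError(f"prefer must be one of {sorted(GEC_INSTANCES)}")
    return GEC_INSTANCES[prefer]
```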