What is Tokenization?
Tokenization is both the first and final step in language model processing. Since machine learning models can only work with numerical data, text must be converted into numbers that models can understand and manipulate. The tokenization process breaks text down into smaller units called tokens, which can represent:
- Words or subwords: “hello” → [15496]
- Characters: “AI” → [32, 73]
- Byte-level representations: for handling any Unicode text across all languages
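As a quick illustration of the byte-level case, plain Python exposes the UTF-8 byte values that byte-level tokenizers build their vocabularies on:

```python
# UTF-8 byte values underlying a string; byte-level tokenizers map these to token ids.
print(list("AI".encode("utf-8")))  # [65, 73]
print(list("é".encode("utf-8")))   # [195, 169] – non-ASCII text becomes multiple bytes
```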
When is Tokenization Used?
Tokenization serves as both the entry point and the exit point of text processing in language models. In a standard language model workflow (sketched in toy form below):
- Encoding phase: The tokenizer first converts input text into tokens. Each token receives a unique index number from the tokenizer’s vocabulary that the model can process.
- Model processing: The tokenized input flows through the model architecture:
  - Embedding layer: transforms tokens into dense vector representations that capture semantic relationships
  - Transformer blocks: process these vectors to understand context and relationships and to generate meaningful responses
- Decoding phase: Finally, the model’s output tokens are converted back into readable text by mapping token indices to their corresponding words or subwords using the tokenizer’s vocabulary.
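A toy sketch of this flow, using a made-up three-word vocabulary and made-up 4-dimensional embeddings (real models use vocabularies of tens of thousands of tokens and much wider vectors):

```python
# Toy vocabulary and embedding table; all values are illustrative.
vocab = {"hello": 0, "world": 1, "<unk>": 2}
inv_vocab = {i: tok for tok, i in vocab.items()}
embeddings = [
    [0.1, 0.3, -0.2, 0.7],  # vector for "hello"
    [0.5, -0.1, 0.0, 0.2],  # vector for "world"
    [0.0, 0.0, 0.0, 0.0],   # vector for "<unk>"
]

def encode(text: str) -> list[int]:
    """Encoding phase: map each word to its vocabulary index."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

def decode(ids: list[int]) -> str:
    """Decoding phase: map indices back to words."""
    return " ".join(inv_vocab[i] for i in ids)

ids = encode("hello world")             # [0, 1]
vectors = [embeddings[i] for i in ids]  # what an embedding layer does: id -> vector
print(decode(ids))                      # "hello world"
```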
AI21’s Tokenizer
We provide the AI21 Tokenizer, specifically engineered for Jamba models.
Key Features
- Jamba Mini and Jamba Large support
- Async/sync operations: both synchronous and asynchronous tokenization APIs
- Production-ready: Enterprise-grade reliability
Installation
Prerequisites
To use tokenizers for Jamba, you’ll need access to the relevant model’s HuggingFace repository.
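If the repository is gated, authenticate with Hugging Face first. One common approach uses the huggingface_hub client (the token below is a placeholder):

```python
from huggingface_hub import login

# Authenticate so the tokenizer's vocabulary files can be downloaded from the Hub.
login(token="hf_...")  # placeholder; supply your own access token
```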
Install the Tokenizer
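Assuming the package is published on PyPI under the name ai21-tokenizer:

```bash
pip install ai21-tokenizer
```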
Model-Specific Tokenizers
Choose the appropriate tokenizer for your Jamba model:
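A sketch of selecting a model-specific tokenizer. The enum constant names are assumptions based on recent ai21-tokenizer releases; list the members of PreTrainedTokenizers in your installed version to confirm the exact identifiers:

```python
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

# Enum constant names below are assumptions; verify against your installed version.
mini_tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_MINI_TOKENIZER)
large_tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_LARGE_TOKENIZER)
```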
Basic Usage
1. Encode Text to Tokens
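A minimal encoding sketch, reusing the Jamba Mini tokenizer constant assumed above; the exact ids you see depend on the tokenizer version:

```python
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_MINI_TOKENIZER)

text = "Hello, Jamba!"
token_ids = tokenizer.encode(text)
print(token_ids)  # a list of vocabulary indices; exact values vary by version
```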
2. Decode Tokens to Text
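And the corresponding decoding sketch, under the same assumptions, showing the round trip back to the original text:

```python
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_MINI_TOKENIZER)

token_ids = tokenizer.encode("Hello, Jamba!")
decoded_text = tokenizer.decode(token_ids)
print(decoded_text)  # round-trips back to "Hello, Jamba!"
```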
Asynchronous Usage
For high-performance or server applications, use the async tokenizer:
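A sketch assuming the library exposes an awaitable factory (Tokenizer.get_async_tokenizer) with awaitable encode/decode methods, as in recent releases; verify the names against your installed version:

```python
import asyncio

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

async def main() -> None:
    # Factory and method names assume recent ai21-tokenizer releases.
    tokenizer = await Tokenizer.get_async_tokenizer(
        PreTrainedTokenizers.JAMBA_MINI_TOKENIZER
    )
    token_ids = await tokenizer.encode("Hello, Jamba!")
    decoded = await tokenizer.decode(token_ids)
    print(token_ids, decoded)

asyncio.run(main())
```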
Practical Use Cases
- Cost estimation: Calculate API usage costs based on token consumption
- Prompt optimization: Ensure prompts fit within model context limits
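A rough sketch covering both use cases; the per-token price and the context limit below are placeholders to replace with your actual pricing and your model's documented limit:

```python
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_MINI_TOKENIZER)

prompt = "Summarize the following report: ..."
num_tokens = len(tokenizer.encode(prompt))

# Cost estimation: PRICE_PER_1K_TOKENS is a placeholder, not actual AI21 pricing.
PRICE_PER_1K_TOKENS = 0.0002
print(f"{num_tokens} input tokens ≈ ${num_tokens / 1000 * PRICE_PER_1K_TOKENS:.6f}")

# Prompt optimization: the limit is an assumed value; check your model's spec.
CONTEXT_LIMIT = 256_000
if num_tokens > CONTEXT_LIMIT:
    print("Prompt exceeds the context window; trim or chunk it before sending.")
```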