Jamba-1.5 models
Jamba 1.5-Mini & Jamba 1.5-Large Model Details
Overview
Jamba is the world’s first production-grade Mamba-based model. By enhancing Mamba's structured state space model (SSM) technology with elements of the traditional Transformer architecture, Jamba compensates for the inherent limitations of a pure SSM model. It offers a 256K effective context window and delivers remarkable gains in throughput and efficiency. Notably, Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.
Supported Languages
Jamba models officially support 9 languages:
- English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew.
What Are They Good For?
Both Jamba 1.5 Mini and Jamba 1.5 Large were trained on a massive corpus of text, making them highly versatile general-purpose text generators. They can compose human-like text and solve complex tasks such as question answering, text summarization, information extraction, drafting, text classification and many others.
Jamba’s release marks two significant milestones in LLM innovation: successfully incorporating Mamba alongside the Transformer architecture and advancing the hybrid SSM-Transformer model to production-grade scale and quality.
Until now, LLMs have been primarily built on the conventional Transformer architecture. While undoubtedly powerful, this architecture presents two major drawbacks:
- Large memory footprint: Transformer's memory footprint scales with context length. This makes it challenging to run long context windows or numerous parallel batches without extensive hardware resources, limiting widespread opportunities to experiment and deploy.
- Slow inference as context grows: Transformer’s attention mechanism scales quadratically with sequence length and slows down throughput, as each token depends on the entire sequence that came before it—placing long context use cases outside the scope of efficient production.
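To make the first drawback concrete, the following is a rough back-of-the-envelope sketch of how a Transformer's key-value (KV) cache grows with context length. The layer count, KV head count, and head dimension are hypothetical values chosen for illustration only; they do not describe Jamba or any other specific model.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV-cache size for a decoder-only Transformer.

    Every attention layer keeps one key vector and one value vector
    per token, hence the leading factor of 2. bytes_per_value=2
    corresponds to 16-bit storage.
    """
    return (2 * batch_size * n_layers * n_kv_heads
            * head_dim * seq_len * bytes_per_value)


# Hypothetical model shape (illustrative, not Jamba's configuration).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for context in (4_096, 32_768, 262_144):
    gib = kv_cache_bytes(context, N_LAYERS, N_KV_HEADS, HEAD_DIM) / 2**30
    print(f"{context:>7} tokens -> {gib:.1f} GiB of KV cache per sequence")
```

Under these assumed dimensions the cache grows linearly, from 0.5 GiB at 4K tokens to 32 GiB at 256K tokens for a single sequence, and it multiplies again with batch size.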
Model Details
Engineers and data scientists at AI21 Labs created the models to help developers and businesses leverage AI to build real-world products with tangible value. Jamba 1.5 Mini and Jamba 1.5 Large support zero-shot instruction following and multiple languages. The Jamba models also provide developers with industry-leading APIs that perform a wide range of productivity tasks designed for commercial use.
- Organization developing model: AI21 Labs
- Model date: August 22nd, 2024
- Model size: Jamba 1.5 Mini: 52B parameters (12B active); Jamba 1.5 Large: 398B parameters (94B active)
- Model type: Joint Attention and Mamba (Jamba)
- Knowledge cutoff date: March 5th, 2024
- Input Modality: Text
- Output Modality: Text
- License: Jamba open model license
- Contact: [email protected]
Intended Use
The Jamba family of models was designed and built primarily for developers and researchers who access its capabilities via the AI21 Studio API, through our cloud partners, or by running the models in a local or private environment.
Given the limitations of Jamba (see below) and its wide range of general capabilities, AI21 has a set of responsible use guidelines and terms of use for our customers. We work with our customers to review their applications and have systems in place to redress issues and revoke access, if necessary.
While Jamba 1.5 Mini and Jamba 1.5 Large are general-purpose language models, there are a number of tasks at which they excel. The following list of popular uses among our customers is by no means exhaustive.
Popular Use Cases
- Language modeling and completion: Text generation based on prompting/examples (zero-shot, multi-shot)
- Instruction following: Text generation and summarization based on natural language instructions
- Sentiment analysis: Categorization of text/document based on understanding its meaning
- Paraphrasing: Rewriting up to a full paragraph of text
- Summarization: Providing a summary of long-form articles and conversation transcripts
- Text recommendation: Offering improvements to a given text, for example increasing and diversifying vocabulary
- Grammatical error correction: Checking the grammar in written work
- Text segmentation: Splitting long pieces of text into appropriate segments based on topics
- Question answering and chat: Single and multi-turn conversations grounded in reference data
Performance and Benchmarks
Benchmark | Jamba 1.5 Mini | Jamba 1.5 Large |
---|---|---|
Arena Hard | 46.1 | 65.4 |
MMLU | 69.7 | 81.2 |
MMLU Pro | 42.5 | 53.5 |
GPQA | 32.3 | 36.9 |
ARC Challenge | 85.7 | 93 |
BFCL | 80.6 | 85.48 |
GSM-8K | 75.5 | 87 |
RealToxicity | 8.1 | 6.7 |
RULER Benchmark - Effective Context Length
Models | Claimed Length | Effective Length | 4K | 8K | 16K | 32K | 64K | 128K | 256K |
---|---|---|---|---|---|---|---|---|---|
Jamba 1.5 Large (94B/398B) | 256K | 256K | 96.7 | 96.6 | 96.4 | 96 | 95.4 | 95.1 | 93.9 |
Jamba 1.5 Mini (12B/52B) | 256K | 256K | 95.7 | 95.2 | 94.7 | 93.8 | 92.7 | 89.8 | 86.1 |
Gemini-1.5-pro | 1M | > 128K | 96.7 | 95.8 | 96 | 95.9 | 95.9 | 94.4 | -- |
GPT-4-1106-preview | 128K | 64K | 96.6 | 96.3 | 95.2 | 93.2 | 87 | 81.2 | -- |
Llama3.1 (70B) | 128K | 64K | 96.5 | 95.8 | 95.4 | 94.8 | 88.4 | 66.6 | -- |
Command-R-plus (104B) | 128K | 32K | 95.6 | 95.2 | 94.2 | 92 | 84.3 | 63.1 | -- |
Llama3.1 (8B) | 128K | 32K | 95.5 | 93.8 | 91.6 | 87.4 | 84.7 | 77 | -- |
Mistral-Large 2 (123B) | 128K | 32K | 96.2 | 96.1 | 95.1 | 93 | 78.8 | 23.7 | -- |
Mixtral-8x22B (39B/141B) | 64K | 32K | 95.6 | 94.9 | 93.4 | 90.9 | 84.7 | 31.7 | -- |
Mixtral-8x7B (12.9B/46.7B) | 32K | 32K | 94.9 | 92.1 | 92.5 | 85.9 | 72.4 | 44.5 | -- |
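The "Effective Length" column can be reproduced from the per-length scores. The sketch below assumes a pass threshold of 85.6, the value the RULER benchmark is generally reported to use; the table above does not state this threshold, so treat it as an assumption. The function name and data layout are illustrative, and only a few rows of the table are copied in.

```python
# Assumed RULER pass threshold (not stated in the table above).
THRESHOLD = 85.6

LENGTHS = ["4K", "8K", "16K", "32K", "64K", "128K", "256K"]

# Scores copied from the table; None marks a length that was not reported.
SCORES = {
    "Jamba 1.5 Large": [96.7, 96.6, 96.4, 96.0, 95.4, 95.1, 93.9],
    "Jamba 1.5 Mini": [95.7, 95.2, 94.7, 93.8, 92.7, 89.8, 86.1],
    "GPT-4-1106-preview": [96.6, 96.3, 95.2, 93.2, 87.0, 81.2, None],
    "Mistral-Large 2 (123B)": [96.2, 96.1, 95.1, 93.0, 78.8, 23.7, None],
}


def effective_length(scores, threshold=THRESHOLD):
    """Return the longest tested length reached before any score
    drops below the threshold (or is missing), or None if even the
    shortest length fails."""
    best = None
    for length, score in zip(LENGTHS, scores):
        if score is None or score < threshold:
            break
        best = length
    return best


for model, scores in SCORES.items():
    print(f"{model}: {effective_length(scores)}")
```

With this assumed threshold, the sketch yields 256K for both Jamba models, 64K for GPT-4-1106-preview, and 32K for Mistral-Large 2, matching the corresponding rows of the table.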
Distribution
Jamba 1.5 models are available as an API directly from the AI21 SaaS platform, as well as through the following partners and platforms: Microsoft Azure, Google Cloud Vertex AI, Amazon Bedrock, Snowflake Cortex, NVIDIA NIM, LangChain and LlamaIndex. They are also coming soon to Databricks Marketplace and Together.AI.
Safety Benchmarks
Benchmark | Jamba 1.5 Mini | Jamba 1.5 Large |
---|---|---|
RealToxicity* | 8.1 | 6.7 |
TruthfulQA (0-shot) | 54.1 | 58.3 |
*Lower score is better
Model Compliance and Certifications
- SOC2 compliance
- ISO 27001, ISO 27017, and ISO 27018 certifications
Ethical Considerations
AI21 Labs is on a mission to supercharge human productivity with machines working alongside humans as thought partners, thereby promoting human welfare and prosperity. To deliver its promise, this technology must be deployed and used in a responsible and sustainable way, taking into consideration potential risks, including malicious use by bad actors, accidental misuse and broader societal harms. We take these risks extremely seriously and put measures in place to mitigate them.
AI21 provides open access to Jamba that can be used to power a large variety of useful applications. We believe it is important to ensure that this technology is used in a responsible way, while allowing developers the freedom they need to experiment rapidly and deploy solutions at scale. Overall, we view the safe implementation of this technology as a partnership and collaboration between AI21 and our customers and encourage engagement and dialogue to raise the bar on responsible usage.
In order to use Jamba, you are required to comply with our Terms of Use and with the following usage guidelines. Provided you comply with these requirements, you may use Jamba to power applications with live users without any additional approval. We reserve the right to limit or suspend your access to Jamba at any time where we believe these terms or guidelines are violated.
Please check these usage guidelines periodically, as they may be updated from time to time. For any questions, clarifications or concerns, please contact [email protected].
Limitations
There are a number of limitations inherent to neural network technology that apply to Jamba. These limitations require explanation and carry important caveats for the application and usage of Jamba.
- Accuracy: Jamba, like other large pretrained language models, lacks important context about the world because it is trained on textual data and is not grounded in other modalities of experience such as video, real-world physical interaction, and human feedback. Like all language models, Jamba is far more accurate when responding to inputs similar to its training datasets. Novel inputs have a tendency to generate higher variance in its output.
- Coherence and consistency: Responses from Jamba are sometimes inconsistent, contradictory, or contain seemingly random sentences and paragraphs.
- Western/English bias: Jamba is trained primarily on English language text from the internet, and is best suited to classifying, searching, summarizing, and generating English text. Furthermore, Jamba has a tendency to hold and amplify the biases contained in its training dataset. As a result, groups of people who were not involved in the creation of the training data can be underrepresented, and stereotypes and prejudices can be perpetuated. Racial, religious, gender, socioeconomic, and other categorizations of human groups can be considered among these factors.
- Explainability: It is difficult to explain or predict how Jamba will respond without additional training and fine tuning. This is a common issue with neural networks of this scope and scale.
- Recency: Jamba was trained on a dataset created in March 2024, and therefore has no knowledge of events that have occurred after that date. We update our models regularly to keep them as current as possible, but there are notable gaps and inaccuracies in responses as a result of this lack of recency.
Given these known challenges presented by neural network technology, we've developed usage guidelines to minimize harm and address the limitations of Jamba.