Quantization reduces model memory usage by representing weights with lower precision. Learn how to use quantization techniques with Jamba models for efficient inference and training.
Load Pre-quantized FP8 Model
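A minimal sketch of loading an FP8 Jamba checkpoint with vLLM. The repository name and settings below are illustrative, not the official ones; FP8 execution also assumes a GPU that supports it (compute capability 8.9 or newer).

```python
# Sketch: serving a Jamba checkpoint in FP8 with vLLM.
from vllm import LLM

llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",  # illustrative; point this at the FP8 checkpoint you use
    quantization="fp8",       # can be omitted if the checkpoint already ships FP8 weights
    max_model_len=4096,       # keep the context small for a quick smoke test
)
```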
Generate Text
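Generation works the same as with an unquantized model. Continuing from the model loaded above, with placeholder sampling settings:

```python
from vllm import SamplingParams

# Placeholder sampling settings; tune for your workload.
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

prompts = ["Explain model quantization in one paragraph."]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```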
Load Model with ExpertsInt8
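A sketch of loading Jamba with vLLM's ExpertsInt8 quantization, which quantizes the MoE expert weights to int8 at load time. The checkpoint name, context length, and parallelism settings are illustrative.

```python
from vllm import LLM

# ExpertsInt8 quantizes the MoE expert weights to int8 while loading,
# shrinking the memory footprint of the expert layers.
llm = LLM(
    model="ai21labs/AI21-Jamba-1.5-Mini",  # illustrative checkpoint name
    quantization="experts_int8",
    max_model_len=8192,          # adjust to the context length you actually need
    tensor_parallel_size=1,      # raise this on multi-GPU machines
)
```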
Generate Text
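Text generation is unchanged. This sketch assumes an instruction-tuned checkpoint and uses its chat template to build the prompt; the model name and prompt are placeholders.

```python
from transformers import AutoTokenizer
from vllm import SamplingParams

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-1.5-Mini")  # illustrative

# Build a chat-formatted prompt, then generate as usual.
messages = [{"role": "user", "content": "Summarize the benefits of int8 quantization."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.4, max_tokens=200))
print(outputs[0].outputs[0].text)
```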
Configure 8-bit Quantization
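With Hugging Face transformers and bitsandbytes, the 8-bit configuration can look like the following. The key detail is skipping the Mamba blocks so they stay in full precision.

```python
from transformers import BitsAndBytesConfig

# 8-bit quantization with bitsandbytes. The Mamba blocks are excluded
# from quantization so they remain in full precision.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["mamba"],
)
```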
Load Model with Quantization
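The configuration is then passed to from_pretrained. The checkpoint name is illustrative, and flash_attention_2 is an optional extra that requires the flash-attn package.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba-1.5-Mini",            # illustrative checkpoint name
    torch_dtype=torch.bfloat16,                 # non-quantized modules stay in bf16
    attn_implementation="flash_attention_2",    # optional; requires the flash-attn package
    quantization_config=quantization_config,    # the 8-bit config from the previous step
    device_map="auto",
)
```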
Run Inference
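A short inference sketch, assuming an instruction-tuned checkpoint with a chat template; the prompt and generation settings are placeholders.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-1.5-Mini")  # illustrative

messages = [{"role": "user", "content": "What does 8-bit quantization change about a model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```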
Note that llm_int8_skip_modules=["mamba"] is what excludes the Mamba blocks from 8-bit quantization; keeping them in full precision helps avoid degrading model quality.