Convert safetensors weights to quantized formats (FP8, INT8, NVFP4, MXFP8) with learned rounding optimization for ComfyUI inference.
pip install convert-to-quant
Or install from source:
git clone https://github.com/silveroxides/convert_to_quant.git
cd convert_to_quant
pip install -e .

| Feature | Requirement |
|---|---|
| Minimum (FP8/INT8) | Python 3.10+, PyTorch 2.8+, CUDA 12.8+ |
| Full (NVFP4/MXFP8) | Python 3.12+, PyTorch 2.10+, CUDA 13.0+, comfy-kitchen |
| INT8 Kernels | Triton (Linux native, Windows via triton-windows) |
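The version requirements in the table can be turned into a quick self-check. The sketch below is not part of convert_to_quant; the tier names and the helper functions `parse_version` and `supported_tier` are hypothetical, and only the version numbers come from the table. It does not check for comfy-kitchen or Triton.

```python
import re

# Minimum versions copied from the requirements table; tier names are illustrative.
TIERS = {
    "full": {"python": (3, 12), "torch": (2, 10), "cuda": (13, 0)},
    "minimum": {"python": (3, 10), "torch": (2, 8), "cuda": (12, 8)},
}


def parse_version(text):
    """Return (major, minor) from strings such as '2.10.0+cu130' or '12.8'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def supported_tier(python_version, torch_version, cuda_version):
    """Return 'full', 'minimum', or None for the given version strings."""
    found = {
        "python": parse_version(python_version),
        "torch": parse_version(torch_version),
        "cuda": parse_version(cuda_version),
    }
    # The stricter tier is listed first, so it is tested first.
    for tier, required in TIERS.items():
        if all(found[name] >= minimum for name, minimum in required.items()):
            return tier
    return None
```

In a live environment the three arguments could come from `platform.python_version()`, `torch.__version__`, and `torch.version.cuda`. Note that `torch.version.cuda` is `None` on CPU-only builds, so that case needs handling before calling the helper.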
Important: PyTorch must be installed manually with the correct CUDA version for your GPU. This package does not install PyTorch automatically, to prevent environment conflicts.
Visit pytorch.org to get the correct install command.
Examples:
# CUDA 13.0 (Required for Blackwell NVFP4/MXFP8)
pip install torch --index-url https://download.pytorch.org/whl/cu130
# CUDA 12.8 (Stable)
pip install torch --index-url https://download.pytorch.org/whl/cu128
# CPU only
pip install torch --index-url https://download.pytorch.org/whl/cpu
Install Triton for the INT8 kernels:
# Linux
pip install -U triton
# Windows for torch 2.10 and 2.11
pip install -U "triton-windows<3.7"
# Windows for torch 2.12
pip install -U "triton-windows<3.8"
# All examples include metadata and comfy_quant layers for ComfyUI-compatible quantization.
# The examples pass --low-memory to reduce peak RAM/VRAM usage.
# Basic FP8 Tensorcore quantization without learned rounding
ctq -i model.safetensors -o model-fp8mixed.safetensors --comfy_quant --save-quant-metadata --simple --low-memory
# INT8 Row-Wise quantization without learned rounding
ctq -i model.safetensors -o model-int8mixedrow.safetensors --int8 --scaling_mode row --comfy_quant --save-quant-metadata --simple --low-memory
# Blackwell MXFP8 quantization without learned rounding
ctq -i model.safetensors -o model-mxfp8mixed.safetensors --mxfp8 --comfy_quant --save-quant-metadata --simple --low-memory
# Example modular usage of INT8 Row-Wise quantization of Flux2 Klein 9B
from convert_to_quant import quantize

quantize(
    input="./flux-2-klein-9b.safetensors",
    output="./flux-2-klein-9b-int8mixedrow.safetensors",
    comfy_quant=True,
    save_quant_metadata=True,
    verbose="VERBOSE",
    low_memory=True,
    int8=True,
    scaling_mode="row",
    flux2=True,
    simple=True,
    calib_samples=8192,
)
Load the output .safetensors file in ComfyUI like any other model.
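Before loading a converted file, it can be useful to confirm which dtypes it contains. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON document) using the standard library. The helper names are hypothetical, and which keys convert_to_quant writes into `__metadata__` is not assumed here.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as handle:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        return json.loads(handle.read(header_len).decode("utf-8"))


def summarize_dtypes(header):
    """Count tensors per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, entry in header.items():
        if name == "__metadata__":
            continue
        counts[entry["dtype"]] = counts.get(entry["dtype"], 0) + 1
    return counts
```

Because only the header is read, this works on multi-gigabyte checkpoints without loading any tensor data.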
| Format | CLI Flag | Hardware | Optimization |
|---|---|---|---|
| FP8 (E4M3) | (default) | Ada/Hopper+ | Learned Rounding (SVD) |
| INT8 Block-wise | --int8 | Any GPU | Learned Rounding (SVD) |
| INT8 Tensor-wise | --int8 --scaling_mode tensor | Any GPU | High-perf _scaled_mm |
| NVFP4 (4-bit) | --nvfp4 | Blackwell | Dual-scale optimization |
| MXFP8 | --mxfp8 | Blackwell | Microscaling (E8M0) |
For a deep dive into how these formats work, see FORMATS.md.
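To make the scaling granularities in the table concrete, here is a minimal sketch of row-wise INT8 quantization using the textbook absmax scheme: each row gets its own scale so that its largest magnitude maps to 127. This is an illustration under that assumption, not the package's kernels, and it leaves out learned rounding and bias correction entirely. The function names are hypothetical.

```python
def quantize_rows_int8(weight):
    """Quantize each row of a 2-D list of floats with its own absmax scale."""
    quantized, scales = [], []
    for row in weight:
        absmax = max(abs(value) for value in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        quantized.append([max(-127, min(127, round(value / scale))) for value in row])
        scales.append(scale)
    return quantized, scales


def dequantize_rows(quantized, scales):
    """Recover approximate floats by multiplying each row by its scale."""
    return [[q * scale for q in row] for row, scale in zip(quantized, scales)]
```

With round-to-nearest, the reconstruction error of every element is at most half of its row's scale. Tensor-wise scaling would use a single scale for the whole matrix, so rows with small values would lose more precision than they do here.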
| Model | Flag | Notes |
|---|---|---|
| Flux.2 | --flux2 | Keep modulation/guidance/time/final high-precision |
| T5-XXL | --t5xxl | Decoder removed |
| Hunyuan Video | --hunyuan | Attention norms excluded |
| WAN Video | --wan | Time embeddings excluded |
(See --help-filters for a full list of presets)
- Learned Rounding: SVD-based optimization minimizes quantization error.
- Bias Correction: Automatic bias adjustment using synthetic calibration data.
- Model-Specific Support: Exclusion lists for sensitive layers (norms, embeddings).
- Three-Tier Quantization: Mix different formats per layer using --custom-layers.
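Exclusion lists come down to matching layer names against a pattern. The sketch below shows how a regex partitions layers, using the pattern from this README's --exclude-layers example. The layer names are made up for illustration, and it is an assumption that a match anywhere in the name excludes the layer; the tool's actual matching rule may differ.

```python
import re

# Pattern copied from this README's --exclude-layers example.
EXCLUDE_PATTERN = re.compile(r"(double_blocks.[01]|final_layer|txt_attn.proj)")


def split_layers(layer_names, pattern=EXCLUDE_PATTERN):
    """Partition layer names into (excluded, quantized) lists."""
    excluded, quantized = [], []
    for name in layer_names:
        # Assumption: a match anywhere in the name excludes the layer.
        (excluded if pattern.search(name) else quantized).append(name)
    return excluded, quantized


# Hypothetical layer names, not taken from a real checkpoint.
names = [
    "double_blocks.0.img_attn.qkv.weight",
    "double_blocks.12.img_attn.qkv.weight",
    "double_blocks.3.txt_attn.proj.weight",
    "double_blocks.5.img_mlp.0.weight",
    "final_layer.linear.weight",
]
excluded, quantized = split_layers(names)
```

Under that matching assumption, `double_blocks.[01]` also catches `double_blocks.12`, because the pattern is unanchored and an unescaped `.` matches any character. A stricter form such as `double_blocks\.[01]\.` limits the match to blocks 0 and 1.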
Define excluded layers with regex patterns for models that have no exclusion preset (the pattern here is only an example):
ctq -i model.safetensors --exclude-layers "(double_blocks.[01]|final_layer|txt_attn.proj)" --comfy_quant
# Block-wise scaling for better accuracy
ctq -i model.safetensors --scaling-mode block --block_size 64 --comfy_quant

Special thanks to:
- Clybius – For Learned-Rounding inspiration.
- lyogavin – For ComfyUI int8_blockwise support.
MIT License