This node speeds up Flux2, Chroma, and Z-Image in ComfyUI using INT8 quantization, delivering ~2x faster inference on my 3090. It should work on any NVIDIA GPU with enough INT8 TOPS, but it's unlikely to be faster than proper FP8 on 40-series and above. Works with LoRAs* and torch.compile (compile is needed to get the full speedup).
*LoRAs need to be applied using one of the following methods (a rough sketch of the trade-off follows the list):
- Merge the LoRA into the weights before INT8 conversion
  - Performance: Faster inference
  - Quality: Possibly slightly lower quality
- Use the included INT8 LoRA node
  - Performance: ~1.15x slower due to dynamic calculations
  - Quality: Possibly slightly higher quality
- Use the node from KohakuBlueleaf's PR #11958
  - Performance: ~1.15x slower due to dynamic calculations
  - Quality: Possibly slightly higher quality
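For reference, here is a minimal sketch of what symmetric tensorwise INT8 quantization and the two LoRA paths roughly look like. The function names are made up for illustration, and the real node presumably runs actual INT8 tensor-core matmuls (via Triton/torch.compile) rather than dequantizing like this; the sketch only shows the arithmetic behind the trade-off above.

```python
# Illustrative only -- not this node's API.
import torch

def quantize_tensorwise_int8(w: torch.Tensor):
    """Symmetric per-tensor ("tensorwise") quantization: one scale per weight."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.bfloat16) * scale

# Option 1: bake the LoRA into the weights before quantization.
# Nothing extra happens in the forward pass (faster), but the LoRA delta
# is subject to the same quantization error as the base weight.
def bake_lora(w, lora_down, lora_up, alpha):
    delta = (lora_up @ lora_down) * alpha      # full delta from the low-rank pair
    return quantize_tensorwise_int8(w + delta)

# Option 2: keep the LoRA separate and apply it dynamically every forward pass.
# The extra low-rank matmuls are the "~1.15x slower" dynamic calculations,
# but the LoRA itself stays in high precision.
def linear_with_dynamic_lora(x, q, scale, bias, lora_down, lora_up, alpha):
    y = torch.nn.functional.linear(x, dequantize(q, scale), bias)
    return y + (x @ lora_down.T @ lora_up.T) * alpha
```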
We auto-convert Flux2 Klein to INT8 on load if needed. Pre-quantized checkpoints with slightly higher quality and faster loading are available here:
https://huggingface.co/bertbobson/FLUX.2-klein-9B-INT8-Comfy
https://huggingface.co/bertbobson/Chroma1-HD-INT8Tensorwise
https://huggingface.co/bertbobson/Z-Image-Turbo-INT8-Tensorwise
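A minimal sketch of what "convert to INT8 on load, unless already quantized" could look like, and why the pre-quantized checkpoints load faster (the conversion step is skipped). The key layout (a `_scale` suffix per weight) and the loader below are assumptions for illustration, not this node's actual checkpoint format.

```python
# Illustrative sketch, not the node's real loader or key naming.
import torch
from safetensors.torch import load_file

def load_int8_weights(path: str):
    sd = load_file(path)
    out = {}
    for name, w in sd.items():
        if w.dtype == torch.int8 or w.ndim != 2:
            out[name] = w                      # already INT8, or not a linear weight: keep as-is
        else:
            scale = w.abs().max() / 127.0      # bf16 linear weight: quantize tensorwise on the fly
            out[name] = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
            out[name + "_scale"] = scale       # hypothetical key for the per-tensor scale
    return out
```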
Measured on a 3090 at 1024x1024, 26 steps with Flux2 Klein Base 9B.
| Format | Time (s/it) | Relative Speedup |
|---|---|---|
| bf16 | 2.07 | 1.00× |
| bf16 compile | 2.24 | 0.92× |
| fp8 | 2.06 | 1.00× |
| int8 | 1.64 | 1.26× |
| int8 compile | 1.04 | 1.99× |
| gguf8_0 compile | 2.03 | 1.02× |
Measured on an 8 GB 5060, same settings:
| Format | Time (s/it) | Relative Speedup |
|---|---|---|
| fp8 | 3.04 | 1.00× |
| fp8 fast | 3.00 | 1.00× |
| fp8 compile | couldn't get it to work | n/a |
| int8 | 2.53 | 1.20× |
| int8 compile | 2.25 | 1.35× |
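For context, per-iteration timings like the ones above can be measured along these lines; `model` and its inputs are stand-ins, and the tables were of course produced inside ComfyUI rather than with this exact script. The "compile" rows correspond to wrapping the model with `torch.compile` before timing.

```python
# Rough timing sketch (assumed setup, not the actual benchmark harness).
import time
import torch

def seconds_per_iteration(model, example_inputs, steps=26, warmup=3):
    with torch.no_grad():
        for _ in range(warmup):                 # warm-up also triggers compilation
            model(*example_inputs)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(steps):
            model(*example_inputs)
        torch.cuda.synchronize()                # wait for all queued GPU work
    return (time.perf_counter() - start) / steps

# compiled = torch.compile(model)   # used for the "compile" rows
```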
Requirements:
- A working ComfyKitchen install (needs the latest ComfyUI and possibly PyTorch with cu130)
- Triton
- Windows is untested, but I hear triton-windows exists.
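A quick sanity check for the requirements above (my own suggestion, not an official checker, and it assumes a CUDA build of PyTorch with a visible GPU):

```python
import torch

print("torch:", torch.__version__, "cuda:", torch.version.cuda)   # cu130 builds report "13.0"
print("gpu:", torch.cuda.get_device_name(0),
      "sm:", torch.cuda.get_device_capability(0))                 # INT8 tensor cores need sm_75+
try:
    import triton
    print("triton:", triton.__version__)
except ImportError:
    print("triton missing - install `triton` (or `triton-windows` on Windows)")
```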
If you have a 30-series GPU, OneTrainer is currently also the fastest LoRA trainer thanks to the same INT8 approach. Please go check them out!