
[Bug] Segmentation fault due to insufficient memory #1094

@akleine

Description


Git commit

43a70e8

Operating System & Version

Ubuntu 24.04

GGML backends

CUDA

Command-line arguments used

./build/bin/sd-cli -m ~/SD_models/sd3/sd3.5_large-iq4_nl.gguf --t5xxl ~/SD_models/flux/t5xxl_q4_k.gguf -v -p "A cute cat"

Steps to reproduce

Start the above-mentioned command (compiled with CUDA) on a machine with 8 GB of VRAM.

Note:
t5xxl_q4_k.gguf comes from https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/blob/main/t5xxl_q4_k.gguf
and sd3.5_large-iq4_nl.gguf comes from https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large-iq4_nl.gguf

What you expected to happen

A clean exit with an error message.

(or maybe a complete run, but that is not on topic here)

What actually happened

Crash with the message:
Segmentation fault (core dumped) ./build/bin/sd-cli -m ~/SD_models/sd3/sd3.5_large-iq4_nl.gguf --t5xxl ~/SD_models/flux/t5xxl_q4_k.gguf -v -p "A cute cat"

Logs / error messages / stack trace

[INFO ] stable-diffusion.cpp:228 - loading model from '/home/xxx/SD_models/sd3/sd3.5_large-iq4_nl.gguf'
[INFO ] model.cpp:370 - load /home/xxx/SD_models/sd3/sd3.5_large-iq4_nl.gguf using gguf format
[DEBUG] model.cpp:412 - init from '/home/xxx/SD_models/sd3/sd3.5_large-iq4_nl.gguf'
[INFO ] stable-diffusion.cpp:275 - loading t5xxl from '/home/xxx/SD_models/flux/t5xxl_q4_k.gguf'
[INFO ] model.cpp:370 - load /home/xxx/SD_models/flux/t5xxl_q4_k.gguf using gguf format
[DEBUG] model.cpp:412 - init from '/home/xxx/SD_models/flux/t5xxl_q4_k.gguf'
[INFO ] stable-diffusion.cpp:312 - Version: SD3.x
[INFO ] stable-diffusion.cpp:340 - Weight type stat: f32: 192 | f16: 395 | q4_K: 218 | iq4_nl: 581
[INFO ] stable-diffusion.cpp:341 - Conditioner weight type stat: f16: 1 | q4_K: 218
[INFO ] stable-diffusion.cpp:342 - Diffusion model weight type stat: f16: 394 | iq4_nl: 529
[INFO ] stable-diffusion.cpp:343 - VAE weight type stat: f32: 192 | iq4_nl: 52
[DEBUG] stable-diffusion.cpp:345 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:160 - vocab size: 49408
[DEBUG] clip.hpp:171 - trigger word img already in vocab
[DEBUG] clip.hpp:160 - vocab size: 49408
[DEBUG] clip.hpp:171 - trigger word img already in vocab
[INFO ] mmdit.hpp:690 - MMDiT layers: 38 (including 0 MMDiT-x layers)
[DEBUG] ggml_extend.hpp:1883 - t5 params backend buffer size = 2986.77 MB(VRAM) (219 tensors)
[ERROR] ggml_extend.hpp:83 - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4779.80 MiB on device 0: cudaMalloc failed: out of memory
[ERROR] ggml_extend.hpp:83 - alloc_tensor_range: failed to allocate CUDA0 buffer of size 5011982336
[ERROR] ggml_extend.hpp:1877 - mmdit alloc params backend buffer failed, num_tensors = 923
[DEBUG] ggml_extend.hpp:1883 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:688 - loading weights
[DEBUG] model.cpp:1351 - using 8 threads for model loading
[DEBUG] model.cpp:1373 - loading tensors from /home/xxx/SD_models/sd3/sd3.5_large-iq4_nl.gguf
|> | 7/1386 - 7000.00it/s
Segmentation fault (core dumped) ./build/bin/sd-cli -m ~/SD_models/sd3/sd3.5_large-iq4_nl.gguf --t5xxl ~/SD_models/flux/t5xxl_q4_k.gguf -v -p "A cute cat"

Additional context / environment details

CUDA, 8 GB VRAM.
By the way, with the option --offload-to-cpu it runs to completion and saves an image file.

Metadata

Labels: bug