Command Line stable diffusion runs out of GPU memory but GUI version doesn't

Question:

I installed the GUI version of Stable Diffusion here. With it I was able to make 512 by 512 pixel images using my GeForce RTX 3070 GPU with 8 GB of memory:

GUI screenshot

However when I try to do the same thing with the command line interface, I run out of memory:

Input:
>> C:\SD\stable-diffusion-main>python scripts/txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --plms --n_iter 3 --n_samples 1 --H 512 --W 512

Error:

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 6.13 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
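The error message itself points at one tunable knob: setting `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF`, which can reduce fragmentation when reserved memory is well above allocated memory. A minimal sketch of setting it from Python (the value 128 is an illustrative assumption, not a recommendation):

```python
import os

# Must be set before the first CUDA allocation (ideally before `import torch`).
# max_split_size_mb caps the size of cached blocks the allocator will split,
# which can reduce fragmentation-related OOMs at some cost in speed.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 is an assumption; tune it
```

The same thing can be done from the Windows shell with `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` before running `txt2img.py`.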

If I reduce the image size to 256 × 256, it produces a result, but at obviously much lower quality.

So part 1 of my question is: why do I run out of memory at 6.13 GiB when the card has 8 GiB? And part 2: what does the GUI do differently that allows 512 × 512 output? Is there a setting I can change to reduce the load on the GPU?
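For context on part 2, here is a rough back-of-envelope for why 512 × 512 is so much heavier than 256 × 256, assuming the standard SD v1 architecture (8× latent downsampling and quadratic self-attention in the U-Net — both assumptions about this particular setup):

```python
def attention_cost_ratio(h1, w1, h2, w2, downsample=8):
    """Rough ratio of self-attention memory between two image sizes,
    assuming SD-style 8x latent downsampling and quadratic attention."""
    t1 = (h1 // downsample) * (w1 // downsample)  # latent tokens at size 1
    t2 = (h2 // downsample) * (w2 // downsample)  # latent tokens at size 2
    return (t1 * t1) / (t2 * t2)

# 512x512 -> 64x64 = 4096 tokens; 256x256 -> 32x32 = 1024 tokens.
# The attention matrices are therefore ~16x larger at 512x512.
print(attention_cost_ratio(512, 512, 256, 256))  # -> 16.0
```

So doubling the edge length quadruples the token count and multiplies the attention memory by roughly sixteen, which is why 512 × 512 tips an 8 GiB card over the edge while 256 × 256 fits comfortably.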

Thanks a lot,
Alex

Asked By: Alex S


Answers:

I faced the same issue and would like to know too. From some research, it seems to be related to the number of worker processes PyTorch runs. However, I'm not sure how to change the number of workers from the Stable Diffusion scripts, or why it would throw this error in the first place. I followed the exact steps provided in https://www.howtogeek.com/830179/how-to-run-stable-diffusion-on-your-pc-to-generate-ai-images/#autotoc_anchor_2

Keeping an eye out

Answered By: inkblot

This might not be the only answer, but I solved it by using the optimized version here. If you already have the standard version installed, just copy the "OptimizedSD" folder into your existing folders, and then run the optimized txt2img script instead of the original:

>> python optimizedSD/optimized_txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 10 --ddim_steps 50

It’s quite slow on my computer, but it produces 512 × 512 images!
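From what I've read, the optimized fork reportedly saves VRAM by loading the model in stages and running the U-Net in half precision. A rough sketch of why precision alone matters so much (the parameter counts below are approximate ballpark assumptions for SD v1, not exact figures):

```python
# Approximate SD v1 component parameter counts (assumed ballpark figures):
unet, text_encoder, vae = 860e6, 123e6, 84e6
params = unet + text_encoder + vae

for bytes_per_param, name in [(4, "fp32"), (2, "fp16")]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB just for the weights")
```

Halving the bytes per parameter roughly halves the weight footprint before any activations or attention buffers are counted, which is a big part of how the optimized script squeezes 512 × 512 onto an 8 GiB card.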

Thanks,
Alex

Answered By: Alex S

I get the same problem using the CPU: the process just seems to be killed when it consumes too much memory. So it may or may not be the number of workers, as @inkblot mentioned, but it doesn't seem to be only a GPU or CUDA problem either.

For me it also gets killed when I try the OptimizedSD script mentioned by @AlexS.

So I'm guessing both scripts probably don't guard against exorbitant memory consumption (where the machine runs out of total memory) and just assume enough is available, as most newer machines running CUDA on a GPU will have.

My use case is that I want it to run to completion even if it takes much longer on my CPU, since my machine can't use CUDA. So the process's memory usage should probably be capped, and memory might need to be handled more sparingly on CPUs.
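One way to do that capping on POSIX systems is the standard-library `resource` module, so a runaway allocation raises `MemoryError` inside Python instead of the OS killing the process outright. A hedged sketch (the 12 GiB figure is an arbitrary example, and this module is not available on Windows):

```python
import resource  # POSIX only; not available on Windows

def cap_address_space(gib: float) -> int:
    """Cap this process's virtual address space so an oversized allocation
    fails with MemoryError rather than triggering the OOM killer."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = int(gib * 1024**3)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # soft limit cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return limit

# e.g. call cap_address_space(12) before loading the model
```

With a cap in place, the script could catch `MemoryError` around the sampling loop and fail with a readable message instead of silently disappearing.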

Answered By: jsky