Changelog

Highlights for 2024-08-31

Summer break is over and we are back with a massive update!

Support for all of the new models: FLUX.1, AuraFlow, Lumina-Next-SFT, Kolors, HunyuanDiT 1.2

What else? Just a bit… ;)

New fast-install mode, new Optimum Quanto and BitsAndBytes based quantization modes, new balanced offload mode that dynamically offloads GPU<->CPU as needed, and more…
And from the previous service pack: new ControlNet-Union all-in-one model, support for DoRA networks, additional VLM models, new AuraSR upscaler

Breaking Changes…

Due to internal changes, you’ll need to reset your attention and offload settings!
But… for a good reason: the new balanced offload is magic when it comes to memory utilization while sacrificing minimal performance!

Details for 2024-08-31

New Models…

To use any of the new models, simply select the model from Networks -> Reference and it will be auto-downloaded on first use

  • Black Forest Labs FLUX.1
    FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled to 12B parameters and building on flow matching
    This is a very large model at ~32GB in size, so it’s recommended to use a) offloading, b) quantization
    For more information on variations, requirements, options, and how to download and use FLUX.1, see Wiki
    SD.Next supports:
  • AuraFlow
    AuraFlow v0.3 is the largest fully open-sourced flow-based text-to-image generation model
    This is a very large model at 6.8B params and nearly 31GB in size, smaller variants are expected in the future
    Use scheduler: Default or Euler FlowMatch or Heun FlowMatch
  • AlphaVLLM Lumina-Next-SFT
    Lumina-Next-SFT is a Next-DiT model containing 2B parameters, enhanced through high-quality supervised fine-tuning (SFT)
    This model uses T5 XXL variation of text encoder (previous version of Lumina used Gemma 2B as text encoder)
    Use scheduler: Default or Euler FlowMatch or Heun FlowMatch
  • Kwai Kolors
    Kolors is a large-scale text-to-image generation model based on latent diffusion
    This is an SDXL style model that replaces the standard CLIP-L and CLIP-G text encoders with a massive chatglm3-6b encoder supporting both English and Chinese prompting
  • HunyuanDiT 1.2
    Hunyuan-DiT is a powerful multi-resolution diffusion transformer (DiT) with fine-grained Chinese understanding
  • AnimateDiff
    support for additional models: SD 1.5 v3 (Sparse), SD Lightning (4-step), SDXL Beta
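
Since several of these models are very large, a rough size estimate helps when choosing between offloading and quantization. The sketch below is plain arithmetic (parameter count times bytes per parameter), not something SD.Next reports. The precision table and the `weight_gb` helper are illustrative assumptions, and the result covers only the parameter counts quoted above, so it will not match on-disk sizes that also include text encoders and other components.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Parameter counts come from the notes above; the precision table and the
# helper name are illustrative assumptions, not part of SD.Next.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (10^9 bytes) for a given precision."""
    # (params_billion * 1e9 params) * (bytes per param) / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    models = [("FLUX.1", 12.0), ("AuraFlow", 6.8), ("Lumina-Next-SFT", 2.0)]
    for name, params in models:
        sizes = ", ".join(
            f"{precision}: {weight_gb(params, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name} ({params}B params) -> {sizes}")
```

By this estimate, 8-bit or 4-bit quantization halves or quarters the 16-bit footprint of the quoted parameters, which is why quantization is listed alongside offloading for the largest models.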

New Features…

  • support for Balanced Offload, thanks @Disty0!
    balanced offload will dynamically split and offload models from the GPU based on the max configured GPU and CPU memory size
    model parts that don’t fit in the GPU will be dynamically sliced and offloaded to the CPU
    see Settings -> Diffusers Settings -> Max GPU memory and Max CPU memory
    note: recommended value for max GPU memory is ~80% of your total GPU memory
    note: balanced offload will force loading LoRA with Diffusers method
    note: balanced offload is not compatible with Optimum Quanto
  • support for Optimum Quanto with 8 bit and 4 bit quantization options, thanks @Disty0 and @Trojaner!
    to use, go to Settings -> Compute Settings and enable “Quantize Model weights with Optimum Quanto” option
    note: Optimum Quanto requires PyTorch 2.4
  • new prompt attention mode: xhinker which brings support for prompt attention to new models such as FLUX.1 and SD3
    to use, enable in Settings -> Execution -> Prompt attention
  • use PEFT for LoRA handling on all models other than SD15/SD21/SDXL
    this improves LoRA compatibility for SC, SD3, AuraFlow, Flux, etc.
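
To illustrate the idea behind balanced offload described above, here is a minimal sketch of a greedy placement plan under a GPU memory budget. This is not SD.Next's actual implementation: the function names, the part names, and the sizes are all hypothetical, and the sketch ignores the Max CPU memory setting and any slicing of individual parts. It only shows how a budget of ~80% of total GPU memory could decide which parts stay on the GPU.

```python
from typing import Dict, List, Tuple


def recommended_gpu_budget_gb(total_gpu_gb: float, fraction: float = 0.8) -> float:
    """Budget of ~80% of total GPU memory, following the note above."""
    return total_gpu_gb * fraction


def plan_placement(parts: List[Tuple[str, float]], max_gpu_gb: float) -> Dict[str, str]:
    """Greedy plan (hypothetical): a part stays on the GPU if it still fits
    in the remaining budget, otherwise it is assigned to the CPU."""
    placement: Dict[str, str] = {}
    used_gb = 0.0
    for name, size_gb in parts:
        if used_gb + size_gb <= max_gpu_gb:
            placement[name] = "gpu"
            used_gb += size_gb
        else:
            placement[name] = "cpu"
    return placement


if __name__ == "__main__":
    # Hypothetical part sizes in GB, chosen only for illustration.
    parts = [("text_encoder", 9.5), ("transformer", 12.0), ("vae", 0.3)]
    budget = recommended_gpu_budget_gb(24.0)  # e.g. a 24GB GPU
    print(f"budget: {budget:.1f} GB")
    print(plan_placement(parts, budget))
```

A real balanced offload moves parts dynamically as they are needed during generation; the static plan here is only meant to make the budget idea concrete.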

Changes & Fixes…

  • default resolution bumped from 512x512 to 1024x1024, time to move on ;)
  • convert Dynamic Attention SDP into a global SDP option, thanks @Disty0!
    note: requires reset of selected attention option
  • update default CUDA version from 12.1 to 12.4
  • update requirements
  • samplers now prefer the model defaults over the diffusers defaults, thanks @Disty0!
  • improve xyz grid for lora handling and add lora strength option
  • don’t enable Dynamic Attention by default on platforms that support Flash Attention, thanks @Disty0!
  • convert offload options into a single choice list, thanks @Disty0!
    note: requires reset of selected offload option
  • control module allows resizing of individual process override images to match input image
    for example: set size->before->method:nearest, mode:fixed or mode:fill
  • control tab includes superset of txt and img scripts
  • automatically offload disabled controlnet units
  • prioritize specified backend if --use-* option is used, thanks @lshqqytiger
  • ipadapter option to auto-crop input images to faces to improve efficiency of face-transfer ipadapters
  • update IPEX to 2.1.40+xpu on Linux, thanks @Disty0!
  • general ROCm fixes, thanks @lshqqytiger!
  • support for HIP SDK 6.1 on ZLUDA backend, thanks @lshqqytiger!
  • fix full vae previews, thanks @Disty0!
  • fix default scheduler not being applied, thanks @Disty0!
  • fix Stable Cascade with custom schedulers, thanks @Disty0!
  • fix LoRA apply with force-diffusers
  • fix LoRA scales with force-diffusers
  • fix control API
  • fix VAE load referencing incorrect configuration
  • fix NVML gpu monitoring