Summary of "Latent Consistency Models: A Technical Report" (arxiv.org)
3,152 words - PDF document
One Line
Applying LoRA distillation to Latent Consistency Models (LCMs) in Stable-Diffusion models improves image generation quality and reduces memory consumption; the resulting LCM-LoRA module enables fast inference and outperforms previous solvers at generating style-specific images.
Key Points
- Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
- LCMs are distilled from pre-trained latent diffusion models (LDMs) and require only 32 A100 GPU training hours.
- LCM-LoRA is a universal Stable-Diffusion acceleration module that can be directly plugged into various models for fast inference.
- LCM-LoRA combines the LoRA parameters obtained through LCM distillation (an "acceleration vector") with LoRA parameters obtained by fine-tuning on a specific style dataset (a "style vector"); see the sketch after this list.
- LCM-LoRA acts as a plug-in neural solver for the probability-flow ODE (PF-ODE) with strong generalization across models.
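As a rough illustration of the parameter arithmetic behind the combination point above (a minimal sketch of the weighted-sum merge, not the authors' code; all tensor names, shapes, and coefficients are illustrative):

```python
import torch

def lora_delta(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # LoRA update for one layer: a low-rank product B @ A (rank r << layer width).
    return B @ A

def merge_lcm_and_style(W_pre, lcm_AB, style_AB, lam1=1.0, lam2=0.8):
    # Weighted sum of the LCM "acceleration vector" and the "style vector",
    # applied on top of the pretrained weights; lam1/lam2 are tunable coefficients.
    tau_lcm = lora_delta(*lcm_AB)
    tau_style = lora_delta(*style_AB)
    return W_pre + lam1 * tau_lcm + lam2 * tau_style

# Toy example: one 64x64 linear layer with rank-4 LoRA factors.
d_out, d_in, r = 64, 64, 4
W_pre = torch.randn(d_out, d_in)
lcm_AB = (torch.randn(r, d_in), torch.randn(d_out, r))    # (A, B) from LCM distillation
style_AB = (torch.randn(r, d_in), torch.randn(d_out, r))  # (A, B) from style fine-tuning
W_merged = merge_lcm_and_style(W_pre, lcm_AB, style_AB)
```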
Summaries
29 word summary
Latent Consistency Models (LCMs) incorporate LoRA distillation into Stable-Diffusion models, improving image generation and reducing memory consumption. LCM-LoRA enables fast inference and outperforms previous solvers in generating style-specific images.
52 word summary
Latent Consistency Models (LCMs) are distilled from latent diffusion models (LDMs) to accelerate text-to-image generation. The authors enhance LCMs by incorporating LoRA distillation into Stable-Diffusion models, improving image generation quality and reducing memory consumption. LCM-LoRA, a universal Stable-Diffusion acceleration module, enables fast inference and outperforms previous solvers in generating style-specific images.
131 word summary
Latent Consistency Models (LCMs) have been successful in accelerating text-to-image generative tasks by distilling pre-trained latent diffusion models (LDMs). The authors extend the capabilities of LCMs by applying LoRA distillation to Stable-Diffusion models, enabling the use of larger models with reduced memory consumption and improved image generation quality. They also introduce LCM-LoRA, a universal Stable-Diffusion acceleration module, which can be integrated into various models without additional training. LCMs treat the reverse diffusion process as an augmented probability flow ODE problem, resulting in high-quality image synthesis with minimal inference steps. LCM-LoRA reduces memory requirements and enables fast inference with minimal steps on fine-tuned models. Extensive experiments demonstrate the effectiveness of LCM-LoRA in generating images in specific styles with minimal sampling steps. It outperforms previous numerical PF-ODE solvers and shows strong generalization capabilities.
373 word summary
Latent Consistency Models (LCMs) have been successful in accelerating text-to-image generative tasks by distilling pre-trained latent diffusion models (LDMs). In this technical report, the authors extend the capabilities of LCMs in two ways. First, they apply LoRA distillation to Stable-Diffusion models, allowing for the use of larger models with reduced memory consumption and improved image generation quality. Second, they identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly integrated into various Stable-Diffusion fine-tuned models or LoRAs without additional training, making it a universally applicable accelerator for image generation tasks.
Previous efforts to accelerate LDMs have focused on advanced ODE solvers and on distillation methods. However, faster solvers still require a substantial number of inference steps, and distillation methods demand intensive training compute. LCMs address these issues by treating the reverse diffusion process as an augmented probability-flow ODE (PF-ODE) problem and directly predicting its solution in latent space, yielding high-quality image synthesis in very few inference steps. LCM distillation is also efficient, requiring only 32 A100 training hours for minimal-step inference.
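As a concrete illustration of minimal-step inference with LCM-LoRA (a sketch based on the Hugging Face diffusers API; the model IDs are the publicly released checkpoints, but exact library versions and defaults are assumptions):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LCM-LoRA weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# 4 steps instead of the usual 25-50; LCMs typically use a low guidance scale.
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora_sample.png")
```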
To further enhance LCMs, the authors introduce LCM-LoRA as a universal training-free acceleration module. By adopting the parameter-efficient fine-tuning technique LoRA, LCM-LoRA reduces the memory overhead of distillation and enables fast, minimal-step inference on various fine-tuned Stable-Diffusion models or LoRAs. Combining the LCM-LoRA parameters with other fine-tuned LoRA parameters yields a model that generates images in specific styles with minimal sampling steps.
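In library terms, this combination can be expressed by loading both LoRAs as named adapters and weighting them (a sketch assuming a diffusers version with PEFT-backed multi-adapter support; the style LoRA path and the adapter weights are hypothetical):

```python
# Assumes `pipe` is a DiffusionPipeline with LCMScheduler set, as in the previous sketch.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
pipe.load_lora_weights("path/to/style_lora", adapter_name="style")  # hypothetical style LoRA

# Weighted combination: the first weight scales the acceleration vector,
# the second scales the style vector.
pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 0.8])

image = pipe(
    "a portrait in the fine-tuned style",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
```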
The authors demonstrate the effectiveness of LCM-LoRA through extensive experiments on text-to-image generation. They compare the quality of images generated using LCM-LoRA distilled from different pretrained diffusion models and show that LCM-LoRA performs well across various models. They also show the generation results of combining LCM-LoRA parameters with specific style LoRA parameters, highlighting the ability to generate images in specific styles with minimal sampling steps.
In conclusion, LCM-LoRA is a universal training-free acceleration module that can be directly integrated into various Stable-Diffusion models or LoRAs for fast inference with minimal steps. It demonstrates strong generalization capabilities and superior performance compared to previous numerical PF-ODE solvers. The authors acknowledge the contributions of the leading authors and core contributors to the development of LCM-LoRA and express their gratitude to the LCM community members.