TY - JOUR
T1 - LoRA Fusion
T2 - Enhancing Image Generation
AU - Choi, Dooho
AU - Im, Jeonghyeon
AU - Sung, Yunsick
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/11
Y1 - 2024/11
AB - Recent advances in low-rank adaptation (LoRA) have demonstrated its effectiveness in fine-tuning diffusion models to generate images tailored to new downstream tasks. Research on integrating multiple LoRA modules to accommodate new tasks has also gained traction. One emerging approach combines several LoRA modules, but merging more than three typically degrades the generation performance of the pre-trained model. Mixture-of-experts models address this performance issue, but they do not combine LoRA modules according to text prompts; hence, images generated by combining LoRA modules do not dynamically reflect the user’s requirements. This paper proposes a LoRA fusion method that applies an attention mechanism to capture the intent of the user’s text prompt. The method computes the cosine similarity between predefined keys and queries and uses the weighted sum of the corresponding values to generate task-specific LoRA modules without retraining. It remains stable when merging multiple LoRA modules and performs comparably to fully retrained LoRA models, offering a more efficient and scalable solution for domain adaptation of diffusion models. In the experiments, the proposed method outperformed existing methods in text–image alignment and image similarity: it achieved a text–image alignment score of 0.744, surpassing SVDiff (0.724) and normalized linear arithmetic composition (0.698). Moreover, the proposed method generates images that are more semantically accurate and visually coherent.
KW - image generation
KW - low-rank adaptation (LoRA)
KW - merging LoRA modules
UR - http://www.scopus.com/inward/record.url?scp=85210432687&partnerID=8YFLogxK
U2 - 10.3390/math12223474
DO - 10.3390/math12223474
M3 - Article
AN - SCOPUS:85210432687
SN - 2227-7390
VL - 12
JO - Mathematics
JF - Mathematics
IS - 22
M1 - 3474
ER -