A Survey of Generative Models for Image and Video with Diffusion Model

Byoung Soo Koh, Hyeong Cheol Park, Jin Ho Park

Research output: Contribution to journal › Article › peer-review

Abstract

With recent advances in deep learning-based generative models, it is now possible to synthesize realistic data across diverse domains. One notable family of generative models is the diffusion-based generative model, which produces realistic, high-quality images and videos. Diffusion-based generative models leverage a diffusion process to transform a Gaussian noise distribution into a complex, realistic data distribution. To introduce these models, we give an overview of diffusion probabilistic models and denoising diffusion probabilistic models. In particular, we review research that presents new methodologies for image, video, and multimedia content generation, aiming to understand how these models efficiently learn complex data distributions through various techniques. Furthermore, training generative models on multimodal data helps them learn richer representations of complex data distributions, which enhances the generation of diverse images and videos. As the main contribution of this paper, we present several effective methods for synthesizing various types of data using diffusion models and multimodal data, along with their applications. In this context, we believe that showing how diffusion models have expanded into multimedia generation alongside technological advances will provide knowledge and inspiration to many researchers.
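To make the diffusion process described above concrete, the following is a minimal sketch of the standard DDPM forward (noising) process, which corrupts data toward a Gaussian in closed form via q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). The schedule values (T = 1000, linear betas in [1e-4, 0.02]) follow the common DDPM setup and are illustrative assumptions, not details taken from this survey:

```python
import numpy as np

# Illustrative linear noise schedule (common DDPM defaults, assumed here).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product alpha_bar_t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: scaled data plus Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # a toy "image"
x_T = q_sample(x0, T - 1, rng)     # near step T, nearly pure Gaussian noise
```

Because ᾱ_t shrinks toward zero as t grows, almost all of the original signal is destroyed by the final step; the reverse (denoising) model that the survey reviews is trained to invert this corruption step by step.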

Original language: English
Article number: 69
Journal: Human-centric Computing and Information Sciences
Volume: 14
DOIs
State: Published - 2024

Keywords

  • Deep Learning
  • Diffusion Model
  • Generative Model
  • Multimodal Learning
