Abstract
Video coding for machines (VCM) is an emerging approach in video compression designed to optimize content for machine analysis tasks. Although VCM was initially developed for machine vision, scalable coding frameworks have been developed to support both machine-driven analysis and human viewing as required. In this work, we focus on scenarios in which regions of interest (ROIs) are encoded at high quality for machine vision while the background (BG) is encoded at a low bitrate for human vision. At the decoder, the severely degraded BG quality makes the reconstructed frames unsuitable for viewing; therefore, restoring the degraded BGs by leveraging the high-quality ROIs is essential. To this end, we propose the Gradient-Guided Diffusion Restoration (GGDR) algorithm, which integrates a pretrained generative diffusion model with content-aware supervision and adaptive refinement mechanisms to restore severely degraded regions robustly while maintaining visual consistency across the entire frame. The GGDR algorithm consists of two key components: (i) a content-aware supervision mechanism that preserves salient features and structural information in the input image, ensuring superior performance even with challenging high-variance inputs, and (ii) a refinement block that guides the generation process of the pretrained diffusion model based on a degradation model and structural guidance. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art algorithms both qualitatively and quantitatively.
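The general idea of steering a restoration estimate with a degradation model while keeping the high-quality ROIs fixed can be illustrated with a toy sketch. This is not the GGDR algorithm: the block-averaging degradation, the function names `block_average` and `guided_step`, and the parameters `k` and `step` are all assumptions made for illustration, and the pretrained diffusion model is omitted entirely (in a real sampler, an update of this kind would be applied to the model's estimate at each reverse step).

```python
import numpy as np

def block_average(x, k):
    """Hypothetical degradation model: replace each k x k block by its mean."""
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k)
    means = blocks.mean(axis=(1, 3), keepdims=True)
    return np.broadcast_to(means, blocks.shape).reshape(h, w)

def guided_step(x, y, roi_mask, k=4, step=1.0):
    """One gradient step on 0.5 * ||M_bg * (A(x) - y)||^2, then ROI re-injection.

    x        : current estimate of the full frame
    y        : decoded frame (ROI at high quality, BG degraded)
    roi_mask : boolean array, True inside the ROI
    """
    # Data-fidelity residual, restricted to the background.
    residual = (block_average(x, k) - y) * (~roi_mask)
    # Block averaging is an orthogonal projection, so it is its own adjoint.
    grad = block_average(residual, k)
    x = x - step * grad
    # Keep the high-quality ROI pixels from the decoded frame unchanged.
    return np.where(roi_mask, y, x)

# Toy usage: 8 x 8 frame, ROI in the top-left block (aligned to the block grid).
rng = np.random.default_rng(0)
clean = rng.random((8, 8))
roi = np.zeros((8, 8), dtype=bool)
roi[:4, :4] = True
decoded = np.where(roi, clean, block_average(clean, 4))
restored = guided_step(rng.random((8, 8)), decoded, roi, k=4)
```

With a block-aligned ROI and `step=1.0`, a single update makes the estimate consistent with the decoded background under this toy degradation; deciding which of the many consistent backgrounds looks plausible is the part a generative prior would supply.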
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| DOIs | |
| State | Accepted/In press - 2025 |
Keywords
- Diffusion model
- image generation
- image restoration
- video coding for machines (VCM)