Object Detection-Based Video Retargeting with Spatial-Temporal Consistency

Seung Joon Lee, Siyeong Lee, Sung In Cho, Suk Ju Kang

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

This study proposes a video retargeting method based on deep neural network object detection. First, the meaningful regions of the input video are extracted as the bounding boxes produced by object detection, with the area defined according to the size and number of the bounding boxes of the detected objects. The bounding boxes in each frame are treated as regions of interest (RoIs). Second, a Siamese object tracking network is used to offset the high computational complexity of the object detection network: the video is divided into scenes, object detection is run on the first frame of each scene to obtain the initial bounding box, and object tracking is applied to the subsequent frames until a scene change is detected. Third, the image is resized in the horizontal direction to alter its aspect ratio, and the 1D RoIs of the image are obtained by projecting the bounding boxes in the vertical direction. The proposed method then computes a grid map from the 1D RoIs to calculate the new coordinates of each image column. Finally, the retargeted video is obtained by assembling the retargeted frames in sequence. In comparative experiments against various benchmark methods, the proposed method achieved an average bidirectional similarity score of 1.92, higher than those of the conventional methods. The proposed method was stable and satisfied viewers without causing the cognitive discomfort associated with conventional methods.
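
The third step (projecting bounding boxes to 1D RoIs and deriving a column grid map) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the grid map is computed, so the rule used here (RoI columns keep unit width, all other columns share the remaining width equally) is an assumption, as are the function names `roi_mask_1d`, `column_grid_map`, and `retarget_frame` and the nearest-column resampling.

```python
import numpy as np

def roi_mask_1d(boxes, width):
    """Project boxes (x0, y0, x1, y1) vertically onto the x-axis.

    Returns a boolean array marking which columns fall inside any box.
    """
    mask = np.zeros(width, dtype=bool)
    for x0, _y0, x1, _y1 in boxes:
        mask[max(0, int(x0)):min(width, int(x1))] = True
    return mask

def column_grid_map(mask, target_width):
    """Return the new x-coordinates of the column edges (length width + 1).

    Assumed rule (not taken from the paper): RoI columns keep width 1 and
    the remaining columns are scaled uniformly to absorb the size change.
    """
    width = mask.size
    n_roi = int(mask.sum())
    n_bg = width - n_roi
    if n_bg == 0 or n_roi >= target_width:
        # RoIs alone do not fit: fall back to plain uniform scaling.
        widths = np.full(width, target_width / width)
    else:
        bg_scale = (target_width - n_roi) / n_bg
        widths = np.where(mask, 1.0, bg_scale)
    return np.concatenate(([0.0], np.cumsum(widths)))

def retarget_frame(frame, boxes, target_width):
    """Resample the columns of one frame according to the grid map."""
    width = frame.shape[1]
    edges = column_grid_map(roi_mask_1d(boxes, width), target_width)
    # Invert the monotonic map: for each output column centre, find the
    # source x-position, then pick the nearest source column.
    out_x = np.arange(target_width) + 0.5
    src_x = np.interp(out_x, edges, np.arange(width + 1))
    src_idx = np.clip(src_x.astype(int), 0, width - 1)
    return frame[:, src_idx]
```

Under these assumptions, calling `retarget_frame(frame, boxes, target_width)` on every frame and writing the results out in order corresponds to the final step described above. Keeping the boxes consistent between frames (detection on the first frame of a scene, tracking afterwards) is what would keep the column mapping from jittering over time.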

Original language: English
Article number: 9043574
Pages (from-to): 4434-4439
Number of pages: 6
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 30
Issue number: 12
State: Published - Dec 2020

Keywords

  • convolutional neural network
  • object detection
  • object tracking
  • video retargeting
