TY - JOUR
T1 - ESSN: Enhanced semantic segmentation network by residual concatenation of feature maps
T2 - IEEE Access
AU - Kim, Dong Seop
AU - Arsalan, Muhammad
AU - Owais, Muhammad
AU - Park, Kang Ryoung
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - Semantic segmentation performs pixel-level classification of multiple classes in an input image. Previous studies on semantic segmentation have used various methods, such as multi-scale images, encoder-decoder structures, attention, spatial pyramid pooling, conditional random fields, and generative models. However, contexts of various sizes and types in diverse environments limit their ability to robustly detect and classify objects. To address this problem, we propose an enhanced semantic segmentation network (ESSN) that is robust to various objects, contexts, and environments. ESSN extracts multi-scale information by concatenating residual feature maps with various receptive fields obtained from sequential convolution blocks, and it improves semantic segmentation performance without additional modules, such as loss or attention modules, during training. We performed experiments with two open databases, the Stanford background dataset (SBD) and the Cambridge-driving labeled video database (CamVid). Experimental results demonstrated a pixel accuracy of 92.74%, class accuracy of 79.66%, and mean intersection over union (mIoU) of 71.67% on CamVid, and a pixel accuracy of 87.46%, class accuracy of 81.51%, and mIoU of 71.56% on SBD, which are higher than those of existing state-of-the-art methods. In addition, the average processing times were 31.12 ms on a desktop computer and 92.46 ms on the Jetson TX2 embedded system, confirming that ESSN is applicable both to desktop computers and to the Jetson TX2, which is widely used in autonomous vehicles.
AB - Semantic segmentation performs pixel-level classification of multiple classes in an input image. Previous studies on semantic segmentation have used various methods, such as multi-scale images, encoder-decoder structures, attention, spatial pyramid pooling, conditional random fields, and generative models. However, contexts of various sizes and types in diverse environments limit their ability to robustly detect and classify objects. To address this problem, we propose an enhanced semantic segmentation network (ESSN) that is robust to various objects, contexts, and environments. ESSN extracts multi-scale information by concatenating residual feature maps with various receptive fields obtained from sequential convolution blocks, and it improves semantic segmentation performance without additional modules, such as loss or attention modules, during training. We performed experiments with two open databases, the Stanford background dataset (SBD) and the Cambridge-driving labeled video database (CamVid). Experimental results demonstrated a pixel accuracy of 92.74%, class accuracy of 79.66%, and mean intersection over union (mIoU) of 71.67% on CamVid, and a pixel accuracy of 87.46%, class accuracy of 81.51%, and mIoU of 71.56% on SBD, which are higher than those of existing state-of-the-art methods. In addition, the average processing times were 31.12 ms on a desktop computer and 92.46 ms on the Jetson TX2 embedded system, confirming that ESSN is applicable both to desktop computers and to the Jetson TX2, which is widely used in autonomous vehicles.
KW - Pixel-level classification
KW - Residual concatenation of feature maps
KW - Semantic segmentation
KW - Sequential convolution blocks
UR - http://www.scopus.com/inward/record.url?scp=85079331754&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.2969442
DO - 10.1109/ACCESS.2020.2969442
M3 - Article
AN - SCOPUS:85079331754
SN - 2169-3536
VL - 8
JO - IEEE Access
JF - IEEE Access
M1 - 21363
ER -