2D Instance-Guided Pseudo-LiDAR Point Cloud for Monocular 3D Object Detection

Rui Gao, Junoh Kim, Kyungeun Cho

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Monocular three-dimensional (3D) scene understanding tasks, e.g., estimation of object size, angle, and 3D position, are challenging to perform. The more successful current methods usually require data from 3D sensors, and their performance is limited when only monocular images are used because distance information is lacking. In recent years, 3D object detection methods based on pseudo-LiDAR have effectively improved the accuracy of 3D prediction. However, most pseudo-LiDAR methods directly use LiDAR-based 3D detection networks, which also limits their performance. In this paper, a new monocular 3D object detection framework is proposed. The pseudo-LiDAR representation is redesigned by introducing a 2D detection mask channel as a guide layer for 3D object detection. A new 3D object detection network is designed for this multi-channel data. The pillar feature encoding module is optimized with the 2D detection mask and a height histogram, so that it converts point clouds into pillar features more effectively by exploiting the mask and the height characteristics of different objects. A new feature extraction network based on the transformer module is designed to extract pillar features. The transformer head is optimized using three independent heads to categorize the position, properties, and specific parameters of the object. An evaluation on the challenging KITTI dataset demonstrates that the proposed method significantly improves on the performance of state-of-the-art monocular methods.
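
The general idea of a pseudo-LiDAR point cloud that carries a 2D mask channel, grouped into pillars with per-pillar height histograms, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the pinhole intrinsics (fx, fy, cx, cy), the pillar size, the bin count, and the height range are all assumptions chosen for illustration, and the depth map and mask are assumed to come from separate depth-estimation and 2D detection models.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, mask, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 4) array of [x, y, z, mask] points.

    Camera coordinates are assumed: x right, y down, z forward.
    Pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z, mask.astype(depth.dtype)], axis=-1).reshape(-1, 4)
    return points[points[:, 2] > 0]

def pillar_height_histograms(points, pillar_size=0.5, bins=4, y_range=(-3.0, 2.0)):
    """Group points into ground-plane (x, z) pillars and histogram their heights (y).

    Returns a dict mapping (ix, iz) pillar indices to an integer array of
    length `bins`. Heights outside `y_range` are not counted.
    """
    ix = np.floor(points[:, 0] / pillar_size).astype(np.int64)
    iz = np.floor(points[:, 2] / pillar_size).astype(np.int64)
    edges = np.linspace(y_range[0], y_range[1], bins + 1)
    hists = {}
    for key, y in zip(zip(ix.tolist(), iz.tolist()), points[:, 1]):
        hist = hists.setdefault(key, np.zeros(bins, dtype=np.int64))
        b = int(np.searchsorted(edges, y, side="right")) - 1
        if 0 <= b < bins:
            hist[b] += 1
    return hists

# Toy 2x2 input: one pixel has no valid depth, two pixels lie inside a 2D mask.
depth = np.array([[10.0, 10.0], [0.0, 10.0]])
mask = np.array([[1, 0], [0, 1]])
points = depth_to_pseudo_lidar(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
hists = pillar_height_histograms(points)
```

In a learned detector, the mask value and the histogram would be extra inputs to the pillar encoder alongside the point coordinates; how the paper combines them is not specified in the abstract.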

Original language: English
Pages (from-to): 187813-187827
Number of pages: 15
Journal: IEEE Access
Volume: 12
State: Published - 2024

Keywords

  • Autonomous driving
  • deep learning
  • monocular 3D object detection
