MSSPA-GC: Multi-Scale Shape Prior Adaptation with 3D Graph Convolutions for Category-Level Object Pose Estimation.

Lu Zou, Zhangjin Huang, Naijie Gu, Guoping Wang

Author Information

Lu Zou: University of Science and Technology of China, Hefei, 230027, Anhui, China.
Zhangjin Huang: University of Science and Technology of China, Hefei, 230027, Anhui, China; Anhui Province Key Laboratory of Software in Computing and Communication, Hefei, 230027, Anhui, China; USTC-Deqing Alpha Innovation Research Institute, Huzhou, 313299, Zhejiang, China. Electronic address: zhuang@ustc.edu.cn.
Naijie Gu: University of Science and Technology of China, Hefei, 230027, Anhui, China; Anhui Province Key Laboratory of Software in Computing and Communication, Hefei, 230027, Anhui, China.
Guoping Wang: Peking University, Beijing, 100871, Beijing, China.

PMID: 37597505 DOI: 10.1016/j.neunet.2023.07.037

Category-level object pose estimation aims to predict the 6D pose and size of arbitrary objects from known categories. It remains challenging because of the large intra-class shape variation. Recently, introducing a shape prior adaptation mechanism into the reconstruction of normalized object coordinates (NOCS) has been shown to mitigate this variation effectively. However, existing shape prior adaptation methods simply map the observed point cloud to the normalized object space, and the object descriptors they extract are not sufficient for perceiving the object pose. As a result, they fail to predict the pose of objects with complex geometric structures (e.g., cameras). To address this, this paper proposes a novel shape prior adaptation method named MSSPA-GC for category-level object pose estimation. Specifically, the main network takes two inputs: the observed instance point cloud converted from the RGB-D image, and the prior shape point cloud pre-trained on the object CAD models. A novel 3D graph convolution network extracts pose-aware object features from the former, and a PointNet-like MLP network extracts shape-aware object features from the latter. The two-stream object features are then aggregated through a multi-scale feature propagation mechanism to generate comprehensive 3D object descriptors that maintain both pose-sensitive geometric stability and intra-class shape consistency. Finally, by using object descriptors that are aware of both object pose and shape when reconstructing the NOCS coordinates, the approach achieves state-of-the-art performance on the widely used REAL275 and CAMERA25 datasets while using only 25% of the parameters of existing shape prior adaptation models. The method also exhibits decent generalization ability on the unconstrained REDWOOD75 dataset.
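The abstract does not say how the observed instance point cloud is obtained from the RGB-D image. A common way to do this, shown here only as a minimal sketch and not as the authors' preprocessing, is to back-project the depth pixels inside an instance mask with the pinhole camera model. The function name `backproject_instance`, the instance mask, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_instance(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift masked depth pixels to 3D camera-frame points (pinhole model).

    Hypothetical helper: `mask` is assumed to come from some instance
    segmentation step, and the intrinsics are assumed to be known.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the instance mask or without valid depth.
            if not keep or z <= 0.0:
                continue
            # Pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map (metres) and made-up intrinsics, for illustration only.
depth = [[0.0, 1.0, 2.0],
         [1.0, 2.0, 0.0]]
mask = [[True, True, True],
        [False, True, True]]
cloud = backproject_instance(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.0)
print(cloud)
```

In a pipeline like the one the abstract outlines, a point cloud of this kind would be the "observed" input to the pose-aware feature branch, while the categorical shape prior would feed the shape-aware branch.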

Keywords: 3D graph convolution network; 3D object detection; Object pose estimation; Point cloud processing; Shape recovery

MeSH terms: Generalization, Psychological; Neural Networks, Computer

