Category-Level Object Pose Estimation with Statistic Attention.

Changhong Jiang, Xiaoqiao Mu, Bingbing Zhang, Chao Liang, Mujun Xie
Author Information
  1. Changhong Jiang: School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun 130012, China.
  2. Xiaoqiao Mu: School of Mechanical and Electrical Engineering, Changchun University of Technology, Changchun 130012, China.
  3. Bingbing Zhang: School of Computer Science and Engineering, Dalian Minzu University, Dalian 116602, China.
  4. Chao Liang: College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China.
  5. Mujun Xie: School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun 130012, China.

Abstract

Six-dimensional object pose estimation is a fundamental problem in computer vision. Recently, category-level object pose estimation methods have made significant breakthroughs thanks to advances in 3D graph convolution (3D-GC). However, current methods often fail to capture long-range dependencies, which are crucial for modeling complex and occluded object shapes; discerning detailed differences between objects is equally essential. Some existing methods use self-attention mechanisms or Transformer encoder-decoder structures to compensate for the lack of long-range dependencies, but they attend only to first-order feature information, failing to exploit richer statistics and neglecting the detailed differences between objects. In this paper, we propose SAPENet, which follows the 3D-GC architecture but replaces the 3D-GC layers in the encoder with HS-layers for feature extraction and incorporates statistical attention to compute higher-order statistical information. In addition, three sub-modules are designed for pose regression, point cloud reconstruction, and bounding box voting. The pose regression module also integrates statistical attention so that higher-order statistics help model geometric relationships and aid regression. Experiments demonstrate that our method achieves outstanding performance, attaining an mAP of 49.5 on the 5°2 cm metric, 3.4 points higher than the baseline model, and reaching state-of-the-art (SOTA) performance on the REAL275 dataset.
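As a concrete illustration of the statistical attention described above, the following Python (PyTorch) sketch shows one way such a module could reweight point features using first- and second-order channel statistics. The specific design (per-channel mean and variance fed to a small MLP) and all names are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn


class StatisticalAttention(nn.Module):
    """Hypothetical channel attention driven by feature statistics.

    Input:  point features of shape (B, C, N) -- B point clouds,
            C channels, N points per cloud.
    Output: reweighted features of the same shape.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Small MLP maps the concatenated [mean, variance] statistics
        # (2*C values) to one attention weight per channel.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=-1)                    # (B, C) first-order statistic
        var = x.var(dim=-1, unbiased=False)      # (B, C) second-order statistic
        stats = torch.cat([mean, var], dim=-1)   # (B, 2C)
        weights = self.mlp(stats).unsqueeze(-1)  # (B, C, 1) channel weights
        return x * weights                       # reweight point features


if __name__ == "__main__":
    feats = torch.randn(2, 128, 1024)            # toy batch of point features
    attn = StatisticalAttention(channels=128)
    print(attn(feats).shape)                     # torch.Size([2, 128, 1024])

Because the attention weights depend on statistics pooled over all points, every output feature is modulated by global information, which is one simple way to inject the long-range dependencies and higher-order statistics the abstract refers to.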


Grants

  1. 20230201111GX/Science and Technology Development Program Project of Jilin Province
  2. 20230201039GX/Science and Technology Development Program Project of Jilin Province
