Aerial scene understanding in the wild: Multi-scene recognition via prototype-based memory networks.

Yuansheng Hua, Lichao Mou, Jianzhe Lin, Konrad Heidler, Xiao Xiang Zhu
Author Information
  1. Yuansheng Hua: Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.
  2. Lichao Mou: Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.
  3. Jianzhe Lin: Department of Electrical and Computer Engineering (ECE), University of British Columbia (UBC), Vancouver, BC V6T 1Z2, Canada.
  4. Konrad Heidler: Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.
  5. Xiao Xiang Zhu: Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Wessling, Germany.

Abstract

Aerial scene recognition is a fundamental visual task that has attracted increasing research interest in recent years. Most existing work focuses on assigning a single scene-level label to an aerial image, whereas in real-world scenarios a single image often contains multiple scenes. In this paper, we therefore take a step toward a more practical and challenging task: multi-scene recognition in single images. Because manually annotating images for this task is extraordinarily time- and labor-consuming, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging massive, well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) an external memory that stores the learned prototypes, and 3) a multi-head attention-based memory retrieval module. Specifically, we first learn a prototype representation of each aerial scene from single-scene aerial image datasets and store it in the external memory. A multi-head attention-based memory retrieval module then retrieves the scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed during training. To facilitate progress in aerial scene recognition, we also produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
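
The abstract describes a retrieval step in which multi-head attention matches a query image against an external memory of scene prototypes to produce multi-label predictions. The PyTorch sketch below is a rough illustration of that idea only, assuming a generic setup; the class name PrototypeMemoryRetrieval, the feature dimension, the number of scene classes, and the use of nn.MultiheadAttention are illustrative choices, not the authors' released implementation.

# Minimal sketch of multi-head attention-based retrieval over a prototype memory.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class PrototypeMemoryRetrieval(nn.Module):
    def __init__(self, num_scenes: int, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # External memory: one prototype vector per aerial scene. In the paper
        # these are learned from single-scene images; here they are simply
        # initialized as trainable parameters.
        self.prototypes = nn.Parameter(torch.randn(num_scenes, dim))
        # Multi-head attention retrieves prototypes relevant to the query image.
        self.attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Per-scene classifier producing multi-label scene scores.
        self.classifier = nn.Linear(dim, num_scenes)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, dim) global features of a multi-scene image,
        # e.g. from a CNN backbone followed by pooling.
        query = image_features.unsqueeze(1)                      # (batch, 1, dim)
        memory = self.prototypes.unsqueeze(0).expand(
            image_features.size(0), -1, -1)                      # (batch, S, dim)
        retrieved, _ = self.attention(query, memory, memory)     # (batch, 1, dim)
        logits = self.classifier(retrieved.squeeze(1))           # (batch, S)
        return torch.sigmoid(logits)                             # multi-label scores


# Usage: scores for a hypothetical set of 24 scene classes.
if __name__ == "__main__":
    model = PrototypeMemoryRetrieval(num_scenes=24)
    feats = torch.randn(4, 512)
    print(model(feats).shape)  # torch.Size([4, 24])

The sigmoid output treats each scene as an independent binary decision, matching the multi-label nature of multi-scene recognition.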

Keywords

Aerial scene recognition; Convolutional neural network (CNN); Memory network; Multi-head attention; Multi-scene recognition; Prototype learning