Title
Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion
Abstract
Cloud detection is a key step in optical remote sensing image processing, and cloud-free images are important for land use classification, change detection, and long time-series land cover monitoring. Traditional cloud detection methods based on spectral and texture features achieve reasonable results in complex scenarios such as cloud-snow mixing, but their generalization ability still leaves considerable room for improvement. In recent years, deep learning-based cloud detection has significantly improved accuracy in complex regions, such as areas where high-brightness features are mixed. However, existing deep learning-based methods still have limitations. For instance, some omission and commission errors remain in cloud edge regions. Deep learning-based cloud detection is also gradually shifting from purely convolutional structures toward global feature extraction, for example with attention modules, but this increases the computational burden, which makes it hard to meet the demands of rapidly developing time-sensitive tasks such as onboard real-time cloud detection in optical remote sensing imagery. To address these problems, this manuscript proposes a high-precision cloud detection network that fuses a self-attention module with spatial pyramid pooling. First, we use DenseNet as the backbone and extract deep semantic features by combining a global self-attention module with a spatial pyramid pooling module. Second, to address the problem of unbalanced training samples, we design a weighted cross-entropy loss function to optimize the network. Finally, cloud detection accuracy is assessed. Quantitative comparison experiments on images from different sensors, including Landsat8, Landsat9, GF-2, and Beijing-2, indicate that, compared with feature-based methods, the deep learning network can effectively separate cloud from snow in confusion-prone regions using only three-channel visible images, which significantly reduces the number of image bands required. Compared with other deep learning methods, the accuracy at cloud edges is higher and the overall computational efficiency compares favorably.
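The abstract does not give the form of the weighted cross-entropy loss, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes inverse-frequency class weights and normalization by the sum of the weights; the function names `class_weights` and `weighted_cross_entropy` and the toy data are hypothetical.

```python
import math


def class_weights(labels, num_classes=2):
    """Inverse-frequency weights (an assumed scheme, not taken from the paper)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # A class that never occurs gets weight 0, which also avoids division by zero.
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean of -w[y] * log(p[y]) over all pixels."""
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y in zip(probs, labels):
        weighted_sum += -weights[y] * math.log(max(p[y], eps))
        weight_total += weights[y]
    return weighted_sum / weight_total


# Toy example: 8 pixels, only 2 of which are cloud (class 1).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
# Predicted [p(background), p(cloud)] per pixel; the cloud pixels are misjudged.
probs = [[0.9, 0.1]] * 6 + [[0.6, 0.4]] * 2

weights = class_weights(labels)
loss = weighted_cross_entropy(probs, labels, weights)
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
print(weights, loss, unweighted)
```

In this toy case the rare cloud class receives the larger weight, so the poorly predicted cloud pixels contribute more to the loss than they would under an unweighted average.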
Year
2022
DOI
10.3390/rs14174312
Venue
REMOTE SENSING
Keywords
cloud detection, self-attention, pyramid pooling module, semantic segmentation, optical remote sensing image
DocType
Journal
Volume
14
Issue
17
ISSN
2072-4292
Citations
0
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Weihua Pu | 1 | 0 | 0.34
Zhipan Wang | 2 | 0 | 0.34
Di Liu | 3 | 37 | 5.69
Qingling Zhang | 4 | 1353 | 100.93