面向光学遥感目标的全局上下文检测模型设计

张瑞琰; 姜秀杰; 安军社; 崔天舒

doi:10.37188/CO.2020-0057

面向光学遥感目标的全局上下文检测模型设计

doi: 10.37188/CO.2020-0057

cstr: 32171.14.CO.2020-0057

张瑞琰^{1, 2,},
姜秀杰^1, ,,
安军社¹,
崔天舒^{1, 2}

1.
中国科学院国家空间科学中心复杂航天系统电子信息技术重点实验室，北京 100190
2.
中国科学院大学，北京 100049

基金项目: 中国科学院复杂航天系统电子信息技术重点实验室自主部署基金（No. Y42613A32S）

详细信息

作者简介:
张瑞琰（1995—），女，河南商丘人，博士研究生，2016年于南开大学获得理学学士学位，2018年于中国科学院大学国家空间科学中心转为硕博连读，主要从事空间数据处理、遥感目标检测网络优化及压缩的相关研究。E-mail：zhangruiyan16@mails.ucas.ac.cn

姜秀杰（1965—），女，黑龙江鹤岗人，博士，研究员，博士生导师，1988年于北京航空航天大学获得工学学士学位，1991年于中国科学院大学获得工学硕士学位，2007年于清华大学获得工学博士学位，主要从事火箭探空技术研究、电场探测技术研究和空间综合电子技术研究。E-mail：jiangxj@nssc.ac.cn

中图分类号: TP391.4
计量
- 文章访问数: 2449
- HTML全文浏览量: 649
- PDF下载量: 120
- 被引次数: 0
出版历程
- 收稿日期: 2020-04-07
- 修回日期: 2020-05-11
- 网络出版日期: 2020-10-22
- 刊出日期: 2020-12-01

Design of global-contextual detection model for optical remote sensing targets

ZHANG Rui-yan^{1, 2
,},
JIANG Xiu-jie^{1
, ,},
AN Jun-she¹,
CUI Tian-shu^{1, 2}

1.
Key Laboratory of Electronics and Information Technology for Space Systems, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
2.
University of Chinese Academy of Sciences, Beijing 100049, China

Funds: Supported by Laboratory Fund of Key Laboratory of Electronics and Information Technology for Space Systems, CAS (No. Y42613A32S)

More Information

Corresponding author: jiangxj@nssc.ac.cn

摘要

摘要: 在复杂背景下的光学遥感图像目标检测中，为了提高检测精度，同时降低检测网络复杂度，提出了面向光学遥感目标的全局上下文检测模型。首先，采用结构简单的特征编码-特征解码网络进行特征提取。其次，为提高对多尺度目标的定位能力，采取全局上下文特征与目标中心点局部特征相结合的方式生成高分辨率热点图，并利用全局特征实现目标的预分类。最后，提出不同尺度的定位损失函数，用于增强模型的回归能力。实验结果表明: 当使用主干网络Root-ResNet18时，本文模型在公开遥感数据集NWPU VHR-10上的检测精度可达97.6%AP⁵⁰和83.4%AP⁷⁵，检测速度达16 PFS，基本满足设计需求，实现了网络速度和精度的有效平衡，便于后续算法在移动设备端的移植和应用。
- 计算机视觉 /
- 目标检测 /
- 遥感图像 /
- 特征编码-特征解码 /
- 全局上下文特征
Abstract: To improve the detection accuracy and reduce the complexity of optical remote sensing of target images with a complex background, a global context detection model based on optical remote sensing of targets is proposed. First, a feature encoder-feature decoder network is used for feature extraction. Then, to improve the positioning ability of multi-scale targets, a method that combines global-contextual features and target center local features is used to generate high-resolution heat maps. The global features are used to achieve the pre-classification of targets. Finally, a positioning loss function at different scales is proposed to enhance the regression ability of the model. Experimental results show that the mean average precision of the proposed model reaches 97.6% AP⁵⁰ and 83.4% AP⁷⁵ on the NWPU VHR-10 public remote sensing data set, and the speed reaches 16 PFS. This design can achieve an effective balance between accuracy and speed. It facilitates subsequent porting and application of the algorithm on the mobile device side, which meets design requirements.
- computer vision /
- object detection /
- remote sensing image /
- feature encoder-feature decoder /
- global-contextual feature

HTML全文

图 1 特征编码-特征解码网络架构

Figure 1. Framework of the feature encoder-feature decoder network

下载: 全尺寸图片幻灯片

图 2 全局上下文检测模型总体架构

Figure 2. Overall framework of the global-contextual detection model

下载: 全尺寸图片幻灯片

图 3 普通卷积采样和变形卷积采样示意图

Figure 3. Sampling diagrams in standard convolution and deformation convolution

下载: 全尺寸图片幻灯片

图 4 全局上下文特征提取流程

Figure 4. Flow chart of global-contextual feature extraction

下载: 全尺寸图片幻灯片

图 5 （a）添加目标框的原图及（b）高斯椭圆掩模示意图

Figure 5. (a) Original image with a target box and (b) schematic diagram of gaussian elliptical mask

下载: 全尺寸图片幻灯片

图 6 不同γ值对结果的影响

Figure 6. Effects of different γ values on results

下载: 全尺寸图片幻灯片

图 7 不同ν值对结果的影响

Figure 7. Effects of different ν values on results

下载: 全尺寸图片幻灯片

图 8 GCDN的可视化检测效果图

Figure 8. Visual detection results of the GCDN

下载: 全尺寸图片幻灯片

表 1 ResNet18与Root-ResNet18结构

Table 1. Structures of ResNet18 and Root-ResNet18

阶段	输出尺寸	ResNet18	Root-ResNet18
C1	128×128	7×7, 64	3×3, 64
			3×3, 64
			3×3, 64
			3×3, MaxPool
			$\left[ {\begin{array}{*{20}{c}} {3 \times 3,64} \\ {3 \times 3,64} \end{array}} \right]\times 2$
C2	64×64	3×3, MaxPool	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,128} \\ {3 \times 3,128} \end{array}} \right]\times 2$
C2	64×64	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,64} \\ {3 \times 3,64} \end{array}} \right]\times 2$
C3	32×32	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,128} \\ {3 \times 3,128} \end{array}} \right]\times 2$	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,256} \\ {3 \times 3,256} \end{array}} \right]\times 2$
C4	16×16	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,256} \\ {3 \times 3,256} \end{array}} \right]\times 2$	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,512} \\ {3 \times 3,512} \end{array}} \right]\times 2$
C5	8×8	$\left[ {\begin{array}{*{20}{c}} {3 \times 3,512} \\ {3 \times 3,512} \end{array}} \right]\times 2$	—

下载: 导出CSV

表 2 数据集NWPU VHR-10目标尺寸统计表

Table 2. Statistics of target sizes in the NWPU VHR-10 dataset

尺度(pixel)	0−10	10−40	40−100	100−300	300−500	500以上
宽	0	0.1327	0.6948	0.1599	0.0126	0
高	0	0.1448	0.7205	0.1242	0.0105	0

下载: 导出CSV

表 3 不同模型在数据集NWPU VHR-10上的平均精确度对比

Table 3. Comparison of mean average precisions of different models in the NWPU VHR-10 dataset

模型	主干网络	飞机	船舰	油罐	棒球场	网球场	篮球场	田径场	港口	桥梁	车辆	平均精确度(mAP)
RICAOD	AlexNet	0.9970	0.9080	0.9061	0.9291	0.9029	0.8031	0.9081	0.8029	0.6853	0.8714	0.8712
SSD	VGG16	0.9839	0.8993	0.8918	0.9851	0.8791	0.8481	0.9949	0.7730	0.7828	0.8739	0.8912
YOLOv3	Darknet53	0.9091	0.9091	0.9081	0.9913	0.9086	0.9091	0.9947	0.9005	0.9091	0.9035	0.9243
MMDFN	VGG16	0.9934	0.9227	0.9918	0.9668	0.9632	0.9756	1.0000	0.9740	0.8027	0.9136	0.9504
MSDN	ResNet50	0.9976	0.9721	0.8383	0.9909	0.9734	0.9991	0.9868	0.9719	0.9267	0.9010	0.9558
MSCNN	ResNet50	0.9940	0.9530	0.9180	0.9630	0.9540	0.9670	0.9930	0.9550	0.9720	0.9330	0.9600
GCDN	Root-ResNet18	0.9991	0.9983	0.9677	0.9916	0.9991	0.9759	0.9988	0.9412	0.9224	0.9636	0.9757

下载: 导出CSV

表 4 检测阈值0.75下的不同模型平均精确度对比

Table 4. Comparison of mean-average precision of different models under the 0.75 detection threshold

模型	平均准确率 (AP)			平均准确率(AP)($I_{ {\rm{{ou} } } }$=0.50:0.95)				平均召回率(AR)($I_{ {\rm{{ou} } } }$=0.50:0.95)
模型	$I_{ {\rm{{ou} } } }$=0.50:0.95	$I_{ {\rm{{ou} } } }$=0.50	$I_{ {\rm{{ou} } } }$=0.75		小目标	中目标	大目标		小目标	中目标	大目标
CenterNet	0.663	0.968	0.768		0.576	0.669	0.704		0.630	0.725	0.741
MSCNN	0.706	0.960	0.824		0.547	0.578	0.701		0.600	0.605	0.700
GCDN-woGC	0.679	0.973	0.775		0.572	0.690	0.750		0.639	0.744	0.786
GCDN	0.705	0.976	0.834		0.612	0.727	0.706		0.683	0.773	0.740

下载: 导出CSV

表 5 不同模型的平均检测时间对比

Table 5. Comparison of the average detection times with different models

模型	输入尺寸	时间/s
RICAOD	400×400	2.89
MMDFN	400×400	0.75
YOLOv3	640×640	0.13
MSDN	600×800	0.21
GCDN	640×640	0.06

下载: 导出CSV

表 6 不同模型在数据集DOTA上的平均精确度对比

Table 6. Comparison of the mean-average precisions with different models in the DOTA dataset

模型	主干网络	平均检测精度(mAP)
YOLOv2	GoogleNet	39.20
R-FCN	ResNet101	52.58
Faster-RCNN	ResNet101	60.46
SCFPN-scf	ResNet101	75.22
GCDN	Root-ResNet18	75.95

下载: 导出CSV

参考文献(30)

[1]	许夙晖, 慕晓冬, 柯冰, 等. 基于遥感影像的军事阵地动态监测技术研究[J]. 遥感技术与应用,2014,29(3):511-516. XU S H, MU X D, KE B, et al. Dynamic monitoring of military position based on remote sensing image[J]. Remote Sensing Technology and Application, 2014, 29(3): 511-516. (in Chinese)
[2]	VALERO S, CHANUSSOT J, BENEDIKTSSON J A, et al. Advanced directional mathematical morphology for the detection of the road network in very high resolution remote sensing images[J]. Pattern Recognition Letters, 2010, 31(10): 1120-1127. doi: 10.1016/j.patrec.2009.12.018
[3]	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]. Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2005: 886-893.
[4]	LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. doi: 10.1023/B:VISI.0000029664.99615.94
[5]	LIU W, ANGUELOV D, ERHAN D, et al.. SSD: single shot multibox detector[C]. Proceedings of the 14th European Conference on Computer Vision, Springer, 2016: 21-37.
[6]	REDMON J, FARHADI A. Yolov3: an incremental improvement[J]. arXiv: 1804.02767, 2018.
[7]	马永杰, 宋晓凤. 基于YOLO和嵌入式系统的车流量检测[J]. 液晶与显示,2019,34(6):613-618. doi: 10.3788/YJYXS20193406.0613 MA Y J, SONG X F. Vehicle flow detection based on YOLO and embedded system[J]. Chinese Journal of Liquid Crystals and Displays, 2019, 34(6): 613-618. (in Chinese) doi: 10.3788/YJYXS20193406.0613
[8]	LIN T Y, GOYAL P, GIRSHICK R, et al.. Focal loss for dense object detection[C]. Proceedings of 2017 IEEE International Conference on Computer Vision, IEEE, 2017: 2999-3007.
[9]	LAW H, DENG J. Cornernet: detecting objects as paired keypoints[C]. Proceedings of the 15th European Conference on Computer Vision, Springer, 2018: 765-781.
[10]	ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[J]. arXiv: 1904.07850, 2019.
[11]	XIAO B, WU H P, WEI Y CH. Simple baselines for human pose estimation and tracking[C]. Proceedings of the 15th European Conference on Computer Vision, Springer, 2018: 472-487.
[12]	LI K, CHENG G, BU SH H, et al. Rotation-insensitive and context-augmented object detection in remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(4): 2337-2348. doi: 10.1109/TGRS.2017.2778300
[13]	MA W P, GUO Q Q, WU Y, et al. A novel multi-model decision fusion network for object detection in remote sensing images[J]. Remote Sensing, 2019, 11(7): 737. doi: 10.3390/rs11070737
[14]	梁华, 宋玉龙, 钱锋, 等. 基于深度学习的航空对地小目标检测[J]. 液晶与显示,2018,33(9):793-800. doi: 10.3788/YJYXS20183309.0793 LIANG H, SONG Y L, QIAN F, et al. Detection of small target in aerial photography based on deep learning[J]. Chinese Journal of Liquid Crystals and Displays, 2018, 33(9): 793-800. (in Chinese) doi: 10.3788/YJYXS20183309.0793
[15]	姚群力, 胡显, 雷宏. 基于多尺度卷积神经网络的遥感目标检测研究[J]. 光学学报,2019,39(11):1128002. doi: 10.3788/AOS201939.1128002 YAO Q L, HU X, LEI H. Object detection in remote sensing images using multiscale convolutional neural networks[J]. Acta Optica Sinica, 2019, 39(11): 1128002. (in Chinese) doi: 10.3788/AOS201939.1128002
[16]	邓志鹏, 孙浩, 雷琳, 等. 基于多尺度形变特征卷积网络的高分辨率遥感影像目标检测[J]. 测绘学报,2018,47(9):1216-1227. DENG ZH P, SUN H, LEI L, et al. Object detection in remote sensing imagery with multi-scale deformable convolutional networks[J]. Acta Geodaetica et Cartographica Sinica, 2018, 47(9): 1216-1227. (in Chinese)
[17]	董潇潇, 何小海, 吴晓红, 等. 基于注意力掩模融合的目标检测算法[J]. 液晶与显示,2019,34(8):825-833. doi: 10.3788/YJYXS20193408.0825 DONG X X, HE X H, WU X H, et al. Object detection algorithm based on attention mask fusion[J]. Chinese Journal of Liquid Crystals and Displays, 2019, 34(8): 825-833. (in Chinese) doi: 10.3788/YJYXS20193408.0825
[18]	WANG CH, BAI X, WANG SH, et al. Multiscale visual attention networks for object detection in VHR remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(2): 310-314. doi: 10.1109/LGRS.2018.2872355
[19]	左俊皓, 赵聪, 朱晓龙, 等. Faster-RCNN和Level-Set结合的高分遥感影像建筑物提取[J]. 液晶与显示,2019,34(4):439-447. doi: 10.3788/YJYXS20193404.0439 ZUO J H, ZHAO C, ZHU X L, et al. High-resolution remote sensing image building extraction combined with Faster-RCNN and Level-Set[J]. Chinese Journal of Liquid Crystals and Displays, 2019, 34(4): 439-447. (in Chinese) doi: 10.3788/YJYXS20193404.0439
[20]	于渊博, 张涛, 郭立红, 等. 卫星视频运动目标检测算法[J]. 液晶与显示,2017,32(2):138-143. doi: 10.3788/YJYXS20173202.0138 YU Y B, ZHANG T, GUO L H, et al. Moving objects detection on satellite video[J]. Chinese Journal of Liquid Crystals and Displays, 2017, 32(2): 138-143. (in Chinese) doi: 10.3788/YJYXS20173202.0138
[21]	LIU W, RABINOVICH A, BERG A C. Parsenet: looking wider to see better[J]. arXiv: 1506.04579, 2015.
[22]	ZHANG H, DANA K, SHI J P, et al.. Context encoding for semantic segmentation[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018: 7151-7160.
[23]	HE K M, ZHANG X Y, REN SH Q, et al.. Deep residual learning for image recognition[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016: 770-778.
[24]	ZHU R, ZHANG SH F, WANG X B, et al.. ScratchDet: training single-shot object detectors from scratch[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2019: 2263-2272.
[25]	LIN M, CHEN Q, YAN SH CH. Network in network[J]. arXiv: 1312.4400, 2013.
[26]	CHENG G, ZHOU P CH, HAN J W. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7405-7415. doi: 10.1109/TGRS.2016.2601622
[27]	XIA G S, BAI X, DING J, et al.. DOTA: a large-scale dataset for object detection in aerial images[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018: 3974-3983.
[28]	ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]. Proceedings of the 13th European Conference on Computer Vision, Springer, 2014: 818-833.
[29]	CHEN K, WANG J Q, PANG J M, et al.. MMDetection: open MMLab detection toolbox and benchmark[J]. arXiv: 1906.07155v1, 2019.
[30]	CHEN CH Y, GONG W G, CHEN Y L, et al. Object detection in remote sensing images based on a scene-contextual feature pyramid network[J]. Remote Sensing, 2019, 11(3): 339. doi: 10.3390/rs11030339