留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于轻型自限制注意力的结构光相位及深度估计混合网络

朱新军 赵浩淼 王红一 宋丽梅 孙瑞群

朱新军, 赵浩淼, 王红一, 宋丽梅, 孙瑞群. 基于轻型自限制注意力的结构光相位及深度估计混合网络[J]. 中国光学(中英文), 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066
引用本文: 朱新军, 赵浩淼, 王红一, 宋丽梅, 孙瑞群. 基于轻型自限制注意力的结构光相位及深度估计混合网络[J]. 中国光学(中英文), 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066
ZHU Xin-jun, ZHAO Hao-miao, WANG Hong-yi, SONG Li-mei, SUN Rui-qun. A hybrid network based on light self-limited attention for structured light phase and depth estimation[J]. Chinese Optics, 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066
Citation: ZHU Xin-jun, ZHAO Hao-miao, WANG Hong-yi, SONG Li-mei, SUN Rui-qun. A hybrid network based on light self-limited attention for structured light phase and depth estimation[J]. Chinese Optics, 2024, 17(1): 118-127. doi: 10.37188/CO.2023-0066

基于轻型自限制注意力的结构光相位及深度估计混合网络

基金项目: 国家自然科学基金 (No. 61905178);天津市教委科研计划项目 (No. 2019KJ021)
详细信息
    作者简介:

    朱新军(1985—),男,山东临沂人,博士,副教授,硕士生导师,2008年于临沂师范学院获得学士学位, 2011年于山东理工大学获得硕士学位,2015年于天津大学获得博士学位,主要从事光学三维测量与智能计算成像的研究。E-mail:xinjunzhu@tiangong.edu.cn

  • 中图分类号: TP394.1;TH691.9

A hybrid network based on light self-limited attention for structured light phase and depth estimation

Funds: Supported by National Natural Science Foundation of China (No. 61905178); Science & Technology Development Fund of Tianjin Education Commission for Higher Education (No. 2019KJ021)
More Information
    Corresponding author: xinjunzhu@tiangong.edu.cn
  • 摘要:

    相位提取与深度估计是结构光三维测量中的重点环节,目前传统方法在结构光相位提取与深度估计方面存在效率不高、结果不够鲁棒等问题。为了提高深度学习结构光的重建效果,本文提出了一种基于轻型自限制注意力(Light Self-Limited-Attention,LSLA)的结构光相位及深度估计混合网络,即构建一种CNN-Transformer的混合模块,并将构建的混合模块放入U型架构中,实现CNN与Transformer的优势互补。将所提出的网络在结构光相位估计和结构光深度估计两个任务上进行实验,并和其他网络进行对比。实验结果表明:相比其他网络,本文所提出的网络在相位估计和深度估计的细节处理上更加精细,在结构光相位估计实验中,精度最高提升31%;在结构光深度估计实验中,精度最高提升26%。该方法提高了深度神经网络在结构光相位估计及深度估计的准确性。

     

  • 图 1  FPP系统原理图

    Figure 1.  Schematic diagram of the FPP system

    图 2  网络结构图

    Figure 2.  Network structure diagram

    图 3  CNN-Transformer模块结构图

    Figure 3.  Structure of the CNN-Transformer module

    图 4  部分数据示例图。第一行为仿真数据,第二行为真实数据。(a)仿真条纹图;(b)仿真条纹图D;(c)仿真条纹图M;(d)仿真条纹图包裹相位;(e)真实条纹图;(f)真实条纹图D;(g)真实条纹图M;(h)真实条纹图包裹相位

    Figure 4.  Sample maps in some datasets. The first lines are simulation data, the second lines are real data. (a) Simulation fringe map; (b) simulation fringe map D; (c) simulation fringe map M; (d) simulation fringe wrapped phase; (e) real fringe map; (f) real fringe map D; (g) real fringe map M; (h) real fringe wrapped phase

    图 5  不同网络仿真和真实数据包裹相位对比。蓝色框为仿真数据,橙色框为真实数据。(a)UNet;(b)DPH;(c)R2UNet;(d)SUNet;(e)Ours;(f)标签

    Figure 5.  Comparison of different network simulation and real data wrapped phases. The blue boxes are the simulation data, and the orange boxes are the real data. (a) UNet; (b) DPH; (c) R2UNet; (d) SUNet; (e) Ours; (f) Label

    图 6  包裹相位结果曲线图。(a)仿真数据结果比较;(b)真实数据结果比较

    Figure 6.  Wrapped phase curves.(a) Comparison of simulation data; (b) comparison of real data

    图 7  生成数据集流程图。(a) 模型导入;(b) 调整大小;(c) 投影条纹

    Figure 7.  Flowchart of dataset generation. (a) Model import; (b) adjust of the model size; (c) projection fringe

    图 8  部分数据示例图。(a)仿真条纹图;(b)真实条纹图;(c)仿真深度图;(d)真实深度图

    Figure 8.  Sample maps in the dataset. (a) Simulated fringe map; (b) real fringe map; (c) simulation depth map; (d) real depth map

    图 9  不同方法深度估计视觉结果比较。蓝色框为仿真数据,橙色框为真实数据。(a) 输入数据; (b) UNet;(c) DPH;(d) R2UNet;(e) Ours;(f)标签

    Figure 9.  Comparison of the visual results of depth estimation by different methods. The blue boxes are the simulation data, and the orange boxes are the real data. (a) Input data; (b) UNet; (c) DPH; (d) R2UNet; (e) Ours; (f) Label

    表  1  不同包裹相位计算方法比较

    Table  1.   Comparison of the different wrapped phase calculation methods

    MSE 时间t/s
    直接预测包裹相位 0.2833 5.89
    分别预测DM 0.1739 11.7
    同时预测 DM 0.16806 7.54
    下载: 导出CSV

    表  2  包裹相位预测方法比较

    Table  2.   Comparison of the wrapped phase prediction methods

    仿真数据 真实数据
    MSE 时间t/s MSE 时间t/s
    UNet 0.02658 6.67 0.16806 7.54
    DPH 0.02710 11.65 0.12974 11.78
    R2UNet 0.02734 13.69 0.12905 14.30
    SUNet 0.02717 7.95 0.14350 8.29
    Ours 0.02395 11.06 0.11622 11.67
    下载: 导出CSV

    表  3  消融实验结果比较

    Table  3.   Comparison of ablation experiment results

    MSE 时间t/s
    CMT 11.32 6.89
    CMT替换LSLA 9.17 6.45
    CMT替换FFN 11.34 5.54
    CMT+U形结构 8.94 9.68
    下载: 导出CSV

    表  4  不同方法深度估计结果比较

    Table  4.   Comparison of the depth estimation results by different methods

    仿真数据真实数据
    MSE时间t/sMSE时间t/s
    Unet8.785.989.976.44
    DPH8.038.669.8610.59
    R2UNet7.578.738.7210.92
    Ours6.438.097.648.44
    下载: 导出CSV
  • [1] 左超, 张晓磊, 胡岩, 等. 3D真的来了吗?—三维结构光传感器漫谈[J]. 红外与激光工程,2020,49(3):0303001. doi: 10.3788/IRLA202049.0303001

    ZUO CH, ZHANG X L, HU Y, et al. Has 3D finally come of age?——An introduction to 3D structured-light sensor[J]. Infrared and Laser Engineering, 2020, 49(3): 0303001. (in Chinese) doi: 10.3788/IRLA202049.0303001
    [2] 王永红, 张倩, 胡寅, 等. 显微条纹投影小视场三维表面成像技术综述[J]. 中国光学,2021,14(3):447-457. doi: 10.37188/CO.2020-0199

    WANG Y H, ZHANG Q, HU Y, et al. 3D small-field surface imaging based on microscopic fringe projection profilometry: a review[J]. Chinese Optics, 2021, 14(3): 447-457. (in Chinese) doi: 10.37188/CO.2020-0199
    [3] 冯世杰, 左超, 尹维, 等. 深度学习技术在条纹投影三维成像中的应用[J]. 红外与激光工程,2020,49(3):0303018. doi: 10.3788/IRLA202049.0303018

    FENG SH J, ZUO CH, YIN W, et al. Application of deep learning technology to fringe projection 3D imaging[J]. Infrared and Laser Engineering, 2020, 49(3): 0303018. (in Chinese) doi: 10.3788/IRLA202049.0303018
    [4] SU X Y, CHEN W J. Fourier transform profilometry: a review[J]. Optics and Lasers in Engineering, 2001, 35(5): 263-284. doi: 10.1016/S0143-8166(01)00023-9
    [5] ZHENG D L, DA F P, KEMAO Q, et al. Phase-shifting profilometry combined with Gray-code patterns projection: unwrapping error removal by an adaptive median filter[J]. Optics Express, 2017, 25(5): 4700-4713. doi: 10.1364/OE.25.004700
    [6] AN Y T, HYUN J S, ZHANG S. Pixel-wise absolute phase unwrapping using geometric constraints of structured light system[J]. Optics Express, 2016, 24(16): 18445-18459. doi: 10.1364/OE.24.018445
    [7] GHIGLIA D C, ROMERO L A. Robust two-dimensional weighted and unweighted phase unwrapping that uses fast transforms and iterative methods[J]. Journal of the Optical Society of America A, 1994, 11(1): 107-117. doi: 10.1364/JOSAA.11.000107
    [8] FENG SH J, CHEN Q, GU G H, et al. Fringe pattern analysis using deep learning[J]. Advanced Photonics, 2019, 1(2): 025001.
    [9] NGUYEN H, WANG Y Z, WANG ZH Y. Single-shot 3D shape reconstruction using structured light and deep convolutional neural networks[J]. Sensors, 2020, 20(13): 3718. doi: 10.3390/s20133718
    [10] VAN D J S, DIRCKX J J J. Deep neural networks for single shot structured light profilometry[J]. Optics Express, 2019, 27(12): 17091-17101. doi: 10.1364/OE.27.017091
    [11] 张钊, 韩博文, 于浩天, 等. 多阶段深度学习单帧条纹投影三维测量方法[J]. 红外与激光工程,2020,49(6):20200023. doi: 10.3788/irla.12_2020-0023

    ZHANG ZH, HAN B W, YU H T, et al. Multi-stage deep learning based single-frame fringe projection 3D measurement method[J]. Infrared and Laser Engineering, 2020, 49(6): 20200023. (in Chinese) doi: 10.3788/irla.12_2020-0023
    [12] RANFTL R, BOCHKOVSKIY A, KOLTUN V. Vision transformers for dense prediction[C]. Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, 2021.
    [13] YANG G L, TANG H, DING M L, et al. Transformer-based attention networks for continuous pixel-wise prediction[C]. Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, IEEE, 2021.
    [14] QI F, ZHAI J Z, DANG G H. Building height estimation using Google Earth[J]. Energy and Buildings, 2016, 118: 123-132. doi: 10.1016/j.enbuild.2016.02.044
    [15] ZHU X J, HAN ZH Q, YUAN M K, et al. Hformer: hybrid CNN-transformer for fringe order prediction in phase unwrapping of fringe projection[J]. Optical Engineering, 2022, 61(9): 093107.
    [16] GENG J. Structured-light 3D surface imaging: a tutorial[J]. Advances in Optics and Photonics, 2011, 3(2): 128-160. doi: 10.1364/AOP.3.000128
    [17] GUO J Y, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers[C]. Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2022.
    [18] CHEN ZH ZH, HANG W, ZHAO Y X. ViT-LSLA: Vision Transformer with Light Self-Limited-Attention[J]. arXiv:2210.17115.
    [19] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015.
    [20] WANG L, LU D Q, QIU R W, et al. 3D reconstruction from structured-light profilometry with dual-path hybrid network[J]. EURASIP Journal on Advances in Signal Processing, 2022, 2022(1): 14. doi: 10.1186/s13634-022-00848-5
    [21] 袁梦凯, 朱新军, 侯林鹏. 基于R2U-Net的单帧投影条纹图深度估计[J]. 激光与光电子学进展,2022,59(16):1610001.

    YUAN M K, ZHU X J, HOU L P. Depth estimation from single-frame fringe projection patterns based on R2U-Net[J]. Laser & Optoelectronics Progress, 2022, 59(16): 1610001. (in Chinese)
    [22] FAN CH M, LIU T J, LIU K H. SUNet: swin transformer UNet for image denoising[C]. Proceedings of 2022 IEEE International Symposium on Circuits and Systems, IEEE, 2022.
    [23] ZHU X J, ZHANG ZH ZH, HOU L P, et al. Light field structured light projection data generation with Blender[C]. Proceedings of 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications, IEEE, 2022.
  • 加载中
图(9) / 表(4)
计量
  • 文章访问数:  507
  • HTML全文浏览量:  201
  • PDF下载量:  200
  • 被引次数: 0
出版历程
  • 收稿日期:  2023-04-14
  • 修回日期:  2023-05-15
  • 网络出版日期:  2023-09-18

目录

    /

    返回文章
    返回