On-chip training and its noise immunity for hybrid optical-electronic optical convolutional neural networks
Abstract: The hybrid optical-electronic optical convolutional neural network (OCNN) combines the parallel linear-computation capability of photonic devices with the nonlinear-processing advantages of electronic components, showing significant potential in classification tasks. However, fabrication inaccuracies of the photonic devices and circuit noise in the FPGA that executes backpropagation notably degrade network performance. In this work, a hybrid optical-electronic OCNN is constructed in which the linear computations are performed by optical computing layers based on Mach-Zehnder interferometers (MZIs), while the pooling operations and the training process are implemented on an FPGA. The study focuses on the on-chip training scheme on the FPGA, analyzes the impact of noise on training performance, and proposes network optimization strategies to enhance the noise immunity of the OCNN. Specifically, the noise immunity is improved by adjusting the pooling method and pooling size, and Dropout regularization is introduced after the pooling layer to further improve the model's recognition accuracy. Experimental results indicate that the adopted on-chip training scheme effectively corrects errors caused by fabrication inaccuracies in the photonic devices, but circuit noise remains the primary factor limiting OCNN performance. Notably, under high circuit noise, e.g. when the standard deviation of the MZI phase error caused by circuit noise reaches 0.003, the combination of max pooling and Dropout regularization significantly improves the test accuracy of the OCNN, to a maximum of 78%. This work provides a valuable reference for implementing on-chip training of OCNNs and offers new ideas for the practical application of hybrid optical-electronic architectures in high-noise environments.
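As a rough illustration of how a circuit-noise-induced phase error of this magnitude perturbs a single MZI, the sketch below models one MZI as a Clements-style 2×2 unitary and injects zero-mean Gaussian noise of standard deviation 0.003 into its two phase settings. This is a minimal sketch under assumed conventions; the function names `mzi` and `phase_noise_deviation` are hypothetical, not from the paper.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 unitary of a single Mach-Zehnder interferometer in a
    Clements-style convention: an external phase shifter phi followed
    by two 50:50 couplers enclosing an internal phase shifter theta."""
    return 1j * np.exp(1j * theta / 2) * np.array([
        [np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
        [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)],
    ])

def phase_noise_deviation(theta, phi, sigma, trials=2000, seed=0):
    """Mean Frobenius-norm deviation of the MZI unitary when both phase
    settings are perturbed by zero-mean Gaussian noise with std sigma,
    mimicking the circuit-noise-induced phase error discussed above."""
    rng = np.random.default_rng(seed)
    ideal = mzi(theta, phi)
    devs = [np.linalg.norm(mzi(theta + rng.normal(0.0, sigma),
                               phi + rng.normal(0.0, sigma)) - ideal)
            for _ in range(trials)]
    return float(np.mean(devs))

print(phase_noise_deviation(1.0, 0.5, 0.003))
```

A single MZI deviates only slightly at this noise level; in a full mesh the per-device errors compound across layers, which is why the paper treats circuit noise as the dominant performance limit.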
Key words:
- OCNN
- on-chip training
- Mach-Zehnder interferometer
Table 1. Model parameters of OCNN

| Layer | Filter size | Input size | Output size |
| --- | --- | --- | --- |
| Convolutional layer (same padding) | 8×3×3 | 28×28×1 | 28×28×8 |
| Pooling layer | N×N | 28×28×8 | $ \dfrac{{28}}{N} \times \dfrac{{28}}{N} \times 8 $ |
| Fully connected layer | $ \dfrac{{28}}{N} \times \dfrac{{28}}{N} \times 8 $ | $ \dfrac{{28}}{N} \times \dfrac{{28}}{N} \times 8 $ | 10 |
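To make the shapes in Table 1 concrete, the short sketch below derives the trainable-parameter shapes for a given pooling size N. The function name `ocnn_param_shapes` is hypothetical, and flooring the 28/N sides for non-divisible N is an assumption, not stated in the paper.

```python
def ocnn_param_shapes(pool_n):
    """Trainable-parameter shapes implied by Table 1 (biases omitted).
    Each side of the pooled feature map is taken as floor(28 / pool_n),
    an assumed convention for non-divisible N."""
    side = 28 // pool_n
    return {
        "conv": (8, 3, 3, 1),         # 8 filters of 3x3 on a 1-channel input
        "fc": (side * side * 8, 10),  # flattened pooled map -> 10 classes
    }

print(ocnn_param_shapes(4))
```

For example, 4×4 pooling gives 7×7×8 pooled maps and a 392×10 fully connected weight matrix, so the pooling size directly controls how many weights the FPGA must train.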
Table 2. Test accuracies (%) of the OCNN with different pooling methods and pooled feature-map sizes

| Pooling method | 8×7×7 | 8×5×5 | 8×3×3 |
| --- | --- | --- | --- |
| Average pooling | 39.36 | 68.52 | 63.84 |
| Max pooling | 52.80 | 64.01 | 72.27 |
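The two pooling methods compared in Table 2 can be sketched with plain numpy as below. This is a minimal illustration, not the paper's FPGA implementation; the function name `pool` is hypothetical, and truncating edges when 28 is not divisible by N is one plausible reading of how the 8×5×5 and 8×3×3 map sizes arise.

```python
import numpy as np

def pool(x, n, mode="max"):
    """Non-overlapping n x n pooling over an (H, W, C) feature map.
    Edges are truncated when H or W is not divisible by n (an assumption;
    e.g. 28x28 with n = 5 would then yield the 5x5 maps of Table 2)."""
    h, w, c = x.shape
    # Trim to a multiple of n, then split into n x n blocks per channel.
    blocks = x[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n, c)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))
```

Max pooling keeps only the strongest activation in each window, which tends to preserve salient features under additive noise, consistent with its better accuracy at the smallest map size in Table 2.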