Abstract: Image semantic segmentation is an important computer vision technique with wide application in autonomous driving, medical image analysis, smart homes, and security surveillance. In recent years, deep-learning-based approaches to semantic segmentation have attracted extensive attention and research. However, deep learning models are prone to overfitting and tend to mispredict on images containing occlusion or noise, which lowers segmentation accuracy. To address this problem, this paper proposes a U2-Net image semantic segmentation optimization method that incorporates an attention mechanism: a CBAM attention module is added to a U2-Net model that uses VGG as its backbone network, so that the network attends more closely to regions relevant to the segmentation task and suppresses irrelevant or noisy regions. This strengthens the feature-map representation and thereby improves the model's performance and generalization ability. Experimental results show that adding the CBAM module raises the MIoU and accuracy of the U2-Net model by 8.21% and 4%, respectively.
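To make the role of the CBAM module more concrete, the following is a minimal sketch of a CBAM forward pass: channel attention followed by spatial attention, as described in the original CBAM paper. It is written in plain Python on nested lists for readability and is not the authors' implementation. The function names, the random weights, the reduction ratio of 2, and the tiny 4×5×5 feature map are all assumptions made for illustration, biases are omitted, and the abstract does not say where in U2-Net the module is inserted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w1, w2):
    """Shared two-layer MLP: C -> C // r (ReLU) -> C, biases omitted."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """One weight in (0, 1) per channel, from global avg- and max-pooling."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    summed = zip(mlp(avg, w1, w2), mlp(mx, w1, w2))
    return [sigmoid(a + m) for a, m in summed]


def spatial_attention(feat, kernel):
    """One weight in (0, 1) per pixel, from channel-wise avg and max maps."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])  # kernel shape: 2 x k x k (k = 7 in the CBAM paper)
    pad = k // 2
    avg_map = [[sum(feat[n][i][j] for n in range(c)) / c for j in range(w)]
               for i in range(h)]
    max_map = [[max(feat[n][i][j] for n in range(c)) for j in range(w)]
               for i in range(h)]
    pooled = [avg_map, max_map]
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a C x H x W map."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[n] * v for v in row] for row in ch]
               for n, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in refined]


# Hypothetical sizes and random weights, chosen only for this demonstration.
random.seed(0)
C, H, W, R, K = 4, 5, 5, 2, 7
feat = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(C // R)] for _ in range(C)]
kernel = [[[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(K)]
          for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

The output has the same shape as the input, and because both attention maps lie strictly between 0 and 1, every activation is scaled down by a learned amount rather than replaced. In a trained network the weights would be learned so that task-relevant channels and locations keep more of their magnitude than noisy or irrelevant ones.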
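Basic information:
CLC number: TP391.41
Citation:
[1] LIU Shuai, DENG Xiaobing, YANG Huoxiang, et al. U²-Net image semantic segmentation based on attention mechanism[J]. Journal of Shenzhen Institute of Information Technology, 2023, 21(05): 1-8.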
2023-10-28