Recognition of apple targets before fruit thinning by robot based on R-FCN deep convolution neural network
(2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China)
(3. Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China)
【Abstract】Before fruit thinning, factors such as complex backgrounds, varied illumination conditions, foliage occlusion and fruit clustering, and especially the strong similarity between apples and the background, make the recognition of small apple targets very difficult. To solve these problems, we proposed a recognition method based on a region-based fully convolutional network (R-FCN). Firstly, two deep convolutional neural networks, the R-FCN based on ResNet-50 and the R-FCN based on ResNet-101, were studied and analyzed. A comparison of the two frameworks showed that their only difference lay in the ‘conv4’ block: the ‘conv4’ block of the R-FCN based on ResNet-101 had 51 more layers than that of the R-FCN based on ResNet-50, yet the recognition accuracy of the two networks was almost the same. Based on this comparison of frameworks and recognition results, we designed an R-FCN based on ResNet-44 to improve recognition accuracy and simplify the network. The main simplification was applied to the ‘conv4’ block, which in ResNet-44 had six fewer layers than in ResNet-50. The R-FCN based on ResNet-44 consisted of the ResNet-44 fully convolutional network (FCN), a region proposal network (RPN) and a region of interest (RoI) sub-network. The ResNet-44 FCN, the backbone of the R-FCN, extracted image features; the RPN then used these features to generate RoIs. After that, the RoI sub-network used the features extracted by the ResNet-44 FCN and the RoIs generated by the RPN to recognize and locate small apple targets. A total of 3 165 images were captured in an experimental apple orchard of the College of Horticulture, Northwest A&F University, in Yangling, China.
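The layer counts behind this comparison can be verified with simple arithmetic. The sketch below is a hedged illustration: the stage configurations for ResNet-50 and ResNet-101 are the standard ones from He et al. (2016), while the [3, 4, 4, 3] configuration for ResNet-44 is inferred from the statement that its ‘conv4’ block has six fewer layers (i.e. two fewer bottleneck units) than ResNet-50's.

```python
# Each ResNet stage (conv2-conv5) is a stack of 3-layer bottleneck units.
# Total weighted layers = 1 (initial conv) + 3 * sum(units) + 1 (final fc).
def resnet_layers(units_per_stage):
    return 1 + 3 * sum(units_per_stage) + 1

configs = {
    "ResNet-50":  [3, 4, 6, 3],    # standard (He et al., 2016)
    "ResNet-101": [3, 4, 23, 3],   # standard (He et al., 2016)
    "ResNet-44":  [3, 4, 4, 3],    # inferred from the abstract
}

for name, cfg in configs.items():
    print(name, resnet_layers(cfg))

# 'conv4' difference between ResNet-101 and ResNet-50:
# (23 - 6) bottleneck units * 3 layers each = 51 layers, as stated.
```

Under these assumed configurations the formula yields 50, 101 and 44 layers respectively, consistent with the network names.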
After image resizing and manual annotation, 332 images were selected as the test set, including 85 images captured under sunny direct-light conditions, 88 under sunny backlight conditions, 86 under cloudy direct-light conditions and 74 under cloudy backlight conditions; the other 2 833 images were used to train and optimize the network. To enrich the training set, data augmentation was performed, including brightness enhancement and reduction, chroma enhancement and reduction, contrast enhancement and reduction, sharpness enhancement and reduction, and the addition of Gaussian noise. A total of 28 330 images were thus obtained, of which 23 591 randomly selected images formed the training set and the other 4 739 the validation set. After training, the simplified R-FCN based on ResNet-44 was tested on the test set, and the experimental results indicate that the method can be applied effectively to images captured under different illumination conditions. The method can recognize clustered apples, occluded apples, blurred apples, and apples with shadows, strong illumination or weak illumination on their surface. In addition, apples divided into parts by branches or petioles can also be recognized effectively. Overall, the recognition recall rate reaches 85.7%, the recognition accuracy and false recognition rate are 95.2% and 4.9%, respectively, and the average recognition time is 0.187 s per image. To further test the performance of the proposed method, we compared it with three other methods: Faster R-CNN, the R-FCN based on ResNet-50 and the R-FCN based on ResNet-101. The F1 of the proposed method is higher by 16.4, 0.7 and 0.7 percentage points, respectively. The average running time of the proposed method is 0.010 s and 0.041 s shorter than that of the R-FCN based on ResNet-50 and the R-FCN based on ResNet-101, respectively.
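The dataset sizes and the reported scores can be sanity-checked with plain arithmetic. The sketch below assumes (as the abstract implies) that each training image yields nine augmented copies plus the original, and treats the reported recognition accuracy as precision when forming the standard F1 harmonic mean:

```python
# Nine augmentations per training image plus the original -> 10x the data.
augmentations = [
    "brightness+", "brightness-", "chroma+", "chroma-",
    "contrast+", "contrast-", "sharpness+", "sharpness-", "gaussian_noise",
]
train_images = 2833
total = train_images * (len(augmentations) + 1)
print(total)  # 28330 = 23591 training + 4739 validation images

# Standard F1: harmonic mean of precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Reported accuracy 95.2% (taken as precision) and recall 85.7%.
print(round(f1_score(0.952, 0.857), 3))  # about 0.902
```

This suggests an F1 of roughly 90%, which is the quantity the percentage-point comparisons above refer to.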
The proposed method achieves the recognition of small apple targets before fruit thinning, which cannot be realized by traditional methods, and it can also be widely applied to the recognition of other small targets whose features are similar to the background.
【Keywords】image processing; algorithms; image recognition; small apple; target recognition; deep learning; R-FCN
(Translated by LIU T)