Multi-Class on-Tree Peach Detection Using Improved YOLOv5s and Multi-Modal Images

Qing Luo, Yuan Rao, Xiu Jin, Zhaohui Jiang, Tan Wang, Fengyi Wang, Wu Zhang

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Accurate peach detection is a prerequisite for automated agronomic management, e.g., mechanical peach harvesting. However, due to uneven illumination and ubiquitous occlusion, it is challenging to detect peaches in orchards, especially when they are bagged. To this end, this paper proposes an accurate multi-class peach detection method for mechanical harvesting that improves YOLOv5s and exploits multi-modal visual data. An RGB-D dataset with multi-class annotations of naked and bagged peaches was constructed, comprising 4127 multi-modal images of pixel-aligned color, depth, and infrared data acquired with a consumer-level RGB-D camera. Subsequently, an improved lightweight YOLOv5s (small depth) model was put forward by introducing a direction-aware and position-sensitive attention mechanism, which captures long-range dependencies along one spatial direction while preserving precise positional information along the other, helping the network detect peach targets accurately. Meanwhile, depthwise separable convolution was employed to reduce model computation by decomposing each convolution into a per-channel spatial convolution followed by a pointwise channel-mixing convolution, speeding up training and inference while maintaining accuracy. Comparison experiments demonstrated that the improved YOLOv5s using multi-modal visual data achieved detection mAP of 98.6% and 88.9% on naked and bagged peaches, respectively, with 5.05 M model parameters under complex illumination and severe occlusion, increases of 5.3% and 16.5% over using RGB images alone, and of 2.8% and 6.2% over the original YOLOv5s. Compared with other networks on bagged peach detection, the improved YOLOv5s performed best in terms of mAP, exceeding YOLOX-Nano, PP-YOLO-Tiny, and EfficientDet-D0 by 16.3%, 8.1%, and 4.5%, respectively. In addition, the improved YOLOv5s outperformed the other methods to varying degrees in detecting Fuji apple and Hayward kiwifruit, verifying its effectiveness on different fruit detection tasks. Further investigation revealed the contribution of each imaging modality, as well as of the proposed improvements to YOLOv5s, to the favorable detection results on both naked and bagged peaches in natural orchards. Additionally, on a popular mobile hardware platform, the improved YOLOv5s model performed 19 detections per second on the five-channel multi-modal images, offering real-time peach detection. These promising results demonstrate the potential of the improved YOLOv5s and multi-class-annotated multi-modal visual data to deliver visual intelligence for automated fruit harvesting systems.
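The abstract's description of the attention module, capturing long-range dependencies along one spatial direction while preserving positional information along the other, matches the coordinate attention mechanism, and the lightweight convolution it describes is a standard depthwise separable block. The PyTorch sketch below illustrates both, purely as a reading aid: the class names, reduction ratio, activation choices, and the five-channel (RGB + depth + infrared) stem are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Direction-aware, position-sensitive attention (a sketch).

    Average pooling along each spatial axis yields two 1-D descriptors:
    one encodes long-range context along the width, the other along the
    height, so precise positions in the orthogonal direction are kept.
    """

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * a_h * a_w                                    # reweight features


class DepthwiseSeparableConv(nn.Module):
    """Factorizes a k x k convolution into a per-channel (depthwise) spatial
    convolution plus a 1x1 (pointwise) channel-mixing convolution."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride, k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    # Hypothetical five-channel input: pixel-aligned RGB + depth + infrared.
    x = torch.randn(1, 5, 640, 640)
    stem = DepthwiseSeparableConv(5, 32, k=3, stride=2)  # stem widened to 5 channels
    attn = CoordinateAttention(32)
    print(attn(stem(x)).shape)  # torch.Size([1, 32, 320, 320])
```

The factorization is the source of the reported speed-up: a dense k x k layer costs about c_in * c_out * k^2 multiply-accumulates per spatial position, whereas the separable version costs c_in * k^2 + c_in * c_out, roughly a k^2-fold reduction when channel counts dominate.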

Original language: English
Pages (from-to): 84-104
Number of pages: 21
Journal: Smart Agriculture
Volume: 4
Issue number: 4
DOIs
State: Published - 30 Dec 2022
Externally published: Yes

Keywords

  • deep learning
  • mechanical harvesting
  • multi-class detection
  • multi-modal visual data
  • YOLOv5s

ASJC Scopus subject areas

  • Agronomy and Crop Science
  • Agricultural and Biological Sciences (miscellaneous)
  • Engineering (miscellaneous)
