Visual object detection is an important component of several applications in automated agriculture. In this paper, we examine how to properly apply modern deep networks to detection tasks in agricultural contexts, benchmark their performance, and compare their accuracy to human performance. Seven diverse datasets were collected for the benchmark, and three recent networks were tested. Experiments revealed that small objects and large scale variance are major failure points. We therefore developed a multiple-resolution approach to network usage, which significantly increased detection accuracy on most datasets. Detection results were compared to human accuracy, which was estimated from the consistency among multiple annotators. Quantitative analysis shows that for large, unoccluded objects, both algorithms and humans achieve near-perfect accuracy. It also quantifies how both degrade under occlusion and scale difficulties. Finally, we propose application-specific accuracy metrics based on the needs of several agricultural tasks and use them to identify the best-performing detectors.
- Deep networks
- Multiple resolution processing
- Object detection