Weed Identification in Sugarcane Plantation Through Images Taken from Remotely Piloted Aircraft (RPA) and kNN Classifier

Sugarcane is one of the most important crops in Brazil, the world's largest sugar producer and the second largest ethanol producer. Weeds in a sugarcane plantation can cause losses of up to 90% of production through competition with the crop for light, water and nutrients. Because sugarcane plantations usually occupy large fields, weed control is mostly chemical, which is more practical and cheaper than mechanical control. In chemical control, the dosage and type of herbicide have been determined by sampling, which leads to waste and misapplication, since both the degree of infestation and the species present can vary from one location to another. To avoid unnecessary herbicide use, some studies have identified weeds in satellite images, an approach that covers the whole plantation and so avoids the problems of sample surveying. However, this method depends on a high weed density for reliable pattern recognition, and image quality is affected by clouds. This work proposes a system for weed identification based on pattern recognition in images taken from a Remotely Piloted Aircraft (RPA). The RPA can fly at low altitude, so images can be taken closer to the plants and weeds can be identified even at low infestation levels. In an initial evaluation, the system reached an overall accuracy of 83.1% and a kappa coefficient of 0.775 using the k-Nearest Neighbors (kNN) classifier.


Introduction
Sugarcane is an important tropical crop, widely used for sugar and ethanol production [1], especially in Brazil, the world's largest sugarcane producer [2]. Weeds in a sugarcane plantation can cause losses of up to 90% of production [3]. These plants compete with the crop for nutrients, water and light, hamper harvesting, and can also reduce the longevity of the plantation [4].
Sugarcane plantations occupy large fields, and for this reason weed control is done with herbicides [5]. The herbicide dosage is estimated by sampling, which leads to misapplication because the growth of the various weed species varies spatially across the plantation.
To achieve better precision in herbicide application, studies on weed surveying using satellite imagery have been performed [6]. However, because of the low spatial resolution of satellite imagery, this method gives good results only for large patches of weeds, and it is also limited by low temporal resolution [7].
Another approach to weed surveying is the use of Remotely Piloted Aircraft (RPA) imagery. An RPA can fly at low altitude, take images very close to the plants, and does not suffer from cloud interference. Although an RPA cannot be used on rainy or windy days, it can usually provide images more frequently than satellites. In [8] there is an example of a map of three categories of weed coverage produced from RPA images, which achieved an overall accuracy of 86% using a multispectral camera and Object Based Image Analysis (OBIA).
The aim of this work was to identify three previously selected plant species: the crop, one large leaf weed and one narrow leaf weed. This was done with a k-Nearest Neighbors (kNN) model generated by the Weka software, using RGB images taken from an RPA as input data; the model reached an overall accuracy of 83.1% in preliminary tests. RGB cameras are lighter and cheaper than multispectral ones, which results in much longer drone flight time and makes them accessible to a larger number of farmers.

Image Acquisition
The first step of the weed identification process is image acquisition, which was done with a 10 MP GoPro Hero3 Silver Edition camera mounted on a DJI Phantom (Figure 2). The flight took place at the experimental field of the Agricultural Engineering Faculty (FEAGRI) of Unicamp (latitude 22°48'57" S, longitude 47°03'33" W). The images were taken at heights of 3 to 4 meters.

Image Processing
The aim of this work was to identify three species of plants, presented in Figure 3: sugarcane (crop), coco-grass (narrow leaf weed) and morning glory (large leaf weed). The kNN classifier was chosen because it is a supervised machine learning technique that has been used in other weed identification studies with good results [9]. The input data for developing the kNN model were statistical descriptors of samples of crop and weeds. The crop and weed samples were extracted manually, and the weed samples were identified by specialists [10]. Figure 4 shows, on the left, the original parts of images taken from an RPA of sugarcane (a), large leaf weed (c) and narrow leaf weed (e) and, on the right, the same parts of the images with the extracted samples of sugarcane (b), large leaf weed (d) and narrow leaf weed (f). Samples of soil and straw were also extracted from these parts of the images.
The crop and weed samples were divided into small groups of pixels (sub-images), like a grid [11]. A program written in Java calculated the statistical descriptors for each sub-image. Statistical descriptors give more information than the range of pixel values alone, and this additional information can be useful for the pattern recognition process. Eight statistical descriptors were used in this work: mean, average deviation, standard deviation, variance, kurtosis, skewness, maximum and minimum values [12]. The mean is the average of the pixel values of a sub-image, and the variance, standard deviation and average deviation measure the statistical dispersion around the mean. The skewness and the minimum and maximum pixel values describe the shape of the histogram, and the kurtosis reflects its sharpness [13].
These eight statistical descriptors were computed for each of the three channels of the RGB system, giving 24 descriptors per sub-image to be used in the machine learning training and in the identification process.
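The descriptor computation described above can be sketched in Java as follows. This is a minimal illustration, not the authors' program: the class and method names are hypothetical, and it assumes population moments (division by n) and excess kurtosis, since the text does not state which definitions were used.

```java
import java.util.Arrays;

final class SubImageDescriptors {

    /**
     * Returns {mean, average deviation, standard deviation, variance,
     * kurtosis, skewness, maximum, minimum} for the pixels of one channel.
     * Assumption: population moments (divide by n) and excess kurtosis.
     */
    static double[] describe(int[] pixels) {
        int n = pixels.length;
        double sum = 0;
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int p : pixels) {
            sum += p;
            max = Math.max(max, p);
            min = Math.min(min, p);
        }
        double mean = sum / n;

        double absDev = 0, m2 = 0, m3 = 0, m4 = 0;
        for (int p : pixels) {
            double d = p - mean;
            absDev += Math.abs(d);
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        double variance = m2 / n;
        double stdDev = Math.sqrt(variance);
        // A constant sub-image has zero variance; report 0 for the shape measures.
        double skewness = variance > 0 ? (m3 / n) / (variance * stdDev) : 0;
        double kurtosis = variance > 0 ? (m4 / n) / (variance * variance) - 3 : 0;
        return new double[] {mean, absDev / n, stdDev, variance, kurtosis, skewness, max, min};
    }

    /** Concatenates the eight descriptors of the R, G and B channels (24 values). */
    static double[] featureVector(int[] red, int[] green, int[] blue) {
        double[] features = new double[24];
        System.arraycopy(describe(red), 0, features, 0, 8);
        System.arraycopy(describe(green), 0, features, 8, 8);
        System.arraycopy(describe(blue), 0, features, 16, 8);
        return features;
    }

    public static void main(String[] args) {
        int[] channel = {1, 2, 3, 4, 5}; // toy pixel values, not real image data
        System.out.println(Arrays.toString(describe(channel)));
    }
}
```

Each sub-image then yields one 24-value feature vector, which is the form of input the classifier is trained on.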
In this work, better results were achieved with a grid unit of 5x5 pixels. However, the appropriate sub-image size depends on the image resolution and the weed infestation level. Higher infestation levels are easier to identify, so the identification can be done with larger sub-images; conversely, lower infestation levels require smaller sub-images.
The sub-images extracted from the samples were divided into two groups, the training and validation sets. For better training, 240 sub-images of each class were used in the training set; if a class had fewer than 240 sub-images, some sub-images were used more than once to reach this number. The kNN classifier achieved an overall accuracy of 83.1% using 40 sub-images of each class in the validation set. In this preliminary training process, four classes were chosen: the crop (sugarcane), a narrow leaf weed (coco-grass), a large leaf weed (morning glory), and straw and soil. Straw and soil formed the fourth class, which was necessary to distinguish them from the plants.
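Dividing a sample into grid units can be sketched as below. The class name and the row-major array layout are assumptions made for illustration, and the sketch simply discards partial tiles at the right and bottom edges, a detail the text does not specify.

```java
import java.util.ArrayList;
import java.util.List;

final class GridTiler {

    /**
     * Splits one channel, stored as channel[row][column], into non-overlapping
     * size x size tiles, each flattened row by row.
     * Assumption: partial tiles at the right and bottom edges are discarded.
     */
    static List<int[]> tiles(int[][] channel, int size) {
        List<int[]> result = new ArrayList<>();
        int rows = channel.length;
        int cols = rows == 0 ? 0 : channel[0].length;
        for (int top = 0; top + size <= rows; top += size) {
            for (int left = 0; left + size <= cols; left += size) {
                int[] tile = new int[size * size];
                for (int y = 0; y < size; y++) {
                    // Copy one row of the tile from the source channel.
                    System.arraycopy(channel[top + y], left, tile, y * size, size);
                }
                result.add(tile);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] channel = new int[20][30]; // toy 20 x 30 channel filled with zeros
        System.out.println(tiles(channel, 5).size() + " tiles of 5x5 pixels");
    }
}
```

Changing the `size` argument is the only adjustment needed to try larger sub-images for heavily infested areas or smaller ones for sparse infestation.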
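Padding a small class up to the fixed training-set size could look like the sketch below. The text does not say how the repeated sub-images were selected, so the cyclic reuse shown here, the truncation of larger classes to the first items, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingSetBalancer {

    /**
     * Returns exactly `target` items: the first `target` items if the class
     * has enough, otherwise the items repeated cyclically.
     * Assumption: cyclic reuse; the selection rule is not given in the text.
     */
    static <T> List<T> balance(List<T> items, int target) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("class has no sub-images");
        }
        List<T> balanced = new ArrayList<>(target);
        for (int i = 0; i < target; i++) {
            balanced.add(items.get(i % items.size()));
        }
        return balanced;
    }

    public static void main(String[] args) {
        List<String> subImages = Arrays.asList("s1", "s2", "s3"); // placeholder IDs
        System.out.println(balance(subImages, 7));
    }
}
```

With `target` set to 240, every class contributes the same number of training instances. Duplicates should be drawn only from the training pool, so that no validation sub-image also appears in training.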

Model Creation
The core of the identification process is Weka, an open-source data mining and machine learning software package developed at the University of Waikato, Hamilton, New Zealand. The Weka kNN classifier used the statistical data calculated from the image samples to generate a model that can be used to predict the three plant species. The model performance was evaluated on the validation set by calculating the overall accuracy and the kappa coefficient [14].
In k-Nearest Neighbors (kNN), a predefined number of the closest elements defines the class of a new element. The value of k is the number of nearest neighbors considered for the element to be classified. Each nearest neighbor casts one vote, and the class with the most votes among the k nearest neighbors is assigned to the new element. The standard Euclidean distance is the most common choice for finding the k closest elements [15]. In Figure 5, the element X is classified as a red circle rather than a blue square, because there are more red circles than blue squares among its nearest neighbors.
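The voting rule can be illustrated with a small stand-alone Java sketch. It is not the Weka implementation used in this work: the class names are hypothetical, and ties between classes are resolved in favor of the class of the closest neighbor, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SimpleKnn {

    /** One labeled training element. */
    static final class Sample {
        final double[] features;
        final String label;

        Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Standard Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Assigns the class with the most votes among the k nearest neighbors. */
    static String classify(List<Sample> training, double[] query, int k) {
        List<Sample> byDistance = new ArrayList<>(training);
        byDistance.sort(Comparator.comparingDouble((Sample s) -> distance(s.features, query)));

        // Insertion order follows distance, so ties favor the closest neighbor's class.
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (Sample s : byDistance.subList(0, Math.min(k, byDistance.size()))) {
            votes.merge(s.label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Sample> training = new ArrayList<>();
        training.add(new Sample(new double[] {0, 0}, "blue square"));
        training.add(new Sample(new double[] {5, 5}, "red circle"));
        training.add(new Sample(new double[] {5, 6}, "red circle"));
        System.out.println(classify(training, new double[] {4, 5}, 3));
    }
}
```

In the identification process the feature vectors would hold the 24 descriptors of a sub-image rather than the two toy coordinates used here.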

Weed Mapping
The fourth and last step is map generation, in which a program written in Java reads the output of the Weka kNN model and draws a weed map. In this map the crop and the weeds are identified by color-coding: red for sugarcane, blue for narrow leaf weed and yellow for large leaf weed. For images like Figure 6a, the statistical descriptors were calculated in the Image Processing activity (step 2). The model generated in the Model Creation activity (step 3) was then applied to the output of step 2 to identify each sub-image as crop, large leaf weed, narrow leaf weed, or straw and soil. Once the sub-images are identified, the map can be drawn. Figure 6b is an example of a map in which the crop and the weeds were identified. Several sub-images were incorrectly identified because of color variations among the leaves of the same plant, caused by differences in leaf age, sunlight exposure and nutrient deficiencies, but most were correctly identified.
Table 1 presents the confusion matrix of the k-Nearest Neighbors execution. The letter "a" represents sugarcane, "b" coco-grass, "c" morning glory, and "d" straw and soil. The instances correctly predicted by the classifier are on the diagonal of the table; all values outside the diagonal are wrong predictions. Straw and soil was the class with the best accuracy, and sugarcane also had good results. Coco-grass and morning glory had the worst results, since the kNN classifier made some mistakes between these two classes. For this execution, Weka reported an overall accuracy of 83.1% and a kappa coefficient of 0.775; according to Table 2, these results can be classified as substantial agreement. Although the map produced in [8] covered three categories of weed coverage and these results cover three species of plants (the crop and two weeds), both studies mapped three classes and achieved similar accuracy.
The weed identification process also produces a weed map, which shows the crop and the weeds with color-coding for better visualization. Figure 6b is an example of such a map: although there were some wrong identifications, most of the sub-images were correctly identified. To reduce the number of errors, a larger sample size will be tested in the Model Creation activity in future studies.
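The two reported metrics follow directly from a confusion matrix, as the Java sketch below illustrates. The matrix in `main` is hypothetical and is not Table 1: its rows each sum to the 40 validation sub-images per class, and its diagonal total of 133 was chosen because 133/160 rounds to the reported 83.1%. With equal row totals over four classes the expected agreement is 0.25 whatever the off-diagonal distribution, so kappa = (0.83125 - 0.25) / (1 - 0.25) = 0.775.

```java
final class AgreementMetrics {

    /** Fraction of instances on the diagonal of the confusion matrix. */
    static double overallAccuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    /** Cohen's kappa: (observed - expected agreement) / (1 - expected agreement). */
    static double kappa(int[][] matrix) {
        int n = matrix.length;
        long[] rowSum = new long[n];
        long[] colSum = new long[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += matrix[i][j];
                colSum[j] += matrix[i][j];
                total += matrix[i][j];
            }
        }
        double observed = overallAccuracy(matrix);
        double expected = 0;
        for (int i = 0; i < n; i++) {
            expected += (double) rowSum[i] * colSum[i] / ((double) total * total);
        }
        return (observed - expected) / (1 - expected);
    }

    public static void main(String[] args) {
        // HYPOTHETICAL matrix (rows = actual a..d, columns = predicted a..d); not Table 1.
        int[][] matrix = {
            {36, 2, 1, 1},
            {3, 28, 8, 1},
            {2, 9, 29, 0},
            {0, 0, 0, 40}
        };
        System.out.println("accuracy = " + overallAccuracy(matrix));
        System.out.println("kappa    = " + kappa(matrix));
    }
}
```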
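Drawing the color-coded map from the per-sub-image predictions can be sketched as follows. The class name and label strings are hypothetical, and the gray used for straw and soil is an assumption, since the text specifies colors only for the three plant classes.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

final class WeedMapSketch {

    /** Color code from the text; the label strings are illustrative names. */
    static Color colorFor(String label) {
        switch (label) {
            case "sugarcane":
                return Color.RED;
            case "narrow_leaf_weed":
                return Color.BLUE;
            case "large_leaf_weed":
                return Color.YELLOW;
            default:
                return Color.GRAY; // assumption: straw and soil color is not stated
        }
    }

    /** Paints one tile x tile block per predicted label, with labels[row][column]. */
    static BufferedImage draw(String[][] labels, int tile) {
        int rows = labels.length;
        int cols = labels[0].length;
        BufferedImage map = new BufferedImage(cols * tile, rows * tile, BufferedImage.TYPE_INT_RGB);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int rgb = colorFor(labels[r][c]).getRGB();
                for (int y = 0; y < tile; y++) {
                    for (int x = 0; x < tile; x++) {
                        map.setRGB(c * tile + x, r * tile + y, rgb);
                    }
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] labels = {
            {"sugarcane", "narrow_leaf_weed"},
            {"large_leaf_weed", "straw_soil"}
        };
        BufferedImage map = draw(labels, 5);
        System.out.println(map.getWidth() + " x " + map.getHeight() + " pixel map");
    }
}
```

The resulting image could be written to disk with `javax.imageio.ImageIO.write(map, "png", file)` for comparison with the source photograph.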

Conclusions
A weed map can be a useful tool for more efficient application of herbicides in sugarcane plantations. In this work, a weed identification process was developed using RGB images taken from an RPA and the kNN classifier. In preliminary tests, the process achieved an overall accuracy of 83.1% and a kappa coefficient of 0.775, which can be interpreted as substantial agreement.
This initial accuracy assessment and the color-coded weed map demonstrate that kNN can be used to construct a weed map.
As a continuation of this work, future studies will address the production of maps that include weed geolocation data.