Prediction of Leaves Using Convolutional Neural Network

Plants play a significant role for humans, animals, and the environment, providing each with the necessities of life. To protect plants and give them proper treatment, humans must first be able to identify them. The species of a plant can be identified by the venation of its leaves. This paper focuses on a Convolutional Neural Network (CNN) classification methodology that helps to classify leaves accurately. The work uses leaf images of apple, grape, and tomato from the Plant Village dataset for feature extraction and subsequent classification. Prediction is performed with deep learning techniques, where the input layer receives the features extracted by the proposed algorithm. The proposed algorithm is based on Local Binary Pattern (LBP), a simple yet very efficient method that labels each pixel of an image by thresholding the neighborhood of that pixel and treating the result as a binary number. Its computational simplicity makes it possible to analyze images in challenging real-time settings in image processing and computer vision.


Introduction
Leaves play a major role in indicating the health of a plant: the health of the plant can be gauged from the health of its leaves. This paper focuses on identifying and predicting the healthy leaves of tomatoes, grapes, and apples. We depend entirely on the food we get from plants, so we must keep them healthy. To automate this mundane task, many researchers have developed algorithms that identify the leaves of plants and help detect in advance that leaves are unhealthy. Detecting leaves is necessary not only for finding diseases but also for knowing the plant itself, so that a proper diagnosis can be made.
The prediction of the leaves is done using deep learning techniques (CNN), in which the input layer receives the features extracted using the proposed algorithm. The proposed algorithm is based entirely on LBP, a well-known algorithm that has been used for facial recognition and is one of the efficient algorithms for finding descriptors in an image. With these descriptors and some popular machine learning techniques, one can easily find similar images within a group of images, which helps in classifying them.

Literature Review
Mohammad Aminul Islam et al. [1] proposed a method to detect plants using Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) features with a Support Vector Machine (SVM). Using a combination of both features on the Flavia leaf dataset, they obtained an accuracy of 91.25%. Their methodology involved pre-processing the collected image data using segmentation and normalization techniques. After pre-processing, the features were extracted using HOG and LBP and then classified using SVM.
Jyotismita Chaki and Ranjan Parekh [2] used a neural network classifier to recognize leaves based on their shape, using Moment Invariant (MI), a centroid-radii model, and a hybrid representation. For experimentation they used the plant scan dataset, divided into three classes, namely Pittosporum tobira, Betula pendula, and Cercis siliquastrum. Each class contains 60 image samples at a resolution of 350×350. Ninety images were used for training and the remaining ninety as test data. The accuracy obtained was 95.5%.
Naveen Kumar Singla [3] used ANN techniques for multiclass classification of leaf images. FOS, GLCM, Gabor filter, Gabor wavelets, and Hu moments are the features extracted in his work, with the CLEF 2012 dataset as the source of data. After feature extraction, the ROI of each grayscale leaf image (size 256×256) was found and the images were converted to their binary representations. Texture intensity is determined first using Hu moments and Gabor wavelets and then using neither. The two approaches yielded 67 and 155 features, respectively. Using these feature sets, two experiments were conducted to classify the data, each in two stages: individual leaf classification followed by an overall ranking of the leaf data. The first experiment, performed on leaves of five different plant species, gave an accuracy of 78.98%. Here, the classification was done using an artificial neural network (ANN) approach with the larger number of features (155). The low resolution of the images is the limiting aspect of this research work.
Vi Nguyen Thanh Le et al [4] have proposed a methodology for discrimination of plants using the combination of LBP and multiclass SVM. The data collection was done using a custom-built testing facility at ESRI. They pre-processed the data followed by segmentation and feature extraction. The accuracy rate for the unsegmented dataset is 95.24%, and for the segmented one, it is 98.07%.
Milan Šulc and Jiří Matas [5] used a technique for identifying fine-grained plant images. They explained that there is high intra-class variance and small inter-class variation. The outcomes of their proposed texture analysis and deep learning methods are then evaluated and compared. The proposed methodology achieved better results than the state of the art in leaf and bark classification. The results make it clear that the recognition of segmented leaves is a problem that can be solved in practice, provided there is a dataset large enough for the CNN to work as expected. The approach also helps in recognizing plants in the wild, but the difficulty increases when there is high occlusion.
After the leaf is detected, it is crucial to discover the diseases in the plant leaves. Using image segmentation and soft computing techniques, Vijai Singh and A. K. Misra [6] developed an algorithm for this purpose. In their work, each chromosome encodes a sequence of k cluster centers and represents one candidate solution. Only the best chromosomes survive into the succeeding rounds, eventually yielding the clusters to which k-means clustering was applied. To extract features, the color co-occurrence method is used, in which the RGB images are converted to HSI images. With input images of rose leaves with a bacterial disease, bean leaves with bacterial/fungal infection, a lemon leaf with sunburn disease, and a banana leaf with early scorch disease, the algorithm achieved an accuracy of 95.71% when SVM was used for classification.
Sujatha R et al. [7] focused on the detection of leaf diseases. Their steps involve collecting images of affected leaves and then segmenting them. Sharp edges were removed by applying contrast enhancement methods. The RGB image is then converted to an HSI image, followed by feature extraction using K-means and classification using SVM.
Apart from leaf feature extraction, LBP has proved effective in the field of face recognition. The method proposed by Zhi-Hua Xie et al. [8] uses this algorithm to generate LBP codes from normalized face image data. To retain spatial location information, the face is partitioned into non-overlapping regions. An LBP histogram (LBPH) is extracted from each local part of the LBP image to build the local representation of the infrared face. The proposed selection algorithm is then applied to the LBPH of each local region to generate a histogram. Next, these local histograms are concatenated into one feature vector to build the global representation. Finally, a nearest neighbor classifier, based on the dissimilarity of the final features between the training datasets and the test face, performs the classification task.
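The regional-histogram idea described above can be illustrated with a minimal pure-Python sketch. It is not the implementation from [8]: the function names, the grid size, the number of bins, and the L1 distance used as the dissimilarity measure are all assumptions made for illustration, and the pattern-selection step is omitted.

```python
def region_histograms(lbp_img, grid=(2, 2), bins=256):
    """Concatenate per-region histograms of an LBP image (list of rows).

    The image is cut into grid[0] x grid[1] non-overlapping regions;
    rows or columns that do not divide evenly are ignored (assumption).
    """
    rows, cols = len(lbp_img), len(lbp_img[0])
    region_h, region_w = rows // grid[0], cols // grid[1]
    feature = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            hist = [0] * bins
            for r in range(i * region_h, (i + 1) * region_h):
                for c in range(j * region_w, (j + 1) * region_w):
                    hist[lbp_img[r][c]] += 1
            feature.extend(hist)  # local histograms form one global vector
    return feature


def nearest_neighbor(query, gallery):
    """Return the label of the gallery feature closest to the query.

    gallery is a list of (feature, label) pairs; L1 distance is an
    assumed stand-in for the dissimilarity measure.
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    return min(gallery, key=lambda item: l1(query, item[0]))[1]
```

Keeping one histogram per region, rather than a single histogram for the whole image, is what preserves coarse spatial information in the final feature vector.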
The research work by Anna Liza A. Ramos et al. [9] outperforms traditional LBP and PCA+LDA. The data was collected by capturing face images from different angles and under different lighting conditions. Using a Haar cascade classifier for face detection, color feature extraction for face-makeup detection, and LBP for feature extraction, the algorithm recognizes faces and detects makeup with 90.83% accuracy. Moreover, the accuracy reached 100% when the face angle was zero degrees, although results may degrade due to face occlusion, resolution, noise, and distance issues.
Abdulrahman Al Rashidi et al. [10] demonstrated the usefulness of LBP for anomaly detection in urban areas. Using the UMN dataset, they integrated LOG and LBP to extract features from the individual images and cascaded them into a feature vector. This vector is then fed into an MLP neural network, trained with backpropagation, to detect any existing anomalies. With the results at hand, the researchers worked out the scale-space volume. A point is labeled "bright" if its value is higher than those of its neighbors, which leads directly to an efficient and robust method for feature extraction.
Zhicheng Yan et al. [11] introduced hierarchical deep CNNs (HD-CNNs), which use deep CNNs for two-level categories: a coarse category classifier for separating easy classes and a fine category classifier for separating difficult classes. Training proceeds in the following steps. First, the HD-CNN is pretrained: after the coarse category component is initialized, the fine category component is pretrained. The final step is fine-tuning the HD-CNN. The authors' experiments use the CIFAR100 and ImageNet datasets. With conditional execution and parameter compression applied, recognizing CIFAR100 images took 0.10 seconds and 286 MB of memory, while ImageNet took 5.28 seconds and 6863 MB of memory. In conclusion, the authors plan to extend HD-CNN architectures to more than two hierarchical levels.

Methodology
Step 1: Collecting images from the publicly available Plant Village Dataset [12]. We have collected the leaf images of Apple, Grape and Tomato from this dataset. The collected images were already preprocessed (Grayscale Images) in the dataset.
Step 2: Applying the proposed algorithm to the collected images.
Step 3: Creating a CNN model for experimentation.
Step 4: Extracting features using the proposed algorithm and forwarding them to the CNN to find the accuracy.
Step 5: Checking the prediction by giving test inputs to the CNN model.
The existing methods [1,2,3] use Local Binary Pattern (LBP), Improved Local Binary Pattern (ILBP), and Median Binary Pattern for feature detection, in which a sliding window of size 3×3 traverses the image matrix.
In LBP, each of the 8 neighboring pixels is thresholded against the center pixel. The binary threshold function (Figure 1) gives an output of 1 or 0 for each neighbor. After the neighbors have been replaced with 0s and 1s, the resulting bits are read as a binary number, and its single decimal value replaces the 3×3 matrix.
These steps are repeated for the complete image array. Finally, we get an LBP image representation of the input image.
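The LBP steps above can be sketched in a few lines of pure Python. This is an illustrative sketch rather than the authors' code: the function names, the clockwise bit order starting at the top-left neighbor, the use of `>=` in the threshold, and the choice to skip border pixels are all assumptions.

```python
# Clockwise offsets of the 8 neighbors, starting at the top-left pixel.
# The starting point and direction are assumptions; any fixed order works
# as long as it is used consistently.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(window):
    """Return the LBP code of one 3x3 window given as a list of 3 rows."""
    center = window[1][1]
    code = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        bit = 1 if window[1 + dr][1 + dc] >= center else 0
        code = (code << 1) | bit  # append the bit to the binary number
    return code


def lbp_image(image):
    """Slide a 3x3 window over a grayscale image (list of rows).

    Border pixels have no full neighborhood and are skipped, so the
    output is two rows and two columns smaller than the input.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            window = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            out_row.append(lbp_code(window))
        out.append(out_row)
    return out
```

For the window `[[6, 5, 2], [7, 6, 1], [9, 8, 7]]` the neighbors produce the bits 10001111, so `lbp_code` returns 143. A practical version would use array operations instead of nested loops.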
In ILBP, instead of the center pixel value of the image array, the mean value of the 3×3 matrix is used for thresholding. After processing, we get the ILBP image representation of the input image as indicated in.
In Median Binary Pattern, the same steps as in ILBP are performed, except that the thresholding uses the median of the matrix.
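The ILBP and Median Binary Pattern variants differ from LBP only in the threshold value, which a small sketch can make explicit. The names below are hypothetical, and the sketch follows the description given here (the 8 neighbors are compared with the mean or median of all 9 pixels); some published formulations also encode the center pixel as an extra bit, which is not done here.

```python
from statistics import mean, median

# Clockwise neighbor order starting at the top-left pixel (assumption).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def pattern_code(window, threshold_fn):
    """Threshold the 8 neighbors of a 3x3 window against a statistic.

    threshold_fn receives the 9 pixel values and returns the threshold.
    """
    values = [v for row in window for v in row]
    threshold = threshold_fn(values)
    code = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        bit = 1 if window[1 + dr][1 + dc] >= threshold else 0
        code = (code << 1) | bit
    return code


def ilbp_code(window):
    """ILBP-style code: threshold is the mean of the 3x3 window."""
    return pattern_code(window, mean)


def mbp_code(window):
    """Median Binary Pattern code: threshold is the window median."""
    return pattern_code(window, median)
```

For the window `[[1, 2, 3], [4, 5, 6], [7, 8, 30]]` the mean is about 7.33 and the median is 5, so `ilbp_code` returns 12 while `mbp_code` returns 30. The outlier value 30 pulls the mean up, which shows how the choice of statistic changes the resulting pattern.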
Tweaking the methodologies discussed above, the proposed algorithm calculates the mode value of the 3×3 matrix and finds the feature accordingly.
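One possible reading of the mode-based variant is sketched below. The text does not specify the details, so several points are assumptions: the 8 neighbors are thresholded against the mode of all 9 pixels, ties between equally frequent values are broken by taking the smallest value, and the bit order is clockwise from the top-left neighbor. All names are hypothetical.

```python
from collections import Counter

# Clockwise neighbor order starting at the top-left pixel (assumption).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def window_mode(values):
    """Most frequent value; ties go to the smallest value (assumption)."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, n in counts.items() if n == highest)


def mode_pattern_code(window):
    """Threshold the 8 neighbors of a 3x3 window against its mode."""
    threshold = window_mode([v for row in window for v in row])
    code = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        bit = 1 if window[1 + dr][1 + dc] >= threshold else 0
        code = (code << 1) | bit
    return code
```

For the window `[[5, 5, 2], [7, 6, 5], [9, 8, 7]]` the mode is 5 and the resulting code is 223. When all nine values are distinct every value is a mode, so the tie-breaking rule decides the threshold; a different rule would give different codes.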
Further, the dataset is kept on Google Drive so that it can easily be fetched by Google Colab.
By loading the images in Google Colab, training data is created with the respective labels of the leaves. Concurrently, we design the CNN model which consists of... The Python package "random" is used to shuffle the data.
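The labeling and shuffling step can be sketched as follows. Only the use of the `random` package comes from the text; the function name, the dictionary input format, the integer class indices, and the optional seed are assumptions for illustration.

```python
import random


def build_training_data(images_by_label, seed=None):
    """Pair every image with its class index and shuffle the pairs.

    images_by_label maps a class name (e.g. "apple") to a list of images.
    Returns the shuffled (image, class_index) pairs and the class names
    in index order.
    """
    labels = sorted(images_by_label)
    data = [(image, index)
            for index, label in enumerate(labels)
            for image in images_by_label[label]]
    # Shuffle so that training batches are not grouped by class.
    random.Random(seed).shuffle(data)
    return data, labels
```

Passing a fixed `seed` makes the shuffle reproducible between runs, which is convenient when comparing accuracies; leaving it as `None` gives a different order each time.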
The training data is given to the CNN model as an input to find the prediction accuracy.
The model is tested manually by giving some test inputs after the prediction accuracy is generated.

Results and Discussion
The proposed methodology uses the Plant Village dataset, which provides three types of images for each leaf: color, grayscale, and segmented. The color folder contains the actual RGB images, the grayscale folder contains the grayscale versions of the raw images, and the segmented folder contains the RGB images with just the leaf segmented and color corrected. Each folder contains images of 14 kinds of leaves, some healthy and some infected. The proposed work uses healthy grayscale images of apples, grapes, and tomatoes. A total of 3147 images, covering all the types, were used for the research.
The accuracies obtained from various methods are given in the table below:

Conclusion
In this paper, a novel technique is used to classify plant leaf images more accurately by combining the proposed algorithm with a CNN. The proposed methodology uses the publicly available Plant Village dataset. The cell size for each image was fixed at 256, which helps the algorithm extract features faster. There is always scope for improvement in the proposed methodology. It can be improved by testing the proposed algorithm against a wider variety of leaf images so that the accuracy can be evaluated more extensively.