Classical Image Based Classification of Coffee Beans on Their Botanical Origins in Tongo and Wambara, Benishangul Gumuz, Ethiopia

Ethiopia is the homeland of coffee. Coffee is a major export commodity of Ethiopia and plays a significant role in earning foreign currency. This research was conducted with the objective of developing a computer algorithm that can characterize different varieties of Benishangul coffee based on their growing region. Imaging techniques were employed to automatically classify coffee bean samples according to their provenance in Benishangul (Tongo and Wombera), which corresponds to their botanical origin. Coffee bean features, namely color, shape and size, and texture, were extracted from 100 images (50 images from each location). For the purpose of classification, 24 features in total (12 color, 6 shape and size, and 6 texture) were extracted from images of the coffee samples from the two locations. An artificial neural network (ANN) was employed to automatically categorize the coffee beans according to their provenance. Neural network classifiers were compared using four feature sets: color, morphology (shape and size), texture, and the combination of morphology and color. To evaluate classification accuracy, the 100 sample images were divided into training (70%, 70 images), validation (20%, 20 images) and testing (10%, 10 images) sets. Classification scores of 93%, and 99.3% were achieved for color, morphology, texture and a combination of morphology and color features, respectively. The classification results of the network indicated that morphology and the combination of morphological and color features exhibited the highest accuracy. In conclusion, the results of this study indicate that imaging techniques could be used as an effective method to determine coffee bean qualities for export. However, it is suggested that the repeatability of this coffee quality testing method be validated using a larger data set before the algorithm is employed for routine classification of coffee beans.


Background of the Study
Coffee is an edible commodity. It is widely used as a beverage, but nowadays its use as an input in some food processing industries is increasing [3]. For instance, it is used as a flavoring in various pastries, ice creams, chocolate, etc.
There are different types of coffee in the world. Among them, the major economic species are Coffee Arabica and Coffee Robusta. Arabica accounts for 80% of the world coffee trade, and Robusta for most of the remaining 20%. Coffee Liberica and Excelsa together supply less than 1% [19].
The origins of the coffee crop can be traced back to the Ethiopian highlands for Coffee Arabica and to the forests of West and Central Africa for Coffee Robusta (Canephora). Coffee was well established as a beverage in Yemen by the 14th century, from where it spread to other Middle Eastern countries in the 15th century and across the Arabian Sea to India. Today coffee is widely cultivated and used throughout the tropics [12].
Ethiopia has a suitable environment for growing all Arabica coffee varieties. Currently, only Coffee Arabica is grown in Ethiopia; other coffee species are not yet cultivated. Ethiopia is the home of Arabica coffee, which was first discovered in the south-western highlands of Ethiopia called Kaffa, more specifically in a district called Buno. Coffee production in Ethiopia is concentrated in the Oromia and Southern regions of the country, though the majority of Ethiopian regions are suitable for growing coffee [12]. Ethiopia is not only the icon of coffee; it also thrives on coffee, and people drink it regularly in every part of the country. Coffee is closely associated with Ethiopian culture. Most people in the country start their day with a cup or two of coffee in the morning. The coffee ceremony, the Ethiopian tradition of serving coffee, is unique.

Statement of the Problem
In the agricultural industry, assessing the quality and identifying the varieties of agricultural products are major problems. At present, the quality and variety of grain seed are determined manually through visual inspection by experienced technicians. Satisfying customers' demand for high quality, however, requires a high degree of accuracy and correctness, for which a non-destructive quality evaluation method based on image processing has been proposed [13]. Different coffee varieties are grown in different parts of Ethiopia. The coffee beans produced in different parts of the country are prepared for local consumption and for the international market using traditional inspection, which is subjective and ineffective. Identification is based on physical properties such as color, size, shape and flavor, all of which are usually examined by human inspection.
The main interest of this research is to develop a computer algorithm that can classify Benishangul region coffee of two botanical origins. Coffee beans produced in different parts of the country have distinct physical properties. One of the best ways to classify coffee beans by botanical origin is image analysis. Image analysis can be used to build a model that serves several purposes, such as sorting, classification by botanical region and classification by variety.
Once the algorithm is developed, it can be used for variety classification, identification of growing region and sorting of coffee beans. The traditional method cannot distinguish all the features that computer vision can detect. For this reason, automated classification, identification and sorting of coffee beans can avoid the subjectivity and inefficiency of manual inspection. In addition, the use of a technology-based model will make the country more competitive and help its coffee to be accepted without doubt all over the world.
Therefore, this thesis work will initiate a model for Ethiopian coffee variety classification that is consistent, efficient and cost-effective by exploring the technology of image analysis.
Ethiopian coffee is an important source of coffee genetic resources for the world coffee industry. As a matter of fact, Ethiopia is the only center of origin and diversity of Arabica coffee (Coffea arabica L.) [15].
In Ethiopia, coffee grows over a wide range of agro-ecological zones and geographical regions. According to [7], differences in the size and shape of coffee beans are influenced by botanical variety and environmental growing conditions. Coffee grows under diverse environmental conditions, at altitudes ranging from 550 to 2600 m above sea level, with annual rainfall from 1000 to 2000 mm and minimum and maximum temperature ranges of 8 to 15°C and 24 to 31°C, respectively. Coffee requires deep, well-drained, loamy and slightly acidic soils [7].
Ethiopian coffee is designated by geographical place name, by letters (indicating sub-regions) and by grades 1 through 9 (indicating quality). In addition, all coffee is divided into four large groups: commercial washed, commercial unwashed, specialty washed and specialty unwashed [9].

Results and Discussion
In this study, MATLAB version R2013a was used to implement all image processing and analysis algorithms. The MATLAB scripts used to perform the pre-processing steps are given in the Appendix.
Results on segmentation are given first, followed by feature extraction and finally the ANN-based classification of Benishangul coffee beans, in order to permit comparison of our system with those introduced in the literature.

Pre-Processing Images of Coffee Beans
Samples were collected from selected coffee growing parts of the Benishangul region (Tongo and Wombera). Five quintals (one quintal is 100 kg) were randomly selected from the local market in each selected place, and 100 g was taken from each quintal (1 kg from each selected part of the country). Of each 100 g sample, 50 g was used for image analysis and the rest was kept as a reference. Ten images were taken per quintal, which means 50 images from each selected coffee growing part of the country and 100 images in total; of these, 60% to 70% were used for practice purposes and the rest for the thesis work.
The coffee bean images shown in Figure 1 are from the first experiment, with a matrix size of 256×256 pixels. Figure 2 and Figure 3 show pre-processed images of the coffee bean samples. The original RGB images of the sample coffee beans were first enhanced by contrast stretching and then converted into gray level images for feature extraction. The gray level image was then converted to a binary image using automatic thresholding. In the resulting binary image, surface defects on some coffee beans appeared as white patches and produced holes inside the bean regions. These holes were filled using their neighboring pixels, giving the filled binary image shown in the figure.
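The pre-processing chain described above (gray level conversion, contrast stretching, automatic thresholding and hole filling) can be illustrated with a minimal Python sketch. This is not the MATLAB code from the Appendix. It rests on several assumptions that the text does not state: Otsu's method stands in for "automatic thresholding", beans are taken to be darker than the background, holes are filled by a flood fill from the image border, and the gray conversion is applied before the stretch for simplicity. The function names and the tiny 7×7 test image are hypothetical.

```python
from collections import deque

def to_gray(rgb):
    """Luminance-weighted gray level (ITU-R BT.601 weights) per pixel."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]

def stretch_contrast(gray):
    """Linearly rescale gray levels so that they span 0..255."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]

def otsu_threshold(gray):
    """Gray level t that maximises the between-class variance (Otsu, assumed here)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def fill_holes(mask):
    """Set to 1 every background pixel that is not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[1 if mask[y][x] == 1 or not outside[y][x] else 0 for x in range(w)]
            for y in range(h)]

def preprocess(rgb):
    """Return (binary mask, hole-filled mask); bean pixels are 1 (assumed darker)."""
    gray = stretch_contrast(to_gray(rgb))
    t = otsu_threshold(gray)
    mask = [[1 if v <= t else 0 for v in row] for row in gray]
    return mask, fill_holes(mask)

# Hypothetical 7x7 image: white background, a dark 3x3 "bean" with one bright defect.
BG, BEAN, DEFECT = (255, 255, 255), (60, 40, 20), (250, 250, 250)
image = [[BG] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = BEAN
image[3][3] = DEFECT
raw_mask, filled_mask = preprocess(image)
```

In this toy image the bright defect is thresholded as background and therefore shows up as a hole in `raw_mask`; `fill_holes` restores it because it is enclosed by bean pixels and cannot be reached from the border.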

Result of Colour Features
The color feature results are summarized in Table 1. Tongo coffee beans have the largest green and blue component values among the Benishangul coffee bean types, which gives them a bluish-green appearance.

Result of Texture Features
The texture feature results are summarized in Table 2. Wombera coffee beans have the largest energy value and a lower entropy value than the other Benishangul coffee bean types. This indicates that Wombera beans have the highest similarity and homogeneity among the Benishangul coffee bean types.
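To make the link between energy, entropy and homogeneity concrete, the following Python sketch computes texture measures from a gray level co-occurrence matrix (GLCM). The paper does not state how its texture features were computed, so the GLCM approach, the horizontal one-pixel offset, the symmetric counting and the formulas used (which are the commonly used Haralick-style definitions) are all assumptions; the two 4×4 test patches are hypothetical. Correlation is omitted because it is undefined for a perfectly uniform patch.

```python
import math

def glcm(image, levels, dy=0, dx=1):
    """Symmetric, normalised co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Texture measures from a normalised GLCM p (assumed standard definitions)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "inverse_difference_moment": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

# Two hypothetical 4x4 patches quantised to 2 gray levels.
uniform = [[1] * 4 for _ in range(4)]                            # perfectly smooth
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]    # maximally alternating

smooth = texture_features(glcm(uniform, levels=2))
rough = texture_features(glcm(checker, levels=2))
```

The smooth patch yields energy 1 and entropy 0, while the alternating patch yields lower energy and higher entropy. This is the pattern the interpretation above relies on: high energy together with low entropy points to a more uniform surface.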

Result of Shape and Size Features
The shape and size feature results are summarized in Table 3. Tongo beans were found to be the biggest in size, whereas Wombera beans were found to be the smallest in size and rounded in shape.
For classification using color features, the network was set up in the pattern recognition toolbox of MATLAB R2013a with twelve input neurons, eighteen hidden-layer neurons and five output neurons, as described in Figure 4. As indicated in the Table, the summary result of the neural classifier using color features alone showed that, of the total of 100 images, 82 (82%) were correctly classified and 18 (18%) were incorrectly classified.
From this result we can conclude that BGWCB3 has a unique color which makes it distinct from the others. There was also a strong color relationship between BGTCB1 and BGTCB2.
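The network set-up for the color features can be pictured as a single forward pass through a small feed-forward network. The Python sketch below assumes one hidden layer of eighteen neurons between twelve inputs and five outputs, a tanh hidden activation and a softmax output; these choices, the random untrained weights and the placeholder feature vector are assumptions for illustration, not the configuration or results of the MATLAB toolbox used in the study.

```python
import math
import random

CLASSES = ["BGTCB1", "BGTCB2", "BGWCB1", "BGWCB2", "BGWCB3"]

def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases; untrained."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Convert output activations into class probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def forward(x, hidden, output):
    """12 color features -> 18 tanh hidden neurons -> 5 softmax outputs."""
    h = [math.tanh(v) for v in dense(x, hidden)]
    return softmax(dense(h, output))

rng = random.Random(0)
hidden = init_layer(12, 18, rng)
output = init_layer(18, 5, rng)

features = [rng.random() for _ in range(12)]   # placeholder, not measured data
probs = forward(features, hidden, output)
predicted = CLASSES[probs.index(max(probs))]
```

Because the weights are random, `predicted` carries no meaning here; the sketch only shows how twelve feature values are mapped to one probability per coffee class. Training would adjust the weights so that the largest probability falls on the correct class.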

Classification of Texture Feature
In this model, the six texture features (energy, contrast, entropy, homogeneity, correlation, inverse difference moment) of Benishangul coffee were used as input to the network. Hence, the input layer had six neurons. There were five output neurons, corresponding to the five predefined coffee growing regions considered in this study. The number of neurons in the hidden layer was eleven, as described in Figure 5. The classification result based on texture features is shown in the confusion matrix of Table 5. The Table shows the correct classifications and misclassifications of the 100 images, which were divided into training (70%, 70 images), validation (25%, 25 images) and testing (5%, 5 images) data.
From this result we can say that the Benishangul coffee samples share textural features: BGTCB1 with BGTCB2, and BGWCB3 with BGWCB1 and BGWCB2. This may be attributed to the proximity of the regions from which the coffee samples were drawn, which possibly makes them share certain genotype similarities. Image resizing may also be a cause, because it may lose some basic information.

Classification Result of Shape and Size Features
In this model, six shape and size features were selected, namely area, perimeter, maximum diameter, minimum diameter, equivalent diameter and surface roundness. Hence, the input layer had six neurons. There were five output neurons, corresponding to the five predefined coffee growing regions considered in this study, and the number of neurons in the hidden layer was thirteen, as shown in the figure below. The classification result based on shape and size features is shown in the confusion matrix of Table 6. The Table shows the correct classifications and misclassifications of the 100 images, which were divided into training (70%, 70 images), validation (25%, 25 images) and testing (5%, 5 images) data. As indicated in the Table, the summary result of the ANN classifier on the shape and size features alone showed that, of the total of 100 images, 82 images (82%) were correctly classified and 18 (18%) were misclassified.
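The six shape and size features can be illustrated on a binary bean mask with the following Python sketch. The paper does not give its formulas, so the definitions here are assumptions: area is the pixel count, perimeter is the number of exposed pixel edges, the maximum and minimum diameters are approximated by the bounding-box extents, equivalent diameter is the diameter of a circle with the same area, and roundness is 4πA/P². The function name and the rectangular test mask are hypothetical.

```python
import math

def shape_features(mask):
    """Shape and size measures of the foreground (1) pixels of a non-empty mask."""
    h, w = len(mask), len(mask[0])
    pixels = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    area = len(pixels)

    # Perimeter: count pixel edges that face background or the image border.
    perimeter = 0
    for y, x in pixels:
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                perimeter += 1

    # Bounding-box extents as crude stand-ins for maximum and minimum diameter.
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1

    return {
        "area": area,
        "perimeter": perimeter,
        "max_diameter": max(height, width),
        "min_diameter": min(height, width),
        "equivalent_diameter": math.sqrt(4 * area / math.pi),
        "roundness": 4 * math.pi * area / perimeter ** 2,
    }

# Hypothetical mask: a 4x6 rectangle of foreground pixels inside an 8x10 image.
mask = [[1 if 2 <= y < 6 and 2 <= x < 8 else 0 for x in range(10)] for y in range(8)]
features = shape_features(mask)
```

With this definition a perfect circle would have roundness close to 1, while elongated or irregular outlines score lower, which is one way the rounded Wombera beans could be separated from larger, more elongated ones.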

Analysis of the Result
The result of the Artificial Neural Network (ANN) classification using shape and size features showed that the classification accuracies for BGTCB1, BGTCB2, BGWCB1, BGWCB2 and BGWCB3 coffee were 96%, 94%, 80%, 94% and 100%, respectively.
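Per-class and overall accuracies of this kind are read from a confusion matrix. The Python sketch below shows the calculation on a small three-class matrix whose counts are invented purely for illustration; they are not the values of Tables 5 to 7. Rows are taken to be the true class and columns the predicted class, which is an assumed convention.

```python
def per_class_accuracy(cm):
    """Fraction of each true class (row) that was assigned to the correct column."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Correctly classified samples (diagonal) divided by all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))

# Hypothetical counts, NOT taken from the paper: rows = true class, columns = predicted.
toy_cm = [
    [9, 1, 0],
    [2, 8, 0],
    [0, 0, 10],
]
class_acc = per_class_accuracy(toy_cm)
total_acc = overall_accuracy(toy_cm)
```

The off-diagonal entries show which classes are confused with each other: in the toy matrix the first two classes exchange samples while the third is never mistaken, the same kind of pattern that is used in this section to describe relationships between coffee classes.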

Classification Model Result Based on All Features
In this model, twenty-four features, corresponding to six shape and size features, twelve color features and six textural features of Benishangul coffee, were used as input to the neural network; hence, the input layer had twenty-four neurons. As in the other experiments, there were five output classes corresponding to the predefined coffee growing regions. The number of neurons in the hidden layer was again eighteen, as shown in Figure 7. The classification result based on all features is shown in the confusion matrix of Table 7. The Table shows the correct classifications and misclassifications of the 100 images, which were divided into training (70%, 70 images), validation (25%, 25 images) and testing (5%, 5 images) data. This shows that the classification based on the combined features exhibited the highest accuracy, and also that there is a strong relation between BGTCB1 and BGTCB2. There is a significant relation between BGWCB1 and BGWCB2. In other words, there is an ambiguity of classification between BGTCB1 and BGTCB2, and also between BGWCB1 and BGWCB3.
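The division of the 100 images into training, validation and testing data can be sketched in Python as a shuffled partition of image indices. The random shuffle, the fixed seed and the function name are assumptions; the paper does not describe how images were assigned to each subset.

```python
import random

def split_indices(n, fractions=(0.70, 0.25, 0.05), seed=0):
    """Shuffle indices 0..n-1 and cut them into train, validation and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # fixed seed so the split is repeatable
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # whatever remains goes to testing
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
```

With 100 images this gives 70, 25 and 5 indices. A test subset of five images means that each test image accounts for 20 percentage points of the test accuracy, which is one reason why validating the method on a larger data set is advisable.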

Conclusion
Coffee is a commercial commodity that plays a major role in earning foreign currency among the export commodities of Ethiopia. The sub-sector is receiving governmental and non-governmental attention due to its significance in commercial activities. Establishing a registered brand for each coffee variety based on its region of origin is a current issue. Benishangul coffee brands, with their distinct character, flavor and taste, are not yet recognized and registered as Ethiopian property rights either nationally (ECX) or internationally. In this research, Benishangul coffee beans fulfilled all the requirements; Wombera coffee beans in particular were found to be stronger than Wollega coffee beans, and they are traditionally regarded as very tasty. A limitation is that coffee from this area has received little research attention. Private and governmental organizations should therefore build on this study, or carry out further research using other methodologies, so that these botanical coffee types can be recognized and registered.
The experimental results showed that color and morphology (shape and size) features classify Benishangul coffee by growing region more accurately than texture features in artificial neural network (ANN) classification. However, the classification accuracy increases when all the features are used together, which shows the importance of incorporating all features when classifying Benishangul coffees from different regions. Therefore, the coffee industry can use image processing as a quick way to identify the types of coffee beans.