Land Suitability Prognostic Model for Crop Planting Using Data Mining Technique

This study aims to formulate a classification model which farmers can use to determine the suitability of land for cultivation based on information about identified factors. Structured interviews with farmers and agro-specialists were conducted in order to identify the factors associated with the classification of land suitability. Fuzzy membership functions were used to formulate the input and output variables of the classification model for land suitability based on the risk factors identified. The model was simulated using the MATLAB® R2015b Fuzzy Logic Toolbox. The results showed that 7 risk factors were associated with the classification of the suitability of land for crop planting. The risk factors identified are annual rainfall, months of dry season, relative humidity, abundance of clay soil, abundance of sand soil, abundance of organic carbon and pH value of the soil. Two or three triangular membership functions were appropriate for formulating the linguistic variables of each factor, while the target suitability of land was formulated using four triangular membership functions for the linguistic variables unsuitable, fairly suitable, moderately suitable and highly suitable. A total of 288 rules were formulated using IF-THEN statements which adopted the values of the factors as the antecedent and the suitability of land for planting crops as the consequent part of each rule. This study concluded that, based on the assessment of information about the factors associated with the classification of land suitability, a reasonable conclusion can be made about the possible use of land.


Introduction
From the beginning of time, land has been an indispensable means of livelihood for many families and, on a larger scale, for countries. The ability to optimize the use of land also has great potential to increase its overall economic value in the long run. In developing countries like Nigeria, the use of land for farming and agricultural activities has reduced over the years owing to the unsuitability of land for farming in many regions, especially in the south-south part of Nigeria, due to various underlying human activities such as oil drilling and oil spillage into neighboring rivers [1]. The early assessment of the suitability of land for farming using observable associated factors can improve the detection of the productivity of possibly polluted lands, in support of farmers. Soil classification is considered an important means of communication at both national and international levels [2,3]. However, few soil classification studies have been published [4]. The lack of soil classification reduces our knowledge and affects our land use decisions. This difficulty is compounded by the fact that hierarchical classifications are often built on criteria that vary greatly from one to the other. There is a need for the development of a land suitability classification model which can be adopted by farmers and land owners in polluted areas to determine the suitability of land for planting food or cash crops, hence this study.

Literature Review
Cogent decisions on the suitability of a particular land area to yield maximum productivity for farming purposes remain an open research problem in the field of agriculture. A farm land suitability assessment for profitable potential is described as the process of evaluating land performance for farming purposes. The study [5] described two types of land suitability evaluation approaches: qualitative and quantitative. The study [6] also stated that it was possible to evaluate a farming land prospect in qualitative terms, such as highly suitable, moderately suitable, or not suitable. In the second approach, the quantitative one, the assessment of land suitability is given by numeric indicators. The study [7] made several descriptions of qualitative parameters of soil and plant growth, measurable at various scales of assessment, which can be used as numeric indicators of farm land suitability. The identified parameters include weighting factors related to water infiltration (aggregate stability, surface porosity), water absorption (porosity, total C, earthworms), degradation resistance (aggregate stability, microbial processes) and plant growth (factors affecting rooting depth, water relations, nutrient relations and acidity).

Concept of Fuzzy Logic
Linguistic variables are a subset of the many possible attributes that could be used to denote the condition of a particular land under sampling. They can be obtained from biophysical, economic, social, management and institutional attributes, and from a range of measurement types [8]. Linguistic variables are also defined as valuable tools for the assessment and evaluation of a given condition because they produce the information needed to understand a complex system. It is possible to develop a fuzzy linguistic variable model, which would be useful for decisions regarding problems related to the evaluation of farm land suitability. There are two types of linguistic variables, namely individual fuzzy indicators (IFI) and combined fuzzy indicators (CFI). The IFI indicates the extent of agreement of attribute j with the requests of user group i and task k of land suitability assessment. Examples of possible attributes j include soil characteristics, crop yields, or landscape properties. The IFI is defined as a number in the range from 0 to 1, which reflects an expert concept and is modeled by an appropriate membership function, for which the expert concept has to take into account the specifics of attribute j, user group i and task k of resource evaluation. The choice of a membership function is somewhat arbitrary and should mirror the subjective expert concept. The CFI is defined using fuzzy aggregation operations to merge the IFI. Therefore, the CFI gives an integrated inference of farm land suitability.
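The merging of individual indicators into a combined indicator can be illustrated with a minimal sketch. The text does not state which fuzzy aggregation operation is used, so the three operators below (minimum, maximum and weighted mean) are assumptions chosen because they are common choices; the function name, the attribute names and the IFI values are hypothetical.

```python
from typing import Dict, Optional


def combined_fuzzy_indicator(ifi: Dict[str, float],
                             weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Merge individual fuzzy indicators (each in [0, 1]) into combined ones.

    The operators are illustrative assumptions, not the method of the study.
    """
    for name, value in ifi.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"IFI for {name!r} must lie in [0, 1], got {value}")
    if weights is None:
        weights = {name: 1.0 for name in ifi}  # equal weights by default
    total = sum(weights[name] for name in ifi)
    return {
        "min": min(ifi.values()),  # fuzzy AND: limited by the worst attribute
        "max": max(ifi.values()),  # fuzzy OR: driven by the best attribute
        "weighted_mean": sum(weights[n] * v for n, v in ifi.items()) / total,
    }


# Hypothetical IFI values for three attributes (not taken from the study).
example_ifi = {"soil": 0.8, "crop_yield": 0.5, "landscape": 0.6}
cfi = combined_fuzzy_indicator(example_ifi)
```

The minimum operator is the most conservative of the three, since a single poorly rated attribute caps the combined indicator; the weighted mean lets strong attributes compensate for weak ones.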
A number of membership functions have been proposed in the past few years, namely the triangular, trapezoidal and bell-shaped membership functions. A membership function is defined as a curve that defines how each point in the input space is mapped to a membership value in the interval [0, 1]. The input space is often referred to as the universe of discourse (u), which contains all the possible elements of concern in each particular application. The only condition a membership function must really satisfy is that it must vary between 0 and 1. The function itself can be an arbitrary curve whose shape we can define as a function that suits us from the point of view of simplicity, convenience, speed, and efficiency.
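As an illustration of two of the shapes named above, the following minimal sketch implements a trapezoidal and a generalized bell-shaped membership function in plain Python. The parameter names follow the usual textbook convention and the numeric values are hypothetical; the trapezoid assumes a < b ≤ c < d.

```python
def trapmf(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Assumes a < b <= c < d.
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gbellmf(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell: 1 / (1 + |(x - c) / a|^(2b)).

    a sets the width, b the steepness of the shoulders, c the center.
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Hypothetical parameters, chosen only to show the shapes.
plateau = trapmf(2.0, 0.0, 1.0, 3.0, 4.0)    # on the flat top
rising = trapmf(0.5, 0.0, 1.0, 3.0, 4.0)     # halfway up the left slope
bell_peak = gbellmf(5.0, 2.0, 4.0, 5.0)      # at the center c
bell_half = gbellmf(7.0, 2.0, 4.0, 5.0)      # one width a away from the center
```

Both functions return values in [0, 1] for every input, which is the only condition a membership function must satisfy.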
The MATLAB Fuzzy Logic Toolbox includes 11 built-in membership function types. These functions are, in turn, built from several basic functions, namely piece-wise linear functions, the Gaussian distribution function, the sigmoid curve, and quadratic and cubic polynomial curves [9]. The resultant fuzzy logic makes use of fuzzy inference, which is the process of formulating the mathematical model based on a mapping from an input set to an output value using fuzzy logic theory. The mapping provides a basis from which decisions can be made, or patterns discerned. The process of fuzzy inference involves logical operations, such as the use of If-Then rules and membership functions. The Fuzzy Inference System (FIS) is a rule-based system; it is used as a tool for representing different forms of knowledge about a problem. FIS is also used for modeling the interactions and relationships that exist between its variables [5]. FIS takes into account all the fuzzy rules in the rule base and learns how to transform a set of inputs to corresponding outputs.

Related Works
The study [10] collected 1297 soil samples and measured the content of soil total nitrogen (TN), soil available phosphorus (AP) and soil available potassium (AK) in Zengcheng, north of the Pearl River Delta, China. The TN, AP and AK of soil were predicted in the study area based on auxiliary variables after dimensionality reduction, along with stepwise linear regression (SLR), support vector machine (SVM), random forest (RF) and back-propagation neural network (BPNN) models; 324 independent points were used to verify the predictive performance. The BPNN model demonstrated the best predictive accuracy among all methods. The study concluded that the application of hyperspectral images (visible-near-infrared data) with the BPNNOK model was an efficient method for mapping and monitoring soil nutrients at the regional scale. The study [11] proposed a remote sensing technology to predict the nutritional content of farmland. The study made use of genetic algorithm optimization combined with a back-propagation neural network (the GA-BPNN method) for the prediction. The research implied that the GA-BPNN model provided the potential to map the soil total potassium content for large farm areas.
The study [12] developed a Basal Stem Rot (BSR) disease detection model for oil palm plants using a thermal imaging technique. Thermal images of the canopy section of oil palm trees from healthy and BSR-infected trees were captured. The images were processed to extract pixel values representing the thermal properties of the trees. These values were statistically analysed. Selected principal component scores were used in the k-nearest neighbour (KNN) and Support Vector Machine (SVM) multivariate classification algorithms. The results demonstrated that when the average pixel values of trees were used, the SVM-based model resulted in the highest average overall classification accuracy of 89.2% for the training set and 84.4% for the test set. The study was limited to the use of images for the classification of diseases affecting plants. The study [13] reviewed the detection of diseases affecting plants based on images collected from plants upon observation. The study performed a literature review of studies covering image processing of leaves, stems, branches and roots of plants using segmentation and filtering techniques. The results of the review showed that most of the techniques adopted include removal of noise, segmentation, color, texture and edge feature extraction of the affected area of the plant, and classification of diseases using classifiers. The study was limited to the review of image processing techniques adopted for disease detection in plants.
The study [14] developed an expert knowledge-based fuzzy soil inference scheme, SoLIM. The scheme was used to obtain the relationship between soils and their formative environmental conditions. The results of the research showed that soil information products derived from SoLIM are of high quality in terms of both level of spatial detail and degree of attribute accuracy. However, the degree of success of SoLIM depends highly on the availability and quality of conventional data and the quality of knowledge on soil-environmental relationships over the study area. The study [15] developed a fuzzy logic model based on the Mamdani fuzzy inference system (MFIS) to determine to what extent soil classified as Solonchak in the World Reference Base (WRB) can interfere with Calcisols and Gypsisols. For that purpose, membership values of the Solonchaks (Is), Calcisols (Ic) and Gypsisols (Ig) indices were calculated from 194 soil profiles previously classified as Solonchak in WRB. The soil classification obtained by employing MFIS was analogous to that provided by WRB; however, MFIS exhibited high precision concerning the membership value between soils and their intergrades.

Methodology
To develop the fuzzy logic-based model for the classification of land suitability for crop planting, factors influencing the suitability of land for farming were identified with the help of a botanist. This stage was followed by formulation of classification model using the fuzzy membership functions. The inference engine for the fuzzy logic model was furnished by knowledge of land suitability factors and variables of land suitability elicited from the botanist.

Identification of Associated Land Characteristics Factors
Following the review of related works over the internet, a number of associated land characteristic factors were identified which have a relationship with the classification of land suitability. Seven factors were identified with their respective labels. These variables were given different labels which were fuzzified using the triangular membership function with their respective crisp intervals defined. The variables are presented alongside their respective labels as shown in Table 1.

Fuzzification of Variables
For the purpose of developing a classification model for land suitability for crop planting, each variable identified was fuzzified using a triangular membership function. The triangular membership function requires 3 parameters: the left-hand base of the triangle (a), the central apex of the triangle (b) and the right-hand base of the triangle (c). The numeric parameters (a, b, c) satisfy a ≤ b ≤ c. The interval defined by these parameters was used as the crisp interval within which each crisp value required for calling the linguistic variable was assigned. Since there were 2 or 3 linguistic variables defined for each factor identified and 4 for the suitability of land, there were 2 or 3 (for identified factors) or 4 (for suitability of land) triangular membership functions, with one assigned to each linguistic variable as appropriate. Therefore, 2 or 3 triangular membership functions were formulated for each associated factor identified in this study based on the mathematical expression in equation (1). The expression shows how the triangular membership function was used to formulate the label of a variable called variable_label by fitting a numerical value x into a crisp interval of (a, b, c).

\[
\text{variable\_label}(x; a, b, c) =
\begin{cases}
0, & x \le a \\
\dfrac{x - a}{b - a}, & a < x \le b \\
\dfrac{c - x}{c - b}, & b < x < c \\
0, & x \ge c
\end{cases}
\tag{1}
\]

Using 2 or 3 triangular membership functions, the labels of the identified factors were formulated using the crisp intervals of (-0.5, 0.5), (0.5, 1.5) and (1.5, 2.5) to model the linguistic variables for 0, 1 and 2 respectively, such that each crisp value is the center b of its interval, as shown in Table 2.
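The fuzzification step described above can be sketched in a few lines of Python. The crisp intervals and centers are those stated in the text; the function names trimf and fuzzify and the dictionary layout are assumptions made for illustration, and the sketch assumes a < b < c.

```python
from typing import Dict, Tuple


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet at a and c and apex at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Crisp code -> (a, b, c), using the intervals stated in the text.
# A factor with only two labels would use the first two entries.
FACTOR_LABELS: Dict[int, Tuple[float, float, float]] = {
    0: (-0.5, 0.0, 0.5),
    1: (0.5, 1.0, 1.5),
    2: (1.5, 2.0, 2.5),
}


def fuzzify(x: float) -> Dict[int, float]:
    """Return the membership degree of x in each label of a factor."""
    return {code: trimf(x, *abc) for code, abc in FACTOR_LABELS.items()}


at_center = fuzzify(1.0)    # full membership in label 1 only
off_center = fuzzify(0.25)  # partial membership in label 0 only
```

Because adjacent intervals only touch at their end points, a value such as 0.5 has zero membership in every label; inputs are therefore expected to be entered at or near the centers 0, 1 and 2.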

Fuzzification of the Classification of Land Suitability
Following the identification and the fuzzification of the factors of land suitability, there was a need to formulate the target variable that was used to define the classification of land suitability. The triangular membership function was used to formulate the fuzzy logic model for the target variable by assigning crisp values of 0, 1, 2 and 3 to the target class labels, namely Unsuitable, Fairly Suitable, Moderately Suitable and Highly Suitable, using the intervals (-0.5, 0.5), (0.5, 1.5), (1.5, 2.5) and (2.5, 3.5) respectively. Therefore, four (4) triangular membership functions were used to formulate the fuzzy logic model required to describe the 4 labels of the target class that was used to describe the suitability of land using the identified crisp intervals as shown in Table 3. Using the description provided in Table 3, the relationship between the factors and land suitability was proposed using the fuzzy inference system. The construction of the rule base used to design the fuzzy inference engine is presented in the following paragraphs.
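The mapping from a crisp output value back to a target class label can be sketched as an interval lookup. The intervals and labels are those given in the text; the function name suitability_label is hypothetical, and assigning a boundary value such as 0.5 to the upper class is an assumption, since the text does not say how boundaries are handled.

```python
from bisect import bisect_right

# Interval edges and labels as stated in the text.
EDGES = [-0.5, 0.5, 1.5, 2.5, 3.5]
LABELS = ["Unsuitable", "Fairly Suitable", "Moderately Suitable", "Highly Suitable"]


def suitability_label(score: float) -> str:
    """Return the target class whose crisp interval contains score."""
    if not EDGES[0] <= score <= EDGES[-1]:
        raise ValueError(f"score {score} is outside [{EDGES[0]}, {EDGES[-1]}]")
    # bisect_right counts the edges <= score; the final edge 3.5 is clamped
    # so that it still falls in the last class.
    index = min(bisect_right(EDGES, score) - 1, len(LABELS) - 1)
    return LABELS[index]


labels_at_centers = [suitability_label(v) for v in (0, 1, 2, 3)]
```

Evaluating the function at the centers 0, 1, 2 and 3 returns the four labels in order, matching the crisp values assigned to the target classes.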

Fuzzy Inference System Design
In order to construct the knowledge base of the classification model using fuzzy logic, a number of IF-THEN rules were formulated by combining the land characteristic factors as the antecedent, while the suitability of land for crop planting was used as the consequent variable. A typical rule that can be inferred is as follows: IF (Annual Rainfall="Low") AND (Month of Dry Season="Many") AND (Relative Humidity="Low") AND (Clay="Low") AND (Sand="Low") AND (Organic Carbon="Low") AND (pH Value="Acidic") THEN (Land Suitability="Unsuitable").
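How such a rule might be evaluated can be sketched as follows. The rule is the one quoted above; taking the minimum of the antecedent membership degrees as the AND operator is an assumption (it is a common default, but the text does not state the operator used), and the membership degrees in the example are hypothetical.

```python
from typing import Dict

# The example rule from the text: antecedent labels and consequent label.
RULE_ANTECEDENT: Dict[str, str] = {
    "Annual Rainfall": "Low",
    "Month of Dry Season": "Many",
    "Relative Humidity": "Low",
    "Clay": "Low",
    "Sand": "Low",
    "Organic Carbon": "Low",
    "pH Value": "Acidic",
}
RULE_CONSEQUENT = "Unsuitable"


def firing_strength(antecedent: Dict[str, str],
                    memberships: Dict[str, Dict[str, float]]) -> float:
    """Degree to which a rule fires, with AND taken as the minimum (assumed)."""
    return min(memberships[factor].get(label, 0.0)
               for factor, label in antecedent.items())


# Hypothetical membership degrees of one land sample in the relevant labels.
sample = {
    "Annual Rainfall": {"Low": 0.8},
    "Month of Dry Season": {"Many": 0.6},
    "Relative Humidity": {"Low": 1.0},
    "Clay": {"Low": 0.9},
    "Sand": {"Low": 0.7},
    "Organic Carbon": {"Low": 0.5},
    "pH Value": {"Acidic": 0.75},
}
strength = firing_strength(RULE_ANTECEDENT, sample)
```

With the minimum operator, the rule fires only as strongly as its weakest condition, here the organic carbon term; a label with no recorded membership is treated as degree 0, so the rule does not fire at all.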
The number of rules required for the fuzzy model was estimated as the product of the number of linguistic variables for each factor. Annual rainfall had 3 linguistic variables, months of dry season had 2, relative humidity had 2, clay had 2, sand had 2, organic carbon had 2, and pH value had 3. Therefore, the total number of rules was 3 × 2 × 2 × 2 × 2 × 2 × 3 = 288.
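The rule count can be checked by enumerating every combination of labels. The label counts per factor are those stated in the text; representing labels by the integer codes 0, 1 and 2 and the variable names are assumptions made for illustration.

```python
from itertools import product
from math import prod

# Number of linguistic variables per factor, as stated in the text.
LABEL_COUNTS = {
    "Annual Rainfall": 3,
    "Months of Dry Season": 2,
    "Relative Humidity": 2,
    "Clay": 2,
    "Sand": 2,
    "Organic Carbon": 2,
    "pH Value": 3,
}

# Expected rule count: the product of the label counts.
n_rules = prod(LABEL_COUNTS.values())

# Enumerate one antecedent per combination of label codes.
antecedents = list(product(*(range(n) for n in LABEL_COUNTS.values())))

# Every antecedent is a distinct combination, so no two rules share one.
all_unique = len(set(antecedents)) == len(antecedents)
```

Each tuple in antecedents holds one label code per factor, in the order of LABEL_COUNTS; attaching a consequent label to each tuple would complete the rule base.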

Simulation Environment
Fuzzy Logic Toolbox™ provides MATLAB functions, graphical tools, and a Simulink® block for analyzing, designing, and simulating systems based on fuzzy logic. For this study, five primary GUI tools (elements of the MATLAB Fuzzy Logic System) for building, editing, and observing fuzzy inference systems in the toolbox were used.

Result and Discussion
In formulating the model for classifying land suitability, 2 or 3 triangular membership functions were used to formulate the fuzzy logic model for the labels of each characteristic factor, with centers 0 and 1, or 0, 1 and 2 respectively, as appropriate. The values were allocated based on the increasing effect of the labels of the identified factors used in this study. The mathematical representations of the fuzzy logic model formulation using the triangular membership function for each of the labels are presented in equations (2) to (4).

Simulation Result
The results of the simulation of the model for the classification of the suitability of farm land are shown in Figure 1, such that the interval [-0.5, 0.5] with center 0 was used to model unsuitable, [0.5, 1.5] with center 1 was used to model fairly suitable, [1.5, 2.5] with center 2 was used to model moderately suitable, while [2.5, 3.5] with center 3 was used to model highly suitable. Figure 2 shows the complete insertion of the 288 rules that were inferred for determining the classification of the suitability of land for crop planting. Each rule inferred is unique: for any two distinct rules r and s within the 288 rules, r does not have the same combination of linguistic variables as s. Figure 3 displays the graphical region of each variable selected by each rule with respect to the linguistic variables of the classification of the suitability of land for crop planting. As shown in the bottom left part of Figure 3, the crisp values entered were 0, 1, 1, 0, 0, 1, 1 and 0, which were consistent with the linguistic values, namely: no for presence of brownish-gray coloured leaves, yes for presence of burnt orange-spotted leaves, yes for presence of whitish skin-like layer on stem-base, no for presence of brownish cortical leaves, no for presence of pale green foliage, yes for presence of dark brown spear rot, yes for presence of seed rot and no for presence of infected oil palm plant. According to rule #102, the combination of these linguistic variables should have a moderate classification of the severity of fungal diseases affecting oil palm plant, which amounted to a crisp value of 2, which is within the interval of moderate.

Discussion of Results
This study developed a classification model which can be used by land owners for the classification of the suitability of land for planting crops based on information provided about associated risk factors. The study identified 7 non-invasive risk factors required for the classification of the suitability of land for planting crops. Each risk factor was defined using a number of linguistic variables for which central crisp values were assigned based on the association with the classification of the suitability of land for planting crops. The stronger the association of a linguistic variable, the higher the central crisp value assigned.
The crisp values for each of the identified risk factors were assigned by allocating the values 0, 1 and 2 to factors with 3 values and 0 and 1 to binary risk factors, in increasing order of association with the classification of the suitability of land for planting crops. Each factor was divided into 2 or 3 parts such that the values of 0 and 1, or 0, 1 and 2, were allocated to each linguistic variable defined. Therefore, crisp intervals with centers of 0, 1 and 2 were used to define the labels of the identified factors, using triangular membership functions to identify labels in the intervals [-0.5, 0.5], [0.5, 1.5] and [1.5, 2.5] respectively.
For the purpose of establishing a relationship between the identified factors, 288 rules were inferred from the experts in order to determine the relationship between the risk factors identified and the classification of the suitability of land for planting crops. In order to construct the knowledge base of the classification model using fuzzy logic, a number of IF-THEN rules were formulated by combining the factors as the antecedent, while the classification of the suitability of land for planting crops was used as the consequent variable. Using the risk factors that were identified for assessing the classification of the suitability of land for planting crops, the process of inference rule generation was achieved.

Conclusion
This study developed a fuzzy logic-based model for the classification of the suitability of land for crop planting, which was required for assessing the suitability of the assessed land for the planting of various food and cash crops by farmers. The 7 factors identified were annual rainfall, months of dry season, relative humidity, abundance of clay soil, abundance of sand soil, abundance of organic carbon and pH value of the soil. Two or three triangular membership functions were appropriate for formulating the linguistic variables of each factor, while the target suitability of land was formulated using four triangular membership functions for the linguistic variables unsuitable, fairly suitable, moderately suitable and highly suitable. A total of 288 rules were formulated using IF-THEN statements which adopted the values of the factors as the antecedent and the suitability of land for planting crops as the consequent part of each rule.
This study recommends that additional associated factors be incorporated into the developed classification model for the suitability of land for planting crops in order to improve the effectiveness of the classification. Also, the study recommends that information about the associated factors, alongside the classification of land suitability from a structured database, be used to develop a data-driven model using machine learning, thus providing a model with more objectivity compared to the possible bias of expert information.