Fuzzy Logic Model to Forecast Environmental Related Health Diseases in Nigeria

This paper identified the risk factors for environmental health related diseases and formulated a fuzzy logic based predictive model from the identified variables. Related literature was reviewed to understand the body of knowledge surrounding environmental health related diseases and their corresponding risk factors, and interviews with community health officers were conducted to validate the identified variables. The predictive model was formulated using the Matlab Fuzzy Logic Toolbox. Data were collected from five different states in Nigeria. The results showed that there are cases of environmental related diseases in areas where there is no potable water and in locations that lack good toilet facilities. In areas where there is no toilet facility, or where buckets and the bush are used as toilets, there are always cases of cholera, and cholera outbreaks are common in these areas during the rainy season. All these point to the fact that, if there is a good environmental health tracking system with predictive features, environmental health officers would be able to easily monitor, manage and track any area that may be prone to any of these environmental health diseases.


Introduction
Environmental health related diseases are diseases that occur as a result of poor environmental conditions. Globally, diseases have the potential for serious negative impact on the social and economic development of the people living with them and of the countries where those people reside [1]. Epidemics that result from poor environmental conditions are common in developing nations, especially in sub-Saharan Africa (SSA) [2]. The most prevalent of these environmental related diseases in the region include typhoid, cholera, malaria, diarrhoea and guinea worm. The SSA region has the highest number of people living with these diseases worldwide [3].
According to Onwuliri [4], in developing nations, poor water supply, poor sanitation and poor hygiene account for a large part of the burden of illness and death. About 4 billion cases of diarrhoea per year cause an estimated 2.2 million deaths, with the majority of casualties being children under the age of five [5]. In addition, diarrhoea accounts for 4.3% of the total global disease burden, and an estimated 88% of this burden is attributable to unsafe drinking water, inadequate sanitation and poor hygiene. Three hundred million people are estimated to suffer from malaria yearly [4].
In the SSA region, Nigeria is the most populous country, with an estimated population of over 150 million [6]. The country is characterised by various health related problems, ranging from epidemic diseases and environmental pollution to other public health related problems. The existence of these deadly diseases is a result of environmental health problems [7].
Generally, water, housing condition and health are interrelated in many ways. Consumption of contaminated water may result in water-borne diseases such as typhoid, cholera, dysentery, infective hepatitis and other diseases that cause diarrhoea. Scarcity of water may affect personal hygiene, and may influence the spread of skin and eye infections like scabies, conjunctivitis and trachoma. Water based diseases and water-related vector-borne diseases can result from the development of water supply projects such as dams and irrigation structures. Projects of this nature inadvertently provide habitats for mosquitoes and snails that are intermediate hosts of parasites that cause malaria, schistosomiasis, lymphatic filariasis, onchocerciasis and Japanese encephalitis [8].
Environmental health diseases like cholera, typhoid, meningococcal disease, malaria, and measles are claiming many lives [9]. Almost 9 percent of children die at birth and over 13 percent die before the age of 5 due to different childhood environment related diseases [10]. According to the World Health Organisation, at least three million children die before their fifth birthday due to environment related diseases [11]. Within the first forty days of 2009 alone, meningococcal disease killed over 100 people in 19 of the 36 states of the country, and many local government areas reported numerous cases of the same disease [12].
In the SSA region, almost all the countries still face the major traditional environmental hazards. These include unsafe water, lack of sanitation, contaminated food, and indoor air pollution, among other things. Most rural dwellers in Nigeria get their drinking water directly from rivers, streams or ponds. In most cities and urban areas, the safety of drinking water is still an issue. In Lagos, one of the biggest cities in Nigeria, very few people have direct access to potable water. In other major cities in the country such as Ibadan, Port Harcourt and Kano, most houses do not have basic amenities such as toilet facilities, bathrooms, and potable water. These are the major causes of environmental health related problems, such as waterborne diseases, among residents [13].
Clean water and proper sanitation are the keys to good health and the eradication of diseases in the SSA region, especially in Nigeria [14]. If a proper surveillance system is put in place to monitor clean and safe drinking water and general sanitation, the money expended on environment related diseases will be greatly reduced.
In addition, most of the environmental related diseases in Nigeria, which are caused by bad housing conditions or by consuming contaminated water or food, cannot be prevented by vaccines but only through the use of an efficient and effective surveillance or tracking system [13].
Information Technology (IT) has been identified as an important tool to effectively address health related problems. The effects of its usage include reduced healthcare cost, improved healthcare delivery, more effective management of diseases and improved decision support within the health sector [15,16]. The advent of IT has changed almost everything in life.
In order to address this problem, environmental health tracking systems that use techniques such as data mining and machine learning have become increasingly essential in assisting community health officers to take correct decisions. There is a need to develop a fuzzy logic based predictive model for the early detection of environmental health related diseases, in order to provide decision support and improve the living standard of Nigerians.

Related Works
Idowu et al [17] addressed the problem of environmental health monitoring facing Nigeria as a whole. The environment and the factors associated with it are the root causes of many epidemic diseases in both developed and developing nations. In Nigeria, environmental health problems arise from population pressure on housing and poor environmental sanitation, coupled with a lack of safe drinking water and basic housing facilities. Despite the deplorable state of environmental health (lack of clean and safe drinking water, bad housing conditions, and so on), there is no reliable and timely means of surveillance or any monitoring system. The result of that research makes it possible for environmental health workers to capture the environmental health situation of any house in Nigeria in real time while in the field. In conclusion, the paper presents the result of research that developed a web based environmental health tracking system for Nigeria.
Idowu [18] developed a predictive model for the classification of the risk of hypertension among Nigerians using decision tree algorithms, based on historical information elicited about the risk of hypertension among selected respondents in Nigeria. The predictive model was simulated in the Waikato Environment for Knowledge Analysis (WEKA) using the 10-fold cross validation technique for model training and testing. The results revealed that the decision tree algorithms selected some of the identified risk factors as the most predictive for the risk of hypertension, based on the information inferred from the dataset collected. The variables identified by the algorithms can help cardiologists concentrate on a smaller yet important set of risk factors for identifying the risk of hypertension, using rules derived from the paths along the decision trees based on the values of the risk factors of the individual.
Egejuru et al [19] developed a model to forecast the risk of osteoporosis using supervised machine learning algorithms. The study identified the variables that were monitored by experts in determining osteoporosis risk, then formulated and simulated the predictive model. The predictive model for osteoporosis risk was formulated using two (2) supervised machine learning algorithms, namely the Naïve Bayes' (NB) classifier and the Multi-layer Perceptron (MLP), based on the identified risk factors. The results of the identification and data collection showed that 20 risk factors were identified, including the CD4 count level, with risk stratified as low, moderate and high based on information collected from 45 patients in Nigerian hospitals. The results of the model validation using 10-fold cross validation revealed that the MLP had the best performance, with an accuracy of 100% against 71.4% for NB. The results further showed that the better performance of the MLP over NB was influenced by the ability of the more complex perceptron network to model the problem of identifying the risk of osteoporosis from the values of the risk factors presented in the training dataset.
Mhambe et al [20] identified the risk factors for mental illness and formulated a predictive model based on the identified variables. Naïve Bayes' and the Decision Trees' Classifiers were used to formulate the predictive model for the risk of mental illness based on the identified and validated variables using the WEKA software. Data was collected from 30 patients with an almost equal distribution of no, low, moderate and high risk of mental illness cases. The results showed that there were three classes of risk factors associated with mental illness, namely: biological factors, psychological factors and environmental factors. The results further showed that the formulation with Decision Trees Classifiers revealed the most relevant variables for the risks of mental illness such as losing anyone close. C4.5 decision trees algorithm with an accuracy of 83.3% outperformed the Naïve Bayes' algorithm which had an accuracy of 76.7%. The study concluded that the variables identified by the C4.5 Decision Trees algorithm can assist mental health experts to apply the rules deduced by the algorithm for the early detection of mental illness.
Most of the existing models do not address environmental health related diseases, which are the focus of this paper, and this distinguishes this paper from the existing models for disease prediction.

Methods
In order to develop the predictive model for environmental health related diseases in Nigeria, environmental health and sanitation data were collected from selected states in Nigeria. The descriptive statistics of the environmental health and sanitation data are shown in Tables 1 and 2. To model the environmental health based diseases using the fuzzy logic approach, the controller has four inputs and two outputs. The input variables are Water (WT), Toilet Facility (TF), Refuse Facility (RF), and General Sanitation (GS), while the environmental based disease variables, cholera (CL) and malaria (ML), are the outputs of the controller. Tables 3, 4, 5, 6, and 7 show the fuzzification of the inputs and outputs.
Tables 1 and 2 present the descriptive statistics of the environmental health and sanitation conditions of the urban and rural study areas. The data make clear that the rural areas of south western Nigeria are prone to environmental health diseases. In urban areas, drinkable water is also a major problem, although the situation is better than in rural areas. The model to predict the likelihood of environmental related diseases from environmental health problems was formulated using the Matlab Fuzzy Logic Toolbox.

Results
Fuzzification is the first process in modelling a fuzzy logic system. The first step in modelling the controller is therefore data fragmentation (fuzzification) into inputs that can be accepted by fuzzy logic. Fuzzification converts each unit of input data to a degree of membership by calling a membership function in the Matlab Fuzzy Logic Toolbox. During fuzzification, each input value is matched against the conditions of the rules to establish how well each rule fits that particular input.

Rule Production
At this stage, the truth value of each rule is computed and then applied to the corresponding part of each rule. The rule based system for the controller uses four input variables and two output variables as the conditions and the conclusions of the rules. The input variables for the controller are Water (WT), Toilet Facility (TF), Refuse Facility (RF), and General Sanitation (GS), while the two outputs are cholera (CL) and malaria (ML) as environmental health based diseases.
The rules of the controller take the "if…then" form. There are 16 rules in the knowledge base of the controller for the environmental health diseases. The "if…then" rule statement is used to formulate the conditional statements that make up the knowledge base. It assumes the form "if α is β then γ is 1". The "if" part is called the premise, whereas the "then" part is called the consequence. The rule base of this system uses forward chaining: the initial facts are processed first, and the rules are then used to draw conclusions from the processed data. Forward chaining is therefore said to be data driven. Samples of the applied rules are shown below.
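As an illustration of the data-driven rule evaluation described above, the following minimal Python sketch fires a few "if…then" rules from already fuzzified facts. It is not the Matlab rule base used in this paper: the three rules, the input term names (poor, fair, good) and the membership degrees are hypothetical, and the use of min for "and" and max for combining rules with the same consequence is an assumption (the common Mamdani convention). Only the variable names WT, TF and CL and the output subsets CN, CP and CE come from the text.

```python
# Hypothetical membership degrees for one household (illustrative values only).
facts = {
    ("WT", "poor"): 0.7, ("WT", "fair"): 0.3, ("WT", "good"): 0.0,
    ("TF", "poor"): 0.2, ("TF", "fair"): 0.8, ("TF", "good"): 0.0,
}

# Each rule is (premise, consequence); the premise is a list of
# (variable, term) conditions joined by "and". These rules are assumed
# examples, not the 16 rules of the actual knowledge base.
RULES = [
    ([("WT", "poor"), ("TF", "poor")], ("CL", "CE")),  # cholera expected
    ([("WT", "fair"), ("TF", "fair")], ("CL", "CP")),  # cholera may occur
    ([("WT", "good"), ("TF", "good")], ("CL", "CN")),  # cholera not expected
]


def fire_rules(facts, rules):
    """Forward pass: start from the facts and compute each rule's truth value."""
    strengths = {}
    for premise, consequence in rules:
        # "and" is taken as the minimum of the premise membership degrees.
        strength = min(facts.get(cond, 0.0) for cond in premise)
        # Rules sharing a consequence are combined with the maximum.
        strengths[consequence] = max(strengths.get(consequence, 0.0), strength)
    return strengths


if __name__ == "__main__":
    print(fire_rules(facts, RULES))
```

The returned dictionary maps each consequence to its firing strength, which is the single number per rule that the later implication and defuzzification steps operate on.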

Fuzzification Process of the Variables
Fuzzification requires two main stages: derivation of the membership functions for both input and output variables, and the linguistic representation of these functions. Different types of membership functions can be applied for fuzzification, such as triangular, trapezoidal, bell shaped and Gaussian. Triangular or trapezoidal waveforms can be applied to systems that have a large variation of data. Gaussian or sigmoidal waveforms can be applied to more sensitive systems that need high control accuracy. Thus, triangular and trapezoidal waveforms were applied for the fuzzy logic model of this research.
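For reference, the triangular and trapezoidal membership functions can be written in their standard general form as below. The breakpoint symbols a, b, c and d are generic parameters introduced here for illustration; the specific breakpoints used for each variable in this study are not restated.

```latex
% Triangular membership function with feet a, c and peak b (a < b < c)
\mu_{\mathrm{tri}}(x; a, b, c) = \max\!\left( \min\!\left( \frac{x - a}{b - a},\; \frac{c - x}{c - b} \right),\; 0 \right)

% Trapezoidal membership function with feet a, d and shoulders b, c (a < b \le c < d)
\mu_{\mathrm{trap}}(x; a, b, c, d) = \max\!\left( \min\!\left( \frac{x - a}{b - a},\; 1,\; \frac{d - x}{d - c} \right),\; 0 \right)
```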

Fuzzification Process of the Input Variables
The variable DW is divided into one triangular and two trapezoidal fuzzy subsets. Figure 1 shows the fuzzification of the input variable DW. As seen in Figure 1, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the variable DW.
The variable TF is divided into one triangular and two trapezoidal fuzzy subsets. Figure 2 shows the fuzzification of the input variable TF. As seen in Figure 2, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the variable TF.
The variable GS is divided into one triangular and two trapezoidal fuzzy subsets. Figure 3 shows the fuzzification of the input variable GS. As seen in Figure 3, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the variable GS.
The variable RF is divided into one triangular and two trapezoidal fuzzy subsets. Figure 4 shows the fuzzification of the input variable RF. As seen in Figure 4, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the variable RF.
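The partition of an input into one triangular and two trapezoidal subsets can be sketched in Python as follows. This is a minimal illustration, not the Matlab configuration used in the study: the breakpoints are assumptions chosen only so that the subsets are centered near 0.05, 0.5 and 0.95, and the term names low, medium and high are hypothetical labels.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def triangle(x, a, b, c):
    """Triangular membership with peak at b (a trapezoid with a zero-width top)."""
    return trapezoid(x, a, b, b, c)


def fuzzify(x):
    """Map a crisp input in [0, 1] to degrees of membership in three subsets.

    Breakpoints are assumed for illustration only.
    """
    return {
        "low": trapezoid(x, 0.0, 0.0, 0.1, 0.5),     # plateau centered near 0.05
        "medium": triangle(x, 0.1, 0.5, 0.9),        # peak at 0.5
        "high": trapezoid(x, 0.5, 0.9, 1.0, 1.0),    # plateau centered near 0.95
    }


if __name__ == "__main__":
    for value in (0.0, 0.3, 0.5, 1.0):
        print(value, fuzzify(value))
```

With these assumed breakpoints, an input of 0.5 belongs fully to the middle subset, while an input of 0.3 belongs partly to the lower and partly to the middle subset.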

Fuzzification Process of the Output Variables
The variable CL is divided into one triangular and two trapezoidal fuzzy subsets. Figure 5 shows the fuzzification of the output variable CL. As seen in Figure 5, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the output variable CL. Subset CN represents the decision of the first level, which means that cholera is not expected; CP represents the second level, which means that cholera may likely occur; and CE represents the third level, which means that cholera is expected.
The variable ML is divided into one triangular and two trapezoidal fuzzy subsets. Figure 6 shows the fuzzification of the output variable ML. As seen in Figure 6, the data clusters have centers around 0.05, 0.5 and 0.95. Thus, three fuzzy subsets are defined for the output variable ML. Subset MN represents the decision of the first level, which means that malaria is not expected; MP represents the second level, which means that malaria may likely occur; and ME represents the third level, which means that malaria is expected.
Mathematical Expressions of the Variable ML.

Defuzzification Process
Defuzzification is the process of converting each aggregated fuzzy output, obtained through the developed fuzzy rules, into a single crisp value. The centre of gravity (CoG) defuzzification method is applied in the model. The following equation is the mathematical expression of the CoG defuzzification method for discrete fuzzy systems.
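For reference, the standard discrete form of the CoG method can be written as below. The symbol y* follows the text; the names y_i (the sampled points of the output universe), n (the number of samples) and μ(y_i) (the aggregated membership degree at y_i) are notation assumed here for illustration.

```latex
y^{*} = \frac{\sum_{i=1}^{n} \mu(y_i)\, y_i}{\sum_{i=1}^{n} \mu(y_i)}
```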
Each fuzzy rule gives a single number that represents the truth value of that rule. The input for the implication process is a single number given by the antecedent, and the output is a fuzzy set.
where y* is the output for one set of input variables (Figure 7). Each set of input data was entered into the FIS and the output results were recorded.
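The implication, aggregation and CoG steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab FIS used in the study: the output subset breakpoints are assumed, implication is taken as clipping (min), aggregation as max, and the output universe [0, 1] is sampled at 101 points. Only the subset names CN, CP and CE come from the text.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Assumed output subsets for cholera (CL); breakpoints are illustrative only.
OUTPUT_SETS = {
    "CN": (0.0, 0.0, 0.1, 0.5),  # cholera not expected
    "CP": (0.1, 0.5, 0.5, 0.9),  # cholera may occur (triangle)
    "CE": (0.5, 0.9, 1.0, 1.0),  # cholera expected
}


def defuzzify_cog(strengths, steps=101):
    """Discrete centre of gravity of the aggregated, clipped output sets.

    strengths maps a subset name to the firing strength of its rule(s).
    Returns None when no rule fires (the centroid is then undefined).
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        # Implication: clip each output set at its firing strength (min).
        # Aggregation: take the maximum over all clipped sets.
        mu = max(
            min(strengths.get(name, 0.0), trapezoid(y, *params))
            for name, params in OUTPUT_SETS.items()
        )
        numerator += mu * y
        denominator += mu
    return numerator / denominator if denominator > 0 else None


if __name__ == "__main__":
    print(defuzzify_cog({"CP": 1.0}))
    print(defuzzify_cog({"CN": 1.0}))
```

With these assumed shapes, firing only the CN subset yields a crisp value above 0 rather than exactly 0, because the centroid of a set with nonzero width lies inside its support. This property of the CoG method may be relevant when interpreting crisp outputs that do not reach the ends of the scale.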
The four inputs and two outputs have the input-output mapping shown in Figure 9 and Figure 10. These are mesh plots of the relationship between the four inputs (WT, TF, RF and GS) and the two outputs (CL and ML). Each plot is generated from a rule base of 8 rules, and its surface is more or less bumpy. The conical surface of the plot is due to the triangular shape of the inputs. The near-vertical shape from the origin shows the sensitivity of the controller to a change in any of the inputs. This is an advantage because the controller can readily respond to or pick up a small change in the inputs.
In the rule viewer of the fuzzy logic model, the input values for cholera were 0.5 and 0.5 for water and toilet facility respectively, corresponding to the middle linguistic category of each variable. A value of 0.5 indicates that water and toilet facility are fairly good, and the output is also 0.5, which implies that there is a probability of occurrence of cholera. When the values for water and toilet facility are both 1, the probability of occurrence of cholera is 0.11 rather than the expected value of zero (0). Although 0.11 is very close to 0, the deviation arises because fuzzy logic is highly abstract and employs heuristics, requiring human experts to discover rules about data relationships (Obi and Imainvan 2011).
The case of malaria is similar to that of cholera. If the input value is 0.35 for each of the two input variables, the output is expected to be around 0.35, but the output is 0.57. This output also differs from the expected value because fuzzy logic is highly abstract and employs heuristic rules about data relationships, as in the case of cholera.

Discussion
In this paper, it was assumed that if the probability of environmental health diseases (Ehd) is 1.000, environmental related diseases will occur; if it is 0.500, they may or may not occur; and if it is 0.000, they will not occur. In the rule viewer of the fuzzy logic model, the input values for cholera were 0.500 and 0.500 for water and toilet facility respectively, corresponding to the middle linguistic category of each variable. A value of 0.500 indicates that water and toilet facility are fairly good, and the output is also 0.500, which implies that cholera may likely occur.
If the values for water and toilet facility are both 1.000, the probability of occurrence of cholera is 0.13. This outcome does not give the expected probability of 0.000, although 0.130 is very close to it. The case of malaria is similar to that of cholera: if the input value is 0.250 for each of the two input variables, the output probability is expected to be around 0.250, but it is 0.560. This output does not give the expected value for the inputs, for the same reason as in the case of cholera.
The data gathered show that most houses in the study area do not have good sanitation, and almost all the houses lack a refuse disposal facility. Most of the environments are littered with waste (animal, food, and so on) because there is no proper refuse disposal facility. The study area is characterised by people living with malaria. From the personal interviews conducted in the study area, most people are diagnosed with malaria each time they visit the hospital. Some do not bother to go to the hospital each time they feel unwell; instead, they patronise a chemist or go to a nearby medicine store to purchase malaria drugs. Some claimed not to visit the hospital or chemist at all but to use locally made herbal malaria medicine.
In addition, there are cases of environmental related diseases in areas where there is no potable water and in locations that lack good toilet facilities. In areas where there is no toilet facility, or where buckets and the bush are used as toilets, there are always cases of cholera, and cholera outbreaks are common in these areas during the rainy season. All these point to the fact that, if there is a good environmental health tracking system with predictive features, environmental health officers would be able to easily monitor, manage and track any area that may be prone to any of these environmental health diseases.

Conclusion
The predictive model was formulated using the MATLAB Fuzzy Logic Toolbox. To formulate the model, there were four inputs to the controller. The identified variables were water, toilet facility, refuse disposal facility and drainage system, while the environmental based disease variables, cholera and malaria, were the outputs of the controller. The fuzzy logic based predictive model generated can be incorporated into the environmental health tracking system. The system will assist users in viewing whether or not there is a probability of occurrence of environmental health diseases.