An Application of Geostatistics to Analysis of Water Quality Parameters in Rivers and Streams in Niger State, Nigeria
Isah Audu1, Abdullahi Usman2
1Department of Mathematics & Statistics, School of Physical Sciences, Federal University of Technology, Minna, Nigeria
2Academic Planning Unit, Vice Chancellor’s Office, Federal University of Technology, Minna, Nigeria
To cite this article:
Isah Audu, Abdullahi Usman. An Application of Geostatistics to Analysis of Water Quality Parameters in Rivers and Streams in Niger State, Nigeria. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 5, 2015, pp. 373-388. doi: 10.11648/j.ajtas.20150405.18
Abstract: Assessment of surface water quality using multivariate statistical techniques does not incorporate the spatial locations of the data into its defining computations. Information on the spatial continuity of surface water concentrations can help in identifying the magnitude of contamination by runoff and anthropogenic pollution. In the present study, the spatial behavior of five (5) surface water quality parameters of some rivers/streams in Niger State of Nigeria was studied using the R geostatistical package gstat, in conjunction with the packages sp, rgdal, spatstat and maptools. Variograms and ordinary kriged spatial maps were generated for the rainy and dry seasons. The characteristics of the best-fitting models (range, sill and nugget effect) were obtained for each parameter. The variogram analysis indicated a high spatial coherence for E.co, Mg and TDS, whereas TCo and TH indicated a low spatial coherence. The nugget to sill ratios of the experimental and linear fitted variogram models were less than 0.25 in all cases, indicating that the rivers/streams water level has strong spatial coherence in both seasons. This result shows that the linear model is the best for both seasons. Kriged spatial variability maps, with an average variogram range of 48 km in the dry season and 4.3 km in the rainy season and R2 values of 0.80 to 0.92, revealed that water quality changes more rapidly in the dry season than in the rainy season.
Keywords: Kriging, Predictions, Experimental Variogram, Nugget, Water Parameters
Niger State is underlain by sedimentary and basement complex rocks, which have different capacities for retaining water all year round [23,24]. Niger State, like the rest of Nigeria and other tropical lands, has two seasons: the dry and the rainy season. The rainy season is influenced by the south west wind, or tropical maritime air mass. This wind reaches Nigeria between February and June, depending on the location. The dry season is accompanied by a dust-laden air mass from the Sahara desert, locally known as the harmattan. During the rainy season the whole area is often flooded, while in the dry season some of the rivers dry up. This gives rise to difficulties in accessing an adequate supply of safe water. With the increase in population, the scramble for domestic water is aggravated. Most of the medium-sized towns have faced a similar lack of adequate quality water supply since 1980. It has been observed that access to potable water in Niger State has decreased continuously since the 1980s. On average, less than 20% of the inhabitants of the study area currently have access to potable water.
Water quality is the main factor controlling healthy and diseased states in both humans and animals. Surface-water sources may be extremely difficult to survey adequately, particularly in remote rural areas and where land-use patterns are changing rapidly. Not only may there be daily and seasonal changes in flow to consider but, in addition, variations in physical, chemical, and microbiological characteristics necessitate analysis throughout the year to take account of the effect of changes in rainfall patterns. Surface water quality is an essential component of the natural environment and a matter of serious concern today. Variations in water quality are essentially the combination of anthropogenic and natural contributions. In general, anthropogenic discharges constitute a constant source of pollution, whereas surface runoff is a seasonal phenomenon affected by climate within the water catchment basin. Of the two, because of intensive human activities, anthropogenic inputs from a variety of sources are commonly the primary factors affecting the water quality of most rivers, lakes, estuaries, and seas, especially those close to highly urbanized regions.
Research findings indeed reveal deteriorating surface and ground water quality in Nigeria, Uganda and India due to chemical and biological pollution and seasonal changes among others [11,17,25,28]. As water quality issues become more serious and widespread, the need for water quality monitoring as an important component of health promotion strategy in the developing countries cannot be overemphasized.
Recently, a considerable number of researchers have shown an increased interest in the use of multivariate statistical tools and geostatistical techniques to achieve a sustainable exploitation of water resources [2,4]. The combined use of multivariate statistics and geostatistical techniques helps identify the possible sources that affect water environmental systems and offers a valuable tool for reliable management of water resources as well as rapid solutions to pollution issues [16,30]. Moreover, multivariate statistical tools and geostatistical techniques also provide a way of handling large data sets in environmental studies [5,29]. Multivariate methods have been applied to assess variations in rivers/streams water quality in Niger State of Nigeria. Multivariate statistical analysis has also been used to assess water quality changes in a Karstic aquifer in the rainy (winter) and dry (summer) seasons.
Geostatistics has been applied in different fields of study, including water quality [1,19,21]. For example, ordinary kriging (OK) has been applied to determine the spatial distribution of water quality parameters in urban areas in Konya, Turkey.
Geostatistical analysis provides a series of statistical models and tools for spatial data exploration and surface generation of groundwater quality. Hence, the objective of the present study is to provide an overview of the most significant parameters identified in the earlier multivariate analysis, namely Escherichia coli (E.co), magnesium (Mg), total coliform (TCo), total dissolved solids (TDS) and total hardness (TH), and to integrate the multivariate statistical analysis results to determine the spatial continuity of these river water quality parameters in the study area using geostatistical techniques.
2. Materials and Methods
2.1. Study Area
Niger State of Nigeria lies between Latitudes 8°20' N and 11°30' N and Longitudes 3°30' E and 7°20' E, with twenty-five local government area councils. The state is endowed with some large rivers, but with no major water bodies. A sizeable amount of rainwater is lost through percolation to the ground, while the bulk of it flows as runoff into rivers and streams and some of it is lost to the atmosphere by evapo-transpiration. There are two major categories of settlements in the state: urban and rural. The samples were taken from rural settlements, whose dwellers engage mainly in agriculture. Two of the hydro-electric power stations in the state are located within the sampled locations.
Sixteen towns with populations between 5,000 and 20,000 people across the state were sampled as medium sized towns (see figure 1). Each sampled town was divided into wards and ten percent of the total wards were systematically sampled. Water samples were collected from rivers/streams of each of the sampled medium sized towns in both dry and rainy seasons. Seventeen (17) water quality parameters were monitored between December 2010 and November 2011 to cover the two seasons of the year. This followed the guideline procedure for analyzing the quality of surface water, which involves information on chemical, physical and biological parameters. The five of the seventeen water quality parameters that were most influential in the earlier multivariate analysis were used in this study.
2.3. Geostatistical Techniques
Geostatistics is a branch of statistics that specializes in the analysis and interpretation of any spatially (or temporally) referenced data. It is based on the observation that values within a certain proximity are similar, that is, mutually correlated. It can also be described as a collection of techniques and theories that can be used to generate sampling designs, build statistical models, make spatio-temporal predictions at unsampled locations, extract spatio-temporal patterns in the data and analyze the associated uncertainties. Its basic tool is variogram analysis, which involves the study of the variogram function of a specific physical variable, here the water quality parameters under study. The variogram function, with its specific parameters (nugget value, threshold and correlation range), presents the behavior of the variable under study, called the "regionalized variable" [9,12], and thus permits the formulation of conclusions concerning areas that are not represented by any measurement data.
Kriging is a means of spatial, temporal or spatio-temporal prediction, and of estimating unknown local values of variables that are distributed in a space of one, two or three dimensions from more or less sparse data. It is an exact linear interpolating procedure. It is based on a spatial linear model for the data which specifies a parametric spatial mean function and spatial dependence structure. The basic assumption in Kriging is that the data come from a stationary stochastic process, and some methods require that the data be normally distributed. Kriging differs from other methods (such as IDW) in that the weight function is no longer arbitrary: it is calculated from the parameters of the fitted semivariogram model under the conditions of unbiasedness and minimized estimation variance for the interpolation. The method of Kriging attempts to model the variability in the data as a function through the variogram. At a sampled location, the value estimated by Kriging has exactly the same magnitude as the original observation. This is because, in the estimation procedure, Kriging weights each observation according to the distance and direction between that point and the point to be estimated or kriged. If the weights are equal, we have the classical estimate of the mean. The weights may be distributed using any of the following methods: the inverse of the square of the distances, the inverse of the distance, and the inverse of the number of values. Kriging also uses the information from the semivariogram to find an optimal set of weights, chosen to minimize the Kriging variance or its square root, the Kriging standard error. In this sense, the estimates are optimal. Thus, Kriging is regarded as a best linear unbiased estimator (BLUE).
Kriging is divided into two distinct tasks: quantifying the spatial structure of the data and producing a predicted surface. In order to predict an unknown value for a specific location, Kriging uses the fitted variogram model, the spatial data configuration, and the values of the measured sample points around the prediction location. Because Kriging uses statistical models, it allows a variety of map outputs, including predictions, prediction standard errors, probability, and quantile maps. Today, a number of variants of Kriging are in general use: Simple Kriging (SK), Ordinary Kriging (OK), Universal Kriging (UK), Block Kriging (BK), Co-Kriging (CK) and Disjunctive Kriging (DK). Among the various forms of Kriging, Ordinary Kriging (OK) has been used widely as a reliable estimation method.
2.3.2. Interpolation by Ordinary Kriging (OK)
OK is used to model the spatial variability of each of the five influential parameters and to perform their estimation in sampled locations. It is based on the concept of a variable that is both random and spatially autocorrelated. The predictions are based on the model:
$$Z(s) = \mu + \varepsilon'(s)$$
where $\mu$ is the constant stationary function (global mean) and $\varepsilon'(s)$ is the spatially correlated stochastic part of variation. The predictions are obtained using:
$$\hat{z}_{OK}(s_0) = \sum_{i=1}^{n} w_i(s_0)\, z(s_i) = \lambda_0^{T} z$$
where $\lambda_0$ is the vector of kriging weights $w_i$, and $z$ is the vector of $n$ observations at primary locations.
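As an illustration of how ordinary kriging weights can be obtained, the following is a minimal sketch in plain Python, not the gstat implementation used in the study. It assumes a linear semivariogram with hypothetical `nugget` and `slope` parameters, solves the ordinary kriging system (with a Lagrange multiplier that forces the weights to sum to one) by Gaussian elimination, and returns the weighted sum of the observations. The function names and the example coordinates are illustrative assumptions.

```python
import math


def linear_variogram(h, nugget=0.0, slope=1.0):
    """Assumed linear semivariogram; gamma(0) is 0 by definition."""
    return 0.0 if h == 0 else nugget + slope * h


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ordinary_kriging(points, values, target, gamma=linear_variogram):
    """Return (estimate, weights) for one target location.

    The system matrix holds semivariances between observation pairs,
    bordered by ones so that the weights are constrained to sum to one.
    """
    n = len(points)
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights
```

With a zero nugget, calling `ordinary_kriging` at one of the observation points returns that observation, which is the exact-interpolation property described above.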
The semivariogram is a convenient tool in geostatistics for the analysis of spatial dependence structure. It is based on a simple measure of dissimilarity and is defined by:
$$\gamma(h) = \frac{1}{2} E\left[\left(z(s_i) - z(s_i + h)\right)^2\right]$$
where $z(s_i)$ is the value of the random variable at some sampled location and $z(s_i + h)$ is the value at a location at distance $h$ from it.
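In practice the expectation in the semivariogram is estimated from the data by averaging squared differences over pairs of points grouped into distance bins. The sketch below shows one common way to do this (the classical method-of-moments estimator); it is an illustrative assumption rather than the exact procedure run in gstat, and the function name, the `lag_width` and `max_dist` arguments and the bin-centre convention are hypothetical.

```python
import math
from collections import defaultdict


def experimental_variogram(points, values, lag_width, max_dist):
    """Return a list of (lag_centre, semivariance, pair_count) per distance bin.

    Each pair of observations contributes half its squared difference
    to the bin that contains the pair's separation distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            if h > max_dist:
                continue  # ignore pairs beyond the cutoff distance
            k = int(h // lag_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in sorted(counts)]
```

For example, with coordinates in metres, `experimental_variogram(points, values, lag_width=500, max_dist=5000)` would bin pairs in 500 m lags; the 5000 m cutoff here is an arbitrary illustrative value.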
In order to determine the spatial coherence of each of the parameters and to identify the best variogram model, the variogram for each parameter was fitted with linear, spherical, exponential and Gaussian models. The nugget to sill ratio was used to analyze the spatial structure. A variable is said to have strong spatial dependence if the ratio is less than 0.25, moderate spatial dependence if the ratio is between 0.25 and 0.75, and weak spatial dependence otherwise.
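The classification rule above can be written as a small helper. This is only a sketch: the function name is hypothetical, and `sill` is assumed to be the total sill (nugget plus partial sill), matching the Co/(Co+C) column used in the results tables.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget to sill ratio.

    `sill` is assumed to be the total sill (nugget + partial sill).
    Thresholds follow the text: < 0.25 strong, 0.25 to 0.75 moderate, else weak.
    """
    if sill <= 0:
        raise ValueError("sill must be positive")
    ratio = nugget / sill
    if ratio < 0.25:
        label = "strong"
    elif ratio <= 0.75:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label
```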
2.3.3. Variogram Models
Because the kriging algorithm requires a positive definite model of spatial variability, the experimental variogram cannot be used directly. Instead, a model must be fitted to the data to approximately describe the spatial continuity of the data. Experimental variograms for Escherichia coli, Magnesium, Total Coliform, Total Dissolved Solids (TDS) and Total Hardness were calculated at a lag distance of 500 m. Thereafter, the models of spatial variability were fitted to the experimental variograms by minimizing the sum of squares between the experimental values and those of the model. Some important models [7,8] are the linear, spherical, exponential, and Gaussian models.
Where is the value of the semivariogram at the sill, h is the separation distance and equals .
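For reference, the four model families named above can be sketched as plain Python functions. The parameterizations below are commonly used textbook forms and are assumptions, not necessarily the exact expressions of the original paper: `nugget` is the nugget effect, `psill` the partial sill (so the sill is `nugget + psill`), `a` the range parameter, and `slope` the gradient of the unbounded linear model. Note that for the exponential and Gaussian forms written this way the sill is only approached asymptotically.

```python
import math


def linear(h, nugget, slope):
    """Unbounded linear model: never levels out, so it has no sill."""
    return 0.0 if h == 0 else nugget + slope * h


def spherical(h, nugget, psill, a):
    """Spherical model: reaches the sill exactly at h = a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, psill, a):
    """Exponential model: approaches the sill asymptotically."""
    return 0.0 if h == 0 else nugget + psill * (1.0 - math.exp(-h / a))


def gaussian(h, nugget, psill, a):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    return 0.0 if h == 0 else nugget + psill * (1.0 - math.exp(-((h / a) ** 2)))
```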
In this study, OK is applied to each parameter data set using linear, spherical, exponential, and Gaussian models. This is used for spatial prediction of data values of the five water parameters.
2.4. Cross Validation
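The semivariogram models were tested for each parameter data set. The quality of the predictions was assessed by cross validation. Cross validation was conducted to assess the accuracy of the OK through some statistical measurements of the prediction error: the mean error (ME), the root-mean-square error (RMSE) and the root-mean-square standardized error (RMSSE), defined as follows:
$$ME = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}(s_i) - z(s_i)\right)$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}(s_i) - z(s_i)\right)^2}$$
$$RMSSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{z}(s_i) - z(s_i)}{\sigma(s_i)}\right)^2}$$
where $\hat{z}(s_i)$ are estimated values, $z(s_i)$ are actual observations, $n$ is the number of validation points and $\sigma(s_i)$ is the prediction standard error in location $s_i$.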
For a model that provides accurate predictions, the ME should be close to zero, the RMSE should be as small as possible (this is useful when comparing models), and the RMSSE should be close to one.
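The three cross-validation measures can be computed directly from paired predictions and observations. The sketch below uses the standard definitions of ME, RMSE and RMSSE; the function name and argument names are hypothetical, and `std_errors` is assumed to hold the kriging prediction standard error at each validation point.

```python
import math


def cross_validation_errors(predicted, observed, std_errors):
    """Return (ME, RMSE, RMSSE) for paired cross-validation results."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Standardize each error by its prediction standard error before squaring.
    rmsse = math.sqrt(sum((e / s) ** 2 for e, s in zip(errors, std_errors)) / n)
    return me, rmse, rmsse
```

An RMSSE well below one suggests that the prediction standard errors overstate the actual variability of the errors, while a value well above one suggests that they understate it.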
2.5. R Geostatistics Packages
This study used five (5) R packages to run the processing and display the results: gstat, sp, rgdal, spatstat and maptools. All of these are available as open source or as freeware, and no licenses are needed to use them. By combining the capabilities of the five packages, the study drew on the strengths of each and streamlined the preparation, processing and visualization of the spatial maps. In this case, gstat calculates sample (experimental) variograms, plots an experimental variogram with automatic detection of lag spacing and maximum distance, iteratively fits a model to an experimental variogram, and provides a generic function for predictions by inverse distance interpolation and ordinary kriging, including kriging with cross-validation; sp provides general purpose classes and methods for visualizing spatial data; rgdal produces map projections; spatstat is used for various types of statistical and geostatistical analysis; and maptools is used for getting shape files into R and converting some sp objects for use in spatstat.
3. Results and Discussion
Assessing water quality values using geostatistical techniques requires a normal distribution of the parameter values under investigation. In this study, histogram and normal QQ-plot analyses were applied to each water quality parameter, and it was found that the E. coli, Magnesium and TH parameters show a normal distribution. It was also found that the Total Coliform and TDS parameters (see figures 7 and 8 in the appendix) exhibited non-normal distributions and therefore do not satisfy the basic assumption of normality, which is a condition for geostatistical analysis. A logarithmic transformation was performed on the Total Coliform and TDS parameters to bring them closer to a normal distribution (see figures 9 and 10). After the transformation, the deviations from the straight line are minimal. A Kolmogorov-Smirnov test was then performed, and the result shows that the transformed data do not differ significantly from a normal distribution.
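The effect of a logarithmic transformation on a right-skewed variable can be illustrated with a simple skewness check. This sketch is only an illustration and swaps in sample skewness as the diagnostic instead of the histogram, QQ-plot and Kolmogorov-Smirnov checks used in the study. The function names and the `offset` argument (needed when a variable such as a bacterial count contains zeros) are hypothetical.

```python
import math


def skewness(xs):
    """Moment-based sample skewness; values near 0 indicate symmetry."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log_transform(xs, offset=0.0):
    """Natural-log transform; use a positive offset if the data contain zeros."""
    return [math.log(x + offset) for x in xs]
```

Applied to a strongly right-skewed series, `skewness(log_transform(xs))` is expected to be much closer to zero than `skewness(xs)`, which is the behaviour the transformation is meant to achieve.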
A total of 125 surface water samples were collected from the 16 sampled medium sized towns during the rainy and dry seasons. The descriptive statistics for both seasons are given in Table 3.1. From these results, the two seasons appear almost identical. However, common descriptive statistics do not incorporate the spatial locations of the data into their defining computations, so the two seasons may still differ significantly in their spatial behavior. The spatial distributions of E.coli, total coliform, magnesium, total hardness, and TDS concentrations developed from the cross validation process are given in figures 2 to 6, respectively.
The cross validation reports, which examine the validity of the fitted models and semivariogram parameters for the river water parameters, are given in Table 3.2. For example, in the rainy season the best-fit model for both E.coli and TCo is the linear model, with ME values of 0.148 and 0.308, respectively. Also, the experimental and fitted linear variogram plots never level out; therefore, the linear model is considered the best. In the dry season, the exponential model is the best fit for E.coli, with an ME value of 0.100 and an RMSSE value of 0.620, whereas the linear model fits TCo well, with an ME value of 0.303 and an RMSSE value of 0.683. Overall, these results indicate that the linear model is the best for both seasons.
After performing kriging cross-validation for different models for each water quality parameter, the prediction errors were calculated and models giving best results were determined. Table 3.3 shows the most suitable models and their prediction error values for each parameter.
The variograms for the OK are presented in Table 3.4. The parameters were obtained by using measurement error to estimate the nugget, global variance to estimate the sill and the mean distance to nearest neighbor to estimate the range. The fitted models have the following structure:
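The starting values described above (global variance for the sill and mean nearest-neighbor distance for the range) can be computed as in the sketch below. It is an illustration under stated assumptions: the function name is hypothetical, the nugget is taken from a measurement error variance supplied by the user, and the sample variance is used as the global variance.

```python
import math
import statistics


def initial_variogram_parameters(points, values, measurement_error_variance):
    """Return rough starting values for the nugget, sill and range.

    nugget: supplied measurement error variance
    sill:   sample variance of the observations (global variance)
    range:  mean distance from each point to its nearest neighbor
    """
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return {
        "nugget": measurement_error_variance,
        "sill": statistics.variance(values),
        "range": statistics.mean(nearest),
    }
```

These values are only initial guesses; a subsequent least-squares fit to the experimental variogram would refine them.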
|Rainy Season||Dry Season|
|Model||Rainy Season||Model||Dry Season|
|Rainy Season||Dry Season|
Evaluation of the experimental and fitted variogram models in Table 3.4 for the rainy and dry seasons indicates a high spatial coherence for the magnesium and total hardness parameters, a medium coherence for the E.coli and total coliform parameters, and a low spatial coherence for the TDS parameter.
Results of the semivariogram analysis are provided in Table 3.5. The linear model fitted best in both the rainy and dry seasons for all the parameters except magnesium. The nugget to sill ratios of the linear model were less than 0.25 in all cases, indicating that the river water level has strong spatial coherence in both seasons. The range is the distance within which the parameters are spatially correlated. The R2 values of 0.80 to 0.92 suggest that the variogram models were chosen appropriately and that the predictions were accurate.
As with sanitary inspection, data on E.coli, total coliform, magnesium, total hardness and TDS water quality may usefully be divided into a number of categories; the levels of contamination associated with each category should be selected in the light of local circumstances. A typical classification scheme is presented in Table 3.6, based on increasing orders of magnitude of contamination.
|Rainy Season||Dry Season|
|Parameter||Best-Fit Model||Nugget||sill||range||Co/(Co+C)||R2||Best-Fit Model||Nugget||sill||range||Co/(Co+C)||R2|
|Count per 100 ml for E.coli & T.Coliform||Concentration (mg/L) for Mg, TH & TDS||Category & Color-code||Remark|
|0||0-1000||A (Yellow)||In conformity with WHO guidelines|
|1-10||1000-3000||B (Orange)||Low risk|
|10-1000||3000-10000||C (Red)||High risk|
Source: WHO Geneva 2011- Guidelines for Drinking-Water Quality 3rd ed.
3.1. Escherichia Coli
Table 3.1 indicates that the mean value of E.coli is 57.06 cfu/100 ml in the rainy season and increases slightly to 58.91 cfu/100 ml in the dry season. The spatial distribution of E.coli shows that some rivers did not meet the zero-tolerance standard. Continuously high E.coli concentrations occur in the Northwest and the city center.
3.2. Total Coliform
The presence of total coliform in surface water may indicate that the surface water has been affected by surface runoff and anthropogenic pollution. To protect humans from diseases and symptoms such as diarrhoea, nausea, vomiting, cramps or other gastrointestinal distress, total coliform should not exceed 10 per 100 ml. Table 3.1 shows that the mean values of total coliform are 112.22 and 114.36 per 100 ml in the dry and rainy seasons, respectively. However, total coliform ranges from 0 to 124 per 100 ml in the study areas. The spatial distribution of total coliform shows high concentrations in both seasons, occurring in the Northwest and the city center (see figure 3).
3.3. Magnesium
A higher concentration of magnesium makes water unpalatable and acts as a laxative in human beings. Table 3.1 shows that the mean concentrations of magnesium are 3.33 mg/l and 3.46 mg/l in the dry and rainy seasons, respectively. In the dry season, the maximum magnesium value reaches nearly 138.53 mg/l, which is considerably higher than the permissible limit of 50 mg/l. In the rainy season, the maximum concentration of magnesium reaches 144.13 mg/l. However, in both seasons the mean concentrations are lower than the permissible limit of 50 mg/l. The magnesium content increases from the Northwest to the North and from the Northcentral to the Northeast. Figure 4 shows the presence of high magnesium concentrations in the three geopolitical zones of the state in both seasons.
3.4. Total Hardness
High calcium and magnesium levels are consistent with water hardness in such water sources. From Table 3.1, the mean hardness for both seasons is lower than the drinking water standard of 500 mg/l. The total hardness of the river water ranges from 17.01 mg/l to 167.14 mg/l in the rainy season and from 17.01 mg/l to 156.64 mg/l in the dry season. Figure 5 shows that the spatial distribution of water hardness follows that of the magnesium concentration.
3.5. Total Dissolved Solids (TDS)
High TDS values have been reported to absorb heat from the sun, thereby raising the temperature and increasing the turbidity of water. From Table 3.1, the mean values of TDS are less than the standard (500 mg/l) for both seasons. The TDS values range from 18.27 to 498.48 mg/l and from 16.08 to 496.49 mg/l for the rainy and dry seasons, respectively. Since the values in both seasons do not exceed the 500 mg/l to 1,000 mg/l range, they can be tolerated with little health effect. As indicated in figure 6, high concentrations occurred around the rivers in the Northwest.
The water quality standard of the World Health Organization was used as the basis for the surface water quality evaluation (Table 3.1). Ordinary Kriging (OK) was used to determine the spatial continuity of the river water quality parameters. Four semivariogram models, namely linear, spherical, exponential and Gaussian, were tested. The semivariogram parameters (nugget, sill and range) were determined, and the performance of each model was evaluated using cross-validation, which examines the accuracy of the generated surfaces. Thereafter, the models with the smallest ME were selected. The spatial prediction maps of river water quality were calculated using ordinary kriging for both seasons in R.
The descriptive statistics of the parameters show that the mean concentrations of E.coli and Total Coliform in both seasons are greater than the permissible limit of 0 to 10 per 100 ml and do not conform to the WHO standards for drinking water quality. This indicates faecal contamination by animals, including birds. In contrast, the mean values of Magnesium, Total Hardness and TDS in both seasons meet the recommended limits. The nugget to sill ratios of the experimental and linear fitted variogram models were less than 0.25 in all cases, indicating that the river water level has strong spatial coherence in both seasons; the linear model therefore fitted best. Spatial variability maps of surface water level indicated that the two seasons are almost identical. The maps also show that water quality changes more rapidly in the dry season than in the rainy season.
The study looked at only five of the many surface water parameters. It is recommended that other parameters not covered in this study be investigated further. It is also recommended that a robust variogram model be used to improve the predictions at unsampled locations.