Determination of an Optimal Smoothing Technique for Maternal Health Care Statistics (A Case Study of Nakuru County 2012-2016)

Universal health is one of the Big Four agenda items. This study focused on maternal health, whose main aim is to reduce maternal deaths. One way to help reduce maternal deaths is to forecast them using statistical smoothing techniques, which enables better planning, for example an increase in health facilities. The Shapiro-Wilk normality test confirmed a clear departure of the data from the normal distribution. The study therefore focused on non-parametric regression methods, namely the Kernel and Cubic spline smoothing techniques, which were applied to maternal health care data. The technique that best dealt with this type of data was identified and used to forecast maternal deaths. Selecting an appropriate technique was important for achieving good forecasting performance. The performance of the two smoothing techniques was compared using MSE, MAE and RMSE, and the best model was identified. Both methods have a smoothing parameter, which is usually chosen from the data. According to the results of the study, the Cubic spline smoothing technique, which had lower MSE, MAE and RMSE, is better than the Kernel-based smoothing technique. The analysis was carried out in the statistical software R. The study used maternal health care statistics data for Nakuru County.


Introduction
Smoothing is a data handling technique. In statistics, smoothing a data set means creating an approximating function that attempts to capture the important patterns in the data. Typically, data smoothing is done to remove noise from a data set, and it can be employed to enable better prediction. Maternal health covers the dimensions of health care in family planning, preconception, prenatal and postnatal treatment that aim to reduce maternal morbidity and mortality [1]. Some elementary data smoothing techniques, such as visual interpolation, averaging and mathematical interpolation, emerged during the eighteenth century [2]. At that time, smoothing consisted of simple interpolation of the data; it eventually evolved into more complex modern methods such as the Cubic spline smoothing technique. Kernel density estimation has previously been discussed in detail, including the assumptions on the kernel weight and properties of the estimator such as bias and variance [3]. Spline smoothing corresponds approximately to kernel smoothing with a bandwidth that depends on the local density of design points [4]. Consideration of kernel smoothing methods shows that the way the effective local bandwidth behaves in spline smoothing has desirable properties. The value of the smoothing parameter for a curve fit can be chosen by minimizing the expected prediction error [5]. Smoothing techniques should be used to handle noisy data so that accurate predictions can be made. In this study, two smoothing techniques are applied to maternal health data and the technique that best deals with this type of data is identified. Many smoothing techniques are available, and selecting the appropriate one is important for achieving good forecasting performance. The Kernel-based and Cubic spline smoothing methods are studied.
Error measures, namely the mean absolute error (also called the mean absolute deviation), the root mean squared error and the mean squared error, are calculated for both smoothing techniques to identify the best method.

Literature Review
Kernel Smoothing Technique; Cross-validation with a Kullback-Leibler loss function has been used to choose the smoothing parameter in the kernel method of density estimation [6]. A framework for this problem has been constructed and used to derive an alternative method of cross-validation based on integrated squared error, which has also recently been proposed [7]. Hall established the consistency and asymptotic optimality of the new method [8]. Kernel density estimation plays a very important role in statistics, and a wrong choice of bandwidth may lead to a bad estimate [9]. A disadvantage of kernel density estimation is that the estimator is generally biased. Cross-validation is a well-established method for selecting the smoothing parameter. For regression function estimation, the Epanechnikov kernel, which has the optimal-kernel property and is commonly used in practice, is preferred [10]. A new kernel estimate has been defined for the non-parametric estimation of regression functions with a one-dimensional design parameter [11].
Cubic Spline Smoothing Technique; Non-parametric regression using cubic splines is an attractive, flexible and widely applicable approach to curve estimation [3]. Although the basic idea was formulated many years ago, the method is not as widely known or adopted as perhaps it should be. To each cubic spline smoothing operator there normally corresponds a kernel operator. A connection between cubic spline and kernel smoothing has been established theoretically, and related approximations to a Green's function have also been established [12,13]. The fundamental idea behind the cubic spline comes from an engineer's tool used to draw smooth curves through a number of points. The tool consists of weights attached to a flat surface at the points to be connected. A flexible strip is then bent across each of these weights, resulting in a pleasingly smooth curve. The mathematical spline is similar in principle [14]; the points in this case are numerical data. Stupp introduced a general method for proving uniform-in-bandwidth consistency of kernel-type function estimators [15].

Kernel Smoothing
In smoothing, the data points of a signal are modified so that individual points that stand out from their neighbours (presumably because of noise) are pulled back towards them. Kernel smoothing refers to a general class of techniques for non-parametric estimation of functions in which points closer to the target point are given higher weights. The technique is very useful for visualizing data. The simplicity of kernel estimators entails mathematical tractability, so one can delve deeply into their properties without highly sophisticated mathematics. In summary, kernel smoothing provides simple, reliable and useful answers to a wide range of important problems. Its main feature is that the data speak for themselves, meaning that the data decide which function fits best. Kernel smoothing also provides a simple way of finding structure in data sets without imposing a parametric model. It has a low standard error and works well for both small and large samples.
A kernel smoother generally defines a set of weights $\{W_i(x)\}_{i=1}^{n}$ for every $x$ and defines $\hat{f}(x) = \sum_{i=1}^{n} W_i(x)\, y_i$. Under this general definition, most smoothers can be considered kernel smoothers. In practice, kernel smoothers describe the sequence of weights in a simple way: the shape of the weight function is given by a density function with a scale parameter that adjusts the size and form of the weights near $x$. This shape function is usually referred to as a kernel $K$. The kernel is a continuous, bounded and symmetric real function $K$ that integrates to one: $\int K(u)\, du = 1$. The weight sequence is therefore defined by $W_{hi}(x) = K\left(\frac{x - x_i}{h}\right) \Big/ \sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right)$. For any given scale parameter $h$ we notice that $\sum_{i=1}^{n} W_{hi}(x) = 1$. The kernel smoother is then defined as before for any $x$ by $\hat{f}_h(x) = \sum_{i=1}^{n} W_{hi}(x)\, y_i$. Usually, a kernel smoother defines weights that decrease smoothly as one moves away from the target point.
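The weighted-average form of the kernel smoother can be sketched in a few lines. The study's analysis was done in R; the plain-Python version below is only an illustration. The function names and the toy data are assumptions of this sketch, and the Gaussian kernel is used because the study reports using it.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: symmetric, bounded, integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(x, xs, h, kernel=gaussian_kernel):
    """Normalised weights K((x - x_i)/h) / sum_j K((x - x_j)/h)."""
    raw = [kernel((x - xi) / h) for xi in xs]
    total = sum(raw)
    return [r / total for r in raw]


def kernel_smooth(x, xs, ys, h, kernel=gaussian_kernel):
    """Kernel estimate at x: the weighted average sum_i W_i(x) * y_i."""
    weights = kernel_weights(x, xs, h, kernel)
    return sum(w * y for w, y in zip(weights, ys))


# Hypothetical toy data (not the Nakuru County series).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 3.0, 6.0, 5.0]
fitted = [kernel_smooth(x, xs, ys, h=1.0) for x in xs]
```

Because the weights are non-negative and sum to one, every fitted value is a convex combination of the observed responses and therefore lies between the smallest and largest response.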

The Bandwidth h
Bandwidth controls the smoothness or roughness of a density estimate. Choosing the bandwidth value h is more important than choosing the kernel density function. Popular kernels used for smoothing include parabolic (Epanechnikov), Tricube and Gaussian kernels.
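As an illustration of the three kernels named above and of how the bandwidth governs smoothness, the following is a minimal Python sketch. The normalising constants are the standard ones for these kernels; the helper names, the roughness measure (sum of squared second differences) and the toy data are assumptions made for this example.

```python
import math


def epanechnikov(u):
    """Parabolic kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def tricube(u):
    """Tricube kernel on [-1, 1]."""
    return (70.0 / 81.0) * (1.0 - abs(u) ** 3) ** 3 if abs(u) <= 1.0 else 0.0


def gaussian(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def integrate(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule approximation of the integral of a kernel."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width


def smooth(xs, ys, h, kernel):
    """Kernel-weighted average of ys evaluated at each design point."""
    fitted = []
    for x in xs:
        w = [kernel((x - xi) / h) for xi in xs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return fitted


def roughness(values):
    """Sum of squared second differences: larger means a wigglier curve."""
    return sum((values[i - 1] - 2.0 * values[i] + values[i + 1]) ** 2
               for i in range(1, len(values) - 1))


# Hypothetical zigzag data to make the effect of h visible.
xs = [float(i) for i in range(1, 11)]
ys = [3.0, 7.0, 2.0, 8.0, 4.0, 9.0, 3.0, 8.0, 5.0, 9.0]
rough_fit = smooth(xs, ys, h=0.5, kernel=epanechnikov)   # small h: follows the data
smooth_fit = smooth(xs, ys, h=5.0, kernel=epanechnikov)  # large h: much flatter
```

With the small bandwidth each point only sees itself, so the fit reproduces the data; with the large bandwidth the fit averages over many neighbours and the roughness drops sharply.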
Some particular cases of kernel smoothers are:

Nearest Neighbor Smoother
The main idea here is that for each point $x_0$, we take the $m$ nearest neighbours and then estimate the value of $f(x_0)$ from them. As before, the bandwidth is $h_m(x_0) = \lVert x_0 - x_{[m]} \rVert$, where $x_{[m]}$ is the $m$-th closest neighbour to $x_0$.
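A minimal Python sketch of the nearest neighbour idea follows. It assumes the usual form of this smoother, in which the $m$ neighbours receive equal weight $1/m$; that weighting, the function name and the toy data are assumptions of the sketch rather than details recovered from the text.

```python
def nearest_neighbour_smooth(x0, xs, ys, m):
    """Estimate f(x0) as the mean response of the m points closest to x0.

    Returns the estimate together with the local bandwidth
    h_m(x0) = |x0 - x_[m]|, the distance to the m-th closest point.
    """
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    neighbours = order[:m]
    bandwidth = abs(xs[neighbours[-1]] - x0)
    estimate = sum(ys[i] for i in neighbours) / m
    return estimate, bandwidth


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0]
estimate, bandwidth = nearest_neighbour_smooth(3.0, xs, ys, m=3)
```

Unlike a fixed-bandwidth kernel, the bandwidth here adapts to the data: it is small where design points are dense and large where they are sparse.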

Cubic Spline Smoothing Technique
Let $\{(x_i, y_i) : i = 1, \ldots, n\}$ be a set of observations, based on the relationship $y_i = f(x_i) + \varepsilon_i$, where the $\varepsilon_i$ are independent, zero-mean random variables (normally taken to have constant variance). The cubic smoothing spline estimate $\hat{f}$ is defined as the minimizer (over the class of twice-differentiable functions) of

$$\sum_{i=1}^{n} \{y_i - \hat{f}(x_i)\}^2 + \lambda \int \hat{f}''(x)^2 \, dx \qquad (7)$$

The parameter $\lambda \ge 0$ regulates the trade-off between fidelity to the data and roughness of the function estimate. It is often chosen by cross-validation or by restricted marginal likelihood (REML), which takes advantage of the relation between spline smoothing and Bayesian estimation. Many methods for selecting the smoothing parameter, such as cross-validation (CV), generalized maximum likelihood (GML), generalized cross-validation (GCV) and unbiased risk (UBR), have been established under the assumption that the observations are independent.

Derivation of the Cubic Smoothing Spline
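We can fit a smoothing spline in two steps: first, derive the fitted values $\hat{f}(x_i)$, $i = 1, \ldots, n$; second, derive $\hat{f}(x)$ for all $x$ from these values. Taking the second step first, once the vector of fitted values $\hat{m} = (\hat{f}(x_1), \ldots, \hat{f}(x_n))^{T}$ is given, the sum-of-squares part of the spline criterion is fixed. It only remains to minimize $\int \hat{f}''(x)^2 \, dx$, and the minimizer is a natural cubic spline interpolating the points $(x_i, \hat{f}(x_i))$. This interpolating spline is a linear operator and can be written as $\hat{f}(x) = \sum_{i=1}^{n} \hat{f}(x_i) f_i(x)$, where the $f_i(x)$ are a set of spline basis functions. The roughness penalty therefore has the form $\int \hat{f}''(x)^2 \, dx = \hat{m}^{T} A \hat{m}$, where the elements of $A$ are $\int f_i''(x) f_j''(x) \, dx$. The basis functions, and therefore the matrix $A$, depend on the configuration of the predictor variables $x_i$, but not on the responses $y_i$ or on $\hat{m}$. Returning to the first step, the penalized sum of squares is given by $\lVert y - \hat{m} \rVert^2 + \lambda \hat{m}^{T} A \hat{m}$. The parameter $\lambda$ governs the relationship between goodness of fit and smoothness of the estimate and is normally called the smoothing parameter.
Given $a < t_1 < t_2 < \cdots < t_n < b$, a function $g$ is a cubic spline if:
1. On each of the intervals $(a, t_1), (t_1, t_2), \ldots, (t_n, b)$, $g$ is a cubic polynomial.
2. The polynomial pieces fit together at the points $t_i$ (called knots) such that $g$ itself and its first and second derivatives are continuous at each $t_i$, and hence on the whole of $[a, b]$.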
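Minimizing the penalized sum of squares over $\hat{m}$ gives $\hat{m} = (I + \lambda A)^{-1} y$. The sketch below computes these fitted values in plain Python using the standard Reinsch formulation, in which the penalty matrix is written as $A = Q R^{-1} Q^{T}$ for banded matrices $Q$ and $R$ built from the knot spacings. This is one common computational route, not necessarily the one used by the R routines in the study; the function names and the toy data are assumptions of this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def smoothing_spline_fit(t, y, lam):
    """Fitted values of the cubic smoothing spline at strictly increasing knots t.

    Solves (R + lam * Q^T Q) gamma = Q^T y and returns y - lam * Q gamma,
    which equals (I + lam * Q R^{-1} Q^T)^{-1} y. Requires len(t) >= 3.
    """
    n = len(t)
    k = n - 2                                  # number of interior knots
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    q = [[0.0] * k for _ in range(n)]          # n x (n-2) matrix Q
    r = [[0.0] * k for _ in range(k)]          # (n-2) x (n-2) tridiagonal R
    for j in range(k):
        q[j][j] = 1.0 / h[j]
        q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        q[j + 2][j] = 1.0 / h[j + 1]
        r[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < k:
            r[j][j + 1] = r[j + 1][j] = h[j + 1] / 6.0
    qty = [sum(q[i][j] * y[i] for i in range(n)) for j in range(k)]
    lhs = [[r[u][v] + lam * sum(q[i][u] * q[i][v] for i in range(n))
            for v in range(k)] for u in range(k)]
    gamma = solve(lhs, qty)
    return [y[i] - lam * sum(q[i][j] * gamma[j] for j in range(k))
            for i in range(n)]


# Hypothetical toy data: a zigzag at four equally spaced knots.
knots = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 1.0, 0.0, 1.0]
fitted = smoothing_spline_fit(knots, values, lam=1.0)
```

Two limiting cases are useful checks: with `lam = 0` the fit interpolates the data, and data lying exactly on a straight line are reproduced for any `lam`, because a straight line has zero roughness penalty.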

Choosing the Smoothing Parameter
There are two different philosophical approaches: subjective choice, and automatic methods in which the parameter is chosen by the data (cross-validation and generalized cross-validation).

Cross Validation
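Cross-validation is used to select the value of $\lambda$ (written $\alpha$ here) by minimizing $CV(\alpha) = \sum_{i=1}^{n} \{y_i - \hat{g}^{(-i)}(t_i; \alpha)\}^2$ over $\alpha$, where $\hat{g}^{(-i)}$ is the spline smoother with smoothing parameter $\alpha$ fitted with the $i$-th observation left out.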
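The leave-one-out idea can be sketched as follows. To keep the example short it is illustrated with a Gaussian kernel smoother and its bandwidth $h$ rather than with the spline and its parameter; the selection logic (refit without observation $i$, predict it, sum the squared errors, pick the minimizer) is the same. The candidate grid, the function names and the toy data are assumptions of this Python sketch.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loo_prediction(i, xs, ys, h):
    """Kernel estimate at xs[i], computed without observation i."""
    num = 0.0
    den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue
        w = gaussian_kernel((xs[i] - xj) / h)
        num += w * yj
        den += w
    return num / den


def cv_score(xs, ys, h):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    return sum((ys[i] - loo_prediction(i, xs, ys, h)) ** 2
               for i in range(len(xs)))


def choose_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: cv_score(xs, ys, h))


# Hypothetical toy data (not the Nakuru County series).
xs = [float(i) for i in range(10)]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9, 10.2, 10.8]
candidates = [0.3, 0.5, 1.0, 2.0, 4.0]
best_h = choose_bandwidth(xs, ys, candidates)
```

Leaving each observation out before predicting it prevents the criterion from rewarding a fit that simply reproduces the data, which is what would happen if the ordinary residual sum of squares were minimized.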

Comparison of Models Performance
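Three different prediction criteria are used to compare the performance of the cubic spline and kernel models: the Mean Squared Error (MSE), the Mean Absolute Error (MAE, also called the mean absolute deviation) and the Root Mean Squared Error (RMSE). With $y_i$ the observed values, $\hat{y}_i$ the fitted values and $n$ the number of observations, these criteria are defined as follows: $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, $MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$ and $RMSE = \sqrt{MSE}$.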
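The three criteria are straightforward to compute. The following Python sketch uses their standard definitions; the function names and the numbers are hypothetical and are not results from the study.

```python
import math


def mse(actual, fitted):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def mae(actual, fitted):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


def rmse(actual, fitted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, fitted))


# Hypothetical observed and fitted values.
actual = [3.0, 5.0, 2.0, 7.0]
fitted = [2.0, 5.0, 4.0, 6.0]
scores = {"MSE": mse(actual, fitted),
          "MAE": mae(actual, fitted),
          "RMSE": rmse(actual, fitted)}
```

When comparing two smoothers, the same observed series is scored against each model's fitted values, and the model with the smaller value on each criterion is preferred.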

Data Description and Results
The study used maternal health care statistics data for Nakuru County. The variables for this data included:

The Shapiro-Wilk normality test was used together with a Q-Q plot and a histogram to check the normality of the data. The p-value was very low, meaning that if the data were normally distributed there would be very little chance of observing a sample like this one. The p-values were below 0.05, which is strong evidence of non-normality, so non-parametric methods should be used.

Test for Normality
Normal Q-Q plot; The Q-Q plot is a graphical alternative to the histogram for assessing normality and is easier to use when sample sizes are small. The plot helps us to understand the distribution of the data set by comparing the data with a standard normal distribution. Here the points do not lie close to the line and depart from it in a clear pattern. The heavy tail shows that the data are clearly not normal, so the data are not assumed to be normally distributed.
Histogram; Plotting a histogram of the variable of interest gives an indication of the shape of its distribution. A density curve, which smooths the histogram, is usually added to the graph. The histogram shows that the data are clearly skewed, so no parametric test should be carried out using these data.
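The comparison behind a normal Q-Q plot can be sketched numerically: sort the sample and pair each value with the corresponding quantile of the standard normal distribution. The Python sketch below does this and summarises straightness by the correlation between the two sets of quantiles. It is a rough graphical-check proxy, not the Shapiro-Wilk test; the plotting positions $(i + 0.5)/n$, the function names and the samples are assumptions of this example.

```python
import math
from statistics import NormalDist, mean


def qq_points(sample):
    """Pairs of (theoretical normal quantile, ordered sample value)."""
    n = len(sample)
    ordered = sorted(sample)
    normal = NormalDist()
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, ordered


def qq_correlation(sample):
    """Correlation of the Q-Q points; values near 1 mean a straight line."""
    theo, obs = qq_points(sample)
    mt, mo = mean(theo), mean(obs)
    cov = sum((a - mt) * (b - mo) for a, b in zip(theo, obs))
    var_t = sum((a - mt) ** 2 for a in theo)
    var_o = sum((b - mo) ** 2 for b in obs)
    return cov / math.sqrt(var_t * var_o)


# Hypothetical right-skewed sample with a heavy upper tail.
skewed = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 9.0, 20.0, 60.0]
skewed_corr = qq_correlation(skewed)
```

For the skewed sample the points bend away from a straight line in the upper tail, which shows up as a correlation well below one.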

Fitting Cubic Splines to the Data
A smoothing parameter must be chosen when fitting cubic splines. The smoothing parameter controls the trade-off between fidelity to the data and roughness of the function estimate. In this study, cross-validation was used to choose the smoothing parameter.

Figure 3. Diagnostic plots.

Explanation of the Diagnostic Plots
Plot (a) is relatively shapeless, with no clear pattern in the data and no obvious outliers, and the residuals are generally distributed symmetrically around the 0 line without particularly large values. The residuals do get somewhat larger from left to right, and a few points are potential outliers. Hence the assumption that the relationship between the variables of this study is linear is reasonable. Plot (b) shows whether the residuals are normally distributed. The residuals do not line up well on the straight dashed line, so the assumption that the dependent variable is normally distributed does not hold. Plot (c), the Scale-Location plot, is also called the Spread-Location plot. It shows a horizontal line with randomly spread points, so the assumption of homoscedasticity holds. Plot (d) is used to check for outlying values at the upper right or lower right corner, which helps to identify influential cases, if any. Not all outliers are influential. In our case there are no influential cases: the red dashed lines are barely visible because all cases lie well inside them.

Fitting Kernel Smoothing Technique to the Data
In this section the kernel-based smoothing technique is fitted to the data. The main feature of kernel smoothing is that the data speak for themselves, which means that the data determine which function fits best. Kernel smoothing also provides a simple way of finding structure in data without imposing a parametric model. The bandwidth, or smoothing parameter, was chosen by cross-validation. The Gaussian kernel, one of the most commonly used kernels, was used in this study.

From the first plot, we see that TMD increased as TSD increased. This is contrary to our expectation; another factor might be contributing to the increase in TMD even as TSD increased. From the second plot, we see that TMD decreased as NAC rose from 3000 to 4000, but increased beyond that point. This may be due to a constant number of caregivers: with larger numbers, some people might not have been attended to, which would explain the increase in TMD. From the third plot, we see that TMD decreased as PWCV increased. This is expected, since those who completed all four antenatal visits were taking all the precautions during pregnancy into consideration.

Forecasting Monthly Maternal Deaths Using the Best Smoothing Technique
Cubic Spline smoothing technique was used to forecast monthly maternal deaths.

Explanation
Lo95 and Hi95 are the lower and upper limits of the 95 percent interval, respectively. Since we are concerned with forecast values, this is a prediction interval rather than a confidence interval: it reflects the range of values that we expect to observe at some future point in time. In the table above, the 95% prediction interval tells us that there is a 95% chance that the future observation will fall between the lower and upper limits. More precisely, if 95% prediction intervals were calculated across repeated samples, the future observation would fall within the lower and upper limits in about 95% of those samples.
The graph below shows that Total Maternal Deaths are forecast to increase.
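As a rough illustration of how limits such as Lo95 and Hi95 relate to a point forecast, the Python sketch below builds a symmetric interval from a point forecast and a forecast standard deviation. It assumes approximately normal forecast errors, which is an assumption of this sketch and not a statement about how the study's intervals were produced; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(point_forecast, forecast_sd, level=0.95):
    """Symmetric prediction interval under normally distributed forecast errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return point_forecast - z * forecast_sd, point_forecast + z * forecast_sd


# Hypothetical point forecast and forecast standard deviation.
lo95, hi95 = prediction_interval(10.0, 2.0)
lo99, hi99 = prediction_interval(10.0, 2.0, level=0.99)
```

A higher level gives a wider interval, and the interval also widens as the forecast standard deviation grows, which typically happens for forecasts further into the future.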

Conclusion
Non-parametric regression methods can reveal the detailed relationship between explanatory and response variables. The main objective of this study was to determine the best smoothing technique for maternal health care data. The specific objectives were to fit Cubic spline and Kernel smoothing techniques to the data, to compare the performance of the two techniques, and to forecast maternal deaths using the better one. The Shapiro-Wilk normality test confirmed a clear departure of the data from the normal distribution. A Cubic spline model and a Kernel model were fitted to the data. Three prediction criteria, MSE, RMSE and MAE, were used to compare the performance of these models; they were chosen as measures of fit because the Cubic spline and Kernel-based models are estimated using the sum of squared errors. For each of these criteria, a smaller value indicates a better fit. The Cubic spline model performed better, since it had lower MAE, RMSE and MSE than the Kernel-based model. The Cubic spline gave good forecasting performance, which would support future planning, for example increasing the number of skilled personnel and health facilities, with the ultimate goal of reducing maternal deaths.