Statistical Modelling and Evaluation of Determinants of Child Mortality in Nyanza, Kenya

One of the Millennium Development Goals was to reduce infant and child mortality by two-thirds from 1990 levels by 2015. Significant progress has generally been made in reducing mortality among children under five years of age: the global under-five mortality rate declined by 59 per cent, from 93 deaths per 1,000 live births in 1990 to 38 in 2019. In Kenya, the infant mortality rate in 2021 was 32.913 deaths per 1,000 live births, a 3.36 per cent decline from 2020; in 2020 it was 34.056 deaths per 1,000 live births, a decline of 3.24 per cent from 2019. Within Kenya, Nyanza Province has the highest infant mortality rate (133 deaths per 1,000 live births) and Central Province the lowest (44 deaths per 1,000 live births). Despite this progress, the world has yet to achieve target four of the Millennium Development Goals, reducing child mortality. This study aims to identify key risk factors for child mortality in Kenya. Its main objective is to determine the effect of socioeconomic and demographic variables on child mortality in the presence of dependence within clusters. Using data from the questionnaire-based Kenya Demographic and Health Survey (KDHS 2014), we fitted a logistic regression, tested whether the significant covariates satisfied the proportional hazards assumption, and then fitted a stratified Cox regression model and, finally, a shared frailty survival model. Child mortality was analysed over the age interval from 12 to 60 months, referred to here as "child mortality". The study reveals that clusters (households), maternal age at birth, preceding birth interval length and the number of births in the last five years significantly affected child mortality.


Introduction
Child mortality refers to the death of children under the age of five. The under-5 mortality rate (U5MR), the probability of dying before five years of age (per 1,000 live births), is a key global indicator of child health [1] and one of the most important measures of global health [2].
Child mortality is central to child health and overall national development [3]. It also reflects a country's socioeconomic development and quality of life, and it is used to monitor and evaluate population and health programs and policies. In the past few decades, under-five mortality has declined in almost all countries, regardless of initial levels, socioeconomic circumstances and development strategies [4].
A high mortality rate generally signifies unmet human health needs in education, medical care, nutrition and sanitation. Understanding the determinants of under-5 child mortality (U5CM) is therefore an important research aim. Reducing child mortality was the fourth Millennium Development Goal (MDG); today the aim appears in the third Sustainable Development Goal (SDG3), which targets an under-5 mortality rate at least as low as 25 deaths per 1,000 live births in all countries by 2030 [5].
The Demographic and Health Surveys (DHS) Program has been instrumental in obtaining and disseminating accurate, nationally representative data on family planning, fertility, and maternal and child health, among other health issues. The most recent DHS survey conducted in Kenya at the time of this study was KDHS 2014 [6].

Ethical Approval
The study did not require ethical approval because it used secondary data.

Logistic Regression Model
Logistic regression (also known as the logit model) is a statistical method frequently used in modelling and predictive analytics. In our analysis, the dependent variable is categorical (a binary response). Logistic regression is therefore used to understand the relationship between the dependent variable and one or more independent variables by estimating probabilities with the logistic regression equation.
The logistic regression model is given by
$\log\left\{\frac{P(x)}{1-P(x)}\right\} = \operatorname{logit}\{P(x)\} = \beta_0 + \beta_1 X_1 \qquad (1)$
where $P(x)$ is the probability that the dependent variable equals a case (the child dies), $\beta_0$ is the intercept, and $\beta_1 X_1$ is the regression coefficient multiplied by the value of the predictor. The statistical significance of the association can be tested using the Wald statistic, $\left(\hat\beta_i / \mathrm{SE}(\hat\beta_i)\right)^2 \sim \chi^2_1$, where $\hat\beta_i$ is the maximum likelihood estimator (MLE) of $\beta_i$ and $\mathrm{SE}(\hat\beta_i)$ is its standard error.
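The logit model in equation (1) and the Wald statistic can be illustrated with a minimal sketch. It fits the coefficients by Newton-Raphson (equivalently, iteratively reweighted least squares) using only NumPy, on synthetic data rather than the KDHS data; `fit_logistic` and `wald_statistic` are hypothetical helper names, not the authors' code.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logit{P(x)} = b0 + b1*x1 + ... by Newton-Raphson; return (beta, se)."""
    X = np.column_stack([np.ones(len(y)), X])    # prepend intercept column for beta_0
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities P(x)
        W = p * (1.0 - p)                        # Bernoulli variances
        grad = X.T @ (y - p)                     # score vector
        info = X.T @ (X * W[:, None])            # Fisher information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))   # SEs from the inverse information
    return beta, se

def wald_statistic(beta, se):
    """Wald chi-square (1 df) for each coefficient: (beta_hat / SE)^2."""
    return (beta / se) ** 2

# Synthetic example (not KDHS data): one predictor, true beta_0 = -1, beta_1 = 0.8
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x))))
beta, se = fit_logistic(x[:, None], y)
print(beta, wald_statistic(beta, se))
```

A Wald statistic above 3.841, the 5% critical value of $\chi^2_1$, corresponds to $p < 0.05$.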

Kaplan-Meier Curves
Testing for Proportionality Among Covariates
Testing hypotheses: H0: the covariates satisfy the proportional hazards assumption. H1: the covariates are non-proportional. We used the Schoenfeld residual test to check proportionality of hazards, a key assumption of the Cox model.
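The Schoenfeld-residual check can be sketched for a single covariate. The sketch below, assuming synthetic data and the Breslow convention for ties, fits a one-covariate Cox model, computes the Schoenfeld residuals (the observed covariate minus its risk-set weighted mean at each event time) and forms a one-degree-of-freedom score test of a time-varying coefficient $\beta(t) = \beta + \gamma g(t)$, taking $g(t)$ as the rank of the event time. This is a simplified stand-in: R's `survival::cox.zph` uses a Kaplan-Meier time transform by default, so its numbers will differ. All function names are hypothetical.

```python
import numpy as np

def risk_set_moments(x, time, t, beta):
    """Weighted mean and variance of covariate x over the risk set {time >= t}."""
    at_risk = time >= t
    w = np.exp(beta * x[at_risk])
    mean = np.sum(w * x[at_risk]) / np.sum(w)
    var = np.sum(w * x[at_risk] ** 2) / np.sum(w) - mean ** 2
    return mean, var

def fit_cox_1d(x, time, event, n_iter=50, tol=1e-10):
    """Newton-Raphson for a one-covariate Cox model (Breslow handling of ties)."""
    beta = 0.0
    events = np.flatnonzero(event == 1)
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for k in events:
            mean, var = risk_set_moments(x, time, time[k], beta)
            score += x[k] - mean
            info += var
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def schoenfeld_score_test(x, time, event, beta):
    """Schoenfeld residuals and a 1-df score test of H0: gamma = 0 in
    beta(t) = beta + gamma * g(t), with g(t) = rank of the event time."""
    events = np.flatnonzero(event == 1)
    events = events[np.argsort(time[events])]
    g = np.arange(1, len(events) + 1, dtype=float)
    resid = np.empty(len(events))
    var = np.empty(len(events))
    for j, k in enumerate(events):
        mean, var[j] = risk_set_moments(x, time, time[k], beta)
        resid[j] = x[k] - mean                        # Schoenfeld residual at j-th event
    u = np.sum(g * resid)                             # score for gamma at gamma = 0
    i_gg, i_gb, i_bb = np.sum(g * g * var), np.sum(g * var), np.sum(var)
    stat = u ** 2 / (i_gg - i_gb ** 2 / i_bb)         # compare with chi-square(1)
    return resid, stat

# Synthetic data (not KDHS): binary covariate, proportional vs crossing hazards
rng = np.random.default_rng(3)
n = 800
x = rng.binomial(1, 0.5, size=n).astype(float)
event = np.ones(n, dtype=int)                         # no censoring, for brevity
t_ph = rng.exponential(size=n) / np.exp(0.5 * x)      # hazard ratio exp(0.5) at all times
e = rng.exponential(size=n)
t_nonph = np.where(x == 1, e ** (1.0 / 3.0), e)       # hazard 3t^2 vs 1: hazards cross
beta_ph = fit_cox_1d(x, t_ph, event)
_, stat_ph = schoenfeld_score_test(x, t_ph, event, beta_ph)
beta_np = fit_cox_1d(x, t_nonph, event)
_, stat_nonph = schoenfeld_score_test(x, t_nonph, event, beta_np)
print(round(beta_ph, 3), round(stat_ph, 2), round(stat_nonph, 2))
```

A statistic above 3.841 rejects proportionality at the 5% level; the crossing-hazards sample should be flagged, the proportional one not.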

Stratified Cox Regression Model
The Cox model, or Cox proportional hazards (Cox PH) model, is used for survival analysis in many fields. Cox [11] and Cox and Oakes [12] developed the model to relate the hazard rate of a subject at risk to its covariates. These covariates potentially influence the survival time (time-to-event), that is, the time until a child dies. The Cox model assumes that the hazards of any two individuals are proportional at all times, hence the name Cox proportional hazards [13]. The hazard ratio in the Cox model is therefore assumed to be constant, that is, independent of time.
We propose to use stratified Cox regression models (with and without interaction) and extended Cox regression models to handle non-proportional hazards [14]. The stratified Cox regression model, a modification of the Cox regression model, stratifies on the covariates that do not satisfy the proportional hazards assumption instead of including them as predictors. The interaction and no-interaction models are defined in the context of the stratified Cox regression model.
No-interaction model
Let k covariates fail to satisfy the proportional hazards assumption and p covariates satisfy it. The covariates not satisfying the assumption are denoted by $Z_1, Z_2, \ldots, Z_k$ and those satisfying it by $X_1, X_2, \ldots, X_p$. To form the stratified Cox regression model, a new variable, denoted $Z^*$, is defined from the Z's. The stratification variable $Z^*$ has $k^*$ categories, where $k^*$ is the total number of combinations (strata) formed after categorising each of the Z's. The no-interaction stratified Cox model is
$h_g(t, \mathbf{X}) = h_{0g}(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p), \quad g = 1, 2, \ldots, k^*$
where the subscript g denotes the stratum. The strata are the different categories of the stratification variable $Z^*$. The variable $Z^*$ is not explicitly included in the model, whereas the X's, which are assumed to satisfy the proportional hazards assumption, are included. The baseline hazard function $h_{0g}(t)$ differs between strata. Because the coefficients of the X's are the same for every stratum, the hazard ratios are also the same for every stratum. The regression coefficients are estimated by maximising the likelihood $L = L_1 \times L_2 \times \cdots \times L_{k^*}$, obtained by multiplying together the likelihood functions of the individual strata.
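The stratified likelihood $L = L_1 \times L_2 \times \cdots \times L_{k^*}$ can be made concrete with a short NumPy sketch. Each stratum contributes its own Breslow partial log-likelihood, computed over risk sets formed within that stratum only (so each stratum keeps its own baseline hazard $h_{0g}(t)$), while a single coefficient vector $\beta$ is shared across strata. The data are synthetic, the helper names are hypothetical, and in practice a survival package (for example, `coxph` with `strata()` in R's `survival` package) would be used.

```python
import numpy as np

def stratum_terms(X, time, event, beta):
    """Breslow partial log-likelihood, score and information for ONE stratum."""
    eta = X @ beta
    p = X.shape[1]
    loglik, score, info = 0.0, np.zeros(p), np.zeros((p, p))
    for k in np.flatnonzero(event == 1):
        at_risk = time >= time[k]             # risk set formed within the stratum only
        w = np.exp(eta[at_risk])
        Xr = X[at_risk]
        mean = w @ Xr / w.sum()
        loglik += eta[k] - np.log(w.sum())
        score += X[k] - mean
        info += (Xr * w[:, None]).T @ Xr / w.sum() - np.outer(mean, mean)
    return loglik, score, info

def stratified_terms(X, time, event, strata, beta):
    """log L = sum over strata of log L_g, i.e. L = L_1 x L_2 x ... x L_k*."""
    p = X.shape[1]
    total = [0.0, np.zeros(p), np.zeros((p, p))]
    for g in np.unique(strata):
        idx = strata == g
        for i, part in enumerate(stratum_terms(X[idx], time[idx], event[idx], beta)):
            total[i] = total[i] + part
    return tuple(total)

def fit_stratified_cox(X, time, event, strata, n_iter=50, tol=1e-10):
    """Common beta for all strata; each stratum keeps its own baseline hazard."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, score, info = stratified_terms(X, time, event, strata, beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    loglik, _, info = stratified_terms(X, time, event, strata, beta)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, loglik

# Synthetic data (not KDHS): two strata with different Weibull baselines, true beta = 0.7
rng = np.random.default_rng(2)
n = 1000
strata = rng.integers(0, 2, size=n)
x = rng.binomial(1, 0.5, size=n).astype(float)
shape = np.where(strata == 0, 1.0, 2.0)
rate = np.where(strata == 0, 0.5, 0.1)
t_event = (rng.exponential(size=n) / (rate * np.exp(0.7 * x))) ** (1.0 / shape)
t_cens = rng.exponential(scale=5.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
beta, se, loglik = fit_stratified_cox(x[:, None], time, event, strata)
print(beta, se, loglik)
```

Because the baseline hazards never enter the partial likelihood, the different shapes of $h_{0g}(t)$ across strata do not have to be modelled.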
The general form of the Cox regression model is
$h(t \mid \mathbf{X}) = h_0(t)\exp(\boldsymbol\beta^\top \mathbf{X})$
where $h_0(t)$ is the baseline hazard function, $\boldsymbol\beta$ is a vector of coefficients of the predictors and $\mathbf{X}$ is a vector of covariates.
Model analysis and deviance
The overall statistical significance of the model is tested under the "model analysis" option. The likelihood-ratio chi-square statistic is calculated by comparing the deviance (-2 × log-likelihood) of the model with all covariates specified against that of the model with all covariates dropped. The contribution of each covariate can be assessed from the significance test reported with its coefficient in the main output; this assumes a reasonably large sample size.
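The likelihood-ratio test described above reduces to a difference in deviances. The sketch below computes the statistic and its $\chi^2$ p-value with closed-form expressions for integer degrees of freedom, so it needs no SciPy; the log-likelihood values in the example are made up for illustration, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:                                  # even df: Poisson-type series
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    total = math.erfc(math.sqrt(x / 2.0))            # df = 1 part
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):            # extra terms for df = 3, 5, ...
        total += term
        term *= x / (2 * j + 1)
    return total

def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LR chi-square = deviance(null) - deviance(full) = -2 * (loglik_null - loglik_full)."""
    stat = -2.0 * (loglik_null - loglik_full)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods, for illustration only
stat, p = likelihood_ratio_test(loglik_null=-120.0, loglik_full=-110.0, df=3)
print(stat, p)
```

Here `df` is the number of covariates dropped from the full model.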

Frailty Model
The frailty approach is a statistical modelling concept that aims to account for heterogeneity caused by unmeasured covariates. In statistical terms, a frailty model is a random-effects model for time-to-event data, in which the random effect (the frailty) acts multiplicatively on the baseline hazard function [15]. In the shared frailty model, the hazard of child $i$ in cluster $j$ with frailty $Z_j$ is specified as
$h_{ij}(t \mid X_{ij}, Z_j) = h_0(t)\exp(X_{ij}\beta)Z_j \qquad (3)$
where $h_0(t)$ is an unspecified baseline hazard function, $Z_j$ is the frailty term shared by all members of cluster $j$, and $\beta$ is a vector of coefficients of the predictors.
Assumption: observations are independent across clusters, but observations within a cluster may be correlated.
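Equation (3) can be illustrated under two assumptions not stated in the text: a gamma-distributed frailty with mean 1 and variance $\theta$, and an exponential baseline with cumulative hazard $H_0(t) = \lambda t$. Under gamma frailty, averaging the conditional survival $\exp\{-Z_j H_0(t) e^{X\beta}\}$ over $Z_j$ gives the closed form $S(t) = \{1 + \theta H_0(t) e^{X\beta}\}^{-1/\theta}$; the sketch checks this against a Monte Carlo average. The parameter values and function names are hypothetical.

```python
import numpy as np

def conditional_survival(t, x_beta, z, lam):
    """S(t | X, Z_j) = exp(-Z_j * H0(t) * exp(X beta)), assuming H0(t) = lam * t."""
    return np.exp(-z * lam * t * np.exp(x_beta))

def marginal_survival_gamma(t, x_beta, theta, lam):
    """Survival after integrating out Z_j ~ Gamma(mean 1, variance theta)."""
    return (1.0 + theta * lam * t * np.exp(x_beta)) ** (-1.0 / theta)

# Hypothetical values: frailty variance theta, baseline rate lam (per day), X*beta, t in days
theta, lam, x_beta, t = 0.5, 0.002, 0.3, 365.0
rng = np.random.default_rng(1)
z = rng.gamma(shape=1.0 / theta, scale=theta, size=200_000)   # one frailty per household
mc = conditional_survival(t, x_beta, z, lam).mean()
exact = marginal_survival_gamma(t, x_beta, theta, lam)
print(round(mc, 4), round(exact, 4), round(conditional_survival(t, x_beta, 1.0, lam), 4))
```

By Jensen's inequality the marginal survival is never below the conditional survival of a household with average frailty ($Z_j = 1$), and a larger $\theta$ implies stronger dependence among children in the same household.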

Descriptive Statistics
Table 1 shows that, although most children in Nyanza survived (2757, 94.2%), 169 (5.8%) died; this level of mortality is still a matter of concern and should be reduced further. Table 2 shows that the majority of respondents lived in rural areas of Nyanza Province (2014, 69%), while the rest (912, 31%) lived in urban areas. Table 3 shows that modern family planning methods were the most commonly used (1583, 54%), but a large portion (1284, 43.8%) used no method at all; traditional methods were still used by a smaller number (54, 1.8%). Table 4 shows the distribution of children born in Nyanza Province by sex: males (1485, 50.75%) slightly outnumbered females (1441, 49.25%). Table 5 shows that fewer female than male children died, 76 (45%) versus 93 (55%), while slightly more males (1392, 50.5%) than females (1365, 49.5%) survived. There was no association between the sex of a child and survival: Cramér's V is very small, close to zero, and the p-value is greater than the significance level α = 0.05.
The hypotheses tested were H0: there is no association between child survival ("Child is alive") and sex of child; H1: there is an association between child survival and sex of child.
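The chi-square test and Cramér's V for Table 5 can be reproduced from the counts quoted above (93 male and 76 female deaths; 1392 male and 1365 female survivors). The sketch uses plain Python and computes Pearson's statistic without Yates' continuity correction, which some packages apply by default to 2 × 2 tables, so a reported p-value may differ slightly. The function name is hypothetical.

```python
import math

def chi_square_cramers_v(table):
    """Pearson chi-square statistic and Cramer's V for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            chi2 += (obs - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * k))

table5 = [[93, 1392],    # male: died, alive
          [76, 1365]]    # female: died, alive
chi2, v = chi_square_cramers_v(table5)
p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
print(round(chi2, 3), round(p, 3), round(v, 4))
```

With these counts the statistic is about 1.31 (p ≈ 0.25) and Cramér's V is about 0.02, consistent with the conclusion of no association.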
Table 6 shows the preceding birth interval, the time that elapsed before a child was born: on average 41.57 months, with a maximum of 225 months and a minimum of 9 months. Table 7 shows the succeeding birth interval, the time that elapsed after a child was born: on average 27.17 months, with a maximum of 58 months and a minimum of 9 months. In Table 8, almost all respondents, 2274 (99.7%), reported no live birth in between births; only 6 (0.2%) had all children alive, a very small number. Table 9 shows that 111 households had no children under 5 years, 1188 households had 2 children each and 5 households had 5 children each. Table 10 shows that, in the last 5 years, 1358 households in Nyanza had 1 child born, 1238 households had 2, and 20 households had 4 children living with them. In Table 11, the smallest group, 33 (1.1%), had no education at all, while 1958 (66.9%) had primary education. In Table 12, only 2 (1.2%) of the children who died had mothers with no education, compared with 129 (76.3%) whose mothers had primary education and 4 (2.4%) whose mothers had higher education. Among surviving children, those whose mothers had primary education were the most numerous (1829, 66.3%) and those whose mothers had no education the fewest (31, 1.1%). There was an association between education level and child survival (p < α = 0.05).

Logistic Regression Model
Table 13 shows the output of a logistic regression of the outcome variable, child is alive, on a set of chosen variables. Three variables were statistically significant at α = 0.05: the succeeding birth interval length ($p = 0.000338$), the number of children under five years of age in the same household ($p = 2.0 \times 10^{-16}$) and the number of births in the last five years ($p = 8.26 \times 10^{-5}$).
Sex of child has two levels, with male children as the reference category. Female children therefore had 1.4933 times the odds of surviving of their male counterparts.
Children born in rural areas of Nyanza Province had lower odds of survival than those born in urban areas (odds ratio 0.909); that is, their odds of survival were about 9.06% lower.
As the succeeding birth interval length increased by one unit, the odds of child survival increased by 0.0776%.
The odds of a child being alive at the fifth birthday were higher by a factor of $3.2247 \times 10^6$ for those who used traditional family planning methods or contraceptives than for those who used no method at all. Those who used modern family planning methods or contraceptives had 42.773% higher odds of their children being alive than those who used no method (the reference category).
As the number of children under five years in a household increased by one, the odds of a child surviving in that household increased by a factor of 11.247.
When birth order increased by one level, the odds of a child being alive at five years increased by a factor of 5.2984.
As the number of births in the last five years increased by one, the odds of a child surviving in that household decreased by 82.02%.
When a mother's age at birth increased by one year, the odds of her child being alive decreased by a factor of 0.9922981; equivalently, each additional year of maternal age lowered the odds of the child being alive by 0.77%.
Those with primary education had lower odds, by a factor of $1.936745 \times 10^{-7}$, of having a child survive to five years than those with no education.
Those with secondary education had lower odds, by a factor of $1.311153 \times 10^{-7}$, of having a child survive to five years than those with no education.
Those with higher education had lower odds, by a factor of $4.003798 \times 10^{-8}$, of having a child survive to five years than those with no education.
The model, fitted using complete-case analysis, has an AIC of 262.7.
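The interpretations above follow from two identities: a logit coefficient $\beta$ corresponds to an odds ratio $e^{\beta}$, and an odds ratio OR corresponds to a $(\mathrm{OR} - 1) \times 100\%$ change in the odds. A small sketch using odds ratios quoted in the text; the helper names are hypothetical, and the k-unit helper assumes the logit is linear in the predictor.

```python
import math

def odds_ratio_from_coefficient(beta):
    """Odds ratio for a one-unit increase in a predictor with log-odds coefficient beta."""
    return math.exp(beta)

def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio: (OR - 1) * 100."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_for_k_units(odds_ratio, k):
    """Odds ratio for a k-unit increase, assuming the logit is linear in the predictor."""
    return odds_ratio ** k

# Odds ratios quoted in the text above
for label, odds_ratio in [("female vs male", 1.4933),
                          ("mother's age, +1 year", 0.9922981),
                          ("rural vs urban", 0.909)]:
    print(label, round(percent_change_in_odds(odds_ratio), 2))
```

These are changes in the odds; they approximate changes in the probability of survival only when the outcome is rare.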

Kaplan-Meier Curves
Figure 1 shows the estimated survival function without any covariate. It starts at the top left at 100%, meaning that no child has yet experienced the event, and the survival probability decreases as time increases. Figure 3 compares survival in the two places of residence, rural and urban. Both curves begin together, implying no early differences. At approximately 250 days the two curves separate, showing that children residing in rural areas had slightly lower survival chances until 500 days compared with those living in urban areas. The hypotheses tested were:
H0: there is no survival difference between the two groups.
H1: there is a survival difference between the two groups.
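The log-rank test for these hypotheses can be sketched in plain Python: at each distinct event time it compares the observed deaths in group 1 with those expected if both groups shared one hazard, accumulates the hypergeometric variance, and refers $(O - E)^2 / V$ to $\chi^2_1$. The function name is hypothetical and the toy data in the usage below are made up; R users would typically call `survdiff` from the `survival` package.

```python
import math

def logrank_test(time, event, group):
    """Two-sample log-rank test; group is coded 0/1. Returns (statistic, p-value)."""
    event_times = sorted({t for t, e in zip(time, event) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for ti in time if ti >= t)                                   # at risk
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)              # deaths
        d1 = sum(1 for ti, e, g in zip(time, event, group) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))   # chi-square(1) upper tail

# Toy data: identical groups, then fully separated groups
same = logrank_test([1, 2, 3, 4, 1, 2, 3, 4], [1] * 8, [0, 0, 0, 0, 1, 1, 1, 1])
apart = logrank_test([6, 7, 8, 9, 10, 1, 2, 3, 4, 5], [1] * 10, [0] * 5 + [1] * 5)
print(same, apart)
```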
Log-rank test: sex of child
The log-rank test is the most common technique for comparing survival between two groups. In Table 14, the p-value is 0.8, so we fail to reject the null hypothesis and infer that there is no survival difference between male and female children at α = 0.05. Figure 4 confirms this: there is no visible difference between the two survival curves, which appear as a single curve with a median of approximately 330 days.
Log-rank test: place of residence
Under the same hypotheses, Table 15 gives a p-value of $4 \times 10^{-6}$, showing a statistically significant survival difference between rural and urban areas. Children born in urban areas had better prospects of survival than those born in rural areas. Figure 5 shows survival by place of residence (rural or urban). Table 16 gives the survival estimate without any covariate: time is measured in days, n.risk is the number of children at risk of death, and the survival column gives the survival probabilities as time increases, together with their standard errors and the lower and upper confidence limits.
At 180 days there were 2862 children at risk and only 1 died; the estimated probability of surviving to 180 days was 0.999651, with a narrow 95% confidence interval (0.999, 1.00). Table 17 gives the survival estimate for male children: at 184 days there were 1451 children at risk and only 1 died, and the probability of surviving to 184 days was 0.99931, with a narrow 95% confidence interval (0.997, 1.00). Figure 6 plots the scaled Schoenfeld residuals against time. The broken lines represent a standard-error band around the fit, while the continuous line is a smoothing-spline fit to the plot. If the proportional hazards assumption holds, the fitted line should stay close to the horizontal zero line over the whole time range. Overall, the residuals are symmetric about the zero line, and there is no indication of influential observations or outliers.
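The Kaplan-Meier figures in Tables 16 and 17 follow the product-limit formula $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$. The sketch below, in plain Python with hypothetical helper names, reproduces the step quoted for 180 days, assuming that was the first death: with 2862 children at risk and one death, $\hat S = 1 - 1/2862 \approx 0.999651$. It also returns Greenwood standard errors, the usual basis for the reported confidence limits.

```python
import math

def kaplan_meier(time, event):
    """Product-limit estimator S(t) = prod over event times t_i <= t of (1 - d_i / n_i).
    Returns (event time, number at risk, deaths, survival) rows."""
    surv, rows = 1.0, []
    for t in sorted({ti for ti, ei in zip(time, event) if ei}):
        n_risk = sum(1 for ti in time if ti >= t)          # censored at t still at risk
        deaths = sum(1 for ti, ei in zip(time, event) if ti == t and ei)
        surv *= 1.0 - deaths / n_risk
        rows.append((t, n_risk, deaths, surv))
    return rows

def greenwood_se(rows):
    """Greenwood standard error of S(t) at each event time."""
    out, acc = [], 0.0
    for _, n, d, s in rows:
        if n == d:            # survival drops to 0; report a standard error of 0
            out.append(0.0)
            continue
        acc += d / (n * (n - d))
        out.append(s * math.sqrt(acc))
    return out

# 2862 children at risk, one death at 180 days, the rest censored later (illustrative)
rows = kaplan_meier([180] + [400] * 2861, [1] + [0] * 2861)
print(rows[0], round(rows[0][3], 6), greenwood_se(rows)[0])
```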

Determining the Effects of All the Variables
After diagnostic tests on the Cox PH models, the respective predictors were fitted in a parsimonious Cox PH model [16] to check simultaneously the effects of the different risk factors on survival time.

Stratified Cox Regression Model
In Table 19, the odds of a female child surviving were higher by a factor of 1.0572 than those of a male child.
Those with primary education were 18% less likely to have their children alive than those with no education.
Those with secondary education had higher odds, by a factor of 2.0551, of having their children alive at the fifth birthday than those with no education.
Those with higher education had higher odds, by a factor of 14.0583, of having their children alive at the fifth birthday than those with no education.
As the number of children under five years in a household increased by one, the odds of survival increased by 67.53%.
As maternal age increased by one year, the chances of child survival decreased by 21%.
Those who were taught about family planning by health workers were 33.58% more likely to have their children survive to five years than those who were not.

Frailty Model
Table 20 shows the frailty model output. Four terms are statistically significant in explaining child survival: the frailty term (household number), with $p = 4.5 \times 10^{-7}$; the preceding birth interval length, with $p = 0.0058$; and the number of births in the last 5 years and maternal age at birth, with $p = 0.01$ and $p = 4.9 \times 10^{-13}$ respectively.
The odds of a female child surviving were higher by a factor of 1.0698 than those of a male child. As the preceding birth interval length increased by one unit, the chances of child survival decreased by a factor of 0.9828.
Those with primary education were 19% less likely to have their children alive than those with no education.
Those with secondary education had higher odds, by a factor of 1.6802, of having their children alive at the fifth birthday than those with no education.
As the number of children under five years in a household increased by one, the odds of child survival increased by 1.9%.
As the number of births in the last five years increased by one, the odds of child survival increased by 2.2728%.
As maternal age increased by one year, the chances of child survival decreased by 39%. Those who were taught about family planning by health workers were 16.17% more likely to have their children survive to five years than those who were not.

Discussion
This study attempts to understand the determinants of child mortality using survey data from the KDHS; specifically, the Kenya DHS 2014 dataset was used for the analysis. We analysed data for Nyanza Province only, as it is the region with the highest child mortality in Kenya.
A high mortality rate generally signifies unmet human health needs in education, medical care, nutrition and sanitation, so understanding the determinants of child mortality is an important research aim.
Many studies have used regression techniques to explore the determinants of U5CM; Cox PH regression, for example, was used in [17][18][19].
Besides fitting logistic regression and Cox PH models, we tested the covariates for proportionality of hazards to make the Cox PH results more reliable and valid.
We found that the following covariates were statistically significant in determining child mortality: succeeding and preceding birth interval length, the number of children under five years in the household, the number of births in the last five years, higher education and maternal age at birth.
Compared with other published studies, there was no significant mismatch: those studies found that child mortality was associated with variables related to child characteristics at birth (such as age at birth), reproductive factors of the mother (such as the number of siblings born before), feeding characteristics and anthropometric measurements.
This is in line with other studies that used Cox PH regression and found that region of residence, sex of the child, type of birth (multiple), birth interval (less than 24 months after the preceding birth) and mother's education were associated with an increased risk of mortality before the fifth birthday [17]. Nasejje et al. also found that factors related to maternal characteristics and previous births, such as the sex of the child, the sex of the household head and the number of births in the past year, were significant [18]. Sreeramareddy et al. explored the effects of the mother's education, child's sex, rural/urban residence, household wealth index, region, ecological zones and development [19].
In our work, we had a large sample of 2926 observations and 1132 variables, which was statistically sufficient. However, we did not address missing data and used only complete cases in our analysis; handling the missing data, and any resulting imbalance, could have given different results.
Another limitation was that we did not use a formal statistical method to select the covariates; they were chosen from the literature. The study is significant in that it can help the government and non-governmental organisations obtain information to guide interventions towards the global target on child mortality. Furthermore, students researching child mortality may fill the remaining gaps and add to the existing body of knowledge.

Conclusion
Correlated survival data arise in numerous health fields, especially where participants are grouped in clusters or households that share the same prognostic factors. In this research, we have presented a framework for determining the factors associated with child mortality using the 2014 KDHS data for Nyanza Province, Kenya. Principally, the framework involved checking which covariates were important in explaining child survival. Based on this research, child mortality is associated with variables related to reproductive factors of the mother (such as the number of children under five years in the household, the number of births in the last five years, and the succeeding and preceding birth interval lengths), the age of the mother at birth, and higher education.

Figure 1. Kaplan-Meier estimate without a variable. In Figure 2, both curves are almost the same, and both male and female children had a median survival time of approximately 330 days.

Figure 2. Kaplan-Meier for sex of child.

Figure 3. Kaplan-Meier for residence.

Figure 4. Kaplan-Meier for sex of child.

Figure 5. Plot of survival in urban and rural areas.
Table 19 shows the output of the stratified Cox model, in which higher education ($p = 0.00738$), the number of children under 5 in the household ($p = 2.61 \times 10^{-5}$) and maternal age at birth ($p = 1.19 \times 10^{-9}$) are statistically significant at α = 0.05.

Table 1. Child is alive in Nyanza.

Table 2. Place of residence.

Table 4. Gender of children.

Table 5. Child alive against sex of child.

Table 8. Live birth in between births.

Table 9. Number of children under 5 years in household.

Table 10. Number of births in last 5 years.

Table 12. Association between child alive and education level.

Table 14. Log-rank test by sex of child.

Table 15. Log-rank test by place of residence.

Table 16. Kaplan-Meier survival estimate without a variable.

Table 17. Survival of a male child.

Table 18. Testing for proportionality among covariates.