Childhood Mortality Adjusting for Cluster Effects: A Study of the Ghana Demographic and Health Survey

In the Ghana Demographic and Health Survey (GDHS), information on demographic characteristics and health status is collected from a sample that is representative of the entire population. The survey is built on enumeration areas (EAs), or clusters, selected using a two-stage probability sampling design. This paper illustrates an analysis of childhood mortality that adjusts for the cluster effect using Generalized Estimating Equations (GEE). Data from the Ghana Demographic and Health Survey 2008 (GDHS 2008) were used for the analysis. GEE models with three working correlation matrices (independence, unstructured and exchangeable) were fitted to the data set. Logistic regression models and other statistical tools were used to assess associations and to select variables significantly associated with childhood mortality. Age of the mother, total births in the last five years and region of residence were significant determinants of childhood mortality. We recommend clear policies and programs for education and awareness campaigns and for expanding and improving health facilities. Suggestions for further study of childhood mortality are also given.


Introduction
Public health worldwide can be assessed using data collected in two main ways. The first is statistical data generated within the health sector, such as hospital and physicians' records, which are familiar to health professionals.
The second is data collected by statisticians, epidemiologists and demographers through censuses or population-based surveys. An example of such a survey is the Demographic and Health Survey (DHS), which is conducted in many countries, especially developing countries, to provide data for monitoring population and health situations. These data serve various purposes for patients, doctors, health administrators and nations as a whole.
To date, five Ghana Demographic and Health Surveys (GDHS) have been conducted: 1988, 1993, 1998, 2003 and 2008. Respondents are selected using a stratified multi-stage cluster sampling scheme that involves a fixed sample size per region and a fixed sample per region proportional to the population size. The GDHS collects information on a wide range of health-related outcomes, including HIV/AIDS knowledge and behavior, malaria indicators, anemia, nutritional status of children, vaccination of children, fertility levels and preferences, use of family planning methods, childhood mortality and maternal health.
One of the key findings worth examining is childhood mortality, which is a common public health agenda item for local and international agencies such as the United Nations. The latest estimates of under-five mortality from the UN Inter-agency Group for Child Mortality Estimation (IGME) show a 35 percent decline in the under-five mortality rate globally, from 88 deaths per 1,000 live births in 1990 to 57 in 2010. Over the same period, the total number of under-five deaths in the world declined from more than 12 million in 1990 to 7.6 million in 2010.
Data from the GDHS (2008) also show a remarkable decline in the rate between the first survey in 1988 and the 2008 survey. Specifically, the decline stagnated between 1988 and 2003 but accelerated in the five years preceding 2008.
Many attribute this gain to immunization and vaccination programs of the Ministry of Health (Ghana) and its policies to fight communicable diseases such as diarrhea, measles, respiratory infections and malaria. Others have attributed the recent decline to the implementation of the National Health Insurance policy of free maternal care in 2003. This paper examines the health determinants of child mortality (the social and economic environment, the physical environment, and individual characteristics and behaviours), adjusting for cluster effects, at the household level and across urban-rural and regional divisions in Ghana.

Statement of the Problem
Over the past two decades, child mortality has been on the agenda of virtually every nation, and it remains unfinished business. Major health interventions during the last decade have focused on reducing infant and child mortality, in line with the United Nations Millennium Development Goals (MDGs) (WHO, 2000).
Reducing the rate requires looking directly at the factors contributing to child mortality and at how the collected data are analyzed. These factors are not uniform across the country; they may vary across households, villages, towns and regions.
Although some progress has been made in improving child survival, child mortality rates remain high in Ghana and many other developing countries. Child deaths can be concentrated, or clustered, at the level of the mother or family, household, community and region.
The GDHS sampling design is built on enumeration areas (EAs), that is, clusters within which deaths may be concentrated. The survey data are collected on units within the same cluster during a given period, so responses within a cluster are likely to be correlated. If this correlation is not taken into account, the standard errors of the parameter estimates will not be valid and hypothesis testing results will not be replicable. Hence any analysis or modeling using GDHS data must account for the cluster effect to obtain accurate estimates and valid inference.
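As a rough illustration of why ignoring clustering misstates precision, the sketch below uses Kish's design effect for equal-sized clusters, $1 + (m - 1)\rho$, where $m$ is the number of observations per cluster and $\rho$ the intra-cluster correlation. This approximation is not part of the original analysis; the cluster size of 30 mirrors the households sampled per EA, and $\rho = 0.05$ is a hypothetical value.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation of the variance inflation caused by clustering.

    Assumes every cluster has the same number of observations (a simplification).
    """
    return 1.0 + (cluster_size - 1.0) * icc


def corrected_se(naive_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a standard error computed as if all observations were independent."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical example: 30 households per EA, intra-cluster correlation 0.05.
deff = design_effect(cluster_size=30, icc=0.05)
print(f"design effect = {deff:.2f}")  # 1 + 29 * 0.05 = 2.45
print(f"corrected SE  = {corrected_se(0.10, 30, 0.05):.4f}")
```

With $\rho = 0$ the design effect is 1 and the naive standard errors are appropriate; any positive within-cluster correlation makes the naive variance too small by this factor.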

Objective of the Study
The overall objective is to examine childhood mortality while adjusting for the cluster effect.
Specific objectives: a) to evaluate the determinants of childhood mortality; b) to assess whether there is a cluster effect.

Literature Review
Many scientists and researchers around the world have studied infant and child mortality and have reported results of vital importance to population health. Some of these studies were conducted many years ago, others recently, and some are still ongoing. They consider various health determinants and their effects on infant and child mortality.
Child mortality is defined as the probability of dying between the first and fifth birthdays, and it has been suggested that numerous interrelated and complex factors vary linearly with child mortality. Most previous studies have shown a significant association between socioeconomic and demographic factors and childhood mortality. For instance, Hosseinpoor (2005), Hill and Mahy (2001) and Ogada (2014) studied infant mortality, while Hobcraft (1992) and Becares, Cormack and Harris (2013) examined determinants of childhood mortality. The effects of biological and environmental factors on under-five mortality were examined by Becher (2010) and Mutunga (2004). All of these studies used survey or census data. On the theory of child mortality, UNICEF (Technical Report: State of Children, 2009), in its notes on data and methodology, states that mortality among young children can be subdivided by age group: neonatal mortality (during the first 28 days of life), infant mortality (between birth and exact age 1 year), post-neonatal mortality (ages 1 month to 11 months), child mortality (ages 1 to 4 years) and under-five mortality (between birth and exact age 5). The mortality of main concern here is under-five mortality, which encompasses all of these divisions.
The under-five mortality rate, often known by its acronym U5MR or simply as the child mortality rate, is the probability of dying between birth and exactly five years of age, expressed per 1,000 live births.
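The age-group definitions above, and the way under-five mortality combines them, can be sketched in Python. Two simplifying assumptions are made here: age at death is measured in completed days with one year taken as 365 days, and U5MR is built by chaining probabilities of dying in successive age segments (a synthetic-cohort life-table approach in the spirit of DHS reports). The function names and the segment probabilities in the example are hypothetical.

```python
YEAR_DAYS = 365  # simplifying assumption: one year = 365 days


def mortality_groups(age_at_death_days: int) -> set[str]:
    """Return every UNICEF age group that a death at this age counts towards."""
    groups = set()
    if age_at_death_days < 28:
        groups.add("neonatal")
    elif age_at_death_days < YEAR_DAYS:
        groups.add("post-neonatal")
    if age_at_death_days < YEAR_DAYS:
        groups.add("infant")
    elif age_at_death_days < 5 * YEAR_DAYS:
        groups.add("child")
    if age_at_death_days < 5 * YEAR_DAYS:
        groups.add("under-five")
    return groups


def under_five_rate(segment_probs: list[float]) -> float:
    """Chain probabilities of dying in successive age segments into U5MR per 1,000."""
    survival = 1.0
    for q in segment_probs:
        survival *= 1.0 - q  # probability of surviving this segment
    return 1000.0 * (1.0 - survival)


# Hypothetical inputs: 5% die before age 1, 3% of survivors die between ages 1 and 5.
print(mortality_groups(10), mortality_groups(800))
print(round(under_five_rate([0.05, 0.03]), 1))
```

Because the infant and child groups are nested inside the under-five group, a death is counted in more than one group, which is why the function returns a set.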
Concerning the trend, Ghana was among many countries that in 2010 were on track to achieve Millennium Development Goal 4, but progress needs to accelerate in several regions, particularly Southern Asia and sub-Saharan Africa. Few studies have investigated child mortality in Ghana, and most of them used GDHS data. Many of these studies identified socioeconomic, bio-demographic, household and environmental factors associated with child mortality. These studies used analytical models that assumed child deaths to be independent and randomly distributed across families, households, communities and districts (Das Gupta, 1990; Sastry, 1997). In Ghana, specifically in Kintampo in the Brong Ahafo region, a notable study by Obed, Owusu and Nartey (2010) tested for the existence of statistically significant clusters of childhood mortality within the Kintampo Health and Demographic Surveillance System (KHDSS) between 2005 and 2007.
With respect to determinants, Akoto and Tambashe (2000), in their study of socioeconomic inequalities in infant and child mortality between urban and rural areas in sub-Saharan Africa, used multivariate logistic regression and identified six major determinants of child mortality: place of residence, mother's education, mother's age, immediate environment, mother's work status and mother's religion. Regarding place of residence, they explained that the impact of the urban-rural setting on child mortality varies across countries.
In a similar study in Malawi, Espo (2002) applied logistic regression to assess the association between morbidity and a number of linear or dichotomous environmental independent variables. The Kruskal-Wallis non-parametric independent-samples test and cross-tabulation with the chi-square test of statistical significance were employed. The results indicate that source of drinking water and sanitation facilities were strong predictors of infant mortality.
Moreover, Singh and Tripathi (2013) investigated the effects of maternal factors contributing to under-five mortality, specifically individual, household and cluster socioeconomic characteristics, using a multilevel modelling approach, and examined their respective influence on child mortality as well as on undernutrition.
Jacoby and Wang (2003), for their part, examined the relationship between child mortality and morbidity and the quality of the household and community environment in rural and urban China using a competing risks approach. They found that higher maternal education levels reduce child mortality and that female education has strong health externalities (i.e., controlling for other factors, a child living in a neighborhood with more educated mothers has about 50% lower mortality risk). Access to safe water and sanitation and immunization also reduce child mortality in rural areas, while access to modern sanitation facilities (flush toilets) reduces diarrhea prevalence in urban areas and thus child mortality.
Balk and Neuman (2003) analyzed childhood mortality in West Africa using DHS data linked with information from a variety of spatial data sources for ten West African countries, namely Benin, Burkina Faso, Cameroon, Côte d'Ivoire, Ghana, Guinea, Mali, Niger and Senegal. They found that the source of water becomes more important as the child is weaned, and that surface water is clearly inferior to piped water. They concluded that the impacts of most country-level differences become insignificant when household and spatial characteristics are included.
With regard to risk or proximate determinants, Mosley and Chen (1984) identified intermediate variables affecting childhood mortality, known as proximate determinants, and described a sequence of socioeconomic and biological forces acting on the mother's health that influence the outcome of her pregnancy. The adverse outcome of this sequence of events is usually the delivery of a premature, low-birth-weight or sick neonate.
In Bangladesh, Rafiqul, Moazzem, Mizanur and Mosharaf (2013) found that demographic factors such as sex, multiple births and previous child death are associated with a high risk of infant death. Infant boys, especially during the neonatal period, have a higher risk of death than girls. Early infant death is also significantly more common for multiple births, mainly because multiple births are more likely to be premature and/or of low birth weight. If more than one baby survives delivery, there is competition for breast milk and the mother's resources.
In the US, the infant mortality rate of black infants is twice that of white infants (MacDorman & Mathews; US Department of Health and Human Services, 2000). These differences also exist among other racial and ethnic groups.
Wang (2003), using data from the 2000 Ethiopian DHS, examined the environmental determinants of child mortality by fitting three hazard models. The estimation results show that children born in rural areas face a much higher mortality risk than those born in urban areas. Ethiopia is characterized by a severe lack of access to basic environmental resources, and a strong statistical association is found between child mortality rates and poor environmental conditions.
Regarding the association between socioeconomic status and child mortality, Kanmiki and Bawah (2014) explained that education can contribute to child survival by making women more likely to marry and enter motherhood later, have fewer children, use prenatal care and immunize their children. The results, however, also led to the puzzling conclusion that the effect of maternal education on child survival is weaker in sub-Saharan Africa. Similar findings have been reported elsewhere (Abuqumar, Danny, & Fred, 2010).
In assessing maternal factors contributing to under-five mortality in India, Singh and Tripathi (2013) found that the child's gender, birth orders 1 and 5, and complications in pregnancy are risk factors.
Manda (1999), cited in Johanna (2016), found that, unlike the endogenous maternal and demographic factors that substantially increase a child's risk of death, the effects of socioeconomic variables grow stronger as the child gets older. The reason usually cited is that a greater proportion of child deaths between ages 1 and 4 years are due to exogenous factors over which parents potentially have control.
In Kenya, Hill and Mahy (2001) reported an inverse relationship of the mother's educational level and economic status (wealth index) with child mortality. Regarding urban/rural residence, urban areas showed higher mortality risks than rural areas, but after adjusting for HIV prevalence, child mortality was lower in urban areas. Heisler (2012) argued that the high infant mortality rate reflects racial and ethnic disparities, as evidenced by racial differences in infant mortality. Others suggest that the high infant mortality rate may reflect variation in a number of health system characteristics, such as the adequacy of public health services and the availability of health care for women and infants (Yeboah, 2014). The household environment, measured by factors such as source of drinking water and toilet facilities, provides important determinants of older children's chances of survival. These factors are important not only for their direct effect on child survival but also because they may indicate the overall resource level of a child's family.

Methodology
Design of the Study
The GDHS 2008 used in this study is a nationally representative household-based survey. The sampling of households and respondents followed World Health Organization (WHO) guidelines, with the sampling frame constructed from the 2000 Ghana Population and Housing Census (GPH, 2000). The sampling design is a two-stage probability sample. In the first stage, enumeration areas (EAs) are randomly selected with probability proportional to their size.
In all, 412 sample points (EAs, or clusters) were selected from all 10 regions of Ghana, which are the country's administrative divisions, taking the urban-rural separation into account.
The second stage involved selecting thirty households within each selected cluster (EA) with equal probability, and sample weights were assigned to individuals. A total of 12,323 households were selected from the 412 clusters, except for one cluster that was excluded for security reasons.
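Under a two-stage design of this kind, a household's selection probability is the product of the stage-one probability (proportional to the EA's measure of size) and the stage-two probability (30 households out of those listed in the EA), and the base sampling weight is its inverse. The sketch below assumes exactly that; the class and function names and all sizes are hypothetical, and published GDHS weights may include further adjustments (for example for nonresponse) that are not shown.

```python
from dataclasses import dataclass


@dataclass
class EnumerationArea:
    name: str
    frame_size: int         # measure of size from the census frame (e.g. households)
    listed_households: int  # households found when the EA was listed before selection


def design_weight(ea: EnumerationArea, total_frame_size: int,
                  eas_selected: int, households_per_ea: int = 30) -> float:
    """Base weight = 1 / (P(EA selected) * P(household selected | EA selected))."""
    p_ea = eas_selected * ea.frame_size / total_frame_size  # stage 1: PPS
    if p_ea > 1.0:
        raise ValueError("EA is large enough to be a certainty selection")
    p_hh = households_per_ea / ea.listed_households          # stage 2: equal probability
    return 1.0 / (p_ea * p_hh)


# Hypothetical stratum: 100,000 frame households, 20 EAs drawn.
ea = EnumerationArea("EA-1", frame_size=200, listed_households=210)
print(round(design_weight(ea, total_frame_size=100_000, eas_selected=20), 2))
```

When the listed count equals the frame size, the weight reduces to the total frame size divided by (EAs selected × 30) for every household, which is the self-weighting property that PPS selection followed by a fixed take per EA is designed to give.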

Dependent Variable
The outcome variable (childhood mortality) is the survival status of each live-born child aged five years or below, coded 0 if alive and 1 if dead.

Explanatory Variables
This study used variables available in the GDHS 2008 data. These include socioeconomic, demographic and health outcome predictors (biological variables), classified as follows. Socioeconomic variables: maternal education level, mother's occupation, type of residence (urban or rural) and wealth index.
Demographic variables: age of mother at birth, age of mother at first birth, sex of the child, region, district or community of residence, ethnicity and religion. Biological (health outcome predictor) variables: birth order, birth size, breastfeeding, previous birth interval and place of delivery.

Models Used for Analysis
The GDHS 2008 data, in which women in the enumeration areas were asked about the births of their children during the reference period, were edited for consistency. The variables obtained from the respondents were tabulated as frequency distribution tables in Excel and imported into SAS for further analysis.
Because the data are health related, both statistical and biological considerations were taken into account in choosing the variables. Frequencies of the dependent variable by each independent variable were obtained from SAS frequency counts, which give percentages and counts for simple description. Univariate, bivariate and multivariate analyses were conducted.
In the univariate analysis, the Cochran-Mantel-Haenszel test was performed on the ordinal variables against the response variable to test for a linear trend. Summary statistics and Spearman correlations were reported for continuous variables. The relationship between each selected variable and childhood mortality was also examined in a bivariate analysis, using simple logistic regression to calculate odds ratios.
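For a single stratum, the CMH linear-trend (nonzero correlation) statistic reduces to $M^2 = (N - 1)r^2$, where $r$ is the count-weighted correlation between the ordinal row scores and the outcome scores; and the odds ratio from a simple logistic regression on a binary predictor equals the cross-product ratio of the 2x2 table. The sketch below computes both in plain Python. The equally spaced scores and the counts in the example are hypothetical, and the analysis itself was run in SAS.

```python
import math


def mh_trend_test(table, row_scores, col_scores=(0, 1)):
    """Mantel-Haenszel linear-trend statistic M^2 = (N - 1) r^2 for one stratum.

    table[i][j] is the count in ordinal row i and outcome column j
    (column 0 = alive, column 1 = dead). Returns (M^2, p-value, 1 df).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(len(col_scores))]
    n = sum(row_totals)
    mean_u = sum(u * t for u, t in zip(row_scores, row_totals)) / n
    mean_v = sum(v * t for v, t in zip(col_scores, col_totals)) / n
    sxy = sum(count * (u - mean_u) * (v - mean_v)
              for u, row in zip(row_scores, table)
              for v, count in zip(col_scores, row))
    sxx = sum(t * (u - mean_u) ** 2 for u, t in zip(row_scores, row_totals))
    syy = sum(t * (v - mean_v) ** 2 for v, t in zip(col_scores, col_totals))
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    # For a chi-square with 1 df, P(X > m2) = erfc(sqrt(m2 / 2)).
    return m2, math.erfc(math.sqrt(m2 / 2.0))


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf) 95% CI for the 2x2 table [[a, b], [c, d]].

    Rows: exposed / unexposed; columns: dead / alive. For one binary predictor
    this equals exp(beta) from a simple logistic regression.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (estimate,
            math.exp(math.log(estimate) - z * se_log),
            math.exp(math.log(estimate) + z * se_log))


# Hypothetical counts (alive, dead) for 1, 2 and 3+ births in the period.
print(mh_trend_test([[120, 10], [60, 12], [20, 8]], row_scores=[1, 2, 3]))
print(odds_ratio(20, 80, 10, 90))
```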
In the multivariate analysis, backward and forward stepwise techniques were used to automatically select variables significantly associated with childhood mortality, and their pairwise interaction terms were checked for significance. Finally, the variables that showed significance were retained.
The independence assumption of ordinary least squares (OLS) regression is violated because the responses are correlated. Ignoring this correlation causes the standard errors of between-subject (cluster-level) effects to be underestimated and those of within-subject effects to be overestimated. To take the correlation into account, Generalized Estimating Equations (GEE) with working correlation matrices were used to analyze the data. A strength of GEE is that it models the marginal expectation of the dependent variable, through a link function, as a linear function of the explanatory variables.
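Backward elimination can be sketched as repeatedly refitting a logistic regression and dropping the least significant predictor until every remaining p-value is below the chosen level. This is a minimal illustration using statsmodels on synthetic data, assuming numeric predictors only (categorical terms such as region would need their dummy columns handled as a group); the helper name and variable names are hypothetical, and forward selection and interaction screening are not shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def backward_eliminate(data, response, candidates, alpha=0.05):
    """Drop the least significant predictor until all remaining p-values < alpha."""
    selected = list(candidates)
    while selected:
        formula = f"{response} ~ " + " + ".join(selected)
        fit = smf.logit(formula, data=data).fit(disp=0)
        pvalues = fit.pvalues.drop("Intercept")
        worst = pvalues.idxmax()
        if pvalues[worst] < alpha:
            break  # every remaining predictor is significant
        selected.remove(worst)
    return selected


# Synthetic data (not GDHS): 'noise' has no true effect on the outcome.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(30.0, 6.0, n),
    "births": rng.poisson(1.5, n),
    "noise": rng.normal(size=n),
})
eta = (-3.0 + 0.05 * (df["age"] - 30.0) + 0.6 * df["births"]).to_numpy()
df["dead"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

selected = backward_eliminate(df, "dead", ["age", "births", "noise"])
print(selected)
```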

Introduction to Generalized Estimating Equations (GEE)
Generalized estimating equations (GEEs) are an extension of the generalized linear model (GLM) (Hogan et al., 2009) that accommodates correlated data. GEE, first introduced by Zeger and Liang (1986), has become very popular in the biological, epidemiological and related disciplines, yet remains less well known in the educational and social sciences (McCullagh & Nelder, 1989).
GEEs provide a general framework for the analysis of continuous, ordinal, polychotomous, dichotomous and count data, and relax several assumptions of traditional regression models.

GEE Model Building
The data for the whole study can be classified into a number of groups called clusters. The data consist of 412 clusters, each containing multiple observations. The key feature of such clustered data is that observations within a cluster are more similar than observations from different clusters.
The geographical location of the clusters suggests that observations within a cluster are more alike, for example in terms of health facilities and their conditions. This similarity induces correlation between the observed variables within the same cluster, while observations from different clusters are independent. Let $Y_i = (Y_{i1}, \dots, Y_{in_i})^{T}$ be the vector of responses from cluster $i$, $i = 1, \dots, n$ (for example, EAs), with $n_i$ observations $j = 1, \dots, n_i$. For each $Y_{ij}$ a $p \times 1$ covariate vector $x_{ij}$ is available, which possibly contains an intercept. The data for cluster $i$ can then be summarized by the vector $Y_i$ and the $n_i \times p$ matrix $X_i = (x_{i1}, \dots, x_{in_i})^{T}$.

The pairs $(Y_i, X_i)$, $i = 1, \dots, n$, are assumed to be independent and identically distributed. The GLM, which allows flexibility in modeling the mean and variance structures, gives the model for the mean structure as

$$\mu_{ij} = E(Y_{ij} \mid x_{ij}) = g(x_{ij}^{T}\beta) \qquad (1)$$

where $g$ is a non-linear response function and $\beta$ is the unknown $p \times 1$ parameter vector to be estimated. The inverse of $g$ is the link function. The conditional model, also termed the dependence model, instead specifies

$$E(Y_{ij} \mid x_{ij}, \{Y_{ik} : k \neq j\}) \qquad (2)$$

where $Y_{ik}$, $k \neq j$, are the other responses within the same cluster. In the random-effects (mixed) model, the mean $E(Y_{ij} \mid x_{ij}, b_i)$ is specified conditionally on cluster-level random effects $b_i$, which follow some distribution $F$. The GLM also gives an important relationship between the mean and the variance,

$$\operatorname{Var}(Y_{ij}) = \phi\, h(\mu_{ij}) \qquad (3)$$

where $h$ is the variance function and $\phi$ is the dispersion parameter. When a specific univariate exponential family such as the binomial, Poisson or gamma distribution can be assumed, the variance function is determined by that assumption: $h(\mu) = \mu(1 - \mu)$ for the binomial, $h(\mu) = \mu$ (identity) for the Poisson and $h(\mu) = 1$ for the normal distribution. If the observations are independent, the maximum likelihood method is used to estimate the parameters: the likelihood equations are obtained by differentiating the log-likelihood function with respect to $\beta$ and setting the derivative equal to zero.
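For a binary outcome under the logit link, the mean and variance relations (1) and (3) can be written out directly. In this sketch the dictionary of variance functions follows the families listed above; the helper names, covariate values and coefficients in the example are hypothetical.

```python
import math

# Variance functions h(mu) in Var(Y) = phi * h(mu) for common GLM families.
VARIANCE_FUNCTIONS = {
    "binomial": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,
    "normal": lambda mu: 1.0,
}


def inverse_logit(eta):
    """Response function g for the logit link: mu = g(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_mean_variance(x, beta, phi=1.0):
    """Mean mu_ij = g(x_ij' beta) and variance phi * h(mu_ij) for a binary outcome."""
    eta = sum(xk * bk for xk, bk in zip(x, beta))  # linear predictor x_ij' beta
    mu = inverse_logit(eta)
    return mu, phi * VARIANCE_FUNCTIONS["binomial"](mu)


# Hypothetical covariates (intercept, mother's age) and coefficients.
print(binomial_mean_variance([1.0, 30.0], [-4.0, 0.05]))
```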
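Given the mean $\mu_{ij}$ and the variance structure $\phi\, h(\mu_{ij})$, an estimating equation is formed. The GEE estimator of $\beta$ is the solution of

$$U(\beta) = \sum_{i=1}^{n} D_i^{T} V_i^{-1} (Y_i - \mu_i) = 0 \qquad (4)$$

where $D_i = \partial \mu_i / \partial \beta^{T}$ is the $n_i \times p$ matrix of first-order derivatives, $\mu_i = (\mu_{i1}, \dots, \mu_{in_i})^{T}$ is the mean vector of $Y_i$, and $V_i$ is the working covariance matrix of $Y_i$, evaluated at $\hat{\alpha}$, a consistent estimate of the correlation parameters $\alpha$.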
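The estimating equation (4) can be sketched in NumPy for a logistic marginal model. The sketch assumes the working covariance $V_i = \phi A_i^{1/2} R_i A_i^{1/2}$ with $A_i = \operatorname{diag}\{\mu_{ij}(1 - \mu_{ij})\}$, keeps the working correlation fixed rather than re-estimating $\alpha$ at each step, and solves $U(\beta) = 0$ by Fisher scoring; the data are synthetic and the function names are hypothetical. Under the logit link $D_i = A_i X_i$, so with the independence working correlation $U(\beta)$ reduces to the ordinary logistic-regression score $\sum_i X_i^{T}(Y_i - \mu_i)$.

```python
import numpy as np


def working_covariance(mu, R, phi=1.0):
    """V_i = phi * A_i^{1/2} R_i A_i^{1/2} with A_i = diag(mu_ij (1 - mu_ij))."""
    a_half = np.diag(np.sqrt(mu * (1.0 - mu)))
    return phi * a_half @ R @ a_half


def gee_score_and_info(beta, clusters, R_builder, phi=1.0):
    """Return U(beta) = sum_i D_i' V_i^{-1} (y_i - mu_i) and sum_i D_i' V_i^{-1} D_i.

    clusters is a list of (X_i, y_i) pairs; R_builder(n_i) returns the
    n_i x n_i working correlation matrix.
    """
    p = beta.size
    U = np.zeros(p)
    info = np.zeros((p, p))
    for X, y in clusters:
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        D = (mu * (1.0 - mu))[:, None] * X  # d mu_i / d beta' under the logit link
        V = working_covariance(mu, R_builder(len(y)), phi)
        U += D.T @ np.linalg.solve(V, y - mu)
        info += D.T @ np.linalg.solve(V, D)
    return U, info


def solve_gee(clusters, R_builder, n_iter=50):
    """Fisher scoring for U(beta) = 0 with a fixed working correlation."""
    beta = np.zeros(clusters[0][0].shape[1])
    for _ in range(n_iter):
        U, info = gee_score_and_info(beta, clusters, R_builder)
        beta = beta + np.linalg.solve(info, U)
    return beta


def exchangeable(alpha):
    """Builder for an exchangeable working correlation with parameter alpha."""
    return lambda n: (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))


# Synthetic example: 30 clusters of 5 observations (not GDHS data).
rng = np.random.default_rng(0)
clusters = []
for _ in range(30):
    X = np.column_stack([np.ones(5), rng.normal(size=5)])
    y = rng.binomial(1, 0.3, size=5).astype(float)
    clusters.append((X, y))

beta_ind = solve_gee(clusters, np.eye)            # independence: R_i = I
beta_exch = solve_gee(clusters, exchangeable(0.2))
print(beta_ind, beta_exch)
```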
The covariance matrix of $Y_i$ can be modeled as

$$V_i = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2} \qquad (5)$$

where $A_i = \operatorname{diag}\{h(\mu_{i1}), \dots, h(\mu_{in_i})\}$ and $R_i(\alpha)$ is the working correlation matrix, fully specified by the parameter vector $\alpha$. The specification of the working correlation matrix accounts for the form of the within-subject correlation of the responses. One of the aims of this paper is to find out whether using different working correlation matrices would substantially affect the estimates and their standard errors (SEs). Various types of working correlation exist; three of them are used in this study, namely independence, exchangeable (compound symmetry) and unstructured (no specification).

Independence
The independence model adopts the working assumption that the observations within a cluster are independent, that is, $R_i(\alpha) = I_{n_i}$ for each cluster of $n_i$ households. The resulting parameter estimates are therefore the same as those of an ordinary GLM fitted as if all observations were independent.
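For an $n_i \times n_i$ matrix, the independence working correlation matrix can be illustrated as

$$R_i = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$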

Exchangeable (Compound Symmetry)
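The exchangeable working correlation specification assumes a constant correlation $\alpha$ between any two measurements within a subject (cluster), $\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha$ for $j \neq k$:

$$R_i(\alpha) = \begin{pmatrix} 1 & \alpha & \cdots & \alpha \\ \alpha & 1 & \cdots & \alpha \\ \vdots & \vdots & \ddots & \vdots \\ \alpha & \alpha & \cdots & 1 \end{pmatrix}$$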

Unstructured (No Specification)
When the correlation matrix is left completely unspecified, $\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha_{jk}$, there are $n_i(n_i - 1)/2$ correlation parameters to be estimated. This gives the most efficient estimator of $\beta$, but it is useful only when there are relatively few observations per cluster.
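The three working correlation structures can be constructed explicitly, which also makes the parameter count behind the unstructured option concrete. The helper functions below are hypothetical illustrations, not part of the original analysis.

```python
import numpy as np


def working_correlation(kind: str, n: int, alpha=None) -> np.ndarray:
    """Build an n x n working correlation matrix R_i(alpha)."""
    if kind == "independence":
        return np.eye(n)
    if kind == "exchangeable":
        # alpha: a single common correlation
        return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))
    if kind == "unstructured":
        # alpha: the n(n-1)/2 correlations alpha_jk for j < k, listed row by row
        R = np.eye(n)
        rows, cols = np.triu_indices(n, k=1)
        R[rows, cols] = alpha
        R[cols, rows] = alpha
        return R
    raise ValueError(f"unknown working correlation: {kind}")


def n_correlation_params(kind: str, n: int) -> int:
    """Number of correlation parameters to estimate for a cluster of size n."""
    return {"independence": 0, "exchangeable": 1, "unstructured": n * (n - 1) // 2}[kind]


print(working_correlation("exchangeable", 3, 0.1))
print(n_correlation_params("unstructured", 30))
```

For a cluster of 30 observations (the number of households sampled per EA), the unstructured form would require 435 correlation parameters compared with one for the exchangeable form, which is why it is practical only for small clusters.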

The Descriptive Data Analysis of the Study Variables
The frequency counts show that there were 2,137 respondents, of whom 349 (16.33%) had experienced a child death during the five-year period and 1,788 (83.67%) had not, giving a childhood mortality proportion of 16.33%.
In the five years to 2008, most of the women who gave birth had one child, constituting 64.9% (1,387) of all the women who gave birth.
Most of the respondents live in rural areas and small villages. The frequency counts also indicate that only 2.25% of the mothers have higher education, 37.8% secondary, 23.75% primary and 36.21% no education. The proportion of mothers who cannot read and write is as high as about 70%.
The main occupation among the respondents was agriculture-related. About 37.13% gave birth at home, but a remarkable 38.05% gave birth in government hospitals or polyclinics. According to the wealth index, more than half of the respondents are poor (51.66%), 17.45% middle, 18.6% richer and 12.73% richest.
Very few respondents, about 0.33% (7 respondents), had given birth 4 times, and most of them experienced a child death. There is also an indication that those with few births, such as just one in the period, experience less child death (8.47%). The frequencies also show that Akan is the major ethnic group in the country.
Although Ashanti region recorded the highest number of births during the period and Upper East the lowest, Ashanti region recorded the highest child survival (12.69%), Upper East had the highest child death (2.48%) and Volta region had the lowest child death (0.7%). Most of the women came from Ashanti region (14.74%), followed by Northern region (14.32%), with the fewest coming from Volta and Upper East regions.
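The percentages above follow directly from the reported counts, with the 2,137 respondents as the denominator; recomputing them confirms that 16.33% is the proportion who experienced a child death and 83.67% the proportion who did not. The dictionary labels are descriptive names chosen for this sketch.

```python
TOTAL_RESPONDENTS = 2137

# Counts reported in the descriptive analysis above.
counts = {
    "experienced child death": 349,
    "never experienced child death": 1788,
    "one birth in the five-year period": 1387,
    "four births in the five-year period": 7,
}

for label, count in counts.items():
    print(f"{label:38s} {count:5d}  {100 * count / TOTAL_RESPONDENTS:6.2f}%")
```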

Summary Estimates for the GEE Models
From Tables 1 and 2, all three childhood mortality determinants were significant at the 5% level under the independence working correlation assumption.
However, for the categorical variable region of residence, none of the individual regions was significant. Western region, the default reference category for the regions, has a parameter estimate and standard error of 0 under both the empirical estimator (also known as the sandwich or robust estimator) and the model-based estimator (also known as the naive estimator).
The tables also give the GEE parameter estimates with model-based and empirical standard errors. The intercept (-4.3930) and the parameter estimates of the independent variables are the same under both, but the standard errors differ marginally. This may indicate that the data set needs to be modeled with the correlation in mind.
When the exchangeable assumption is applied, the parameter estimates and standard errors from the empirical and model-based estimators are approximately the same. The estimated coefficients for the intercept, age of respondent and total births (-4.3955, 0.0487 and 0.5722, respectively) are the same under both, but the standard errors differ marginally.
The reported working correlation constant is 0.001853, but no inference is made on it because it is treated as a nuisance parameter.
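The exchangeable correlation is typically estimated by a moment estimator from Pearson residuals $r_{ij} = (y_{ij} - \hat{\mu}_{ij})/\sqrt{h(\hat{\mu}_{ij})}$: the products $r_{ij} r_{ik}$ are averaged over all within-cluster pairs and divided by the estimated dispersion. The sketch below follows Liang and Zeger's form with an optional degrees-of-freedom correction; software packages, including SAS, may use slightly different corrections, so it is illustrative rather than a replica of the reported 0.001853. The function name is hypothetical.

```python
import numpy as np


def exchangeable_alpha(residual_groups, n_params=0):
    """Moment estimate of the exchangeable correlation from Pearson residuals.

    residual_groups: list of 1-D arrays, one per cluster, of
    r_ij = (y_ij - mu_ij) / sqrt(h(mu_ij)); n_params: number of regression
    coefficients p used in the degrees-of-freedom corrections.
    """
    all_r = np.concatenate(residual_groups)
    phi = np.sum(all_r ** 2) / (all_r.size - n_params)  # dispersion estimate
    cross, n_pairs = 0.0, 0
    for r in residual_groups:
        # sum over pairs j < k of r_j * r_k = ((sum r)^2 - sum r^2) / 2
        cross += (r.sum() ** 2 - np.sum(r ** 2)) / 2.0
        n_pairs += r.size * (r.size - 1) // 2
    return cross / ((n_pairs - n_params) * phi)


# Toy residuals: within-cluster residuals share the same sign here.
print(exchangeable_alpha([np.array([1.0, 1.0]), np.array([-1.0, -1.0])]))
```

A value near zero, such as the one reported above, means that the residuals of children in the same EA are nearly uncorrelated once the covariates are accounted for.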
Tables 1 and 2 also summarize the results of the GEE family of models for the three working correlation specifications, giving the parameter estimates, standard errors and p-values from both the empirical and the model-based estimators.
The estimated coefficients from the three GEE models were close to one another and statistically significant at the 5% level. This supports the claim by Zeger and Liang (1986) that the estimates remain consistent, and hence close to one another, even when the working correlation is misspecified. The estimated coefficients of the three models differ only in the second decimal place, which is trivial.
In general, the estimated coefficients for the individual regions of residence indicate positive effects on childhood mortality, except for two regions, Greater Accra and Volta, which have negative effects; none of the region effects is significant. The estimated coefficients for age of respondent and total births show positive and highly significant effects.
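Assuming the logit link of the logistic GEE models, the reported coefficients are easier to read as odds ratios, $\exp(\hat{\beta})$. The sketch uses the exchangeable-model estimates quoted above for age of respondent (0.0487) and total births (0.5722).

```python
import math

# Coefficients (log-odds scale) reported for the exchangeable GEE model.
coefficients = {"age of respondent": 0.0487, "total births (last five years)": 0.5722}

for name, beta in coefficients.items():
    print(f"{name}: odds ratio = {math.exp(beta):.3f}")
```

An odds ratio of about 1.77 means that each additional birth in the five-year period is associated with roughly 77% higher odds of a child death, holding the other variables fixed; each additional year of maternal age multiplies the odds by about 1.05.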

Discussion and Conclusion
This study examined childhood mortality in Ghana while adjusting for the cluster effect. The results from the GDHS 2008 data using the specified working correlations confirm that the GEE model gives consistent estimates. In the analysis of biomedical survey data such as the GDHS, ignoring clustering can lead to invalid standard errors and misleading inference.
The different working correlation specifications do not give identical results, but they provide useful guidance and empirical support, leading to essentially the same interpretation and conclusions. In the fitted models, the parameter estimates were approximately the same, with small differences in standard errors across working correlations and parameters.
This explains the closeness of the results after adjusting for the cluster effect in childhood mortality. There are only very slight differences between the empirical and model-based standard errors for the exchangeable GEE model. The estimated working correlation coefficient is approximately 0.002, so the estimated correlation between the outcomes of children within the same enumeration area (cluster) is very small. Regression models based on GEE are increasingly important for such data. The results showed that some variables are significant while others are not.
No significant effects were found for type of residence, sex, religion, literacy, type of drinking water, wealth index, place of delivery or marital status. The study revealed statistically significant effects of age of the mother (the ability of the mother to act promptly in caring for her child turns out to be crucial for the survival of children from birth to age 5), region of residence (reflecting the impact of urban-rural settings) and total births in the last five years.
The most important determinant of childhood mortality is total births in the last five years, followed by region of residence, and the least important is age of the mother. While the current study focused on the enumeration area as the cluster, future studies could use regional boundaries as clusters. Any model or analysis using such data should account for the cluster effect to obtain valid and efficient results.
The findings suggest that births should be widely spaced: within five years, at most two children, at a birth interval of 3 to 4 years. Policies and programs are therefore needed, such as family planning interventions delivered through widespread health education campaigns, and strengthening of health centre facilities and capabilities. These would lengthen birth intervals and reduce the incidence of higher-order births at short intervals.