Application of Bayesian Approach Survival Analysis of Under-five Pneumonia Patients in Tercha General Hospital, South West Ethiopia

Abstract: Pneumonia is among the leading killer diseases of under-five children worldwide. In developing countries, 3 million children die each year from pneumonia. Ethiopia is one of the 15 countries with the highest pneumonia burden. The aim of this study was to examine the risk factors for the survival time of under-five pneumonia patients using a Bayesian approach. A total of 281 under-five pneumonia patients were included in this study. Parametric survival models with Weibull, lognormal and log-logistic baseline distributions were fitted to the data by introducing prior distributions. The DIC value was used to compare the baseline distributions.


Introduction
Pneumonia kills more under-five children than any other disease known to affect children, more than Acquired Immune Deficiency Syndrome (AIDS), malaria and measles combined [1]. More than 50% of all new cases of under-five pneumonia are concentrated in the world's poorest regions, Sub-Saharan Africa and South Asia. In terms of mortality, about 90% of all under-five pneumonia deaths are reported to occur in these two regions [2]. According to a World Bank report, the risk of pneumonia in children in developing countries is 3 to 6 times higher than in other children. Not only the occurrence of pneumonia but also its mortality rate is higher in developing countries [3]. The contribution of pneumonia to the deaths of older children was estimated to reach 14.1%, with approximately four percent of childhood pneumonia-related deaths occurring in the first 28 days of life globally [4]. According to the pneumonia and diarrhea progress report of 2015, Ethiopia is among the 15 countries with the highest under-five pneumonia burden. A study conducted at the Gilgel Gibe Field Research Center reported neonatal and infant mortality rates of 38 and 76.4 per 1000 live births, respectively. The two most common causes of death during the neonatal period were prematurity (26.4%) and pneumonia (22.6%), whereas the leading causes of death in the postneonatal period were pneumonia (42%), malaria (37%) and acute diarrheal diseases (30%) [5].
A few studies in this area have used statistical models such as binary logistic regression and multilevel logistic regression to examine the determinants of under-five pneumonia, and several studies have used descriptive analysis. However, those approaches cannot describe the survival time of patients hospitalized with pneumonia. For this reason, the researcher chose survival analysis models. Survival analysis is a statistical method for data in which the outcome variable of interest is the time to the occurrence of an event, and there are many standard parametric models such as the Weibull, lognormal and log-logistic models [6]. Recently, the Bayesian approach has been used in many research studies, especially in medicine, as an alternative to classical (frequentist) statistical methods. One reason is that classical methods base their maximum likelihood estimates on asymptotic considerations that are usually valid only for large samples [7]. A study conducted in Beirut, Lebanon reported that the Bayesian approach may have advantages over the frequentist one, particularly when the frequentist analysis has low power [8]. According to Gelfand, A. and Mallick, B. K. [9], the Bayesian method is the best method for obtaining appropriate estimates of the model.
Bayesian methods are applied to relatively small data sets where the validity of the asymptotic assumptions is doubtful [7]. In the Bayesian approach no assumption is made about the shape of the percentile distribution; rather, the data themselves specify the distribution, and the approach can improve the precision of the results by introducing external information through the prior distribution [7]. The choice of prior distribution is guided by the experience gained in previous studies and by the researcher's expert knowledge. If prior information comes from previous research, Bayesian estimation should lead to more precise results than classical methods [10]. Prior distributions that contain specific information and have an impact on the posterior distribution are called informative priors. When reliable prior information is lacking, the researcher should use prior distributions that have a minimal impact on the posterior distribution; such distributions are called noninformative priors [11]. In this study a small sample, limited to under-five pneumonia patients, was used to investigate the survival time of these patients. Because the classical approach relies on the central limit theorem and requires a large sample, the Bayesian approach is preferable for a small sample even when the researcher has no prior information, since noninformative prior distributions can be chosen [11].
In this study Bayesian survival analysis is used to identify important risk factors for under-five pneumonia patients. The response variable is the time from admission until an event (death) due to pneumonia. The log-rank test is used to compare the survival experience of different categories of patients, and parametric accelerated failure time (AFT) models fitted with a Bayesian approach are used to identify predictors of mortality among under-five pneumonia patients. The literature on Bayesian survival analysis of under-five pneumonia is limited. Therefore the main aim of this study was to investigate the survival time of under-five pneumonia patients in Tercha General Hospital using Bayesian survival analysis.

Study Area and Study Design
The study was conducted at Tercha General Hospital in Dawro Zone, around 491 km from Addis Ababa, the capital city of Ethiopia. It was a retrospective study that reviewed the cards and pediatric charts of all under-five children hospitalized with pneumonia in Tercha General Hospital. The data were analyzed using R version 3.5.0 and WinBUGS software.
The response variable in this study was the survival time (time to death) of under-five pneumonia patients, measured in days from the date of admission until the date of the patient's death or discharge from hospital. The status variable is coded as 1 if the event (death) occurred and 0 if the observation was censored in the given time interval. The explanatory variables considered in this study are listed in Table 1 with their corresponding codes. The patient-to-nurse ratio (PNR) is the ratio of patients to nurses counted in a month. Length of hospital stay is the number of calendar days from the day of admission to the day of discharge or death.

Survival Data Analysis
Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs. By time, we mean the years, months, weeks, or days from the beginning of follow-up of an individual until an event occurs [6].

Non-parametric Survival Methods
Nonparametric analyses are more widely used in situations where there is doubt about the exact form of the distribution. The nonparametric method used in this study is the log-rank test. Whether there is a real difference between groups can only be assessed, with any degree of confidence, by using statistical tests. The log-rank test is one of the most commonly used nonparametric tests for comparing two or more survival distributions [12]. The log-rank test statistic for comparing two groups is given by:

$$Q = \frac{\left[\sum_{j=1}^{r} w_j \left(d_{1j} - \dfrac{n_{1j}\, d_j}{n_j}\right)\right]^2}{\sum_{j=1}^{r} w_j^2\, \dfrac{n_{1j}\, n_{2j}\, d_j\, (n_j - d_j)}{n_j^2\, (n_j - 1)}}$$

where $r$ is the number of rank-ordered failure (event) times, $w_j$ is the weight for censoring adjustment at time $t_{(j)}$ (with $w_j = 1$ for the log-rank test), $d_{1j}$ is the observed number of failures (events) at time $t_{(j)}$ in group 1, $n_{1j}$ is the number of individuals at risk in the first group just before time $t_{(j)}$, $n_{2j}$ is the number of individuals at risk in the second group just before time $t_{(j)}$, $d_j$ is the total number of events that occurred at $t_{(j)}$, and $n_j$ is the total number of individuals at risk just before time $t_{(j)}$. Under the null hypothesis, $Q$ follows a chi-square distribution with $k-1$ degrees of freedom, where $k$ is the number of groups being compared.
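To make the two-group log-rank calculation concrete, the following is a minimal sketch in Python using only the standard library. It assumes unit weights ($w_j = 1$) and one degree of freedom; the function name `logrank_test` and the toy data in the usage note are illustrative and are not taken from the study, which used R.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test with unit weights.

    times*:  follow-up times for each patient in the group
    events*: 1 if the event (death) was observed, 0 if censored
    Returns (Q, p_value); Q is referred to a chi-square with 1 df.
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    # Distinct ordered event times t_(1) < ... < t_(r)
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for tj in event_times:
        n1 = sum(1 for t, _, g in data if g == 1 and t >= tj)  # at risk, group 1
        n2 = sum(1 for t, _, g in data if g == 2 and t >= tj)  # at risk, group 2
        n = n1 + n2
        d1 = sum(1 for t, e, g in data if g == 1 and e == 1 and t == tj)
        d = sum(1 for t, e, _ in data if e == 1 and t == tj)
        observed_minus_expected += d1 - n1 * d / n
        if n > 1:  # the variance term is undefined when one subject is at risk
            variance += n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))

    q = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square(1): P(Z^2 > q) = erfc(sqrt(q / 2))
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value
```

For example, `logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])` compares a group whose deaths all occur early with a group whose deaths all occur late. The sketch does not guard against a zero variance, which arises when no event time has both groups at risk.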

Bayesian Method
The Bayesian approach treats the parameters of the model as random variables, requires that prior distributions be specified for them, and treats the data as fixed. The key ingredients of a Bayesian analysis are the likelihood function, which reflects the information about the parameters contained in the data, and the prior distribution, which quantifies what is known about the parameters before observing the data. The prior distribution and the likelihood are combined to form the posterior distribution, which represents total knowledge about the parameters after the data have been observed [13].

Prior Distribution
The prior distribution is a probability distribution that represents the prior information associated with the parameter of interest. It is a key aspect of a Bayesian analysis. In this study the researcher used a noninformative multivariate normal prior distribution for the coefficients, with mean zero and variance 1000, and an inverse-gamma prior distribution with scale parameter a = 0.01 and shape parameter b = 0.01 [14]. A typical joint prior specification can be expressed as the product of a multivariate normal prior (for $\beta \mid \sigma^2$) and an inverse-gamma prior (for $\sigma^2$), that is

$$\pi(\beta, \sigma^2) = \pi(\beta \mid \sigma^2)\, \pi(\sigma^2).$$

Likelihood Function
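A likelihood function gives the probability of observing the sample data given the current parameters. Suppose we observe n independent vectors $(t_i, \delta_i, x_i)$, where $t_i$ is the time to the event (death of an under-five pneumonia patient), $x_i$ is the vector of covariates, and $\delta_i$ is an indicator variable telling us whether $t_i$ is uncensored or censored:

$$\delta_i = \begin{cases} 0, & \text{censored observation} \\ 1, & \text{event (death)} \end{cases}$$

The likelihood function of the set of unknown parameters $\theta$ under right censoring is written as:

$$L(\theta) = \prod_{i=1}^{n} f(t_i \mid x_i)^{\delta_i}\, S(t_i \mid x_i)^{1-\delta_i}$$

The log-likelihood is then:

$$\log L(\theta) = \sum_{i=1}^{n} \left[ \delta_i \log f(t_i \mid x_i) + (1 - \delta_i) \log S(t_i \mid x_i) \right]$$

where $f(t_i \mid x_i)$ and $S(t_i \mid x_i)$ are the density and survival functions [29].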
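As an illustration of how right censoring enters the likelihood, the sketch below evaluates the censored log-likelihood for a Weibull distribution without covariates. The shape/scale parameterization, $S(t) = \exp\{-(t/\lambda)^k\}$, is an assumption made for this example and may differ from the form used in the study; the function names are hypothetical.

```python
import math


def weibull_log_density(t, shape, scale):
    """log f(t) for a Weibull(shape, scale) distribution, t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_log_survival(t, shape, scale):
    """log S(t) = -(t / scale) ** shape."""
    return -((t / scale) ** shape)


def censored_log_likelihood(times, status, shape, scale):
    """Sum of delta * log f(t) + (1 - delta) * log S(t) over patients.

    status: 1 = event (death) observed, 0 = right-censored
    """
    total = 0.0
    for t, delta in zip(times, status):
        if delta == 1:
            total += weibull_log_density(t, shape, scale)   # observed death
        else:
            total += weibull_log_survival(t, shape, scale)  # still alive at t
    return total
```

A patient who died contributes the density at the observed time, while a censored patient contributes only the probability of surviving beyond the last observed time.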

Posterior Distribution
The posterior distribution is obtained, up to a normalizing constant, by multiplying the prior distribution over all parameters $\theta$ by the full likelihood function $L(X \mid \theta)$. All Bayesian inferential conclusions are based on the posterior distribution of the fitted model. Inference is performed by sampling from the posterior distribution once convergence to the posterior distribution has been achieved [15]. The researcher assumed that $\theta$ is a random variable with a prior distribution denoted by $\pi(\theta)$. Inference concerning $\theta$ is then based on the posterior distribution, which is obtained by Bayes' theorem. The posterior distribution of $\theta$ is given by:

$$\pi(\theta \mid X) = \frac{L(X \mid \theta)\, \pi(\theta)}{\int L(X \mid \theta)\, \pi(\theta)\, d\theta} \propto L(X \mid \theta)\, \pi(\theta)$$

Combining the likelihood function with the prior distribution on $(\beta, \sigma^2)$, the joint posterior distribution, from which the full conditional distributions of the unknown parameters follow, can be written as:

$$\pi(\beta, \sigma^2 \mid X) \propto L(X \mid \beta, \sigma^2)\, \pi(\beta \mid \sigma^2)\, \pi(\sigma^2)$$

The posterior distribution for the model specification above does not have a closed-form solution for the parameters. For such models, the MCMC Gibbs sampler is implemented using WinBUGS software. The baseline hazard distributions of the parametric survival models (Weibull, lognormal and log-logistic) in Table 2 are used to fit the Bayesian survival models by introducing a prior for each parameter.
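The way prior and likelihood combine can be sketched numerically. The example below computes an unnormalized log-posterior for a lognormal AFT model, using independent normal priors with mean 0 and variance 1000 on the coefficients and an inverse-gamma prior with shape 0.01 and scale 0.01. Placing the inverse-gamma prior on $\sigma^2$, the lognormal parameterization, and all function names are assumptions made for illustration; this is not the WinBUGS model used in the study.

```python
import math

PRIOR_VARIANCE = 1000.0  # variance of the normal prior on each coefficient
IG_SHAPE = 0.01          # inverse-gamma shape (assumed to apply to sigma^2)
IG_SCALE = 0.01          # inverse-gamma scale


def log_normal_prior(beta):
    """Sum of log N(0, PRIOR_VARIANCE) densities over the coefficients."""
    const = -0.5 * math.log(2.0 * math.pi * PRIOR_VARIANCE)
    return sum(const - b * b / (2.0 * PRIOR_VARIANCE) for b in beta)


def log_inverse_gamma_prior(sigma2):
    """log density of an inverse-gamma(IG_SHAPE, IG_SCALE) at sigma2."""
    return (IG_SHAPE * math.log(IG_SCALE) - math.lgamma(IG_SHAPE)
            - (IG_SHAPE + 1.0) * math.log(sigma2) - IG_SCALE / sigma2)


def lognormal_aft_log_likelihood(times, status, covariates, beta, sigma2):
    """Right-censored log-likelihood for log T = x'beta + sigma * eps."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for t, delta, x in zip(times, status, covariates):
        mu = sum(b * xi for b, xi in zip(beta, x))
        z = (math.log(t) - mu) / sigma
        if delta == 1:   # observed death: log density
            total += -math.log(t * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
        else:            # censored: log survival, S = 1 - Phi(z)
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total


def log_posterior(times, status, covariates, beta, sigma2):
    """Unnormalized log posterior: log-likelihood plus log priors."""
    if sigma2 <= 0.0:
        return -math.inf  # outside the support of the inverse-gamma prior
    return (lognormal_aft_log_likelihood(times, status, covariates, beta, sigma2)
            + log_normal_prior(beta)
            + log_inverse_gamma_prior(sigma2))
```

Because only the unnormalized posterior is available in closed form, sampling methods such as MCMC work with this quantity rather than with the normalized density.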

MCMC Estimation Methods
MCMC iteration performs the integration numerically rather than analytically by sampling from the posterior distribution of interest, even when that posterior has no known algebraic form [16]. The Gibbs sampler is an algorithm that sequentially generates samples from a joint distribution of two or more random variables [17]. The Gibbs sampler algorithm starts by choosing an arbitrary initial value $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots)$; each component is then drawn in turn from its full conditional distribution given the current values of the other components, and the cycle is repeated until convergence.
As a rule of thumb, the iteration should be run until the Monte Carlo error for each parameter of interest is less than about 5% of the sample standard deviation [18].
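The alternating structure of the Gibbs sampler can be sketched on a deliberately simple model: normally distributed data with unknown mean and variance, for which both full conditional distributions are available in closed form. This toy model is an assumption chosen for illustration and is not the AFT model fitted in WinBUGS; the default prior settings mirror the values quoted in this paper (normal prior with mean 0 and variance 1000, inverse-gamma prior with shape and scale 0.01).

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, prior_mean=0.0,
                 prior_var=1000.0, ig_shape=0.01, ig_scale=0.01, seed=1):
    """Gibbs sampler for y_i ~ N(mu, sigma2) with semi-conjugate priors.

    Returns the post-burn-in draws as a list of (mu, sigma2) tuples.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    mu, sigma2 = 0.0, 1.0  # step 1: arbitrary initial values
    draws = []
    for iteration in range(n_iter):
        # Draw mu from its full conditional given the current sigma2
        precision = 1.0 / prior_var + n / sigma2
        cond_mean = (prior_mean / prior_var + n * ybar / sigma2) / precision
        mu = rng.gauss(cond_mean, (1.0 / precision) ** 0.5)

        # Draw sigma2 from its full conditional given the new mu:
        # inverse-gamma(shape + n/2, scale + SS/2), sampled as 1 / gamma
        ss = sum((yi - mu) ** 2 for yi in y)
        rate = ig_scale + ss / 2.0
        sigma2 = 1.0 / rng.gammavariate(ig_shape + n / 2.0, 1.0 / rate)

        if iteration >= burn_in:  # keep draws only after burn-in
            draws.append((mu, sigma2))
    return draws
```

With simulated data such as `y = [random.Random(0).gauss(5.0, 2.0) for _ in range(200)]`, the average of the retained `mu` draws should lie close to the sample mean of `y`, since the prior is nearly flat.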

Model Selection and Comparison
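The fitted Bayesian models are compared using the deviance information criterion (DIC):

$$\mathrm{DIC} = \bar{D} + p_D = D(\hat{\theta}) + 2\,p_D, \qquad p_D = \bar{D} - D(\hat{\theta})$$

where $\bar{D}$ is the posterior mean of the deviance, $p_D$ is the effective number of parameters in the model, and $\hat{\theta}$ is the point estimate of the parameters at which the deviance is evaluated (in the standard definition, the posterior mean).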
DIC is used to compare Bayesian approach survival models. The preferable model is the one with the lowest value of the DIC [16].
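The DIC can be computed directly from posterior draws once a log-likelihood function is available. The sketch below assumes the standard definition, with the deviance evaluated at the posterior mean of the parameters; the function name `dic` and its interface are hypothetical and do not reproduce the WinBUGS DIC tool.

```python
import statistics


def dic(draws, log_likelihood):
    """Deviance information criterion from posterior draws.

    draws:          list of parameter vectors (one list or tuple per draw)
    log_likelihood: function mapping a parameter vector to log L(theta)
    Returns a dict with DIC, pD (effective number of parameters) and Dbar.
    """
    # Deviance D(theta) = -2 log L(theta), averaged over the draws
    deviances = [-2.0 * log_likelihood(theta) for theta in draws]
    d_bar = statistics.fmean(deviances)

    # Deviance evaluated at the posterior mean of each parameter
    theta_bar = [statistics.fmean(column) for column in zip(*draws)]
    d_hat = -2.0 * log_likelihood(theta_bar)

    p_d = d_bar - d_hat
    return {"DIC": d_bar + p_d, "pD": p_d, "Dbar": d_bar}
```

Applying this function to the draws from each candidate baseline distribution and choosing the model with the smallest `DIC` mirrors the comparison rule described above.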

Model Diagnostics
Once a model has been developed, the researcher would like to know how effective the model is in describing the outcome. This is referred to as goodness of fit. The most common checks in a Bayesian analysis are diagnostics for convergence and mixing of the chains. Therefore the researcher used time series plots, autocorrelation plots, Gelman-Rubin statistic plots and kernel density plots to check goodness of fit.
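The idea behind the Gelman-Rubin diagnostic can be sketched as a comparison of between-chain and within-chain variability. The code below implements the classic variance-based potential scale reduction factor; it is an illustrative assumption and may differ in detail from the quantity plotted by WinBUGS. Values close to 1 suggest that the chains have mixed.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])  # draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]

    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)

    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Chains that started from different initial values and still sit in different regions produce a large between-chain variance and therefore a factor well above 1.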

Results
A total of 281 under-five children who were admitted with pneumonia to Tercha General Hospital from September 2016 to August 2017 were included in this study. Of those patients, 126 (44.84%) were female and 155 (55.16%) were male. Rural residents numbered 194 (69.04%) and urban residents 87 (30.96%). By season of diagnosis, 78 (27.76%) of the 281 cases occurred in autumn, 33 (11.74%) in winter, 89 (31.67%) in spring and 81 (28.83%) in summer, which implies that pneumonia admissions to the hospital were higher in spring and summer. According to Table 3, the percentages of under-five pneumonia patients were 49.82%, 22.06%, 11.39%, 10.32% and 6.41% in the age groups 1-11, 12-23, 24-35, 36-47 and 48-59 months, respectively. Patients with comorbidity made up 39.50% and those without comorbidity 60.50%. This implies that pneumonia in under-five children is a severe disease even in the absence of comorbidity. All the descriptive results are presented in Table 3.
In this study the log-rank test was used to compare the survival time of two or more groups of under-five pneumonia patients. According to the log-rank test in Table 3, there was a significant difference in survival time between female and male patients (p = 0.023) at the 5% level of significance, and no significant difference among the age groups. The log-rank test also revealed statistically significant differences in survival time among the categories of residence, comorbidity, SAM, treatment type and patient referral status, as shown in Table 3.

Bayesian Survival Data Analysis
Inference about the parameters was based on the Gibbs sampler algorithm, run for 40,000 iterations in each of three chains. The first 15,000 iterations of each chain were discarded as burn-in to remove the influence of the initial values, leaving 75,000 samples for the full posterior distribution. A noninformative normal prior distribution with mean zero and variance 1000 and an inverse-gamma prior distribution for sigma with scale = 0.01 and shape = 0.01 [14] were used in this study.

Bayesian Approach Accelerated Failure Time Model Comparison
The distribution with the smallest DIC value fits the data best; on this basis the Bayesian Weibull accelerated failure time model was selected [16] as the preferred model for the under-five pneumonia dataset, based on the DIC values presented in Table 4. All parameter estimates in Table 5 have a Monte Carlo error (MC error) below 5% of the posterior standard deviation, which indicates convergence of the parameters. For this reason, and on the basis of the convergence plots, the researcher uses this posterior summary as the final result. The final model results were interpreted using the acceleration factor and the 95% credible interval of the Bayesian accelerated failure time estimates.
With the other factors held fixed, the acceleration factor for male patients was estimated to be $\exp(\hat{\beta}) = 0.882$, with a 95% credible interval for the coefficient of [95% CrI: -0.2673, -0.0345]. The credible interval for the coefficient does not include zero; equivalently, exponentiating its limits gives a credible interval for the acceleration factor of [95% CrI: 0.765, 0.9661], which does not include one. This indicates that male patients had shorter survival time than female patients: the estimated survival time of male patients was 11.8% shorter than that of female patients. The acceleration factor for patients whose residence was urban was estimated to be 1.179, with a coefficient interval of [95% CrI: 0.1025, 0.1712]. This indicates that patients from urban residences had longer survival time than patients from rural residences at the 5% level of significance.
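The conversion from the coefficient scale to the acceleration-factor scale is a simple exponentiation, sketched below with the interval limits reported for male patients. The function names are hypothetical, and the percent-change helper simply expresses an acceleration factor relative to the reference category.

```python
import math


def acceleration_factor_interval(lower, upper):
    """Exponentiate the credible limits of an AFT coefficient."""
    return math.exp(lower), math.exp(upper)


def percent_change_in_survival_time(acceleration_factor):
    """Percent change in survival time relative to the reference group."""
    return (acceleration_factor - 1.0) * 100.0


# Coefficient limits reported for male patients
low, high = acceleration_factor_interval(-0.2673, -0.0345)
print(round(low, 3), round(high, 4))                      # 0.765 0.9661
print(round(percent_change_in_survival_time(0.882), 1))   # -11.8
```

An interval for the coefficient that excludes zero always maps to an interval for the acceleration factor that excludes one, because the exponential function is strictly increasing and $\exp(0) = 1$.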
The acceleration factors for patients diagnosed in the spring and summer seasons were 0.977 and 0.869, with coefficient intervals of [95% CrI: -0.3194, -0.0797] and [95% CrI: -0.4108, -0.1042], respectively. This implies that patients who were diagnosed in the spring and summer seasons had shorter survival time than patients who were diagnosed in the autumn season. The acceleration factor for patients who had another disease (comorbidity) was estimated to be 0.888, with a coefficient interval of [95% CrI: -1.1, -0.5357]. This implies that patients without comorbidity had longer survival time than patients with comorbidity. The acceleration factor for patients who had severe acute malnutrition (SAM) was estimated to be 0.751, with a coefficient interval of [95% CrI: -0.2482, -0.105]. This indicates that patients without SAM had longer survival time than patients with SAM.
The acceleration factor for the patient-to-nurse ratio was estimated to be 1.112, with a coefficient interval of [95% CrI: 0.0865, 0.1552]. This indicates that the patient-to-nurse ratio had a significant effect on the survival time of patients.

Assessment of Convergence
[Figure: convergence diagnostic plots; panel (a) time series plots]
Four different types of plots were used in this study to check the convergence of the parameters, and they indicated that the parameters had converged. To assess the accuracy of the Bayesian survival analysis, the researcher used the Monte Carlo error of each parameter: if the MC error is less than 5% of the posterior standard deviation, the posterior density is estimated accurately. In this study, the MC error for each significant variable was less than 5% of its standard deviation. This indicates that convergence and accuracy of the posterior estimates were attained and that the model is appropriate for estimating posterior statistics.
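The 5% rule can be checked numerically from the retained draws of a parameter. The sketch below estimates the Monte Carlo error with the batch-means method, which is an assumption made for illustration (the exact estimator used by the software is not stated in this paper), and compares it with 5% of the posterior standard deviation. The function names and the default of 20 batches are hypothetical choices.

```python
import math
import statistics


def batch_means_mc_error(draws, n_batches=20):
    """Monte Carlo standard error of the posterior mean via batch means."""
    batch_size = len(draws) // n_batches  # leftover draws are ignored
    batch_means = [
        statistics.fmean(draws[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    # Standard error of the overall mean, estimated from the batch means
    return statistics.stdev(batch_means) / math.sqrt(n_batches)


def passes_five_percent_rule(draws, n_batches=20):
    """True if the MC error is below 5% of the posterior standard deviation."""
    mc_error = batch_means_mc_error(draws, n_batches)
    return mc_error < 0.05 * statistics.stdev(draws)
```

A well-mixed chain of a few thousand draws typically passes this check, whereas a short chain that is still drifting fails it because its batch means differ strongly from one another.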

Discussion
In this study Bayesian parametric survival models were used to identify the risk factors for the survival time of under-five pneumonia patients in Tercha General Hospital. The parametric survival models used were based on the Weibull, lognormal and log-logistic distributions. The data came from a single center, with 281 under-five pneumonia patients. The Bayesian parametric survival analysis proceeds by MCMC iteration until each parameter converges. The initial values and burn-in length were set without a formal criterion, since there is no established method for determining an appropriate number of iterations and burn-in size. Rather, the researcher used a trial-and-error process in which the ultimate goal was to obtain stable parameter estimates that minimize simulation error. This is consistent with a study conducted in the USA [14]. The MCMC simulation helped to increase the accuracy of the results by narrowing the credible interval and minimizing the standard error, but did not change the direction of the results; this agrees with previous studies [19,20]. In this study 40,000 samples per chain were generated using the Gibbs sampler algorithm. Of those, 15,000 were used for burn-in and the remaining 25,000 samples from each of three chains were used for posterior inference in WinBUGS, and the convergence of the parameters was checked. After convergence, the three baseline distributions were compared using the DIC value; the distribution with the smaller DIC value is preferable, as stated by Spiegelhalter D. [16]. The Weibull model was selected as the best-fitting model for the under-five pneumonia dataset because it had the smallest DIC value.
The results of the Bayesian Weibull accelerated failure time model revealed that female patients had longer survival time than male patients, which agrees with studies conducted in Pakistan [21] and at JUSH [22]. Patients whose residence was urban had longer survival time than patients from rural residences, in line with the study conducted at JUSH [22]. Patients who were admitted in the summer and spring seasons had shorter survival time and a higher risk of dying from pneumonia than those admitted in the autumn and winter seasons; this result agrees with studies conducted in Hawassa city [23] and in a southern Israel hospital [24]. It is also in line with the CHERG report, which states that altitude, annual rainfall, the nature of the seasons and average monthly temperatures are factors in under-five pneumonia [25].
Patients who had comorbidity or other diseases along with pneumonia had shorter survival time than patients without comorbidity, and under-five pneumonia patients with severe acute malnutrition (SAM) had shorter survival time than patients without SAM. These findings agree with studies conducted in Pakistan [26], in Malawi [27], at JUSH [22] and in a southern Israel hospital [24], and with the Child Health Epidemiology Reference Group (CHERG) report [25]. Patients admitted when the patient-to-nurse ratio (PNR) was high had a higher risk of dying from pneumonia than patients admitted when the PNR was low; this is supported by a study conducted in Europe, in which a higher level of nurse staffing was associated with a decrease in the risk of in-hospital mortality [28], and by a study in Hawassa city [23].

Conclusion
The risk factors for the survival time of under-five pneumonia patients in the hospital were identified using Bayesian parametric survival models. The Weibull, lognormal and log-logistic baseline distributions were considered. Of those, the Weibull distribution was selected as the best model based on the DIC value and was used as the final model for the under-five pneumonia dataset. The results of the Bayesian Weibull AFT model showed that sex, residence, season of diagnosis, comorbidity, severe acute malnutrition (SAM), patient referral status and patient-to-nurse ratio (PNR) were statistically significant predictors of the survival time of under-five pneumonia patients. Based on these results, all concerned bodies should work on awareness by promoting appropriate and effective home treatment and early diagnosis in the community, to reduce under-five mortality due to pneumonia. Researchers interested in investigating the same area are recommended to introduce frailty modeling with a Bayesian approach to account for the correlation that arises from clustering.