Analysis of TB Case Counts in Southwest Ethiopia Using Bayesian Hierarchical Approach of the Latent Gaussian Model

Introduction: Tuberculosis is the long-lasting infectious disease caused by bacteria called Mycobacterium tuberculosis. Globally, in 2016 alone, approximately 10.4 million new cases have occurred. Africa has shared around 25% of the incidence and specifically in Ethiopia around 82 thousand was caught by Tuberculosis. Methods: The study has been conducted in, south west Ethiopia, Jimma zone of entire districts and the data is basically secondary which is obtained from Jimma zone health office. The counts of Tuberculosis case counts have been analyzed with factors like gender, HIV coinfection, Population density and age of patients. The Integrated Nested Laplace Approximation (INLA) method of Bayesian approach which is fast, deterministic and promising alternative to MCMC method was used to determine posterior marginal of the parameters of interest. Results: The Latent Gaussian Model (LGM) of Poisson distributional assumption of Tuberculosis cases that includes both fixed and random effects with penalized complexity priors appeared to be the best model to fit the data based on the Watanabe Akaike Information Criteria and other supportive criteria. Using Kullback-Leibler Divergence criteria, the under-used simplified Laplace approximation indicated that posterior marginal was well approximated by normal distribution. The predictive value of the best model is not far deviated from the actual data based on the Conditional Predictive Ordinate and the probability integral transform. Conclusions: All the variables were significant under this model and the posterior marginal was well approximated by standard Gaussian. The PIT indicated that predictive distribution was less affected by outliers and the model was reasonably well.


Background
Tuberculosis (TB) is a chronic infectious disease caused by a bacillus belonging to a group of bacteria grouped in the Mycobacterium tuberculosis complex and remains an important public health problem of the 21st century according to WHO [64][65][66].
Globally, in 2016 alone, approximately 10.4 million new cases (range from 8.8 million to 12.2 million) which are equivalent to 140 cases per 100000 have occurred worldwide. According to the reports of [65], the most estimated number of TB cases is in the WHO South-East Asia Region (45%), the WHO African Region (25%) and the WHO Western Pacific Region (17%). Similarly, smaller proportions of cases occurred in the Eastern Mediterranean Region (7%), the WHO European Region (3%) and the WHO Region of the Americas (3%) and 1.8 million deaths of tuberculosis were reported [2,65].
According to the [65,66] report, Africa is not among the regions registered to have a declined in TB mortality rates. In 2016, the total notified TB cases in this region was 1303483 with 84% of pulmonary cases which intake an estimated MDR/RR-TB cases of 40000 (ranging from 36000 to 44000) among notified pulmonary TB cases. The estimated TB Endale Alemayehu et al.: Analysis of TB Case Counts in Southwest Ethiopia Using Bayesian Hierarchical Approach of the Latent Gaussian Model treatment coverage in the WHO Africa region is only 49% [65]. Different reports and studies certified that Ethiopia has only limited resources to spend on combating tuberculosis and multidrug-resistant tuberculosis. It ranked the ninth among the world most TB burden country and is one of 27 MDR TB high burden countries. In another way the number of deaths due to TB cases without HIV co-infection, was estimated to be 26 thousand where the death rate is 25/100000 and whereas 4 (2.7-5.4) thousands of HIV coinfected died [13,14,65].

Statements of the Problem
Different study reported from various parts of the country showed that the prevalence of smear-positive cases ranged from 33 to 213.4/100,000 people in Ethiopia. This burden of the diseases was gradually increased till 2016 [2,13] According to different studies, the Bayesian approach have empowered over the frequentist based on analysis using TB cases and simulation studies [17,29,47]. But, the Bayesian methods in those studies were based on MCMC sampling technique which has the burdensome of time-consuming, convergence problem and Monte Carlo error. This implies that analysis based on the MCMC technique could essentially affect the conclusions of researches.
In this research, we apply the latent Gaussian model with INLA methods under the framework Bayesian hierarchical paradigm. Based on this method, this study have attempted to answer the basic research questions on: whether there is variation in the distribution of TB cases among the districts of Jimma zone, whether changes in prior assignment really affect the candidate model to be selected.

General Objective
The general objective of this study is to model the counts for TB cases in Jimma zone, southwest Ethiopia, using the Bayesian hierarchical approach of the latent Gaussian model with INLA method.

Specific Objectives
The research includes the following specific objectives:

The Significance of the Study
The results of this study may help the organization as well as individuals who work in this area to get a clue on to what extent TB distribution is serious across the districts of Jimma zone. It is an additional about the trend of TB prevalence to the previous studies. Furthermore, this research can assist for those who are interested to work in this area for the future. The result of this study will also be expected to be helpful for policy makers in TB concern agendas and strategies.

Study Area
Jimma is one of the zones in the Oromia regional state of Ethiopia and is named for the former kingdom of Jimma, which was absorbed into the former province of Kaffa in 1932. The capital town of the zone is Jimma which is the largest city in south-west Ethiopia. Recently the zone includes around 22 districts. Based on the 2007 census conducted by the CSA, this zone has a total population of 2,486,155 and has an area of 15,568.58 square kilometers. It has a population density of 159.69.

Source of Data
The data for this study was mainly based on the secondary data that has been obtained from Jimma zone and Jimma district health office except for data related to population density. All the cases registered on the data base of the office from September 2016 to August 2017 have been considered.

Response Variable
The dependent variable of this study was the count of TB cases (all forms) in each district of Jimma zone recorded under the health office from September 2016 to August 2017.

Explanatory Variables
According to the different reviewed study discussed in literature parts, the explanatory variables considered for this study which has been registered in the health office were gender, HIV co-infection, population density and age of patients.

Methods of Data Analysis
In any research design, an appropriate data analysis plays a crucial place for the relevance of data under consideration. Thus, to fit the data well, the researcher has been passed through different stages of data analysis for which the techniques were presented under sub-sections here below.

Latent Gaussian Model
The structured latent Gaussian regression models amenable to INLA-based inference can be defined in terms of three layers: hyper-parameters, latent Gaussian field, likelihood model. The univariate likelihood captures the marginal distribution of data and is often chosen as an exponential family similar to the framework of generalized linear models.
For those exponential family models, the link function is used to have the linear relationship with the response variable. Hyper-parameters can appear in the likelihood as dispersion parameters like the variance of the Gaussian distributions [43]. Formally, the Latent Gaussian Model (LGM) can be written as:

Integrated Nested Laplace Approximation
Integrated nested Laplace approximation (INLA) is a recent approach to Bayesian statistical inference for latent Gaussian Markov random field models introduced by [50]. It provides a fast, deterministic alternative to Markov chain Monte Carlo (MCMC) which, at the moment, is the standard tool for inference in such models of Bayesian inference [49] The main goal of the approximation techniques used in the analysis of the latent Gaussian model is to compute posterior marginal for each component of x of expression (2). Generally, the marginal posterior distributions for each element of the parameter vector can be formulated as: And the marginal posterior distribution for each element of the hyper-parameter vector: Now, our intention was to compute 7( / ) from which all the relevant 7( < / ) can be obtained and to determine 7( / , ) which is needed to compute the parameter marginal posteriors7( / ) [6,50].

Prior Distributions of Parameters
To do with Bayesian inference, the choice of prior distribution is a vital issue as it represents the information that is available for the parameters of interest.
The default priors are the inbuilt priors of R-INLA packages of INLA function in which the researchers need not to further assign the other priors. It was widely used by different researchers [5,6,36,50]. According to the study by [50] (who was the developer of R-INLA packages and INLA function), these default priors were considered as weak informative priors that were checked under different conditions before officially encoded under the R-INLA packages. If the observation is assumed to follow the Poisson distribution, for intercept, INLA assign zero for both mean and precision; i.e. Normal (0,0) and all the fixed parameters are assigned mean zero and precision 0.001; meaning that they have Normal (0,0.001) priors. Whereas the random effect (district for our case) is also Gaussian with mean zero and precision parameter. Finally, the precision parameter in the random effects is also assigned to other distribution of log gamma which is log-gamma (1,0.00001) [6,50].
The other and very applicable recent priors are the one called Penalized Complexity prior (PC) priors. It was developed by [52] and was an informative prior. PC priors are general enough to be used in realistically complex statistical models and are straightforward enough to be used by general practitioners. Using only weak information, PC priors represent a unified prior specification with a clear meaning and interpretation. With this type of priors, researchers were agreed with its advantages of controlling the heterogeneity in random effect as it defined with the results of the standard deviation of residuals in the fixed effect [52].

Posterior Distribution
A great advantage of working in a Bayesian framework is the availability of the entire posterior probability distribution for the parameter (s) of interest. Obviously, it is always possible and useful to summarize it through some suitable synthetic indicators. Summary statistic typically used is the posterior mean, which, for a hypothetical continuous parameter of interest , is: Where B all the possibilities that the variable can assume and the integral becomes summation if the value of is assumed to be discrete. Moreover, it is also possible to determine the indicators of which divide the probability in a very convenient way.

Results
Under this section of data analysis, the researcher tried to answer the basic research questions and attained to address the objectives by modeling the data with the appropriate model fit. In order to further go for the model, we have started with the simplest frequency table which has the power to intend the appropriate candidate model.
Thus, using the concept of INLA of the Bayesian framework, the results of the models with different fixed and random parameters considering the assignment of priors have been discussed stepwise here below. The results obtained from the different model of this study were compared by using standard statistical tools of model selection and comparison so that to filter out the relative best model in approximating the posterior marginal well.

Descriptive Data Analysis
Without considering the effect of sex and ages, Nono bench district accounted minimum (2%) TB cases, whereas Sekachokorsa recorded to have the highest (12%). The numbers of male cases in each district were greater than those of females, except for districts of Agaro, Gomma, Limmu Seka, and Kersa. The two extreme age classes (0-14 and >54) have lower TB cases which found to be 10% and 14% of the total cases; whereas the two middle class ages (15-34 and 35-54) were relatively more affected group; which is 30% and 46%.
The trend seen in the figure 1 below (in the figure legend) is also indicated the distribution of TB cases across all districts of Jimma zone.

Model Based Data Analysis
The summary data in table 1 (in the table legend) below was the outcome of Poisson distributional assumption of Tb cases under LGM which include both fixed and random effects with penalized complexity priors. The intercept exp (1.0703)=2.916. When the sex=female, age=0-14 and covariates HIV co-infection and population density were held constant, the average incidence rates of TB in Jimma zone was found to be 2.916. This is because the intercept was interpreted under the reference categories of categorical covariates and assumed when the effect of continues variables were zero. The incidence rate of TB with the male was 1.114 times greater than those of female. This is to mean that around 11.4% more diseases with males. Each category of ages was also significant for the occurrences TB cases. Compared to those aged 0-14, people with age 15-34 developed TB incidence rates by 3.12 times more. In the same fashion, the incidence rates of people aged 35-54 was 5.15 times greater than those of aged 0-14 and those in the age interval of >54 were 2.01 times more likely to have TB incidence rates than those ranged in the age 0-14.
For a one unit increase in the HIV co-infected people, the TB incidence rate was increased by 1.044 (4.4%). This result has an implication that the number of HIV co-infected in this study seems not to such signs in determining the relative risk of TB cases. On the other hand, the incidence rate of TB was increased by 1.0034 (0.34%) as population density was increased by one unit. Even though the population density was found to be significant, under this model, the coefficient value indicated that this variable was not such potential in determining the occurrence of TB cases for this study.
The Kullback-Leibler divergence (KLD) describes the difference between the standard Gaussian and the under-used Simplified Laplace Approximation (SLA). Therefore, with this model, since the values of KLD irrespective of all covariates were zero, the researcher can generalize that the marginal posterior densities were well approximated by the Normal distribution. Thus, the under-used SLA which is the default approximation method in INLA function, in determining the densities of posterior marginal was defined as having good (small) error rate and no need to use the more computationally intensive technique full Laplace approximation.
The summarized result of table 3 below (in the table legend) is the posterior marginal distribution of districts variation of tuberculosis with penalized complexity priors. The interpretation of posterior marginal for the precision of the random effect district is more general and bit difficult to interpret because it is on the scale of 1/variance. On the other hand, it is not possible to take the reciprocal of the (squarerooted) summaries to obtain information about the posterior distribution of the standard deviation, because the transformation is not linear.
Thus, the researcher preferred to compute posterior marginal with the scale of standard deviation. The average standard deviation of the variation of TB cases across districts was 0.294 with (0.207, 0.416) credible interval. The diseases with the other were also varied in between. Generally, the appendix indicated that there is variation in TB cases across districts of Jimma zone.

Model Comparison
In order to select the model which was relatively best fit the data, the researcher has intended to compare the model in two phases. The candidate models were: Model 1: LGM with covariates of fixed effects only and default priors.
Model 2: LGM including covariates of both fixed and random effects with default priors.

Model 3: LGM including covariates of both fixed and random effects with PC priors.
Thus, at the first stage of this model comparison, model 1 and model 2 have been compared in order to see whether the random effect has a significant effect or not. Then, to get valued on the robustness of the priors, model 2 and model 3 have been compared which in fact is to see the actual changes on the model as the priors on the parameters were changed. All the models were compared with the standard model comparison techniques, WAIC and other supportive criteria. Table 2 is the summary results of WAIC, the effective number of parameters and number of equivalent replicates for the aforementioned three candidate models with the different number of parameters and/or under different priors.
At the first stage of model comparison, model 2 which is the model with covariates of both fixed and random effects under the assumption of default priors, have less WAIC (1105.25) as compared to model 1, WAIC (1300.79) which in fact includes only fixed effect with same priors. Thus, by the operational definition that the smaller the WAIC, the better the model fit, the researcher can prioritize model 2 as the relative better fit of the data under study. It also supports that including districts as the random effect has advantages in order to handle variation in incidence rates of TB across districts.
Once the model with both fixed and random effects under default priors was selected, the researcher was able to compare the same model under different priors which help to more ascertain the robustness of the priors. We did this because it helps to avoid the problem of model fit due to bad priors and also used for further investigation as for whether the recent informative PC priors was efficient than the R-INLA inbuilt default priors or not.
Thus, based on the results of Table 2 below (in the table  legend), the WAIC for model 3 which was 1104.27 was relatively smaller than that of model 2 which was 1105.25. However, different literature said that the models of the same parameters were considered to be significantly different, if their WAIC were at least 3-5 differences [39,53]. As the rule of model difference based on the WAIC's value difference is rule of thumb, other technical methods of model comparison have been used to see the clear difference between the models. Hence, the concept of the effective number of parameters and number of equivalent replicates were applicable here. Since the expected number of effective parameters is basically the number of independent parameters included in the model, the smaller is the better the model. This is because, at any stages of model comparison, the intention of the researcher was to come with the best model of the smaller parameter. Besides, the number of equivalent replicates is the result of sample size per effective Endale Alemayehu et al.: Analysis of TB Case Counts in Southwest Ethiopia Using Bayesian Hierarchical Approach of the Latent Gaussian Model number of parameters in the model and thus the smaller is an indication of poor fit. But, the difference in both effective number of parameters and number of equivalent replicates for model 2 and 3 seems not such significant. However, since there were no clear-cut rules that judge to decide on the size of difference on such comparison techniques, the researcher has been forced to compare the models without valued the size of the difference in those model comparison methods.
Therefore, since the effective number of parameters of model 3 (24.74) was slightly smaller than that of model 2 (25.08), we can say that the candidate model with PC priors was relatively better in fitting the data well. Moreover, the number of equivalent replicates of model 3 was very slightly greater than that of model 2 and this also still has some information to decide that model with PC prior was comparatively better.
On the other regards, we extended our evidence from the perspective of standard errors by considering the results of standard deviation for the precision of the district (random effect) based on tables 1 and 2 (in the table legend). Recalling the direct proportion between standard error and standard deviation of the same sample size, we can say that the greater the standard deviation, the larger the standard error is. Hence, the standard deviation for the precision of districts in the model with default priors was 4.489 and that of the model with PC priors was 3.911, and considering the truth that the smaller the standard error, the efficient the model was, we still can support model with PC priors was better in fitting the data. Additionally, since the credible interval for the precision of districts of the model with PC priors (4.526, 19.73) was narrower than that of the model with default priors (5.771, 23.24), this may also assist the researcher to conclude that model with PC priors was relatively better in fitting the data.
Generally, considering the collective evidence detailed above and since PC priors are informative priors, we finally selected the LGM of Poisson distributional assumption of TB cases including covariates of both fixed and random effects with PC priors as the best model.

Discussion
The descriptive results of the study indicated that the number of males with TB cases (53%) was greater than the number of females with the same cases. These results were in line with WHO reports of 2017, which also presented as the number of males with TB cases was greater than females worldwide. Similarly, the number of TB counties of middleaged people was greater than the two extreme categories of the ages and this also matches with the truth existed throughout the world [65] and different studies from Ethiopia were also persisted with this results [21,22]. Besides, the descriptive summary clearly showed that the counts of the cases were varied from one district to the other and empowers the researcher to bear in account whether the number of population of each district may also have effect for the variability of the cases across the districts.
In order to assimilate the variation in the population size across districts with the corresponding TB cases, the offset variable was included in the correction factor. The offset in a sense means that the expected counts of TB cases in each district and especially used to correct the number of events (TB cases). The offset is the special type of variable that was widely applicable when the observation was assumed to have Poisson distribution with the known slope of 1 that helps to adjust the problem due to variation in population size from one district to the other.
Some of the researchers have been considered this adjustment under different dataset [6,31]. But, many studies that included geographical variation as random effect had missed these potential terms offset which used to weighted (corrected) the effect of miss many numbers of events and population size [27]. Thus, our study has been filling the gap with the miss used of the offset variable.
At the first stage of the model fit, the LGM with Poisson distributional assumption of the observations has been fitted with covariates of fixed effects only under R-INLA inbuilt default priors. The variables age, HIV-co-infection, and population density were found to be significant. In order to check the effectiveness of simplified Laplace approximation method that applied in this model, the researcher considered the value of KLD in which the minimum the KLD is the less difference between the standard Gaussian and the Simplified Laplace Approximation. In our case, since the value of KLD corresponding to all the variables was zero, the SLA was reasonably well in approximating the value which expected from standard Gaussian [26].
The LGM of Poisson distributional assumption of the observation which includes both fixed and random effects with default priors revealed that all the covariates have significant effects on the incidence rates of TB. The efficiency and relevance of the model were supported by the work of different researchers which in fact applied for the different dataset [5,36,50]. Moreover, the significance of the variables in this study was persistent with the finding of different researchers [51,55]. Since KLD result was, found to be zero, the under used SLA has well approximated the standard Gaussian and no need to go for further intensive approximation methods like full Laplace approximation [26].
The other model called LGM of Poisson distributional assumption of observation with both the fixed and random effects under PC priors was applied. The same to model with default priors, all the covariates were found to be significant and the KLD values were also zero; meaning that SLA approximately had the same results with standard Gaussian. The developer of PC priors [52] has been checked the effectiveness of the priors with simulated data and other few European researchers including Professor Havard Rue who is famous and developer of R-INLA program [50] had also exercised with the same simulated data. Thus, since the prior was developed in a very recent time and is informative, the authors were intended to apply for this actual data.
In order to make the model comparison, the researcher preferred the WAIC model comparison technique because of theoretical reasoning and inclusive advantages of the method. For clarity purpose, the three candidate models were compared in two phases; at the first stage, model 1 and 2, and at the second stage model 2 and 3 were compared so that to selected the best model which fitted the data well.
The results of WAIC indicated that model 2 which was the LGM of Poisson distributional assumption with both fixed and random effects under default priors was better than model 1 which was the same to model 2 except it includes only fixed covariates. Then, to check further for the effects of the priors, model 2 was compared with model 3 which was similar to model 2 except priors' assignments. Thus, since the WAIC of model 3 was smaller than that of model 2, the LGM of Poisson distributional assumption with both fixed and random effects considering PC priors was selected as the relative best model to fit the incidence rates of TB cases in Jimma zone. The advantages of comparison of models with different priors were certified under previous studies [52,57].
The random effect in the study was found to be significant and varied across the districts. This is an indicator that including districts as random effect here is advantageous so that to identify the district (s) with the highest TB cases. With this study, therefore, Sekachokorsa district was found to be the most severed districts. Previous studies also are also consistent with this result that TB cases were varied across the geographical regions [27,31].
The CPO and PIT were used for model checking. Before further go for graphical model checking, the researcher intended to check whether the usual numerical problem occurred during the computation of CPO. Thus, since the sum of the number of failure in CPO was zero, no failure was detected and meaning that no numerical problem has occurred. The histogram and scatter plot of PIT indicated that the predictive residual based values were almost uniformly distributed with very few deviated outlier and we can get reasonable that the predictive distribution matches the actual data. Besides, the same graphs of CPO also indicated that most of the observed predictive values have the same distributional shape with the tolerance of surprising observation. Therefore, based on the plots of both CPO and PIT, the predictive values seem not significantly affected by surprising observation and extreme outliers [37,50].

Conclusions
For better attainment of the model fit, three different candidate LGMs namely: Poisson distributional assumption of TB cases of fixed effects only with default priors, Poisson distributional assumption of TB cases of both fixed and random effects with default priors and Poisson distributional assumption of TB cases with both fixed and random effect under PC priors have been fitted. Thus, based upon the WAIC and other supportive model comparison technique, the LGM of Poisson distributional assumption of TB cases which includes both fixed and random effects with PC priors has been selected as the best model that fits the data well. All the covariates under the best model are found to be significant. Having the computational advantages of SLA and its better approximation in the data, the researcher preferred not to use the more computationally intensive full Laplace approximation.
Based on the findings of this study, the researcher recommended the following points for researchers, Jimma health office and individuals interested in any sub-work of this study.
All the covariates in this study are significant factors of TB cases. Thus, Jimma zone health office and other health sectors should have to focus on controlling TB cases with special focus to districts that have a high severity of the disease.
The posterior marginal of this study is totally determined with the methods of INLA which actually is very fast and has a less computational burden. Thus, the researchers should strongly recommend so that to compare the methods.
Interested researchers are recommended to extend this work by including fully spatial covariates in order to map and identify the hot-spot areas. This is very flexible and widely applicable.

Declaration
During conducting the study, the investigators have included the following declarations.

Ethics Approval and Consent to Participate
The Research Ethics Review Board of Jimma University has provided an ethical clearance for the study. The data was taken from Jimma zone health office, and the formal cooperation letter was written from college of natural science research office to the Jimma zone health office where data was obtained. The study conducted without individual informed consent; because it relied on retrospective data.