A More Robust Random Effects Model for Disease Mapping

Disease mapping studies have found wide applications within geographical epidemiology and public health and are typically analysed within a Bayesian hierarchical model formulation. The most popular disease mapping model is the Besag-York-Molli ́e model. A distinguishing feature of this model is the use of two sets of random effects: one spatially structured to model spatial autocorrelation and the other spatially unstructured to describe residual unstructured heterogeneity. Very often the spatially unstructured random effect is assumed to be normally distributed. Under practical situations, this normality assumption is found to be over restrictive. In this study, we investigate a more robust spatially unstructured random effect distribution by considering the Inverse Gaussian (IG) distribution in the disease mapping problem. The distribution has the normal distribution as special case. The inferences under this model are carried out within a bayesian hierarchical model formulation implemented in WinBUGS. The IG distribution is introduced in WinBUGS using zero tricks. The usefulness of the proposed model is investigated with a simulation study and applied in real data; mapping HIV in Kenya. In this work we showed that the IG distribution can produce better results when the normality assumption is violated due to the skewness of the data. For the case of data in which the random effects are truly normal, the IG distribution adjusts to a normal distribution as dictated by the data itself. On the other hand, the spatially structured random effect is normally modelled using the intrinsic conditional autoregressive (iCAR) prior. This prior is improper and has the undesirable large scale property of leading to a negative pairwise correlation for regions located further apart. In addition, the BYM model presents some identifiability problems of the spatial and non-spatial effects. In this work, we consider Leroux CAR (named lCAR hereafter) prior, a less widely used prior in disease mapping, as the prior distribution for the spatially structured random effects.


Introduction
Disease mapping is the study of the geographical or spatial distribution of health outcomes. Over the past few decades and with the advent of computational methods and statistical methodology, and availability of spatially-referenced data and software tools, disease mapping has increased in popularity in epidemiological research [1,2]. Particularly for modeling rare diseases, it is critical to consider spatial variabilities in order to reliably conduct inference or prediction about the process under study. In disease mapping, usually the object of analysis is to provide (estimate) the true relative risk of a disease of interest across a geographical study area (map). Disease mapping is useful for several purposes such as health services resource allocation, disease atlas construction and in formulation of hypotheses about disease aetiology. Several statistical reviews on disease mapping have been done [3,4,5,6,7].
Most of the work in spatial disease mapping considers the Besag-York-Molli´e model (BYM) proposed by [8] which includes two spatial effects: one assuming a Gaussian exchangeable prior to model unstructured heterogeneity and another one assuming an iCAR prior for the spatially structured variability. Assumption of normality on the uncorrelated random effect in models is common. However, under practical situations, this normality assumption can be incorrect because some random effects can be skewed violating this general normality assumption, see [9,10]. In this study, we investigate a more robust spatially unstructured random effect distribution by considering the Inverse Gaussian (IG) distribution in the disease mapping problem. The IG distribution is a member of the exponential family and it offers a convenient modeling for positive right skewed data. It has two parameters, the location parameter, and the scale parameter . The normal distribution is obtained as a limiting case for the IG distribution. On the other hand, the iCAR prior that is usually used for modeling spatially structured random effect is improper and has the undesirable large scale property of leading to a negative pairwise correlation for regions located further apart [11,12]. In addition, the spatial and non-spatial effects in the BYM convolution model are not identifiable from the data [13,14]. In this work, we also consider the Leroux CAR prior proposed by [15] as the prior distribution for the spatially structured random effects. This prior has been shown to outperform the iCAR prior [16].
This paper is organized as follows. In section 2, a review of the BYM and Leroux CAR models is given. Section 3 focuses on the IG distribution and discusses its limiting distribution, while a simulation to study the effect of misspecifying the random effects is conducted in Sections 4. In section 5, the models discussed are used to analyse the HIV data from 2012 Kenya Aids Indicator Survey (KAIS). In section 6, we present discussions and finally conclusion is given in section 7.

BYM Model
Consider a study region subdivided into n contiguous areas. Let and denote, respectively, the observed and expected counts for a disease in the th area ( = 1, . . . , ). Expected counts for each area can be obtained by applying a standard table of group-specific sex and age rates to each area-specific background population, subdivided by age and sex. The Besag-York-Molli´e (BYM) model has the following set of assumptions: ~Poisson( ) log( ) = + + (1) in which = describes the effect of area-level covariates , whereas vectors = ( , . . . , ) and = ( , . . . , ) represents spatially unstructured and structured random effects respectively. The implied prior on = + is termed a 'convolution prior' since it is the sum of two independent components. Specifically, the following priors are assumed: where :~ denotes the set of all unordered pairs of neighbours, i.e., regions sharing a common border, and hence the sum over all such sets can be written using a precision matrix Q. Here is a zero mean white noise Gaussian process with precision " # ; i.e ∼ <(0, " # > ?) and is a first order intrinsic Gaussian Markov random field [17] with precision " 4 ; i.e ∼ <(0, " 4 > 9 > ) where 9 > denotes the generalized inverse of 9. The resulting covariance matrix of is The full conditional distributions of given all the remaining components > = ( , . . . , > , C , . . . , ) can be expressed as follows: This conditional distribution for is called the intrinsic conditional autoregressive (iCAR) prior distribution.

Leroux CAR Model
In the BYM model, the structured and unstructured components cannot be seen independently from each other, and are thus not identifiable [13,14]. That is, each data point is represented by two random effects but only their sum + is identifiable. An additional challenge, which makes the choice of hyperpriors more difficult, is that and do not represent variability on the same level. While " # > can be interpreted as the marginal variance of the unstructured random effects, " 4 > controls the variability of the structured component , conditional on the effects in its neighbouring areas [7,18]. [15] proposed an alternative model formulation to make the compromise between unstructured and structured variation more explicit. Here, is assumed to follow a normal distribution with mean zero and covariance matrix where H ∈ I0,1J denotes a mixing parameter. The model reduces to a model with independent (exchangeable) random effects if H = 0 and to the iCAR model when H = 1, see [16,14,11]. The univariate full conditional distribution of is given by: , I( >L)C F LJ. 5 3 (7) The basic idea of these models is that, conditional upon its neighbours, is independent of all 7 s at nonneighbouring areas. A neighbouring structure needs to be specified. Many structures have been proposed in the literature. In this study, we follow the most popular approach: areas are considered to be neighbours if they share a boundary, and information about neighbouring is summarised in the × matrix N, whose entries O 7 = 1 if areas and : share a boundary and 0 otherwise (O = 1 completes the specification).
The posterior distribution can be sampled using McMC algorithms such as the Gibbs or Metropolis-Hastings samplers. In this study, we use a Gibbs sampler as conditional distributions for the parameters were available in that formulation.

The Inverse Gaussian Distribution
Several authors, [9,7,8,19], have suggested that it is possible to replace the normality assumption of the spatially unstructured random effect with other choices such as the Laplace distribution or the Student's P distribution. For instance, [9] used generalized Gaussian distribution. In this work we explore the use of the IG distribution as a candidate for the unstructured random effects.
Definition: A random variable Y is said to have an IG distribution if its probability density function is given by; The mean and the variance of this parametrized IG distribution is given by ( = and Z[\( = ] ⁄ . In GLMs, the parametrization , _ 0 = 1 ⁄ is normally used. The normal (Gaussian) distribution is obtained as a limiting case of IG distribution when → ∞ (or _ 0 → 0). See [20] for further discussions on statistical properties of this distribution.

Simulation
In this section, we carry out a simulation study to determine the effect of wrongly specifying the distribution of the random effect in a BYM model. Two scenarios were considered in the simulation. In the first simulation, the dataset is generated through a random effect which assumes IG distribution as follows: Assuming that there are 50 geographical regions and b is the number of disease counts observed in region and is the corresponding expected counts in that region. Without loss of generality, we further assume that no covariates are available for use.
Step 2: Set = 30 for all the regions.
Step 3: Calculate the relative risk = log( Step 4: Calculate = × Step 5: Generate the observed counts as b~Poisson(50, gh[ = In the second scenario, the procedure above is repeated but changing the random effect generation mechanism in step 1, as ∼ <(0.5,0.125 . We fitted two Bayesian hierarchical models for the data set. The models were specified based on different assumptions on the random effects as follows: ~Poisson( log( = log( + The model with a small MSE provides the best fit. In the second scenario, the procedure above is repeated but changing the random effect generation mechanism in step 1, as ~<(0.5,0.125 . From the simulation results, the inverse gaussian random effect is seen to perform well even in cases where the random effect strictly follows a normal distribution. In Table 1, the IG distribution produces lower mean squared error values as compared to the normal distribution in both cases. When the random effects are generated using IG distribution, the loss in efficiency incurred for using normal random effects is 4.7%. When the random effects are generated using normal distribution, the IGD still has a lower MSE compared to the normal counterpart. The loss in efficiency for this case is low at 2.1%.

Application to HIV Data
In this section we apply the model to HIV data collected by the Ministry of Health, Kenya. The data was extracted from the 2012 Kenya Aids Indicator Survey (KAIS), conducted by the Government of Kenya. The main objective of survey was to collect high quality data on the prevalence of HIV and sexually transmitted infections (STI) among adults, and to assess knowledge of HIV and STI in the populations. The survey collected a representative sample of households selected from the eight provinces in the country. In this study, we focus only on HIV cases among adults, that is, men and women in the age of 15-64 years. Two questionnaires were used in the survey. The first one is a household questionnaire which collected information about the household head and the characteristics of the dwelling place. The second one, the individual questionnaire, collected information from men and women aged 15-64 years, about their demographic characteristics, and their knowledge on HIV and STI. All women and men aged 15-64 years in selected households who were either usual residents or visitors present the night before the survey were eligible to participate in the individual interview and blood draw, provided they gave informed consent.
The following eight models were fitted: where and denote, respectively, the observed and expected cases of HIV in the th county ( = 1, . . . ,47). Model estimation was carried out using a Bayesian approach. All parameters in the models were assigned prior distributions. In this analysis, a non-informative normal prior was assigned to the fixed effect coefficient r. The shape parameter was given a gamma prior distribution, and the variance parameters were assigned inverse gamma distributions. The models were implemented using WinBUGS version 1.4 [21,22). For each model, 6,000 Markov chain Monte Carlo (McMC) iterations were ran, with the initial 4,000 discarded to cater for the burnin. The 2,000 iterations left were used for assessing convergence of the McMC and parameter estimation. We monitor McMC convergence using trace plots, see [23]. For model comparison, we adopt the deviance information criterion (DIC) proposed by [24]. DIC is defined as d?t = d + zd, where d is a measure of current model fit and zd is a penalty for model complexity. The best fitting model is one with the smallest DIC value. The overall loss across the data was assessed by the use of the Mean Squared Predictive Error (MSPE), which is an average of the item-wise squared error loss, lm{ = ∑ ∑ 6R − R 7 |} 8 0 (c × g) 7 with R 7 |} being the predictive data item at iteration :, g being the number of observations and c is the sampler sample size. The best model for prediction is the one with the lowest MSPE value. The results for this analysis are given in Table 2 below. From table 2 above, it can be seen that models whose unstructured random effects follow IG distribution have quite small DIC values in comparison to the models with normally distributed unstructured random effects. Similarly models whose structured random effects are modelled with lCAR prior distribution have smaller DIC values as compared to models with the iCAR distributed structured random effects. This confirms that the IG and lCAR prior models produce better results than the popular normal and iCAR prior models. In particular, the IG convolution model (model 8) provides the smallest DIC value. Thus, Model 8 is the best model in terms of goodness-of-fit measures. It also has the lowest MSPE value as compared to the other models indicating that it has a good predictive behaviour. Figure 1 shows the spatial distribution of HIV in Kenya in 2016 based on the best fitting model (model 8). This is a map of relative risk and its corresponding credible interval.

Discussion
Correlated data are usually modelled using generalized linear mixed effects model through the inclusion of subjectspecific, random effects. Very often such random effects are assumed to be normally distributed. The normality assumption for the random effects has been both criticised and supported by several authors [25,26,27].
A similar situation arises in disease mapping context. The most popular model is the BYM model. This model has two sets of random effects: one which is spatially unstructured and the other which is spatially structured. The spatially unstructured random effect is usually assumed to have a Gaussian exchangeable prior distribution. However, this normality assumption can be incorrect and too restrictive because some random effects can be skewed violating this general normality assumption, see [9,10]. The spatially structured component is usually modelled by an iCAR prior distribution. However, this prior has been shown to produce negative correlations for regions located further apart [11,12]. In addition, the variance components in the BYM convolution model are not identifiable from the data; see [13].
This study has revealed that more robust and flexible distributions for the random effects can be used to improve the model fit in disease mapping models. In particular, we have considered IG distribution for the spatially unstructured random effects. This distribution contains the normal as a special case. For the spatially structured random effects, we propose Leroux CAR prior [15] which is a less widely used prior in disease mapping.

Conclusion
In this paper, we have shown that models whose unstructured random effects follow IG distribution perform better than those with normally distributed unstructured random effects. Similarly models whose structured random effects are modelled with the Leroux CAR prior distribution perform better than those models with the iCAR distributed structured random effects.
We adopt a bayesian modeling approach and assign prior distributions to all model parameters. This approach offers several potential advantages over classical (e.g. maximum likelihood) estimation procedures. First, bayesian inference allows us to express uncertainty about model parameters through prior distributions. Second, by incorporating recent developments in McMC methods [28], including Gibbs sampling, bayesian models provide a flexible way to handle complex non-linear regressions such as ours. This was implemented easily within WinBUGS. Since the IG distribution is not a standard distribution in the WinBUGS software, it was introduced in the software using zero tricks. We assessed McMC convergence of all models parameters by checking trace plots and autocorrelation plots of the McMC output. The models were compared using the DIC as suggested by [24]. In addition, the overall loss across the data was assessed by the use of the Mean Squared Predictive Error (MSPE).
The models were compared using a simulation study and a real data set. In the simulation study, it was noted that the effect of misspecification of the random effects when the normal distribution is used in place of the IG distribution was high as compared to using the IG distribution in place of the normal distribution. IG distribution is often used as alternative to the normal distribution because of the similarities between the inference methods for these distributions. In fact the normal distribution is a special case of the IG distribution. When the random effect distribution fails to adhere to the normality assumption due to skewness, the IG distribution plays a big role in capturing this, something that the normal distribution cannot.
In the real data sets comparison, it can be seen that models whose unstructured random effects follow IG distribution generally perform better than models with normally distributed unstructured random effects. Similarly, models whose spatially structured random effects are modelled with Leroux CAR prior distribution perform better than those modelled with the iCAR prior. The IG convolution model (model 8) is the best fitting model. This model was used to produce the county relative risk maps of HIV in Kenya for the year 2016. Disease maps play a key role in descriptive spatial epidemiology. Maps are useful for several purposes such as identification of areas with suspected elevations in risk, formulation of hypotheses about disease aetiology, and assessing needs for health care resource allocation.