Use of Space Time Model for Forecasting Mortality due to Malaria: A Case of Ifakara and Rufiji Health and Demographic Surveillance System Sites

Malaria is a leading cause of morbidity and mortality in developing countries especially in rural areas where local resources are limited. Accurate disease forecasts can provide information to public and clinical health services to design targeted interventions for malaria control that make effective use of limited resources. Using verbal autopsy data, space-time model was used to forecast mortality due to malaria. The study used longitudinal data which were collected from Rufiji and Ifakara Health Demographic Surveillance System (HDSS) sites for the period of 1999 to 2011 and 2002 to 2012 respectively to assess models. The models included environmental factors and mosquito net ownership as predictor variables for mortality due to malaria. Deviance information criteria (DIC), logarithm score and root mean square error (RMSE) were used to assess the goodness of fit and forecasting accuracy of the models. The results indicate that the model included spatial and temporal random effect terms had small deviance information criteria, logarithm score and root mean square error. This model was the best model for forecasting and prediction of mortality due to malaria in both HDSS sites. In addition, mosquito net ownership and rainfall were significantly associated with mortality due to malaria. The model with spatial and temporal random effect terms is useful tool to provide reasonably reliable forecasts for mortality due to malaria. This might help to design appropriate strategies for targeting malaria control. On the other hand, including spatially and temporal varying random terms in the model is necessary and good strategy for modelling mortality due to malaria.


Introduction
Resources allocation and malaria control interventions need to be spread in space and time to enhance effective malaria control [1]. Having a forecasting system in place may help to have a more focused approach with positive impact on the resource allocation for malaria control over space and time. However, there are fewer studies which have investigated on forecasting models for mortality due to malaria [2,3]. The studies indicate the need for improved epidemic early warning by incorporating important predictors such as meteorological factors.
Malaria incidence forecasting models have been developed in many endemic countries [5][6][7][8][9]. These models mostly used time series models and other use climate related predictors, such as rainfall, temperature and normalized difference vegetation index. However, these models did not include other important predictors for mortality related to malaria such as mosquito nets, availability of anti-malarial and drug resistance. In addition, these models did not include spatial effect which could capture heterogeneity in malaria transmission. The accuracy of forecasts may be weakened due to transmission-reducing predictors were not considered in the models [7]. Health and demographic surveillance system (HDSS) data, however, are becoming increasingly available with the routinely collecting data and adopting the verbal autopsy approach [8]. Verbal autopsy (VA) approach is an indirect method of determining causes of death from information on symptoms, signs and circumstances preceding death, obtained from the deceased's caretakers. VA has been widely used as a method for determining causes of death in the country where vital registration system is lacking and majority of deaths occur outside the formal health care system [9]. The VA is widely used to provide more information on deaths whose causes are unknown. Several studies have evaluated the validity [10,11] and accuracy of VA in determining the causes of death and malaria specific cause of death [12,13]. These studies concluded that VA is the reliable source of data that can be used to estimate specific causes mortality in all age group for public health [13].
This study assess space time models for forecasting mortality due to malaria for resource allocation and optimization of the effects of malaria control interventions over space and time.

Study Design and Study Area
This was longitudinal study design using secondary data collected from Rufiji and Ifakara Rural Health and Demographic Surveillance System (HDSS) sites [14,15] in Tanzania. Rufiji HDSS is located in Rufiji District, Coast Region while Ifakara HDSS is located in part of Kilombero and Ulanga District in Morogoro Region. These two HDSS sites were selected because they are among HDSS sites which continuously collect large amount of longitudinal data in well defined geographical areas over time in Tanzania. These HDSS sites are characterized by malaria transmission endemic after the long rain season. The description of the study area is detailed elsewhere [14,15].

Data Source
This study utilized secondary data which were prospectively collected from1999 to 2011 and 2002 to 2012 for Rufiji and Ifakara HDSS sites respectively. The two HDSS sites consistently records pregnancy outcomes, deaths and migrations by visiting households once every four months since 1997 in Ifakara HDSS and 1998 in Rufiji HDSS. In additional, the HDSS sites collect information on social economic status including household ownership of mosquito net annually. The study period covers national scale up of ITNs that occurred in the country [16]. The mosquito nets in this study included both untreated and treated nets (ITNs). Yearly numbers of malaria deaths were extracted from the Rufiji and Ifakara HDSS database covering a period of 1999 to 2011 and 2002 to 2012 respectively. The details on data collection, management and credibility in HDSS have been describe elsewhere [14,15].
The Verbal autopsy procedures and collect data used in this study are described in detailed elsewhere [8,17]. In the HDSS, deaths were captured during rounds of data updates. Then HDSS field interviewers visited the deceased's home after a grieving period to administer a verbal autopsy questionnaire. A face to face interview was administered to relatives or caregivers who were closely associated with the deceased during the period leading to his or her death. The questionnaire was used to explore the identity of the deceased and established the sequence of events leading to death, including symptoms and signs of the illness before death. Verbal autopsy was carried out since 1998 in Rufiji HDSS and 2002 in Ifakara HDSS. The verbal autopsy forms are independently reviewed by two physicians according to a list of causes of death based on the 10 th revision of the International Classification of Diseases (ICD-10). Causes of death (main, immediate, and/or contributing) are coded to be consistent with the ICD-10 [18]. Malaria deaths are coded as direct cause of death when malaria is the underlying cause of death or indirect cause of death when malaria is one of the several diseases leading to death but the death is attributed by a different cause) [19].
Climate data were obtained from Tanzania Meteorological Authority (TMA) head office in Dar es Salaam. These data includes monthly mean rainfall, and maximum and minimum temperature. TMA provides meteorological services, weather forecasts, climate services, and warnings including daily forecast information for each region in Tanzania. The climate data are collected through different gauges located in different stations in each district. In Ifakara HDSS, villages located in Kilombero district used climate data from Kilombero Agricultural Training and Research Institute (KATRIN) weather station. Other villages in Ifakara HDSS used Mahenge weather station data for villages located in Ulanga District. In Rufiji District, Utete and Kibiti are the two weather stations which records climate data. Kibiti weather station is located in Rufiji HDSS site which provided data used in this study.
Remote sensing data were extracted from MODerateresolution Imaging Spectroradiometer (MODIS) on board NASA's terra satellite. The remote sensing data includes Normalized Difference Vegetation Index (NDVI) and processed from MODIS (MOD13A3) using monthly composite images at a 1 km x 1 km resolution. Vegetation indices are used for global monitoring of vegetation conditions and are used in products displaying land cover and land cover changes. ERDAS Imagine software version 10.1 was used for processing the satellite images, and ArcGIS version 10 was used for spatial analyses. Mean NDVI were calculated for each village in each year within the study area to link with mortality due to malaria and ownership of mosquito net for each village.

Data Processing and Analysis
Aggregated yearly numbers of malaria deaths were extracted from the HDSS database by each village for the period of 1999 to 2011 and 2002 to 2012 in Rufiji and Ifakara HDSS respectively. The data was aggregated yearly to reduce zero count in each village for appropriate assumption used in Poisson model. Time at risk (personyears) contributed by each person was calculated from 1 st January until exit or 31 st December for each year. Exit from the study was due to migration (outside the HDSS area), death or end of the study. In a case where a person migrated to a different household location within the study area, time at risk was computed separately for new location and added to the total person time at risk. The outcome of interest was a total yearly death due to malaria for specific age groups. Age was categorized into two groups i.e. under five and five and above.
The outcome variable was aggregated at village level to capture heterogeneity for mortality in the study area as reported in previous studies [20,21]. Direct estimates of mortality due to malaria rates were calculated in order to highlight spatial-temporal trends of mortality related malaria in the study areas. The mortality due to malaria rates were calculated by dividing the number of malaria deaths by the person-years of observation and were expressed per 1,000 person-years (py) for each village. Direct estimates of the mortality due to malaria rates were obtained as: Where is the mortality due to malaria rate at j time in i village,Y is the number of malaria deaths at time j in village i. representsthe number of person-years in i village at time j and represent the exposure to risk.

Space Time Model
The model was intended to capture both temporal and spatial features of mortality due to malaria trends or along with spatial-temporal interactions. The developed space time model will be useful to explain the patterns of malaria mortality rates in terms of environmental factors and ownership of mosquito nets. Previous studies have proposed hierarchical Bayesian spatial-temporal modelling along with Poisson model [22][23][24].
Spatial-temporal modelling approach within a hierarchical Bayesian framework was adopted. The spatial aspect of the modelling approach allows for taking into account similarities between values observed at locations across space. Similarly, the temporal aspect of the modelling approach allows for inference concerning temporal trends of changes in malaria mortality/deaths. Finally, the proposed model allows for spatial and temporal trend analysis of the data as well as considering the effect of predictor variables [25].
The Bayesian models requires the data, process and parameters to be modeled [26]. Using data extracted from HDSS database and other source, the covariate information included in this study for village i at year j were mosquito net ownership as exposure variable and Other variables were rainfall, temperature and NDVI. For this study i the number of village in the study, I = 33 for Rufiji and 25 for Ifakara HDSS and j for years, J = 13 for Rufiji HDSS and J = 11 for Ifakara HDSS.
The general specification of the hierarchical space-time model can be represented by the following hierarchical structure, which simultaneously model the spatial effect, temporal effect and space time interaction as described in [27]: Where denotes the mortality due to malaria rate. Random terms and are spatial terms, is temporal random effect and is space-time random effects. Covariates information for village i at year j are denoted by with being the corresponding regression coefficients of k variables varying over time.
The random effects specifications for the spatial term includes unstructured noise that follows a normal distribution and structure term which modelled with the set of villages that adjacent to other villages by weight of neighbouring village. Weight is considered to be 1 when two villages share the same boundary and 0 otherwise. The prior distribution of the random effect term is a conditional autoregressive prior used to model the spatial dependence.
The temporal effect can be specified in a number of ways, which will determine the prior distribution that was assumed for the term . In this study time effect as first order autoregression AR (1) given by the partial autocorrelation function. The distribution of the interaction term is characterized by a precision matrix obtained as the Kronecker product of the precisions of structure and unstructured noise terms [28].
From the general equation in (2), five sub-models were fitted considering different ways of entering with each of the random terms and covariates in the model. Five models considered with different combination of random effects and covariates as follows: Where represents a model with covariates only; ! ) is a model with covariates, a spatial random term and a temporal term; # ) is a model with covariates, spatial component terms and a temporal term; $ is a model with covariates, spatial component terms, and two temporal terms (first and second order); % is a model with covariates, spatial, temporal, and an interaction term.
The analysis in this study used the Integrated Nested Laplace Approximation (INLA) R-package in R software available at http://www.r-inla.org to fit five models defined in equation 3 to 7.INLA is a recent computational approach used to perform approximate Bayesian inference based on an efficient combination of Laplace approximations and numerical integration. The INLA is designed for latent Gaussian models; a very wide and flexible class of models ranging from generalized linear mixed to spatial and spatiotemporal models. The package does not sample from the posterior distribution like Markov Chain Monte Carlo (MCMC) [29]. This approach provides approximates the posterior with a closed form expression and more precise estimates in seconds or minutes. Also, the INLA compute model comparison criteria and various predictive measures so that models can be compared without convergence and mixing problems. The method is best suited to Bayesian hierarchical models for which there are large number of unknown parameters following a Gaussian Markov random field and a small number of hyperparameters, with a specific form of prior covariance on the parameters. This analysis used data for 1999 to 2010 in Rufiji HDSS and 2002 to 2011 in Ifakara HDSS. Data for 2011 and 2012 in Rufiji and Ifakara respectively were used for validation. The best fitted model was used to predict malaria mortality for 2011 in Rufiji HDSS and 2012 for Ifakara HDSS for validation. Furthermore, the best fitted model was used to forecast malaria mortality for five years 2012 to 2016 in Rufiji HDSS and 2013 to 2017 for Ifakara HDSS. The forecasted mortality due to malaria was multiplied by personyears to represent the number of malaria deaths.

Model Selection
The model goodness of fit and forecasting accuracy was assessed using commonly known methods for hierarchical Bayesian models called the deviance information criterion (DIC) [30], root mean square error (RMSE) and logarithm score were used to select the best model. Based on the DIC criterion and RMSE, the model with relatively lower DIC values indicate a better fit to the data compared to models with higher DIC values. Similarly the model with small value of RMSE indicates a better model for prediction and forecasting. The logarithm score also used to assess the predictive quality of the model [27]. A smaller value of the logarithmic score indicates a better prediction quality of the model.

Ethical Statement
This study was approved by Institutional Review Board of Ifakara Health Institute (IHI/IRB/No: 31-2014). Also, Medical Research Coordinating Committee (MRCC) of the National Institute for Medical Research (NIMR) approved for the establishment of Rufiji and Ifakara Health and Demographic Surveillance System (HDSS) sites. For each household visit, informed verbal consent was obtained from head of the family or eligible adult among the family members aged 18 years and above who was able to provide with more information at household.    Figure 3 and 4 presents the annual mortality due to malaria for selected villages for under-five children age in Rufiji and Ifakara HDSS for the study period respectively. The results indicate that there was heterogeneity in mortality due to malaria across villages. Some villages reported low mortality due to malaria throughout the entire study period while others had high mortality due to malaria rate. Furthermore, the results show that all villages experienced an overall decreasing trend in annual mortality due to malaria in Rufiji HDSS site.    Mortality due to malaria rate in 33 villages and 25 villages for Rufiji and Ifakara HDSS respectively were incorporated in the model fitting. Furthermore, annual mean rainfall, average temperature, mean NDVI and annual percentage of ownership of mosquito net in each village were included in the model fitting as covariates. Table 2 shows the results on the variance inflation factors values and indicates less that 10 between covariates (rainfall=1.56, temperature =1.84 and NDVI=1.38) in Rufiji HDSS. Also, the variance inflation factors were less that 10 (rainfall=1.08, temperature=1.08 and NDVI=1.00) in Ifakara HDSS (Table 2). This indicates that there was no evidence for collinearity between covariates in the regression model. Table 3 presents a summary forthe deviance information criteria, logarithm score, mean square error and root mean square error for five models. Based on DIC values, Model ! with covariates, spatial and temporal terms gave a better fit than other models in both HDSS for all age and under five children. Model $ had the worst performance with highest DIC value in Rufiji HDSS followed by model . In Ifakara HDSS, model ! performed better than other models while model was having highest DIC value followed by model $ . Based on these results, it is clear that including spatially and temporal random terms in the model is necessary.

Prediction and Forecasting
Furthermore, Table 3 indicates that model ! had small value for logarithm score compared to other models for Rufiji and Ifakara HDSS respectively. The results also show that model ! had small value for RMSE compared to other models for both Rufiji and Ifakara HDSS based on one year validation. According to the DIC, logarithm score and RMSE values, model ! was considered as the best model for goodness of fit, prediction and forecasting. Table 4 shows coefficients for the covariates incorporated in the selected best model and the results show that mosquito net ownership and rainfall were statistically significant associated with mortality due to malaria in both HDSS sites.
In the other hand, mosquito nets ownership was negatively associated with mortality due to malaria while rainfall was positively associated with mortality due to malaria in the study areas. Temperature and NDVI were not statistically associated with mortality due to malaria. The results also showed that there was a precision for temporal and spatial effect terms (Table 4). Furthermore, the results show a significant difference in temporal and spatial effect for mortality due to malaria for all age in Rufiji and Ifakara HDSS respectively. There was no spatial difference with respect to mortality due to malaria for under-five children in Rufiji HDSS as compared to Ifakara HDSS.  Table 4 and 5 presents forecasted mortality due to malaria for selected villages using best model ! for Rufiji and Ifakara HDSS respectively. Table 4 shows the selected villages with forecasted mortality due to malaria using identified best model ! . The forecasted mortality due to malaria for Nyamwimbe village in Rufiji HDSS was higher in 2012 to 2016 (2.62 per 1000 person-years and 3.01 per 1000 person-years respectively) as compared to Kibiti A village (2.54 per 1000 person-year and 2.39 per 1000 personyears) respectively. The lowest mortality due to malaria forecasted was 1.54 and 1.42 per 1000 person years in Kinyanya village for2012 and 2016 respectively (Table 4).  Table 6). The results also show that Idete village in Kilombero has lowest forecasted mortality due to malaria throughout the forecasted years. In Ulanga district, Kidungalo and Igumbiro were forecasted with highest mortality due to malaria compared to other villages in 2013. Igumbiro village appeared to have high mortality due to malaria in all forecasted years. On the other hand Milola village was having with lowest malaria mortality forecasted in 2013 and Idunda in 2014. Iragua village appeared to have lowest forecasted mortality due to malaria in 2015, 2016 and 2017.

Discussion
In this study, different models were compared for forecasting mortality due to malaria from historical malaria deaths data in the study areas. The findings indicate that model with spatial and temporal random terms was the best model for goodness of fit, prediction and forecasting malaria mortality. The finding is consistent with previous studies [31][32][33][34] which found potential use of spatial and temporal terms in the model. Some studies which used Bayesian hierarchical framework for diseases mapping and ecological studies of health environment association [22,33,35,36] found that spatial and temporal terms in the model are necessary if data are collected over space and time. The possible explanation for this could be due to the complex dependence patterns over space and over time of the occurrence of malaria deaths, and the inherent large stochastic variability due to rare events. In other hand, it may be due to inclusion of spatial and temporal random terms in the model to account other unmeasured confounding variables.
Estimating separately time trends in each area will not be efficient because it will be difficult to establish a baseline pattern separately for each area. In Bayesian approach, the study used the power of hierarchical modelling to borrow information over space and time in order to estimate typical predictable patterns for each area. Some previous studies also have indicated that the statistically advanced techniques using time series models may produce very good fit to the data but in post-sample forecast, they would not be robust enough to handle a possible change in behaviour of the series [37,38].
This study included covariates information in the model for forecasting mortality due to malaria. The covariates information may have contributed significantly to variations in mortality due to malaria. In this study, the models included covariates information such as rainfall, temperature and NDVI which are important factors for malaria transmission [42], and mosquito net which is reduces malaria transmission. The inclusion of these covariates in the model may be not weaken the accuracy of forecasts because malaria transmission-reducing interventions are considered in the models [7]. Rainfall and ownership of mosquito net were independently associated with mortality due to malaria in the study areas. The result is in agreement with previous studies [39,40] which found an association between malaria epidemics with changes in meteorological factors The possible explanation for this may be due to the fact that rainfall provides breeding sites for mosquitoes and increases the humidity which enhances their survival and therefore, increases the spread of the disease [41].
In the other hand, better forecasts were obtained for shorter term forecasts indicating that there was some contribution of the inherent pattern in the historical mortality data that may be considered in multivariate models. The best fitted model performed quite well in hind casting for a single previously removed one to two year or forecasting for one to two years in advance using INLA. However, as more years are removed, the space time model using INLA tends to predict closer and closer to the mean of the remaining data [33]. Forecasting five to ten years depends on the pattern of historical data. For example, when five or more years are removed in observed data for validation, the model only has 8 years for Rufiji and 6 years for Ifakara HDSS of data left to build from and the prediction became very close to the mean of the malaria mortality for remaining years than when you have long time period.
The Abuja Declaration noted the importance of accurate disease prediction for targeting and evaluating control measures [42]. For forecasting models to be useful for malaria control interventions and public health decisionmaking, models must produce accurate forecasts. The study examined various predictors and models across two different settings in rural Tanzania and consistently found that in both sites, models with spatial and temporal random term effect were necessary to achieve the highest possible forecasting accuracy. This is the first study, to the best of our knowledge, which incorporate mosquito nets predictor in combination with environmental predictors for forecasting malaria mortality in space and time.
Furthermore, the study focused on forecasting in space and over time. Fitting such complicated models could potentially be quite difficult and we used INLA, a relatively new method for Bayesian analysis. TheINLA method is a new approach for Bayesian inference which was introduced by Rue and Martino [43,44]. INLA provides a faster computation compared to Markov Chain Monte Carlo (MCMC) method. The method has been applied in different disease modelling and forecasting in which it performed quite well in forecasting for one to two years in advance [33,45].

Strengths and limitations
This study usedlarge datasets from Health and Demographic Surveillance System sites which are continuously registered vital demographic events in a geographical defined area. This has provided better estimates with high precision to allow generalization of the findings to a large population. The findings from Health and Demographic Surveillance Systems data provide information to policy makers and program managers which can be translated into policy and practice for planning.
This study used VA data collected in HDSS sites; Verbal autopsy has great potential for countries like Tanzania where more people die outside of health facility care where no records are available. Also, VA has been shown to provide the best results to obtain the specific causes of death in most of SSA [46] and widely used [12].
This study has some limitations that need to be considered in interpreting the findings. First, ownership of mosquito net was considered as a proxy for use of bed nets in the house, as information about use of bed nets in households and use were not collected during the study. Secondly, there was a possibility of misclassification bias with regard to the causes of death when the sensitivity and specificity of the VA technique is relatively low for assessing cause of death. However, if this happened in this study it may lead to underestimate or overestimation of the reported malaria deaths. Hence, the direction of bias may be non-differentia misclassification. Thirdly, there are other possible factors associated with malaria deaths that were not assessed in this study because are not captured in HDSS database, such as anti-malaria availability. The effect of these variables in mortality due to malaria estimates remains not quantified.

Conclusion
This study used Bayesian framework approach for developing space time model for forecasting malaria mortality. Although the model is reasonably reliable, especially with regard to the magnitude of forecasting one to five years, the model needs further evaluation to determine its accuracy. The INLA method performed quite well in hind casting for a single previously removed year or forecasting for one to two years in advance.
The model with spatial and temporal random effect terms predicted mortality due to malaria with combination of environmental factors and mosquito net which improved the forecasting accuracy. This model may be a useful tool for producing reasonably reliable forecasts of the malaria mortality for targeted intervention strategies. Moreover, including spatially and temporal varying random terms in the model may be necessary and good strategy for modelling malaria counts.