A Prediction Model for the Animal Plague in Spermophilus dauricus Focus in China
Xiaolei Zhou^{1}, Boyu Zhang^{2}, Xianbin Cong^{1,*}, Xiaoheng Yao^{1}, Cheng Ju^{1}, Zhonglai Li^{2}, Cheng Xu^{1}, Tianyi Duan^{1}, Guijun Zhang^{1}, Lei Chen^{1}, Zhencai Liu^{1}
^{1}Chinese Base for Control of Plague and Brucellosis, Chinese Center for Disease Control and Prevention, Baicheng, China
^{2}Laboratory of Mathematics and Complex Systems, Ministry of Education, School of Mathematical Sciences, Beijing Normal University, Beijing, China
To cite this article:
Xiaolei Zhou, Boyu Zhang, Xianbin Cong, Xiaoheng Yao, Cheng Ju, Zhonglai Li, Cheng Xu, Tianyi Duan, Guijun Zhang, Lei Chen, Zhencai Liu. A Prediction Model for the Animal Plague in Spermophilus dauricus Focus in China. Science Journal of Public Health. Vol. 3, No. 5, 2015, pp. 612-617. doi: 10.11648/j.sjph.20150305.13
Abstract: Plague is a fatal infectious disease that causes serious harm to humans. Its occurrence threatens not only human life but also economic development. Although the incidence of plague in China shows a downward trend, the risk of animal and human plague persists. By analyzing data from the Spermophilus dauricus focus in the Inner Mongolia Autonomous Region from 1981 to 2012, we established a statistical model that combines the best subset regression method and the exponential smoothing method to predict animal plague epidemics. Based on the data from 1981 to 2011, the model predicted no risk of an animal plague epidemic in 2012. This result is consistent with the report from the Inner Mongolia Autonomous Region that the plague bacillus Yersinia pestis was not detected in the S. dauricus focus in 2012. Our model can also be extended to predict plague epidemics in other foci. The potential and limitations of the model are discussed.
Keywords: Spermophilus dauricus Focus, Exponential Smoothing Method, Best Subset Regression Method, Risk Classification
1. Introduction
Plague is a natural focal disease that occurs primarily in rodents. It spreads fast, has a high mortality rate, and can cause human epidemics. The main sources of infection are rodents and, in the case of pneumonic plague, humans; the main vectors are fleas [1]. Several plague epidemics have occurred in China; they have been perennial and have involved a variety of foci covering large areas [2,3]. According to the geographical landscape, hosts, insect vectors, and ecological characteristics of the plague bacillus Yersinia pestis, there are 12 types of natural plague foci in China: the Spermophilus dauricus focus, Meriones unguiculatus focus, Marmota himalayana focus, Marmota caudata focus, Marmota baibacina-Spermophilus undulatus focus, Spermophilus alaschanicus focus, Microtus brandti focus, Marmota sibirica focus, Apodemus chevrieri-Eothenomys miletus focus, Rattus flavipectus focus, Microtus fuscus focus, and Rhombomys opimus focus [4,5]. In the 305 years before the founding of the People's Republic of China, plague epidemics occurred about 179 times in 20 provinces (autonomous regions) covering 549 counties (cities, banners); they infected 2,598,794 people and killed 2,399,400. From 1950 to 1954, 6,868 cases of human plague were recorded. Since 1955, human plague epidemics have been under control [6]. However, from 1990 to 2002, the number of human cases increased because animal plague re-emerged in the R. flavipectus foci and spread to humans. After 2003, the incidence of human plague in China declined along with the R. flavipectus plague epidemic.
Because of the high infectivity and fatality of plague, a great deal of research in China has been devoted to analyzing the main factors that affect plague epidemics and the trends in host populations. Most of these investigations focused on animal plague. For instance, applying the grey system method to data collected from 1981 to 1992, Li and Baiyin [7] predicted that the density of M. sibirica in the Inner Mongolia Autonomous Region would decrease. Based on data from 1981 to 1993 on the Citellus alaschanicus plague focus and using multiple stepwise regression, Qin and Li [8] showed that the density of C. alaschanicus and the nest flea index were the main factors affecting the plague epidemic. Later, Li and Zhang [9] indicated that a combination of the grey system method and the runs test can be used to predict epidemic years in YiKeZhaoMeng (data from 1965 to 1991) and Otog QianQi (data from 1967 to 1990). They further analyzed data from the M. brandti plague focus from 1980 to 1996 with the runs test and found epidemic cycles [10]. Using astronomical, meteorological, and monitored data from the northern desert steppe region of the Inner Mongolia Autonomous Region from 1982 to 1993, Mi et al. [11] calculated the epidemic intensity with multiple regression and optimal regression models. In the following years, multiple regression, fuzzy clustering analysis, discriminant analysis, and grey system methods were applied to identify the main influencing factors of the S. dauricus plague focus, based on part of the monitored and meteorological data from 1957 to 2000. All these studies showed that plague epidemics are mainly affected by rodent density, body flea index, and burrow-dwelling flea index [12,13,14,15]. Recently, prediction models based on geographic information systems have attracted considerable attention [16], and they have been applied to the S. dauricus focus, regions along the Qinghai-Tibet Railway, and Guangxi Province [17,18,19].
In contrast to animal plague, few studies have considered human plague in China, since human infections have been very rare in recent years. Zhang et al. [20] linked human plague intensity from 1871 to 2003 to climate at the province scale. Using spatial and temporal records of human plague in China from 1850 to 1964, Xu et al. [21] revealed that human plague intensity in northern or southern China is positively related to the wetness index. They further showed that wet climate and transportation routes accelerated the spread of human plague [22].
The purpose of our study is to predict the occurrence of animal plague epidemics in the S. dauricus focus. The earliest written record of the S. dauricus focus is Wu Lien-teh’s report of 1917-1920 on the "Tongliao bean rat plague (S. dauricus plague)" [23]. Subsequently, in 1937, Y. pestis was isolated from Rattus norvegicus, Mus musculus, Mesocricetus auratus, and Apodemus agrarius; this was the first bacteriological evidence confirming the existence of the focus. In 1957, researchers from the former Epidemiology Institute of the Chinese Academy of Medical Sciences, Jilin Province, and the Inner Mongolia Autonomous Region showed that S. dauricus was the main host of the focus and R. norvegicus the secondary host [24]. Although the incidence of plague has declined significantly in recent years, the risk of animal and human plague epidemics persists. Between 1981 and 2012, a total of 87 strains of Y. pestis were detected: 64 strains from animals (1985, 1986, 1987, 1988, 1990, 1994, 1996 and 1998) and 23 strains from vectors (1985, 1987, 1988, 1989, 1990, 1996 and 1997) (see Section 2.1.2, Data Source). Notably, more than 95% of the strains (83 of 87) were detected in the Inner Mongolia Autonomous Region, and the other 4 strains were from Jilin Province (1985). No Y. pestis was detected in Liaoning Province or Heilongjiang Province during the last 30 years.
In this study, we establish a prediction model based on the best subset regression (BRS) method and the exponential smoothing method. The BRS method is a variable selection method that attempts to find the "best" subset of a set of predictor variables in regression [25]. It has been applied to analyze the main influencing factors of plague [26,27]. The exponential smoothing method is a time series model developed from the moving average method [28]. It assumes that the development of a time series is mainly affected by recent data: the predicted value (called the exponential smoothing value) is a weighted average of the latest observation and the exponential smoothing value of the previous period. Compared with other time series models (e.g., ARMA, ARIMA), the exponential smoothing method requires only the latest observation and the latest predicted value, and therefore greatly reduces the amount of data that must be stored. The exponential smoothing method has been used effectively to predict changes in ecological variables, such as the population density and biomass of rodent communities [29]. We therefore expect it to have good potential for predicting animal plague, as we show later.
2. Materials and Methods
2.1. Materials
2.1.1. The S. dauricus Focus
The S. dauricus focus is located between latitudes 41°31'N and 46°45'N and longitudes 113°E and 126°41'E [30]. Specifically, the focus lies south of the Lalin River in Jilin Province, west of the Yitong and Kacha Rivers, east of the Mongolian desert plateau, and north of the Yinshan Mountains (extending south to the Nuluer Tiger Mountain region of Liaoning Province); it includes the Liaohe River basin and the steppe zone from the southern foothills of the Greater Khingan Mountains to the south of the Otindag sandy land [30]. The S. dauricus focus covers 53 counties (cities, banners) in the Inner Mongolia Autonomous Region, Jilin Province, Liaoning Province, and Heilongjiang Province, with a total area of 161,918 km^{2} (see Figure 1, yellow and blue regions). Since Y. pestis was rarely detected in Jilin, Liaoning, and Heilongjiang Provinces, we focus only on the part of the S. dauricus focus in the Inner Mongolia Autonomous Region (see Figure 1, blue region).
2.1.2. Data Source
The meteorological, geographical, and monitored data of the S. dauricus focus in the Inner Mongolia Autonomous Region from 1981 to 2012 are used for animal plague prediction. Specifically, the monitored data include 7 factors: density of S. dauricus, S. dauricus flea infection rate, S. dauricus flea index, nest flea infection rate, nest flea index, burrow-dwelling flea infection rate, and burrow-dwelling flea index; the meteorological data include 7 factors: temperature, relative humidity, precipitation, air pressure, relative temperature, evaporation, and sunshine; and the geographical data include 3 factors: living environment, vegetation, and soil. The data were collected from the annual summaries of the Inner Mongolia Autonomous Region and the mid-term evaluation report of the "Eleventh Five-Year Plan".
2.2. Methods
2.2.1. A Basic Framework of the Prediction Model
Our prediction model consists of four phases. First, the main factors affecting the epidemic are selected using the BRS method. Second, a risk classification model is established to evaluate the epidemic risk based on the fitting results of the BRS method. Third, the exponential smoothing method is applied to predict the main factors in the next period. Finally, the epidemic risk in the next period is calculated from the predicted factor values and the best regression equations.
Table 1. Best regression equations with different numbers of factors.
Number of factors | Best regression equation | R^{2} |
1 | y = -0.097 + 0.152X_{3} | 0.196 |
2 | y = -0.428 + 0.134X_{3} + 0.977X_{7} | 0.298 |
3 | y = 0.309 + 0.384X_{1} - 0.022X_{2} + 0.248X_{3} | 0.390 |
4 | y = 1.032 + 0.556X_{1} - 0.026X_{2} + 0.266X_{3} - 0.012X_{4} | 0.473 |
5 | y = 0.948 + 0.515X_{1} - 0.028X_{2} + 0.267X_{3} - 0.011X_{4} + 0.011X_{6} | 0.486 |
6 | y = 0.963 + 0.559X_{1} - 0.029X_{2} + 0.279X_{3} - 0.010X_{4} - 0.009X_{5} + 0.011X_{6} | 0.491 |
7 | y = 0.890 + 0.548X_{1} - 0.027X_{2} + 0.270X_{3} - 0.010X_{4} - 0.009X_{5} + 0.008X_{6} + 0.167X_{7} | 0.493 |
Note: X_{1}–X_{7} represent the density of S. dauricus, S. dauricus flea infection rate, S. dauricus flea index, nest flea infection rate, nest flea index, burrow-dwelling flea infection rate, and burrow-dwelling flea index, respectively. R^{2} is the coefficient of determination (the squared multiple correlation coefficient). Statistical software: Matlab.
2.2.2. Best Regression Equations
Preliminary analysis based on the BRS method showed that the S. dauricus plague epidemic is mainly determined by characteristics of the S. dauricus population itself, such as its density, and is less affected by climate and environment [27]. Therefore, we consider only the following 7 factors as explanatory variables when establishing the best regression equations: density of S. dauricus, X_{1}; S. dauricus flea infection rate, X_{2}; S. dauricus flea index, X_{3}; nest flea infection rate, X_{4}; nest flea index, X_{5}; burrow-dwelling flea infection rate, X_{6}; and burrow-dwelling flea index, X_{7}. In the model, the dependent variable is y = 1 if Y. pestis was detected and y = 0 if it was not. The best regression equations with different numbers of factors are shown in Table 1. In particular, the four main factors of the S. dauricus focus selected by the BRS method are the density of S. dauricus, the S. dauricus flea infection rate, the S. dauricus flea index, and the nest flea infection rate.
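The exhaustive search behind Table 1 can be illustrated with a minimal Python sketch (the paper itself used Matlab). Assuming that "best" means the highest R^{2} among all subsets of a given size, every combination of k predictors is fitted by ordinary least squares on the 0/1 outcome y, and the winner for each k is kept. The helper names (solve, ols_r2, best_subsets) are hypothetical, only the standard library is used, and the data in the usage note are synthetic rather than the monitored data.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(rows, y, cols):
    """Fit y = b0 + sum(b_j * x_j) over the chosen columns; return (coefficients, R^2)."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[k] for d in design) for k in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def best_subsets(rows, y):
    """For each subset size k, return (columns, coefficients, R^2) with the largest R^2."""
    n_vars = len(rows[0])
    best = {}
    for k in range(1, n_vars + 1):
        best[k] = max(
            ((cols,) + tuple(ols_r2(rows, y, cols)) for cols in combinations(range(n_vars), k)),
            key=lambda t: t[2],
        )
    return best
```

For example, with rows = [[1, 0, 2], [2, 1, 0], [3, 5, 1], [4, 2, 3], [5, 7, 2], [6, 3, 5]] and y = [1, 5, 6, 6, 9, 8] (that is, 1 + 2x_{0} - x_{2}), best_subsets(rows, y)[2] selects columns (0, 2) with R^{2} close to 1. Exhaustive search fits 2^{p} - 1 models, which is trivial for the 7 factors here (127 models) but grows quickly with p.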
2.2.3. Risk Classification
A previous study of risk classification in the S. dauricus focus [31] defined 3 risk levels according to the value of the dependent variable y of the BRS method: if y > 2/3, the focus is predicted to have an epidemic; if y < 1/3, the focus is predicted to have no epidemic risk; and if 1/3 ≤ y ≤ 2/3, the focus is predicted to be a high-risk area. For the data collected from 1981 to 2011, the accuracy of epidemic prediction (y > 2/3) was 100%. When the best regression equation has more than 3 factors, the accuracy of no-risk prediction (y < 1/3) was also 100%. When 1/3 ≤ y ≤ 2/3, the fitting rate of epidemic prediction was approximately 50% (42-56% for equations with 4 or more factors; see Table 2). In this study, we adopt this risk classification standard.
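Table 2. Fitting accuracy of the three risk levels for the best regression equations with different numbers of factors (data from 1981 to 2011).
Number of factors | Epidemic (%) | No-risk (%) | High-risk (%) |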
1 | 100 | 77 | 20 |
2 | 100 | 82 | 40 |
3 | 100 | 89 | 44 |
4 | 100 | 100 | 45 |
5 | 100 | 100 | 42 |
6 | 100 | 100 | 56 |
7 | 100 | 100 | 56 |
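The risk classification standard and the per-level percentages in Table 2 can be sketched as follows. The thresholds 1/3 and 2/3 come from the text; the exact definition behind the Table 2 percentages is not stated, so the sketch assumes that a year "agrees" with its level when Y. pestis was detected (y = 1) for the epidemic and high-risk levels and not detected (y = 0) for the no-risk level. The function names are hypothetical.

```python
def classify(y):
    """Three-level risk classification standard from [31]."""
    if y > 2 / 3:
        return "epidemic"
    if y < 1 / 3:
        return "no-risk"
    return "high-risk"  # 1/3 <= y <= 2/3


def agreement_rates(fitted, observed):
    """Percentage of years in each predicted level whose observed outcome agrees with it.

    Assumed definition: agreement means observed == 1 for the 'epidemic' and
    'high-risk' levels and observed == 0 for the 'no-risk' level.
    """
    tallies = {}  # level -> [number of years, number of agreeing years]
    for y_hat, obs in zip(fitted, observed):
        level = classify(y_hat)
        agrees = (obs == 0) if level == "no-risk" else (obs == 1)
        tally = tallies.setdefault(level, [0, 0])
        tally[0] += 1
        tally[1] += agrees
    return {level: 100.0 * hits / total for level, (total, hits) in tallies.items()}
```

For instance, fitted values [0.9, 0.8, 0.1, 0.2, 0.5, 0.4] with observed outcomes [1, 1, 0, 0, 1, 0] give 100% for the epidemic and no-risk levels and 50% for the high-risk level. The boundary values y = 1/3 and y = 2/3 fall in the high-risk level, as the standard specifies.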
2.2.4. Exponential Smoothing
The basic formula of the exponential smoothing method is
ES(t) = a×X(t-1) + (1-a)×ES(t-1),
where ES(t) is the predicted value (i.e., the exponential smoothing value) at time t, with t ranging from 1982 to 2012; X(t) is the observed value at time t; and a is the smoothing constant in the range [0,1]. We extrapolate the trends of the 7 main factors (X_{1}–X_{7}) using the exponential smoothing method. Since the monitored data for 1981 are incomplete, ES(1982) cannot be calculated from the formula. Instead, we take the observed value as the initial smoothing value, i.e., ES(1982) = X(1982). The smoothing constant a is chosen by the minimum mean square error (MMSE) principle. The resulting optimal smoothing constants of the 7 main factors are: (1) density of S. dauricus: a_{1} = 0.81; (2) S. dauricus flea infection rate: a_{2} = 0.45; (3) S. dauricus flea index: a_{3} = 0.44; (4) nest flea infection rate: a_{4} = 0.94; (5) nest flea index: a_{5} = 0.73; (6) burrow-dwelling flea infection rate: a_{6} = 0.97; and (7) burrow-dwelling flea index: a_{7} = 0.76.
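A minimal Python sketch of the smoothing recursion and the MMSE choice of a follows. It assumes that the error being minimized is the one-step-ahead error X(t) - ES(t) over the observed years after the first, and that a is searched on a grid with step 0.01 (consistent with the two-decimal constants reported above, but not stated in the text). If x holds X(1982), ..., X(2011), the last element returned by smooth is ES(2012), the forecast. The function names are hypothetical.

```python
def smooth(x, a):
    """Exponential smoothing values for a series x (x[0] is the first observed year).

    ES[0] = x[0] (the initialisation used in the text) and
    ES[t] = a * x[t-1] + (1 - a) * ES[t-1]. Returns len(x) + 1 values;
    the last one is the one-step-ahead forecast.
    """
    es = [x[0]]
    for t in range(1, len(x) + 1):
        es.append(a * x[t - 1] + (1 - a) * es[t - 1])
    return es


def mse(x, a):
    """Mean squared error between x[t] and ES[t], skipping t = 0 (zero by construction)."""
    es = smooth(x, a)
    errors = [(x[t] - es[t]) ** 2 for t in range(1, len(x))]
    return sum(errors) / len(errors)


def best_constant(x, step=0.01):
    """Grid search for the smoothing constant a in [0, 1] that minimises the MSE."""
    n = round(1 / step)
    grid = [i / n for i in range(n + 1)]
    return min(grid, key=lambda a: mse(x, a))
```

For a steadily increasing series such as [1, 2, 3, 4, 5], best_constant returns 1.0: with a = 1 the forecast is simply the previous observation, and any smaller a lags further behind the trend. This is one way to see the limitation, discussed in Section 3.2, that simple exponential smoothing is best suited to series without a systematic trend.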
3. Results and Discussion
3.1. General Results
The exponential smoothing values of the 7 main factors in the S. dauricus focus from 1982 to 2012 can be calculated with the exponential smoothing method introduced in Section 2.2.4. The trends are shown in Figure 2. In recent years, all 7 factors show a decreasing trend, which gives a first indication that an S. dauricus plague epidemic is unlikely to occur in 2012. The predicted values of the 7 main factors in 2012 are: (1) density of S. dauricus, X_{1}(2012) = 0.94; (2) S. dauricus flea infection rate, X_{2}(2012) = 48%; (3) S. dauricus flea index, X_{3}(2012) = 1.85; (4) nest flea infection rate, X_{4}(2012) = 64%; (5) nest flea index, X_{5}(2012) = 9.94; (6) burrow-dwelling flea infection rate, X_{6}(2012) = 14.88; and (7) burrow-dwelling flea index, X_{7}(2012) = 0.28. From Table 1, the y values of the best regression equations with 1-7 factors are y_{1}(2012) = 0.183, y_{2}(2012) = 0.089, y_{3}(2012) = 0.085, y_{4}(2012) = 0.046, y_{5}(2012) = 0.050, y_{6}(2012) = 0.071, and y_{7}(2012) = 0.061, respectively. All these y values are less than 1/3, so according to the risk classification standard our model predicts no epidemic risk in 2012. This prediction is consistent with the actual situation: the plague bacillus Yersinia pestis was not detected in the S. dauricus focus in 2012.
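The final step can be checked by plugging the 2012 predictions into the Table 1 equations. The sketch below copies the printed (rounded) coefficients and assumes that the infection rates X_{2}, X_{4} and X_{6} enter as percentage values (48, 64, 14.88), the scale that gives y values of the reported magnitude. Because the coefficients and predictors are rounded, the recomputed y values differ from the reported ones in the second or third decimal place, but all seven stay below 1/3 and are therefore classified as no-risk. The names EQUATIONS, X_2012, predict and classify are hypothetical.

```python
# Coefficients copied from Table 1 (rounded to three decimals, as printed).
# Each entry: number of factors -> (intercept, {factor index: coefficient}).
EQUATIONS = {
    1: (-0.097, {3: 0.152}),
    2: (-0.428, {3: 0.134, 7: 0.977}),
    3: (0.309, {1: 0.384, 2: -0.022, 3: 0.248}),
    4: (1.032, {1: 0.556, 2: -0.026, 3: 0.266, 4: -0.012}),
    5: (0.948, {1: 0.515, 2: -0.028, 3: 0.267, 4: -0.011, 6: 0.011}),
    6: (0.963, {1: 0.559, 2: -0.029, 3: 0.279, 4: -0.010, 5: -0.009, 6: 0.011}),
    7: (0.890, {1: 0.548, 2: -0.027, 3: 0.270, 4: -0.010, 5: -0.009, 6: 0.008, 7: 0.167}),
}

# Smoothed 2012 predictions from Section 3.1.
# Assumption: infection rates (X2, X4, X6) are entered as percentages.
X_2012 = {1: 0.94, 2: 48, 3: 1.85, 4: 64, 5: 9.94, 6: 14.88, 7: 0.28}


def predict(k, x):
    """Evaluate the best regression equation with k factors at the factor values x."""
    intercept, coefs = EQUATIONS[k]
    return intercept + sum(c * x[j] for j, c in coefs.items())


def classify(y):
    """Three-level risk classification standard (thresholds 1/3 and 2/3)."""
    if y > 2 / 3:
        return "epidemic"
    if y < 1 / 3:
        return "no-risk"
    return "high-risk"


for k in sorted(EQUATIONS):
    y = predict(k, X_2012)
    print(f"{k} factors: y = {y:.3f} -> {classify(y)}")
```

The sensitivity to rounding is worth noting: terms such as -0.022 × 48 are an order of magnitude larger than the final y, so small rounding differences in the coefficients shift y noticeably without changing the classification here.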
3.2. Limitations of the Model
Combining different statistical methods can give a more precise prediction, but it also requires more data to ensure that all the methods are applicable. In general, a reasonable regression model needs about ten observations for each state (the two states in our model are "Y. pestis was detected" and "Y. pestis was not detected") [32]. If the number of plague occurrences in a region is insufficient, the results of the regression method are questionable. Since Y. pestis was rarely detected in Jilin, Liaoning, and Heilongjiang Provinces from 1982 to 2012, we cannot construct regression equations for these regions (although the risk of a plague epidemic there cannot be ignored). Unlike the regression method, the exponential smoothing method does not require a minimum number of observations. However, it is suitable only for data without a systematic trend or seasonal components [28]. Monthly or quarterly data of the S. dauricus focus are therefore not suitable for the exponential smoothing method; they must first be aggregated into annual data to minimize seasonal effects. Moreover, the exponential smoothing value for a given year is derived from both the previous smoothing value and the previous year's observation. Thus, if one observation is missing, the corresponding smoothing value (and all subsequent smoothing values) cannot be calculated. A possible solution to this problem is to estimate the missing value with the Expectation-Maximization algorithm [33].
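The ten-observations-per-state rule of thumb can be checked against the detection years listed in the Introduction. This sketch assumes that a year counts as "Y. pestis detected" if any animal or vector strain was isolated that year (so the 1985 Jilin isolates are pooled with those from Inner Mongolia) and uses the 1981-2011 fitting period; the names are hypothetical.

```python
# Years with at least one Y. pestis isolate, as listed in the Introduction.
ANIMAL_YEARS = {1985, 1986, 1987, 1988, 1990, 1994, 1996, 1998}
VECTOR_YEARS = {1985, 1987, 1988, 1989, 1990, 1996, 1997}


def state_counts(first, last, positive_years):
    """Return (detected, not_detected) year counts for the inclusive range [first, last]."""
    years = range(first, last + 1)
    detected = sum(1 for year in years if year in positive_years)
    return detected, len(years) - detected


def enough_per_state(counts, minimum=10):
    """Rule of thumb from [32]: roughly ten observations in every state."""
    return min(counts) >= minimum


counts = state_counts(1981, 2011, ANIMAL_YEARS | VECTOR_YEARS)
print(counts, enough_per_state(counts))  # prints (10, 21) True
```

Under these assumptions, the ten detection years (1985-1990, 1994 and 1996-1998) against 21 non-detection years place the "detected" state right at the threshold, which is consistent with the statement that regression equations could not be constructed for provinces with far fewer detections.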
3.3. Potentials of the Model
It is worth noting that the key idea of our model could also be applied to predict plague epidemics in other foci. The first step is to identify the main influencing factors with regression methods; the second is to establish a risk classification standard based on historical data and the regression results; the third is to predict the trends of the main factors with time series methods; and the last is to calculate the epidemic risk in the next period. Suitable regression and time series methods should be chosen according to the available data. Previous studies have shown that different foci may have different main influencing factors; e.g., the M. unguiculatus focus is sensitive to meteorological factors such as precipitation [20,34]. Epidemic prediction for such foci is therefore even more delicate, since there is no universal time series method for meteorological data.
4. Conclusion
Regression methods (such as multiple regression) and time series analysis (such as moving average methods) are commonly used to predict infectious diseases. However, previous studies on plague prediction usually adopted a single statistical method to predict the occurrence of epidemics. In this paper, we established a prediction model that combines the best subset regression method and the exponential smoothing method through a risk classification standard. Based on data from the S. dauricus focus in the Inner Mongolia Autonomous Region from 1981 to 2011, we constructed best regression equations with different numbers of influencing factors. We then predicted the values of the 7 main factors in 2012 with the exponential smoothing method and calculated the risk of an animal plague epidemic with the best regression equations. According to the risk classification standard, our model correctly predicted no animal plague epidemic in 2012.
Acknowledgments
This research received financial support from the Research Special Fund of Health Sector of China (No. 201202021), NSFC (No. 11301032) of China and "the Fundamental Research Funds for the Central Universities" of China.
References