Comparison of ARIMA Model and Exponential Smoothing Model on 2014 Air Quality Index in Yanqing County, Beijing, China

To study changes in the air quality index (AQI) in Yanqing County, Beijing, China, and to predict its trend, this paper carries out a time-series analysis. The series is found to be non-stationary, and both an ARIMA (1, 1, 2) model and a Holt exponential smoothing model are found to model the data adequately. Comparing the two fits, the ARIMA model outperforms the Holt model in both trend capture and mean squared error (MSE), so for these data the ARIMA model is the better choice for predicting future AQI values.


Introduction
Beijing is the capital of China and one of the most populous cities in the world, with a population of 21.15 million in 2013. The city proper is the third largest in the world. The metropolis, located in northern China, is governed as a direct-controlled municipality under the national government, with 14 urban and suburban districts and two rural counties. It is home to the headquarters of most of China's largest state-owned companies and many large multinational companies, and is a major hub for the national highway, expressway, railway, and high-speed rail networks. With China's economy growing rapidly for more than 20 years, Beijing has drawn worldwide attention. In the last two to three years, however, Beijing's air pollution problem has often made news headlines. The Chinese government has recognized the problem and has taken many measures to control air pollution in Beijing. In this paper, the air quality index (AQI) is used as a comprehensive measure of air quality. As the AQI increases, an increasingly large percentage of the population is likely to experience increasingly severe adverse health effects [1]. Different countries have their own air quality indices, corresponding to different national air quality standards. This paper concerns only the AQI defined by the Chinese government [2]. Sound analysis and forecasting of the AQI can help the government design and evaluate its air pollution control policies and help hospitals plan their daily patient services.
China's Ministry of Environmental Protection (MEP) is responsible for measuring the level of air pollution in China. The AQI level is based on the levels of six atmospheric pollutants measured at monitoring stations in China: sulfur dioxide (SO2), nitrogen dioxide (NO2), suspended particulates smaller than 10 µm in aerodynamic diameter (PM10), suspended particulates smaller than 2.5 µm in aerodynamic diameter (PM2.5), carbon monoxide (CO), and ozone (O3) [2]. Table 1 displays the AQI values with their corresponding levels and health implications. As shown in Table 1, when the AQI is below 100 the air has no effect on daily life, but when the AQI exceeds 200 it may cause serious adverse health effects.
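As a rough illustration of how an AQI value maps to a pollution level, the sketch below classifies values and tallies the share of days per level. The breakpoints and the two middle level names are assumptions taken from the Chinese AQI technical regulation (HJ 633-2012), and Table 1 remains the authoritative source. The names `AQI_LEVELS`, `aqi_level` and `level_shares` are hypothetical.

```python
# Upper AQI bound and name of each level.
# ASSUMPTION: breakpoints follow HJ 633-2012; check them against Table 1.
AQI_LEVELS = [
    (50, "Excellent"),
    (100, "Good"),
    (150, "Lightly Polluted"),
    (200, "Moderately Polluted"),
    (300, "Heavily Polluted"),
]


def aqi_level(aqi):
    """Return the pollution level name for a single AQI value."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    for upper, name in AQI_LEVELS:
        if aqi <= upper:
            return name
    return "Severely Polluted"  # anything above 300


def level_shares(values):
    """Return the fraction of observations falling in each level."""
    if not values:
        raise ValueError("values must not be empty")
    counts = {}
    for value in values:
        name = aqi_level(value)
        counts[name] = counts.get(name, 0) + 1
    return {name: count / len(values) for name, count in counts.items()}
```

Applied to a full year of daily values, `level_shares` would give the kind of per-level percentages that are summarized for the study area.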
In this paper, the study area is Yanqing County, which is situated in northwest Beijing and has an area of 1,993.75 square kilometers and a population of 317,000. It is an ecological conservation and development area of the capital, well endowed with natural resources and a picturesque landscape. Yanqing County is a famous tourist destination in Beijing, with over 30 unique tourist attractions including the Badaling Great Wall and Longqing Gorge. The 2022 Beijing Winter Olympics will be held in this county because of its fine ice and snow landscape. It is chosen as the study area because its tourism industry depends heavily on its air quality: as shown in Table 1, the health implications of the AQI relate mainly to outdoor activities. One air quality monitoring station measures air pollution in the county and publishes the AQI value every day. The data are extracted from its daily reports from Jan. 1st 2014 to Dec. 29th 2014 [3]. Among time-series models, the ARIMA and Holt methods are two widely used ones [4,5]. In this paper, the performance of these two models is compared on the 2014 AQI data from Yanqing County. All computations are done using SAS software (SAS® 9.4, SAS Institute Inc., Cary, N.C.) [6,7].

Description of the Data
A time series plot is first drawn using all the 2014 AQI data for Yanqing County, Beijing. As shown in Figure 1, the AQI values range from 26 to 415, with an annual mean of 115. AQI values peak in spring and winter, and for the rest of 2014 the AQI appears stationary. Large AQI values in the spring and winter months are plausible: temperatures in Beijing are relatively low then, ranging from -10℃ to 5℃, and low temperatures often bring fog and haze. The number of days at each AQI pollution level in Yanqing County in 2014 is shown in Table 2; 54.27% of days in 2014 were at the Good or Excellent level, whereas 12.95% were Heavily Polluted or Severely Polluted. In general, therefore, the air quality in Yanqing County is acceptable and suitable for the tourism industry.
As shown in Figure 1, the two peaks at the two ends of the plot violate the hypothesis of weak stationarity [8]. The plot also shows no linear trend, and it is very difficult to match it to any curve model such as a polynomial or exponential model; thus a non-stationary model should be fitted to the data [8,9]. Because the data cover only one year, from Jan. 1st 2014 to Dec. 29th 2014, seasonal effects cannot be included in the model. Among non-stationary models, the Holt exponential smoothing model and ARIMA models are two preferred and generally used choices [10].

ARIMA and Holt Modelling
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where the data show evidence of non-stationarity, where an initial differencing step (corresponding to the "integrated" part of the model) can be applied to reduce the non-stationarity. ARIMA models are generally denoted ARIMA (p, d, q), where the parameters p, d, and q are non-negative integers: p is the order of the autoregressive part, d is the degree of differencing, and q is the order of the moving-average part [11]. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling. When two of the three parameters are zero, the model may be referred to by the non-zero parameter, dropping "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA (1,0,0) is AR(1), ARIMA (0,1,0) is I(1), and ARIMA (0,0,1) is MA(1) [3,4,12].
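For reference, the general ARIMA (p, d, q) model can be written compactly with the backshift operator $B$, defined by $B X_t = X_{t-1}$. The symbols $\phi_i$, $\theta_j$ and $\varepsilon_t$ (white noise) are introduced here for illustration, and the minus signs in the moving-average polynomial are an assumed convention (the Box-Jenkins one); some texts use plus signs instead.

```latex
\phi(B)\,(1 - B)^{d} X_t = \theta(B)\,\varepsilon_t, \qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}, \quad
\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^{q}
```

Setting d = 0 recovers the ARMA (p, q) model, and setting p = q = 0 with d = 1 gives the pure differencing model I(1) mentioned above.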
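The Holt exponential smoothing method is the most popular double exponential smoothing method. It was proposed by Holt (1957), who extended simple exponential smoothing to allow forecasting of data with a trend. Writing $X_t$ for the observed series, $S_t$ for the smoothing series and $T_t$ for the trend series, the Holt method concentrates on the series of increments $S_t - S_{t-1}$ and estimates the slope parameter of a linear trend by exponential smoothing of these differences. The Holt method can be expressed by the following formulas:
$$S_t = \alpha X_t + (1 - \alpha)(S_{t-1} + T_{t-1}),$$
$$T_t = \gamma (S_t - S_{t-1}) + (1 - \gamma) T_{t-1}.$$
The formula for the prediction $i$ steps ahead is $\hat{X}_{t+i} = S_t + i\,T_t$.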
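The Holt recursion can be sketched in a few lines of Python. This is a minimal illustration and not the SAS routine used in the study. The function names `holt_smooth` and `holt_forecast` are hypothetical, and the starting values (level equal to the first observation, trend equal to the average increment over the first n steps) are one common choice assumed here.

```python
def holt_smooth(x, alpha, gamma, n=1):
    """Run Holt's linear exponential smoothing over the series x.

    Returns (level, trend, fitted), where fitted[k] is the one-step-ahead
    prediction of x[k + 1] made from the observations up to x[k].
    """
    if not (0 < alpha <= 1 and 0 < gamma <= 1):
        raise ValueError("alpha and gamma must lie in (0, 1]")
    if n < 1 or len(x) < n + 1:
        raise ValueError("need at least n + 1 observations")

    # ASSUMPTION: initial level is the first observation and initial trend
    # is the mean increment over the first n steps.
    level = float(x[0])
    trend = (x[n] - x[0]) / n

    fitted = []
    for obs in x[1:]:
        fitted.append(level + trend)  # forecast made before seeing obs
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return level, trend, fitted


def holt_forecast(level, trend, steps):
    """Forecast 1..steps periods ahead from the last level and trend."""
    return [level + i * trend for i in range(1, steps + 1)]
```

On an exactly linear series the recursion reproduces the line, so the forecasts simply continue it; on noisy data, smaller weights give smoother level and trend estimates.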
In this formulation, two weighting parameters (α and γ) are used for the two updating equations. In fitting the Holt exponential smoothing model, the initial values $S_0$ of the smoothing series and $T_0$ of the trend series need to be determined. In this study, the initial value of the smoothing series is set to the first observation, i.e., $S_0 = X_1$. The initial value of the trend series can be defined in many ways; a simple one is, for an arbitrary $n$, $T_0 = (X_{n+1} - X_1)/n$ [5,13].
Following the ARIMA modelling procedure, the first-order difference of the data is computed, and a time series plot of the differenced data is shown in Figure 2. The differenced data show a stationary pattern, although several outliers exist, so it is suitable to set d = 1. The autocorrelogram (Figure 3) of the differenced data displays only short-term autocorrelation, which confirms the stationarity of the differenced data. To support this inference, an autocorrelation check for white noise is also carried out on the differenced data. As shown in Table 3, the white noise hypothesis is rejected at lags 6, 12, 18 and 24 with very small p-values. All these results show that an ARMA model can be fitted to the first-order differenced data. From the autocorrelogram in Figure 3, it is safe to conclude that q is no more than 2, while the partial autocorrelogram in Figure 4 indicates that p is no more than 4. It is therefore enough to choose the model from the set {p ≤ 4, q ≤ 2}. From the discussion above, an ARIMA (p, 1, q) model is suitable for the 2014 Yanqing AQI data, but the parameters p and q remain to be determined [13,14].
At the 0.05 significance level, all the ARIMA (p, 1, q) models with p ≤ 4 and q ≤ 2 are compared. The model whose parameters are all significantly different from 0 and whose AIC value is the smallest is selected as the best model [15]. The model ARIMA (1, 1, 2) is chosen, and the estimated parameters are shown in Table 4.
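The differencing step and the white-noise check described above can be sketched in plain Python. This is an illustrative stand-in and not the SAS procedure used in the study. The function names `difference`, `acf` and `ljung_box_q` are hypothetical, and the Ljung-Box form of the Q statistic is an assumption about how the check is computed.

```python
def difference(x, d=1):
    """Apply first-order differencing d times: x[t] - x[t-1]."""
    values = list(x)
    for _ in range(d):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    r = acf(x, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
```

Under the white-noise hypothesis, Q computed on a raw series is compared with a chi-square distribution with `max_lag` degrees of freedom. A large value (for example, above about 12.59 at lag 6 and the 0.05 level) rejects white noise, which indicates that there is autocorrelation left for an ARMA model to capture.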
The constant term is eliminated because its p-value is 0.92. The final ARIMA model has the form $(1 - \phi_1 B)(1 - B) X_t = (1 - \theta_1 B - \theta_2 B^2)\,\varepsilon_t$, where $B$ is the backshift operator, $\varepsilon_t$ is white noise, and the estimates of $\phi_1$, $\theta_1$ and $\theta_2$ are those given in Table 4.
Compared with ARIMA modelling, Holt modelling is relatively simple. The least squares method is often used to estimate the parameters of the Holt method and is also chosen in this study. The two parameters of the Holt model are estimated as α = 0.11 and γ = 0.25.
To compare the performance of the two models, the fitted Holt exponential smoothing model, the fitted ARIMA model and the time series plot of the 2014 AQI values in Yanqing County, Beijing are shown together in Figure 5. The plot shows that the ARIMA fit captures the trend in the data well, as the blue line matches the orange line almost everywhere. The mean squared error (MSE) of each fit is also calculated: the MSEs of the Holt and ARIMA fits are 5889.40 and 3659.41, respectively. The ARIMA model therefore outperforms the Holt model in both trend capture and MSE, and for these data it is better to apply the ARIMA model to predict future AQI values. Note that Holt and ARIMA models, like all other time series methods, can only be used to predict a few steps into the future.
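Least-squares estimation of the two Holt weights, and the MSE used to compare fits, can be sketched as follows. This is a minimal illustration under stated assumptions: a coarse grid search stands in for whatever optimizer the SAS routine uses, the initial trend is taken as the first increment, and the names `mse`, `holt_one_step` and `fit_holt_least_squares` are hypothetical.

```python
def mse(observed, fitted):
    """Mean squared error between observed values and fitted values."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


def holt_one_step(x, alpha, gamma):
    """One-step-ahead Holt predictions for x[1:].

    ASSUMPTION: initial level x[0] and initial trend x[1] - x[0].
    """
    level, trend = float(x[0]), float(x[1] - x[0])
    fitted = []
    for obs in x[1:]:
        fitted.append(level + trend)
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return fitted


def fit_holt_least_squares(x, step=0.01):
    """Grid-search the (alpha, gamma) pair minimizing the one-step MSE.

    Returns (best_mse, alpha, gamma).
    """
    grid = [round(k * step, 10) for k in range(1, int(round(1 / step)) + 1)]
    best = None
    for alpha in grid:
        for gamma in grid:
            err = mse(x[1:], holt_one_step(x, alpha, gamma))
            if best is None or err < best[0]:
                best = (err, alpha, gamma)
    return best
```

The same `mse` function can be applied to the fitted values of any two models on the same observations; the model with the smaller value fits the sample more closely, which is the criterion used in the comparison above.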

Conclusions
This paper studies the 2014 air quality index (AQI) in Yanqing County, Beijing, China. In the process of model building, the original AQI data are found to be non-stationary, but their first-order difference is stationary. In the ARIMA model fitting, after several candidate models are compared, ARIMA (1, 1, 2) is chosen as the final model. In the Holt exponential smoothing model fitting, the least squares method is used to estimate the parameters. Comparing the two fits, the ARIMA model outperforms the Holt model in both trend capture and MSE, and for these data it is better to apply the ARIMA model to predict future AQI values. The fluctuations of the AQI are irregular and are influenced by many factors. No model can include all these factors, but this predictive model can still help the government and other authorities take measures in advance of changes in air quality.