Predicting PM2.5 Concentrations Using Stacking-based Ensemble Model

With air pollution becoming increasingly serious, the PM2.5 concentration, an effective indicator of air quality, has attracted extensive attention from all sectors of society. Accurate prediction of PM2.5 concentrations is of great significance for providing the public with early air pollution warnings that protect public health. Over a decade of development, artificial intelligence technology has produced a variety of high-performance prediction models and, in particular, has brought new impetus to the prediction of PM2.5 concentrations. In this study, a stacking-based ensemble model with self-adaptive hyper-parameter optimization is proposed to solve the PM2.5 concentration prediction problem. First, the raw data are normalized to reduce the influence of the different orders of magnitude of the input variables on model performance. Second, the Bayesian optimization method is used to optimize the hyper-parameters of the base predictors to improve their performance. Finally, a stacking ensemble method is applied to integrate the optimized base predictors into an ensemble model for the final prediction. In the experiments, two datasets from air quality stations in different areas are tested with four metrics to evaluate the performance of the proposed model in PM2.5 concentration prediction. The experimental results show that the proposed model outperforms the baseline models in predicting PM2.5 concentrations.


Introduction
In recent years, with the rapid development of modern industry, air pollution has become increasingly serious. As one of the major airborne pollutants, PM2.5 has attracted considerable attention. PM2.5 refers to inhalable particulate matter with an aerodynamic diameter of 2.5 µm or less, which is composed of highly active toxic and hazardous substances [1]. Long-term exposure to high concentrations of PM2.5 impairs the normal function of the human body and increases the risk of lung cancer [2]. Therefore, to protect public health, providing accurate predictions of PM2.5 concentrations as early warning information is a matter of urgency.
However, owing to the complexity of PM2.5 formation and development, accurately predicting the PM2.5 concentration is a challenging task. In the past, traditional statistical methods were often used to predict the PM2.5 concentration [3][4][5]. Unfortunately, most of these methods assume a linear relationship between historical PM2.5 concentration data and the future PM2.5 concentration, which oversimplifies the complex nonlinear relationship between them [6]. Recently, benefiting from the rapid development of artificial intelligence technology, various high-performance prediction models suited to describing complex nonlinear relationships have been used to monitor and predict PM2.5 concentrations. In this study, a stacking-based ensemble model with self-adaptive hyper-parameter optimization (SEM-SAHPO) is proposed to predict PM2.5 concentrations accurately. First, after the data are normalized, the proposed model uses the Bayesian optimization method to self-adaptively optimize the hyper-parameters of the base predictors, which are linear regression (LR) [7], support vector regression (SVR) [8], k-nearest-neighbor (KNN) [9], long short-term memory (LSTM) [10], convolutional neural network (CNN) [11], and multi-layer perceptron neural network (MLP) [12]. Then, the stacking ensemble method is employed to integrate these base predictors. In the experiments, the performance of the proposed model is compared with that of six baseline models on two datasets from air quality stations in different areas. The experimental results show that the proposed model performs better than the baseline models in predicting PM2.5 concentrations.
The remainder of this paper is organized as follows. Section 2 reviews previous work on PM2.5 concentrations prediction and ensemble methods. Section 3 introduces the methodology of the proposed ensemble model. Section 4 details the experiments and analyzes the experimental results. Section 5 presents the conclusions and directions for future studies.

PM2.5 Concentrations Prediction
The adverse effects of PM2.5 have motivated researchers to predict its concentrations. Earlier studies on PM2.5 concentrations prediction mainly adopted statistical methods such as the land use regression model [4], the autoregressive integrated moving average [5], and the Kalman filtering method [13]. Models based on these statistical methods perform poorly in predicting extreme points and have difficulty dealing with the complex factors involved in the prediction [1]. They usually predict the future PM2.5 concentration by describing a linear relationship between historical PM2.5 concentration data and the future PM2.5 concentration, which oversimplifies the complex nonlinear relationship between them [6].
With the continued development of artificial intelligence technology, machine learning methods have gradually become popular for air quality prediction, because they can describe complex nonlinear relationships and greatly improve prediction accuracy. For example, Gennaro et al. [14] proposed an artificial neural network to predict daily PM10 concentrations and observed that the artificial neural network is effective in obtaining air quality information. Sinnott and Guan [15] compared machine learning methods, including LSTM networks and artificial neural networks, with traditional statistical methods for predicting PM2.5 concentrations in Melbourne. They observed that the machine learning methods outperform the traditional statistical methods and that the LSTM network performs best on the PM2.5 concentrations prediction problem. Joharestani et al. [16] applied several machine learning-based models, including extreme gradient boosting, deep learning, and random forest, to multi-source remote sensing data to predict PM2.5 concentrations in Tehran's urban area.
Previous studies have demonstrated that machine learning methods can predict PM2.5 concentrations well. In this paper, six machine learning-based prediction models with good prediction performance, comprising three traditional machine learning models and three deep learning models, are selected as the base predictors. To further enhance prediction accuracy, the stacking ensemble method is applied to integrate the six base predictors into an ensemble model. Moreover, linear support vector regression (LSVR) [17] is employed as the meta-predictor owing to its excellent flexibility and high efficiency.

Ensemble Method
In machine learning, ensemble methods that integrate multiple machine learning models have gained considerable attention because they usually outperform single models. Widely employed ensemble learning methods include bagging, boosting [18], and stacking [19]. Among them, bagging and boosting can be used to construct homogeneous ensemble models but not heterogeneous ones, whereas stacking can be used to construct both. In addition, stacking is characterized by a flexible structure and stability. These advantages have led stacking to be applied extensively to various prediction problems, such as bankruptcy prediction [20], user geolocation prediction [21], and earthquake casualty prediction [22].
In recent years, numerous studies have applied ensemble methods to air quality prediction. For example, Wang and Song [23] developed a deep spatial-temporal ensemble model to predict future air quality and demonstrated that the ensemble model outperforms the single base models and obtains more accurate prediction results. Di et al. [24] proposed an ensemble model that integrates multiple machine learning models and predictor variables to predict PM2.5 concentrations across the contiguous United States. Maciag et al. [25] proposed a clustering-based ensemble model to analyze PM10 concentrations in London, using evolving spiking neural networks trained on a separate set of time series.
Considering the superior performance of the stacking ensemble method and its successful application in air quality prediction, this study employs the stacking ensemble method to integrate the six selected base predictors. The Bayesian optimization method is employed to optimize the hyper-parameters of the base predictors so that the performance of the proposed model is further improved.

Dataset Exploration and Preprocessing
The datasets used in this study consist of PM2.5 concentration records from two air quality stations in different areas. The first dataset [26] contains hourly PM2.5 concentrations from the Beijing Olympic Sports Center in China and is referred to as BOSC in this study. It includes 4416 records collected between 00:00 h on 1 March and 23:00 h on 31 August 2013. The second dataset [27] contains hourly PM2.5 concentrations from the Xuhui site in Shanghai, China, and is referred to as XHSH in this study. It includes 2208 records collected between 00:00 h on 1 October and 23:00 h on 31 December 2015.
To improve the usability and consistency of the datasets, the raw data are preprocessed. In this study, the raw PM2.5 concentrations are normalized to reduce the influence of the different orders of magnitude of the input variables on model performance.
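The normalization step can be illustrated with a short sketch. The text does not state which normalization scheme is used, so min-max scaling is assumed here; the function names and the sample readings are hypothetical and are not taken from the BOSC or XHSH data. Taking the scaling bounds from the training portion only is also an assumption, chosen so that later data do not influence the preprocessing.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) of the training values; these define the scaling."""
    return min(train), max(train)


def normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Rescale values so that lo maps to 0 and hi maps to 1 (assumed min-max scheme)."""
    span = hi - lo
    if span == 0:
        # Constant series: there is no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def denormalize(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Invert normalize() so predictions can be reported in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical hourly PM2.5 readings in µg/m³ (illustrative values only).
readings = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(readings)
scaled = normalize(readings, lo, hi)
restored = denormalize(scaled, lo, hi)
```

Predictions made on the normalized scale would be passed through `denormalize` before error metrics in physical units are reported.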

The Proposed SEM-SAHPO Model
This section describes the proposed SEM-SAHPO model. The framework of the proposed model is shown in Figure 1, and the details are as follows.

Bayesian Optimization Method
In stacking-based ensemble models, the hyper-parameter settings of the base predictors influence their prediction accuracy and, in turn, the performance of the ensemble model. This makes it necessary to optimize the hyper-parameters of the base predictors. Traditional hyper-parameter optimization methods (e.g., grid search and trial and error) search the whole hyper-parameter space, which is exponentially expensive and inefficient [28]. The Bayesian optimization method provides a promising alternative for solving the hyper-parameter optimization problem [29,30], because it can obtain high-quality solutions with high efficiency [30]. Therefore, the Bayesian optimization method is employed in this study to automatically configure the hyper-parameters of the base predictors and thereby enhance the performance of the ensemble model.
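To make the idea concrete, the following is a minimal sketch of Bayesian optimization over a single hyper-parameter. The text does not specify the surrogate model or the acquisition function, so a Gaussian-process surrogate with an RBF kernel and the expected-improvement criterion are assumed here as one common choice. The objective `toy_validation_loss` is a hypothetical stand-in for the validation error of a base predictor, and the candidate grid is illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 1.0) -> float:
    """Squared-exponential kernel between two scalar hyper-parameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(xs: List[float], zs: List[float], x_new: float,
                 noise: float = 1e-4) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mu = sum(k * w for k, w in zip(k_star, solve(K, list(zs))))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mu, math.sqrt(max(var, 1e-12))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement below `best` (the loss is being minimised)."""
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective: Callable[[float], float], candidates: Sequence[float],
              n_iter: int = 10) -> Tuple[float, float, List[float]]:
    """Return (best value, best loss, evaluated values); needs >= 3 distinct candidates."""
    # Start from three spread-out candidates: both ends and the middle.
    xs = [candidates[0], candidates[len(candidates) // 2], candidates[-1]]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]  # standardise the observed losses
        best = min(zs)
        scores = []
        for c in remaining:
            mu, sigma = gp_posterior(xs, zs, c)
            scores.append(expected_improvement(mu, sigma, best))
        x_next = remaining[scores.index(max(scores))]  # most promising candidate
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = ys.index(min(ys))
    return xs[i_best], ys[i_best], xs


def toy_validation_loss(log_c: float) -> float:
    """Hypothetical validation loss as a function of one hyper-parameter."""
    return (log_c - 1.0) ** 2 + 0.5


candidates = [-3.0 + 0.25 * i for i in range(25)]  # grid over [-3, 3]
best_x, best_y, history = bayes_opt(toy_validation_loss, candidates)
```

Unlike grid search, which would evaluate all 25 candidates, this loop evaluates only 13 of them and uses the surrogate to decide where to look next. In practice the objective would train a base predictor on the training set and return its error on the validation set.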

Stacking Ensemble Method
The stacking ensemble method is a promising ensemble learning strategy that integrates multiple base predictors into an ensemble model and is characterized by stronger robustness and better performance [31]. In this study, to obtain higher prediction accuracy, a stacking-based ensemble model is proposed for PM2.5 concentrations prediction. Six popular machine learning-based models with good prediction ability, namely LR, KNN, SVR, LSTM, MLP, and CNN, are employed as the base predictors. LSVR, which offers excellent flexibility and high efficiency, is selected as the meta-predictor. The proposed ensemble model includes two processes: the data preparation process and the stacking ensemble process. The stacking ensemble process consists of two stages: the training and optimizing stage of the base predictors and the training stage of the meta-predictor. In the data preparation process, the raw data are normalized, and the normalized data are divided into three parts: a training set, a validation set, and a test set. In the training and optimizing stage of the base predictors, the Bayesian optimization method is first applied to optimize the hyper-parameters of the six base predictors based on the training set and the validation set. The six base predictors are then trained on the training set to obtain the trained optimal base predictors. In the training stage of the meta-predictor, the six trained optimal base predictors are first used to predict the validation set. The six sets of validation predictions are then concatenated into one feature matrix, which is used as the input to train the meta-predictor.
To test the trained meta-predictor and obtain the final prediction results, the six trained optimal base predictors are first applied to the test set. The six sets of test predictions are then concatenated into one feature matrix, which is fed to the trained meta-predictor to obtain the final results.
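The data flow of the two stages described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: only two deliberately simple base predictors are used (stand-ins for LR and KNN), the meta-predictor is an ordinary least-squares combination standing in for LSVR, hyper-parameter optimization is omitted, and the lag length, the 60/20/20 chronological split, and the synthetic series are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def make_windows(series: Sequence[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (lag window, next value) pairs."""
    n = len(series) - lags
    X = [list(series[i:i + lags]) for i in range(n)]
    y = [series[i + lags] for i in range(n)]
    return X, y


def fit_last_lag_regression(X: List[List[float]], y: List[float]) -> Predictor:
    """Stand-in for LR: least-squares line through (most recent lag, target)."""
    xs = [w[-1] for w in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx if sxx else 0.0
    return lambda w: my + slope * (w[-1] - mx)


def fit_knn(X: List[List[float]], y: List[float], k: int = 3) -> Predictor:
    """Stand-in for KNN: mean target of the k closest training windows."""
    def predict(w: Sequence[float]) -> float:
        def dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(X[i], w))
        nearest = sorted(range(len(X)), key=dist)[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_linear_meta(F: List[List[float]], y: List[float]) -> Callable[[Sequence[float]], float]:
    """Least-squares weights over two base predictions (linear stand-in for LSVR)."""
    a11 = sum(f[0] * f[0] for f in F)
    a12 = sum(f[0] * f[1] for f in F)
    a22 = sum(f[1] * f[1] for f in F)
    b1 = sum(f[0] * t for f, t in zip(F, y))
    b2 = sum(f[1] * t for f, t in zip(F, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return lambda f: (f[0] + f[1]) / 2.0  # degenerate case: plain average
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return lambda f: w1 * f[0] + w2 * f[1]


def run_stacking(series: Sequence[float], lags: int = 4) -> Tuple[List[float], List[float]]:
    X, y = make_windows(series, lags)
    i_tr, i_va = int(0.6 * len(X)), int(0.8 * len(X))  # assumed chronological split
    X_tr, y_tr = X[:i_tr], y[:i_tr]
    X_va, y_va = X[i_tr:i_va], y[i_tr:i_va]
    X_te, y_te = X[i_va:], y[i_va:]
    # Stage 1: train the base predictors on the training set.
    bases = [fit_last_lag_regression(X_tr, y_tr), fit_knn(X_tr, y_tr)]
    # Stage 2: base predictions on the validation set form the meta-features.
    F_va = [[b(w) for b in bases] for w in X_va]
    meta = fit_linear_meta(F_va, y_va)
    # Testing: base predictions on the test set are fed to the meta-predictor.
    F_te = [[b(w) for b in bases] for w in X_te]
    return [meta(f) for f in F_te], y_te


# Synthetic, noise-free series used only to exercise the pipeline.
series = [50.0 + 20.0 * math.sin(i / 5.0) for i in range(200)]
preds, y_test = run_stacking(series)
```

Training the meta-predictor on validation-set predictions rather than training-set predictions means it sees how each base predictor behaves on data that predictor was not fitted to, which is the point of holding out a separate validation set.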

Experiment
This section introduces the statistical metrics used to evaluate the models and then analyzes and discusses the experimental results of all models in predicting PM2.5 concentrations. All models and methods were implemented in the Python programming language. The experiments were run on a computer with an Intel Core i7-8700 3.2 GHz CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 GPU.

Evaluation Metrics
Four widely used statistical metrics are adopted to evaluate the models' performance: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). These metrics are calculated according to their standard definitions [32,33]. For MAE, RMSE, and MAPE, a lower value indicates better prediction performance. For R², a higher value indicates better prediction performance.
Table 1 shows the evaluation results obtained by the proposed SEM-SAHPO model and the baseline models on the BOSC and XHSH datasets. To present the evaluation results more intuitively, the results of all models are also depicted as column charts in Figure 2.
To further verify the prediction accuracy of the proposed SEM-SAHPO model, the predicted values of all models are visualized. Figure 3 and Figure 4 show the predicted and true PM2.5 concentrations on the BOSC dataset and the XHSH dataset, respectively. The SEM-SAHPO model captures the inherent trend of the PM2.5 concentration more accurately than the baseline models. Furthermore, the SEM-SAHPO model generates smoother predictions when the PM2.5 concentration fluctuates frequently. Therefore, it can be concluded that the SEM-SAHPO model outperforms the baseline models in accuracy and stability.
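For reference, the four evaluation metrics named in this section can be computed with a few lines of Python. The sketch below uses the textbook definitions; the function names and the example values are illustrative only and are not taken from the experiments. Expressing MAPE in percent is an assumption about the convention used.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted concentrations (illustrative values only).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

MAE and RMSE are in the units of the target, so they should be computed after predictions have been mapped back from the normalized scale if results are to be read in µg/m³.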

Conclusion and Future Work
Accurately predicting PM2.5 concentrations is of great significance for providing early warning of air pollution and addressing the corresponding public health problems. In this study, an ensemble model is proposed to predict PM2.5 concentrations. Specifically, the Bayesian optimization method is applied to optimize the hyper-parameters of the base predictors automatically, and the stacking ensemble method is adopted to integrate the optimized base predictors so that their strengths are combined. The proposed ensemble model was verified on two PM2.5 concentration datasets using four evaluation metrics. The experimental results demonstrated that the proposed model outperforms the baseline models.
Although the proposed SEM-SAHPO model obtains promising prediction results, some aspects can be improved in the future. First, the employed LSVR can be extended with advanced heuristic search algorithms (e.g., the genetic algorithm and the particle swarm optimization algorithm) to shorten the running time of the proposed model and further improve its efficiency. Second, other ensemble methods (e.g., bagging and boosting) can be explored to improve the structure of the proposed model and further enhance its performance. Third, more evaluation indicators can be considered to obtain a more comprehensive evaluation of the predictors. In addition, the proposed model can be applied to predict the concentrations of other air pollutants or to solve forecasting tasks in other fields, such as traffic flow prediction, wind speed prediction, and stock price prediction.