COVID-19 Projections: Single Forecast Model Against Multi-Model Ensemble

The novel coronavirus has unsettled many nations and has created severe uncertainty about its spread. In this paper, we compare the performance of ensemble models and single forecast models in projecting COVID-19 confirmed cases in nine countries. Data consisting of two (2) health indicators (new and cumulative COVID-19 confirmed cases) were collated on May 10, 2020 from the Humanitarian Data Exchange (HDX). Forecasting models with the minimum Mean Square Error (MSE) and Root Mean Square Error (RMSE) were selected. Our findings showed that ETS (A, N, N) was the best-fitting model for China, Spain, South Korea and Ghana for new COVID-19 confirmed cases, while INGARCH (1, 1) was the best-fitting model for the remaining countries. For cumulative COVID-19 confirmed cases, INGARCH (1, 1) was the best fit for each of the nine countries. We also found that single forecasting models outperform hybrid models when the number of data points falls below a certain threshold and the data have no seasonality, which further suggests that hybrid forecast models perform efficiently on complex time series datasets. Results from the 10-day forecast indicate that new COVID-19 confirmed cases will drop in most countries, with the exception of Ghana and India. The study suggests that future work expand the training dataset by augmenting the available data with additional data and then apply hybrid forecasting models to the expanded dataset.


Introduction
The World Health Organization (WHO) declared the COVID-19 outbreak a pandemic on March 11, 2020, after it had spread widely across China (since December 2019) and extended into fifty-one (51) countries by February 28, 2020 [1]. The declaration of a Public Health Emergency of International Concern alerted countries to prepare for containment through active surveillance, early detection, isolation and case management, and contact tracing, in order to enhance prevention and forestall further spread of the novel coronavirus [2].
The vertical (imported cases) and horizontal (local transmission) spread of the COVID-19 pandemic has prompted several efforts by researchers to forecast morbidity, mortality and recoveries using varied models. As a result, many models have been proposed, each tailored to particular forecasting needs. Most of these models were developed in the early stages of the pandemic, when the number of cases was too small to provide a substantial amount of training data for efficient estimates. Researchers agree on the need to explore forecasting methods and tools that promote accurate predictions [3]. Li et al. [1], in their effort to predict the spread of COVID-19 in China, developed an ARIMAX (0, 1, 0) model with an R-squared value of 0.977 and a corresponding Ljung-Box Q (18) test statistic value of 0.987. The researchers concluded from their results that China's emergency intervention measures at the onset of the epidemic had a critical restraining effect on its initial spread.
Zhang [5] found, using a hybrid model for forecasting, that the combined approach has greater potential to improve forecasting accuracy than either of its component models used separately. Wang [6] supported this view, reporting similar findings that hybrid models improved forecasting performance.
In this paper, we compare projections of COVID-19 confirmed cases (both new and cumulative) from single forecast models and multi-model ensembles for nine countries: China, India, Iran, Italy, Spain, Thailand, Turkey, South Korea and Ghana.

Data
Data were collated on May 10, 2020 from the Humanitarian Data Exchange (HDX). The data span January 11, 2020 to May 10, 2020 and consist of two (2) health indicators, new COVID-19 confirmed cases and cumulative COVID-19 confirmed cases, for nine (9) countries: China, India, Iran, Italy, Spain, Thailand, Turkey, South Korea and Ghana.
The dataset was divided into a training set and a testing set, comprising 80% and 20% of the data respectively.
Figure 1 shows the time series plot of daily new infections/cases against time for India, Iran, Italy, Spain and Ghana. The figure shows no consistent trend (upward or downward) over the entire time span for any of these countries. For countries such as Spain, Italy and Iran, the series rose quickly to a peak and then declined slowly. For Ghana and India, the series were still rising and had not yet reached their peak. No seasonality and no obvious outliers were identified for any of these countries.
Figure 2 shows the time series plot of daily new infections/cases against time for China, South Korea, Thailand and Turkey. These series also showed no consistent trend (upward or downward) over the entire time span. The pattern of a quick rise to a peak followed by a slow decline was more pronounced for Turkey than for the other countries. No seasonality was identified for any of these countries, but there was an obvious outlier for China.
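The 80%/20% division can be sketched as follows. This is a minimal illustration, assuming the split is chronological (the earliest 80% of observations for training, the latest 20% for testing), which is the usual choice for time series but is not stated explicitly in the text. The function name and the sample counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into leading (train) and trailing (test) parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    # Assumption: observations are already sorted by date, oldest first.
    cut = int(round(len(series) * train_fraction))
    return list(series[:cut]), list(series[cut:])


# Hypothetical daily new-case counts, for illustration only.
daily_new_cases = [0, 1, 3, 7, 12, 20, 18, 15, 11, 9]
train, test = chronological_split(daily_new_cases)
print(len(train), len(test))
```

Keeping the test set at the end of the series means every model is evaluated on dates it has not seen, which mirrors how a real forecast would be used.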

Model Estimation
We compared several single forecasting models with ensemble/hybrid forecasting models. Owing to space limitations, only forecasting models with similar or close forecasting accuracy metrics are covered in this study. In this regard, we compared the ARIMA forecasting model, the Theta forecasting model, Simple Exponential Smoothing and the Integer Valued GARCH (INGARCH) model.
We developed several candidate time series models based on the techniques stated in the foregoing paragraph, using 80% of the dataset for each selected country. The fitted models were then used to forecast over a horizon equal to the length (the number of observations) of the remaining 20% of the dataset. We then estimated the Mean Square Error (MSE) and the Root Mean Square Error (RMSE) between the forecasted values and the remaining 20% of the dataset. The forecasting models with the minimum MSE and RMSE were selected. We also forecasted ten (10) days of new COVID-19 cases for the nine (9) countries using the selected models.
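The selection step described above can be sketched in a few lines. This is an illustrative sketch only: the function names, model labels and numbers are hypothetical, and the candidate forecasts are assumed to have already been produced by models fitted on the training portion.

```python
import math
from typing import Dict, Sequence, Tuple


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Square Error between held-out observations and forecasts."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error: the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


def select_best(
    actual: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the name and RMSE of the candidate with the smallest error."""
    scores = {name: rmse(actual, fc) for name, fc in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical test-set values and forecasts, for illustration only.
held_out = [10, 12, 14]
forecasts = {"model_a": [11, 12, 13], "model_b": [10, 15, 14]}
print(select_best(held_out, forecasts))
```

Because RMSE is a monotone transformation of MSE, ranking the candidates by either metric selects the same model.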
Mathematical Formulation

ETS
Consider an observed time series $Y_1, Y_2, \ldots, Y_n$. The Simple Exponential Smoothing equation formally takes the form:
$$\hat{Y}_{t+1} = \alpha Y_t + (1 - \alpha)\hat{Y}_t$$
where $Y_t$ is the actual, known series value for time period $t$, $\hat{Y}_t$ is the forecast value of the variable $Y$ for time period $t$, $\hat{Y}_{t+1}$ is the forecast value for time period $t + 1$ and $\alpha$ is the smoothing constant [7].
The forecast is based on weighting the most recent observation with a weight $\alpha$ and weighting the most recent forecast with a weight of $1 - \alpha$. For details of how this forecasting method works see [7].
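The weighting scheme just described can be written as a short recursion. This is a minimal sketch: the smoothing constant is called `alpha` here, and starting the recursion from the first observation is an assumption of this example (other initializations exist, and the text does not say which was used).

```python
from typing import List, Optional, Sequence


def ses_forecasts(
    y: Sequence[float], alpha: float, initial: Optional[float] = None
) -> List[float]:
    """One-step-ahead Simple Exponential Smoothing forecasts.

    Element i of the result is the forecast for period i + 1, so the last
    element is the forecast for the first period after the observed data.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(y) == 0:
        raise ValueError("y must contain at least one observation")
    # Assumption: the first forecast defaults to the first observation.
    forecasts = [y[0] if initial is None else initial]
    for observation in y:
        # New forecast = alpha * latest observation + (1 - alpha) * latest forecast.
        forecasts.append(alpha * observation + (1.0 - alpha) * forecasts[-1])
    return forecasts


# Hypothetical counts, for illustration only.
print(ses_forecasts([10, 20, 30], alpha=0.5))
```

Since this model has no trend or seasonal component, the final element also serves as the forecast for every later horizon, so a multi-day forecast from it is a flat line.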

Results
The new COVID-19 confirmed cases for the period January 11, 2020 to May 10, 2020 exhibited no consistent trend (upward or downward) over the entire time span for any of the countries. The series rose quickly to a peak and then declined slowly for countries such as Spain, Italy, Iran and Turkey. For Ghana and India, the series of new COVID-19 confirmed cases were still rising and had yet to reach their peak. No seasonality was identified for any country, but there was an obvious outlier for China, which concurs with the findings of [11]. These results are captured in Figures 1 and 2.
Among the fitted time series forecasting models for new and cumulative COVID-19 confirmed cases (Tables 1 and 2), single forecasting models such as the ETS and INGARCH models outperformed ARIMA and the hybrid forecast model combining ARIMA and the Theta model. Owing to the limited amount of data available for the novel COVID-19 and the non-seasonality of the dataset, several single forecasting models and multi-model ensemble models could not be fitted. For instance, Seasonal and Trend decomposition using Loess models require that the input series be seasonal and that it include at least two seasons of data for the decomposition to succeed. Similarly, Neural Network Time Series Forecast models require at least two seasons of data.
Several studies [12][13][14] have shown the efficiency of hybrid forecasting models in improving forecast accuracy. However, owing to the limited amount of data available for the novel COVID-19 and the non-seasonality of the dataset, the single forecasting models outperformed the hybrid or multi-model ensemble models, as evidenced by Tables 1 and 2. These results also suggest that hybrid forecast models perform efficiently on complex time series datasets [13,14].
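For readers unfamiliar with multi-model ensembles, one common way to combine forecasts is an equally weighted average of the individual model forecasts. The sketch below illustrates that idea only; the text does not state how the ARIMA and Theta forecasts were combined in this study, so equal weighting, the function name and the numbers are all assumptions.

```python
from typing import Dict, List, Sequence


def average_forecasts(forecasts: Dict[str, Sequence[float]]) -> List[float]:
    """Combine several forecasts of the same horizon by simple averaging."""
    horizons = {len(fc) for fc in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("need at least one forecast, all of the same length")
    n_models = len(forecasts)
    # Average across models separately for each forecast step.
    return [sum(step) / n_models for step in zip(*forecasts.values())]


# Hypothetical component forecasts, for illustration only.
combined = average_forecasts({"arima": [10.0, 12.0], "theta": [14.0, 16.0]})
print(combined)
```

The combined forecast can then be scored against the test set in the same way as any single model.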
There are several ways to build an accurate forecasting model with a limited or insufficient dataset; three of these approaches are expounded in [12]. We suggest that future studies expand the training dataset by augmenting the available data with additional data and then apply hybrid forecasting models to the expanded dataset.
The selected models, together with their estimated parameters, for new and cumulative COVID-19 confirmed cases for the nine (9) countries are presented in Table 3.
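To make the role of the INGARCH (1, 1) parameters concrete, the sketch below computes the conditional mean recursion under the usual Poisson INGARCH(1,1) definition, in which the expected count for the next period is an intercept plus a weight on the last observed count plus a weight on the last expected count. The formulation is not reproduced in this section of the paper, so the parameter names, the starting value and the numbers are assumptions of this example and are not the estimates reported in Table 3.

```python
from typing import List, Optional, Sequence


def ingarch11_conditional_means(
    y: Sequence[float],
    intercept: float,
    obs_coef: float,
    mean_coef: float,
    initial_mean: Optional[float] = None,
) -> List[float]:
    """Conditional means of an INGARCH(1,1) count model.

    Element i is the expected count for period i + 1; the last element is
    the one-step-ahead forecast beyond the observed data.
    """
    if intercept <= 0 or obs_coef < 0 or mean_coef < 0:
        raise ValueError("intercept must be positive and coefficients non-negative")
    if obs_coef + mean_coef >= 1:
        raise ValueError("obs_coef + mean_coef must be below 1 for stationarity")
    # Assumption: start from the stationary mean unless a value is supplied.
    mean = (
        intercept / (1.0 - obs_coef - mean_coef)
        if initial_mean is None
        else initial_mean
    )
    means = [mean]
    for count in y:
        mean = intercept + obs_coef * count + mean_coef * mean
        means.append(mean)
    return means


# Hypothetical counts and parameters, for illustration only.
print(ingarch11_conditional_means([4, 6], intercept=1.0, obs_coef=0.5, mean_coef=0.25))
```

Unlike ARIMA, this recursion keeps the expected count positive, which is one reason such models are used for case-count data.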
Results from the 10-day forecast indicated a downward trend in new COVID-19 confirmed cases for most of the countries considered in this study, with the exception of Ghana and India, where the forecast indicates that new COVID-19 confirmed cases will rise.

Discussion
Our findings showed that, with the improvement in the size and nature of the count data, INGARCH (1,1) was the best-fitting model for countries such as China, Iran, South Korea and Italy, among others, which hitherto had ARIMA models as their best-fitting models for COVID-19 confirmed cases, as demonstrated in the work of Dehesh et al. [4]. Our findings provide scientific evidence of the need for researchers to keep improving forecasting models as the number of COVID-19 cases increases, so that the models remain efficient.

Conclusion
In this study, we compared several single forecasting models with ensemble/hybrid forecasting models. Specifically, the ARIMA forecasting model, the Theta forecasting model, Simple Exponential Smoothing and the Integer Valued GARCH model were compared. We fitted time series forecasting models using 80% of the dataset for the nine (9) selected countries. The fitted models were used to forecast over a horizon equal to the length (the number of observations) of the remaining 20% of the dataset. The Mean Square Error (MSE) and the Root Mean Square Error (RMSE) between the forecasted values and the remaining 20% of the dataset were estimated, and the forecasting models with the minimum MSE and RMSE were selected. Ten (10) days of new COVID-19 cases for the nine (9) countries were forecasted using the selected models.

Recommendation
We suggest that future studies expand the training dataset by augmenting the available data with additional data and then apply hybrid forecasting models to the expanded dataset. The study also suggests that hybrid forecast models be used for complex time series datasets.