Modeling Monthly Average Temperature of Dhahran City of Saudi Arabia Using ARIMA Models

Temperature is the degree of hotness or coldness of a body, measured here in degrees Celsius. The data used in this work are the average monthly temperatures of Dhahran city, which is located in the Kingdom of Saudi Arabia. The data range from 1951 to 2010: the sample from 1951 to 2008 was used for estimation and model selection, and the sample from 2009 through 2010 was held out for forecasting. Different models were tried, and ARIMA (2, 1, 1) (0, 1, 1)12 was selected as the best model because of its low SIC and AIC values and its low forecast error. The selected model is then used for forecasting.


Introduction
Temperature, as it is commonly known, is the degree of hotness or coldness of a body or region. Because temperature varies from region to region and over time, there is a need to monitor it. According to [1][2], the world has warmed by 0.6±0.2°C over 100 years, so there is a need to predict future climate.
The monthly mean, maximum and minimum temperatures of countries covering 37% of the global land mass were analyzed by [3]. According to [4], the climatic conditions of many countries are changing, and this change is one of the major environmental threats to food production and livelihoods. Homogeneity of annual mean temperature at given stations was analyzed using the cumulative deviation test [5] and the first-order ACF test [6].
Dhahran's climate is characterized by hot, humid summers and long, cold winters. Temperatures can rise to more than 40°C (104°F) in the summer, coupled with extreme humidity (85–100%), given the city's proximity to the Persian Gulf. The highest recorded temperature in Dhahran is 51.1°C (124.0°F). In winter, the temperature rarely falls below −2°C (28°F), with the lowest ever recorded being −5°C (23°F) in January 1964. The Shamal winds usually blow across the city in the early months of the summer, bringing dust storms that can reduce visibility to a few metres. These winds can last for up to six months.
Several authors have studied climate variability in different countries. The climate of Bahrain during the past six decades, principally its temperature and rainfall trends, was studied by [7], which demonstrated enormous climate variability, represented by alternating hot-dry and cool-wet events. The mean, maximum and minimum surface air temperatures recorded at 70 climatic stations in Turkey during the period 1929-1999 were evaluated by [8]. Climate change also has the potential to affect all natural systems, thereby becoming a threat to human development and survival socially, politically, and economically [10].

Methodology
Time series models have been very useful in studying the behavior of a process over a period of time. They have wide applications, which include sales forecasting, weather forecasting and inventory studies. In decisions that involve uncertainty about the future, time series models have been found to be among the most effective methods of forecasting.

Autoregressive Moving Averages (ARMA)
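An ARMA process of order (p, q) is a stationary process $X_t$ that satisfies the relation
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (1)$$
where $\{\varepsilon_t\}$ is a white noise. In lag form, with $B$ the backshift operator ($B X_t = X_{t-1}$), equation (1) becomes
$$\phi(B) X_t = \theta(B) \varepsilon_t.$$
Also,
$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p \qquad (2)$$
Also,
$$\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q \qquad (3)$$
Using (2) and (3), the model can also be written as
$$\pi(B) X_t = \varepsilon_t, \qquad \pi(B) = \frac{\phi(B)}{\theta(B)},$$
where $\pi(B)$ is called the pi-weight.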
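As a small illustration of the pi-weights, the sketch below expands pi(B) = phi(B) / theta(B) term by term. It assumes the sign convention phi(B) = 1 - phi_1 B - ... - phi_p B^p and theta(B) = 1 + theta_1 B + ... + theta_q B^q. The function name pi_weights and the coefficient values 0.5 and 0.4 are hypothetical and are not taken from the fitted model.

```python
def pi_weights(phi, theta, n_terms):
    """Return c_0 .. c_{n_terms-1}, the coefficients of pi(B) = phi(B) / theta(B).

    Assumed convention: phi(B) = 1 - phi_1 B - ... and theta(B) = 1 + theta_1 B + ...
    """
    phi_poly = [1.0] + [-a for a in phi]  # coefficients of phi(B)
    coeffs = []
    for j in range(n_terms):
        target = phi_poly[j] if j < len(phi_poly) else 0.0
        # Match the coefficient of B^j on both sides of theta(B) * pi(B) = phi(B).
        carried = sum(theta[i - 1] * coeffs[j - i]
                      for i in range(1, min(j, len(theta)) + 1))
        coeffs.append(target - carried)
    return coeffs


# Hypothetical ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.4.
weights = pi_weights(phi=[0.5], theta=[0.4], n_terms=4)
print(weights)
```

For these values the expansion is 1 - 0.9B + 0.36B^2 - 0.144B^3 + ..., so the weights decay geometrically, as expected when the moving average coefficient is smaller than one in absolute value.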

Autoregressive Integrated Moving Average Process (ARIMA)
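A process $X_t$ is said to be an autoregressive integrated moving average process of order (p, d, q) if its $d$-th difference is an ARMA (p, q) process. An ARIMA (p, d, q) model can be defined by:
$$\phi(B)(1 - B)^d X_t = \theta(B) \varepsilon_t$$
where p, d, q are non-negative integers, $B$ is the backshift operator, $\phi(B)$ and $\theta(B)$ are the autoregressive and moving average polynomials of orders p and q, and $\{\varepsilon_t\}$ is white noise. Note: when d = 0, the ARIMA (p, d, q) model becomes ARMA (p, q).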
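The role of d can be seen with a minimal sketch of repeated differencing. The helper name difference and the quadratic test series are illustrative assumptions, not part of the analysis reported here.

```python
def difference(series, order=1):
    """Apply the operator (1 - B) `order` times; each pass drops one observation."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Synthetic series with a quadratic trend: t^2 for t = 1..6 (not the Dhahran data).
quadratic = [t * t for t in range(1, 7)]
once = difference(quadratic, order=1)   # still trending
twice = difference(quadratic, order=2)  # constant, so the trend is removed
print(once, twice)  # [3, 5, 7, 9, 11] [2, 2, 2, 2]
```

With order=0 the series is returned unchanged, which corresponds to the ARMA (p, q) case noted above.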

Diagnostic Checking
The Box-Jenkins [9] methodology requires examining the residuals, that is, the actual values minus those estimated by the model. The model is assumed to be appropriate if its residuals are random. If the residuals are not random, another model is entertained and its parameters are estimated, and its residuals are checked for randomness in turn. Several tests (e.g., the Box-Pierce test, the Ljung-Box test and the Shapiro test) have been suggested to help users determine whether the residuals are, overall, random. Although it is standard statistical procedure not to use models whose residuals are not random, it might be interesting to test the consequences of a lack of residual randomness on post-sample forecasting accuracy.
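One of the randomness checks named above, the Ljung-Box statistic Q = n(n + 2) sum_{k=1..h} r_k^2 / (n - k), can be sketched as follows. The function names and the alternating test series are assumptions made for illustration; the series is deliberately non-random and is not output from any model in this paper.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation r_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic computed over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Synthetic, strongly patterned "residuals" (alternating signs), not model output.
patterned = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
q_stat = ljung_box_q(patterned, max_lag=1)
CHI2_95_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom
print(q_stat, q_stat > CHI2_95_DF1)
```

Here Q is about 20.9, far above the critical value, so randomness is rejected. When the test is applied to residuals of a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA parameters.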

Result and Discussion
The data used in this paper are the average monthly temperatures of Dhahran city, which is located in the Kingdom of Saudi Arabia. The data collected range from 1951 to 2010. The sample from 1951 to 2008 was used for estimation and for choosing the best model, and the sample from 2009 through 2010 was held out for the forecast.
[Figure: TEMPERATURE, monthly series, 1951 to 1960]
The data set in Figure 1 is restricted to the period 1951 to 1960 in order to reveal its monthly seasonal pattern in more detail. There is no need to model a trend here, since the series shows no trace of one.
The reduced data set shows that the series is seasonal, and some cyclical variation can also be noticed. The seasonality is now modeled using dummy variables. Table 1 shows the estimation results. The 12 seasonal dummies account for more than 96 percent of the variation in Dhahran monthly temperature, as R² = 0.965633. At least some of the remaining variation is cyclical, which the final model is designed to capture. The low Durbin-Watson statistic is also evidence of serial correlation, which needs to be removed.
The residuals of the model in Figure 3 appear to fluctuate randomly around a mean of zero. There is still some non-random pattern in the residuals over the period 1970-1977, which shows that the model still needs some adjustment. There is also a need to check the correlogram of the residuals to see whether they are white noise. The correlogram shows that the residuals are not white noise; it is therefore not advisable to forecast with this model, since its forecast errors would be predictable. There is a need to check whether the series is stationary using the Dickey-Fuller statistic. Table 3 shows that the series is not stationary, since the test statistic is not significant; hence there is a need to check the correlogram of the series. After a series of models were tried, we arrived at ARIMA (2, 1, 1) (0, 1, 1)12 as the best model because of its low SIC and AIC values and its low forecast error. The estimation output of this model is interpreted below.
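The split between the estimation sample and the hold-out sample can be sketched as below. The record layout (year, month, value) and the function name split_by_year are assumptions; the value field is left empty because the observations themselves are not reproduced here.

```python
def split_by_year(records, last_estimation_year):
    """Split (year, month, value) records into estimation and hold-out samples."""
    estimation = [r for r in records if r[0] <= last_estimation_year]
    holdout = [r for r in records if r[0] > last_estimation_year]
    return estimation, holdout


# Placeholder records: one entry per month from 1951 to 2010, with no actual values.
records = [(year, month, None) for year in range(1951, 2011) for month in range(1, 13)]
estimation, holdout = split_by_year(records, last_estimation_year=2008)
print(len(estimation), len(holdout))  # 696 24
```

A complete monthly record would therefore give 696 observations for estimation and 24 for the forecast comparison.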
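A regression on 12 monthly dummies with no intercept assigns each month its sample mean, so the seasonal fit, its R-squared and its Durbin-Watson statistic can be sketched without a regression library. The function names and the two-year synthetic series below are assumptions for illustration only; they do not reproduce Table 1.

```python
def fit_seasonal_dummies(series, period=12):
    """OLS on one dummy per month (no intercept): each coefficient is that month's mean."""
    means = []
    for m in range(period):
        values = series[m::period]
        means.append(sum(values) / len(values))
    fitted = [means[t % period] for t in range(len(series))]
    residuals = [y - f for y, f in zip(series, fitted)]
    return means, residuals


def r_squared(series, residuals):
    """Share of the variation around the overall mean explained by the fit."""
    mean = sum(series) / len(series)
    total = sum((y - mean) ** 2 for y in series)
    return 1.0 - sum(e * e for e in residuals) / total


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


# Synthetic data: a made-up monthly pattern, with the second year 2 degrees warmer.
pattern = [17, 18, 22, 27, 32, 35, 37, 36, 33, 29, 23, 18]
series = [float(p) for p in pattern] + [p + 2.0 for p in pattern]
means, residuals = fit_seasonal_dummies(series)
r2 = r_squared(series, residuals)
dw = durbin_watson(residuals)
print(r2, dw)
```

In this toy case the dummies explain about 98 percent of the variation, yet the residuals keep the same sign for a whole year and the Durbin-Watson statistic is about 0.17, far below 2. This is the same combination of a high R-squared and strong serial correlation described above.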
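As an aside on the selection step, the two information criteria can be computed from a model's Gaussian log-likelihood as AIC = -2 ln L + 2k and SIC = -2 ln L + k ln n, where k is the number of estimated parameters and n the number of observations. The sketch below uses hypothetical residual sums of squares and parameter counts; none of the numbers come from the models tried in this paper.

```python
import math


def gaussian_log_likelihood(ssr, n):
    """Concentrated Gaussian log-likelihood for n residuals with sum of squares ssr."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(ssr / n))


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * k


def sic(log_likelihood, k, n):
    """Schwarz information criterion (also known as BIC)."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical candidates: (label, residual sum of squares, number of parameters).
n_obs = 600
candidates = [("model A", 900.0, 3), ("model B", 898.0, 6)]
scores = {}
for label, ssr, k in candidates:
    log_lik = gaussian_log_likelihood(ssr, n_obs)
    scores[label] = (aic(log_lik, k), sic(log_lik, k, n_obs))

best_by_aic = min(scores, key=lambda label: scores[label][0])
best_by_sic = min(scores, key=lambda label: scores[label][1])
print(best_by_aic, best_by_sic)  # model A model A
```

Model B fits slightly better but uses three more parameters, and both criteria penalize it enough to prefer model A. Some packages report these criteria divided by the number of observations, which does not change the ranking when all candidates use the same sample.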

Interpretation of the Output of the Preferred Model
The dependent variable was DLSTEMP (the temperature series after taking the first trend difference and the first seasonal difference), while the independent variables were AR(1), MA(1) and SMA(12). All the independent variables were significant because their p-values were nearly zero. The R-squared value shows that about 64.4 percent of the variance of the dependent variable (DLSTEMP) is explained by the variables included in the model (AR(1), MA(1) and SMA(12)). R-squared measures the in-sample success of the regression equation in predicting DLSTEMP. The Durbin-Watson value is about 2.020834, which indicates that there is no serial correlation in the residuals. The values of SIC and AIC are the lowest compared with the other models tried. We now check its residuals and correlogram before using it for forecasting.
The residuals for the ARIMA (2, 1, 1) (0, 1, 1)12 model are presented in Figure 4. They fluctuate randomly around a mean of zero, which is good for our model. There is no serial correlation in the residuals, but whether they are white noise needs to be checked before the model is used for the forecast. The invertibility condition for the MA (1) and SMA (12) processes also holds: the inverses of all of the roots must be inside the unit circle. The forecast graph is very good, since all the actual values and their estimates lie within the 95% confidence band, and the actual values and their estimates are not far from each other.
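The construction of the dependent variable can be sketched as follows, assuming that DLSTEMP denotes (1 - B)(1 - B^12) applied to the temperature series, as the description above suggests. The helper name lag_difference and the synthetic series are illustrative assumptions.

```python
def lag_difference(series, lag):
    """Apply (1 - B^lag): subtract the value observed `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic monthly series: linear trend plus a fixed 12-month pattern (not the Dhahran data).
pattern = [17, 18, 22, 27, 32, 35, 37, 36, 33, 29, 23, 18]
series = [2 * t + pattern[t % 12] for t in range(36)]

seasonal_then_regular = lag_difference(lag_difference(series, 12), 1)
regular_then_seasonal = lag_difference(lag_difference(series, 1), 12)
print(seasonal_then_regular == regular_then_seasonal, set(seasonal_then_regular))  # True {0}
```

The two differences can be taken in either order, and together they remove both the linear trend and the fixed seasonal pattern. Thirteen observations are lost in the process, so 36 months leave 23 differenced values.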
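The invertibility condition can be checked numerically with a short sketch. It assumes the moving average part is written as (1 + theta B)(1 + Theta B^12), so the inverse roots are -theta for the MA (1) factor and the twelve solutions of w^12 = -Theta for the SMA (12) factor. The coefficient values are hypothetical, since the estimated values are not reproduced here.

```python
import cmath


def seasonal_ma_inverse_roots(coef, period=12):
    """Inverse roots w of 1 + coef * B^period, i.e. the solutions of w^period = -coef."""
    radius, angle = cmath.polar(complex(-coef))
    return [cmath.rect(radius ** (1.0 / period), (angle + 2.0 * cmath.pi * k) / period)
            for k in range(period)]


def is_invertible(ma_coef, sma_coef, period=12):
    """True when every inverse root of (1 + ma_coef*B)(1 + sma_coef*B^period) has modulus < 1."""
    inverse_roots = [complex(-ma_coef)] + seasonal_ma_inverse_roots(sma_coef, period)
    return all(abs(w) < 1.0 for w in inverse_roots)


# Hypothetical coefficients, chosen only to show one passing and one failing case.
print(is_invertible(ma_coef=-0.6, sma_coef=-0.9))  # True
print(is_invertible(ma_coef=-0.6, sma_coef=-1.2))  # False
```

For the seasonal factor all twelve inverse roots share the modulus |Theta|^(1/12), so the check reduces to requiring both coefficients to be smaller than one in absolute value.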

Conclusion
After a series of models were tried, we arrived at ARIMA (2,1,1) (0,1,1)12 as the best model because of its low SIC and AIC values and its low forecast error. It has been observed that if only a seasonal dummy model is fitted, the forecast errors are predictable, as indicated by its low Durbin-Watson statistic. Dhahran average monthly temperature is better fitted with the seasonal ARIMA (2,1,1) (0,1,1)12 model: in addition to the advantages mentioned above, it has no serial correlation, its forecast is good, and its forecast errors are not predictable beforehand.
[Figure: forecast for 2009 to 2010 by quarter (I to IV); series LOWERJ1, DLSTEMP, UPPERJ1 and YHATJ1]