Time Series Analysis of Monthly Average Temperature and Rainfall Using Seasonal ARIMA Model (in Case of Ambo Area, Ethiopia)

Forecasting mean temperature and rainfall is important for planning and formulating agricultural strategies. This paper therefore analyzes and forecasts monthly mean temperature and rainfall in the Ambo area using data from January 2012 to March 2019. Graphical analysis of the time plots and the autocorrelation function (ACF) suggests that both series have a seasonal component. For that reason, Seasonal Autoregressive Integrated Moving Average (SARIMA) models were used to estimate and forecast the average monthly temperature and rainfall in the Ambo area, Ethiopia. Among the competing tentative models, SARIMA (2, 0, 1) (2, 0, 1) 12 and SARIMA (1, 0, 1) (1, 0, 1) 12 are the best time series models for fitting and forecasting mean temperature and rainfall, respectively. Moreover, diagnostic tests on the residuals of SARIMA (2, 0, 1) (2, 0, 1) 12 for mean temperature and SARIMA (1, 0, 1) (1, 0, 1) 12 for rainfall show that they satisfy the randomness, independence, normality, and constant variance (homoscedasticity) assumptions. Finally, SARIMA (2, 0, 1) (2, 0, 1) 12 and SARIMA (1, 0, 1) (1, 0, 1) 12 were used to forecast monthly mean temperature and rainfall for the period April 2019 to March 2023.


Introduction
Climate change has become a critical issue for policy makers, climate researchers, politicians, and the public around the world. Climate variability is regarded as the deviation of seasonal and annual climate parameters (e.g., rainfall, temperature, humidity, and precipitation) from their long-term observed mean. Long-term continuous temporal changes or trends in annual, seasonal, and monthly climate parameters are regarded as indicators of potential climate change impacts [1].
Climate variability and change are among the major environmental challenges of the 21st century. Successive reports of the Intergovernmental Panel on Climate Change [2] and various other studies [3,4] show that climate change is having wide-ranging effects on development, particularly on agriculture.
The consequences of climate variability and climate change are potentially more significant for the poor in developing countries than for those living in nations that are more prosperous. Africa is one of the most vulnerable continents to climate change and variability. It has more climate sensitive economies than any other continent with 50% of its population living in dry land areas that are drought-prone. In addition, its agricultural sector contributes an average 21% of GDP in many countries, ranging from 10% to 70% [5].
Ethiopia is heavily dependent on rain-fed agriculture, and its geographical location and topography, in combination with low adaptive capacity, entail a high vulnerability to adverse impacts of climate change. Regional projections of climate models indicate a substantial rise in mean temperatures in Ethiopia over the 21st century and an increase in rainfall variability, with a rising frequency of both extreme flooding and droughts due to global warming. Given its large role in income and employment, agriculture also acts as a transmission chain of climate shocks towards other sectors of the economy. Ethiopia is historically prone to extreme weather events. Rainfall in Ethiopia is highly erratic, and most rain falls in convective storms, with very high rainfall intensity and extreme spatial and temporal variability [6].
In this regard, several studies have analyzed the pattern and trend of climate variation in various regions of the world using different time series methods. Among the common models are the Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models [7].
Therefore, this paper addresses shortcomings in the analytical literature on climate change by using a seasonal ARIMA model to analyze the pattern and trend, fit an appropriate model, and forecast future values of climate data (mean temperature and rainfall) in the Ambo area, Ethiopia.

Literature Review
George Box and Gwilym Jenkins popularized Autoregressive Integrated Moving Average (ARIMA) models in the early 1970s. ARIMA models are a class of linear models capable of representing stationary as well as non-stationary time series. The models rely heavily on autocorrelation patterns in the data. The ARIMA methodology differs from most forecasting methods because it does not assume any particular pattern in the historical data of the series to be forecast; rather, it uses an iterative approach of identifying a possible model from a general class of models. In this regard, several studies of weather variability have used ARIMA-type models. A few of them are discussed below.
Tektaş, M. forecasts the weather of Göztepe, İstanbul, Turkey using data from 2000-2008 comprising daily average temperature (dry-wet), air pressure, and wind speed, using Autoregressive Integrated Moving Average (ARIMA) models. The paper also explains briefly how neuro-fuzzy models can be formulated using different learning methods and then analyzes whether they can provide the level of performance required of a reliable model for practical weather forecasting. The most suitable model and network structure are determined according to prediction performance, reliability, and efficiency. The performance of the models is compared using the root-mean-square error (RMSE) criterion [8].
Similarly, Ademola, A. et al. investigate statistical modeling of monthly rainfall at selected stations in the forest and savannah eco-climatic regions of Nigeria using Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models. The results showed that the models fitted the data well and that the stochastic seasonal fluctuation was successfully modeled. They concluded that the SARIMA model was a proper method for modeling and predicting monthly rainfall. The results are useful for forecasting the pattern of rainfall in the study area and provide information that would help decision makers formulate policies to mitigate the problems of water resources management, soil erosion, flooding, and drought [9].
Following literature such as that discussed above, we apply the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to analyze and forecast mean temperature and rainfall in the Ambo area. However, this study differs from previous studies in that it is considered the first to apply a SARIMA model to the analysis and forecasting of weather data in the Ambo area.

Study Area and Data
The study was conducted in the Ambo area, West Shoa Zone of Oromia Regional State, Ethiopia. West Shoa Zone is located west of Addis Ababa, the capital city of Ethiopia. Ambo, the capital of West Shoa Zone, is 114 km from Addis Ababa and lies between 8°47' and 9°21' North latitude and 37°32' and 38°3' East longitude. The Zone has a unimodal rainfall pattern, with one significant rainy season and one rainfall peak. The main rainy season is from June to September, and the short rainy season is from the end of February to the beginning of May. In this study, monthly average temperature and rainfall were used. The data on temperature and rainfall were collected from the Ambo University meteorological station, Ethiopia. The dataset consists of 87 monthly observations, from January 2012 to March 2019, on near-surface mean temperature and rainfall. Temperature is measured in degrees Celsius, while monthly rainfall is measured in millimeters (mm).

Stationary Test
The foundation of time series analysis is stationarity. Stationary series are characterized by a kind of statistical equilibrium around a constant mean level as well as a constant dispersion around that mean level [10]. A series can be non-stationary because of a random walk, drift, or trend.
Several statistical tests may be conducted to determine whether a series is non-stationary (has a unit root). The Dickey-Fuller unit root test, which is used to test an AR (1) process, is among the most common. However, if the series is correlated at higher-order lags, the assumption of white noise disturbances is violated. Thus, the Augmented Dickey-Fuller (ADF) test was used, since it controls for higher-order correlation by adding lagged difference terms of the dependent variable to the right-hand side of the regression. The two common problems in performing the ADF test are specifying the number of lagged first-difference terms and choosing whether to include a constant, a constant and a linear time trend, or neither in the test regression. A specification of the ADF test with drift (constant) in the test regression is:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta y_{t-i} + \varepsilon_t$$
A specification of the ADF test with drift (constant) and trend in the test regression is:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta y_{t-i} + \varepsilon_t$$
where $y_t$ is the series, $\Delta y_t = y_t - y_{t-1}$, k is the number of lagged difference terms, and $\varepsilon_t$ is a white noise error. The null hypothesis of a unit root is $\gamma = 0$. If the series seems to fluctuate around a zero mean, neither a constant nor a trend should be included in the test regression.
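The mechanics of the unit root regression can be sketched in a few lines. The sketch below is a simplification, not the procedure used in this study: it fits the plain Dickey-Fuller regression with a constant and no lagged difference terms, on simulated data. The function name `dickey_fuller_tstat` and both simulated series are hypothetical. The resulting t-statistic has to be compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level for a regression with a constant and about 100 observations), not with the usual normal or t tables.

```python
import math
import random

def dickey_fuller_tstat(y):
    """t-statistic on gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                             # lagged level y_{t-1}
    dy = [curr - prev for prev, curr in zip(y[:-1], y[1:])]  # first differences
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                      # OLS slope
    alpha = dy_bar - gamma * x_bar                         # OLS intercept
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)              # standard error of the slope
    return gamma / se_gamma

# Hypothetical data: a stationary AR(1) series and a random walk, 87 points each.
rng = random.Random(0)
stationary = [0.0]
for _ in range(86):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))
random_walk = [0.0]
for _ in range(86):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))

t_stationary = dickey_fuller_tstat(stationary)
t_walk = dickey_fuller_tstat(random_walk)
print(t_stationary, t_walk)
```

A strongly negative statistic is evidence against the unit root. The augmented version adds the lagged differences as extra regressors, which requires a multiple regression routine rather than the closed-form slope used here.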

Model Specifications
One of the most popular and frequently used stochastic time series models is the Autoregressive Integrated Moving Average (ARIMA) model developed by Box and Jenkins (1970). The basic assumption made to implement this model is that the considered time series is linear and follows a particular known statistical distribution, such as the normal distribution.
The ARIMA model has several components: the autoregressive (AR), integrated (I), and moving average (MA) components.
I. Autoregressive (AR) Models
Autoregressive (AR) refers to a process in which the value of a series at the current time period is a function of its immediately previous value plus some error.
A general p-th order autoregressive, or AR (p), process is written as follows:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
where $y_t$ is the actual value of the series at time period t, $c$ and $\phi_1, \dots, \phi_p$ are coefficients, p is a non-negative integer that indicates the lag length, and $\varepsilon_t$ is assumed to be a white noise error term.
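As an illustration of the AR (p) recursion, the following minimal sketch simulates such a process. The function name `simulate_ar`, the coefficient values, and the choice to start the series at its unconditional mean c / (1 - sum of the AR coefficients) are assumptions made for the example, not values from this study.

```python
import random

def simulate_ar(c, phi, n, sigma=1.0, seed=0):
    """Simulate y_t = c + sum_i phi[i] * y_{t-1-i} + eps_t with Gaussian white noise."""
    rng = random.Random(seed)
    p = len(phi)
    mean = c / (1.0 - sum(phi))          # unconditional mean of a stationary AR(p)
    y = [mean] * p                       # start-up values set to the mean
    for _ in range(n):
        lagged = sum(phi[i] * y[-1 - i] for i in range(p))
        y.append(c + lagged + rng.gauss(0.0, sigma))
    return y[p:]                         # drop the start-up values

# Hypothetical AR(2) with 87 observations, matching the length of the study sample.
series = simulate_ar(c=2.0, phi=[0.5, 0.3], n=87)
print(len(series), sum(series) / len(series))
```

With the noise switched off (sigma = 0) the series stays at its mean, which is a quick way to check the recursion.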
II. Moving-Average (MA) Models
A moving-average process is one in which a random error, innovation, or shock, $\varepsilon_{t-1}$, at a previous period plus a shock at the current time, $\varepsilon_t$, drives the series to yield an output value of $y_t$ at time t.
A more general q-th order moving average, or MA (q), process is written as:
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
where q is a non-negative integer that refers to the order of the model, $\theta_1, \dots, \theta_q$ are coefficients, $\mu$ is the constant term, and $\{\varepsilon_t\}$ is assumed to be a white noise error term (shock) with mean zero and constant variance $\sigma^2$.
III. The Autoregressive Moving Average (ARMA) Models
An autoregressive moving average, ARMA (p, q), model is a combination of autoregressive AR (p) and moving average MA (q) models and is suitable for modeling a univariate stochastic time series. The ARMA (p, q) process is given by:
$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
where the model orders p and q refer to p autoregressive and q moving average terms, $\phi_i$ and $\theta_j$ are the AR and MA coefficients, and $\varepsilon_t$ is zero-mean white noise.
IV. Autoregressive Integrated Moving Average (ARIMA) Models
The ARMA models described above can only be used for stationary time series data. However, in practice many time series show non-stationary behavior. Because the Box-Jenkins method is an analysis in the time domain applied to stationary series data, it is necessary to consider the basis of non-stationarity, with a view toward transforming the series to stationarity. When a non-stationary series is characterized by a random walk, each subsequent observation of the series randomly wanders from the previous one. A random walk becomes stationary after differencing, and a series that must be differenced d times to become stationary is said to be integrated of order d. Thus, the basic processes of the Box-Jenkins ARIMA (p, d, q) model include the autoregressive process, the integrated process, and the moving average process.
Mathematically, the ARIMA (p, d, q) model using lag polynomials is given by:
$$\phi(L)(1 - L)^d y_t = c + \theta(L)\varepsilon_t$$
with $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ and $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$, where L is the lag operator ($L y_t = y_{t-1}$) and the three parameters are: d = number of differences required for stationarity, p = order of the AR component, and q = order of the MA component.
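The integrated part of the model is just repeated differencing. A minimal sketch follows; the helper name `difference` and its `lag` argument are illustrative additions, with `lag=1` giving the ordinary difference (1 - L) and `lag=12` the monthly seasonal difference.

```python
def difference(y, d=1, lag=1):
    """Apply (1 - L^lag)^d to the series y and return the shortened series."""
    out = list(y)
    for _ in range(d):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out

# A quadratic trend needs d = 2 ordinary differences to become constant.
print(difference([1, 4, 9, 16, 25], d=2))   # [2, 2, 2]

# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
monthly = [2 * t + (t % 12) for t in range(36)]
seasonally_differenced = difference(monthly, d=1, lag=12)
```

Each pass of differencing shortens the series by `lag` observations, which matters for short samples such as the 87 months used here.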

V. Seasonal Autoregressive Integrated Moving Average (SARIMA) Models
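Seasonality usually causes the series to be non-stationary because the average values at some particular times within the seasonal span (for example, a month) may be different from the average values at other times. Box and Jenkins (1970) generalized the ARIMA model into the Seasonal ARIMA (SARIMA) model to deal with seasonality. The SARIMA (p, d, q) × (P, D, Q) S model in terms of lag polynomials is given by:
$$\phi(L)\,\Phi(L^S)\,(1 - L)^d (1 - L^S)^D y_t = c + \theta(L)\,\Theta(L^S)\,\varepsilon_t$$
where $\phi(L)$ and $\theta(L)$ are the non-seasonal AR and MA lag polynomials of orders p and q, $\Phi(L^S)$ and $\Theta(L^S)$ are the seasonal AR and MA lag polynomials of orders P and Q, L is the lag operator, p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of the repeating seasonal pattern.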
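Because the seasonal and non-seasonal polynomials multiply, a SARIMA model implies lags that neither factor contains on its own. The sketch below expands the product of a non-seasonal AR(1) factor and a seasonal AR(1) factor for S = 12. The coefficient values 0.5 and 0.4 and the helper `poly_mul` are hypothetical, chosen only to show the cross term at lag 13.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists in powers of L."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

phi1, Phi1, S = 0.5, 0.4, 12                      # hypothetical coefficients
nonseasonal_ar = [1.0, -phi1]                     # 1 - 0.5 L
seasonal_ar = [1.0] + [0.0] * (S - 1) + [-Phi1]   # 1 - 0.4 L^12
combined = poly_mul(nonseasonal_ar, seasonal_ar)

# Keep only the lags with a non-zero coefficient.
nonzero = {lag: coef for lag, coef in enumerate(combined) if coef != 0.0}
print(nonzero)
```

The product has terms at lags 0, 1, 12, and 13, so the current value depends on last month, the same month last year, and the month before that.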

Lag Length Selection
ACF and PACF
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are usually shown as graphs of the correlations at different time lags. They can be used to determine whether a series is stationary, whether it has a seasonal pattern, and how many components (lags or parameters) a SARIMA model needs. The number of significant spikes in the ACF indicates the number of MA parameters in the model, while the number of significant spikes in the PACF indicates the number of AR parameters in the model.
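A minimal sketch of how the sample ACF and its approximate 95% significance band (plus or minus 1.96 divided by the square root of the sample size) can be computed is given below. The input is a hypothetical, noise-free 12-month cycle, not the Ambo data, and `sample_acf` is an illustrative helper name.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

# Hypothetical monthly series: seven full years of a pure 12-month cycle.
y = [math.sin(2 * math.pi * t / 12) for t in range(84)]
acf = sample_acf(y, 24)
bound = 1.96 / math.sqrt(len(y))          # approximate 95% band
significant = [k for k, r in enumerate(acf, start=1) if abs(r) > bound]
print(round(acf[5], 3), round(acf[11], 3), significant)
```

For a seasonal series the ACF does not simply die out: it swings negative near lag 6 and peaks again near lags 12 and 24, which is the pattern that motivates a seasonal term of order 12.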

Portmanteau Test
Time series applications often require testing jointly that several autocorrelations of the series are zero. Box and Pierce (1970) propose the Portmanteau statistic given by:
$$Q^*(m) = T \sum_{k=1}^{m} \hat{\rho}_k^2$$
where T = sample size, m = lag length, and $\hat{\rho}_k$ is the sample autocorrelation at lag k. A modification of the $Q^*(m)$ statistic proposed by [11] to increase the power of the test in finite samples is given by:
$$Q(m) = T(T+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{T-k}$$
The decision rule is to reject H0 if $Q(m) > \chi^2_\alpha$, where $\chi^2_\alpha$ denotes the 100 (1−α)th percentile of a chi-squared distribution with m degrees of freedom.
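The modified statistic is straightforward to compute directly. The sketch below is illustrative only: `ljung_box` and `chi2_sf_even` are hypothetical helper names, the residuals are simulated, and the p-value helper uses the closed-form chi-squared tail that holds for an even number of degrees of freedom (m = 12 here). It also ignores the reduction in degrees of freedom that is usually applied when the series consists of residuals from a fitted ARMA model.

```python
import math
import random

def ljung_box(y, m):
    """Q(m) = T (T + 2) * sum_{k=1..m} r_k^2 / (T - k)."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    total = 0.0
    for k in range(1, m + 1):
        ck = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
        total += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * total

def chi2_sf_even(x, df):
    """P(chi-squared with even df exceeds x), using the finite Poisson-type sum."""
    half = x / 2.0
    term, acc = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        acc += term
    return math.exp(-half) * acc

# Hypothetical residual series of the same length as the study sample.
rng = random.Random(1)
residuals = [rng.gauss(0.0, 1.0) for _ in range(87)]
q_stat = ljung_box(residuals, 12)
p_value = chi2_sf_even(q_stat, 12)
print(q_stat, p_value)
```

A p-value above 0.05 means the null of no autocorrelation up to lag 12 is not rejected at the 5% level.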

Model Selection
A crucial step in appropriate model selection is the determination of the optimal model parameters. According to [12], the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are among the common criteria for model selection in time series analysis.
The optimal model order is the one whose number of model parameters minimizes the information criteria.
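As a hedged illustration, the sketch below uses the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln T, where ln L is the maximized log-likelihood, k the number of estimated parameters, and T the sample size. The three candidates and their log-likelihood values are invented for the example and are not results from this study.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: -2 ln L + 2 k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian (Schwarz) Information Criterion: -2 ln L + k ln n."""
    return -2.0 * loglik + k * math.log(n)

n_obs = 87
# Hypothetical candidates: label -> (maximized log-likelihood, number of parameters).
candidates = {
    "candidate A (k=7)": (-120.0, 7),
    "candidate B (k=8)": (-119.5, 8),
    "candidate C (k=5)": (-126.0, 5),
}

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

BIC penalizes each extra parameter by ln T (about 4.47 for T = 87) instead of 2, so it tends to prefer smaller models than AIC when the two disagree.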

Model Estimations
Maximum likelihood estimation is commonly used to estimate ARMA family models. It begins with a likelihood function to maximize (or, equivalently, a negative log-likelihood to minimize). A likelihood function is the joint probability of the observed data viewed as a function of the parameters. When observations are independent of one another, the probability of multiple successive occurrences is the product of their individual probabilities.
The natural log of the likelihood function given the parameter vector (Z) is given by:
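For orientation, under the additional assumption of Gaussian white noise errors, the conditional log-likelihood of an ARMA-type model is commonly written in the textbook form below. It is stated here as an assumption. Z denotes the parameter vector as in the text, while T (sample size), $\sigma^2$ (error variance), and $\varepsilon_t(Z)$ (the model residual at time t for parameters Z) are notation introduced for this illustration.

```latex
\ln L(Z, \sigma^2) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2
                     - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t(Z)^2
```

For a fixed $\sigma^2$, maximizing this expression over Z is the same as minimizing the sum of squared residuals, which is why conditional maximum likelihood and least squares give similar estimates for these models.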

Model Diagnostic
After estimation of the model, the Box-Jenkins model building strategy entails a diagnosis of the adequacy of the model. More specifically, it is necessary to ascertain in what way the model is adequate and in what way it is inadequate.

Test of ARCH Effect
It is a test for determining whether 'ARCH-effects' are present in the residuals of an estimated mean model.

I. Engle Lagrange multiplier test of the ARCH effect
The Lagrange multiplier test of Engle (1982) is equivalent to the usual F test [15]. The test for ARCH errors involves regressing the squared residuals from the best-fitting ARIMA model on a constant and the lagged squared residuals. If there are no ARCH effects present, the individual coefficient estimates should be close to zero, and their joint effect should be statistically insignificant.
II. The Ljung-Box test applied to the ARCH effect
The second test applies the usual Ljung-Box statistic Q (m) to the squared residual series. The null hypothesis is that the first m lags of the ACF of the squared series are zero. The Ljung-Box test, developed by Box and Pierce (1970) and modified by Ljung and Box (1978), tests the joint significance of serial correlation in the standardized residuals, and of heteroscedasticity in the squared standardized residuals, over the first m lags instead of testing the significance of each lag individually. The statistic is:
$$Q(m) = T(T+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{T-k}$$
where T denotes the number of observations and $\hat{\rho}_k$ is the autocorrelation coefficient of the squared errors at lag k, for k = 1, ..., m. Under the null hypothesis, the computed test statistic has a $\chi^2(m)$ distribution. The null of no ARCH effect is rejected if the computed value of the test statistic is greater than the appropriate critical value.
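The regression form of the ARCH test can be sketched for the simplest case of one lag, where the LM statistic is the number of regression rows times the R-squared from regressing the squared residual on a constant and its own first lag. The function name `arch_lm_stat` and the artificial residual series with alternating calm and volatile blocks are assumptions for illustration. The statistic is compared with a chi-squared critical value with one degree of freedom (about 3.84 at the 5% level).

```python
def arch_lm_stat(resid):
    """One-lag ARCH LM statistic: n * R^2 from regressing e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in resid]
    x, y = sq[:-1], sq[1:]                 # lagged and current squared residuals
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0                         # squared residuals are constant: no ARCH signal
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    r_squared = sxy ** 2 / (sxx * syy)     # R^2 of a simple regression
    return n * r_squared

# Hypothetical residuals: blocks of small shocks followed by blocks of large shocks.
clustered = ([1.0, -1.0] * 5 + [3.0, -3.0] * 5) * 2
lm = arch_lm_stat(clustered)
print(lm, lm > 3.841)
```

Here the volatility clustering is deliberate, so the statistic is far above the critical value and the null of no ARCH effect would be rejected.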

Model Forecasting
One of the primary objectives of time series analysis is to forecast future values of the series, and forecasting is of great importance for weather data. In this regard, a decision needs to be made at the current time t, and the optimal decision depends on the expected future value of a random variable, $y_{t+h}$, the value being predicted or forecast. The number of time points forecast into the future is called the lead time or forecast horizon, h. The value of the random variable for such a forecast is the value of $y_{t+h}$. A forecaster would like to obtain a prediction as close as possible to the actual value of the variable in question at the concurrent or future temporal point of interest.
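To make the role of the lead time h concrete, the sketch below iterates point forecasts for the simplest case, an AR (1) model y_t = c + phi * y_{t-1} + e_t, in which each forecast replaces the unknown future shock by its expected value of zero. The function name and the numbers are hypothetical. The seasonal models used in this study follow the same recursive idea with more lags.

```python
def ar1_forecast(last_value, c, phi, horizon):
    """Iterated point forecasts for lead times h = 1, ..., horizon."""
    forecasts = []
    prev = last_value
    for _ in range(horizon):
        prev = c + phi * prev            # the future shock is replaced by its mean, zero
        forecasts.append(prev)
    return forecasts

# Hypothetical values: last observation 14.0, process mean c / (1 - phi) = 10.0.
path = ar1_forecast(last_value=14.0, c=5.0, phi=0.5, horizon=6)
print(path)
```

The forecasts shrink toward the process mean of 10.0 as h grows, which is why long-horizon forecasts from a stationary model look stable.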

Results and Discussions
The monthly data used in this study cover the period from January 1, 2012 to March 1, 2019 and were collected from Ambo University Meteorological Agency.

Graphical Analysis
Both plots show a consistent pattern of short-term fluctuation, which indicates seasonality in both variables. Over the studied period, January 2012 to March 2019, mean temperature and rainfall appear to exhibit a slight (almost no) trend. Thus, from both figures, we observe that the series appear to be stationary at level. The time plot gives only a clue to the properties of the series, however, so a formal unit root test is still needed to test the stationarity of the series.

Descriptive Statistics for Average Temperature and Rainfall Data
From the

Stationarity Test
From Table 2, we observe that the series is stationary at level with a trend and constant term.

Model Identification (Selection) Results
The first step in the Box-Jenkins methodology is to identify (select) the appropriate model. In this study, the correlogram, that is, the autocorrelation and partial autocorrelation functions, was used to identify the model based on its lag structure.
On Figure 5, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of average temperature and rainfall were examined up to lag 72. From the ACF of both graphs, we observe a fast decay in the autocorrelation values, which indicates that rainfall and temperature are stationary. From the ACF, we should use one to two lags for the MA part, since there are two significant spikes and nothing else beyond them. Moreover, the first lag in the PACF of both graphs is very strong, while the remaining lags die off. That is, we have two lags for the AR model for rainfall and one lag for mean temperature.
On Figure 5, the shaded band indicates the 95% confidence interval. From a theoretical view, the ACF of a stationary AR model should decline quickly. Moreover, the low-order autocorrelation coefficients are positive, since the first lag of both the ACF and the PACF points upward.
The ACF and PACF show that both temperature and rainfall are periodic in nature. These functions behave similarly over their periodic cycles, which involve seasonal variations. Since the time series are periodic and involve seasonal variations, we need to apply a model that captures the seasonal component of order 12. Therefore, seasonal ARIMA models should be used for the prediction of average temperature and rainfall. Following the Box-Jenkins technique, we rely on the ACF and PACF plots to select the order of the seasonal model [16].
From Figures 5a and 5b on average temperature data, we can choose our model based on the ACF and PACF spikes at low lags. To determine the nonseasonal AR terms, we look at the PACF, which shows clear spikes at lags 1, 2, and 3. Thus, the nonseasonal AR terms are determined to be of order 3. There is one spike at lag 1 in the ACF, so we have one term for the nonseasonal MA. For the seasonal part of the model, we look at lags 12, 24, 36, and 48 in both the ACF and the PACF. The PACF shows two significant spikes, at lags 12 and 36; thus, the order of the seasonal AR is two. The ACF shows one spike, at lag 12, which means that the order of the seasonal MA is one. By the same analogy, we can select a best model for rainfall based on the ACF and PACF in Figures 5c and 5d. From Figure 5d, the nonseasonal MA term has 2 lags and the seasonal MA term has one lag. Moreover, from Figure 5c, the nonseasonal AR term has one lag and the seasonal AR term has 2 lags (lags 12 and 36).
Therefore, based on the lag lengths identified from the ACF and PACF, we select the best possible ('parsimonious') models to represent the original series. To select the appropriate model among the tentative models, we used the highest log-likelihood statistic and the lowest Akaike information criterion (AIC) and Schwarz information criterion (SIC).

Parameter Estimation
The coefficients of the autoregressive, moving average, seasonal autoregressive, and seasonal moving average terms were estimated by maximum likelihood. The AR coefficients are less than one in absolute value, which is consistent with stationarity of the series. From Tables 5 and 6, we observe that all coefficients are statistically significant, as indicated by their p-values, except the seasonal autoregressive (SAR) term in the mean temperature model and the non-seasonal moving average (MA) term in the mean rainfall model. Therefore, the models perform well overall in terms of the significance of their coefficients.

Model Diagnostics Test
In time-series modeling, the choice of a best-fitting model depends on whether the residuals of the fitted model are well behaved. One of the basic assumptions of the SARIMA (seasonal ARIMA) model is that, for a good model, the residuals must follow a white noise process; that is, the residuals have zero mean, have constant variance (homoscedasticity), and are uncorrelated with past values. A special case is Gaussian white noise, in which the residuals are also normally distributed. It is such a process that we test for here. The model diagnostics were performed through careful examination of the SARIMA (2, 0, 1) (2, 0, 1) 12 model for mean temperature and the SARIMA (1, 0, 1) (1, 0, 1) 12 model for mean rainfall, using the residual plot over time, periodogram, Portmanteau test, histogram, and ARCH LM test on the residuals of the best-fitted models. From Table 7, we observe that almost all the p-values are greater than 5%, which indicates that we fail to reject the null of white noise residuals (no serial correlation, homoscedasticity, and stationarity).

Model Diagnostic for the Fitted Model of Mean Temperature
Portmanteau test
We cannot reject the null of white noise residuals since the p-value (0.9677) is greater than 0.05 (5%). Therefore, the residuals are independently and randomly distributed, which is a desirable property under the assumptions of the SARIMA model.

II. Stability Condition
The graph of the inverse roots of the ARMA polynomial displays the eigenvalues with the real components on the X-axis and the imaginary components on the Y-axis. Since all the eigenvalues lie inside the unit circle, the AR parameters satisfy the stability condition and the MA parameters satisfy the invertibility condition. Thus, the MA process is invertible and can be represented as an infinite-order AR process.
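The stability check can be sketched for a nonseasonal AR (2) factor, where the inverse roots of 1 - phi1 L - phi2 L^2 are the solutions of lambda^2 - phi1 lambda - phi2 = 0. The coefficient values below are hypothetical and are not the estimates from Tables 5 and 6. The helper names are also illustrative.

```python
import cmath

def ar2_inverse_roots(phi1, phi2):
    """Inverse roots of the AR polynomial 1 - phi1*L - phi2*L^2."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)   # complex sqrt handles oscillating cases
    return (phi1 + disc) / 2.0, (phi1 - disc) / 2.0

def is_stable(roots):
    """True when every inverse root lies strictly inside the unit circle."""
    return all(abs(r) < 1.0 for r in roots)

for phi1, phi2 in [(0.5, 0.3), (1.0, -0.5), (0.5, 0.6)]:   # hypothetical coefficients
    roots = ar2_inverse_roots(phi1, phi2)
    print((phi1, phi2), [round(abs(r), 3) for r in roots], is_stable(roots))
```

The second pair gives complex conjugate roots of modulus about 0.707, a stable model with cyclical behavior, while the third pair has a root outside the unit circle and would fail the stability condition.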
Source: Author's Computation

IV. Heteroscedasticity
The ARCH LM test was conducted to test for heteroscedasticity in the residuals of the fitted model. Since the p-values of both the F-statistic and the observed R-squared statistic are greater than 5%, we fail to reject the null of no ARCH effect (no heteroscedasticity) in the residuals of the fitted model for mean temperature. Therefore, our model satisfies the basic assumption of a constant-variance error term.

Model Diagnostic for the Fitted Model of Mean Rainfall
I. Randomness of the Residual from the Fitted Model
From Figure 10, we observe that the residuals are randomly distributed and have an almost stable mean.
The ACF plot of the residuals of the fitted model (Figure 11) shows that the residual autocorrelations are relatively small and not statistically significant. Therefore, the residuals of the fitted model of mean rainfall can be considered randomly distributed.
Portmanteau test
As we observe from Table 9, we cannot reject the null of white noise residuals since the p-value (0.1428) is greater than 0.05 (5%). Therefore, the residuals are independently and randomly distributed, which satisfies the basic assumption of white noise residuals in the SARIMA model.
IV. Heteroscedasticity
From Table 10, the p-value of the ARCH LM test on the residuals of the fitted model of mean rainfall is greater than 5%, which indicates that we fail to reject the null of constant variance (homoscedasticity). Based on the above detailed analysis of residuals, it can be confirmed that the selected SARIMA (2, 0, 1) (2, 0, 1) 12 and SARIMA (1, 0, 1) (1, 0, 1) 12 models satisfy all the diagnostic tests for modeling and forecasting mean temperature and rainfall, respectively. Hence, the two models, SARIMA (2, 0, 1) (2, 0, 1) 12 and SARIMA (1, 0, 1) (1, 0, 1) 12, are considered the best models for forecasting the upcoming monthly temperature and rainfall, respectively, in the Ambo area over a given period of time.

Forecasting Temperature Using the Best Fitted SARIMA Model
We used data from January 2012 to March 2019 to estimate the models, and April 2019 to March 2023 as the forecasting period, as shown in Figures 14 and 15. It appears from Figures 14 and 15 that the selected models are well suited for forecasting the future values of mean temperature and rainfall in the Ambo area, since the forecasted values (blue line) have a pattern similar to that of the actual values (uncolored line). From the figures, we observe that the series remain almost stable over the forecast period and that the forecast mean temperature and rainfall are identical or very close to the actual data. The pattern of mean temperature and rainfall in the Ambo area from April 2019 to March 2023 was observed to be stationary (fluctuating around a constant mean), and hence does not differ from the pattern of the actual series.

Conclusion
Time series analysis is an important technique for analyzing and forecasting weather variables such as temperature and rainfall. In this study, monthly temperature and rainfall data obtained from the Ambo University meteorological station for the period from January 2012 to March 2019 were used to analyze the series. The seasonal autoregressive integrated moving average (SARIMA) model was used to analyze and forecast monthly mean temperature and rainfall in the study area.
The model diagnostics were performed through careful examination of the residuals from the best-fitted models, i.e. SARIMA (2, 0, 1) (2, 0, 1) 12 for monthly mean temperature and SARIMA (1, 0, 1) (1, 0, 1) 12 for monthly mean rainfall. The residuals were found to follow a white noise process: uncorrelated, with a mean of zero and a constant variance. Based on the best-fitted models, monthly mean temperature and rainfall were forecast for the next four years (from April 2019 to March 2023) and appear to be fairly stable.

Recommendation
Based on the results of the study, the following recommendations are made to the concerned stakeholders.
Since the SARIMA models fitted and forecast the weather variables (in our case, monthly mean temperature and rainfall) appropriately for the Ambo area, Ethiopia, concerned bodies can use the results as an input (information). Moreover, the researcher recommends using such models for the analysis of similar data in other areas. However, because weather data are uncertain, the results by themselves might not be decisive. Therefore, the researcher suggests further research based on other models for better results.

Conflicts of Interests
The author declares that he has no competing interests.