Modeling and Forecasting Daily Temperature Time Series in the Memphis, Tennessee

Temperature is an essential weather component because of its tremendous impact on humans and the environment. As a result, temperature forecasting is one of the most widely researched parts of global climate change study. This work analyzes trends and forecasts temperature change to examine transient variations over time, using daily temperature data from January 1, 2016, to November 3, 2019, collected from a weather station located at the Memphis International Airport. The Mann-Kendall (M-K) test, a non-parametric technique, is used to detect trends in the time series. The test revealed that the temperature time series increased by about 0.0030 °F per day, implying that the location is becoming hotter. The other method of analysis is the autoregressive integrated moving average (ARIMA) model, which fits the temperature time series using its three standard processes of identification, diagnosis, and forecasting. Considering the selection criteria, the seasonal autoregressive integrated moving average (SARIMA) (3, 0, 0) (0, 1, 0) 365 model is found to be appropriate for the studied daily temperature data. Finally, the selected model is used to forecast the 50 days after November 3, 2019, and the temperature forecast showed an increasing trend. This observed trend provides an understanding of daily temperature change in the studied area for that specific period.


Introduction
Temperature variation resulting from climate change has become a global concern because of its link to global warming. The fifth IPCC assessment report revealed that the mean temperature increased by 0.85 °C from 1880 to 2012 [1]. Global warming significantly impacts the natural ecology, agricultural production, and human health [2]. Rising temperatures have already intensified drought, flooding, sea level rise, and weather extremes [1]. Furthermore, temperature variations will delay the onset of the monsoon and cause water loss from the soil, reducing crop productivity and lowering surface water and groundwater levels [3]. Because of ocean-atmosphere circulation, land cover use, and other linked characteristics, surface air temperature fluctuates more at the regional scale than at the global average scale [4]. However, because temperature is affected by many climate elements, predicting temperature changes over a projected period remains difficult [5]. Therefore, quantitative analyses of temperature fluctuation are required in order to take appropriate steps to mitigate adverse effects. In temperature forecasting research, time-series analysis is considered an essential direction [6][7][8].
Because the Mann-Kendall (M-K) test handles outliers well, it is frequently employed to discover trends in weather and climate time series data [9,10]. Trend analysis studies have been carried out at various spatiotemporal scales, underscoring the importance of the M-K test. The M-K test was used in a study of Ethiopia's Woleka sub-basin to detect time series trends of precipitation and temperature [11]. In another study, the Mann-Kendall test, Sen's slope estimator, and linear regression were used to examine yearly and seasonal temperature patterns, along with temperature extremes [12]. One study used the M-K test to identify changes in environmental and meteorological features at a Kolkata station from 2002 to 2011, and the M-K test's performance was reliable at the verified significance level [13]. In addition, a study looked at the trend in rainfall time series at fifteen sites in the Swat River basin from 1961 to 2011 using both the non-parametric M-K and SR statistical tests, which provided a daily forecast of the parameters with precision [14]. Given this, it is reasonable to conclude that the Mann-Kendall test is widely used to assess how parameters change over time.
In time series analysis, future values are projected from previous observations of the variable under examination. Numerous studies in hydrology and meteorology have used the ARIMA method to achieve more accurate forecasts, and this method has essentially superseded older statistical techniques [15,16]. Another study used a seasonal ARIMA model for agricultural irrigation and reported a significant level of model fit for strategic planning [17]. Likewise, one study used a SARIMA model to assess the temperature trend in Assam [18]. Moreover, an investigation used the ARIMA model to forecast monthly mean temperature and discovered a falling trend [19]. A substantial number of studies have successfully characterized climate parameters and provided a better understanding of the hydrological system using ARIMA and SARIMA models [4,20,21,22].
This study investigates the daily average temperature time series, detecting trends with the non-parametric M-K test and modeling the series with the ARIMA technique. A SARIMA model is fitted to daily temperature data (Jan 1, 2016 - Nov 3, 2019). The chosen model is used to forecast temperature for the next 50 days from Nov 4, 2019, to Jan 23, 2020, using the Box-Jenkins technique.

Data Collection and Study Area
The data for the temperature time series came from a weather station located at the Memphis International Airport, as seen in Figure 1. These data represent the local weather of Memphis, Tennessee, and provide enough observations to fit the SARIMA model. The data gathered from the station consist of daily temperature readings from Jan 1, 2016, to Nov 3, 2019.

Trend Analysis
The Mann-Kendall test does not assume that the data are normally distributed, and it is less sensitive to outliers. As a result, trend analysis frequently employs the non-parametric M-K test and the Sen slope estimator [9,12]. Using a two-tailed test with a 5% significance level [13], the null and alternative hypotheses are H_0: there is no discernible trend in the time series, and H_1: there is a rising or falling trend [12]. The Mann-Kendall test statistics can be determined using equations (1) and (2).

$$S = \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} \operatorname{sgn}(x_i - x_k) \tag{1}$$

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{m} e_i (e_i - 1)(2 e_i + 5)}{18} \tag{2}$$

where x_i and x_k are sequential data values in the series (i > k); n is the sample size; e_i is the number of ties at the ith value, and m is the number of tied values. Z_C, the standard test statistic, was calculated as follows:

$$Z_C = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases} \tag{3}$$

The sign of Z_C indicates the trend's direction. A negative Z_C value indicates a downward trend, whereas a positive Z_C value indicates an upward trend [13,23]. The magnitude of the slope (change per day) was determined using Sen's estimator [9,12].
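The M-K statistic, its tie-corrected variance, the standardized Z_C value, and Sen's slope can be sketched in plain Python as follows. This is a minimal illustration of the textbook formulation, not the code used in the study; the function names `mann_kendall` and `sens_slope` are hypothetical, and the two-sided p-value is assumed to come from the standard normal distribution.

```python
import math
from collections import Counter
from statistics import NormalDist, median


def mann_kendall(x):
    """Return (S, Var(S), Z_C, two-sided p-value) for a sequence x."""
    n = len(x)
    # S: sum of the signs of all pairwise differences x_i - x_k with i > k.
    s = 0
    for k in range(n - 1):
        for i in range(k + 1, n):
            diff = x[i] - x[k]
            s += (diff > 0) - (diff < 0)
    # Tie correction: e is the size of each group of tied values.
    tie_sizes = [e for e in Counter(x).values() if e > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(e * (e - 1) * (2 * e + 5) for e in tie_sizes)) / 18.0
    # Standardized statistic with the usual continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, var_s, z, p_value


def sens_slope(x):
    """Median of all pairwise slopes (x_i - x_k) / (i - k), i > k."""
    slopes = [(x[i] - x[k]) / (i - k)
              for k in range(len(x) - 1)
              for i in range(k + 1, len(x))]
    return median(slopes)
```

With daily data, the value returned by `sens_slope` is a change per day, which is how a slope such as 0.003 °F per day would be read. The double loop is quadratic in the series length, which is acceptable for a few thousand daily observations but slow for much longer records.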

ARIMA Model
ARIMA is the acronym for the autoregressive integrated moving average model, widely known as the Box-Jenkins model and written ARIMA (p, d, q), where p is the order of the autoregressive (AR) part, d is the degree of differencing, and q is the order of the moving average (MA) part [24]. The model resembles a regression in which the independent variables are the past values of the time series. Equation 4 or 5 can be used to express the general equation [22].

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q} \tag{4}$$

where $\phi_1, \phi_2, \dots, \phi_p$ and $\theta_1, \theta_2, \dots, \theta_q$ are the regression coefficients, $Y_t$ is the time series data (temperature), c is the intercept, p indicates the order of the AR part, q indicates the order of the MA part, d indicates the differencing, and $e_t$ is the random error term. If seasonality is considered, the ARIMA model becomes a seasonal autoregressive integrated moving average (SARIMA) model, represented by ARIMA (p, d, q) (P, D, Q) S [19]. S stands for the seasonal period (the number of observations per seasonal cycle), P for the seasonal AR order, D for the seasonal difference, and Q for the seasonal moving average order.
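To make the structure of equation (4) concrete, the sketch below computes the expected value of the next observation from past values and past errors, with the unknown current error set to zero. The function name `arma_one_step` and the numbers in the usage example are illustrative assumptions, not values from the study.

```python
def arma_one_step(c, phi, theta, y_hist, e_hist):
    """One-step-ahead expected value of Y_t for an ARMA(p, q) model.

    phi and theta hold the AR and MA coefficients (phi[0] is phi_1, and so on).
    y_hist and e_hist are ordered oldest to newest, so y_hist[-1] is Y_{t-1}
    and e_hist[-1] is e_{t-1}. The current error e_t has mean zero and is
    therefore omitted from the expectation.
    """
    ar_part = sum(phi[j] * y_hist[-(j + 1)] for j in range(len(phi)))
    ma_part = sum(theta[j] * e_hist[-(j + 1)] for j in range(len(theta)))
    return c + ar_part + ma_part


# Hypothetical ARMA(2, 1) example: 1 + 0.5*20 + 0.2*10 + 0.3*2
example = arma_one_step(1.0, [0.5, 0.2], [0.3], [10.0, 20.0], [2.0])
```

For a model with d > 0, the same recursion would be applied to the differenced series rather than to the raw observations.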
The first stage in fitting the ARIMA model is to ensure that the time series is stationary. The Augmented Dickey-Fuller (ADF) unit root test is used to determine the stationarity of a time series data set [19]. The test's null and alternative hypotheses are H0: the series has a unit root, and H1: the series has no unit root, respectively [19]. The ADF test statistic must be smaller than the critical value to reject the null hypothesis. If the time series is not stationary, it should be transformed using the differencing procedure [13]. Once the time series is stationary, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to determine the appropriate orders of AR (p) and MA (q) [18].
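The order-identification step can be illustrated with a small sketch that computes the sample ACF and PACF and flags lags outside the approximate 95 percent limit of ±1.96/√n. The PACF is obtained here with the Durbin-Levinson recursion, which is one common choice and is assumed for illustration; the function names are hypothetical.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(y, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    out = [1.0]
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        out.append(phi_kk)
    return out


def significant_lags(values, n):
    """Lags (excluding lag 0) whose value lies outside +/- 1.96 / sqrt(n)."""
    limit = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k > 0 and abs(v) > limit]
```

A PACF that is significant only at the first few lags while the ACF decays gradually is the usual signature of a pure AR process of that order.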
The model coefficients are estimated using the least squares approach after the appropriate values of p, d, and q have been determined. The residuals are then examined against a set of criteria to check that they are not autocorrelated and are normally distributed [24]. Within a 95 percent confidence interval, the residuals' ACF should not differ from zero. Furthermore, the histogram of the residuals should have a bell shape, indicating that they are normally distributed.
Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are used to select models [13]. The model with the lowest AIC and BIC values is selected as the best-fit model [21].

$$\mathrm{AIC} = 2k - 2\ln L \tag{6}$$

$$\mathrm{BIC} = k \ln n - 2\ln L \tag{7}$$

where k is the number of model parameters, L is the likelihood function's maximum value, and n is the number of observations. Finally, the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) are used to evaluate the model's predictive capability, as shown in equations (8) to (10). Smaller values of the RMSE, MAE, and MAPE indicate a more adequate model.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Y_t(\mathrm{obs}) - Y_t(\mathrm{pred}) \right)^2} \tag{8}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| Y_t(\mathrm{obs}) - Y_t(\mathrm{pred}) \right| \tag{9}$$

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{Y_t(\mathrm{obs}) - Y_t(\mathrm{pred})}{Y_t(\mathrm{obs})} \right| \tag{10}$$
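The selection criteria and accuracy measures described above are simple to compute once a model's maximized log-likelihood and its predictions are available. The sketch below uses the standard definitions; the function names are hypothetical, and the log-likelihood ln L is assumed to be supplied by whatever routine fits the model.

```python
import math


def aic(k, log_likelihood):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(k, n, log_likelihood):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent. Observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)
```

Because the MAPE divides by the observed value, it is undefined when an observation equals zero, which is worth keeping in mind for temperature scales that can cross zero.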
In equations (8) to (10), Y_t(obs) is the value observed at time t, Y_t(pred) is the predicted value, and n is the number of observations.

Table 1 shows the descriptive statistics for the temperature time series. The skewness is negative, indicating that the left tail of the distribution is longer than the right tail. The first and third quartiles are 52.60 °F and 79.27 °F, respectively, according to the box plot in Figure 2.

Table 2 shows the M-K test statistics for the time series data. Since the p-value (0.006) is less than the 0.05 threshold, the M-K test indicates a significant trend in the temperature time series. The positive value of Kendall's statistic implies that the trend is upward. According to Sen's slope, the magnitude of the trend is 0.003 °F per day.

Figure 3 shows the time series of daily surface temperature. The data appear to be stationary in the graph. A consistent repeating pattern in the data, on the other hand, suggests seasonality. The series is therefore studied further by decomposing it using the additive method, as seen in Figure 4. Figure 4 shows that the data contain a seasonal component with a wavelike structure.
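The additive decomposition mentioned above splits a series into trend, seasonal, and random components that sum back to the data. The sketch below shows one classical way to do this, using a centered moving average for the trend. It is an assumption that this resembles the procedure behind Figure 4, since the text does not name the exact method, and the function name `additive_decompose` is hypothetical. An odd period (such as 365) is required so that the moving-average window is symmetric.

```python
def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Trend and residual are None at the edges where the centered moving
    average is not defined.
    """
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    # Trend: centered moving average over one full period.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period
    # Seasonal: average detrended value for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    means = [sums[p] / counts[p] for p in range(period)]
    centre = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]
    # Residual: what is left after removing trend and seasonal parts.
    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual
```

Large isolated values in the residual component are what would point to outliers in the data.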

ARIMA Model
As a consequence, the SARIMA model is investigated instead of the ARIMA model. Moreover, Figure 4 also shows a trend in the time series data. Furthermore, the random component indicates the presence of outliers in the data.

The ADF test is used to check for a unit root in the daily temperature time series. The ADF statistic and accompanying p-value are shown in Table 3. The null hypothesis of a unit root is rejected because the p-value is less than 0.05. As a result, the daily temperature time series is stationary, and no non-seasonal differencing is needed.

The ACF and PACF of the daily temperature time series are shown in Figures 5 and 6. The ACF plot in Figure 5 looks like a sine wave, indicating strong seasonality. As a result, a seasonal difference should be considered to eliminate seasonality. Because the ACF decays exponentially toward zero, which suggests a purely autoregressive model, the PACF may be used to determine the order. As shown in Figure 6, the PACF is significant at lags 1, 2, and 3, and after lag 3 it shows an irregular pattern, moving above and below the confidence limit. Furthermore, there is no discernible seasonal rise between lags 365 and 730. As a result, the order of the non-seasonal AR term is possibly 3, whereas the order of the seasonal AR term could be zero.

Considering the low AIC and BIC values, the resulting model for this daily temperature data is SARIMA (3,0,0) (0,1,0) 365, with three non-seasonal autoregressive parameters and one seasonal difference. The estimated parameters for the selected model are shown in Table 4. The table shows that all coefficients are significant because the t-statistics are greater than 1.96 in all cases. Table 5 reveals that the maximum RMSE, MAE, and MAPE for the selected model are 6.86, 4.28, and 8.23 percent, respectively.
These numbers can be considered when determining whether or not the model is a good fit. The ACF of the residuals shows no substantial autocorrelation, as seen in Figure 7. Furthermore, the histogram of the residuals is approximately normally distributed. As a result, the residuals behave as white noise, indicating that the chosen model is adequate for forecasting.
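To show how a SARIMA (3,0,0) (0,1,0) S model produces forecasts, the sketch below applies the recursion directly: the seasonally differenced series w_t = y_t - y_{t-S} follows an AR(3) process, and each forecast adds the predicted difference to the value one season earlier. It assumes a model without a constant term, and the coefficient values used with it are hypothetical placeholders, since the estimates in Table 4 are not reproduced in the text. The function name is also an assumption.

```python
def sarima_300_010_forecast(y, phi, season, steps):
    """Forecast `steps` values from a SARIMA (3,0,0)(0,1,0)_season model.

    y      : observed series, oldest to newest
    phi    : the three non-seasonal AR coefficients (phi_1, phi_2, phi_3)
    season : seasonal period S (365 for the daily data in this study)
    Future errors are replaced by their expected value of zero.
    """
    if len(phi) != 3:
        raise ValueError("expected exactly three AR coefficients")
    if len(y) < season + 3:
        raise ValueError("need at least season + 3 observations")
    series = list(y)
    for _ in range(steps):
        t = len(series)
        # Last three seasonal differences w_{t-1}, w_{t-2}, w_{t-3}.
        w = [series[t - j] - series[t - j - season] for j in (1, 2, 3)]
        w_hat = sum(p * v for p, v in zip(phi, w))
        # Undo the seasonal difference to get the forecast on the original scale.
        series.append(series[t - season] + w_hat)
    return series[len(y):]
```

With all coefficients set to zero, the recursion reduces to a seasonal naive forecast that repeats the value from one season earlier, which gives a useful baseline when judging the fitted model.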

Conclusions
The Mann-Kendall (M-K) test and the Box-Jenkins method in its seasonal form, SARIMA, were used in this work to analyze the variability of daily average temperature and to forecast it. For the Memphis International Airport station, the Mann-Kendall (M-K) trend analysis showed an upward trend of 0.0030 °F per day. In addition, the identification and diagnosis steps for the SARIMA model reveal that the model fits well. The residual analysis also shows that the model satisfies all assumptions. Moreover, the accuracy measures validate the model's predictive capacity. The 50 days after November 3, 2019, have been projected using the (3, 0, 0) (0, 1, 0) 365 model. The analysis in this study will give policymakers insight into the rate of temperature change during that period and the scope and extent of possible temperature change.