On Order and Regime Determination of SETAR Model in Modelling Nonlinear Stationary Time Series Data Structure: Application to Lafia Rainfall Data, Nasarawa State, Nigeria

The linear time series model refers to the class of models for which fixed correlation parameters can fully explain the dependency between two random variables. Many real-life series, such as monthly unemployment figures, supply and demand, interest rates, exchange rates, share prices and rainfall, violate the assumption of linearity. The self-exciting threshold autoregressive (SETAR) model has been suggested for fitting and forecasting such nonlinear time series data. A Monte Carlo simulation was performed in R to generate random nonlinear autoregressive data, and the SETAR model was fitted to the simulated data and to Lafia rainfall data, Nasarawa State, Nigeria, to determine the best autoregressive order and/or regime number for making future forecasts. The relative performance of the models was examined using Mean Square Error (MSE) and the Akaike Information Criterion (AIC). At a specific autoregressive order, regime number, sample size and number of steps ahead, the model with the minimum criteria was considered the best. The results show that the best autoregressive and regime orders are 3 and 2 respectively [SETAR (3, 2)] for fitting and forecasting nonlinear autoregressive time series data with small and moderate sample sizes. As the sample size increases, the performance of the four models improves. Finally, it is shown that when the sample size and the number of steps ahead are increased, the efficiency and forecasting capacity of the four models improve.


Introduction
Time series data arise naturally in many real-life settings, such as economics, monthly unemployment data, supply and demand, interest rates, exchange rates, share prices and rainfall. These series are known to be sensitive to many factors and respond quickly to external intervention, resulting in sudden and drastic changes in behavior.
According to Akeyede et al. [1], linear relationships are the first approximation used to characterize any relationship, but there is no unique way of describing what a linear relationship is in terms of the underlying nature of the quantities. Nonlinear models are the class of models for which the functional form of the dependency between two random variables is more general than a linear equation and/or can change over time.
Nonlinear time series models allow a much broader variety of potential dynamics for series such as economic, financial and rainfall data. Compared to linear models, they are able to capture asymmetries, jumps, waves and other nonlinear behaviors. The self-exciting threshold autoregressive (SETAR) model has been suggested for this purpose. This class of nonlinear models is increasingly used in time series analysis to describe and forecast different empirical phenomena in an observed series, as it is helpful in capturing nonlinear dynamics; see, for example, Tong and Yeung [16], Watier and Richardson [20] and Grabowski et al. [9].
These classes of models are examined in relation to the traditional linear modeling approach, using simulated data and rainfall data for Lafia obtained from the Nasarawa State Meteorological Department, Nigeria. The statistical characteristics and forecast performance of two-regime SETAR models with long memory in the first regime and short memory in the second have been extensively examined, using stock indices and individual asset prices, in order to locate the threshold parameter (Tong [15]; Hansen [10]; Clements and Smith [2]; De Gooijer [3]; Dufrenot et al. [5]).
The SETAR models introduced by Tong [12], and developed more thoroughly in the seminal paper by Tong and Lim [13], belong to the class of nonlinear models that have been increasingly used in the study and forecasting of time series, as they are useful for adequately capturing nonlinear dynamics (Grabowski et al. [9]). They can be considered an extension of autoregressive models that allows the parameters of the model to change through regime-switching behavior. The SETAR model is a piecewise linear autoregressive model in the space of the threshold variable. There are many approaches to SETAR model estimation, which differ in their ability to estimate the hyper-parameters and to handle SETAR models of high order. The most widely used methods are the approach of Tsay, which emphasizes graphical analyses to identify thresholds, and Hansen's methodology, which covers models of order 2 and 3 in depth and allows the thresholds to be estimated.
SETAR models are successful because, compared to many other nonlinear time series models, they are relatively easy to specify, estimate and interpret (Tong [17]). Common empirical time series modeling assumes linearity and stationarity in the relationship between variables. In applied time series analysis, however, this linearity is difficult to assume, because it has been argued that nonlinear specifications can reflect data-generating processes more realistically (Franses and van Dijk [7]). This research therefore determines the best regime and autoregressive orders for SETAR models in the analysis of nonlinear time series data.

Methodology
Using Monte Carlo simulation, we generate random data from nonlinear autoregressive processes, drawn from the normal distribution, to test the performance of the proposed SETAR model. The importance of the choice of model is demonstrated by an empirical application to the Lafia rainfall data from Nasarawa, Nigeria. The simulation is a realization of a simple two-regime SETAR model, produced to identify nonlinear phenomena, and was carried out in the R statistical software for the following sample sizes: 20, 40, 60, 80, 100, 120, 140, 160, 180 and 200. The autoregressive order (p) and the regime number (d) of the SETAR (p, d) model were specified for all combinations considered.
The empirical application to rainfall data demonstrates the relevance of the choice of model. Mean Square Error (MSE) and the Akaike Information Criterion (AIC) were used to assess the goodness of fit of each model, and the results are presented in Tables 1-3 along with their respective graphs in Figures 1-9. Data were simulated with a trigonometric function from second-order nonlinear autoregressive processes under the assumption of stationarity. The current value of the series $Y_t$ is a nonlinear combination of its own most recent past values plus an "innovation" term $e_t$ that incorporates everything new in the series that is not explained by the previous values. For each t, we therefore assume that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$ (Akeyede et al., 2015).
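The simulation design described above can be sketched as follows. Because the generating equation is not reproduced in this text, the trigonometric function, its coefficients (0.3 and -0.5), the burn-in length and the function name `simulate_nonlinear_ar2` are all illustrative assumptions, and Python is used here rather than the R code of the study.

```python
import math
import random


def simulate_nonlinear_ar2(n, burn_in=100, sigma=1.0, seed=0):
    """Simulate Y_t = 0.3*sin(Y_{t-1}) - 0.5*cos(Y_{t-2}) + e_t.

    The functional form and coefficients are hypothetical; they only
    illustrate a second-order nonlinear AR process with normal innovations.
    """
    rng = random.Random(seed)
    y = [0.0, 0.0]  # arbitrary starting values, discarded with the burn-in
    for _ in range(n + burn_in):
        e_t = rng.gauss(0.0, sigma)  # innovation, drawn independently of the past
        y.append(0.3 * math.sin(y[-1]) - 0.5 * math.cos(y[-2]) + e_t)
    return y[-n:]


# Sample sizes used in the study
sample_sizes = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
series = {n: simulate_nonlinear_ar2(n, seed=n) for n in sample_sizes}
```

Discarding a burn-in period is one common way to reduce the influence of the arbitrary starting values on the retained sample.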

The SETAR Model
The self-exciting threshold autoregressive (SETAR) model of order p belongs to the threshold autoregressive (TAR) family, which is important for nonlinear time series modeling. Such models are a relatively simple relaxation of standard linear autoregressive models that allows a number of states to be approximated linearly. According to Tong [15], the threshold principle allows a complex stochastic system to be analyzed by decomposing the one-dimensional Euclidean space into k regimes with a linear autoregressive model in each regime; with at least two regimes the overall model is nonlinear but remains locally linear (Gibson and Nur [8]). In the class of SETAR models considered in this research, the superscripts indicate states of the world, or regimes. The dynamic behavior of the time series variable is assumed to follow a linear autoregressive process within each regime. The subscripts indicate the autoregressive order, r is the threshold value, $Y_{t-d}$ is the threshold variable that governs the transition between the two regimes, d is the delay parameter, a positive integer ($d < p$), and $\{e_t\}$ are white noise processes, i.e. independently and identically distributed random variables with zero mean and constant variance.
The threshold parameters satisfy the constraint $-\infty = r_0 < r_1 < r_2 < \dots < r_{k-1} < r_k = \infty$. The regime that is operating at any time depends on the observable past history of $\{Y_t\}$ itself and, in particular, on the value of $Y_{t-d}$. Tong and Lim [13] referred to the system in equations (2) to (5) as self-exciting threshold autoregressive models. The benefit of SETAR models is their ability to generate widely observed phenomena, such as irreversibility, jumps and limit cycles, that cannot be captured by simple linear models such as the autoregressive moving average (ARMA) model.
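For orientation, a generic two-regime SETAR model of autoregressive order p can be written as below. This is the standard textbook form, given as a sketch only: the coefficient symbols $\phi_i^{(j)}$ are assumed notation, and the expression is not a reproduction of the specific models in equations (2) to (5).

```latex
Y_t =
\begin{cases}
\phi_0^{(1)} + \sum_{i=1}^{p} \phi_i^{(1)} Y_{t-i} + e_t^{(1)}, & Y_{t-d} \le r, \\[4pt]
\phi_0^{(2)} + \sum_{i=1}^{p} \phi_i^{(2)} Y_{t-i} + e_t^{(2)}, & Y_{t-d} > r.
\end{cases}
```

Here the superscript (j) labels the regime, the subscript i labels the lag, r is the threshold value and $Y_{t-d}$ is the threshold variable with delay d, matching the description in the text.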

Parameter Estimation
According to Gibson and Nur [8], the most commonly used approach for parameter estimation under the SETAR model is conditional least squares (CLS). In this method, parameter estimates are obtained by minimizing the sum of squared prediction errors. First, let $E(Y_t^2) < \infty$, $t = 1, 2, \dots$, and $\phi = (\phi_0^{(1)}, \phi_1^{(1)}, \dots, \phi_p^{(1)}; \phi_0^{(2)}, \phi_1^{(2)}, \dots, \phi_p^{(2)}; \sigma^2)'$. This provides the requisite conditions for conditional least squares estimation, and $\phi$ can be estimated by minimizing the residual sum of squares with respect to $\phi$. Firat [6] defined the necessary parameter estimation steps for the SETAR model as follows.
Step 1. It is first assumed that the delay parameter d and the threshold parameter γ are known. Based on these values, the observations are divided into sub-groups, and the AIC is determined for each sub-group at order $p_j$ ($j = 1, \dots, k$). The $p_j$ value of each regime is then obtained as the minimizer of AIC($p_j$) for the fixed values of d and γ.
Step 2. The value of d is kept fixed (it is presumed to be known), and the threshold parameter that minimizes the AIC is sought. Among the candidate threshold values, the estimate is the γ value that minimizes AIC(d, γ), as shown in Tong [14].
Step 3. The $p_j$ and $\hat{\gamma}$ values are obtained in the first two steps, and the value of d is determined in this third and final step: among the candidate values of d, the one that minimizes NAIC(d) is selected. After the three steps have been carried out using the information criterion, the model is estimated conditional on the parameters so determined.
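The threshold search in Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the study: it fixes two regimes, a common order p in both regimes and a given delay; it takes the per-regime criterion to be $n_j \ln(RSS_j / n_j) + 2(p + 1)$; and the names `solve`, `ols`, `fit_setar2` and the 15% trimming fraction are hypothetical choices.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, targets):
    """Least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, t in zip(rows, targets))
    return beta, rss


def fit_setar2(y, p, delay, trim=0.15):
    """Two-regime SETAR fit: least squares per regime over a threshold grid."""
    start = max(p, delay)
    data = [([1.0] + [y[t - i] for i in range(1, p + 1)], y[t], y[t - delay])
            for t in range(start, len(y))]
    z = sorted(obs[2] for obs in data)
    lo, hi = int(trim * len(z)), int((1 - trim) * len(z))
    best = None
    for gamma in z[lo:hi]:  # candidate thresholds: interior order statistics
        aic, fits = 0.0, []
        for is_lower in (True, False):
            part = [(x, t) for x, t, zt in data if (zt <= gamma) == is_lower]
            beta, rss = ols([x for x, _ in part], [t for _, t in part])
            aic += len(part) * math.log(rss / len(part)) + 2 * (p + 1)
            fits.append(beta)
        if best is None or aic < best["aic"]:
            best = {"aic": aic, "gamma": gamma, "lower": fits[0], "upper": fits[1]}
    return best


# Illustrative data: a two-regime process with threshold 0 (assumed values)
rng = random.Random(1)
y = [0.0]
for _ in range(400):
    mean = 1.0 + 0.6 * y[-1] if y[-1] <= 0.0 else -1.0 - 0.5 * y[-1]
    y.append(mean + rng.gauss(0.0, 1.0))
fit = fit_setar2(y, p=2, delay=1)
```

Step 3 would repeat `fit_setar2` over the candidate delays and keep the one with the smallest normalized criterion. The trimming keeps enough observations in each regime for the regressions to be well defined; very short series may still produce singular systems.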

Test of Stationarity
Stationarity is a critical concept in the study of time series data. Generally speaking, a time series is said to be stationary if there is no systematic change in the mean (no trend) or in the variance, and if periodic variations have been removed. In other words, the properties of one segment of the data are just like those of every other segment (Tsay [19]). To demonstrate the statistical properties associated with the autoregressive unit root test, consider the simple first-order autoregressive [AR(1)] model $Y_t = \phi Y_{t-1} + e_t$, where $e_t \sim N(0, \sigma^2)$, with hypotheses $H_0: |\phi| = 1$ versus $H_1: |\phi| < 1$. The test statistic is $t_{\phi} = (\hat{\phi} - 1)/SE(\hat{\phi})$, where $\hat{\phi}$ is the least squares estimate and $SE(\hat{\phi})$ is the usual standard error estimate; the test is a one-sided, left-tail test. If $Y_t$ is stationary (i.e. $|\phi| < 1$), it can be shown that the standardized estimate $(\hat{\phi} - \phi)/SE(\hat{\phi})$ is asymptotically $N(0, 1)$. Similarly, Dickey and Fuller [4] developed the unit root test in which the null hypothesis is $\phi = 0$ against the alternative hypothesis $\phi < 0$. The test statistic is computed and compared with the relevant Dickey-Fuller critical value. If the test statistic is lower than the critical value, the null hypothesis $\phi = 0$ is rejected and no unit root is present.
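A minimal sketch of the Dickey-Fuller form of the test follows. It assumes the simplest version of the regression, $\Delta Y_t = \phi Y_{t-1} + e_t$ with no constant, trend or lagged differences, and uses -1.95 as the approximate large-sample 5% critical value for that case; the function name `dickey_fuller_t` and the simulated AR(1) series are illustrative only.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic for phi in  dY_t = phi * Y_{t-1} + e_t  (no constant, no lags)."""
    lagged = y[:-1]
    diff = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    phi_hat = sum(v * d for v, d in zip(lagged, diff)) / sxx  # least squares slope
    resid = [d - phi_hat * v for v, d in zip(lagged, diff)]
    s2 = sum(r * r for r in resid) / (len(diff) - 1)  # residual variance
    return phi_hat / math.sqrt(s2 / sxx)


# Illustrative stationary AR(1) series with coefficient 0.5 (assumed value)
rng = random.Random(42)
stationary = [0.0]
for _ in range(300):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))

CRITICAL_5PCT = -1.95  # approximate asymptotic value, no-constant case
t_stat = dickey_fuller_t(stationary)
reject_unit_root = t_stat < CRITICAL_5PCT  # left-tail test
```

In applied work the augmented version of the test, with a constant and lagged differences, is usually preferred; the stripped-down regression here only mirrors the hypotheses stated in the text.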

Test of Nonlinearity
Before applying a possible nonlinear model, we first test linearity against nonlinearity; this pretest is important to help protect against over-fitting the data. In this analysis, we therefore consider the modification by Tsay [18] of Keenan's one-degree test for nonlinearity (Keenan [11]), in which the F-test is modified by replacing the aggregated quantity with the disaggregated variables $Y_{t-i} Y_{t-j}$, $i, j = 1, 2, \dots, M$, where M is as specified in Keenan's test. The procedure for the F-test begins as follows: i) regress $Y_t$ on $(1, Y_{t-1}, \dots, Y_{t-M})$ and calculate the fitted values ($\hat{Y}_t$) and the residuals ($\hat{e}_t$) for $t = M + 1, \dots, n$. In the resulting test statistic $\hat{F}$, the summation is over t from $M + 1$ to $n$, and $\hat{F}$ is asymptotically distributed as $F_{m,\, n-m-M-1}$ with $m = M(M+1)/2$.
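For reference, the remaining steps of Tsay's procedure are usually stated as below. This is a sketch of the standard formulation rather than a reproduction of the equations of this paper; the symbols $W_t$, $Z_t$, $\hat{X}_t$ and $\hat{\varepsilon}_t$ are assumed notation.

```latex
% Step (i): linear AR(M) regression, with W_t = (1, Y_{t-1}, \dots, Y_{t-M})
Y_t = W_t \Phi + e_t, \qquad t = M+1, \dots, n
% Step (ii): regress the m = M(M+1)/2 cross-products Z_t (the distinct Y_{t-i} Y_{t-j}, i \le j)
% on W_t and keep the residual vectors \hat{X}_t
Z_t = W_t H + X_t
% Step (iii): regress \hat{e}_t on \hat{X}_t, with residuals \hat{\varepsilon}_t, and form
\hat{F} = \frac{\left(\sum \hat{X}_t \hat{e}_t\right)
               \left(\sum \hat{X}_t' \hat{X}_t\right)^{-1}
               \left(\sum \hat{X}_t' \hat{e}_t\right) / m}
              {\sum \hat{\varepsilon}_t^{\,2} / (n - M - m - 1)}
```

All sums run over $t = M + 1, \dots, n$. A large value of $\hat{F}$ relative to the $F_{m,\, n-M-m-1}$ distribution indicates that the cross-product terms explain variation left over by the linear fit.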

Forecasting
Different procedures exist for forecasting linear and nonlinear time series models, and the SETAR model forecasts can be interpreted as enhancements of the ARIMA model forecasts when the two are compared. Rising values could suggest an attempt by the model to capture the variance in the series more efficiently and to represent overall movements in the process. In general, it should be noted that even if a model describes the characteristics of a time series better in terms of in-sample fit, there is no guarantee that it will also produce better forecasts (Franses and van Dijk [7]). Computing point forecasts from nonlinear time series models requires complicated computations.
Consider $Y_t$, a nonlinear autoregressive model of lag one. The forecasting phase begins with a sample of process values up to time t, say $Y_1, Y_2, \dots, Y_t$; these values are our observed data. The forecasting model then forecasts future process values, i.e. $\hat{Y}_{t+1}, \hat{Y}_{t+2}, \hat{Y}_{t+3}, \dots$. Generally speaking, $\hat{Y}_{t+h}$ is the forecast at lead time h made from forecast origin t, where $h \geq 1$.
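One simple way to produce such h-step-ahead values from a fitted two-regime SETAR model is to iterate the model forward with future innovations set to zero. The sketch below shows this plug-in approach together with the MSE used to compare forecasts. The function names, the coefficient layout `[intercept, lag 1, ..., lag p]` and the numerical values are assumptions for illustration, not estimates from the study.

```python
def setar_forecast(history, lower, upper, gamma, delay, steps):
    """Iterate a two-regime SETAR forward, setting future innovations to zero."""
    path = list(history)
    p = len(lower) - 1  # coefficients are [intercept, lag 1, ..., lag p]
    for _ in range(steps):
        # The regime is chosen by the threshold variable Y_{t-d}
        coefs = lower if path[-delay] <= gamma else upper
        path.append(coefs[0] + sum(coefs[i] * path[-i] for i in range(1, p + 1)))
    return path[len(history):]


def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual)


# Hypothetical coefficients and history, chosen only to show the mechanics
forecasts = setar_forecast(
    history=[0.5, -1.0],
    lower=[1.0, 0.5, 0.0],
    upper=[-1.0, 0.5, 0.0],
    gamma=0.0, delay=1, steps=3,
)
```

For horizons beyond one step this plug-in recursion ignores the effect of future shocks on regime membership, so simulation-based forecasts (averaging many paths with random innovations) are a common alternative for nonlinear models.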

Results and Discussions
Monte Carlo simulations were performed to investigate the performance of SETAR (p, d) models with p, d = 2, 3, i.e. the SETAR (2, 2), SETAR (2, 3), SETAR (3, 2) and SETAR (3, 3) models, in fitting and forecasting the simulated nonlinear autoregressive data. The effect of sample size was examined on the simulated nonlinear data, and the best order (p) and regime number (d) under the SETAR (p, d) model were determined. The procedure was also carried out on 10 years of monthly rainfall data for Lafia, Nasarawa State, up to 2017. The results of the analyses are reported in Tables 1-3.
The plots of the MSE and AIC for the SETAR (p, d) models at different sample sizes are shown in Figures 1 and 2. The graphs show that SETAR (3, 2) is the best model for fitting nonlinear data for sample sizes 20 to 180, followed by SETAR (2, 2), on the basis of both MSE and AIC. For larger sample sizes (above 180), however, SETAR (3, 3) outperforms the other choices. SETAR (2, 3) is the worst model at all sample sizes.
Therefore, an autoregressive order of 3 and a regime order of 2 are the best choices for fitting nonlinear autoregressive time series data with sample sizes up to 180, while series with larger sample sizes (above 180) can be fitted with an autoregressive order of 3 and a regime order of 3. The performance of the four models also improves, with smaller values of MSE and AIC, as the sample size increases.
Table 2 and the plots in Figures 3-8 compare the forecast capacity of the four fitted models on the simulated nonlinear autoregressive data for samples of sizes 20, 100 and 200. The results in Table 2 show that SETAR (2, 2) performs better for sample size 100, while SETAR (3, 2) performs better for sample sizes 20 and 200, except at 50 steps ahead (d = 50).
From the plots, SETAR (3, 2) and SETAR (2, 2) give the best forecasts at 10 and 50 steps ahead, outperforming SETAR (2, 3) and SETAR (3, 3) on both MSE and AIC. SETAR (2, 3), followed by SETAR (3, 3), are the worst forecasting models. Therefore, the best autoregressive and regime orders for forecasting nonlinear autoregressive time series data at any number of steps ahead (i.e., from 5 to 50) and for sample sizes of 20, 100 and 200 (small, moderate and large) are (3, 2) and (2, 2), respectively. When the number of steps ahead is increased, the forecasting capacity of the four models improves.

Analysis of Rainfall Data
Before the nonlinear SETAR models were fitted, the Lafia rainfall data, Nasarawa, Nigeria, were first tested to determine whether they are linear or not. The Tsay F-statistic, computed in the R statistical software, was used to test for nonlinearity. At autoregressive order 12 (p = 0.4138), the null hypothesis of nonlinearity is not rejected. Thereafter, the class of SETAR models being considered was fitted to the data to determine the best model, as shown in Table 3.
The results in Table 3 show that SETAR (3, 2) is the best fit for the rainfall data, followed by SETAR (2, 2), with the lowest MSE and AIC values of (1320.1870 and 865.8924) and (1392.4540 and 868.1811) respectively. They also show that the parameter estimates of the two best models are significant, with low standard errors.

Conclusion
The results of this research show that SETAR (3, 2) and SETAR (2, 2) are the best fitting models for both simulated and real-life data at small, moderate and large sample sizes. SETAR (3, 2), followed by SETAR (2, 2), are also the best forecasting models at various steps ahead, although SETAR (3, 3) or higher can be fitted for large sample sizes and longer forecast horizons. In addition, SETAR (3, 2) is the best fit for the monthly rainfall data from Lafia, Nasarawa, Nigeria, followed by SETAR (2, 2).