Identification and Modeling of Outliers in a Discrete-Time Stochastic Series

This study was prompted by the fact that the presence of outliers in discrete-time stochastic series may result in model misspecification and biases in parameter estimation, and that some outliers are difficult to identify due to masking effects. However, the iterative approach, which involves joint estimation of outlier effects and model parameters, appears to be a remedy for masking effects. Considering the dataset on credit to private sector in Nigeria from 1981 to 2014, we found that an ARIMA(1, 1, 1) model fitted the series well when the presence of outliers was not considered. Using the iterative procedure to reduce masking effects, the following outliers were identified: IO (t = 24), AO (t = 33) and TC (t = 22). Adjusting the series for outliers and iterating further, an ARIMA(2, 0, 1) model alongside the AO (t = 33) and TC (t = 22) outliers was found to fit the series better than the ARIMA(1, 1, 1) model. The implication is that, in the presence of outliers, the ARIMA(1, 1, 1) model was misspecified, the order of integration was wrong, the autocorrelation and partial autocorrelation functions were misleading, and the estimated parameters were biased.


Introduction
In statistical analysis, it is good practice to inspect the data at every stage of the analysis for extreme or unusual observations; such observations are called outliers (Fuller, 1996). As noted in Chen and Liu (1993), the usual stochastic model is designed to capture the homogeneous memory pattern of a discrete-time stochastic series, and the presence of outlying observations or structural changes raises questions about the efficiency and adequacy of fitting general autoregressive moving average (ARMA) models to stochastic series. Thus, outliers in a discrete-time stochastic series can adversely affect data analysis. According to Wei (2006), outliers are known to wreak havoc in data analysis, making the resultant inference unreliable or even invalid. Also, as Tsay (2010) observes, outliers can seriously affect discrete-time stochastic series analysis because they may induce substantial biases in parameter estimation and lead to model misspecification. Similarly, Box, Jenkins and Reinsel (2008) maintained that the presence of outliers in discrete-time stochastic series can have substantial effects on the behavior of sample autocorrelations, partial autocorrelations, estimates of ARMA model parameters and forecasting, and can even affect the specification of the model (see also Chen and Liu, 1993). Ledolter (1989) showed that outliers affect forecasts from ARMA models by inflating the estimated variance of the series, thereby causing the prediction interval to become severely misleading. More recently, Nare, Maposo and Lesaoana (2012) pointed out that the least squares and maximum likelihood methods of ARMA estimation are both sensitive to outliers.
Meanwhile, Galeano and Pena (2013) opined that outliers have a strong effect on the model building process for a given time series: they introduce bias in the model parameter estimates, thus distorting the power of statistical tests based on those estimates, and they may widen the confidence intervals for the model parameters and consequently influence predictions. Moreover, outliers are of different types. Fox (1972) introduced the additive outlier (AO), which affects a single observation, and the innovative outlier (IO), which affects a single innovation. Tsay (1988) added two further types: the level shift (LS), which is a change in the level of the series, and the temporary change (TC), which is a change in the level of the series that decays exponentially.
The problems of interest associated with the modeling of outliers in discrete-time stochastic series are to identify the locations and types of outliers and to estimate their effects. Outlier detection methods have been proposed by several authors. For instance, Fox (1972) proposed likelihood ratio test statistics for testing for outliers in autoregressive models (see also Galeano and Pena, 2013), and Box and Tiao (1975) used intervention models to accommodate the effects of outliers. Tsay (1986) proposed an iterative procedure for identifying outliers, removing their effects and specifying a linear model for the stochastic series. However, Kaya (2010) noted that earlier outlier detection methods are powerful when the data contain only one outlier, but their power decreases drastically when more than one outlier is present (see also Hadi, 1992). In addition, there can be difficulties due to masking effects when the series has multiple outliers that occur in patches, especially when they are in the form of additive and level effects. Chen and Liu (1993) proposed a modified iterative procedure to reduce masking effects by jointly estimating the model parameters and the magnitudes of outlier effects (see also Luceno, 1998; Sanchez and Pena, 2010). This modified iterative procedure, according to Battaglia and Orfei (2002), identifies outliers sequentially by searching for the most relevant anomaly, estimating its effect and removing it from the data, re-estimating the model parameters on the corrected series, and iterating the process until no significant perturbation is found. Hence, this study focuses on the use of the Chen and Liu (1993) approach to detect outliers and model their effects in joint estimation with the model parameters.
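The sequential logic described by Battaglia and Orfei (2002) can be illustrated with a deliberately simplified Python sketch. The helper below is a toy that searches a residual series for additive outliers only, using a robust scale estimate; it is not the full Chen and Liu procedure, and the threshold value and injected outliers are arbitrary choices for illustration:

```python
import numpy as np

def detect_aos(e, c=3.5, max_iter=10):
    """Toy Chen-Liu-style iteration on a residual series: find the most extreme
    standardized residual, record its location, remove its effect, re-estimate
    the robust scale, and repeat until nothing exceeds the threshold c."""
    e = e.copy()
    found = []
    for _ in range(max_iter):
        sigma = 1.483 * np.median(np.abs(e - np.median(e)))  # robust scale
        tau = np.abs(e - np.median(e)) / sigma               # standardized residuals
        t = int(np.argmax(tau))
        if tau[t] < c:
            break
        found.append(t)
        e[t] = np.median(e)  # crude removal of the estimated outlier effect
    return found

rng = np.random.default_rng(3)
e = rng.standard_normal(300)
e[50] += 8.0   # inject a large additive outlier
e[120] -= 7.0  # inject a second one
found = detect_aos(e)
```

Because the scale is estimated robustly at every pass, the second outlier is still detected after the first is removed, which is exactly the masking problem the iteration is designed to mitigate.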

Methodology
A stochastic process is a family of random variables {X_t : t = 1, 2, …, n}, where X_t is a random variable for each t and t is a time index. Therefore, a stochastic process is a sequence of random variables ordered in time, and a realization of a stochastic process is considered a time series (see, for example, Ebong, 1998; Moffat, 2007; Akpan, 2016).

Autoregressive Moving Average (ARMA) Process
A natural extension of pure autoregressive and pure moving average processes is the mixed autoregressive moving average processes, which includes the autoregressive and moving average as special cases (Wei, 2006).
A stochastic process {X_t} is an ARMA(p, q) process if {X_t} is stationary and if, for every t,

X_t − φ_1 X_{t−1} − ⋯ − φ_p X_{t−p} = a_t − θ_1 a_{t−1} − ⋯ − θ_q a_{t−q},   (1)

where {a_t} is a white noise process. In backshift-operator form, (1) can be written φ(B)X_t = θ(B)a_t, with φ(B) = 1 − φ_1 B − ⋯ − φ_p B^p and θ(B) = 1 − θ_1 B − ⋯ − θ_q B^q.
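An ARMA(1, 1) process can be simulated directly from its defining recursion, which gives a quick numerical check of the model's autocorrelation behavior. This is a minimal numpy sketch; the parameter values φ = 0.6 and θ = 0.3 are arbitrary choices for illustration:

```python
import numpy as np

def simulate_arma11(phi, theta, n, seed=0):
    """Simulate X_t - phi*X_{t-1} = a_t - theta*a_{t-1} with standard normal a_t."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + a[t] - theta * a[t - 1]
    return x

x = simulate_arma11(0.6, 0.3, 2000)
# Sample lag-1 autocorrelation; for these parameters the theoretical value
# (1 - phi*theta)(phi - theta) / (1 - 2*phi*theta + theta**2) is about 0.34
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```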

Autoregressive Integrated Moving Average (ARIMA) Model
Box, Jenkins and Reinsel (2008) considered the extension of the ARMA model in (1) to deal with homogeneous nonstationary time series in which X_t is nonstationary but its d-th difference is a stationary ARMA process. Denoting the d-th difference of X_t by W_t = (1 − B)^d X_t, the model becomes

φ(B)(1 − B)^d X_t = θ(B) a_t,   (2)

where φ(B)(1 − B)^d is the nonstationary autoregressive operator, d of whose roots are unity while the remainder lie outside the unit circle, and φ(B) is a stationary autoregressive operator.

Therefore, (2) is called an autoregressive integrated moving average model and can be referred to as an ARIMA(p, d, q) model (see also Akpan and Moffat, 2016).
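A quick numerical check of the differencing idea: an ARIMA(0, 1, 0) series (a random walk) is nonstationary, but its first difference (1 − B)X_t recovers the stationary innovations exactly (numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(500)   # stationary white-noise innovations
x = np.cumsum(a)               # ARIMA(0,1,0): X_t = X_{t-1} + a_t, a random walk
w = np.diff(x)                 # first difference (1 - B) X_t

# Differencing exactly undoes the integration in this simple case
assert np.allclose(w, a[1:])
```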

Model Selection Criteria
For a given data set, when there are multiple adequate models, the selection criterion is normally based on summary statistics from the residuals of a fitted model (Wei, 2006). There are several model selection criteria based on residuals (see Wei, 2006). For the purpose of this study, we consider the well-known Akaike information criterion (AIC) (Akaike, 1973),

AIC(M) = −2 ln(maximized likelihood) + 2M,

where M is the number of parameters in the model; among competing models, the one with the smallest AIC is preferred.

Temporary Changes
A time series X_1, …, X_n affected by the presence of a temporary change at t = T is given by

X_t = Z_t + ω (1 − δB)^(−1) I_t^(T),

where Z_t is the outlier-free series, I_t^(T) is an indicator variable equal to 1 when t = T and 0 otherwise, ω is the initial magnitude of the change, and δ is an exponential decay parameter such that 0 < δ < 1. If δ tends to zero, the temporary change reduces to an additive outlier, whereas if δ tends to 1, the temporary change reduces to a level shift. The temporary change also affects the innovations through the filter π(B) = 1 − π_1 B − π_2 B^2 − ⋯: if π(B) is close to (1 − δB), the effect of the temporary change on the innovations is very close to the effect of an innovative outlier. Otherwise, the temporary change can affect several observations, with a decreasing effect after t = T (Sanchez and Pena, 2010).
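The limiting behavior of the temporary change pattern can be checked numerically. In this sketch the values of n, T and δ are arbitrary:

```python
import numpy as np

def tc_pattern(n, T, delta):
    """Pattern of a unit temporary change at time T: delta**(t-T) for t >= T, else 0."""
    pat = np.zeros(n)
    pat[T:] = delta ** np.arange(n - T)
    return pat

n, T = 10, 3
ao_like = tc_pattern(n, T, 0.0)   # delta -> 0: a single pulse, i.e. an additive outlier
ls_like = tc_pattern(n, T, 1.0)   # delta -> 1: a permanent step, i.e. a level shift
tc = tc_pattern(n, T, 0.7)        # 0 < delta < 1: exponentially decaying change
```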
Generally, a time series might contain several, say k, outliers of different types, and we have the following general outlier model:

X_t = Σ_{j=1}^{k} ω_j ν_j(B) I_t^(T_j) + Z_t,

where Z_t is the outlier-free series and ν_j(B) determines the dynamic pattern of the j-th outlier: ν(B) = 1 for an AO, ν(B) = θ(B)/[φ(B)(1 − B)^d] for an IO, ν(B) = 1/(1 − B) for an LS, and ν(B) = 1/(1 − δB) for a TC. The outlier effect ω at t = T is estimated using the least squares method: in each case the estimate ω̂_AO, ω̂_IO, ω̂_LS or ω̂_TC is obtained by regressing a suitably filtered residual series on the corresponding outlier pattern (Chen and Liu, 1993). The residual standard deviation is estimated robustly as

σ̂_a = 1.483 × median{|ê_t − ẽ|},

where ẽ is the median of the estimated residuals (Chen and Liu, 1993).
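For the simplest case, an additive outlier whose pattern hits the residuals at a single point, the least squares estimate of ω and the robust scale estimate can be sketched as follows. All values are synthetic, and this is a simplified illustration rather than the full filtered-pattern regression of Chen and Liu (1993):

```python
import numpy as np

rng = np.random.default_rng(7)
n, T, omega = 200, 80, 5.0
e = rng.standard_normal(n)   # synthetic "residuals" from a fitted model
e[T] += omega                # contaminate with an additive outlier of size 5

pattern = np.zeros(n)
pattern[T] = 1.0             # AO pattern: a single pulse at t = T
omega_hat = (pattern @ e) / (pattern @ pattern)   # least squares estimate of omega

# Robust residual scale: sigma_hat = 1.483 * median |e_t - median(e)|
sigma_hat = 1.483 * np.median(np.abs(e - np.median(e)))
tau = omega_hat / sigma_hat  # standardized statistic, compared with a critical value
```

The median-based scale is barely affected by the contaminated point, which is why it is preferred to the ordinary sample standard deviation when testing for outliers.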

Data Analysis and Discussion
The realization considered in this paper is the credit to private sector (CPS), denoted X_t. The period of realization spans from January 1981 to December 2014. The plot in Figure 1 indicates that the series is not stationary. To achieve stationarity, we take the first difference of X_t as shown in Figure 2.

Detecting and Adjusting the Effects of Outliers
To examine and detect the presence of outliers in X_t, we iterate through the residuals obtained from the model in equation (23). The following outliers were found to be present in X_t: an IO, an AO and a TC. The three outliers detected and their estimated effects are presented in Table 4.

Joint Estimation of Model Parameters and Outlier Effects
After adjusting the series for the outlier effects and jointly estimating them with the model parameters, an ARIMA(2, 0, 1) model with AO (t = 33) and TC (t = 22) is found to fit the series well. The effect of the IO is not significant and was removed from the set of detected outliers. The adjusted outlier model is presented in equation (24).

From the results, we found that the ARIMA(1, 1, 1) model fit the original data well when the outliers were not considered. Iterating through the residuals of the ARIMA(1, 1, 1) model, IO (t = 24), AO (t = 33) and TC (t = 22) outliers were identified. After adjusting the series for outliers and carrying out the joint estimation of model parameters and outlier effects, an ARIMA(2, 0, 1) model was found to fit the adjusted series well with the effects of AO (t = 33) and TC (t = 22). It is therefore clear that the presence of outliers in the series affects the model specification: when the outliers were not considered, an ARIMA(1, 1, 1) model was fitted to the series, but when the series was adjusted for outliers, an ARIMA(2, 0, 1) model was fitted. Moreover, comparing the information criteria, the AIC of the ARIMA(2, 0, 1) model is 618.12, which is less than 625.14, the AIC of the ARIMA(1, 1, 1) model, showing that the adjusted outlier model fits the series better. The fact that the ARIMA(2, 0, 1) model outperforms the ARIMA(1, 1, 1) model indicates that the ARIMA(1, 1, 1) model is misspecified due to the presence of outliers. The implication is that the presence of outliers in a discrete-time stochastic series can result in model misspecification, a wrong order of integration, substantial biases in parameter estimation, and misleading autocorrelation and partial autocorrelation functions.
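The AIC comparison above follows the usual rule that the model with the smaller criterion value is preferred. As a minimal sketch of the arithmetic (the log-likelihood values below are made up for illustration and are not the fitted values from this study):

```python
def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k; smaller is better."""
    return -2.0 * loglik + 2.0 * k

# Two hypothetical fits with different numbers of parameters
aic_a = aic(-310.0, 3)   # e.g. a simpler model, analogous to ARIMA(1,1,1)
aic_b = aic(-305.0, 4)   # e.g. a richer outlier-adjusted model
better = "adjusted" if aic_b < aic_a else "unadjusted"
```

Note that the richer model wins here only because its likelihood gain outweighs the 2-per-parameter penalty, which is precisely the trade-off the AIC encodes.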
Furthermore, the results of this work are in tandem with the evidence that the presence of outliers in time series leads to model misspecification, misleading autocorrelation and partial autocorrelation functions, and biases in model parameter estimation (Tsay, 2010).

Conclusion
The importance of detecting and estimating the effects of outliers can hardly be overemphasized. According to Battaglia and Orfei (2002), outliers may have a significant impact on the results of standard time series methodology. The presence of outliers can result in model misspecification, misleading autocorrelation and partial autocorrelation functions, and biases in parameter estimation. Moreover, when two or more outliers occur in patches, the resultant masking effect can lead to spurious outlier detection. Chen and Liu (1993) proposed an iterative procedure to cushion this masking effect.
In relation to the data considered in this paper, we found that an ARIMA(1, 1, 1) model fit the original series well when outliers were not considered. Using the iterative procedure, IO (t = 24), AO (t = 33) and TC (t = 22) outliers were detected and the series adjusted. An ARIMA(2, 0, 1) model was then found to fit the outlier-adjusted series well in joint estimation with the AO (t = 33) and TC (t = 22) outliers. We can therefore deduce that the presence of outliers in the employed series results in model misspecification, a wrong order of integration, misleading autocorrelation and partial autocorrelation functions, and biases in parameter estimation. Hence, there is a need to detect outliers before modeling and analyzing discrete-time stochastic series. Moreover, this study could be extended to cover the effects of outliers on time series forecasting.