American Journal of Theoretical and Applied Statistics
Volume 5, Issue 3, May 2016, Pages: 115-122

Performance of Two Generating Mechanisms in Detection of Outliers in Multivariate Time Series

Olufolabo Olusesan Oluyomi¹, Shittu Olarenwaju Ismail², Adepoju Kazeem Adesola²

¹Department of Statistics, Yaba College of Technology, Yaba, Nigeria

²Department of Statistics, University of Ibadan, Ibadan, Nigeria

Email address:

(Olufolabo O. O.)

To cite this article:

Olufolabo Olusesan Oluyomi, Shittu Olarenwaju Ismail, Adepoju Kazeem Adesola. Performance of Two Generating Mechanisms in Detection of Outliers in Multivariate Time Series. American Journal of Theoretical and Applied Statistics. Vol. 5, No. 3, 2016, pp. 115-122. doi: 10.11648/j.ajtas.20160503.16

Received: April 5, 2016; Accepted: April 25, 2016; Published: May 10, 2016


Abstract: This work develops two outlier generating mechanisms for detecting outliers in multivariate time series that are capable of ameliorating the swamping effect on regular observations. Specifying two-variable Vector Autoregressive (VAR) models and assuming innovative and multiplicative effects of outliers on time series data, the magnitude and variance of an outlier were derived for each generating model by the method of least squares. Modified test statistics were also developed to detect single outliers in both the response and the explanatory variables. Real and simulated data were used to establish the validity of the models. The results show that the multiplicative model is better than the innovative model in terms of the number of outliers detected and the residual variance. This result is in line with previous studies of outlier detection in univariate time series.

Keywords: Innovative Outlier, Additive Outlier, Multiplicative Outlier, Vector Auto Regressive


1. Introduction

In time series, as in any classical data, it has been established that outliers cause bias in parameter estimation, model misspecification and poor forecast performance, all of which can lead to misleading conclusions. For this reason, several outlier detection techniques and robust estimation procedures have been proposed in the literature for univariate time series analysis, but very few for multivariate time series.

An early and detailed examination of the detection of outliers in stationary univariate time series was done by Fox [18]. Since then, quite a number of studies have been dedicated to the impact of outliers in univariate time series. These include Denby and Martin [15], Pena [33], Tsay [45], and Chang, Tiao and Chen [11], all of whom use iterative procedures for the detection of outliers. R. Baragona, F. Battaglia and D. Cucina [5] proposed the identification and estimation of outliers in time series by empirical likelihood methods; their theory and applications are developed for stationary autoregressive models with outliers distinguished in the usual additive and innovation types. Pena and Maravall [34] considered the cases where the model is known and where it is unknown, alongside the effect of missing data linked with outliers. Further contributions are due to Chen and Liu [13] and McCulloch and Tsay [29]. Le, Martin and Raftery [25] and Luceno [28] used methods based on robust Bayes factors in the consideration of additive outliers.

Justel, Pena and Tsay [47] proposed a procedure to detect patches of outliers in an autoregressive process. The procedure improves on existing detection methods based on Gibbs sampling; it was shown that standard outlier detection via Gibbs sampling may become extremely inefficient in the presence of severe outliers.

Shittu [41] considered two additional outlier generating models, the Multiplicative and the Convolution models, and concluded that the Convolution model performs more efficiently than all other single outlier generating models. Ji. Yanjie, D. Tang, A. Gou, P. T. Blythe and G. Reu [48] "introduced outlier mining and nonparametric detection methods for detecting and analyzing outlier in available parking space data sets. The technique was able to detect Additive and Innovative outlier simultaneously".

Shittu and Shangodoyin [40] considered the identification of outliers in the frequency domain using the spectral method.

The above and other literature show that not much work has been done on outlier detection in multivariate time series. Among the available works on multivariate outlier detection in time series is the projection pursuit technique used by Galeano, Pena and Tsay [19] to find the linear combination of a multivariate time series that maximizes kurtosis, with the purpose of best reproducing the outlying signal. Detection of the time points of outliers and estimation of their magnitudes were accomplished by employing univariate searching methods.

Baragona and Battaglia [4] proposed Independent Component Analysis (ICA) as a tool for identifying the locations of multiple outliers in multivariate time series. ICA is used to identify a set of independent unobservable variables that are supposed to generate the data set of interest. An unknown mixing matrix is postulated to linearly transform the unobservable variables into a set of observable mixed ones. Both the unobservable variables and the mixing matrix have to be estimated from the data. ICA has been applied successfully in a variety of fields such as biomedicine, speech, radar, signal processing and time series.

Cucina, Salvatore and Protopapas [14] used meta-heuristic methods to detect additive outliers in multivariate time series. The implemented algorithms were simulated annealing, threshold accepting and two different versions of the genetic algorithm. All use the same objective function, a generalized AIC-like criterion, and, in contrast with many existing methods, they do not require a vector autoregressive moving average model to be specified for the data and are able to detect any number of potential outliers simultaneously. They concluded that "almost all available methods for outlier detection are iterative, but the difference with respect to the meta-heuristic algorithms is that it seems to be able to provide more flexibility and adaptation to the outlier detection problem".

Furthermore, Robert and Cleroux [44] introduced the coefficient of vector autocorrelation, obtained its influence function together with its distribution, and used it to test the hypothesis of the presence of outliers.

Barnett and Lewis [7] and Shittu [41] emphasise two challenges in outlier analysis, namely smearing and masking. These concepts are related to the detection of outliers in statistical data and can be intertwined, complicating the situation even further. Smearing (popularly known as swamping in the literature on outlier identification) occurs when one outlier affects the series in a manner that makes other observations appear to be outliers when they are actually not. Conversely, masking occurs when one outlier tends to hide others from being identified. It is generally believed that these notions are closely connected to specific outlier detection methods rather than being properties of the data itself: smearing and masking are deficiencies of certain methods, not types of outliers as such.
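Masking can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a classical z-score rule with a threshold of 3 (an illustrative choice) and artificial data, and shows that a single extreme value is flagged while two equally extreme values inflate the sample standard deviation enough to hide each other.

```python
import statistics


def flag_by_z_score(values, threshold=3.0):
    """Flag indices whose classical z-score exceeds the threshold (illustrative rule)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Artificial data: 20 observations, regular values at 0, outliers at 10.
one_outlier = [0.0] * 19 + [10.0]
two_outliers = [0.0] * 18 + [10.0, 10.0]

single_flags = flag_by_z_score(one_outlier)   # the lone outlier is flagged
masked_flags = flag_by_z_score(two_outliers)  # both outliers escape detection

print(single_flags, masked_flags)
```

The masking here is a property of the z-score rule, which is consistent with the view that masking is a deficiency of the detection method rather than of the data.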

Because of the effect of both Additive and Innovative outliers on parameter estimates, Shittu [41] introduced the Convolution Outlier (CO) and Multiplicative Outlier (MO) models in univariate time series. In this paper, the Multiplicative and Innovative outlier generating mechanisms are extended to multivariate time series, with a view to comparing their performance in terms of parameter estimates and outlier detection capabilities.

2. Derivation of the Models

In this section, we assume that outliers have either an Innovative or a Multiplicative effect on a bivariate time series. The estimates of the parameters are derived and the corresponding test statistics developed.

2.1. Innovative Outlier Model

An Innovative Outlier (IO) represents an unexpected change in the innovations that drive the vector time series. Suppose that the noise in a bivariate series consisting of oven temperature and a chemical concentration reading is mainly due to the random variation of the feed rate. Then a sudden change in the feed rate that happens at just a particular time point, due to some exogenous effect, will produce an IO in the series.
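The way an innovative outlier propagates can be sketched by simulation. The following is a minimal illustration, not the authors' code: it assumes a bivariate VAR(1) with a hypothetical coefficient matrix `A`, Gaussian noise, and a shock of size `omega` added to the first innovation at time `T`. Running the same noise sequence with and without the shock isolates the effect of the outlier.

```python
import random


def simulate_var1(A, n, T=None, omega=0.0, seed=0):
    """Simulate x_t = A x_{t-1} + a_t; optionally add omega to the first innovation at time T."""
    rng = random.Random(seed)
    x_prev = [0.0, 0.0]
    path = []
    for t in range(n):
        a_t = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        if T is not None and t == T:
            a_t[0] += omega  # the outlier enters through the innovation, not the observation
        x_t = [
            A[0][0] * x_prev[0] + A[0][1] * x_prev[1] + a_t[0],
            A[1][0] * x_prev[0] + A[1][1] * x_prev[1] + a_t[1],
        ]
        path.append(x_t)
        x_prev = x_t
    return path


A = [[0.5, -0.2], [0.3, 0.4]]  # hypothetical, stable coefficient matrix
clean = simulate_var1(A, n=20, seed=1)
contaminated = simulate_var1(A, n=20, T=10, omega=5.0, seed=1)

# Difference between the two runs: zero before T, omega at T, then A^k applied to the shock.
effect = [[c[0] - z[0], c[1] - z[1]] for c, z in zip(contaminated, clean)]
for t in range(9, 14):
    print(t, [round(v, 4) for v in effect[t]])
```

Unlike an additive outlier, which would alter a single observation, the shock here carries forward into later values of both series through the autoregressive dynamics.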

The innovative outlier-generating model for univariate series is defined as:

(1)

with the unobservable free series given by

(2)

where ~ (0, ,  and

(3)

where = (x1t, …, xkt) is a k-dimensional time series, Zt is an outlier-free time series that is assumed to be ARIMA (p, q),  is a time indicator for outliers such that  for all  otherwise,  = 1 − Ө1B − Ө2B^2 − … − ӨpB^p are polynomials of order p and  represents the magnitude of the outliers.

Now, given a vector model  and  such that  contains outlier and  is outlier free, the magnitude of such outlier and its corresponding variance can be obtained by specifying the bivariate VAR (2, 2) as:

(4)

(5)

Where;  is the current value of the response variable

 is the lag value of the current variable

 is the current value of the explanatory variable

 is the lag value of the explanatory variable

Now, when  contains an outlier,

Then

(6)

Substituting (6) into (4), we have:

(7)

recall that

(8)

We then have

(9)

assuming = ; when

therefore

Using the least squares method, we obtain

(10)

Since  is a time indicator where  for all  otherwise, we have

(11)

Therefore, the estimator of the magnitude of the outlier for IO is

(12)

Its variance is

(13)

Therefore

(14)

Having obtained the estimate and its corresponding variance, we then construct the test statistic for innovative model as

(15)

(16)
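The general logic of this section (fit by least squares, estimate the outlier magnitude, standardise it, compare with a critical value) can be sketched as follows. This is a generic residual-based check and not the statistic of equations (15) and (16): it assumes a single lagged equation without intercept, uses the residual at each time point as the estimated innovational effect (as in the standard univariate treatment), and borrows the critical value c = 4 used later in the paper. The simulated coefficients and shock size are hypothetical.

```python
import math
import random


def fit_lagged_regression(y, x):
    """Least-squares fit of y[t] = b1*y[t-1] + b2*x[t-1] + e[t], no intercept."""
    rows = [(y[t - 1], x[t - 1], y[t]) for t in range(1, len(y))]
    s11 = sum(u * u for u, _, _ in rows)
    s12 = sum(u * v for u, v, _ in rows)
    s22 = sum(v * v for _, v, _ in rows)
    s1y = sum(u * w for u, _, w in rows)
    s2y = sum(v * w for _, v, w in rows)
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal equations
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    residuals = [w - b1 * u - b2 * v for u, v, w in rows]
    return b1, b2, residuals


def flag_outliers(residuals, c=4.0):
    """Return (time index, statistic) pairs whose |residual / sigma| exceeds c."""
    sigma = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - 2))
    # residuals[i] belongs to time t = i + 1: the first observation is lost to the lag.
    return [(i + 1, e / sigma) for i, e in enumerate(residuals) if abs(e / sigma) > c]


# Simulated bivariate series with one innovational shock of size 15 at t = 50.
rng = random.Random(42)
y, x = [0.0], [0.0]
for t in range(1, 100):
    shock = 15.0 if t == 50 else 0.0
    y_next = 0.5 * y[-1] - 0.2 * x[-1] + rng.gauss(0.0, 1.0) + shock
    x_next = 0.3 * y[-1] + 0.4 * x[-1] + rng.gauss(0.0, 1.0)
    y.append(y_next)
    x.append(x_next)

b1, b2, residuals = fit_lagged_regression(y, x)
flagged = flag_outliers(residuals)
print(flagged)
```

Because the residual standard deviation is computed from all residuals, including the contaminated one, the statistic is somewhat conservative; a robust scale estimate would be a natural alternative.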

2.2. Multiplicative Outlier Model

Since an outlier may have a multiplicative interaction effect on a series (Shittu, 2003), there is a need to develop the corresponding outlier generating model.

The multiplicative outlier model is defined as:

(17)

again using

 as defined in equation (4)

with the outlier free series

Linearize (17) by taking logarithms to obtain

(18)

Let ,  and

Therefore,

(19)

since,

then

(20)

If we let   and =

Then we have

(21)

By sum of squares of  we have:

(22)

Differentiating equation (22) with respect to  and equating to zero, we get

(23)

(24)

Setting =

(25)

recall that  in the presence of outlier, we have

(26)

The variance of  is

(27)

Hence the test statistic is defined as:

(28)

(29)
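The linearisation step of this section can be illustrated numerically. The sketch below assumes one simple form of multiplicative contamination, in which the outlier-free value at time T is multiplied by a factor omega; the exact form of equation (17) may differ, and the data values are invented for illustration. After taking logarithms the contamination becomes an additive shift of log(omega) at T, which is what allows least squares to be applied.

```python
import math

# Hypothetical positive outlier-free series and contamination factor.
z = [100.0, 104.0, 101.0, 107.0, 110.0, 108.0]
T, omega = 3, 1.5

# Assumed multiplicative contamination: x_T = z_T * omega, other points unchanged.
x = [z_t * omega if t == T else z_t for t, z_t in enumerate(z)]

# On the log scale the effect is additive: log x_t - log z_t = log(omega) at T, 0 elsewhere.
log_shift = [math.log(x_t) - math.log(z_t) for x_t, z_t in zip(x, z)]
print([round(s, 4) for s in log_shift])
```

The log transform requires strictly positive data, so a multiplicative model of this kind is only applicable to series that satisfy that condition.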

Table 1. Summary of Estimates and Test Statistic for the two models when  contains outlier.

3. Analysis of Data

From the derived outlier generating mechanisms in section two and with the estimation of the magnitudes of outliers and their variances, the test statistics constructed will be used to detect the existence of outliers in both the generated series and real data.

Simulated data with sample sizes of 10, 50 and 100 are used to evaluate the performance of the derived models, while data on commercial bank deposits and loans from Nigerian commercial banks, extracted from the Annual Statistical Bulletin of the Central Bank of Nigeria, 2011, are also used to establish the validity of the developed models.

Statistical software R 3.0.1 was used to analyse the data. The results for the two models, i.e. the Innovative and Multiplicative models, are summarised below.

3.1. Analysis of Simulated Data When X1t Contains Outliers

The results of the two models in terms of their outlier detection performance from simulated data are tabulated below.

Table 2. Summary of Result on Detection Rate of the Models on Simulated Data when X1t contains outlier.

| Model type | n=10: outliers injected | n=10: outliers detected | n=10: % detected | n=50: outliers injected | n=50: outliers detected | n=50: % detected | n=100: outliers injected | n=100: outliers detected | n=100: % detected |
|---|---|---|---|---|---|---|---|---|---|
| Innovative | 2 | 0 | 0% | 5 | 2 | 40% | 8 | 2 | 25% |
| Multiplicative | 2 | 2 | 100% | 5 | 4 | 80% | 8 | 5 | 80% |

As shown in Table 2 above, the multiplicative model is more sensitive to outlying observations than the innovative model for different sample sizes.
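The detection rates can be recomputed from the injected and detected counts as 100 × detected / injected. The short sketch below does this for the counts reported in Table 2; the dictionary layout is only an illustrative choice. Note that the counts for the multiplicative model at n = 100 (5 of 8) give 62.5%, whereas the table reports 80%, so one of the two figures may be a transcription slip; which one cannot be determined from the text.

```python
# (injected, detected) counts per sample size, as reported in Table 2.
table2 = {
    "Innovative": {10: (2, 0), 50: (5, 2), 100: (8, 2)},
    "Multiplicative": {10: (2, 2), 50: (5, 4), 100: (8, 5)},
}


def detection_rate(injected, detected):
    """Percentage of injected outliers that were detected."""
    return 100.0 * detected / injected


rates = {
    model: {n: detection_rate(*counts) for n, counts in by_n.items()}
    for model, by_n in table2.items()
}

for model, by_n in rates.items():
    print(model, by_n)
```

Even with the recomputed value, the multiplicative model has the higher detection rate at every sample size considered.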

3.2. Detection of Outlier in Real Data

In order to investigate the performance of the derived models, paired real data on deposits and loans of banks in Nigeria, obtained from the Annual Statistical Bulletin of the Central Bank of Nigeria, 2011, were used.

Here two cases are considered. The first case is when loan is contaminated. The vector autoregressive model is given as

where  is the current value of deposit,  is the immediate past value of deposit, and is the immediate past value of loan granted.

The estimated VAR model via the use of statistical package R is given as:

Deposit(t) = 0.4826 Deposit(t−1) − 0.1579 Loan(t−1)

s.e. (0.1836) (0.1561)

t (2.628) (−1.012)

P-value (0.0142) (0.3210)

The second case is when Deposit is contaminated.

Then, the vector autoregressive model is given as

where  is the current value of loan, is the immediate past value of loan and  is the immediate past value of deposit.

The estimated VAR model

s.e (0.1712) (0.2015)

t (5.610) (–1.657)

P-value (6.78e-06) (0.1095)
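The reported summaries can be cross-checked with the identity t = coefficient / standard error. The sketch below verifies this for the first estimated model and, since the second estimated equation is not legible in the text, back-computes the coefficients implied by its standard errors and t-values. The implied values are approximations derived here, not figures reported by the authors, and the ordering (lagged loan first, lagged deposit second) is an assumption based on the variable description.

```python
# First model: (term, coefficient, standard error, reported t) as given in the text.
first_model = [
    ("lagged deposit", 0.4826, 0.1836, 2.628),
    ("lagged loan", -0.1579, 0.1561, -1.012),
]
t_ratios = {name: coef / se for name, coef, se, _ in first_model}

# Second model: only standard errors and t-values are legible.
# The term order is assumed, and the implied coefficients are approximate.
second_model = [
    ("lagged loan", 0.1712, 5.610),
    ("lagged deposit", 0.2015, -1.657),
]
implied_coefficients = {name: se * t for name, se, t in second_model}

for name, _, _, reported_t in first_model:
    print(name, round(t_ratios[name], 3), "reported:", reported_t)
print({name: round(value, 4) for name, value in implied_coefficients.items()})
```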

Table 3. Detection Performance of Innovative Model on Deposit and Loan Data when Loan is contaminated.

D = Outlier detected

ND = No outlier detected

The critical value (c) = 4.

Table 4. Detection Performance of Multiplicative Model on Deposit and Loan Data when Loan is contaminated.

It can be deduced from Table 4 that no outlier was detected by the multiplicative model, as a result of the non-multiplicative nature of the data.

4. Discussion of Results

Results obtained from the simulated data with sample sizes of 10, 50 and 100 (small, moderate and large) gave detection rates for the Innovative Outlier (IO) and Multiplicative Outlier (MO) models of (0% and 100%), (40% and 80%) and (25% and 80%), respectively, for the injected outliers. At every sample size considered, MO was the more sensitive to outliers in the simulated data sets.

For the real data set of Deposit and Loan, 5 pairs of observations were identified as outliers by IO; however, MO could not identify any outlier, as a result of the non-multiplicative nature of the data.

Considering the two outlier detection models, MO was found to be the more efficient, with the smaller standard error of the estimate, and is therefore recommended for outlier detection in multivariate time series data.

5. Conclusion

This paper introduced outlier generating mechanisms in multivariate time series using VAR models. It also developed test statistics for detecting outliers under two different assumptions about the nature of the outliers, the innovative and multiplicative models. The test statistics were derived for each generating mechanism. Attempts were also made to identify the model with the greater detection power, in terms of relative efficiency and sensitivity to outliers, by applying the models to both simulated and real data. All these were achieved using theoretical and analytical means. The multiplicative model was found to be more sensitive in outlier detection, but its ability to detect outliers in real data depends heavily on the nature of the series (whether it is multiplicative or not).

This work is limited to the time domain; it can be further extended to the frequency domain.


References

  1. A. Arribas-Gil, and J. Romo, "Shape Outlier Detection and Visualization for Functional Data: the Outliergram," Biostatistics, 15, 603-619, 2014.
  2. Z. Azami, A. Ibrahim, and S. Mohd, "Detection Procedure for a Single Additive Outlier in Bilinear Model," Journal of Pak. Stat. Oper. Res. 1, 1-5, 2007.
  3. R. Baragona, and F. Battaglia, "Outlier Detection in Multivariate Time Series by Independent Component Analysis," Neural Computation, 19: 1962-1984, 2007.
  4. R. Baragona, F. Battaglia and D. Cucina, "Empirical Likelihood for Outlier Detection and Estimation in Autoregressive Time Series," Journal of Time Series Analysis Vol 37, 3, 315–336, 2015.
  5. R. Baragona, F. Battaglia, and T. Calzini, "Genetic Algorithms for the Identification of Additive and Innovational Outliers in Time Series," Computational Statistics and Data Analysis 30, 147, 2001.
  6. V. Barnett, and T. Lewis, Outliers in Statistical Data, John Wiley & Sons U. K, 1995.
  7. G. E. P. Box, G. M. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, 3rd Ed., New Jersey: Prentice-Hall, 1994.
  8. K. Chaloner and R. Brant, "A Bayesian Approach to Outlier Detection and Residual Analysis," Biometrika, 75, 651–660, 1988.
  9. I. Chang, "Outlier in time series". Technical Report, Department of statistics, University of Wisconsin, 1982.
  10. I. Chang, G. D. Tiao, and C. Chen, "Estimation of Time Series Parameters in the Presence of Outliers," Technometrics, 30, 193–204, 1988.
  11. C. Chatfield, The Analysis of Time Series: An Introduction, New York, Chapman and Hall, 1980.
  12. C. Chen, and L. M. Liu, "Joint Estimation of Model Parameters and Outlier effects in Time Series". Journal of the American Statistical Association, 88, 284–297, 1993.
  13. D. Cucina, A. Di Salvatore, and M. Protopapas, "Meta-heuristic Methods for Outliers Detection in Multivariate Time Series," Comisef working paper series, 003, 270, 2008.
  14. L. Denby, and R. D. Martin, "Robust Estimation of the First Order Autoregressive Parameter," Journal of the American Statistical Association, 74, 140-146. 1979.
  15. W. Enders, Applied Econometric Time Series, 2nd Edition, John Wiley & Sons, ISBN 0-471-23065-0, 2003.
  16. M. Forni, and L. Reichlin, "Let's Get Real: A Dynamic Factor Analytical Approach to Disaggregated Business Cycle". Review of Economic Studies, 65, 453–474, 1998.
  17. A. J. Fox, "Outliers in Time Series". Journal of the Royal Statistical Society. B34: 350 – 363, 1972.
  18. P. Galeano, D. Pena, and R. S. Tsay, "Outlier Detection in Multivariate Time Series Via Projection Pursuit". Working paper 0-42. Statistics and Econometrics Series II, Dept. De Estadistica, Universidad Carlos III de Madrid, 2004.
  19. I. Georgiev, "A Factor Model for Innovational Outliers in Multivariate Time Series", ICEE, First Italian Congress of Econometrics and Empirical Economics, Venice, 24-25, 2005.
  20. A. Justel, D. Pena, and R. S. Tsay, "Detection of Outlier Patches in Autoregressive Time Series," Statistica Sinica, 11, 651–673. 2000.
  21. A. Kaya, "An Investigation: The Analysis of Outliers in Time Series". PhD Thesis. Dokuz Eylul University, Izmir, Turkey, 1999.
  22. A. Kaya, "Modelling Outlier Factors in Data Analysis", (advances in Information Systems), LNCS 3261, 88–95, 2010.
  23. A. Khattree and D. N. Naik, "Detection of Outliers in Bivariate Time Series Data". Communications in Statistics – Theory and Methods, 16 (12): 3701–3714, 1987.
  24. N. D. Le, A. E. Raftery, and R. D. Martin, "Robust Order Selection in Autoregressive Models Using Robust Bayes Factors". Journal of the American Statistical Association, 91, 123-131. 1996a.
  25. G. M. Ljung, "On Outlier Detection in Time Series," J. R. Statist. Soc. B. 55 No. 2, 559-567, 1993.
  26. A. Luceno, "Detecting Possibly Non-Consecutive Outliers in Industrial Time Series," Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60 (2): 295–310, 1998.
  28. H. Lutkepohl, New Introduction to Multiple Time Series Analysis, Springer, Berlin, 2005.
  29. R. E McCulloch, and R. S. Tsay, "Bayesian Analysis of Autoregressive Time Series Via the Gibbs sampler", Journal of Time Series Analysis 15, 235–50, 1994.
  30. C. R. Nelson, and C. I. Plosser, "Trends and Random Walks in Macroeconomic Time Series," Journal of Monetary Economics, 10, 139–162, 1982.
  31. D. Olivier, and C. Amelie, "The Impact of Outliers on Transitory and Permanent Components in Macroeconomic Series". Economics Bulletin, Vol. 3, No 60 PP 1–9, 2008.
  32. A. Pankratz, "Detecting and Treating Outliers in Dynamic Regression Models," Biometrika, 80, 847–854, 1993.
  33. D. Pena, and G. E. P. Box, "Identifying a Simplifying Structure in Time Series", Journal of the American Statistical Association, 82, 836-843, 1987.
  34. D. Pena, and A. Maravall, "Interpolations, Outliers and Inverse Autocorrelations", Communications in Statistics, Theory and Methods 20, 3175-86. 1991.
  35. B. Rolin, "Comparing Classical and Resistant Outlier Rules," Journal of American Stat. Ass. Vol. 412 pp. 1083–1090, 1990.
  36. B. Rosner, "On the Detection of Many Outliers," Technometrics. 17, 221–227, 1975.
  37. S. Ruey and R. S. Tsay, "Outliers, Level Shifts, and Variance Changes in Time Series," Journal of Forecasting, Vol. 7, 1-20, Department of Statistics, Carnegie Mellon University, U.S.A, 1988.
  38. M. J. Sanchez, and D. Pena, "The Identification of Multiple Outliers in ARIMA Models," Communications in Statistics, Part A: Theory and Methods, 32, 1265–1287. 2003.
  39. D. K. Shangodoyin, "On the Specification of Time Series Models in the Presence of Aberrant Observations," Ph.D Thesis in the Dept. of Statistics, Univ. of Ibadan, 1994.
  40. I. O. Shittu, and D. K. Shangodoyin, "Detection of Outliers in Time Series Data: A Frequency Domain Approach," Asian Journal of Scientific Research 1, (2) 130-137, 2008.
  41. I. O. Shittu, "On Performance of Some Generating Models in Detection of Outliers Under Classical Rule," Mphil Thesis Dept. of Statistics, Univ. of Ibadan, 2000.
  42. C. Sims, "Macroeconomics and reality," Econometrica 48 (1), JSTOR 112017, 1980.
  43. S. Sridevi, S. Abirami, and S. Rajaram, "Detecting and Revamping of X-Outliers in Time Series Database," International Journal of Computer Applications 60 (19): 28-33, 2012.
  44. C. Robert and J. Helbling, "On Outlier Detection in Multivariate Time Series" Acta Mathematica Vietnamica, 34, 1, 19-26, 2009.
  45. R. S. Tsay, "Time Series Model Specification in the Presence of Outlier". Jour. Amer. Stat. Asso. 81, 132–141, 1986.
  46. R. S. Tsay, "Outliers, Level Shifts and Variance changes in Time Series," Journal of Forecasting, 7, 1-20, 1988.
  47. R. S. Tsay, D. Pena, and A. E. Pankratz, "Outliers in Multivariate Time Series," Biometrika, 87, 789-804, 2000.
  48. Ji. Yanjie, D. Tang, A. Gou, P. T. Blythe and G. Reu, "Detection of Outliers in a Time Series of Available Parking Spaces," Mathematical Problems in Engineering. Volume 2013: 1-12, 2013.
