Studying Changes on Stock Market Transactions Using Different Techniques for Multivariate Time Series

Many studies deal with univariate time series data, but the analysis of multivariate time series is rarely discussed. This article discusses the theoretical and numerical aspects of four techniques for analyzing multivariate time series data: ANN, ARIMA, GLM and VARS models. All techniques are applied to data obtained from the Egypt Stock Exchange Market, using R with the packages "neuralnet", "nnet", "forecast", "MTS" and "vars". Forecast accuracy is assessed with the measures ME, ACF, MAE, MPE, RMSE, MASE and MAPE, for both seasonal and non-seasonal time series data. The best ARIMA model, with minimum error, is constructed and tested, and the lag order of the VAR model is identified. The Granger causality test indicated that the Exchange rate is useful for forecasting the other time series, and the instantaneous causality test indicated instantaneous causality between the Exchange rate and the other series. For non-seasonal data, an NNAR model with p lagged inputs and no hidden nodes is equivalent to an ARIMA(p,0,0) model; for seasonal data, the NNAR(p,P,0)[m] model is equivalent to an ARIMA(p,0,0)(P,0,0)[m] model. For these data, we conclude that the ANN and GLM fits to the seasonal multivariate time series are better than those to the non-seasonal series. The transactions of the Finance, Household and Chemicals sectors are significant for the Exchange rate in the non-seasonal case. Forecasts based on stationary time series data are smoother and more accurate. The VARS model is more accurate than the VAR model for ARIMA(0,0,1). VAR forecasts are produced only over a short horizon, because predictions over a long horizon become unreliable or flat.


Introduction
Time series analysis is one of the most important tools that companies, and even countries, need in order to forecast the future behavior of some phenomenon, and this applies to multivariate as well as univariate time series. Artificial neural networks (ANNs) have become one of the most important artificial intelligence methods for forecasting. Since few recent articles deal with multivariate analysis using either autoregressive integrated moving average (ARIMA) models or ANNs, we combine both methods to forecast multivariate time series, applying them to real data with several R packages. In addition to these two methods, we use vector autoregressive models (VARS) and generalized linear models (GLMs) for multivariate time series analysis, and then try to determine which of these methods performs best.
The predictors form the bottom layer of an ANN and the forecasts form the top layer; any intermediate layers contain hidden neurons. The simplest networks contain no hidden layers and are equivalent to linear regression. The coefficients attached to the predictors are called weights, and they are selected in the NN framework by a learning algorithm that minimizes a cost function. The number of nodes in each hidden layer must be specified. An ANN can therefore be viewed as a nonlinear statistical model that captures complex relations between inputs and outputs without requiring an exact functional form to be specified. ANNs have been applied in many fields, including pharmaceutical research, engineering and medicinal chemistry, and have been used in drug discovery. Many earlier and recent studies have compared ARIMA models and ANNs for forecasting univariate time series.
Intrator and Intrator [1] used NNs to interpret nonlinear models. Zhang et al. [2] discussed forecasting with ANNs. Al-Shawadfi [3] compared NN and Box-Jenkins forecasting techniques with an application to real data. Hothorn et al. [4] studied the design and analysis of benchmark experiments. LiHong et al. [5] proposed an NN-based drug discovery approach and applied it to the design of aldose reductase inhibitors. Zou and Zhou [6] presented a QSAR study of oxazolidinone antibacterial agents using ANNs. Eugster et al. [7] introduced exploratory and inferential analysis of benchmark experiments. Kose [8] modelled the color perception of different age groups using ANNs. Al-Shawadfi and Hagag [9] suggested that the ANN approach may provide a superior alternative to the Box-Jenkins approach for developing forecasting models in situations that do not require modeling the internal structure of the series; their numerical results showed that the approach performs well for forecasting ARMAX models. Rostampour et al. [10] used an ANN to predict apple bruise damage. Doreswamy and Chanabasayya [11] presented a performance analysis of NN models for an oxazolines and oxazoles derivatives descriptor dataset. Hanjouri and Qamar [12] compared two methods for analyzing and forecasting time series data, Box-Jenkins models and ANNs, to forecast global sugar prices; their analysis revealed the superiority of the ARIMA model, which gave more accurate predictions than the ANNs. This paper discusses the theoretical aspects of different techniques and models for analyzing time series data, concentrating on multivariate time series data. These techniques are ANN, ARIMA, GLM and VARS models, and all of them are used to analyze multivariate time series data obtained from the Egypt Stock Exchange Market.
The analysis is carried out in R using several packages and functions.
This paper is organized as follows: Section 2 presents the materials, algorithms and models for multivariate ordinary (non-time series), time series and seasonal data. Section 3 presents the numerical analysis, divided into subsections describing the features of the dataset and the results of analyzing it with the previous methods. Section 4 discusses the results obtained for all models. Section 5 presents the conclusions of this article.

Materials, Algorithms and Models
In this section, we will refer to some materials, algorithms and models that are used in this paper:

ARIMA Models
ARIMA models generalize ARMA models; both are fitted to time series data either to better understand the data or to forecast. ARIMA models are applied when the series is non-stationary, since the differencing step removes the non-stationarity. Non-seasonal ARIMA models are denoted ARIMA(p,d,q), where p is the order of the autoregressive (AR) part, d is the degree of differencing, and q is the order of the moving average (MA) part. The orders p and q can be determined from the sample autocorrelation function (ACF) and partial autocorrelation function (PACF); alternatively, information criteria such as AIC and BIC can be used to select the order of a non-seasonal ARIMA model. Seasonal ARIMA models are denoted ARIMA(p,d,q)(P,D,Q)[m], where m is the number of periods in each season and P, D, Q are the autoregressive, differencing and moving average orders of the seasonal part. ARIMA(1,0,0) represents AR(1), ARIMA(0,1,0) represents I(1), and ARIMA(0,0,1) represents MA(1). ARIMA models can be estimated with the Box-Jenkins technique. When several time series are modelled jointly, a vector ARIMA (VARIMA) model may be suitable, and when the series has a seasonal effect, a seasonal ARIMA (SARIMA) model is preferable. The non-stationary ARIMA(p,d,q) model can be written as:
$$\left(1-\sum_{i=1}^{p}\phi_i L^i\right)(1-L)^d X_t = \left(1+\sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t,$$
where t is an integer time index, L is the lag operator ($L X_t = X_{t-1}$), d is the degree of differencing, and $X_t$ is the observed series (a vector of series in the multivariate case).
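As a minimal illustration of the differencing operator $(1-L)^d$ in the model above, the Python sketch below applies one pass of $X_t - X_{t-1}$ per order of differencing. The function name `difference` is our own; in R the base function `diff()` with its `differences` argument plays the same role.

```python
def difference(x, d=1):
    """Apply the differencing operator (1 - L)^d to a sequence.

    Each pass replaces X_t by X_t - X_{t-1}, dropping the first value,
    so the result has len(x) - d elements.
    """
    x = list(x)
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


series = [t * t for t in range(6)]   # quadratic trend: 0, 1, 4, 9, 16, 25
print(difference(series, 1))         # [1, 3, 5, 7, 9]
print(difference(series, 2))         # [2, 2, 2, 2]
```

A quadratic trend only becomes constant after $d = 2$ passes, which mirrors how the order $d$ is chosen: difference until the series looks stationary.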
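The stationary case ($d = 0$, an ARMA(p,q) model) can be written as:
$$\left(1-\sum_{i=1}^{p}\phi_i L^i\right) X_t = \left(1+\sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t,$$
where $\phi_i$ are the parameters of the AR part, $\theta_i$ are the parameters of the MA part, and $\varepsilon_t$ are error terms, generally assumed to be iid normal variables with zero mean. A useful model-selection criterion is the Akaike information criterion:
$$\mathrm{AIC} = -2\log(L) + 2(p+q+k),$$
where L is the likelihood of the data, p and q are the orders of the AR and MA parts, and k = 1 if the model includes an intercept (k = 0 otherwise).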

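The small-sample corrected AIC is
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2(p+q+k)(p+q+k+1)}{T-p-q-k-1},$$
where T is the number of observations. A good model minimizes AIC, AICc or BIC: the lower the value, the better the model. AIC tries to find the model that best approximates reality, while BIC attempts to identify the true model; BIC penalizes additional parameters more heavily and therefore tends to select more parsimonious models. AICc (like AIC and BIC) should only be used to compare ARIMA models with the same order of differencing, because differencing changes the data on which the likelihood is computed. The root mean squared error (RMSE) on a test set can be used to compare ARIMA models with different orders of differencing. [13][14][15][16][17]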
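The AIC and AICc formulas can be evaluated directly once a model's maximized log-likelihood is known. The sketch below follows the parameter count $p+q+k$ used in the formulas here; the log-likelihood values are made up for illustration, and some software (for example the R forecast package) also counts the error variance as a parameter, so absolute values there differ by a constant.

```python
def aic(log_lik, p, q, k):
    """AIC = -2 log(L) + 2(p + q + k); k = 1 if an intercept is included."""
    return -2.0 * log_lik + 2 * (p + q + k)


def aicc(log_lik, p, q, k, T):
    """AICc = AIC + 2K(K + 1) / (T - K - 1) with K = p + q + k."""
    K = p + q + k
    if T - K - 1 <= 0:
        raise ValueError("AICc is undefined when T <= p + q + k + 1")
    return aic(log_lik, p, q, k) + 2 * K * (K + 1) / (T - K - 1)


# Hypothetical log-likelihoods for two candidate models on T = 50 observations.
print(aicc(-100.0, p=0, q=1, k=0, T=50))   # MA(1), no intercept: 202.0833...
print(aicc(-99.0, p=1, q=1, k=1, T=50))    # ARMA(1,1) with intercept: 204.5217...
```

Although the ARMA(1,1) model has the higher likelihood, the penalty for its two extra parameters gives it the larger AICc, so the MA(1) would be preferred.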

Artificial Neural Networks (ANNs)
An ANN consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the raw inputs and passes them to the hidden layers, which process the information and pass it to the last layer, which produces the output; the connection weights are learned with the back-propagation algorithm. An ANN is characterized by its activation function, which usually takes values in the range [-1,1] or [0,1] and may be linear or nonlinear. A common choice is the hyperbolic tangent
$$g(x) = \frac{2}{1+e^{-2x}} - 1,$$
with values in [-1,1]; the logistic sigmoid $g(x) = 1/(1+e^{-x})$, with values in [0,1], is another.
A perceptron (a single-layer ANN) receives multidimensional input and processes it using a weighted summation followed by an activation function. A major limitation of the perceptron is its inability to model non-linearity; a multilayer ANN overcomes this limitation and can solve non-linear problems. Figure 1 displays a simple ANN structure. The most basic type of ANN is the feed-forward network, in which information flows in only one direction: each unit sends information to units in the next layer and receives no information back from them. The second type is the feed-back (recurrent) network, in which information can flow in multiple directions through feed-back loops. Back-propagation computes the error at the output units and propagates it backwards to all units, so that the error attributed to each unit is proportional to its contribution to the total error; these errors are then used to optimize the weight of each connection. For a network without hidden layers, linear regression is a more efficient way to train the model. Adding an intermediate layer of hidden neurons makes the ANN non-linear. This is known as a multilayer feed-forward ANN, in which each layer of nodes receives inputs from the previous layer, and the outputs of the nodes in one layer are the inputs to the next. The inputs to each node are combined using a weighted linear combination, and the result is modified by a nonlinear function before being output.
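The forward pass described above (weighted linear combination at each node, then a nonlinear activation) can be sketched for a network with one hidden layer and a single output. The Python below is an illustration only; the function names and weights are hypothetical, the activation is the tanh function given earlier, and the output layer is assumed linear. The second function shows that dropping the hidden layer leaves an ordinary linear regression.

```python
import math


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """One-hidden-layer feed-forward pass with a single linear output.

    x        : list of inputs (the predictors)
    hidden_w : one list of input weights per hidden neuron
    hidden_b : one bias per hidden neuron
    out_w    : weights from the hidden neurons to the output
    out_b    : output bias
    """
    # Each hidden neuron combines the inputs linearly, then applies
    # tanh(z) = 2 / (1 + e^{-2z}) - 1.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    # The output is a linear combination of the hidden activations.
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


def forward_no_hidden(x, w, b):
    """Without a hidden layer the network is just a linear regression."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


x = [1.0, 2.0]
print(forward(x, [[0.5, -0.25]], [0.0], [2.0], 1.0))   # tanh(0) = 0, so 1.0
print(forward_no_hidden(x, [0.5, -0.25], 1.0))          # 0.5 - 0.5 + 1.0 = 1.0
```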
When training an ANN, we divide the dataset into three sets: a training set, a validation set and a test set. The training set is used to learn the relationship between the dependent and independent variables, the validation set is used to tune choices such as the number of hidden nodes, and the test set assesses the performance of the final model. The performance of a set of candidate ANN models is estimated, compared and ranked. The most common cross-validation (CV) technique is k-fold CV, a resampling procedure in which the data are split into k folds and each fold is used once as the test set while the remaining k-1 folds form the training set. Because every data point appears in both the test and training sets, the method reduces the dependence on a single train-test split and the variance of the performance estimate. The extreme case k = n, where n is the number of data points, is leave-one-out CV.
We evaluated our ANN model on the test set using residual-based measures such as RMSE. [18][19][20][21]
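The k-fold procedure described above can be sketched in a few lines of Python. The helper names are hypothetical, and the random shuffle is an assumption that suits non-temporal data such as the "neuralnet" regression; for time series a rolling-origin split, which always trains on earlier observations, is the usual alternative.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


for train, test in train_test_splits(10, 5):
    print(len(train), len(test))   # 8 2 on every fold
# k = n gives leave-one-out CV: every fold holds a single observation.
```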

Packages of ANN and ARIMA Time Series Models
Many packages can fit ANN models. The ANN regression model can be fitted with the "neuralnet" package after scaling the data and splitting it into training and test sets. Its neuralnet() function trains networks using back-propagation, resilient back-propagation with or without weight backtracking, or the globally convergent version of resilient back-propagation.
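Resilient back-propagation adapts a separate step size for each weight using only the sign of its gradient. The function below is our own sketch of the variant without weight backtracking (the one neuralnet selects with `algorithm = "rprop-"`); the factors 1.2 and 0.5 are the usual defaults, assumed here rather than read from the package, and a toy quadratic stands in for a network's error function.

```python
def rprop_minus(grad, w0, steps=200, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                delta_min=1e-6, delta_max=50.0):
    """Resilient back-propagation without weight backtracking (Rprop-).

    grad : function returning the gradient list at a weight list
    w0   : starting weights
    """
    w = list(w0)
    delta = [delta0] * len(w)          # per-weight step sizes
    prev_g = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            if g[i] * prev_g[i] > 0:       # same sign: accelerate
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif g[i] * prev_g[i] < 0:     # sign flip: we overshot, slow down
                delta[i] = max(delta[i] * eta_minus, delta_min)
            if g[i] > 0:                   # move against the gradient sign
                w[i] -= delta[i]
            elif g[i] < 0:
                w[i] += delta[i]
        prev_g = g
    return w


def quadratic_grad(w):
    # gradient of (w0 - 3)^2 + (w1 + 1)^2, minimised at (3, -1)
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


print(rprop_minus(quadratic_grad, [0.0, 0.0], steps=300))   # close to [3, -1]
```

Because only gradient signs are used, the method is insensitive to the scale of the error surface, which is one reason it is a popular default for small networks.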
The "forecast" and "nnet" packages are used for ANN time series forecasting of univariate series with a feed-forward network that has a single hidden layer and lagged inputs. The nnetar() function fits the model: a total of `repeats` networks are fitted with different random starting weights, and their forecasts are averaged. For non-seasonal data, the fitted model is denoted NNAR(p,0,k), where p is the number of lagged inputs and k is the number of hidden nodes. This is analogous to an AR(p) model but with non-linear functions. By default, p is the optimal number of lags according to the AIC for a linear AR(p) model. For seasonal data, the fitted model is called an NNAR(p,P,k)[m] model, which is analogous to an ARIMA(p,0,0)(P,0,0)[m] model but with nonlinear functions.
With seasonal data, we also add the last observed values from the same season as inputs.
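The input layout of an NNAR(p,P,k)[m] model can be made concrete by building the lagged design matrix by hand. The sketch below is our reading of the definition above, not the forecast package's code, and `nnar_inputs` is a hypothetical name: each row holds the p most recent values plus the P values from the same season in previous years, and the target is the current value.

```python
def nnar_inputs(y, p, P=0, m=1):
    """Lagged input rows and targets for an NNAR(p,P,k)[m]-style network.

    Row t holds y[t-1], ..., y[t-p] and the seasonal lags y[t-m], ..., y[t-P*m];
    the target is y[t]. A lag that appears in both sets is kept once.
    """
    lags = sorted(set(range(1, p + 1)) | {j * m for j in range(1, P + 1)})
    if not lags:
        raise ValueError("need p > 0 or P > 0")
    start = lags[-1]                    # first t with all lags available
    rows = [[y[t - lag] for lag in lags] for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    return lags, rows, targets


monthly = list(range(1, 15))            # 14 toy monthly values
lags, rows, targets = nnar_inputs(monthly, p=2, P=1, m=12)
print(lags)      # [1, 2, 12]
print(rows)      # [[12, 11, 1], [13, 12, 2]]
print(targets)   # [13, 14]
```

The long seasonal lag consumes a full year of data before the first training row exists, which matters for a 50-month series like the one studied here.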

VARS Models
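Vector autoregressive models (VARS) are used for multivariate time series: each variable is a linear function of past lags of itself and past lags of the other variables. Suppose that we observe three different time series variables, denoted by $x_{t,1}$, $x_{t,2}$ and $x_{t,3}$. The VAR(1) model is
$$x_{t,1} = \alpha_1 + \phi_{11}x_{t-1,1} + \phi_{12}x_{t-1,2} + \phi_{13}x_{t-1,3} + w_{t,1},$$
$$x_{t,2} = \alpha_2 + \phi_{21}x_{t-1,1} + \phi_{22}x_{t-1,2} + \phi_{23}x_{t-1,3} + w_{t,2},$$
$$x_{t,3} = \alpha_3 + \phi_{31}x_{t-1,1} + \phi_{32}x_{t-1,2} + \phi_{33}x_{t-1,3} + w_{t,3},$$
so each variable is a linear function of the lag 1 values of all variables in the set. In a VAR(2) model, the lag 2 values of all variables are added to the right-hand sides of the equations; with three variables, each equation then has six predictors: three lag 1 terms and three lag 2 terms. In general, in a VAR(p) model the first p lags of each variable in the system are used as regression predictors for each variable. VAR models are a special case of the more general VARMA models, which add moving average terms for each variable to the VAR structure above. These in turn are special cases of ARMAX models, which allow additional predictors from outside the multivariate dataset. Here, we fit the VAR(p) model
$$\mathbf{x}_t = \boldsymbol{\alpha} + \Phi_1\mathbf{x}_{t-1} + \cdots + \Phi_p\mathbf{x}_{t-p} + \mathbf{w}_t,$$
where $\mathbf{x}_t$ is the vector of time series, $\boldsymbol{\alpha}$ is a vector of intercepts, $\Phi_i$ are coefficient matrices, and $\mathbf{w}_t$ is a vector of error terms.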
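A one-step-ahead forecast from a VAR(p) model is the intercept plus each coefficient matrix applied to the corresponding lagged observation vector. The sketch below is our own illustration with hypothetical names and made-up coefficients; in practice the matrices would come from a fitted model such as `vars::VAR()`. It also reflects the count of K·p lag predictors per equation, for example 6 for three variables and two lags.

```python
def var_one_step(history, alpha, coefs):
    """One-step VAR(p) forecast: alpha + Phi_1 x_t + ... + Phi_p x_{t+1-p}.

    history : past observation vectors, most recent last (at least p of them)
    alpha   : intercept vector of length K
    coefs   : [Phi_1, ..., Phi_p], each a K x K list of lists
    """
    K, p = len(alpha), len(coefs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    forecast = list(alpha)
    for i, Phi in enumerate(coefs, start=1):
        lagged = history[-i]              # observation i steps back
        for r in range(K):
            forecast[r] += sum(Phi[r][j] * lagged[j] for j in range(K))
    return forecast


def predictors_per_equation(K, p):
    """Each equation regresses on p lags of all K variables."""
    return K * p


half = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]]   # hypothetical Phi_1
ident = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # hypothetical Phi_2
print(var_one_step([[1, 1, 1], [2, 4, 6]], [0, 0, 0], [half, ident]))  # [2.0, 3.0, 4.0]
print(predictors_per_equation(3, 2))                                    # 6
```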

Accuracy Measures (AMs)
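Several measures of model accuracy reflect how close the predicted values are to the observed values: the mean error (ME), mean absolute error (MAE), root mean squared error (RMSE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and ACF, the lag-1 autocorrelation of the errors (reported as ACF1 by R's accuracy() function) [29][30][31][32][33]. With $y_i$ and $\hat y_i$ denoting the observed and predicted values, $e_i = y_i - \hat y_i$, for i = 1, ..., n, the measures are:
$$\mathrm{ME}=\frac{1}{n}\sum_{i=1}^{n}e_i,\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert e_i\rvert,\qquad \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}e_i^2},$$
$$\mathrm{MPE}=\frac{100}{n}\sum_{i=1}^{n}\frac{e_i}{y_i},\qquad \mathrm{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left\lvert\frac{e_i}{y_i}\right\rvert,\qquad \mathrm{MASE}=\frac{\mathrm{MAE}}{\frac{1}{T-1}\sum_{t=2}^{T}\lvert y_t-y_{t-1}\rvert},$$
$$\mathrm{ACF1}=\frac{\sum_{i=2}^{n}(e_i-\bar e)(e_{i-1}-\bar e)}{\sum_{i=1}^{n}(e_i-\bar e)^2},$$
where the MASE denominator is the mean absolute error of the naive forecast on the T training observations.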
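All of these measures follow from the errors $e_i = y_i - \hat y_i$. The plain-Python sketch below uses a hypothetical function name and the non-seasonal MASE convention (scaling by the in-sample naive forecast on a training series); seasonal MASE would use the lag-m naive forecast instead. MPE and MAPE assume no observed value is zero.

```python
import math


def accuracy(actual, predicted, train):
    """ME, MAE, RMSE, MPE, MAPE, MASE and ACF1 of the errors y - yhat."""
    n = len(actual)
    e = [y - f for y, f in zip(actual, predicted)]
    me = sum(e) / n
    mae = sum(abs(v) for v in e) / n
    rmse = math.sqrt(sum(v * v for v in e) / n)
    mpe = 100.0 * sum(v / y for v, y in zip(e, actual)) / n
    mape = 100.0 * sum(abs(v / y) for v, y in zip(e, actual)) / n
    # MASE: scale by the mean absolute one-step naive error on the training data
    scale = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    mase = mae / scale
    # ACF1: lag-1 autocorrelation of the errors
    dev = [v - me for v in e]
    acf1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "MPE": mpe,
            "MAPE": mape, "MASE": mase, "ACF1": acf1}


print(accuracy([10, 12, 14], [11, 11, 15], train=[10, 12, 14]))
```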

Numeric Analysis
ANNs are used to solve many artificial intelligence problems. They often outperform traditional machine learning models because they can handle nonlinear relations between variables and are highly customizable. In this section, we first describe the dataset and its features and decompose the periodic time series. We then fit an ANN regression model with the "neuralnet" package after scaling (normalizing) the original data, and test its predictions. Next, we treat the same dataset as a time series, fit the multivariate time series with the nnetar() function, and assess the accuracy of its predictions. We then decompose the series to obtain the seasonal data, apply the same function to them, and again assess the accuracy of the predictions. We also apply the GLM to the non-seasonal and seasonal time series and obtain comparable results. For the ARIMA analysis, we begin by checking the stationarity of the time series and transform them into stationary series by differencing. Finally, we fit VARS models by determining the maximum lag order according to several criteria, test for Granger and instantaneous causality, and forecast with the VAR model.

Dataset Features
The data consist of the average monthly Exchange Rate values and the corresponding monthly transaction values for several sectors of the Egypt Stock Exchange Market, from January 2015 to February 2019. The data comprise 13 variables: the first (the average monthly Exchange rate) is the independent variable, and the remaining 12 (the monthly transaction values of the sectors) are dependent variables, each considered individually rather than as a group. These sectors are: Communications, Financial services (excluding Banks), Real Estate, Tourism and Entertainment, Construction and Building, Household and Personal Products, Services Industrial Products and Cars, Food and Beverages, Banks, Healthcare and Medicines, Basic Resources, and Chemicals. The aim of this study is to determine whether the Exchange rate has a statistically significant effect on the transaction values of the sectors in the Egypt Stock Exchange.
Figure 2 plots the multivariate time series data. Each series can be decomposed into seasonal, trend and remainder components; Figure 3 shows this decomposition for the Exchange rate.
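The text does not say which decomposition produced Figure 3 (R's `stl()` and `decompose()` are both common). As a swapped-in, simpler technique, the sketch below implements classical additive decomposition, similar in spirit to `decompose()`: a centred moving average gives the trend, averaging the detrended values at each position of the season gives the seasonal indices, and the remainder is what is left. The function name and toy data are hypothetical.

```python
def classical_decompose(y, m):
    """Additive decomposition y_t = trend_t + seasonal_t + remainder_t (period m).

    trend     : centred moving average (a 2 x m-MA when m is even)
    seasonal  : mean detrended value at each position in the season,
                shifted so the m seasonal indices sum to zero
    remainder : y - trend - seasonal
    Values at the ends, where the moving average is undefined, are None.
    """
    n = len(y)
    if n < 2 * m:
        raise ValueError("need at least two full seasons of data")
    half = m // 2
    if m % 2:
        weights = [1.0 / m] * m
    else:
        weights = [0.5 / m] + [1.0 / m] * (m - 1) + [0.5 / m]
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))

    buckets = [[] for _ in range(m)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % m].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    shift = sum(means) / m
    indices = [s - shift for s in means]

    seasonal = [indices[t % m] for t in range(n)]
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder


season = [1.0, -1.0, 2.0, -2.0]
y = [0.5 * t + season[t % 4] for t in range(16)]   # linear trend + period-4 pattern
trend, seasonal, remainder = classical_decompose(y, 4)
print([round(s, 6) for s in seasonal[:4]])          # [1.0, -1.0, 2.0, -2.0]
```

For the monthly data here, m = 12 and the 50 observations comfortably exceed the two full seasons the sketch requires.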

ANN of Multivariate Non-time Series Data
The "neuralnet" package and its neuralnet() function are used to construct the ANN and fit the multivariate non-time series data. The first step is to scale the dataset, because fitting unscaled data may lead to meaningless results. Common scaling techniques are min-max normalization, Z-score normalization, median and MAD scaling, and tanh estimators. We scaled (normalized) all data because the fit to the unscaled data gave an unreasonably large error (error = 18264160, reached threshold = 0.000948). In addition, we discuss the results for the original data and for the data treated as seasonal data. We split the data into 70% training data and 30% test data, and used the neuralnet() function to fit a model whose input is the Exchange rate and whose outputs are the sectors' transactions. Figure 4 displays the fitted ANN for the multivariate scaled data. These values indicate that the model is accurate.
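Two of the scaling techniques listed above, and the 70/30 split, are sketched below in plain Python. The helper names are hypothetical, and the random (non-chronological) split matches the non-time series treatment in this subsection. Predictions made on the min-max scale must be mapped back with $\hat y = \hat y_{\text{scaled}}(\max - \min) + \min$ before computing errors in the original units.

```python
import random


def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Centre on the mean and divide by the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in column]


def train_test_split(n_rows, train_frac=0.7, seed=1):
    """Random split of row indices (70/30 by default)."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n_rows)
    return sorted(idx[:cut]), sorted(idx[cut:])


train, test = train_test_split(50)   # 50 monthly rows, Jan 2015 to Feb 2019
print(len(train), len(test))         # 35 15
print(min_max_scale([2, 4, 6]))      # [0.0, 0.5, 1.0]
```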

ANN of Non-Seasonal Multivariate Time Series Data
In this subsection, we used the "forecast" and "nnet" packages to fit the time series of all variables, with the Exchange rate as an x-regressor time series, and used the nnetar() function to fit each of the 12 sector series separately.
Table 3 presents the accuracy measures of the fitted models.

ANN of Multivariate Seasonal Time Series Data
Here we decompose the time series data to obtain the seasonal data. Table 4 gives the accuracy values for the seasonal data. The AM values for the seasonal Exchange rate series, with the remaining time series as x-regressors, are lower than those for the non-seasonal Exchange rate series, which indicates that the ANN model of the seasonal time series is better. Table 5 presents the results of the GLM fit for the non-seasonal Exchange rate, and Table 6 presents the results of the GLM fit for the seasonal Exchange rate. Both the residual deviance and the AIC are lower for the seasonal data, which indicates that the GLM fits the seasonal Exchange rate series better. However, the parameter estimates of the Finance, Household and Chemicals sectors are significant for the Exchange rate in the non-seasonal case. Both models contain a significant intercept. Figure 8 displays the forecasts for the seasonal Exchange rate series; the first three forecast values are 0.29099, 0.29038 and 0.0639.
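The GLM family and link are not stated in the text. Assuming R's default Gaussian family with identity link, the residual deviance is the residual sum of squares and the AIC follows from the maximized Gaussian log-likelihood. The sketch below computes both from observed and fitted values; `gaussian_glm_stats` is a hypothetical helper, and it counts the error variance as one extra parameter, as R's `glm()` does for this family.

```python
import math


def gaussian_glm_stats(y, fitted, n_coef):
    """Residual deviance and AIC of a Gaussian GLM with identity link.

    n_coef counts the regression coefficients including the intercept;
    the error variance adds one more parameter to the AIC penalty.
    """
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    deviance = rss                                   # Gaussian deviance = RSS
    # Maximised Gaussian log-likelihood with sigma^2 estimated as rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    aic = -2 * log_lik + 2 * (n_coef + 1)
    return deviance, aic


dev, aic = gaussian_glm_stats([1, 2, 3, 4], [0, 3, 2, 5], n_coef=2)
print(dev, round(aic, 4))   # 4 17.3515
```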

ARIMA of Multivariate Time Series Data
For the time series data, we use the Augmented Dickey-Fuller (ADF) test to check the stationarity of all series at the 5% significance level. The results are presented in Table 7. The ADF test indicates that the multivariate time series are non-stationary at the 5% significance level. Since the series should be stationary, we differenced all of them. After differencing, all p-values are 0.01, so all differenced series are stationary. Figure 9 plots the stationary multivariate time series. Applying the auto.arima() function to the stationary series gives the ARIMA models in Table 8. For the Exchange rate series, the best model is ARIMA(0,0,1) with zero mean, and its accuracy measures are given in Table 9. To show the importance of stationarity, we examine the plots for the Exchange rate series and how stationarity affects the selection of the best ARIMA model.
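The logic of the unit-root test can be illustrated with its simplest form. The sketch below computes the plain Dickey-Fuller statistic (constant, no trend, no augmentation lags; the ADF test adds lagged differences to the same regression) and compares it with about -2.86, the approximate asymptotic 5% critical value for the constant-only case. The statistic does not follow a Student t distribution, which is why special critical values are needed. The helper names and simulated data are hypothetical, and the text does not say which ADF implementation was used.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic for rho in  dy_t = a + rho * y_{t-1} + e_t.

    Plain Dickey-Fuller regression with a constant, no trend and no
    augmentation lags (the ADF test would add lagged dy terms).
    """
    x = y[:-1]                                        # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first differences
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - md) for v, d in zip(x, dy))
    rho = sxy / sxx                                   # OLS slope
    a = md - rho * mx                                 # OLS intercept
    rss = sum((d - a - rho * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)               # standard error of rho
    return rho / se


CRITICAL_5PCT = -2.86   # approximate asymptotic 5% value, constant-only case


def simulate_ar1(phi, n, seed):
    """Simulate y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1); phi = 1 is a random walk."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


walk = simulate_ar1(1.0, 500, seed=7)
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(dickey_fuller_t(walk) < CRITICAL_5PCT)    # usually False: unit root not rejected
print(dickey_fuller_t(diffed) < CRITICAL_5PCT)  # True: differenced series is stationary
```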
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to determine the model orders. Figure 10 displays the ACF and PACF of the non-stationary Exchange rate series, and Figure 11 displays them for the best model, ARIMA(0,0,1) with zero mean. Figures 10 and 11 show that forecasts based on stationary time series data are smoother and more accurate.
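For an MA(1) process such as the selected ARIMA(0,0,1), the theoretical ACF is zero beyond lag 1 while the PACF decays gradually, so sample ACF bars beyond lag 1 that stay within about $\pm 1.96/\sqrt{n}$ point to an MA(1). Its forecasts beyond one step ahead revert to the mean, which is zero here. The sketch below computes sample autocorrelations and zero-mean MA(1) forecasts; the function names and the value of θ are illustrative assumptions, and innovations are recovered with the conditional start $\varepsilon_0 = 0$ rather than exact maximum likelihood.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (the bars of an ACF plot)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - h] for t in range(h, n)) / denom
            for h in range(1, max_lag + 1)]


def ma1_forecast(x, theta, horizon):
    """Forecasts from a zero-mean ARIMA(0,0,1): X_t = e_t + theta * e_{t-1}.

    Innovations are recovered recursively, starting from e_0 = 0.
    """
    e = 0.0
    for value in x:
        e = value - theta * e        # innovation implied by the model
    # One step ahead depends on the last innovation; beyond that the
    # forecast is the process mean, which is zero here.
    return [theta * e] + [0.0] * (horizon - 1)


print(sample_acf([1, 2, 3, 4, 5], 1))         # [0.4]
print(ma1_forecast([1.0, 1.0], 0.5, 3))       # [0.25, 0.0, 0.0]
```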

VARS of Multivariate Time Series Data
We use two different functions for lag order identification, VAR() and vars::VAR(), from the R packages "MTS" and "vars", respectively. The two functions are quite similar but differ in their outputs. To identify the lag order of the VAR model, we obtained the estimates and their standard errors shown in Table 10. The vars::VAR() function is more powerful and more convenient for identifying the correct lag order, as its results in Table 11 show. We use the predict() function to forecast VAR values over a short horizon only, because predictions over a long horizon become unreliable or flat. Table 12 presents the first three forecast values for VARS.
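Why long-horizon VAR forecasts become flat can be seen in a stable VAR(1): iterating $\hat{\mathbf{x}}_{t+h} = \boldsymbol{\alpha} + \Phi\hat{\mathbf{x}}_{t+h-1}$ drives the forecasts to the unconditional mean $\boldsymbol{\mu} = (I-\Phi)^{-1}\boldsymbol{\alpha}$, so after a few steps they carry little information beyond that mean. The sketch uses a made-up two-variable coefficient matrix, not estimates from this paper, and the helper names are hypothetical.

```python
def var1_forecast_path(x_last, alpha, Phi, horizon):
    """Iterate x_{t+1} = alpha + Phi x_t to get forecasts 1..horizon steps ahead."""
    path, x = [], list(x_last)
    for _ in range(horizon):
        x = [alpha[r] + sum(Phi[r][j] * x[j] for j in range(len(x)))
             for r in range(len(x))]
        path.append(x)
    return path


def var1_mean_2d(alpha, Phi):
    """Unconditional mean mu = (I - Phi)^{-1} alpha of a stable 2-variable VAR(1)."""
    a, b = 1 - Phi[0][0], -Phi[0][1]
    d, e = -Phi[1][0], 1 - Phi[1][1]
    det = a * e - b * d
    return [(e * alpha[0] - b * alpha[1]) / det,
            (-d * alpha[0] + a * alpha[1]) / det]


Phi = [[0.5, 0.1], [0.2, 0.3]]     # hypothetical, eigenvalues inside the unit circle
alpha = [1.0, 1.0]
path = var1_forecast_path([10.0, -5.0], alpha, Phi, horizon=12)
print(path[0])                     # first step still reflects the last observation
print(path[-1])                    # already close to the mean below
print(var1_mean_2d(alpha, Phi))    # about [2.4242, 2.1212]
```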

Discussions
From the previous results, we can summarize the findings as follows: (1) We scaled (normalized) all non-time series data to obtain reasonable results, since the fit to the unscaled data gave a very large error (18264160, with reached threshold 0.0009480014).
(4) For the GLM forecasts of the Exchange rate series: both the residual deviance and the AIC are lower for the seasonal data than for the non-seasonal data, which indicates that the GLM fits the seasonal Exchange rate series better. However, the parameter estimates of the Finance, Household and Chemicals sectors are significant for the Exchange rate in the non-seasonal case. Both models contain a significant intercept.
(5) For the ARIMA forecasts of the multivariate time series: the ADF test indicates that the series are non-stationary at the 5% significance level, so we differenced all series to make them stationary. After differencing, all p-values are 0.01, so all series are stationary. All accuracy measures indicate that ARIMA(0,0,1) is the best model, because it achieves the lowest value of every measure. Figures 10 and 11, which display the ACF and PACF of the Exchange rate series, show that forecasts based on stationary time series are smoother and more accurate.
(6) For the VARS forecasts of the multivariate time series: comparing the results of the VARS models, the VARS model is more accurate than the VAR model for the ARIMA(0,0,1) case, because its AIC and the standard errors of its estimates are lower than those of the VAR model.

Conclusions
This article discusses the theoretical and numerical aspects of different models for forecasting multivariate time series data, whether seasonal or not. These methods are ANN, ARIMA, GLM and VARS. All models are used to analyze data obtained from the Egypt Stock Exchange on the Exchange rate and the related transactions of twelve sectors during the period from January 2015 to February 2019. We displayed the features of these data graphically, including their periodic features. ANNs were used to analyze the data as non-time series data with the "neuralnet" package, after scaling the data to obtain reasonable results and a good fit. We displayed the fitted network, with the Exchange rate as input and the related sectors' transactions as outputs, showing the weights of a network with one hidden layer. The accuracy of the constructed ANN model was measured with RMSE and R², and high accuracy was obtained whether the Exchange rate was used as the input or the output variable. ANNs were also used to analyze the time series with the "nnet" package, using the Exchange rate as an x-regressor (exogenous variable). Forecasts and their accuracy were investigated with the ME, ACF, MAE, MPE, RMSE, MASE and MAPE measures, for both seasonal and non-seasonal time series data, and the results were compared. The GLM was fitted with the Exchange rate as the dependent variable; the model and its estimates were tested, and the predicted values for seasonal and non-seasonal time series data were presented numerically and graphically. ARIMA models were also used to analyze the multivariate time series data. We checked the stationarity of the time series with the ADF test, which indicated that most of the series are non-stationary, and used differencing to make them stationary.
We retested the differenced multivariate time series and found them to be stationary (p-value = 0.01). Several ARIMA models were constructed, and the auto.arima() function selected ARIMA(0,0,1) with zero mean, which has the minimum error, as the best model for the Exchange rate series. The ACF and PACF plots of the Exchange rate series were displayed before and after differencing. The ARIMA(0,0,1) model was assessed with the measures AIC, BIC, AICc, MASE, ACF, MAPE, RMSE, MAE and ME. Finally, VARS models were used to analyze the multivariate time series: the lag order was identified, the parameters and their standard errors were estimated, and the accuracy measures were reported. The Granger causality test, computed with the causality() function, indicated that the Exchange rate is useful in forecasting the other time series at the 5% significance level, and the instantaneous causality test indicated instantaneous causality between the Exchange rate and the other variables at the same level. The Exchange rate values were forecast with the predict() function and displayed graphically. We noted that an NNAR model with p lagged inputs and no hidden nodes is equivalent to an ARIMA(p,0,0) model, but without the parameter restrictions that ensure stationarity; similarly, for seasonal data an NNAR(p,P,0)[m] model is equivalent to an ARIMA(p,0,0)(P,0,0)[m] model without those restrictions. For these data, we conclude that the ANN and GLM fits to the seasonal time series are better than those to the non-seasonal time series. The transactions of the Finance, Household and Chemicals sectors are significant for the Exchange rate in the non-seasonal case. Forecasts based on stationary time series data are smoother and more accurate. The VARS model is more accurate than the VAR model for ARIMA(0,0,1).
We used the predict() function to forecast VAR values only over a short horizon, because predictions over a long horizon become unreliable or flat.