Building a Model for Predicting the Exchange Rate from USD to VND Using a Novel Method

In this paper, a combination of the Hilbert-Huang Transform (HHT), Support Vector Regression (SVR) and an embedding theorem is described to predict the short-term exchange rate from the United States dollar (USD) to the Vietnamese dong (VND). First, we use the Empirical Mode Decomposition (EMD) step of the HHT to decompose the signal into components at multiple oscillation scales, called Intrinsic Mode Functions (IMFs). We then resynthesize the signal without the highest-frequency IMF to reduce noise. Next, we use the false nearest neighbors algorithm to find the embedding dimension of the de-noised signal. Finally, we use SVR to build a model for predicting the exchange rate between USD and VND. By using the Hilbert-Huang Transform as an adaptive filter, the proposed method decreases the embedding dimension from twelve (original samples) to four (de-noised samples). The embedding dimension determines the number of inputs to the SVR model, so reducing it lowers both the complexity and the training time of the model. Experimental results indicate that this method not only reduces the complexity of the model but also achieves higher prediction accuracy than the direct use of the original data.


Introduction
Forecasting financial time series plays an important role in daily life; in particular, exchange rate forecasts help importers and exporters choose the best time to trade products and obtain the highest profit. Researchers have successfully employed several approaches for predicting exchange rates. In [1], the authors combined kernel regression (KR) and the Functional Link Artificial Neural Network (FLANN) to predict the exchange rate from the United States dollar (USD) to the British pound (GBP), Indian rupee (INR) and Japanese yen (JPY); KR served as a filter, and FLANN was the prediction model. The authors in [2] used chaos theory and reconstructed state space to predict the exchange rate between USD and the euro (EUR). Fan-Yong Liu [3] used a hybrid of the discrete wavelet transform (DWT) and support vector regression (SVR) to predict the exchange rate between the Chinese yuan (CNY) and USD: the DWT first decomposes the time series into different time scales, a kernel function for SVR is then chosen appropriately for each time scale, and the final prediction is synthesized from the predictions at the individual time scales. The authors in [4] used a local fuzzy reconstruction method to predict the exchange rates between JPY, USD and the Canadian dollar (CAD). The authors in [5] successfully used the wavelet transform to filter noise from the exchange rate time series before training a Multi-Layer Feed-Forward Neural Network for prediction. Weiping Liu [6] used a hybrid of neural networks and fuzzy logic to predict the exchange rate between JPY and USD. The authors in [13] used a Cartesian Genetic Programming evolved Artificial Neural Network to build models for predicting the exchange rate between USD and various currencies. The authors in [14] used neural networks with five different training algorithms to build models for predicting four foreign currency exchange rates against the Indian rupee.
They found that the neural network trained by the Levenberg-Marquardt algorithm achieved the best performance. In 1998, Huang proposed a novel method for decomposing nonlinear and nonstationary time series into a set of multi-time-scale signals, referred to as intrinsic mode functions (IMFs). This method has been successfully applied in many fields, for example as an adaptive filter for nonstationary and nonlinear data [7,8]. In this paper, a combination of the HHT, SVR, average mutual information and an embedding theorem is presented to predict the exchange rate from USD to VND. The method is simple, adaptive, and achieves high prediction accuracy.
This paper is organized as follows. In the next section, related studies are presented. Section 3 discusses the principle of the proposed method. Section 4 demonstrates the prediction accuracy of the method by experiment. Section 5 presents the conclusion and future studies.

Finding the Time Delay
According to the literature review [11], if we select a time delay T that is too small, then the two data points s(n+jT) and s(n+(j+1)T) will be so close to each other that we cannot distinguish them. Similarly, if we choose T too large, then s(n+jT) and s(n+(j+1)T) become completely independent of each other in a statistical sense. To determine a proper time delay for a time series, we can rely on the average mutual information. Assume we have two systems, A and B, with measured values denoted by a_k and b_k. The mutual information between measurements a_i and b_k is specified by equation (1):

I_AB(a_i, b_k) = log2[ P_AB(a_i, b_k) / (P_A(a_i) P_B(b_k)) ]   (1)

where P_A(a) is the probability of observing a out of the set of all A, P_B(b) is the probability of finding b in a measurement of B, and P_AB(a, b) is the joint probability of measuring a and b. The average mutual information between measurements of any value a_i from system A and b_k from system B is the average of I_AB(a_i, b_k) over all possible measurements, calculated by equation (2):

I_AB = Σ_{a_i, b_k} P_AB(a_i, b_k) I_AB(a_i, b_k)   (2)

To apply this definition to a time series s(n) measured from a physical system, we consider the set of measurements s(n) as the set A and the measurements at a time lag T, s(n+T), as the set B. The average mutual information between s(n) and s(n+T) can then be evaluated by equation (3):

I(T) = Σ_n P(s(n), s(n+T)) log2[ P(s(n), s(n+T)) / (P(s(n)) P(s(n+T))) ]   (3)
Hence, the average mutual information is a function of the time lag T, and T can be specified as the first minimum of I(T). If I(T) has no minimum, then T is chosen as 1.
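The delay-selection rule above can be sketched in pure Python. This is an illustrative implementation, not the paper's code: the histogram-based probability estimate, the bin count, and the maximum lag searched are all assumptions made here for the example.

```python
import math

def average_mutual_info(s, T, bins=8):
    """Estimate I(T) between s(n) and s(n+T) with a simple 2-D histogram."""
    x, y = s[:-T], s[T:]
    lo, hi = min(s), max(s)
    width = (hi - lo) / bins or 1.0
    idx = lambda v: min(int((v - lo) / width), bins - 1)
    n = len(x)
    px, py, pxy = {}, {}, {}
    for a, b in zip(x, y):
        i, j = idx(a), idx(b)
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
        pxy[(i, j)] = pxy.get((i, j), 0) + 1
    # plug-in estimate: sum over occupied bins of p_xy * log2(p_xy / (p_x * p_y))
    return sum((c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in pxy.items())

def first_minimum_delay(s, max_lag=20):
    """Return the lag at the first local minimum of I(T); fall back to T = 1."""
    prev = average_mutual_info(s, 1)
    for T in range(2, max_lag + 1):
        cur = average_mutual_info(s, T)
        if cur > prev:        # I(T-1) was the first local minimum
            return T - 1
        prev = cur
    return 1
```

The fallback to T = 1 mirrors the rule stated above for the case where I(T) has no minimum in the searched range.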

Finding the Embedding Dimension
The method of false nearest neighbors (FNN) was proposed in [11] to obtain the minimum embedding dimension. It is based on the idea that points which appear close in a low-dimensional embedding may no longer be neighbors when the embedding dimension is increased. The FNN method is used to calculate the adequate number of dimensions for embedding a time series. For a given time series s(k), where k = 1, 2, …, N, the idea is to combine successive values into vectors, constructing d-dimensional vectors from the observed data using a delay embedding, as shown in Eq. (4) [11,12]:

y(k) = [s(k), s(k+T), …, s(k+(d-1)T)],   k = 1, 2, …, N-(d-1)T   (4)
Each vector y(k) has a nearest neighbor y^NN(k), with nearness in the sense of some distance function, in dimension d. For each vector, its nearest neighbor is found in d-dimensional space using the Euclidean distance, as in Eq. (5):

R_d(k)^2 = Σ_{j=0}^{d-1} [ s(k+jT) - s^NN(k+jT) ]^2   (5)
Next, the distance between the vectors in d-dimensional space is compared with their distance when embedded in dimension d+1, as shown in Eq. (6):

| s(k+dT) - s^NN(k+dT) | / R_d(k) > R_t   (6)

where R_t represents the threshold. In [11], the authors recommend the range 10 ≤ R_t ≤ 50; in our case, R_t = 10. A second criterion of falseness of nearest neighbors has also been considered, as suggested in [11] (Eq. (7)):

R_{d+1}(k) / σ > A_t   (7)

where σ is the standard deviation of the given time series data and A_t = 2. A vector y(k) and its nearest neighbor are declared false nearest neighbors if either criterion (6) or criterion (7) is met.

Hilbert-Huang Transform
The Hilbert-Huang Transform was proposed by Huang in 1998 [9] and consists of two parts. The key part is empirical mode decomposition (EMD). In this part, each signal is decomposed into a finite set of Intrinsic Mode Functions (IMFs), each of which satisfies two criteria [9]. The first is that the number of extrema and the number of zero crossings must be equal or differ at most by one over the whole data set.
The second is that the envelopes are symmetric: at any point, the mean of the upper envelope (connecting all local maxima) and the lower envelope (connecting all local minima) is zero. The flowchart of the EMD algorithm used to decompose a signal into IMFs is illustrated in Figure 1. Three stopping criteria exist. The first criterion was employed by Huang in 1998 [9] and is determined using a Cauchy-type convergence test. The test computes the normalized squared difference between two successive sifting operations, defined as

SD = Σ_t | h_{k-1}(t) - h_k(t) |^2 / h_{k-1}(t)^2

For a given small threshold value, the sifting process stops when SD is less than that threshold. Under the second criterion, the sifting process stops only after S consecutive sifting steps in which the numbers of zero crossings and extrema remain the same and are equal or differ at most by one; S is a predefined value whose optimal range is from four to eight, as suggested by Huang [8,9]. A third criterion has also been suggested by Huang: the number of sifting iterations is simply fixed at ten. In our case, the first criterion is applied.
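The Cauchy-type stopping test above can be sketched in a few lines. This is an illustrative helper, not the paper's code; the threshold value 0.3 below is a commonly used illustrative choice, not a value taken from the paper.

```python
def sifting_sd(h_prev, h_curr, eps=1e-12):
    """SD = sum_t |h_{k-1}(t) - h_k(t)|^2 / h_{k-1}(t)^2 between two sifts."""
    # eps guards against division by zero at samples where h_{k-1}(t) = 0
    return sum((a - b) ** 2 / (a * a + eps) for a, b in zip(h_prev, h_curr))

def should_stop(h_prev, h_curr, threshold=0.3):
    """Stop sifting once the normalized squared difference is small enough."""
    return sifting_sd(h_prev, h_curr) < threshold
```

Inside an EMD loop, `h_prev` and `h_curr` would be the candidate IMF before and after one sifting operation.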
After the EMD process, a set of IMFs is obtained, ordered from the highest-frequency to the lowest-frequency oscillation, together with a residue (trend). Summing all IMFs and the residue recovers the original signal.
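The reconstruction and de-noising steps just described reduce to a pointwise sum. The sketch below assumes the IMFs and residue have already been produced by an EMD implementation; the tiny hand-made "decomposition" in the test is purely for illustration.

```python
def reconstruct(imfs, residue, drop_first=False):
    """Sum the selected IMFs plus the residue, pointwise.

    With drop_first=True, IMF1 (the highest-frequency oscillation) is
    excluded, which is the de-noising step used in this paper's method.
    """
    selected = imfs[1:] if drop_first else imfs
    out = list(residue)
    for imf in selected:
        out = [a + b for a, b in zip(out, imf)]
    return out
```

Calling `reconstruct(imfs, residue)` reproduces the original signal exactly, while `reconstruct(imfs, residue, drop_first=True)` yields the filtered series used in the rest of the pipeline.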

Building the Model Based on Support Vector Regression (SVR)
The goal of a Support Vector Machine (SVM) is to find the optimal hyperplane (which may be a plane or a curved surface) that separates the data into two regions such that the distance between the closest points and the hyperplane, called the margin, is maximized [10,15]. Figure 2 illustrates a hyperplane and a margin.
Assume the equation of a hyperplane is w·x + b = 0. The goal of the SVM algorithm is to find w and b that maximize the margin. The SVM algorithm applies not only to classification problems but also to regression problems. The SVR algorithm is based on a loss function that tolerates errors for points lying within a small distance epsilon of the true value; that is, the function assigns zero error to all training points inside the epsilon range. Figures 3 and 4 illustrate linear and nonlinear regression within the epsilon range. For SVR, the input x is first mapped into an m-dimensional feature space by a nonlinear mapping, and a linear model is then built in this feature space, as in equation (9):

f(x, w) = Σ_{i=1}^{m} w_i g_i(x) + b   (9)

where g_i(x), i = 1, 2, 3, …, m, is a set of nonlinear mapping functions.
The accuracy of the estimate is evaluated by a loss function L(y, f(x, w)). SVR uses the ε-insensitive loss function proposed by Vapnik:

L_ε(y, f(x, w)) = 0 if |y - f(x, w)| ≤ ε, and |y - f(x, w)| - ε otherwise

Thus, SVR performs linear regression in the multi-dimensional feature space using the loss function L_ε while minimizing ||w||^2 to decrease the complexity of the model. This problem can be solved by introducing slack variables ξ_i and ξ_i*, i = 1, 2, 3, …, n, which measure the deviation of training samples that lie outside the epsilon range. SVR is then obtained by minimizing the functional

(1/2) ||w||^2 + C Σ_{i=1}^{n} (ξ_i + ξ_i*)

subject to the constraints

y_i - f(x_i, w) ≤ ε + ξ_i,   f(x_i, w) - y_i ≤ ε + ξ_i*,   ξ_i, ξ_i* ≥ 0

Applying the duality theorem to this minimization problem, we finally obtain the function f(x):

f(x) = Σ_{i=1}^{n_SV} (α_i - α_i*) K(x_i, x) + b

where n_SV is the number of support vectors, and K(x_i, x) is the kernel function, which can be defined as K(x_i, x) = Σ_{j=1}^{m} g_j(x_i) g_j(x).
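The ε-insensitive loss and the dual prediction form above can be sketched directly. This is a didactic illustration, not a working trainer: the RBF kernel is one common choice (the paper does not fix the kernel here), and the coefficients passed to `svr_predict` stand for the differences (α_i - α_i*), which in practice come from solving the quadratic program.

```python
import math

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's loss: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rbf_kernel(x, z, gamma=1.0):
    """A common kernel choice: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svr_predict(support_vectors, coeffs, b, x, gamma=1.0):
    """Dual form f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + b
```

Note how residuals smaller than ε contribute no loss at all, which is what makes the fitted function depend only on the support vectors outside (or on the boundary of) the tube.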

Principle of the Proposed Method
The principle of our proposed method is illustrated in Figure 5. First, we decompose the signal into IMFs. Second, we reconstruct the signal without IMF1 (the highest-frequency oscillation) to reduce noise. After that, we find the time lag and embedding dimension of the de-noised signal. Next, we use the de-noised time series, together with the obtained time lag and embedding dimension, to create a model using SVR. We then test the accuracy of the model on the test set, using the MSE (mean square error) as the performance index. If this index is less than or equal to a small predefined value, we move to the final step; otherwise, we change the parameters (C, ε) of the model and train it again.
Finally, we use the tested model to predict the future value of the exchange rate between USD and VND.

Experimental Results
To assess the performance of our proposal, we use the daily exchange rate between USD and VND from January 1, 2019 to December 31, 2019 (https://vn.investing.com/currencies/usd-vnd-historical-data). The data set contains 261 samples in total. We divide it into two sets: the training set consists of the first 221 data points, and the test set contains the last 40 data points. Figure 6 shows the original data vs. the de-noised data produced by the HHT. Figure 9 shows the testing results produced by the model trained on the original data and by the model trained on the filtered data. Table 1 compares the performance of the model using the original time series with that of the model using the de-noised time series.
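The evaluation setup above amounts to a chronological split plus the MSE performance index, which can be sketched as follows; the synthetic series in the test stands in for the real exchange-rate data.

```python
def chronological_split(series, n_train=221):
    """Split a time series in temporal order: no shuffling, first n_train
    points for training and the remainder for testing, as in the paper."""
    return series[:n_train], series[n_train:]

def mse(actual, predicted):
    """Mean square error, the performance index used to accept the model."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

Keeping the split chronological matters here: shuffling daily exchange-rate data would leak future information into the training set.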
The embedding dimension of the original time series is twelve, whereas the embedding dimension of the de-noised time series is four. Decreasing the embedding dimension causes a decrease in the complexity of the dynamic system.
In our case, we decrease the number of inputs to the SVR. This not only reduces the training time but also decreases the prediction time and increases the prediction accuracy.

Conclusion
We presented a combination of the HHT, the false nearest neighbors algorithm to obtain the minimum embedding dimension, average mutual information to find the time lag of the time series, and support vector regression to predict the exchange rate between USD and VND. The HHT serves as an adaptive filter: reconstructing the signal without the highest-frequency IMF reduces noise. We use the de-noised signal to obtain the time lag and the embedding dimension and to train the support vector regression model. The experiment revealed that the de-noised signal decreases the embedding dimension, decreases the complexity of the system, and achieves higher prediction accuracy than the direct use of the original signal.
In the future, we plan to apply additional soft computing techniques, such as Feed-Forward Neural Networks (FFNN) and fuzzy logic, to build models for exchange rate prediction, and to compare the prediction accuracy among the models in order to choose the best model for predicting the exchange rate between USD and VND.