Support Vector Regression and Artificial Neural Network Approaches: Case of Economic Growth in East Africa Community

Abstract: Interest has recently grown in applying nonlinear methods to economic and financial data because of their robustness in handling large and complex data. With increasingly complex ‘big data’, focus has shifted to the use of robust techniques in data analysis. Various nonlinear approaches have been established, including the support vector machine, which is widely adopted for classification and regression problems. This research applied the support vector regression technique and neural network models to model and forecast economic growth for the five countries of the East Africa Community: Kenya, Uganda, United Republic of Tanzania, Rwanda and Burundi. Data for the period 1990 to 2014 from World Bank databases was used. The support vector and neural network models were trained using the data for 1990-2002, whereas the remaining data was used to assess prediction performance and determine the robustness of the two models on external datasets. The study revealed that country-specific models performed better than the combined model and that, although the two models performed similarly under the country-specific models, the neural network performed better in most countries. The study recommends the use of the two machine learning techniques in economic growth modeling. It also recommends that their performance be compared with traditional econometric models, using countries with longer data series.


Introduction
Economic growth is a complex process affected by a number of factors, and theory gives no clear or single answer to the question of the right model specification. The number of variables collected in economic data has also grown enormously, and the data have become too complex to be modeled with pre-specified relationships only.
Of late there has been increased interest in applying nonparametric regression methods to big data. These methods are considered the most flexible approach to nonlinear modeling since they do not rest on restrictive assumptions that must be verified before they can be applied.
Researchers have been utilizing economic theory to define the structural relations between economic growth and other variables. In addition, statistical approaches have been developed to identify correlation in historical tendencies, such as the famous Solow-Swan model [1,2] and Cobb-Douglas models [3]. Others include the gray theory method [4], the autoregressive integrated moving average (ARIMA) model, the regression model and the random walk model. However, due to assumptions such as linearity, these models are hardly reliable for forecasting [5].
Recently, artificial neural networks (ANN) have received increasing attention in financial forecasting due to their nonlinear mapping capabilities and data processing characteristics. The application of Markov switching models to the US gross national product (GNP) data [6] encouraged the use of nonlinear models, which has led to a large number of techniques being developed [7], including logistic regression, the Support Vector Machine (SVM), artificial neural networks (ANN), Holt's exponential smoothing and other machine learning techniques. These methods have been applied to economic time series [8], weekly foreign exchange rates [9], daily foreign exchange rates [10] as well as bankruptcy prediction [11]. The techniques have also been applied in tourism and hospitality [12,13], energy [14,15] and meteorology [16].
SVM is a supervised learning technique whose foundations date to the early 1960s. It belongs to the family of generalized linear classifiers and uses classification and regression machine learning theory to maximize predictive accuracy while avoiding over-fitting. Rather than implementing the empirical risk minimization (ERM) principle to minimize the training error, SVM employs the structural risk minimization (SRM) principle to minimize an upper bound on the generalization error, and allows any training set to be learned without error.
Such models, introduced in their current form by Vapnik in the 1990s [17,18], have proved to be effective and promising techniques for handling linear and nonlinear classification and regression [19,20] due to their principle of maximal margin, dual theory and kernel trick. In contrast to other nonparametric methods such as neural networks, SVMs have become powerful tools for solving such problems and overcoming traditional difficulties such as over-fitting. They have performed exceptionally well in various fields of study. Specifically, SVR has been investigated and found to be efficient, accurate and robust in handling complex analysis. This has seen the application of SVR surpass other machine learning approaches, including the ANN.
This research applied SVR to economic growth in the East Africa Community member countries. There are three main measures of economic growth: the gross domestic product (GDP), the gross domestic product at purchasing power parity (GDP PPP) and the human development index.
The most common parameter used to gauge economic growth is the gross domestic product (GDP) and its derived indicators such as the gross national product (GNP) and gross national income (GNI). GDP per capita, which is derived by dividing the gross domestic product by the population, does not take into account the distribution amongst the population.
GDP PPP is gross domestic product converted to international dollars using purchasing power parity rates. The concept, which is based on the law of one price, was introduced by the School of Salamanca in the 16th century and developed by Gustav Cassel [21]. PPP exchange rates help to minimize misleading international comparisons that can arise with the use of market exchange rates.
The human development index (HDI) has been utilized by the United Nations Development Programme in its Human Development Reports to rate countries with respect to life expectancy, education and per capita GDP PPP.
GDP PPP has been used as the measure of economic growth in this research. The research sought to establish the possibility of developing an SVR model for economic analysis. Epsilon-SVR and back-propagated Artificial Neural Network (ANN) models were developed and their results compared on both the training and test data, for country-specific as well as combined data.

Research Data and Data Preparations
This study used official annual economic data from the World Bank and IMF websites. Data covering the period 1990 to 2014 was obtained from the World Bank databank (http://databank.worldbank.org/data/databases.aspx) and the 2016 data estimate from the IMF data portal (https://www.imf.org/external/data.htm). The study focused on the economic data of the 5 East African countries that make up the East Africa Community: Kenya, Uganda, United Republic of Tanzania, Rwanda and Burundi.
The data from the World Bank was filtered by country to retain only data for the East Africa region covering the period 1990 to 2014. Variables contributing to economic growth were selected for modeling. Physical capital, inflation, exchange rates, research and development investment, population growth, financial development and international trade are some of the variables included in economic growth models by the OECD [22]. The Solow [1] and Swan [2] models have real income as the output, with labour, capital and knowledge as the independent variables.
Factors influencing economic growth can be categorized into human capital investment, natural resources, physical capital investment and entrepreneurship whereas the main dependent variable measuring economic growth is the gross domestic product per capita PPP (GDP_PPP).
All the independent variables were scaled into [0,1] to prevent features with large numeric ranges from dominating the others. The data was then divided into training and test datasets.
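The scaling and splitting step can be sketched as follows. This is a minimal illustration in plain Python, not the study's own code (the analysis was done in R); the function names, the `inflation` field and the toy records are hypothetical, and the split at 2002 follows the training period stated in the abstract.

```python
def min_max_scale(column):
    """Rescale a list of numbers into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_by_year(rows, last_train_year):
    """Chronological split: years up to last_train_year train, later years test."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


# Hypothetical toy records (not the study's data).
rows = [
    {"year": 2000, "inflation": 10.0},
    {"year": 2001, "inflation": 5.0},
    {"year": 2002, "inflation": 2.0},
    {"year": 2003, "inflation": 9.0},
    {"year": 2004, "inflation": 14.0},
]

scaled = min_max_scale([r["inflation"] for r in rows])
for row, value in zip(rows, scaled):
    row["inflation_scaled"] = value

train, test = split_by_year(rows, last_train_year=2002)
```

The sketch scales over the full series before splitting, as the text describes. Computing the minimum and maximum from the training years only is a common alternative that keeps test-period information out of the fit.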

Support Vector Regression Model
Given a training dataset $D = \{(X_i, y_i) \mid X_i \in \mathbb{R}^d,\ y_i \in \mathbb{R},\ i = 1, 2, \dots, n\}$, $X_i$ is a multivariate input consisting of all $d$ independent variables, $y_i$ is the corresponding scalar output and $n$ is the number of training samples. The support vector regression function is defined as

$$f(x) = \langle \omega, \phi(x) \rangle + b,$$

where $\omega$ is the weight vector corresponding to $\phi(x)$, $\phi$ is the nonlinear transformation (mapping) function, and $b$ is a constant threshold. The parameters $\omega$ and $b$ need to be estimated.
The variables are $y$, a vector of values of the gross domestic product per capita PPP (GDP_PPP) for each of the East Africa Community members for the period 1990-2014, and $X$, a matrix of variables consisting of human capital investment, natural resources, physical capital investment and entrepreneurship corresponding to $y$.

Parameter Estimation
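Flatness in the regression model means that one seeks a small $\omega$. The values of $\omega$ and $b$ can be estimated by minimizing the following regularized risk, based on the structural risk minimization principle:

$$R(\omega, b) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} L_\varepsilon(y_i, f(x_i)),$$

where $\varepsilon$ is a predetermined value and $L_\varepsilon(y, f(x))$ is the empirical error measured by the ε-insensitive loss function. Under SVR the empirical error is defined as

$$L_\varepsilon(y, f(x)) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise.} \end{cases}$$

Thus, minimizing the norm $\|\omega\|^2 = \langle \omega, \omega \rangle$ involves solving a convex optimization problem:

$$\min_{\omega, b, \xi, \xi^*} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

subject to

$$y_i - \langle \omega, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \qquad \langle \omega, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0,$$

where $\xi_i$ and $\xi_i^*$ are slack variables introduced to account for the error in fitting in the optimization problem. They represent the distances from the actual value to the corresponding boundary value of the ε-tube, as shown in figure 1 below. The primal problem can also be transformed into a dual problem, whose solution is given by

$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^*) K(x_i, x) + b,$$

where $\alpha_i$ and $\alpha_i^*$ are the dual variables, $n_{sv}$ is the number of Support Vectors (SVs) and $K(x_i, x)$ is the kernel function.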
The constant C>0 determines the trade-off between the flatness of the SVR function and the amount up to which deviations larger than ε are tolerated. An optimal choice of the regularization parameter C can be derived from the standard parameterization of the SVM solution.
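To make the roles of ε and C concrete, the sketch below evaluates the ε-insensitive loss and the regularized risk, taken here as half the squared norm of the weights plus C times the summed losses, for a linear model. It is an illustration only: the function names, the weights and the three data points are made up, and the linear form (no kernel mapping) is a simplifying assumption.

```python
def eps_insensitive_loss(y, f, eps):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y - f) - eps)


def regularized_risk(w, b, X, y, C, eps):
    """0.5 * ||w||^2 + C * sum of eps-insensitive losses for a linear model."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        fi = sum(wj * xj for wj, xj in zip(w, xi)) + b
        penalty += eps_insensitive_loss(yi, fi, eps)
    return flatness + C * penalty


# Made-up example: only the second point falls outside the tube.
X = [[1.0], [2.0], [3.0]]
y = [1.05, 2.5, 3.0]
risk = regularized_risk(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.1)
```

Here the first and third residuals (0.05 and 0) lie within ε = 0.1 and cost nothing, while the second (0.5) contributes 0.4, so a larger C penalizes that single deviation more heavily relative to the flatness term.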
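Finally, $b$ is estimated by exploiting the so-called Karush-Kuhn-Tucker (KKT) conditions [23,24], which state that at the optimal solution the product between dual variables and constraints has to vanish. In SVR this means

$$\alpha_i \left(\varepsilon + \xi_i - y_i + \langle \omega, \phi(x_i) \rangle + b\right) = 0, \qquad \alpha_i^* \left(\varepsilon + \xi_i^* + y_i - \langle \omega, \phi(x_i) \rangle - b\right) = 0,$$

and

$$(C - \alpha_i)\,\xi_i = 0, \qquad (C - \alpha_i^*)\,\xi_i^* = 0.$$

From this, several useful conclusions can be deduced. Firstly, only samples $(x_i, y_i)$ with corresponding $\alpha_i^{(*)} = C$ lie outside the ε-insensitive tube. Secondly, $\alpha_i \alpha_i^* = 0$. That is, there can never be a set of dual variables $\alpha_i, \alpha_i^*$ which are both simultaneously nonzero, as this would require nonzero slacks in both directions. For $\alpha_i^{(*)} \in (0, C)$, $\xi_i^{(*)} = 0$ and the second factor has to vanish. Hence $b$ can be computed as follows:

$$b = y_i - \langle \omega, \phi(x_i) \rangle - \varepsilon \quad \text{for } \alpha_i \in (0, C), \qquad b = y_i - \langle \omega, \phi(x_i) \rangle + \varepsilon \quad \text{for } \alpha_i^* \in (0, C).$$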
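Once the dual variables and the threshold b are known, a prediction is a kernel-weighted sum over the support vectors. The sketch below illustrates this with a radial basis function (RBF) kernel. The RBF choice is an assumption (the text reports a gamma hyperparameter but does not name the kernel), and the support vectors, coefficients and b are hypothetical values, not fitted ones.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """f(x) = sum_i coef_i * K(x_i, x) + b, where coef_i = alpha_i - alpha_i*."""
    total = 0.0
    for sv, coef in zip(support_vectors, dual_coefs):
        total += coef * rbf_kernel(sv, x, gamma)
    return total + b


# Hypothetical support vectors and coefficients for illustration.
support_vectors = [[0.0], [1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0], support_vectors, dual_coefs, b=0.1, gamma=1.0)
```

A larger gamma makes the kernel decay faster with distance, so each support vector influences a narrower neighbourhood and the fitted function becomes more nonlinear.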

Comparative Performance with Artificial Neural Network
The study compared the performance of the support vector regression with the artificial neural network using root mean square error (RMSE).

Artificial Neural Network Procedure
Consider a supervised learning problem with labeled training examples $(x^{(i)}, y^{(i)})$. Neural networks give a way of defining a complex, non-linear form of hypotheses $h_{W,b}(x)$, with parameters $W, b$ that can be fitted to the data.
The diagram below describes the simplest possible neural network, comprising a single neuron. This neuron is a computational unit that takes as input $x_1, x_2, x_3$ (and a +1 intercept term), and outputs $h_{W,b}(x) = f(W^T x) = f\left(\sum_{i=1}^{3} W_i x_i + b\right)$, where $f: \mathbb{R} \mapsto \mathbb{R}$ is called the activation function. A neural network is put together by combining many such simple neurons, so that the output of one neuron can be the input of another. In figure 4, the circles labeled "+1" are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer ($L_1$), and the rightmost layer the output layer ($L_3$). The middle layer ($L_2$) of nodes is called the hidden layer, because its values are not observed in the training set. This neural network has 3 input units, 3 hidden units, and 1 output unit.
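The forward pass of the network described above, with 3 inputs, 3 hidden units and 1 output, can be sketched as follows. The sigmoid activation is an assumption, since the text does not state which activation function was used, and the weights are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """Single neuron: activation of the weighted input sum plus intercept."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer (one neuron per row of W1) -> one output."""
    hidden = [neuron(x, w, b) for w, b in zip(W1, b1)]
    return neuron(hidden, W2, b2)


# Arbitrary illustrative weights (not fitted values).
x = [0.2, 0.5, 0.9]
W1 = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.5], [-0.3, 0.2, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [0.3, -0.4, 0.2]
b2 = 0.05
output = forward(x, W1, b1, W2, b2)
```

Each hidden unit is the single neuron from the text applied to the same inputs with its own weights, and the output unit applies the same computation to the hidden activations.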

Back Propagation
Let $\delta^{(l+1)}$ be the error term for the $(l+1)$-st layer in the network with a cost function $J(W,b;x,y)$, where $(W,b)$ are the parameters and $(x,y)$ are the training data and label pairs. If the $l$-th layer is densely connected to the $(l+1)$-st layer, then the error for the $l$-th layer is computed as

$$\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})$$

and the gradients are

$$\nabla_{W^{(l)}} J(W,b;x,y) = \delta^{(l+1)} (a^{(l)})^T, \qquad \nabla_{b^{(l)}} J(W,b;x,y) = \delta^{(l+1)}.$$

If the $l$-th layer is a convolutional and subsampling layer, then the error is propagated through as

$$\delta_k^{(l)} = \text{upsample}\left((W_k^{(l)})^T \delta_k^{(l+1)}\right) \bullet f'(z_k^{(l)}),$$

where $k$ indexes the filter number and $f'(z_k^{(l)})$ is the derivative of the activation function. The upsample operation has to propagate the error through the pooling layer by calculating the error with respect to each unit incoming to the pooling layer. In max pooling, the unit which was chosen as the max receives all the error, since very small changes in input would perturb the result only through that unit.
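For the densely connected case, the error terms and gradients can be sketched for a 3-3-1 network as below. The sketch assumes a sigmoid activation and a squared-error cost for a single training example, neither of which is stated in the text; the weights and the example are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations a2 and the output activation a3."""
    a2 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(W1, b1)]
    a3 = sigmoid(sum(w * a for w, a in zip(W2, a2)) + b2)
    return a2, a3


def cost(x, y, W1, b1, W2, b2):
    """Squared-error cost J = 0.5 * (output - y)^2 for one example."""
    _, a3 = forward(x, W1, b1, W2, b2)
    return 0.5 * (a3 - y) ** 2


def backprop(x, y, W1, b1, W2, b2):
    """Gradients of J with respect to all weights and intercepts."""
    a2, a3 = forward(x, W1, b1, W2, b2)
    # Output error: dJ/dz3; for a sigmoid, f'(z) = a * (1 - a).
    delta3 = (a3 - y) * a3 * (1.0 - a3)
    # Hidden error: (W2^T delta3) multiplied elementwise by f'(z2).
    delta2 = [w * delta3 * a * (1.0 - a) for w, a in zip(W2, a2)]
    # Weight gradient = next-layer error times this layer's input activations.
    grad_W2 = [delta3 * a for a in a2]
    grad_b2 = delta3
    grad_W1 = [[d * xi for xi in x] for d in delta2]
    grad_b1 = list(delta2)
    return grad_W1, grad_b1, grad_W2, grad_b2


# Hypothetical weights and a single made-up training example.
x, y = [0.2, 0.5, 0.9], 1.0
W1 = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.5], [-0.3, 0.2, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [0.3, -0.4, 0.2]
b2 = 0.05
grad_W1, grad_b1, grad_W2, grad_b2 = backprop(x, y, W1, b1, W2, b2)
```

A finite-difference check, perturbing one weight slightly and comparing the change in `cost` with the corresponding gradient entry, is a simple way to confirm that the analytic gradients are right.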
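Finally, calculating the gradient with respect to the filter maps relies on the border-handling convolution operation again, with the error matrix $\delta_k^{(l)}$ flipped in the same way that the filters are flipped in the convolutional layer:

$$\nabla_{W_k^{(l)}} J(W,b;x,y) = \sum_{i=1}^{m} (a_i^{(l)}) * \text{rot90}(\delta_k^{(l+1)}, 2), \qquad \nabla_{b_k^{(l)}} J(W,b;x,y) = \sum_{a,b} (\delta_k^{(l+1)})_{a,b},$$

where $a^{(l)}$ is the input to the $l$-th layer, and $a^{(1)}$ is the input image. The operation $(a_i^{(l)}) * \delta_k^{(l+1)}$ is the valid convolution between the $i$-th input in the $l$-th layer and the error with respect to the $k$-th filter.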

Model performance and forecasting
The performance of the model was assessed on the validation dataset using the mean square error (MSE):

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where $y_i$ is the $i$-th observed value, $\hat{y}_i$ is the $i$-th fitted value and $n$ is the total number of validation data points. The developed SVR model was used to predict outputs for the given inputs in the test data.
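The two error measures used in the comparison, the mean square error and its square root (RMSE), can be sketched as below. The function names and the observed and fitted values are made up for illustration.

```python
import math


def mean_square_error(observed, fitted):
    """Average of the squared differences between observed and fitted values."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


def root_mean_square_error(observed, fitted):
    """Square root of the MSE, expressed in the units of the response."""
    return math.sqrt(mean_square_error(observed, fitted))


# Made-up observed and fitted values.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.5, 2.5, 5.0]
mse = mean_square_error(observed, fitted)
rmse = root_mean_square_error(observed, fitted)
```

Because the RMSE is a monotone transformation of the MSE, the two measures always rank competing models in the same order.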
The predicted economic growth indicator per capita PPP (lnGDP_PPP), $\hat{y}$, is then obtained by applying the fitted model to the matrix of input data for 2016, using the parameters k and s estimated and optimized in the previous sections.
The analysis used several R packages, including e1071, caret and neuralnet, in R version 3.1.

Pre-processing and variable selection
To allow the two models (neural network and support vector regression) to be compared, variable importance was determined using model-independent metrics, whereby each variable is evaluated by filtering.
In this case, the relationship between each possible independent variable with the dependent variable was calculated. Variable selection for the modeling was done through principal component analysis (PCA) with scaling.
A cutoff eigenvalue of 1 was used to select the number of principal components, reducing them to only four (PC1=3.02, PC2=2.83, PC3=2.03 and PC4=1.27), which together explain over 76% of the variance. The dependent variable is correlated with the first two principal components (PC1 $r_y$=0.474, PC2 $r_y$=0.690), and 8 of the independent variables had a correlation of at least 0.4 with either of the first two. Further analysis was carried out to remove variables with almost constant values, which could affect the models. Variables with over 90% frequency ratio were included for further analysis.
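The eigenvalue cutoff and the share of variance explained can be checked with a short calculation. The sketch below uses the four eigenvalues reported above. The total of 12 standardized variables is an assumption (it is not stated in the text) chosen because, with scaled PCA, the eigenvalues sum to the number of variables and 12 is consistent with the reported figure of "over 76%".

```python
def kaiser_select(eigenvalues, cutoff=1.0):
    """Keep the components whose eigenvalue exceeds the cutoff."""
    return [ev for ev in eigenvalues if ev > cutoff]


def explained_share(kept_eigenvalues, total_variance):
    """Fraction of the total variance carried by the retained components."""
    return sum(kept_eigenvalues) / total_variance


reported = [3.02, 2.83, 2.03, 1.27]  # eigenvalues reported in the text
n_variables = 12                     # assumption: not stated in the text

kept = kaiser_select(reported)
share = explained_share(kept, total_variance=n_variables)
```

Under this assumption the four retained components carry 9.15 of 12 units of variance, or about 76.25%.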

Summary Statistics
The distributions of all the variables used are shown in figure 5 below. GDP growth has been fluctuating around zero for all the East African countries. Around 1994, Rwanda had the lowest GDP growth experienced by any of the countries. This was due to the effect of the genocide, which has also been reported elsewhere [25,26]. On the other hand, Burundi has had the lowest economic growth among the members of the community from the mid-90s to date, except towards 2010, when Kenya had the lowest economic growth, which has been attributed to the 2007/2008 post-election violence.

SVR Modeling and hyper-parameter optimization
The optimization of the SVR parameters was carried out using the standard grid search technique, which has been utilized by other authors [27]. In this way, the best combination of the epsilon, cost and gamma parameters was chosen based on the error obtained from the training data. A 5-fold cross-validation method was included in this process. The optimized SVR hyperparameters per country are presented in table 2 below. The optimal hyperparameters varied across the countries. Generally, Kenya (γ=0.85) and the combined model (γ=0.05) had higher gamma parameters compared to Burundi (γ=0.05), Uganda (γ=0.001), Rwanda (γ=0.05) and Tanzania (γ=0.005). These two SVR models would be more nonlinear than the rest. The cost parameters were lower for Burundi, Kenya and Tanzania (C=5), which indicates lower penalties on deviations outside the ε-tube compared to those for Uganda (C=30), Rwanda (C=10) and the combined model (C=55).
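The grid search with 5-fold cross-validation can be sketched generically as below. This is an illustration of the procedure rather than the study's code: to stay self-contained it tunes a one-parameter ridge regression through the origin as a stand-in for the SVR, and the helper names, the `lam` parameter and the toy data are all hypothetical. In the study, the grid would range over epsilon, cost and gamma.

```python
from itertools import product


def k_fold_indices(n, k):
    """Assign observation i to fold i mod k (interleaved folds)."""
    return [list(range(start, n, k)) for start in range(k)]


def grid_search(X, y, grid, fit, predict, k=5):
    """Return (best mean CV error, best parameters) over the full grid."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for fold in k_fold_indices(len(X), k):
            held_out = set(fold)
            X_train = [x for i, x in enumerate(X) if i not in held_out]
            y_train = [t for i, t in enumerate(y) if i not in held_out]
            model = fit(X_train, y_train, **params)
            squared = [(y[i] - predict(model, X[i])) ** 2 for i in fold]
            fold_errors.append(sum(squared) / len(squared))
        score = sum(fold_errors) / len(fold_errors)
        if best is None or score < best[0]:
            best = (score, params)
    return best


# Stand-in model (NOT an SVR): ridge regression through the origin.
def fit_ridge(X, y, lam):
    return sum(a * t for a, t in zip(X, y)) / (sum(a * a for a in X) + lam)


def predict_ridge(w, x):
    return w * x


# Toy data with an exact linear relation y = 2x.
X = [float(i) for i in range(1, 11)]
y = [2.0 * v for v in X]
best_score, best_params = grid_search(
    X, y, {"lam": [0.0, 1.0, 10.0]}, fit_ridge, predict_ridge, k=5)
```

On this noise-free toy data the unpenalized setting wins with zero error. With real data the cross-validated error, rather than the error on the data used to fit the model, is what guards against choosing hyperparameters that over-fit.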
For the neural network model, all the country-specific models performed better than the combined model (MSE=0.211), whose error was about 10 times that of the specific model for Kenya (MSE=0.02) and roughly 3 to 5 times those of Tanzania (MSE=0.042), Rwanda (MSE=0.064) and Uganda (MSE=0.048). The specific model for Burundi had the lowest training error (MSE=0.006).

Comparative Performance of SVR and Neural Network
Table 3 below presents the results of the SVR and neural network models on the training data. The results presented are the fitted values obtained using the respective models as well as the observed values for the period 1990-2002.

Prediction results
The results for both the SVR and the neural network on the test data, covering the period 2003-2014, are presented below. In undertaking the prediction, the poor performance of the combined models in both SVR and neural network was taken into account, and therefore only the performance of the country-specific SVR and neural network models was tested. Based on table 4, except for the SVR model for Burundi, which performed extremely poorly compared to the neural network, the two models were generally robust outside the training data.

Conclusions and Recommendations
This work sought to establish the application of support vector regression in modeling economic growth using the East Africa Community data obtained from the World Bank, compare its performance to the neural network model and determine their robustness in forecasting.
The comparative performance of the two models, based on their mean square errors, indicated similar performance. However, the neural network model performed better than the support vector regression for Tanzania and Burundi under the country-specific models, and it performed better in the combined model for all countries.
The robustness check on external test data for the period 2003-2014 showed that the two models performed similarly, except for Burundi, where the neural network greatly outperformed the SVR, and Tanzania, where the reverse held. Using the combined model, the neural network outperformed SVR across all countries.

Appendix III: Neural Network model
The neural network model is presented in table 5 below.