Performance Comparison of Various Kernels of Support Vector Regression for Predicting Option Price

The Black-Scholes option pricing model is the best-known model for pricing options, but owing to several limitations it does not reproduce observed option prices exactly. This study applies regression and optimization techniques from machine learning to predict option prices and to improve the accuracy of option valuation. The proposed method has two stages. First, Principal Component Analysis (PCA) is used to identify the most influential inputs of the option pricing model and to reduce the dimensionality of the working data. Second, Support Vector Machines (SVM) and Support Vector Regression (SVR), learning algorithms characterized by their use of kernel functions, are trained with the option pricing parameters as input variables. Combining the two methods shows that SVM with PCA can perform better while consuming less time and memory. We investigate the estimation performance of the option pricing model with SVM and PCA and give a brief analysis of the accuracy of the approach. The SVM training and the PCA normalization are computed in MATLAB. The results suggest a new way of predicting option prices accurately, provided the formulation is simulated with enough data.


Introduction
Nowadays, one of the most important topics in finance is the valuation of options. Pricing options accurately remains a difficult task in computational finance, because the underlying stock price follows a complex pattern with stochastic behavior. An option is the right, but not the obligation, to buy (call option) or sell (put option) the underlying asset at a particular price. Options can reduce the financial risk of future events [12]. Since 1970 many methods have been adopted for this type of problem. The most influential work in the field is that of Black and Scholes (1973), whose formula is widely known as the Black-Scholes equation for option pricing [2]. Some of the assumptions behind such parametric specifications are difficult to justify on real-world data, because the relationship between the option price and its explanatory variables is nonlinear. Therefore, in recent years many researchers have turned to machine learning and other nonparametric methods, which are able to capture nonlinear relationships between input and output [2,14].
Two approaches are used in this study. The first is Principal Component Analysis (PCA), which was invented by Karl Pearson (1901). PCA is an exploratory technique that can be used to gain a better understanding of the interrelationships between variables. PCA can also be used to test for normality. In this study PCA is used to normalize the original data into a simpler form.
The second approach is the Support Vector Machine (SVM). One of the objectives of financial methods is the evaluation of option prices. This study concentrates on the Support Vector Machine, a pattern classification algorithm developed by V. Vapnik (1992). Support Vector Machines and Support Vector Regression (SVR) are powerful methods for approximating complex functions.
This study presents an option pricing model that combines PCA and SVM to predict option prices. In this approach, PCA serves as a state estimator and makes predictions based on the Black-Scholes formula. The residuals between the actual prices and the Black-Scholes prices are fed into the SVM, which is trained to further reduce the prediction errors. The empirical results show that PCA can capture the patterns in the option prices and that the combined model can significantly reduce the option pricing errors.

Related Work
Many academic studies have examined the performance of option pricing models in several countries; a few are mentioned here. Hutchinson et al. [14] in 1994 used a neural network for option pricing and compared its performance with the Black-Scholes model. The proposed model performed fairly well. M. Liu [22] in 1996, Yao et al. [31] in 2000 and Andreou [1] in 2008 successfully applied neural networks to option pricing. Saxena [26] studied European-style CNX Nifty options traded at the National Stock Exchange of India. He combined the BS model with Artificial Neural Networks (ANNs) for option pricing and concluded that the hybrid model can improve pricing performance under all market conditions. Mitra [23] in 2012 studied Nifty options in India and forecast their prices using a neural network. Lajbcygier et al. [19] improved the hybrid neural network by using bootstrap methods to reduce bias in the existing model.
Support Vector Regression has been used successfully as an option pricing tool in many studies. In our previous work, Support Vector Regression and multiple kernel Support Vector Regression were successfully used for Nifty index option pricing. M. M. Pires et al. [24] compared the performance of a Multi-Layer Perceptron neural network and Support Vector Regression in pricing American-style options and concluded that the Support Vector Regression approach gave more promising results than the Multi-Layer Perceptron. S. C. Huang et al. [13] combined unscented Kalman filters (UKFs) and Support Vector Regression (SVR) to predict option prices. The difference between the market option price and the Black-Scholes option price is taken as the input to SVR in order to reduce the prediction errors. The new hybrid model performs better than pure SVR or pure UKF models in option pricing. P. Wang et al. [30] used Support Vector Regression integrated with stochastic volatility models to forecast currency option prices. The results reveal that the integrated model performed better than traditional approaches such as the Garman-Kohlhagen (GK) formula and an ANN option pricing model. Panayiotis C. Andreou et al. [1] used Support Vector Regression and Least Squares Support Vector Regression for pricing S&P 500 index call options with a Deterministic Volatility Functions approach and compared the results with the traditional Black-Scholes model. They obtained promising results for both SVR models. L. Xun et al. [21] modified three parametric methods, the binomial tree method, the finite difference method and the Monte Carlo method, to forecast option prices, and then refined the forecasts with the nonparametric methods ANN and SVR by decreasing the nonlinear errors. They found that, compared with the standard and improved parametric option price forecasting methods, ANN and SVR have higher forecasting accuracy. ChihMing Hsu et al. [13] compared the pricing of Taiwan Stock Exchange Capitalization Weighted Stock Index Options (TAIEX Options) by three approaches, the Black-Scholes (BS) model, Genetic Programming (GP) and Support Vector Regression (SVR), using all the basic factors of the BS model together with additional factors in the GP and SVR models. They concluded that both the GP and the SVR forecasting models gave more promising results than the Black-Scholes model.
To further improve forecasting performance in option pricing, multiple kernel Support Vector Regression with the SMO algorithm is applied. Many kernel methods have been applied in various applications. Multiple Kernel Learning (MKL) is one of them: kernels are combined in linear or nonlinear ways to maximize generalization performance. This approach learns both the Lagrange multipliers and the kernel weights in a single optimization, as in Lanckriet et al. [20]. In the area of kernel learning, F. R. Bach et al. [3] considered combinations of kernel matrices in MKL based on sequential minimal optimization applied to a smoothed version of the problem.
In this paper, Principal Component Analysis (PCA) is used to normalize the data, and Support Vector Machines (SVM) and SVR with various kernels are applied to predict the option price. The paper is structured as follows. First, we introduce the theory of the Black-Scholes model, PCA, SVM and SVR. Second, we discuss the data structure and the algorithm applied to the option pricing dataset. The results and discussion, together with the scope and limitations, follow, and the conclusion is presented in the last section.

Related Literature Reviews
This section introduces the topics on which the proposed system is built. The Black-Scholes model (BS), Principal Component Analysis (PCA), Support Vector Machines (SVM) and Support Vector Regression (SVR) are explained briefly.

Black-Scholes Option Pricing Model
The Black-Scholes model describes the behavior of options on assets whose price follows a geometric Brownian motion, that is, satisfies the stochastic differential equation

$$dS = \mu S\,dt + \sigma S\,dX$$

where
µ = drift rate (the mean change per unit time of a stochastic process is known as the drift rate),
σ = standard deviation (the variance per unit time is known as the variance rate),
X = standard Brownian motion,
S = current stock price,
t = time.
The variable µ is the stock's expected rate of return. The variable σ is the volatility of the stock price, and σ² is referred to as its variance rate. The equation above represents the stock price process in the real world. In a risk-neutral world, µ equals the risk-free rate r. This model is often referred to as the geometric Brownian motion assumption of Black-Scholes, because it describes geometric growth driven by a Brownian motion with drift. Besides these parameters, this paper also uses Rho (ρ), Gamma (Γ) and Vega (Λ) to predict the option prices of a data set.
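As an illustration of the quantities named above, the following minimal sketch evaluates the standard closed-form Black-Scholes price of a European call on a non-dividend-paying stock, together with its Gamma, Vega and Rho. It is not the paper's MATLAB code. The strike `K`, the time to expiry `T` (in years) and the numerical inputs are assumptions introduced only for this example.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_scholes_call(S: float, K: float, T: float, r: float, sigma: float) -> dict:
    """Price and selected Greeks of a European call (no dividends).

    S: current stock price, K: strike (assumed name), T: years to expiry
    (assumed name), r: risk-free rate, sigma: volatility.
    """
    sqrt_T = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    discount = math.exp(-r * T)
    return {
        "price": S * norm_cdf(d1) - K * discount * norm_cdf(d2),
        "gamma": norm_pdf(d1) / (S * sigma * sqrt_T),
        "vega": S * norm_pdf(d1) * sqrt_T,
        "rho": K * T * discount * norm_cdf(d2),
    }


# Illustrative inputs only; they are not taken from the paper's data set.
result = black_scholes_call(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)
print({name: round(value, 4) for name, value in result.items()})
```

In a setup like the one described in this paper, values of this kind would be candidate input columns for the regression rather than the final prediction.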

Principal Component Analysis (PCA)
One of the main problems in statistics is visualizing data that has many variables. When there are more than three variables, it becomes difficult to picture their relationships. In data sets with many variables, groups of variables often move together, because in many systems only a few driving forces are at work even though abundant instrumentation allows dozens of system variables to be measured. The problem can therefore be simplified by replacing a group of variables with a single new variable.
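The idea of replacing a group of correlated variables with one new variable can be sketched in a few lines. The example below is a minimal illustration, not the MATLAB routine used in this study: it builds three synthetic variables driven by a single hidden factor (the data and all function names are assumptions), centers them, forms the sample covariance matrix and finds the first principal component by power iteration.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so that every column has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(row[i] * row[j] for row in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iterations=500):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Synthetic data: three observed variables driven by one hidden factor.
random.seed(0)
rows = []
for _ in range(200):
    f = random.gauss(0.0, 1.0)
    rows.append([f + random.gauss(0.0, 0.1),
                 2.0 * f + random.gauss(0.0, 0.1),
                 -f + random.gauss(0.0, 0.1)])

centered = center(rows)
cov = covariance(centered)
eigenvalue, direction = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

# Scores: the single new variable that replaces the three original ones.
scores = [sum(row[j] * direction[j] for j in range(len(direction)))
          for row in centered]
print(f"share of variance explained by the first component: {explained_ratio:.3f}")
```

Power iteration is used here only to keep the sketch dependency-free; a numerical library would normally compute all components at once through an eigenvalue or singular value decomposition.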
Principal component analysis (PCA) is a quantitatively rigorous method for achieving this simplification. The method produces a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so together they form an orthogonal basis for the space of the data. The full set of principal components is as large as the original set of variables, but it is common for the sum of the variances of the first few principal components to exceed 80% of the total variance of the original data. PCA normally requires the actual measured data that are to be analyzed. If the raw data are unavailable but the sample covariance or correlation matrix is, PCA can still be performed on that matrix. To reduce the dimension of our original data by PCA, the following methodology is used.

Support vector machine (SVM) and Support vector regression (SVR)
The Support Vector Machine (SVM) is a classification and regression prediction tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fitting the data. It is a supervised learning algorithm, also known as a support vector network. Vladimir N. Vapnik and Alexey Ya. Chervonenkis invented the original SVM algorithm in 1963. An SVM separates the classes with a decision boundary; depending on the nature of the data, this separation may be linear or non-linear.
Consider a linear classifier (or hyperplane)

$$f(x) = w^{T}x + b$$

where w is the weight vector, x is the input feature vector and b determines the position of the hyperplane. Here,
(a) if the input vector is 2-dimensional, the linear equation represents a straight line;
(b) if the input vector is 3-dimensional, the linear equation represents a plane;
(c) if the input vector has more than 3 dimensions, the linear equation represents a hyperplane.
The SVM algorithm finds an optimal hyperplane for separating two classes. Assume that the equation of the hyperplane is

$$w^{T}x + b = 0.$$

The optimal hyperplane is the one that maximizes the margin, that is, the distance between the hyperplane and the closest data points of each class. For a non-linear classifier, SVM uses a kernel function to separate the data points. A nonlinear SVM regression model is obtained by replacing the dot product $x_1^{T}x_2$ with a nonlinear kernel function $K(x_1, x_2) = \langle \varphi(x_1), \varphi(x_2) \rangle$, where $\varphi$ is a transformation that maps x to a high-dimensional space. Popular kernel functions include the linear kernel function

$$K(x_i, x_j) = x_i^{T}x_j$$

where $x_i$ and $x_j$ are support vectors. A support vector is an input vector that just touches the boundary of the margin. Put simply, support vectors are the data points that lie closest to the decision surface (or hyperplane).
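To make the role of the kernel function concrete, the sketch below evaluates three kernels on a few points and assembles the kernel (Gram) matrix that an SVM solver works with. The linear kernel is the one defined in this section. The quadratic form `(1 + x·z)^2` and the Gaussian form `exp(-||x - z||^2 / scale^2)` are common parameterizations assumed here for illustration; the text does not state which exact forms were used, and the function names and sample points are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Dot product of two input vectors."""
    return sum(a * b for a, b in zip(x, z))


def quadratic_kernel(x, z):
    """Polynomial kernel of degree two (assumed form: (1 + x.z)^2)."""
    return (1.0 + linear_kernel(x, z)) ** 2


def gaussian_kernel(x, z, scale=1.0):
    """Gaussian kernel (assumed form: exp(-||x - z||^2 / scale^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / scale ** 2)


def gram_matrix(points, kernel):
    """Matrix of kernel values between every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


# Three illustrative two-dimensional inputs, e.g. two principal-component scores.
points = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
for name, kernel in [("linear", linear_kernel),
                     ("quadratic", quadratic_kernel),
                     ("gaussian", gaussian_kernel)]:
    print(name, gram_matrix(points, kernel))
```

Because the solver only ever touches these pairwise values, switching from a linear to a nonlinear model amounts to swapping the kernel function; the transformation to the high-dimensional space is never computed explicitly.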

Data Structure and Methodology
Financial data offer an excellent source of difficult and challenging problems for the computing community. Many applications, from time series prediction to stock selection, have been attempted. Hutchinson et al. (1994) and Niranjan (1996) studied option pricing with neural networks: the former showed that the widely used Black-Scholes formula may be obtained with neural networks, while the latter looked at the non-stationary aspects of the problem. De Freitas et al. extended this work by incorporating noise estimation and powerful sampling algorithms. This section is organized as follows. First, it describes the structure and source of the data. It then examines the relationship between the predictors and the response. Next, it discusses the accuracy obtained on the data with the SVM and PCA method. Finally, it examines the accuracy of the predicted option prices.
We collected the data from the SPY options listed on Yahoo Finance in 2015. From this data we calculate the accuracy of the predicted option prices. We modified some of the dimensions of the variables in the original data.

First Approach
First, this paper evaluates and tests the relationship between the input variables and the output variable using a simple quadratic polynomial regression, after standardizing the data by its mean and standard deviation and transforming it with the bi-square weighting scheme. The measures calculated are discussed below.
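Standardizing a variable by its mean and standard deviation, as mentioned above, can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation is used; the column name `strikes` and its values are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical input column; not taken from the paper's data set.
strikes = [180.0, 190.0, 200.0, 210.0, 220.0]
z = standardize(strikes)
print([round(v, 4) for v in z])
```

After this step every input column has zero mean and unit standard deviation, so no single variable dominates the fit merely because of its units.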

What is R-Squared
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model.
(a) R-squared = Explained variation / Total variation
(b) R-squared is always between 0 and 100%.
(c) 0% indicates that the model explains none of the variability of the response data around its mean.
(d) 100% indicates that the model explains all the variability of the response data around its mean.
In general, the higher the R-squared, the better the model fits the data.
The adjusted R-squared compares the explanatory power of regression models that contain different numbers of predictors. It is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance. The adjusted R-squared can be negative, although it usually is not, and it is never higher than the R-squared.
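The two measures can be computed directly from observed and fitted values. The sketch below uses the form 1 - SS_res / SS_tot, which equals explained variation over total variation for a least-squares fit with an intercept, and the usual adjustment for the number of predictors. The function names and the sample numbers are illustrative assumptions, not results from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Illustrative observed and fitted values.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 5.1]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=1)
print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

With these numbers the adjusted value is slightly below the plain R-squared, which is the behavior described above when the number of predictors is taken into account.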

How to Increase R-squared and Adjusted R-squared
Our data are somewhat noisy and not scaled for training a machine learning model. We observed that a third-degree polynomial model could not be fitted to the data, so the dataset has to be normalized. From the above discussion, the raw values are not suitable and the relationship among them is very weak. There are a few options for removing this limitation:
(a) Removing residuals
(b) Removing outliers
(c) Normalizing and scaling the data so that it can fit a model
An alternative weighting scheme is to weight the residuals using a bi-square function. At this stage, we first compute the residuals from the unweighted fit and then apply the following weight function:

$$w_i = \begin{cases} \left(1 - \left(\dfrac{r_i}{6M}\right)^{2}\right)^{2}, & |r_i| < 6M \\ 0, & |r_i| \ge 6M \end{cases}$$

where $r_i$ is the residual of the i-th data point and M is the median absolute deviation of the residuals. The weight is set to 0 if the absolute value of the residual is greater than 6M. This method provides an effective alternative to deleting specific points: extreme outliers are removed, but mild outliers are reweighted rather than deleted altogether. The values of R-squared and adjusted R-squared were recomputed after applying this method.
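The bi-square weighting described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: M is taken as the median of the absolute residuals, which is one common convention, and the residual values are made up for the example.

```python
import statistics


def bisquare_weights(residuals):
    """Bi-square weights: (1 - (r / 6M)^2)^2 inside the 6M cutoff, 0 outside.

    M is computed as the median of the absolute residuals (an assumption).
    """
    mad = statistics.median([abs(r) for r in residuals])
    if mad == 0:
        raise ValueError("median absolute residual is zero; weights are undefined")
    cutoff = 6.0 * mad
    return [(1.0 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
            for r in residuals]


# Illustrative residuals from an unweighted fit; the last one is an extreme outlier.
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 5.0]
weights = bisquare_weights(residuals)
for r, w in zip(residuals, weights):
    print(f"residual {r:6.2f} -> weight {w:.4f}")
```

The extreme residual receives weight zero, while the smaller residuals keep weights close to one, so a subsequent weighted fit is barely influenced by the outlier.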

Evaluating the Model Using SVM and PCA
In the market price data, this paper takes seven input variables (Table 1). The data table has 4742 rows and eight columns, which makes it difficult to run the data through any program. We therefore use PCA to normalize the data: the seven input columns are transformed into two columns. The resulting 4742 rows and two columns can then be passed to the program for valuing the option price. The two resulting input variables explain 68.6% and 33.3% of the variance of the total data.
The feasibility gap obtained by SMO is a nonnegative scalar. If the gap tolerance is 0, then the SVM does not use this parameter to check convergence. In our case, the feasibility gap is the criterion that determined convergence. The feasibility gap found after convergence is reported alongside the results.
Figure 2 shows the predicted pattern against the real option price for the quadratic kernel. The real option prices and the predicted pattern are far apart. The blue points are the real option prices and the red points are the predicted pattern. In this figure the value of R-squared is 0.25, so with the quadratic kernel the real option prices are only partly covered by the predicted pattern.
Figure 3 shows the predicted pattern against the real option price for the Gaussian kernel. Here the real option prices and the predicted pattern are closer to each other than with the quadratic kernel. The blue points are the real option prices and the red points are the predicted pattern. In this figure the value of R-squared is 0.27.

Figure 3. Identified Pattern by SVR with Less Optimized Gaussian Kernel
In Figure 3, epsilon (ε) = 0.2. Figure 4 shows the predicted pattern against the real option price. Here the real option prices and the predicted pattern are closer to each other than with the quadratic kernel and the less optimized Gaussian kernel. The blue points are the real option prices and the red points are the predicted pattern. With the highly optimized Gaussian kernel, the real option prices are covered most fully by the predicted pattern.

Figure 4. Identified Pattern by SVR with Highly Optimized Gaussian Kernel
In Figure 4, epsilon (ε) = 0.002.
Figure 5 shows the response against the record number: the actual option price and the option price predicted by SVR with the quadratic kernel. The blue points are the predicted option prices and the red points are the actual option prices. With the quadratic kernel, the actual and predicted option prices largely coincide.
Figure 7 shows the response against the record number: the actual option price and the option price predicted by SVR with the Gaussian kernel. The blue points are the predicted option prices and the red points are the actual option prices. With the Gaussian kernel, the actual and predicted option prices mostly coincide. For this pattern the number of iterations is 1665.
Figure 9 shows the response against the record number: the actual option price and the option price predicted by SVR with the highly optimized Gaussian kernel. The blue points are the predicted option prices and the red points are the actual option prices. With the highly optimized Gaussian kernel, the actual and predicted option prices coincide more closely than with the quadratic and the Gaussian kernel. Here the number of iterations is 3527, and the bias is also positive. The highly optimized Gaussian kernel is therefore the best of the three kernels.
Figure 13 shows the price against the record number: the actual data points and the support vectors of SVR with the Gaussian kernel. The blue points are the actual data points and the red points are the support vectors. With the Gaussian kernel the model retains 2354 support vectors, with a box constraint of 40. Compared with the quadratic kernel, the Gaussian kernel therefore retains fewer support vectors.
A table compares the quadratic kernel SVR, the Gaussian kernel SVR and the highly optimized Gaussian kernel SVR. In this table we compute the actual option price and the predicted price for each of the three kernels. Of the three, the highly optimized Gaussian kernel gives predicted option prices closest to the actual ones. The SVR with the Gaussian kernel scores highest but has the fewest support vectors, whereas the SVR with the quadratic kernel and the Gaussian kernel with a different epsilon have more support vectors but a lower error and R-squared than the middle one. From the table we reach the following conclusions:
(a) Option prices mostly follow a Gaussian process, but if a Gaussian SVR is highly optimized, the recognized pattern is the same as that of the quadratic SVR.
(b) The quadratic SVR can sometimes predict the option price more accurately than the Gaussian SVR, but its R-squared is low, which indicates that more optimization is required to gain better accuracy. Nevertheless, the SVR with the quadratic kernel is able to predict those values more accurately.
(c) If we decrease the value of epsilon, the number of support vectors increases sharply, which indicates that a model with a smaller margin is highly optimized and able to explain a higher share of the variance.
(d) Option prices below 50 can be predicted with high accuracy by every one of the models.
(e) A larger box constraint allows relatively higher option prices to be predicted, but the accuracy gained is not satisfactory.
(f) The training time is very high for the quadratic SVR, even after PCA.
(g) Any SVR with a Gaussian kernel captures the overall pattern easily but with low accuracy at individual data points, whereas the quadratic SVR captures only part of the pattern but with high accuracy.
(h) After optimization, the SVR with the Gaussian kernel performs like an SVR with the quadratic kernel and, in addition, is able to predict higher option prices.
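The effect of epsilon noted in the conclusions above can be illustrated with the ε-insensitive loss that SVR minimizes: residuals inside the tube of half-width ε cost nothing, and only points on or outside the tube can become support vectors. The sketch below holds the predictions fixed and only counts points outside the tube for the two epsilon values used in this study. In a real fit the model itself also changes with ε, so this is a rough illustration only, and the sample prices are made up.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-point loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(y - p) - epsilon) for y, p in zip(y_true, y_pred)]


def count_outside_tube(y_true, y_pred, epsilon):
    """Number of points whose residual reaches or exceeds epsilon."""
    return sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) >= epsilon)


# Illustrative actual and predicted option prices (not from the paper's data).
y_true = [10.0, 12.5, 9.8, 15.2, 11.1]
y_pred = [10.05, 12.4, 9.8, 14.9, 11.101]

wide = count_outside_tube(y_true, y_pred, epsilon=0.2)
narrow = count_outside_tube(y_true, y_pred, epsilon=0.002)
print(f"epsilon = 0.2   -> {wide} point(s) outside the tube")
print(f"epsilon = 0.002 -> {narrow} point(s) outside the tube")
```

Shrinking the tube leaves more points outside it, which is consistent with the larger number of support vectors reported for the smaller epsilon.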

Scopes and Limitations
(a) A very small dataset was used for this project, which is not enough to capture and predict the pattern of option prices. Because of limited computational capacity, we were unable to work with a bigger dataset.
(b) We did not filter the data, which might change the results significantly.
(c) We think that adding more input variables, such as the difference between two successive option prices, could help to tune the model and gain better accuracy.
(d) The aim of this paper was to apply various mathematical techniques, such as linear programming, control and optimization, rather than only learning a pattern from the data.
With access to a better research environment and more powerful machines, this work could be extended towards automated software that continuously predicts option prices.

Conclusion
In this report, we study the use of Support Vector Machines (SVM) and Principal Component Analysis (PCA) to analyze option prices. The estimator is based on the Black-Scholes model, whose behavior is captured by SVR and PCA. Taken together, PCA and SVR form a successful system for error minimization, and SVM is a promising tool for financial estimation. As demonstrated in the empirical analysis, SVM is superior to the other individual regression methods in analyzing option prices. This is a clear message for financial estimators and traders, and it can lead to a capital gain. However, each method has its own strengths and weaknesses. We therefore propose a combined model that incorporates SVR with PCA, so that the weaknesses of one method are balanced by the strengths of the other. The combined model performs best among all the methods analyzed. In this project we have presented a decomposition algorithm that can be used to train Support Vector Machines on large data sets (4742 data points). An implementation of the technique currently under development will be able to deal with a much larger number of support vectors (about 100,000) using less memory. We discussed the predicted option price with the quadratic kernel, the Gaussian kernel and the highly optimized Gaussian kernel, which have 4386, 2354 and 4656 support vectors respectively. The numbers of iterations are 142001, 1665 and 3527 respectively. There are several reasons for which we have been investigating the use of SVR. Among them is the fact that SVR with the highly optimized Gaussian kernel is very well founded from the mathematical point of view, being an approximate implementation of error minimization. The only hyperparameters of SVMs are the positive regularization constant (the box constraint) and the parameter associated with the kernel (for the polynomial kernel used here, the degree is two).
Moreover, with the highly optimized Gaussian kernel, the expected value of the ratio between the actual and the predicted option price, taken over the total number of data points, gives an upper bound on the generalization error. Finally, we were able to analyze the option pricing model against the market price.
Mathematics Discipline, Khulna University, Khulna. We also thank our batchmates, seniors and juniors of the Mathematics Discipline, Khulna University, for helping us to complete this project. We would like to thank all of our honorable teachers for their valuable advice and support throughout our studies. We would also like to thank our elder brothers, friends and the non-teaching staff of the Mathematics Discipline, Khulna University, for their support and encouragement throughout our studies.