Comparative Analysis and Investigations of Various SVM Kernels Using Cellular Network KPI Data

Abstract: Classification plays a major role in virtually every field of human endeavor, and the Support Vector Machine (SVM) is one of the most popular algorithms for classification and prediction. However, the performance of an SVM is strongly affected by the choice of kernel function, among other factors. In this research work, SVM is evaluated with six different kernels by varying their parameters, especially the training ratio, to investigate their performance. The training ratio was varied in the proportions 60-20-20, 40-30-30 and 20-40-40 to obtain higher classification accuracy. Based on the performance results, the SVM models with the GRBF and ERBF kernels are the best suited for the call admission data at hand, classifying the dataset accurately with the best specificity and sensitivity values, followed by the RBF kernel. The research further indicates that the MLP, polynomial and linear kernels perform worst. Therefore, despite SVMs being limited to binary classification, their scalability and generalization capability give them an advantage in this and other domains.


Introduction
Global demand for high-speed, high-volume data transmission services, such as internet access via mobile terminals and voice and video conferencing on mobile phones, is growing faster than the infrastructure deployed to support it. Among the tremendous advances in telecommunications and data communications, perhaps the most revolutionary is the development of cellular networks, which are radio networks composed of interconnected cells that together cover a given geographical area [15]. It is now 18 years since the Global System for Mobile communications (GSM) was introduced to Nigeria, yet the quality of mobile cellular services has deteriorated due to the exponential growth in subscribers to voice, data and internet access services. The telecom companies failed to expand, upgrade, automate and optimize their systems to keep up with the ever-increasing subscriber base [11]. Efficient user roaming and management of the limited radio resources are therefore essential for grade of service (GoS), quality of service (QoS) provisioning and network stability, which calls for effective radio resource management (RRM). Within RRM, call admission control (CAC) is one of the most integral components; its fundamental purpose is to judge whether radio resources can be assigned to an incoming channel request given the current state of the system. CAC has recently been researched extensively, and a variety of CAC algorithms have been proposed to meet the increasing demand for wireless access to packet-based services and to make network roaming more seamless [1,3,8,16,17].
Artificial Intelligence (AI) comprises techniques applied to solve complex and ill-defined problems, with soft computing, computational intelligence and granular computing forming some of its major offshoots [12]. The Support Vector Machine (SVM), a machine learning technique that handles predictive modeling by statistically analyzing historical data, is employed here to compare various kernel variants on telecommunication data [4].
The rest of this research work is organized as follows. Section 2 reviews related work. Sections 3 and 4 present a general overview of SVM, the underlying techniques involved, and the data pre-processing techniques. Section 5 describes the methodology and the evaluation framework. Section 6 presents the comparative results and discussion, and conclusions are given in Section 7.

Related Work
One of the most important operations in machine learning (ML) is classification. Classification using SVM can be performed either through quadratic programming (QP) or through sequential minimal optimization (SMO). Since the SVM is a feed-forward network [4], classifiers are categorized into global and local approximators, the former exemplified by the Multi-Layer Perceptron (MLP) and the latter by the SVM with its various kernel functions.
A comparative study of SVM kernel functions based on polynomial coefficients and V-transform coefficients was presented in [14]. It observed that the polynomial kernel, Pearson VII kernel function and RBF kernel perform similarly in mapping the relation between input and output data, with the Pearson VII kernel function giving better classification accuracy than the others.
The performance of the RBF flavours of the SVM model could therefore be enhanced if the selection of $\gamma$ and $C$ were carried out using new optimization techniques, where $\gamma$ is the kernel multiplier and $C$, the penalty factor, governs the trade-off between model complexity and the proportion of non-separable samples; $C$ can be viewed as the reciprocal of a parameter usually referred to as the 'regularization parameter' [4,5].
The use of different kernel functions in SVM results in different performance. In [10], the polynomial kernel function and the Gaussian radial basis function were employed, and classification performance was analyzed using the overall confusion matrix, sensitivity, classification accuracy, and specificity measures.
A hybrid technique, which combines two or more soft computing paradigms operating synergistically rather than competitively in mutual dependence, is an interesting prospect that can produce unexpected performance improvement. A Simulated Annealing (SA) technique for parameter determination and feature selection in SVM was developed by [9], whose experiments yielded good performance, as jointly reported by [18].

Support Vector Machines (SVMs)
The Support Vector Machine (SVM), a machine-learning algorithm that is perhaps the most elegant of all kernel-learning methods [4], was first introduced in the early 1990s by Vapnik as a supervised learning system based on statistical learning theory, and it has since gained wide acceptance in the academic and industrial communities. SVM uses linear models to implement non-linear class boundaries and has found application in a wide variety of fields [2]. An SVM modeled to tackle regression problems is known as Support Vector Regression (SVR), which has seen tremendous growth in predictive control and optimization systems, face detection, bioinformatics, communication systems, protein fold and remote homology detection, handwriting recognition, and the geo- and environmental sciences, amongst others [6,13,19].
In SVM regression, the input $x$ is first mapped onto an $m$-dimensional feature space using some fixed (nonlinear) mapping, and a linear model is then constructed in this feature space. Using mathematical notation, the linear model (in the feature space) is given by equation (1):

$f(x, w) = \sum_{j=1}^{m} w_j g_j(x) + b$  (1)

where $g_j(x)$, $j = 1, \dots, m$, denotes a set of nonlinear transformations, and $b$ and $w_j$ are the bias and weights. The quality of estimation is measured by the loss function $L(y, f(x, w))$ in equation (2). SVM regression uses a new type of loss function, the $\varepsilon$-insensitive loss proposed by Vapnik, as depicted in equation (3):

$L_\varepsilon(y, f(x, w)) = \begin{cases} 0 & \text{if } |y - f(x, w)| \le \varepsilon \\ |y - f(x, w)| - \varepsilon & \text{otherwise} \end{cases}$  (3)

The empirical risk is shown in equation (4):

$R_{emp}(w) = \frac{1}{n} \sum_{i=1}^{n} L_\varepsilon(y_i, f(x_i, w))$  (4)

SVM regression performs linear regression in the high-dimensional feature space using the $\varepsilon$-insensitive loss and, at the same time, tries to reduce model complexity by minimizing $\|w\|^2$. This can be described by introducing (non-negative) slack variables $\xi_i, \xi_i^*$, $i = 1, \dots, n$, which measure the deviation of training samples outside the $\varepsilon$-insensitive zone. SVM regression is thus formulated as minimization of the functional in equation (5):

$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{subject to} \quad y_i - f(x_i, w) \le \varepsilon + \xi_i^*, \;\; f(x_i, w) - y_i \le \varepsilon + \xi_i, \;\; \xi_i, \xi_i^* \ge 0$  (5)

This optimization problem can be transformed into the dual problem, whose solution is given by equation (6):

$f(x) = \sum_{i=1}^{n_{SV}} (\alpha_i - \alpha_i^*) K(x_i, x) + b, \qquad 0 \le \alpha_i, \alpha_i^* \le C$  (6)

where $n_{SV}$ is the number of support vectors (SVs) and the kernel function is given by equation (7):

$K(x, x_i) = \sum_{j=1}^{m} g_j(x) g_j(x_i)$  (7)

SVM algorithms use a set of mathematical functions defined as kernels, which take data as input and transform it into the required form; examples include the linear, spline, polynomial, radial basis function (RBF), sigmoid, ANOVA radial basis, hyperbolic tangent, Laplace RBF and Gaussian kernels. In this study, six kernels were used: the Radial Basis Function (RBF), Gaussian Radial Basis Function (GRBF), Exponential Radial Basis Function (ERBF), Linear Kernel (LK), Multilayer Perceptron (MLP) and Polynomial kernels.
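To make the dual form in equation (6) concrete, the prediction step can be sketched in a few lines of Python. The support vectors, multipliers and bias below are made-up illustrative values, not ones learned from the KPI data.

```python
import numpy as np

def rbf_kernel(x, x_i, sigma=1.0):
    # K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - x_i) ** 2) / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel=rbf_kernel):
    # f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b  -- equation (6)
    return sum((a - a_s) * kernel(x_i, x)
               for x_i, a, a_s in zip(support_vectors, alpha, alpha_star)) + b

# toy example with two hypothetical support vectors
svs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
y_hat = svr_predict(np.array([0.5, 0.5]), svs,
                    alpha=[0.5, 0.2], alpha_star=[0.1, 0.3], b=0.0)
```

Only the support vectors (the training samples with $\alpha_i - \alpha_i^* \ne 0$) contribute to the sum, which is why a larger $\varepsilon$, by reducing the number of SVs, yields a cheaper and flatter model.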

Gaussian Kernel
The Gaussian kernel is a general-purpose kernel, used when there is no prior knowledge about the data, and is shown in equation (8):

$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$  (8)

where $\|x - x'\|^2$ is the squared Euclidean distance and $\sigma$ determines whether the kernel provides a good fit or an overfit to the data.

Gaussian Radial Basis Function (GRBF)
The GRBF is also a general-purpose kernel, used when there is no prior knowledge about the data, and is given by equation (9):

$K(x, x_i) = \exp\left(-\gamma \|x - x_i\|^2\right)$  (9)

where $\gamma = 1/(2\sigma^2)$ and $\|x - x_i\|$ is the Euclidean distance from the point $x$ to the point $x_i$.

Polynomial Kernel
This is a kind of function commonly used with SVMs and other kernelized models that represents the similarity of vectors. Kernelization is a technique for designing efficient algorithms that achieve their efficiency through a preprocessing stage in which the input to the algorithm is replaced by a smaller input called a kernel. The polynomial kernel is popular in image processing and is depicted by equation (10):

$K(x, x_i) = (\gamma x^{T} x_i + c)^{d}$  (10)

where $d$ is the degree of the polynomial, $x$ and $x_i$ are vectors in the input space and $c$ is a constant. When $c = 0$ the kernel is called homogeneous, while for $c > 0$ it is a free parameter trading off the influence of higher-order versus lower-order terms.

Linear Kernel
The linear kernel is useful when dealing with large sparse data vectors and in text categorization, and is represented mathematically in equation (11):

$K(x, x_i) = x^{T} x_i$  (11)

Exponential Radial Basis Function (ERBF) Kernel
In the ERBF kernel, the output is a function of the Euclidean distance between the vectors, which serves as the measure of similarity instead of the angle between them.

Radial Basis Function (RBF) Kernel
The RBF kernel is more popular in SVM classification than the polynomial kernel and is widely used in Natural Language Processing (NLP) and computer vision [20,21]. The RBF kernel represents similarity as a decaying function of the distance between the vectors, and is defined in equation (12):

$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$  (12)

where $\sigma$ is a parameter that sets the 'spread' of the kernel.
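For reference, the six kernels compared in this work can be written down directly from the definitions above; the NumPy sketch below uses illustrative default parameter values, not the ones tuned in the experiments.

```python
import numpy as np

def linear(x, z):
    # linear kernel: plain dot product
    return np.dot(x, z)

def polynomial(x, z, gamma=1.0, c=1.0, d=3):
    # polynomial kernel: (gamma * x.z + c)^d
    return (gamma * np.dot(x, z) + c) ** d

def gaussian_rbf(x, z, gamma=0.5):
    # GRBF with gamma = 1 / (2 sigma^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rbf(x, z, sigma=1.0):
    # RBF: decaying function of the squared Euclidean distance
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def erbf(x, z, sigma=1.0):
    # exponential RBF: plain (not squared) Euclidean distance
    return np.exp(-np.sqrt(np.sum((x - z) ** 2)) / (2.0 * sigma ** 2))

def mlp(x, z, scale=1.0, offset=-1.0):
    # MLP (sigmoid) kernel: tanh(scale * x.z + offset)
    return np.tanh(scale * np.dot(x, z) + offset)
```

Any of these can be dropped into the dual expansion of equation (6) in place of $K(x_i, x)$, which is exactly how the six kernel variants are swapped in the experiments.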

Data Pre-Processing
Data pre-processing plays a vital role in the data mining process, especially because most data gathering methods are lightly controlled, resulting in outliers, missing values, impossible data combinations, and so on. The merits of data pre-processing include data normalization, noise reduction, reduction of the input space, smoother relationships and feature extraction. Common normalization techniques include Zero-Mean, Min-Max, AbsMax, Sigmoid, FullMinMax, PCA and Z-score normalization. Data preparation and filtering can take a considerable amount of processing time, but once pre-processing is done the data become more reliable and more robust results are achieved [7]. Z-score and sigmoidal normalization were employed in this work.

Sigmoidal Normalization
Sigmoidal normalization is a nonlinear transformation that maps the input data into the range -1 to 1 using a sigmoid function. The mean and standard deviation of the input data must first be calculated. Sigmoidal normalization is expressed in equation (13):

$x_{new} = \frac{1 - e^{-\alpha}}{1 + e^{-\alpha}}, \qquad \alpha = \frac{x_{old} - M}{std}$  (13)

where $M$ and $std$ are the mean and standard deviation of the input data.
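Assuming the tanh-style sigmoid commonly used for this transformation (the exact variant is not spelled out above), the normalization can be sketched as:

```python
import numpy as np

def sigmoidal_normalize(x):
    # z-score first, then squash into the open interval (-1, 1)
    alpha = (x - np.mean(x)) / np.std(x)
    return (1.0 - np.exp(-alpha)) / (1.0 + np.exp(-alpha))
```

Values near the mean map to near zero, while extreme values and outliers are compressed toward the -1 and 1 limits, which is the property that makes this transform attractive for lightly controlled data.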

Z-score Normalization
In this technique, the input data is converted to zero mean and unit variance, which means the mean and standard deviation of the input data must first be calculated. The transformation is expressed in equation (14):

$x_{new} = \frac{x_{old} - M}{std}$  (14)

where $x_{new}$ is the new value, $x_{old}$ is the original value, and $M$ and $std$ are the mean and standard deviation of the original data range.
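Equation (14) translates to a one-line NumPy function:

```python
import numpy as np

def zscore_normalize(x):
    # x_new = (x_old - M) / std  -- equation (14)
    return (x - np.mean(x)) / np.std(x)
```

After the transform the data has zero mean and unit standard deviation, which puts all input features on a comparable scale before training.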

Methodology
The methodology involves the collection and selection of data over a period of 12 to 24 months from a Mobile Switching Centre of an established cellular network in Nigeria, to be fed into the system. The data used consist of key performance indicator (KPI) records. Since the performance of the various SVM kernels is highly influenced by the size of the datasets and by the data pre-processing techniques employed for training and optimization, this work applied Z-score normalization to the input features and sigmoidal normalization to the output, to promote high quality, reliability and accuracy while keeping the computational cost of the learning phase low.
In this study, six kernels were introduced to obtain higher classification accuracy by varying the training ratio: the Radial Basis Function (RBF), Gaussian Radial Basis Function (GRBF), Exponential Radial Basis Function (ERBF), Multilayer Perceptron (MLP), Linear (LK) and Polynomial kernels. We evaluate the performance of the proposed SVM model with each of the six kernels on a data sample of 506 x 9 matrices, selecting and comparing how the different kernel functions contribute to the solution of the proposed model. In this scenario, the parameter $\varepsilon$ controls the width of the $\varepsilon$-insensitive zone used to fit the training samples: the bigger $\varepsilon$, the fewer support vectors (SVs) are selected, and bigger $\varepsilon$-values also result in 'flatter' estimates. Hence, both $C$ and $\varepsilon$ affect model complexity in different ways.
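Since the KPI dataset itself is not reproduced here, only the evaluation scaffolding can be sketched: partitioning the 506 samples by a given training ratio (e.g. 60-20-20) and scoring a kernel's predictions by sensitivity and specificity. Function names are illustrative.

```python
import numpy as np

def split_by_ratio(n, train, val, test, seed=0):
    # shuffle the n sample indices and partition them in the given
    # percentages, e.g. (60, 20, 20) as in the first trial
    assert train + val + test == 100
    idx = np.random.default_rng(seed).permutation(n)
    n_tr = n * train // 100
    n_va = n * val // 100
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def sensitivity_specificity(y_true, y_pred):
    # sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)
```

Each trial below amounts to calling `split_by_ratio` with one of the three ratios, training each kernel's model on the first partition, and reporting `sensitivity_specificity` (plus accuracy and timing) on the held-out partitions.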

Results and Discussions
Training Ratio 60-20-20
The result of the first trial, with a training ratio of 60-20-20, is shown in Table 1, and the corresponding graphical performances of the six kernels are depicted in Figures 1 to 6. It was deduced that fair performance was recorded for RBF, while ERBF and GRBF outperformed MLP, Linear and Polynomial, which performed poorly. The computational times and convergence speeds are relatively high, except for the MLP and linear kernels.

Training Ratio 40-30-30
The result of the second trial, with a training ratio of 40-30-30, is shown in Table 2, and the corresponding graphical performances of the six kernels are depicted in Figures 7 to 12. It was deduced that fair performance was recorded for RBF, while ERBF and GRBF outperformed MLP, Linear and Polynomial, which performed poorly. The computational times and convergence speeds are moderately low, except for the MLP and linear kernels, which were faster.

Training Ratio 20-40-40
The result of the third trial, with a training ratio of 20-40-40, is shown in Table 3, and the corresponding graphical performances of the six kernels are depicted in Figures 13 to 18. It was deduced that fair and best performance were recorded for the RBF and Linear kernels respectively, while ERBF and GRBF outperformed MLP and Polynomial, which performed poorly. The computational times and convergence speeds are very high and encouraging.
In general, the findings yield practical recommendations for setting the SVM regression parameters and underscore the importance of selecting an appropriate value of the $\varepsilon$-insensitive zone parameter for generalization performance. It is worth noting that SVM generalization performance (estimation accuracy) depends heavily on a good setting of the meta-parameters. The constant $C$ determines the trade-off between model complexity (flatness) and the degree to which deviations larger than $\varepsilon$ are tolerated in the optimization formulation. Comparing the different kernel functions across the varied training ratios with fixed kernel parameters, it is deduced that ERBF and GRBF give the best classification, with good specificity and sensitivity values.

Conclusions
The proposed parameter selection in this study yields good generalization performance of SVM estimates under different kernel functions and data sample sizes. In total, this study contributes to the body of knowledge on call admission control by examining the potential application of SVMs with six kernel functions. From the study, the SVM model with the GRBF and ERBF kernels is best suited to the call admission data at hand in terms of specificity and sensitivity values, followed by the RBF kernel. The research further indicates that the MLP, polynomial and linear kernels perform worst.