Comparing Two Classical Methods of Detecting Multicollinearity in Financial and Economic Time Series Data

Multicollinearity is an unavoidable problem faced by researchers working with financial and economic data. It refers to a situation where the correlations between two or more independent variables are high; that is, one explanatory variable can be used to predict another. This creates redundant information in the series under study and distorts the results of regression models. There is therefore a need to identify the source of the problem and to proffer a solution to it in economic and financial data. The data used were extracted from the records of the Federal Trade Commission (FTC), 2019. The commission annually ranks varieties of domestic cigarettes according to their tar, nicotine and carbon monoxide contents. The Farrar-Glauber test and the variance inflation factor (VIF) were used as methods of detecting multicollinearity in this paper. The SPSS and JMulTi packages were used to analyse the data collected for empirical illustration. The results of the analysis indicated that the variance inflation factors of $X_1$ and $X_2$ (Tar and Nicotine) are far above 10 (21.63 and 21.90), so these variables must be removed or collapsed from the model in order to correct the multicollinearity. The precision of the VIF therefore makes it preferable to the Farrar-Glauber test, since the VIF not only detected the problem but also pointed to its source.


Introduction
Multicollinearity refers to the circumstance where two or more independent variables in a statistical model are linearly related; it is sometimes called collinearity [1]. It is an important econometric problem that has received considerable attention globally, but efforts to resolve it have not yielded the desired result. Recently, authors such as [18,15,8,2,10,11] have researched this econometric problem and established the danger it poses to the forecasting ability of regression models. It is also regarded as an economic problem that can lead to errors of judgment and to poor economic policy formulation. In financial time series the error is assumed to be independent and identically distributed, whereas in real-life situations this is often not the case.
Multicollinearity among predictor variables has been addressed repeatedly in econometric theory and in econometric texts (for example, [6,7,19]). [9] examines how collinearity produces instability in parameter estimates in a measurement error situation. Many statistical models, notably those commonly used in ecology, finance, marine science and economics, are liable to collinearity [3,4,17]. This occurs when too many variables have been pulled together in the model and a number of them measure similar phenomena. The existence of multicollinearity among the variables under study affects the estimation of the model parameters and also gives rise to wrong interpretation of the results. The regression parameter estimates so obtained are compromised and may be unstable, the estimated standard errors are greatly inflated, and as a result inferences based on these statistics are unreliable and lead to wrong policy formulation. For models that are not robust enough, two problems are bound to arise under multicollinearity: the effects of individual variables cannot be separated, and extrapolation or out-of-sample forecasts are likely to be seriously erroneous and to lead to wrong decisions [12].
Most introductory textbooks on statistics recognize multicollinearity as a problem principally associated with financial and economic data. It is regarded as a situation where the model is not identified. Serious as it is, several approaches for investigating it and working with it have been mapped out. Regardless of the peculiarities of the problem and the several methods available for solving it, most ecological, financial and economic research has not made efforts to address this ubiquitous problem [5,16]. This failure is directly linked to several factors: the erroneous belief that statistical methods are not affected by multicollinearity; ambiguity about which method to use, coupled with the incompatibility of a method with the available data; and the inability to interpret the results because the approaches used incorporate variables or software that cannot be accessed. The problem is not limited to ecology, finance and economics [10,13,14].
The central objective of this paper is to provide a better understanding of multicollinearity, to compare two methods of detecting its presence (the Farrar-Glauber test and the variance inflation factor), and to determine the better one.

The Farrar and Glauber Test
This is a test to determine the presence as well as the degree of multicollinearity in an equation. To achieve this objective, a matrix of pairwise correlation coefficients is formed from the explanatory variables.
This test is performed in three stages:
i. A chi-square test to ascertain the existence and degree of multicollinearity.
ii. An F-test to locate the variable(s) that are inter-correlated, provided the chi-square test is positive.
iii. A t-test to determine the variable(s) causing the multicollinearity problem, provided the F-test is positive.

Chi-Square Test
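The test statistic is computed from the determinant of the correlation matrix of the explanatory variables, where K is the number of explanatory variables present in the series.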
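As a hedged illustration, the chi-square stage can be sketched in Python using the form of the statistic commonly given in econometrics texts, $\chi^2 = -\left[n - 1 - \tfrac{1}{6}(2K + 5)\right]\ln|R|$ with $K(K-1)/2$ degrees of freedom, where $n$ is the number of observations and $|R|$ is the determinant of the correlation matrix. This form is assumed here rather than taken from the paper, and the function names and the small data set are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def farrar_glauber_chi_square(columns):
    """Return (statistic, degrees of freedom) for the assumed chi-square form."""
    n = len(columns[0])
    k = len(columns)
    det_r = determinant(correlation_matrix(columns))
    dof = k * (k - 1) // 2
    if det_r <= 0.0:
        # Perfect collinearity: |R| = 0, so the statistic is unbounded.
        return math.inf, dof
    statistic = -(n - 1 - (2 * k + 5) / 6) * math.log(det_r)
    return statistic, dof

# Illustrative data only (not the FTC cigarette data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
statistic, dof = farrar_glauber_chi_square([x1, x2])
```

The statistic is then compared with the chi-square critical value for the returned degrees of freedom. A determinant close to zero gives a large statistic and points to multicollinearity.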

F-Test
If the chi-square test confirms the presence of multicollinearity, we proceed to the F-test using the following steps: i. Express each $x_i$ considered to be inter-correlated with the other $x$s as a function of those $x$s. Using the data, this auxiliary regression can then be estimated.
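A minimal sketch of the F stage follows, assuming the usual textbook form in which the coefficient of determination $R_i^2$ from regressing $x_i$ on the other explanatory variables gives $F_i = \dfrac{R_i^2/(k-1)}{(1-R_i^2)/(n-k)}$ with $(k-1,\ n-k)$ degrees of freedom. The function name and the input values are hypothetical and are not taken from the paper.

```python
def farrar_glauber_f(r_squared, n, k):
    """F statistic for one auxiliary regression (assumed textbook form).

    r_squared: R^2 from regressing x_i on the remaining explanatory variables
    n: number of observations
    k: number of explanatory variables
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return (r_squared / (k - 1)) / ((1.0 - r_squared) / (n - k))

# Illustrative values only: an auxiliary R^2 of 0.5 with n = 25 and k = 3.
f_value = farrar_glauber_f(0.5, n=25, k=3)
```

A computed F above the tabulated value at $(k-1,\ n-k)$ degrees of freedom indicates that $x_i$ is inter-correlated with the other explanatory variables.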

Variance Inflation Factor (VIF)
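A consequence of multicollinearity is an inflated variance of the estimated coefficients, which is measured by the variance inflation factor. For the jth independent variable, the variance inflation factor is given as $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination obtained by regressing the jth independent variable on the remaining independent variables.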
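The sketch below computes variance inflation factors from raw data. It relies on a standard identity: the jth diagonal element of the inverse correlation matrix equals $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing the jth variable on the others. The function names and the data are hypothetical; this is an illustration, not the SPSS routine used in the study.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Illustrative data only (not the FTC cigarette data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = variance_inflation_factors([x1, x2])
```

With only two predictors, both VIF values equal $1/(1 - r^2)$, where $r$ is their pairwise correlation.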

Specification and Analysis of Data Used
The data used for this study were obtained from the Federal Trade Commission (FTC), 2018, which annually ranks varieties of domestic cigarettes according to their tar, nicotine and carbon monoxide contents. Table 1 presents descriptive statistics for the data used in the study, including the mean, variance, standard error, kurtosis and skewness. This information is intended to inform would-be policy makers, investors and academics about the properties of the data. For example, Tar and carbon monoxide can both be regarded as approximately normal since their kurtosis is less than 3, whereas Nicotine is non-normal (kurtosis greater than 3) as it possesses heavier tails than the normal distribution. The skewness analysis shows that the data are moderately skewed. Critical evaluation of the ACF and PACF reveals that each covers 16 lags and decays slowly and exponentially. These features could be linked to the presence of multicollinearity or long memory in the series under study.
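For readers who want to reproduce the skewness and kurtosis comparison, the following is a minimal sketch. It assumes the simple moment definitions, under which a normal distribution has kurtosis 3. Statistical packages often report bias-adjusted or excess kurtosis instead, so their values may differ. The function name and the data are hypothetical.

```python
def moment_skewness_kurtosis(values):
    """Return (skewness, kurtosis) using population moment definitions.

    Kurtosis is NOT excess kurtosis: a normal distribution gives 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Illustrative data only (not the FTC cigarette data).
skewness, kurtosis = moment_skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
```

A kurtosis above 3 under this definition indicates heavier tails than the normal distribution, which is the criterion applied to Nicotine above.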

F-Test
The next step is to carry out the F-test to determine the variable(s) causing multicollinearity. From the auxiliary regression of $X_1$ on $X_2$ and $X_3$, with coefficients $\beta_2$ and $\beta_3$, the computed F is 206.52 and the tabulated value is $F_{0.05}(2, 22) = 4.30$. Since the computed F is greater than the tabulated F, we reject $H_0$ and conclude that $X_1$ is inter-correlated with $X_2$ and $X_3$.

t-Test
Hypotheses were set up for the t-test. Since the computed value of t is greater than the table value of t, we conclude that $X_2$ and $X_3$ are responsible for the multicollinearity.
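The t stage is usually described in terms of partial correlation coefficients. The sketch below assumes three explanatory variables, the first-order partial correlation $r_{ij.k} = \dfrac{r_{ij} - r_{ik} r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}}$, and the textbook statistic $t = \dfrac{r_{ij.k}\sqrt{n-k}}{\sqrt{1 - r_{ij.k}^2}}$ with $n-k$ degrees of freedom. Both formulas are assumptions about the intended procedure, and the names and numbers are hypothetical.

```python
import math

def partial_correlation(r_ij, r_ik, r_jk):
    """First-order partial correlation of x_i and x_j, holding x_k fixed."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

def farrar_glauber_t(partial_r, n, k):
    """t statistic for a partial correlation (assumed textbook form).

    n: number of observations
    k: number of explanatory variables
    """
    return partial_r * math.sqrt(n - k) / math.sqrt(1 - partial_r ** 2)

# Illustrative pairwise correlations only (not estimated from the FTC data).
r_partial = partial_correlation(0.5, 0.5, 0.5)
t_value = farrar_glauber_t(r_partial, n=25, k=3)
```

A computed t above the tabulated value suggests that the pair $x_i$, $x_j$ contributes to the multicollinearity.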

Variance Inflation Factor
The following VIF values were obtained from the computer analysis. Having established the existence of multicollinearity, as shown in the table above, the next step is to correct it. To do so, variable $X_2$ (Nicotine) was removed and the VIF values were recomputed. Table 5 shows the values of $\beta_0$, $\beta_1$ and $\beta_3$ with their VIF values. The VIF values for $X_1$ and $X_3$ indicate that the multicollinearity has vanished, as none of them is up to 10. Table 6 brings together and compares the results obtained before and after variable $X_2$ was excluded, together with the parameter estimates and standard errors. It is clear that after the exclusion of $X_2$ the model becomes free of multicollinearity, which fulfils the aim of the study.
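The decision rule applied here can be sketched as follows, using only the two VIF values reported in the abstract (21.63 for Tar and 21.90 for Nicotine) and the conventional cut-off of 10. The helper names are hypothetical. The implied auxiliary $R^2$ follows from inverting $\mathrm{VIF} = 1/(1 - R^2)$.

```python
def implied_r_squared(vif):
    """Auxiliary R^2 implied by a VIF, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif

def flag_collinear(vifs, threshold=10.0):
    """Names of variables whose VIF exceeds the rule-of-thumb threshold."""
    return [name for name, value in vifs.items() if value > threshold]

# VIF values as reported in the paper for the full model.
reported = {"X1 (Tar)": 21.63, "X2 (Nicotine)": 21.90}
flagged = flag_collinear(reported)
r2_tar = implied_r_squared(reported["X1 (Tar)"])
```

Both variables are flagged, and a VIF of about 21.6 corresponds to an auxiliary $R^2$ of roughly 0.95, meaning about 95% of the variation in Tar is explained by the other explanatory variables.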

Summary and Conclusion
The study examined the descriptive nature of the series. Both the ACF and the PACF decay exponentially, suggesting that the series contains an element of multicollinearity or long memory. The Farrar-Glauber test and the variance inflation factor both confirmed the existence of multicollinearity. Having established this, variable $X_2$ was excluded and the test was conducted again, and the analysis indicated that the multicollinearity noticed earlier had disappeared. The precision of the VIF therefore makes it preferable to the Farrar-Glauber test, since the VIF not only detected the problem but also pointed to its source.