Selection of Stocks on the Ghana Stock Exchange Using Principal Component Analysis
Abonongo John*, Oduro F. T., Ackora-Prah J.
College of Science, Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
To cite this article:
Abonongo John, Oduro F. T., Ackora-Prah J. Selection of Stocks on the Ghana Stock Exchange Using Principal Component Analysis. International Journal of Theoretical and Applied Mathematics. Vol. 2, No. 2, 2016, pp. 100-109. doi: 10.11648/j.ijtam.20160202.21
Received: July 19, 2016; Accepted: September 12, 2016; Published: December 10, 2016
Abstract: A major problem in stock selection is choosing the right procedure(s) for identifying the best stock(s). Principal component analysis was employed as a data reduction technique to select the stock(s) that characterize each sector on the Ghana Stock Exchange. The results indicated that, among the 9 stocks in the Finance sector, only 3 stocks (CAL, ETI and GCB) were able to characterize the sector. The Distribution sector had 2 stocks (PBC and TOTAL) among the 4 stocks characterizing the sector. The Food and Beverage sector had only FML characterizing the sector out of the 3 stocks. Also, the Information Technology sector had CLYD characterizing the sector out of the 2 stocks. The Insurance sector had EGL characterizing the sector out of the 2 stocks. The Manufacturing sector had only 2 stocks (PZC and UNIL) characterizing the sector out of the 10 stocks, and for the Mining sector, 2 stocks (TLW and AGA) among the 4 stocks were the best. In effect, the 34 stocks considered from the Ghana Stock Exchange were reduced to 12 stocks (CAL, ETI, GCB, PBC, TOTAL, FML, CLYD, EGL, PZC, UNIL, TLW and AGA). The results also indicated that the selected stocks explained more of the variance in their respective sectors than the rest of the stocks in the same sector, and thus could be considered for further analysis and possibly investment.
Keywords: Principal Component Analysis, Stock Selection, Scree Plot, Uncertainty
1. Introduction
Investing on the stock market comes with high risks and high gains; hence, it attracts a great number of investors. Information regarding stocks is often complex and highly uncertain, making it difficult to select attractive stocks. Even though selecting attractive stocks is not easy for investors, Principal Component Analysis (PCA) can guide an investor in telling attractive stocks from unattractive ones. PCA is well suited to studying the covariance structure of a vector time series. It is appropriate when one has obtained measures on a number of observed variables and wishes to develop a smaller number of artificial variables that will account for most of the variance in the observed variables; that is, it is a variable reduction procedure.
Principal component analysis has been used extensively in many studies. In (e.g., ), the joint structure was described with a model that can potentially be used for scenario estimation and for analysing the risk of interest rate-sensitive portfolios. Three variations of the principal component analysis technique were examined to decompose global interest rate and yield curve implied volatility structure, highlighting that global yield curve structure can be explained with 15 to 20 factors, whereas implied volatility structure needs at least 20 global factors. Furthermore, (e.g., ) used principal component analysis in the granting of loans. The results showed the utility of principal component analysis in the banking sector for decreasing the size of data without much loss of information. In (e.g., ), a selection of optimal SNP sets that capture intragenic genetic variation was performed. The results revealed that principal component analysis may be a strong tool for establishing an optimal SNP set that maximizes the amount of genetic variation captured for a candidate gene using a minimal number of SNPs. Also, (e.g., ) used principal component analysis to investigate the structure of the light curves of RRab stars and concluded that principal component analysis was an effective way to account for many aspects of RRab stars.
Again, the decomposition of interrelated variables into uncorrelated components makes PCA convenient for analyzing the complex structure of financial markets. It has been applied to the study of market cross-correlation and systemic risk measurement (e.g., ) and to the production of market indices (e.g., ). A further study (e.g., ) also used the principal component analysis technique to reduce 19 stocks to 9 stocks on the Nigerian stock exchange. The main task of feature extraction is to select or combine the features that preserve most of the information and remove the redundant components, in order to improve the efficiency of the subsequent classifiers without degrading their performance. The result exhibited the merit of principal component analysis in quantifying the importance of each dimension for describing the variability of a data set. In (e.g., ), the use of principal component analysis was further supported for identifying the most essential factors and, in the process, considerably reducing the number of input variables to an efficient and sufficient set. Also, (e.g., ) applied principal component analysis to daily observations on stock market indexes, long-term and short-term interest rates, and spot exchange rates for nine countries, and (e.g., ) showed that principal component analysis may be used to reduce the effective dimensionality of the scenario specification problem in several cases. In (e.g., ), principal component analysis was applied to the Korean composite stock price index (KOSPI) and the Hang Seng Index (HSI) to reduce the data points into two components, and it was observed that co-moving stocks cluster.
Moreover, the eigenvalue-one criterion (e.g., ) is an approach for retaining and interpreting any component with an eigenvalue greater than one (1). The rationale is that, when the analysis is based on the correlation matrix, each observed variable contributes one unit of variance to the total variance in the data set. Hence, any component with an eigenvalue greater than one (1) accounts for more variance than was contributed by one variable, whereas a component with an eigenvalue less than one (1) accounts for less variance than was contributed by one variable. This criterion is useful because it often retains the correct number of components, especially when a small number of variables are being analyzed and the variable communalities are high. In (e.g., ), the accuracy of the eigenvalue-one criterion was investigated and its use was recommended when fewer than 30 variables are being considered and communalities are greater than 0.70, or when the analysis is based on over 250 observations and the mean communality is greater than or equal to 0.60. Components can also be selected using the scree test. With the scree test, the eigenvalues are plotted against their associated components, and one looks for a break between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained, whereas those appearing after the break are assumed to be trivial and are not retained.
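The eigenvalue-one criterion described above can be illustrated with a minimal sketch. The function name `kaiser_retained` and the example eigenvalues are hypothetical and are not taken from the data analysed in this paper.

```python
import numpy as np

def kaiser_retained(eigenvalues):
    """Return the 1-based indices of components whose eigenvalue exceeds one."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return [i + 1 for i, lam in enumerate(eigenvalues) if lam > 1.0]

# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4,
# because each standardized variable contributes one unit of variance).
example_eigenvalues = [2.1, 1.2, 0.5, 0.2]
retained = kaiser_retained(example_eigenvalues)
print(retained)
```

In this illustrative case only the first two components account for more variance than a single variable, so only they would be retained.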
The purpose of this paper is to apply the principal component analysis in selecting attractive stocks from seven sectors on the Ghana Stock Exchange. This is to provide investors with a simple technique in selecting winning stocks for investments.
2. Materials and Methods
2.1. Source of Data and Methods of Data Analysis
This paper used secondary data of 34 stocks from the Ghana Stock Exchange (GSE) and Annual Report Ghana databases comprising the daily closing prices from the period 02/01/2004 to 16/01/2015.
The daily index series were converted into continuously compounded returns given by

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$

where $r_t$ is the continuous compound return at time $t$, $P_t$ is the current closing stock price index at time $t$ and $P_{t-1}$ is the previous closing stock price index.
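As an illustration of the return calculation, the following sketch computes continuously compounded returns from a short price series. The function name `log_returns` and the prices are hypothetical; NumPy is assumed to be available.

```python
import numpy as np

def log_returns(prices):
    """Continuously compounded returns: ln(P_t / P_{t-1}) for consecutive prices."""
    prices = np.asarray(prices, dtype=float)
    if prices.ndim != 1 or prices.size < 2:
        raise ValueError("need a one-dimensional series with at least two prices")
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    # Differencing the log prices is the same as taking the log of the price ratio.
    return np.diff(np.log(prices))

closing = [1.00, 1.10, 1.10, 0.99]  # hypothetical daily closing prices
r = log_returns(closing)
print(r)
```

A day with an unchanged closing price yields a return of exactly zero, and the returns sum to the log of the ratio of the last price to the first.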
2.2. Stationarity Test: PP and KPSS Tests
This paper employed two quantitative unit root tests, namely the Phillips-Perron (PP) unit root test and the Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test, in order to establish the existence or non-existence of a unit root in the time series under study, so as to ascertain the nature of the process that produces the time series.
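The KPSS test was used to test the null hypothesis that the data generating process is stationary, Ho: I(0), against the alternative that it is non-stationary, H1: I(1). It assumes that there is no linear trend term and is given by

$$y_t = x_t + z_t$$

where $x_t$ is a random walk, $x_t = x_{t-1} + v_t$, $v_t \sim \mathrm{iid}(0, \sigma_v^2)$, and $z_t$ is a white noise series. The previous pair of hypotheses is equivalent to

$$H_0: \sigma_v^2 = 0 \quad \text{against} \quad H_1: \sigma_v^2 > 0.$$

If $H_0$ is true, the model becomes $y_t = x_0 + z_t$, hence $y_t$ is stationary. The test statistic is given by

$$\mathrm{KPSS} = \frac{1}{T^2} \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}_\infty^2}$$

where $T$ is the number of observations, $S_t = \sum_{j=1}^{t} (y_j - \bar{y})$ is the partial sum of the deviations of $y_t$ from its sample mean, and $\hat{\sigma}_\infty^2$ is an estimator of the long-run variance of the process $z_t$.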
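The KPSS statistic for the no-trend case can be sketched as follows. This is a minimal illustration and not necessarily the implementation used for the results reported here. It assumes a Bartlett-kernel (Newey-West) estimator of the long-run variance; the function name `kpss_level_statistic`, the `lags` parameter and the simulated series are hypothetical.

```python
import numpy as np

def kpss_level_statistic(y, lags=0):
    """KPSS statistic for level stationarity (no linear trend term)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    e = y - y.mean()      # residuals from a regression of y on a constant
    S = np.cumsum(e)      # partial sums S_t of the residuals
    # Long-run variance: sample variance plus Bartlett-weighted autocovariances.
    lrv = np.dot(e, e) / T
    for j in range(1, lags + 1):
        gamma_j = np.dot(e[j:], e[:-j]) / T
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return np.sum(S ** 2) / (T ** 2 * lrv)

# Simulated illustration: white noise (stationary) versus a random walk.
rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)
walk = np.cumsum(noise)
stat_noise = kpss_level_statistic(noise, lags=8)
stat_walk = kpss_level_statistic(walk, lags=8)
print(stat_noise, stat_walk)
```

A large statistic is evidence against stationarity, so the random walk is expected to give a much larger value than the white noise series. The commonly tabulated 5% critical value for the level-stationary case is about 0.463.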
The PP test considers the hypotheses:
Ho: unit root against
H1: stationary about deterministic trend
Under the Ho of $\rho = 0$, the PP $Z_\rho$ and $Z_\tau$ statistics have the same asymptotic distributions as the ADF normalized bias statistic and t-statistic, respectively. The PP test comprises two statistics, known as the Phillips $Z_\rho$ and $Z_\tau$ tests, given by;
, for , then is a maximum likelihood estimate of the error terms while is the covariance between the error terms j-periods apart for .
, when there exists no autocorrelation between the error terms, for , then .
2.3. Principal Component Analysis
This study employed this method to select the stock(s) that characterize each sector. It is a mathematical method that transforms a number of correlated variables into a smaller number of uncorrelated variables known as principal components. The first principal component accounts for as much of the variance in the series (data) as possible, and each succeeding component accounts for as much of the remaining variance as possible. It is an eigenvector/eigenvalue based approach employed in the dimensionality reduction of multivariate data. It assists in finding patterns in data and in expressing the data in a manner that highlights their differences and similarities.
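Given an n-dimensional random variable $x = (x_1, \ldots, x_n)'$ with covariance matrix $\Sigma_x$, a few linear combinations of the $x_i$ can explain the structure of $\Sigma_x$. If $x$ is the vector of monthly log returns of $n$ assets, then Principal Component Analysis (PCA) can be used to study the sources of variation of these $n$ asset returns. PCA can be applied either to the covariance matrix $\Sigma_x$ or to the correlation matrix $\rho_x$ of $x$. The correlation matrix is the covariance matrix of the standardized random vector $x^* = S^{-1}x$, where $S$ is the diagonal matrix of the standard deviations of the components of $x$. Using the covariance matrix, if $w_i = (w_{i1}, \ldots, w_{in})'$, where $i = 1, \ldots, n$, then

$$y_i = w_i' x = \sum_{j=1}^{n} w_{ij} x_j$$

is a linear combination of the random vector $x$. If $x$ consists of the returns of $n$ stocks, then $y_i$ is the return of a portfolio that assigns weight $w_{ij}$ to the $j$th stock. By standardizing the vector $w_i$, we get $w_i' w_i = \sum_{j=1}^{n} w_{ij}^2 = 1$. From the properties of linear combinations of random variables,

$$\mathrm{Var}(y_i) = w_i' \Sigma_x w_i, \qquad \mathrm{Cov}(y_i, y_j) = w_i' \Sigma_x w_j, \qquad i, j = 1, \ldots, n.$$

PCA determines linear combinations $w_i$ such that $y_i$ and $y_j$ are uncorrelated for $i \neq j$ and the variances of the $y_i$ are as large as possible.

The first principal component of $x$ is the linear combination $y_1 = w_1' x$ that maximizes $\mathrm{Var}(y_1)$ subject to the constraint $w_1' w_1 = 1$. The second principal component of $x$ is the linear combination $y_2 = w_2' x$ that maximizes $\mathrm{Var}(y_2)$ subject to the constraints $w_2' w_2 = 1$ and $\mathrm{Cov}(y_2, y_1) = 0$. The $i$th principal component of $x$ is the linear combination $y_i = w_i' x$ that maximizes $\mathrm{Var}(y_i)$ subject to the constraints $w_i' w_i = 1$ and $\mathrm{Cov}(y_i, y_j) = 0$ for $j = 1, \ldots, i-1$. Since the covariance matrix $\Sigma_x$ is non-negative definite, it has a spectral decomposition.

If $(\lambda_1, e_1), \ldots, (\lambda_n, e_n)$ are the eigenvalue-eigenvector pairs of $\Sigma_x$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$, then the $i$th principal component of $x$ is given by

$$y_i = e_i' x = \sum_{j=1}^{n} e_{ij} x_j, \qquad i = 1, \ldots, n.$$

If some eigenvalues $\lambda_i$ are equal, the choice of the corresponding eigenvectors $e_i$, and hence of $y_i$, is not unique. In addition, we have

$$\sum_{i=1}^{n} \mathrm{Var}(x_i) = \mathrm{tr}(\Sigma_x) = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \mathrm{Var}(y_i).$$

Thus, the proportion of the total variance in $x$ explained by the $i$th principal component is simply the ratio between the $i$th eigenvalue and the sum of all eigenvalues of $\Sigma_x$, that is, $\lambda_i / (\lambda_1 + \cdots + \lambda_n)$. Since $\mathrm{tr}(\rho_x) = n$, the proportion of variance explained by the $i$th principal component becomes $\lambda_i / n$ when the correlation matrix is used to perform the PCA. A further result of the PCA is that a zero eigenvalue of $\Sigma_x$ or $\rho_x$ indicates the existence of an exact linear relationship between the components of $x$. If the smallest eigenvalue $\lambda_n = 0$, then $\mathrm{Var}(y_n) = 0$. Hence, $y_n = \sum_{j=1}^{n} e_{nj} x_j$ is a constant and there are only $n - 1$ random quantities in $x$; therefore the dimension of $x$ can be reduced.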
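The eigen-decomposition described in this section can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the software procedure used to obtain the results in this paper: the function name `pca_eigen`, the simulated returns and the random seed are all hypothetical.

```python
import numpy as np

def pca_eigen(returns, use_correlation=True):
    """PCA of a (T, n) return matrix via its correlation or covariance matrix.

    Returns the eigenvalues (descending), the eigenvectors as columns in the
    same order, and the proportion of total variance explained by each component.
    """
    X = np.asarray(returns, dtype=float)
    if use_correlation:
        M = np.corrcoef(X, rowvar=False)
    else:
        M = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, explained

# Simulated returns: two assets share a common factor, the third is independent.
rng = np.random.default_rng(42)
common = rng.standard_normal(500)
demo_returns = np.column_stack([
    common + 0.3 * rng.standard_normal(500),
    common + 0.3 * rng.standard_normal(500),
    rng.standard_normal(500),
])
eigvals, eigvecs, explained = pca_eigen(demo_returns)
print(eigvals, explained)

# An exact linear relationship between components gives a (numerically) zero eigenvalue.
augmented = np.column_stack([demo_returns, demo_returns[:, 0] + demo_returns[:, 1]])
aug_eigvals, _, _ = pca_eigen(augmented, use_correlation=False)
print(aug_eigvals[-1])
```

With the correlation matrix, the eigenvalues sum to the number of variables, so each proportion of explained variance is the eigenvalue divided by n, as stated above.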
3. Results and Discussion
3.1. Descriptive Statistics
From Table 1, it is evident that seven of the mean returns in the Finance sector were positive, ranging from 0.0006 to 0.0022, and two were negative (-0.0006 to -0.0003). Volatility (standard deviation) was highest in ETI (0.0646) and lowest in HFC (0.0124). The highest and least mean returns were found in ETI and TBL respectively. The variability of risk relative to returns, measured by the coefficient of variation (CV%), ranged from -7144.1700 (SOGEGH) to 7749.5900 (ETI). Also, five return series were positively skewed (4.6600 to 28.3400) and the remaining four were negatively skewed (-20.8100 to -0.1700), and the kurtosis was high, ranging from 108.5460 to 850.2200. The Distribution sector had three positive mean returns (0.0001 to 0.0017), the exception being PBC (-0.0019). MLC and PBC had the highest and least mean returns respectively. The sector had its highest volatility in MLC (0.0582) and its lowest in GOIL (0.0210). The sector's CV% ranged from 230.6400 (PBC) to 16906.9400 (GOIL). Two return series were positively skewed (1.9100 to 9.4500) and the other two were negatively skewed (-13.9600 to -1.0800). The kurtosis was high, ranging from 132.8100 to 363.0600.
The Food and Beverage sector had two positive mean returns, ranging from 0.0008 to 0.0012, the exception being CPC (-0.0005). FML and CPC had the highest and least mean returns respectively. The sector exhibited high volatility in CPC (0.0458), whereas GGBL (0.0155) exhibited low volatility. The CV% ranged from -14476.9400 (CPC) to 1953.0700 (GGBL). Also, two of the three return series were negatively skewed (-3.5600 to -0.0300) and the kurtosis was high, ranging from 11.0700 to 71.0900. The Information Communication Technology sector had two negative mean returns, ranging from -0.0002 to -0.0001. The sector recorded higher volatility in TRANSOL (0.0352) and lower volatility in CLYD (0.0260). The sector had CV% ranging from -95087.0400 (TRANSOL) to -24856.2300 (CLYD). Both return series in this sector were positively skewed. The kurtosis ranged from 32.8200 to 79.8900. Also, the Insurance sector had two positive mean returns (0.0002 to 0.0010). Volatility was higher in EGL (0.0380) than in SIC (0.0304). The sector had CV% ranging from 3159.3300 (SIC) to 16299.1900 (EGL). The sector exhibited negative skewness in EGL (-16.8800) and positive skewness in SIC (24.3700). Also, the sector had kurtosis ranging from 347.1800 to 692.5800.
The ten stocks in the Manufacturing sector had five positive mean returns, ranging from 0.0001 to 0.0009, and five negative mean returns, ranging from -0.0014 to -0.0001, with the highest mean return found in UNIL and the least in ALW. Volatility was high in ALW (0.0445) compared with PKL (0.0038). The sector was found to have CV% ranging from -51723.4200 (SPL) to 65846.8300 (PZC). Of the ten stocks, six were positively skewed, ranging from 0.2800 to 6.9500, whereas the remaining four were negatively skewed, ranging from -15.2300 to -0.5800. The kurtosis ranged from 35.9900 to 390.0100. The Mining sector had all its stocks recording positive mean returns, ranging from 0.0011 to 0.0018. Volatility was high in GSR (0.0609) and low in AADs (0.0263). The sector had a coefficient of variation (CV%) ranging from 2441.7100 (AADs) to 3341.0500 (GSR). The skewness was positive throughout, ranging from 28.7400 to 29.8000. This sector had kurtosis ranging from 866.0000 to 909.0400.
Furthermore, the highest mean return for the period under study was found in EBG (0.0022) and the least mean return in PBC (-0.0019). Also, 24 of the stocks exhibited positive mean returns whereas 11 exhibited negative mean returns over the sample period. It is also evident that, over the sample period, volatility was highest in ETI (0.0646) from the Finance sector and lowest in PKL (0.0038) from the Manufacturing sector. The coefficient of variation for the entire sample period was highest in PZC (65846.8300) and lowest in TRANSOL (-95087.0400), from the Manufacturing sector and the Information Communication Technology sector respectively. The Manufacturing sector had six of its return series positively skewed (0.2800 to 6.9500) and four negatively skewed (-15.2300 to -0.5800).
Out of the 35 stocks, 22 had positively skewed returns as against 13 stocks with negatively skewed returns. The excess kurtosis values for all the sectors, and hence for all the stocks, were positive, indicating that all the return distributions were more peaked than the normal distribution. Also, over the entire sample period, the returns of GSR (909.0400) in the Mining sector were the most peaked, and those of CPC (11.0700) in the Food and Beverage sector the least peaked.
The results revealed that investors in the Finance sector saw gains in CAL, EBG, ETI, GCB, HFC, SCB and UTB, since their mean returns were positive, whereas investors in SOGEGH and TBL recorded losses (negative mean returns). Volatility (standard deviation) was high in ETI, CAL and EBG, an indication of their risk levels. There was a high probability of gains for investors in CAL, EBG, ETI, GCB and UTB, whereas there was a high probability of loss for investors in HFC, SCB, SOGEGH and TBL, because the two groups recorded positive and negative skewness respectively. The sector was seen to be volatile since all the excess kurtosis values were greater than three. The Distribution sector recorded more gains than losses; that is, the mean returns of GOIL, MLC and TOTAL were positive whereas that of PBC was negative, an indication of loss for investors. The mean return of MLC was commensurate with the risk taken by investors, since it recorded both the highest mean return and the highest standard deviation in the Distribution sector. The skewness of GOIL and TOTAL was negative, exposing investors in these two stocks to a high probability of loss, whereas investors in MLC and PBC had high chances of gains (positive skewness). These stocks also exhibited high volatility. Also, the Food and Beverage sector saw investors in FML and GGBL achieving gains, compared with CPC investors, who experienced losses during the same period. Investors in CPC were not compensated for assuming risk, since they made losses even though CPC recorded the highest volatility (standard deviation) in the sector. The results also indicated that investors in CPC had high chances of making losses. Investors in GGBL also had higher chances of making losses than gains. Investors in FML had higher chances of gains than losses, since it recorded a positive skewness. Investing in this sector was also volatile.
Investors in the Information Communication Technology sector saw the two stocks (CLYD and TRANSOL) making losses, even though both had higher chances of making gains than losses, since their skewness values were positive, and investors were compensated for the risk they assumed. The sector was also seen to be volatile. Again, investors in the Insurance sector saw gains, but there was a high probability of investors in EGL making losses, compared with investors in SIC, who had high chances of making gains. This sector was also seen to be volatile since all the excess kurtosis values were greater than three. The Manufacturing sector had investors in AYRTN, CMLT, PZC, UNIL and SWL making gains, compared with investors in ALW, SPL, PKL, GWEB and ACI, who recorded losses in the same period. It is also evident that investors who made losses in this sector were not compensated for the risk they bore, as their returns recorded high standard deviations. Also, there was a high probability of gains for investors in AYRTN, CMLT, SPL, UNIL, SWL and ACI, even though investors in SPL and ACI recorded losses. Even though the sector recorded as many losses as gains, there were higher chances of making gains than losses, as indicated by the signs of the skewness. Lastly, investors in the Mining sector made gains, and the Manufacturing and Mining sectors both saw investors having high chances of gains. Both sectors were volatile.
Moreover, it was clear that most of the sectors, and hence most of the stocks, recorded more gains than losses for investors, since most of them recorded positive mean returns. For the entire sample period, most of the stocks had positive skewness, indicating that the distribution of the returns was asymmetric, with an upper tail thicker than the lower tail, and that there was more chance of gains than of losses. The excess kurtosis for all the stocks was greater than three (3), meaning that the underlying distributions of the returns were leptokurtic and heavy tailed, and that extremely large deviations from the mean returns occurred more frequently than under a Gaussian distribution. This confirms that investors have been experiencing high levels of volatility on the GSE.
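The descriptive measures discussed above (mean, standard deviation, CV%, skewness and excess kurtosis) can be computed as in the following sketch. The estimator conventions are assumptions made for illustration: a sample standard deviation (divisor T-1), moment-based skewness and kurtosis (divisor T), and CV% defined as 100 times the standard deviation divided by the mean. The function name `describe_returns` and the input values are hypothetical.

```python
import numpy as np

def describe_returns(r):
    """Mean, standard deviation, CV%, skewness and excess kurtosis of a return series."""
    r = np.asarray(r, dtype=float)
    mean = r.mean()
    std = r.std(ddof=1)                     # sample standard deviation
    z = (r - mean) / r.std(ddof=0)          # standardized with the population std
    skewness = np.mean(z ** 3)              # moment-based skewness
    excess_kurtosis = np.mean(z ** 4) - 3.0 # zero for a normal distribution
    cv_percent = 100.0 * std / mean if mean != 0 else float("nan")
    return {
        "mean": mean,
        "std": std,
        "cv_percent": cv_percent,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }

stats = describe_returns([1.0, 2.0, 3.0])  # toy values, not stock returns
print(stats)
```

Because CV% divides by the mean, a mean return close to zero produces a very large CV% in absolute value, and a negative mean return produces a negative CV%.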
[Table 1: descriptive statistics of the return series, by sector]
3.2. Further Analysis
The return series were tested for stationarity using the PP and KPSS tests. As shown in Table 2, the p-values of the PP tests were significant at the 5% significance level, and therefore the null hypothesis of non-stationarity (a unit root) was rejected. In the case of the KPSS test, we failed to reject the null hypothesis of stationarity, since the test statistics were not significant at the 5% significance level. Therefore, both tests indicate that the return series were all stationary at the 5% level of significance.
[Table 2: PP and KPSS test results, by sector. Columns: Sector; PP test statistic; PP p-value; KPSS test statistic; KPSS critical value (5%)]
** Significance level: 5%
Figures 1 to 7 show the scree plots of the Finance, Distribution, Food and Beverage, Information Communication Technology, Insurance, Manufacturing and Mining sectors respectively. The results show that, for the Finance sector, there is a large break in eigenvalues between component 1 and component 2, whereas small breaks in eigenvalues start from component 3. The components before the small breaks are therefore retained. This indicates that components 1 and 2 have large eigenvalues compared with the rest of the components. For the Distribution sector, the breaks are all equal, but the last break, where the eigenvalues level off, is at component 3; hence the components before component 3, that is, component 1 and component 2, are retained. The scree plot for the Food and Beverage sector has a large break between component 1 and component 2; hence they were retained. Again, for the Information Communication Technology and Insurance sectors, only components 1 and 2 are retained, since the large break is between the two components. Also, for the Manufacturing sector, the large breaks are between component 1 and component 2 and between component 2 and component 3, but the small breaks in eigenvalues start at component 3; hence components 1 and 2 are retained. The Mining sector had component 1 and component 2 retained, since the large break existed between component 1 and component 2 and the eigenvalues level off from component 3.
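Reading a scree plot is a visual judgement, but the idea of retaining the components that appear before the eigenvalues level off can be sketched numerically. The rule below is a hypothetical heuristic introduced only for illustration: a drop between successive eigenvalues counts as "large" if it exceeds a fraction `tol` of the largest drop. The function names, the `tol` value and the example eigenvalues are assumptions, not values from this study.

```python
import numpy as np

def scree_drops(eigenvalues):
    """Drops between successive eigenvalues, after sorting in descending order."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return lam[:-1] - lam[1:]

def retained_by_scree(eigenvalues, tol=0.25):
    """Number of components before the eigenvalues level off (heuristic)."""
    n = np.asarray(eigenvalues).size
    drops = scree_drops(eigenvalues)
    if drops.size == 0 or drops.max() <= 0:
        return n  # no break at all: nothing to separate
    large = np.nonzero(drops > tol * drops.max())[0]
    # The last large drop lies between components k and k+1, so keep k components.
    return int(large[-1]) + 1

example = [3.0, 1.5, 0.5, 0.45, 0.4]  # hypothetical eigenvalues
n_keep = retained_by_scree(example)
print(scree_drops(example), n_keep)
```

In this example the drops after component 3 are small, so the first two components are kept, which mirrors the reading of "large breaks followed by levelling off" described above.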
Principal component analysis was employed to select the stocks that characterize each sector. For each sector, PCA was used to select the components that explain much of the variance in that sector. Using the eigenvalue-one criterion, component(s) with an eigenvalue greater than one (1) were retained. It is evident from Table 3 that components 1 and 2 were retained for most of the sectors. A loading threshold of 0.5 was used; that is, variable(s) with loadings greater than 0.5 in absolute value was/were selected. The Finance sector had ETI, GCB and CAL selected with loadings 0.670, 0.7576 and -0.6696 respectively. The Distribution sector had PBC and TOTAL selected with loadings 0.7391 and 0.6022 respectively. The Food and Beverage sector had FML selected from comp1 with loading 0.8835. The Information Communication Technology sector had CLYD selected in comp1 with loading -0.7071. The Insurance sector had EGL selected with loading 0.7071. The Manufacturing sector had PZC and UNIL selected in comp1 and comp2 with loadings 0.5932 and 0.6121 respectively. Also, the Mining sector had TLW and AGA selected with loadings -0.7076 and 0.7071 respectively. The results also indicate that the selected stocks are able to explain much of the variance in their respective sectors and hence could be considered for further analysis and possibly investment.
[Table 3: retained principal components and component loadings of the stocks, by sector]
* Selected stock under each component.
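The loading-based selection step can be sketched as follows. The stock names A, B and C and the loadings matrix are hypothetical, and the function `select_by_loading` is an illustrative assumption about how such a rule could be coded; it is not the procedure or output of the software used in this study.

```python
import numpy as np

def select_by_loading(names, loadings, n_components, threshold=0.5):
    """Select variables whose absolute loading on a retained component exceeds the threshold.

    `loadings` has one row per variable and one column per component,
    with columns ordered from the first component onwards.
    """
    L = np.asarray(loadings, dtype=float)[:, :n_components]
    selected = []
    for j in range(L.shape[1]):              # go through the retained components
        for i, name in enumerate(names):
            if abs(L[i, j]) > threshold and name not in selected:
                selected.append(name)
    return selected

names = ["A", "B", "C"]                       # hypothetical stocks
loadings = [
    [0.80, 0.10, 0.59],
    [0.45, -0.70, 0.55],
    [0.40, 0.30, -0.87],
]
chosen = select_by_loading(names, loadings, n_components=2)
print(chosen)
```

The absolute value is taken because the sign of a loading only indicates the direction of the relationship with the component, which is consistent with negative loadings such as -0.6696 being selected above.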
4. Conclusion
This paper employed principal component analysis to select attractive stocks on the Ghana Stock Exchange. The results showed that all the stocks on the exchange were highly volatile, but that there was a higher probability of making gains than losses. The results also indicated that, among the 9 stocks in the Finance sector, only 3 stocks (CAL, ETI and GCB) were able to characterize the sector. The Distribution sector had 2 stocks (PBC and TOTAL) among the 4 stocks characterizing the sector. The Food and Beverage sector had only FML characterizing the sector. Also, the Information Technology sector had CLYD characterizing the sector. The Insurance sector had EGL characterizing the sector out of the 2 stocks. The Manufacturing sector had only 2 stocks (PZC and UNIL) characterizing the sector out of the 10 stocks, and for the Mining sector, 2 stocks (TLW and AGA) among the 4 stocks were the best ones. In effect, the 34 stocks were reduced to 12 stocks. The selected stocks are better candidates for consideration by investors in the various sectors on the Ghana Stock Exchange for productive investment, since they explain more of the variance in their respective sectors than the other stocks in the same sector.