Amendments of a Stochastic Restricted Principal Components Regression Estimator in the Linear Model

Principal Component Analysis (PCA) is one of the popular methods used to address the multicollinearity problem. Researchers in 2014 proposed an estimator for this problem in the linear model when stochastic linear restrictions on the regression coefficients are available. This estimator, called the stochastic restricted principal components (SRPC) regression estimator, was constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator. However, it ignores the number of components (the orthogonal matrix $T_r$) that the researcher chooses to address the multicollinearity in the data matrix $X$. This paper proposes four different methods (the Lagrange function, the same derivation technique as the OME, the constrained principal component model, and substitution into the model) to modify the SRPC estimator for use in the presence of multicollinearity. Finally, a numerical example, an application, and a simulation study are presented to illustrate the performance of the proposed estimator.


Introduction
According to the Gauss-Markov theorem, the linear regression model (LM) takes the form

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ observed matrix of the variables, assumed to have full rank, i.e., $\operatorname{rank}(X) = p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of error terms assumed to be multivariate normally distributed with mean $0$ and variance-covariance matrix $\sigma^2 I_n$ [1]. It is known that the ordinary least squares (OLS) estimator of $\beta$ is $\hat{\beta} = (X'X)^{-1}X'y$, which is normally distributed as $\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$. The standard regression model assumes that the column vectors in $X$ are linearly independent.

The restricted model for $\beta$ can be written as

$$r = R\beta + e, \qquad (2)$$

where $R$ is a $q \times p$ matrix and $r$ is a $q \times 1$ vector of restrictions. The restricted estimator, derived using the Lagrange function, is

$$\hat{\beta}_{R} = \hat{\beta} + (X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(r - R\hat{\beta}). \qquad (3)$$

Researchers in 1961 obtained the ordinary mixed estimator (OME) for the least squares method by combining the LM and the restricted model as follows [2]:

$$\begin{pmatrix} y \\ r \end{pmatrix} = \begin{pmatrix} X \\ R \end{pmatrix}\beta + \begin{pmatrix} \varepsilon \\ e \end{pmatrix}, \qquad (4)$$

where

$$E\begin{pmatrix} \varepsilon \\ e \end{pmatrix} = 0, \qquad \operatorname{Var}\begin{pmatrix} \varepsilon \\ e \end{pmatrix} = \sigma^2\begin{pmatrix} I_n & 0 \\ 0 & V \end{pmatrix};$$

that is, $\sigma^2 V$ is the variance of the error term of the restricted model, $\operatorname{Var}(e) = \sigma^2 V$, where $V$ is assumed to be a known positive definite (pd) matrix. The OME for the least squares method is given by equations (5) and (6), which are equivalent:

$$\hat{\beta}_{OM} = (X'X + R'V^{-1}R)^{-1}(X'y + R'V^{-1}r), \qquad (5)$$

$$\hat{\beta}_{OM} = \hat{\beta} + (X'X)^{-1}R'\left(V + R(X'X)^{-1}R'\right)^{-1}(r - R\hat{\beta}). \qquad (6)$$

The expectation of the OME is $E\left(\hat{\beta}_{OM}\right) = \beta$, its variance is $\operatorname{Var}\left(\hat{\beta}_{OM}\right) = \sigma^2(X'X + R'V^{-1}R)^{-1}$ [2], and its matrix mean squared error is $\operatorname{MMSE}\left(\hat{\beta}_{OM}\right) = \sigma^2(X'X + R'V^{-1}R)^{-1}$. Comparing (3) and (6) shows that $\hat{\beta}_{OM}$ is obtained from the restricted estimator $\hat{\beta}_{R}$ by adding $V$ to the term $R(X'X)^{-1}R'$ before inversion.

Section two presents another view of the SRPC estimator, while section three introduces four different methods for computing the modified SRPC estimator. Finally, the last section presents a numerical example showing the difference between the old method introduced in previous papers [3] and the new method proposed in this paper.
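The equivalence of the two OME forms in (5) and (6) can be checked numerically. The following sketch uses simulated data and a hypothetical restriction matrix (not taken from the paper) to illustrate that both expressions give the same estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data; all specific values here are hypothetical.
n, p = 30, 3
X = rng.normal(size=(n, p))
beta = np.ones(p)
y = X @ beta + rng.normal(size=n)

# Stochastic linear restriction r = R beta + e with Var(e) = sigma^2 * V.
R = np.array([[1.0, 1.0, 1.0]])
V = np.eye(1)
r = R @ beta  # restriction taken as exact here for simplicity

XtX = X.T @ X
b_ols = np.linalg.solve(XtX, X.T @ y)

# OME, form (5): (X'X + R'V^{-1}R)^{-1} (X'y + R'V^{-1}r)
Vinv = np.linalg.inv(V)
b_om_5 = np.linalg.solve(XtX + R.T @ Vinv @ R, X.T @ y + R.T @ Vinv @ r)

# OME, form (6): b_ols + (X'X)^{-1} R' (V + R (X'X)^{-1} R')^{-1} (r - R b_ols)
XtX_inv = np.linalg.inv(XtX)
b_om_6 = b_ols + XtX_inv @ R.T @ np.linalg.solve(
    V + R @ XtX_inv @ R.T, r - R @ b_ols
)

assert np.allclose(b_om_5, b_om_6)  # the two forms coincide
```

The equivalence is a standard matrix-inversion identity, which is why either form may be used in practice depending on whether $q$ or $p$ is smaller.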

Another View of the $\hat{\beta}_{SRPC}$ Estimator
As indicated in 2014, the SRPC estimator computes the OME for the principal components model [3], unlike the estimator introduced by the researchers in 1961, who computed the OME for the least squares method [2]. They used equations (7) and (8) to derive their estimator, where $T_r = (t_1, t_2, \dots, t_r)$ represents the remaining columns of the orthogonal matrix $T = (t_1, t_2, \dots, t_p)$ after deleting the last $p - r$ columns, with $0 \le r \le p$. In their study [3], the researchers assumed that all changes apply to $\beta$ when principal component analysis is used, and they used the relation $\beta = T_r T_r' \beta$ in their analysis; their derivation is summarized by equations (9) and (10), and the expectation of their estimator was given in [4]. This paper addresses the resulting problems by deriving the same kind of estimator from the principal components model itself; the next sections present four different methods of obtaining the proposed estimator.
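The relation $\beta = T_r T_r' \beta$ holds exactly only when $\beta$ lies in the span of the retained components, since $T_r T_r'$ is an orthogonal projection matrix. A minimal sketch with hypothetical data illustrates the projection properties:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))

# Eigendecomposition of X'X: columns of T are eigenvectors t_1..t_p,
# reordered so eigenvalues are decreasing.
eigval, T = np.linalg.eigh(X.T @ X)
order = np.argsort(eigval)[::-1]
T = T[:, order]

r = 2
Tr = T[:, :r]   # keep the first r components
P = Tr @ Tr.T   # T_r T_r', the projection onto their span

# P is symmetric and idempotent, i.e. an orthogonal projection of rank r.
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
assert np.isclose(np.trace(P), r)
```

For a general $\beta$ not in that span, $T_r T_r' \beta \neq \beta$, which is the source of the bias of component-based estimators when $r < p$.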

The Proposed Estimator
The author does not agree with the researchers of 2014 [3] regarding their equations (9) and (10).

The First Method
According to previous studies, in the presence of a multicollinearity problem, researchers have used other forms to estimate the parameters, such as principal components regression (PCR). This problem occurs when the predictors included in the linear model are highly correlated with each other. In this case the matrix $X'X$ tends to be singular, and hence computing the least squares estimator faces numerical problems [6].
The researchers used the orthogonal matrix $T$ in the GLM to obtain the PCR estimator of $\beta$ as follows. A spectral decomposition of the matrix $X'X$ is given by $X'X = T \Lambda T'$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$ contains the eigenvalues of $X'X$, and the PCR estimator is $\hat{\beta}_{PCR} = T_r\left(T_r' X' X T_r\right)^{-1} T_r' X' y$. Applying the Lagrange function to this model under the stochastic restrictions yields the first form of the proposed estimator, where $\hat{\beta}_{MSRPC}$ refers to the modified stochastic restricted principal components estimator.
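The PCR estimator above can be sketched directly from the spectral decomposition. The data below are simulated with deliberately collinear columns (all names and values are hypothetical illustrations, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated design with strong multicollinearity: every column is the
# same latent variable z plus a small perturbation.
n, p = 40, 3
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n) for _ in range(p)])
y = X @ np.ones(p) + rng.normal(size=n)

# Spectral decomposition X'X = T Lambda T', eigenvalues decreasing.
eigval, T = np.linalg.eigh(X.T @ X)
order = np.argsort(eigval)[::-1]
T, eigval = T[:, order], eigval[order]

r = 1          # retain only the dominant component
Tr = T[:, :r]

# PCR estimator: T_r (T_r' X'X T_r)^{-1} T_r' X' y
b_pcr = Tr @ np.linalg.solve(Tr.T @ X.T @ X @ Tr, Tr.T @ X.T @ y)

# The estimate lies in the span of the retained components.
assert np.allclose(Tr @ Tr.T @ b_pcr, b_pcr)
```

Dropping the trailing components replaces the near-singular inverse of $X'X$ with a well-conditioned $r \times r$ inverse, which is the numerical benefit PCR offers under multicollinearity.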

The Second Method
Researchers in 2009 noted that $\hat{\beta}_{PCR} = T_r \Lambda_r^{-1} T_r' X' y$, where $\Lambda_r = T_r' X' X T_r$ [6]. Following the same technique as in previous studies [2], the second form of $\hat{\beta}_{MSRPC}$ is obtained by substituting $\hat{\beta}_{PCR}$ for $\hat{\beta}$ (and $T_r \Lambda_r^{-1} T_r'$ for $(X'X)^{-1}$) in equation (6).
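The identity $\Lambda_r = T_r' X' X T_r$ and the resulting equality of the two PCR expressions can be verified directly; the sketch below uses hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)

# Eigendecomposition of X'X with eigenvalues in decreasing order.
eigval, T = np.linalg.eigh(X.T @ X)
order = np.argsort(eigval)[::-1]
T, eigval = T[:, order], eigval[order]

r = 2
Tr = T[:, :r]

# Lambda_r = T_r' X'X T_r is diagonal, holding the r largest eigenvalues.
Lr = Tr.T @ X.T @ X @ Tr
assert np.allclose(Lr, np.diag(eigval[:r]))

# Hence the two PCR expressions coincide:
b1 = Tr @ np.linalg.solve(Lr, Tr.T @ X.T @ y)             # (T_r'X'XT_r)^{-1} form
b2 = Tr @ np.diag(1.0 / eigval[:r]) @ Tr.T @ Tr @ np.diag(eigval[:r]) @ np.diag(1.0 / eigval[:r]) @ Tr.T @ X.T @ y
b2 = Tr @ np.diag(1.0 / eigval[:r]) @ Tr.T @ X.T @ y      # Lambda_r^{-1} form
assert np.allclose(b1, b2)
```

Because $\Lambda_r$ is diagonal, the second form inverts it elementwise, which is both cheaper and numerically stable when the retained eigenvalues are well separated from zero.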

The Third Method
According to previous studies, Constrained Principal Component Analysis is a method for the structural analysis of multivariate data that combines features of regression analysis with principal component analysis. In this method, the original data are first decomposed into several components according to external information. The components are then subjected to principal component analysis to explore the structures within them [8].
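The decomposition by external information described above can be sketched as follows, assuming identity row and column metrics ($K = I$, $L = I$) and hypothetical data matrices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: Y (N x n) with external information G about the
# rows and H about the columns.
N, n = 12, 6
Y = rng.normal(size=(N, n))
G = rng.normal(size=(N, 2))
H = rng.normal(size=(n, 2))

def proj(A):
    """Orthogonal projector onto the column space of A (identity metric)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

PG, PH = proj(G), proj(H)
QG, QH = np.eye(N) - PG, np.eye(n) - PH

# Four-way decomposition of Y into the parts explained by G and/or H
# and the residual part explained by neither:
parts = [PG @ Y @ PH, QG @ Y @ PH, PG @ Y @ QH, QG @ Y @ QH]
assert np.allclose(sum(parts), Y)
```

Each of the four parts can then be subjected to PCA separately, which is the second step of the constrained principal component method.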
The constrained principal component model is

$$Y = GMH' + BH' + GC + E,$$

where $Y$ is an $N \times n$ matrix of responses, $G$ and $H$ are observed matrices of external variables, assumed to have full rank, $M$, $B$, and $C$ are matrices of unknown parameters, and $E$ is an $N \times n$ matrix of error terms assumed to be multivariate normally distributed with mean $0$ and variance-covariance matrix $\sigma^2 I$. Researchers in a study [9] estimated the unknown parameter matrices as

$$\hat{M} = (G'KG)^{-1} G'K Y L H (H'LH)^{-1}, \qquad (24)$$

$$\hat{B} = Q_{G/K} Y L H (H'LH)^{-1},$$

where $Q_{G/K} = I - P_{G/K}$ with $P_{G/K} = G(G'KG)^{-1}G'K$, and $Q_{H/L} = I - P_{H/L}$ with $P_{H/L} = H(H'LH)^{-1}H'L$. Here $K$, a symmetric nonnegative definite (nnd) matrix of order $N$, denotes the row metric matrix, and $L$, a symmetric nnd matrix of order $n$, denotes the column metric matrix. If $K$ and/or $L$ are positive semidefinite (psd) but not pd, the conditions $\operatorname{rank}(KG) = \operatorname{rank}(G)$ and $\operatorname{rank}(LH) = \operatorname{rank}(H)$ are required; these conditions are essential for the projection matrices [9]. When $K = I$ and $L = I$, putting the estimates of $M$, $B$, $C$, and $E$ above into model (27) yields the following decomposition of the data matrix:

$$Y = P_{G} Y P_{H}' + Q_{G} Y P_{H}' + P_{G} Y Q_{H}' + \left(Y - P_{G} Y P_{H}' - Q_{G} Y P_{H}' - P_{G} Y Q_{H}'\right).$$

Numerical Example and Application

The previous table indicates that all independent variables have high correlations with the dependent variable. High correlations are also found between the independent variables, which means that a multicollinearity problem is present. To solve this problem, the PCA technique will be used. Additional information is available, telling us that the sum of the profits over the types of air conditioner is 1500 pounds. The previous table shows the parameter coefficients and the bias for the least squares estimator ($\hat{\beta}$), the ordinary mixed estimator ($\hat{\beta}_{OM}$), the principal components estimator ($\hat{\beta}_{PCR}$), the stochastic restricted principal components estimator ($\hat{\beta}_{SRPC}$) [3], and the modified stochastic restricted principal components estimator introduced in this paper ($\hat{\beta}_{MSRPC}$).
It also shows the principal components estimator in the case $p = r$ ($\hat{\beta}_{PCR}(p = r)$), the $\hat{\beta}_{SRPC}$ in the case $p = r$, and the $\hat{\beta}_{MSRPC}$ in the case $p = r$; the numbers in brackets represent the bias. The total mean squared error (TMSE) criterion, the sum of the MSEs of the individual coefficient estimates, is used to compare the SRPC and the MSRPC. It was 148.0862 for the SRPC and 147.9901 for the MSRPC, which means that the new estimator (MSRPC) is better than the old estimator (SRPC). A simulation study has been carried out to assess this result.
The previous results show that both the old estimator ($\hat{\beta}_{SRPC}$) and the new estimator ($\hat{\beta}_{MSRPC}$) are equivalent to the OLS estimator in the case $r = p$: the parameter coefficients and the bias were the same in each case. Different results were found in the case $r < p$, where the TMSE of the old estimator (148.0862) was greater than the TMSE of the new estimator (147.9901). This means that the MSRPC estimator is better than the SRPC estimator, which agrees with the result obtained in the previous section.

Simulation Study
A simulation study with 5000 replications has been carried out to check the results in different settings. The sample size (n) took the values 10, 20, 30, 50, 100, and 200. The values of the restriction model (k) were 0, 1, and -1. The number of variables in the model (p) was 2, 3, 4, and 5, and the number of components (r) was always less than the number of variables (p). Constant values were used for R, and the true parameter vector $\beta$ consisted of a vector of ones. The data were generated from a multivariate normal distribution with mean vector $\mu = 0$ and variance-covariance matrix $\Sigma$, which was chosen to produce high correlations between the variables (a multicollinearity problem).
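A single setting of such a simulation can be sketched as below. The specific values (correlation level, number of replications, sample size) are illustrative choices, not the paper's exact design, and only the OLS estimator is shown; the same loop structure applies to the OME, PCR, SRPC, and MSRPC estimators:

```python
import numpy as np

rng = np.random.default_rng(5)

# One hypothetical setting: multicollinear normal predictors,
# beta = vector of ones, TMSE = sum of per-coefficient MSEs.
n, p, rho, reps = 30, 3, 0.99, 2000
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)  # unit variances, corr = rho
beta = np.ones(p)

sq_err = np.zeros(p)
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimate for this replicate
    sq_err += (b - beta) ** 2

tmse_ols = np.sum(sq_err / reps)  # empirical TMSE of OLS
```

Under this degree of multicollinearity the OLS TMSE is large, which is the situation in which the component-based estimators are expected to improve on it.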

Summary
The stochastic restricted principal components (SRPC) regression estimator ignores the number of components (the orthogonal matrix $T_r$) chosen to solve the multicollinearity problem in the data matrix $X$. This paper introduced another estimator that uses the matrix $T_r$ to obtain more accurate results; the new estimator can use any required number of components. A numerical example and an application were given to illustrate the performance of the proposed estimator. The results show that the TMSE of the old estimator is greater than the TMSE of the new estimator, which means more accurate decisions can be made when using three components. Both estimators, the old and the new, were equivalent to the OLS estimator in the case $r = p$ with respect to the parameter coefficients, the bias values, and the MSE. The simulation results agree, in many different cases, with the results obtained from the numerical example (real data) and the application.