The Optimal Estimation of Lasso
Huiyi Xia
Department of Mathematics and Computer Science, Chizhou University, Anhui, China
Email address:
To cite this article:
Huiyi Xia. The Optimal Estimation of Lasso. Science Journal of Applied Mathematics and Statistics. Vol. 3, No. 6, 2015, pp. 293-297. doi: 10.11648/j.sjams.20150306.19
Abstract: The estimation of the lasso is an important problem in high dimensional data analysis, and a formula for the optimal lasso estimate has remained an unsolved riddle. To address this problem, we derive the structure of the lasso estimate in the orthogonal design by mathematical methods. A formula for the optimal lasso estimate in the orthogonal design is obtained, and it is pointed out that dimension reduction by the lasso method is a gradual process.
Keywords: Lasso, Estimation, Solution
1. Introduction
Tibshirani (1996) proposed a new technique, called the lasso. It shrinks some coefficients and sets others to 0, and hence tries to retain the good features of both ridge regression and subset selection. The lasso estimate is a 'soft threshold' estimator in the sense of Donoho and Johnstone (1994). Fan and Li (2001) proposed SCAD, whose penalty function is the smoothly clipped absolute deviation. Knight and Fu (2000) studied asymptotics for lasso-type estimators. Efron et al. (2004) proposed a new model selection algorithm, called Least Angle Regression (LARS); a simple modification of the LARS algorithm implements the lasso. Because LARS is very fast, it has made the lasso method popular. Zou and Hastie (2005) proposed the elastic net; real-world data and a simulation study show that the elastic net often outperforms the lasso, and an algorithm called LARS-EN computes elastic net regularization paths efficiently, much as LARS does for the lasso. Tibshirani et al. (2005) proposed the 'fused lasso', which penalizes the $L_1$-norm of both the coefficients and their successive differences; the technique also extends to the 'hinge' loss function that underlies the support vector classifier. Wasserman and Roeder (2009) studied variable selection in high-dimensional models, considering three screening methods: the lasso, marginal regression, and forward stepwise regression. Zou and Zhang (2009) studied the adaptive elastic net with a diverging number of parameters. Austin et al. (2013) studied penalized regression and risk prediction in genome-wide association studies using the lasso. Wu et al. (2014) proposed the nonnegative-lasso method for variable selection in high dimensional sparse linear models with nonnegative constraints on the coefficients; this method is an extension of the lasso. Bunea et al. (2013) introduced and studied the Group Square-Root Lasso (GSRL) method for estimation in high dimensional sparse regression models with group structure. Ahrens and Bhattacharjee (2015) exploited the lasso estimator, mimicking two-stage least squares to account for endogeneity of the spatial lag.
There is a great deal of related research on the lasso, too much to survey item by item here. The optimal estimate of the lasso has remained an unsolved riddle. For example, if we use the lasso method to select five variables out of ten, the tuning parameter achieving this is not unique. How should the tuning parameter be chosen to obtain the best lasso estimate? Reviewing the literature, we found that this problem had not been solved; after careful deliberation, we solve it in this paper.
2. Some Definitions
Suppose that we have data $(\mathbf{x}^i, y_i)$, $i = 1, 2, \dots, N$, where $\mathbf{x}^i = (x_{i1}, \dots, x_{ip})^T$ are the predictor variables and $y_i$ are the responses. We assume the observations are independent and that the $x_{ij}$ are standardized so that:
$\sum_i x_{ij}/N = 0$, $\sum_i x_{ij}^2/N = 1$.
Let $\bar{y} = 0$; we can assume without loss of generality that $\hat{\alpha} = 0$, and take the tuning parameter $t \ge 0$.
The lasso estimate $\hat{\beta}$ is defined by
$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \big( y_i - \sum_{j} \beta_j x_{ij} \big)^2 \quad \text{subject to} \quad \sum_{j} |\beta_j| \le t. \qquad (1)$
The ridge regression estimate is defined by
$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \big( y_i - \sum_{j} \beta_j x_{ij} \big)^2 \quad \text{subject to} \quad \sum_{j} \beta_j^2 \le t. \qquad (2)$
Let $X$ be the $N \times p$ design matrix with $ij$th entry $x_{ij}$, suppose that $X^T X = I$, where $I$ denotes the identity matrix, and let $\hat{\beta}^o$ be the full least squares estimate.
The solutions to equation (1) are easily shown to be
$\hat{\beta}_j = \mathrm{sign}(\hat{\beta}_j^o)\big( |\hat{\beta}_j^o| - \gamma \big)^{+}, \qquad (3)$
where $\gamma$ is determined by the condition $\sum_j |\hat{\beta}_j| = t$.
(3) is the soft threshold estimator.
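Equation (3) can be computed directly. The following is a minimal sketch (NumPy; the function names are illustrative, not from the paper) that soft-thresholds the least squares estimates and tunes $\gamma$ by bisection so that the constraint $\sum_j |\hat{\beta}_j| = t$ holds:

```python
import numpy as np

def soft_threshold(beta_ols, gamma):
    """Soft threshold: sign(b) * (|b| - gamma)^+ applied coordinatewise."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - gamma, 0.0)

def lasso_orthogonal(beta_ols, t):
    """Lasso estimate under an orthonormal design (X'X = I):
    choose gamma >= 0 so that the L1 norm of the soft-thresholded
    coefficients equals t, then return those coefficients."""
    if np.sum(np.abs(beta_ols)) <= t:
        return beta_ols.copy()              # constraint inactive: OLS is optimal
    lo, hi = 0.0, float(np.max(np.abs(beta_ols)))
    for _ in range(200):                    # bisection on gamma
        mid = 0.5 * (lo + hi)
        if np.sum(np.abs(soft_threshold(beta_ols, mid))) > t:
            lo = mid                        # L1 norm still too large
        else:
            hi = mid
    return soft_threshold(beta_ols, 0.5 * (lo + hi))
```

For instance, with $\hat{\beta}^o = (3, 1)$ and $t = 1.5$ the bisection finds $\gamma = 1.5$, giving the lasso estimate $(1.5, 0)$: the smaller coefficient is set exactly to zero.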
When $X^T X = I$, the solutions to equation (2) are easily shown to be
$\hat{\beta}_j = \hat{\beta}_j^o / (1 + \gamma). \qquad (4)$
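For comparison, equation (4) can be sketched the same way: under $X^T X = I$, ridge regression shrinks every coefficient by the same proportional factor and therefore never sets a nonzero coefficient exactly to zero (a minimal illustration; the helper name is mine):

```python
import numpy as np

def ridge_orthogonal(beta_ols, gamma):
    """Ridge estimate under X'X = I: proportional shrinkage by 1/(1 + gamma).
    No coefficient is thresholded, so none becomes exactly zero."""
    return beta_ols / (1.0 + gamma)

# With gamma = 1, every coefficient is halved but none vanishes.
print(ridge_orthogonal(np.array([3.0, 1.0, -0.1]), 1.0))
```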
Figure 1 and Figure 2 provide some insight for the case $p = 2$. When $p = 2$, the criterion $\sum_i \big( y_i - \sum_j \beta_j x_{ij} \big)^2$ equals a quadratic function of $(\beta_1, \beta_2)$ (plus a constant). The circular contours of this function are shown by the full curves in Figure 2; they are centered at the OLS estimates $(\hat{\beta}_1^o, \hat{\beta}_2^o)$, and the constraint region $|\beta_1| + |\beta_2| \le t$ is the rotated square. The lasso solution is the first place that the contours touch the square, and this will sometimes occur at a corner, corresponding to a zero coefficient.
3. Some Results
Lemma 1: Let,, , and
(5)
There exists $j$ such that
(6)
(7)
Let this be the solution of (7); then
(8)
Proof: According to (5), (7) is equivalent to
(9)
We obtain the solution of (9):
Theorem 1: Let $\bar{y} = 0$; we can assume without loss of generality that $\hat{\alpha} = 0$, and take the tuning parameter $t \ge 0$. Let $\hat{\beta}^o$ be the least squares estimate, and let the lasso estimate be defined by (1). Then some coefficients become 0 under the lasso method.
Proof: When $X^T X = I$, (1) is equivalent to (7).
1. When ,
(1) When , the coefficients cannot become 0 under the lasso method.
(2) When, , .
As shown in Figure 1, the point denotes the least squares estimate and lies above the line; the rotated square denotes the constraint region. Finding the lasso estimate is equivalent to finding the point of the rotated square at the shortest distance from the least squares point.
Obviously this nearest point of the rotated square realizes the shortest distance between the least squares point and the rotated square.
Writing down the coordinates of this nearest point gives the lasso estimate.
This proves that one coefficient becomes 0 under the lasso method.
(3) Suppose, , , , .
As shown in Figure 3, the point denotes the least squares estimate and lies below the line; the rotated square denotes the constraint region.
Finding the lasso estimate is again equivalent to finding the point of the rotated square at the shortest distance from the least squares point.
Obviously this nearest point of the rotated square realizes the shortest distance between the least squares point and the rotated square.
Writing down the coordinates of this nearest point gives the lasso estimate, and some coefficients become 0 under the lasso method.
The other two cases can be proved similarly.
Thus, in this case, we have proved that some coefficients become 0 under the lasso method.
2. When
(1) If there are equal numbers among the coefficients, treat each set of equal numbers as one factor; then some coefficients become 0 under the lasso method.
(2)
According to Lemma 1, the solution of (1) is as given in (8).
In this case, some coefficients become 0 under the lasso method.
(3) Suppose one parameter is as assumed and the other parameters are greater than 0. By the symmetry of the problem we need only consider one case, and some coefficients become 0 under the lasso method; the other conditions can be proved similarly. Thus Theorem 1 is proved.
Corollary 1: Let , , . There exists such that
this is a solution of (7).
Then, when ,
it is the optimal estimate of the lasso.
Example: Let , . What is the optimal estimate of the lasso?
Solution: The least squares estimate is . Let ; then the lasso estimate is:
When , the optimal estimate of the lasso is:
When , let ; then the lasso estimate is:
, ,
When , let ; then the lasso estimate is:
,
When , the optimal estimate of the lasso is:
When , let ; then the lasso estimate is:
, and then the optimal estimate of the lasso is:
4. Conclusion
The lasso estimate is a 'soft threshold' estimator in the sense of Donoho and Johnstone. We have given a new estimate for the lasso, and from this new estimate and the examples we obtain the following conclusions:
1. Dimension reduction by the lasso is a gradual process: applied to p-dimensional data, the lasso can get rid of only one variable at a time; to get rid of a second variable, one must apply the lasso again to the remaining p-1 dimensional data. Present lasso algorithms must be modified accordingly.
2. A calculation formula for the optimal lasso estimate is found.
3. This contributes to the computation of high dimensional data.
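The gradual, one-variable-at-a-time dimension reduction described in the conclusions can be observed numerically in the orthonormal design using the standard soft-threshold solution (3): as the bound $t$ decreases, variables leave the active set one at a time. A self-contained sketch (the coefficient values and function name are hypothetical, chosen only for illustration):

```python
import numpy as np

def lasso_active_set(beta_ols, t):
    """Indices of nonzero lasso coefficients under X'X = I, via soft
    thresholding with gamma tuned by bisection to meet the L1 bound t."""
    st = lambda g: np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - g, 0.0)
    if np.sum(np.abs(beta_ols)) <= t:
        return list(np.nonzero(beta_ols)[0])    # constraint inactive
    lo, hi = 0.0, float(np.max(np.abs(beta_ols)))
    for _ in range(200):                        # bisection on gamma
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.sum(np.abs(st(mid))) > t else (lo, mid)
    return list(np.nonzero(st(0.5 * (lo + hi)))[0])

beta = np.array([4.0, 3.0, 2.0, 1.0])       # hypothetical OLS estimates
for t in [8.0, 4.5, 2.0, 0.5]:              # tightening the L1 bound
    print(t, lasso_active_set(beta, t))     # active set shrinks one variable at a time
```

With these values the active set has 4, 3, 2, and then 1 variable: each tightening of $t$ removes exactly one more coefficient, consistent with the gradual reduction described above.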
Acknowledgements
I would like to express my gratitude to all those who helped me during the writing of this paper.
References