Science Journal of Applied Mathematics and Statistics
Volume 3, Issue 6, December 2015, Pages: 293-297

The Optimal Estimation of Lasso

Huiyi Xia

Department of Mathematics and Computer Science, Chizhou University, Anhui, China

Huiyi Xia. The Optimal Estimation of Lasso. Science Journal of Applied Mathematics and Statistics. Vol. 3, No. 6, 2015, pp. 293-297. doi: 10.11648/j.sjams.20150306.19

Abstract: The estimation of the lasso is an important problem for high dimensional data, and a formula for the optimal lasso estimate has been an unsolved riddle of high dimensional data. To solve this problem, we give the structure of the lasso estimate by mathematical methods in the orthogonal design. A formula for the optimal lasso estimate is obtained in the orthogonal design, and it is pointed out that dimension reduction by the lasso method is a gradual process.

Keywords: Lasso, Estimation, Solution


1. Introduction

Tibshirani (1996) proposed a new technique, called the lasso. It shrinks some coefficients and sets others to 0, and hence tries to retain the good features of both ridge regression and subset selection. The lasso estimate is the 'soft threshold' estimator of Donoho and Johnstone (1994). Fan and Li (2001) proposed SCAD, whose penalty function is the smoothly clipped absolute deviation. Knight and Fu (2000) studied asymptotics for lasso-type estimators. Efron et al. (2004) proposed a new model selection algorithm, called Least Angle Regression (LARS); a simple modification of the LARS algorithm implements the lasso. Because LARS is very fast, it has made the lasso method popular worldwide. Zou and Hastie (2005) proposed the elastic net; real-world data and a simulation study show that the elastic net often outperforms the lasso. An algorithm called LARS-EN was proposed for computing elastic net regularization paths efficiently, much as LARS does for the lasso. Tibshirani et al. (2005) proposed the 'fused lasso', which penalizes the L1-norm of both the coefficients and their successive differences; the technique also extends to the 'hinge' loss function that underlies the support vector classifier. Wasserman and Roeder (2009) studied variable selection in high-dimensional models, considering three screening methods: the lasso, marginal regression, and forward stepwise regression. Zou and Zhang (2009) studied the adaptive elastic net with a diverging number of parameters. Austin et al. (2013) studied penalized regression and risk prediction in genome-wide association studies via the lasso. Wu et al. (2014) proposed the nonnegative lasso for variable selection in high dimensional sparse linear models with nonnegativity constraints on the coefficients; this method is an extension of the lasso. Bunea et al. (2013) introduced and studied the Group Square-Root Lasso (GSRL) for estimation in high dimensional sparse regression models with group structure. Ahrens and Bhattacharjee (2015) exploited the lasso estimator and mimicked two-stage least squares to account for endogeneity of the spatial lag.

There is a great deal of related research on the lasso; it is inconvenient to review it all here. The optimal estimate of the lasso has remained an unsolved riddle.

For example, suppose we use the lasso method to select five variables from ten; the tuning parameter that achieves this is not unique. How should the tuning parameter be chosen to obtain the best lasso estimate? A search of the literature shows that this problem had not been solved. After careful deliberation, we solve it here.

2. Some Definitions

Suppose that we have data $(\mathbf{x}^i, y_i)$, $i = 1, 2, \dots, N$, where the $y_i$ are the responses and $\mathbf{x}^i = (x_{i1}, \dots, x_{ip})^T$ are the predictor variables. We assume the observations are independent, and the $x_{ij}$ are standardized so that:

$$\frac{1}{N}\sum_{i=1}^{N} x_{ij} = 0, \qquad \frac{1}{N}\sum_{i=1}^{N} x_{ij}^2 = 1.$$

Let $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_p)^T$; we can assume without loss of generality that $\bar{y} = 0$, so the intercept may be omitted, and let $t \ge 0$ be the tuning parameter.

The lasso estimate $\hat{\beta}$ is defined by

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t. \tag{1}$$

The ridge regression estimate $\hat{\beta}^{\mathrm{ridge}}$ is defined by

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}. \tag{2}$$

Let $X$ be the $N \times p$ design matrix with $ij$th entry $x_{ij}$, and suppose that $X^T X = I$, where $I$ denotes the identity matrix. Let $\hat{\beta}^0$ be the full least squares estimate.

The solutions to equation (1) are easily shown to be

$$\hat{\beta}_j = \mathrm{sign}\big(\hat{\beta}_j^0\big)\big(|\hat{\beta}_j^0| - \gamma\big)^{+}, \tag{3}$$

where $\gamma$ is determined by the condition $\sum_{j} |\hat{\beta}_j| = t$.

(3) is the soft threshold estimator.
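As an illustration of how the soft threshold estimator behaves in the orthogonal design, the sketch below (our illustration, not code from the paper; the function names are ours) soft-thresholds the least squares coefficients and finds the threshold gamma by bisection so that the absolute coefficients sum to the budget t:

```python
import numpy as np

def soft_threshold(b, gamma):
    """Soft-threshold estimator: sign(b) * max(|b| - gamma, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - gamma, 0.0)

def lasso_orthonormal(b_ols, t, tol=1e-10):
    """Lasso estimate when X'X = I: soft-threshold the least squares
    estimate b_ols, choosing gamma by bisection so that the absolute
    coefficients sum to t (gamma = 0 if the constraint is inactive)."""
    if np.sum(np.abs(b_ols)) <= t:
        return b_ols.copy()                  # constraint not binding
    lo, hi = 0.0, float(np.max(np.abs(b_ols)))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(np.abs(soft_threshold(b_ols, mid))) > t:
            lo = mid                         # threshold too small
        else:
            hi = mid                         # threshold large enough
    return soft_threshold(b_ols, 0.5 * (lo + hi))

# Example: OLS estimate (3, 1) with budget t = 2 gives gamma = 1,
# so the lasso estimate is (2, 0): one coefficient is set to zero.
print(lasso_orthonormal(np.array([3.0, 1.0]), 2.0))
```

Bisection works because the sum of absolute soft-thresholded coefficients decreases monotonically in the threshold.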

Figure 1. The picture 1 of lasso.

Figure 2. The picture of ridge regression.

When $X^T X = I$, the solutions to equation (2) are easily shown to be

$$\hat{\beta}_j^{\mathrm{ridge}} = \frac{\hat{\beta}_j^0}{1 + \lambda}. \tag{4}$$
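For contrast, formula (4) can be checked numerically: in the orthogonal design every ridge coefficient is shrunk by the same factor 1/(1 + lambda), so ridge regression never sets a coefficient exactly to zero. A minimal sketch (ours, with assumed variable names):

```python
import numpy as np

def ridge_orthonormal(b_ols, lam):
    """Ridge estimate when X'X = I: uniform shrinkage by 1/(1 + lambda)."""
    return b_ols / (1.0 + lam)

b0 = np.array([3.0, 1.0, 0.5])
print(ridge_orthonormal(b0, 1.0))   # every coefficient halved, none zero
```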

Figure 1 and Figure 2 provide some insight for the case $p = 2$.

When $p = 2$, the criterion $\sum_{i}\big(y_i - \sum_j \beta_j x_{ij}\big)^2$ is a quadratic function of $\beta$. The circular contours of this function are shown by the full curves in Figure 2; they are centered at the OLS estimate $\hat{\beta}^0$, and the constraint region $|\beta_1| + |\beta_2| \le t$ is the rotated square. The lasso solution is the first place at which the contours touch the square, and this will sometimes occur at a corner, corresponding to a zero coefficient.
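The corner phenomenon can also be checked numerically. The sketch below (our illustration; it relies on the soft threshold form of the lasso solution in the orthogonal design) shrinks the L1 budget t until the solution lands on a corner of the rotated square, so the second coefficient is exactly zero:

```python
import numpy as np

def soft(b, g):
    """Soft threshold: sign(b) * max(|b| - g, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - g, 0.0)

def lasso_l1_ball(b_ols, t):
    """Scan thresholds from 0 upward and return the first soft-thresholded
    fit whose absolute coefficients sum to at most t."""
    for g in np.linspace(0.0, float(np.max(np.abs(b_ols))), 200001):
        beta = soft(b_ols, g)
        if np.sum(np.abs(beta)) <= t:
            return beta
    return np.zeros_like(b_ols)

b0 = np.array([2.0, 0.5])       # OLS estimate, off the diagonal of the square
beta = lasso_l1_ball(b0, 1.0)   # budget t = 1 forces a corner solution
print(beta)                     # approximately (1, 0): a zero coefficient
```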

3. Some Results

Lemma 1: Assume the setting of Section 2, and

(5)

There exists a j such that

(6)

(7)

Let this be the solution of (7); then

(8)

Proof: According to (5), (7) is equivalent to

(9)

We obtain the solution of (9):

Theorem 1: Assume, without loss of generality, the setting of Section 2, with tuning parameter $t$. Let $\hat{\beta}^0$ be the least squares estimate, and let the lasso estimate $\hat{\beta}$ be defined by (1). Then some coefficients become 0 under the lasso method.

Proof: When $X^T X = I$, (1) is equivalent to (7).

1. First case:

(1) In this subcase, no coefficient can become 0 under the lasso method.

(2) In the next subcase:

As shown in Figure 1, the point denotes the least squares estimate, which lies above the line, and the rotated square denotes the constraint region; the lasso estimate is the point of the rotated square at the shortest distance from the least squares estimate.

Obviously this is the shortest distance between the least squares estimate and the rotated square, and the corresponding point of the rotated square is the nearest point to it.

We may write down the coordinates of this nearest point, which give the lasso estimate.

This proves that one coefficient becomes 0 under the lasso method.

(3) Suppose instead the least squares estimate lies below the line.

As shown in Figure 3, the point denotes the least squares estimate, which lies below the line, and the rotated square denotes the constraint region.

The lasso estimate is again the point of the rotated square at the shortest distance from the least squares estimate.

Obviously this is the shortest distance between the least squares estimate and the rotated square, and the corresponding point of the rotated square is the nearest point to it.

Figure 3. Picture 2 of the lasso.

We may write down the coordinates of this nearest point, which give the lasso estimate. This proves that some coefficients become 0 under the lasso method.

The other two cases can be proved similarly.

Thus, in this case, we have proved that some coefficients become 0 under the lasso method.

2. Second case:

(1) If some of the coefficients are equal, the equal value can be taken out as a common factor; we then prove that some coefficients become 0 under the lasso method.

(2)

According to Lemma 1, the solution of (1) is

In this case, we have proved that some coefficients become 0 under the lasso method.

(3) Suppose one parameter is negative and the other parameters are greater than 0. By symmetry, some coefficients become 0 under the lasso method. The other sign configurations can be proved similarly. Thus Theorem 1 is proved.

Corollary 1: Under the conditions of Lemma 1, there exists a value of the tuning parameter for which the corresponding solution of (7) is the optimal lasso estimate.

Example: For the given data, what is the optimal lasso estimate?

Solution: Compute the least squares estimate; for the first choice of tuning parameter, the lasso estimate is:

In this case, the optimal lasso estimate is:

For the next choice of tuning parameter, the lasso estimate is:

For a further choice of tuning parameter, the lasso estimate is:

In this case, the optimal lasso estimate is:

For the final choice of tuning parameter, the lasso estimate is:

and the optimal lasso estimate is:

4. Conclusion

The lasso estimate is the 'soft threshold' estimator of Donoho and Johnstone. We have given a new form of the lasso estimate, and with the new estimate and the examples we obtained the following conclusions:

1.  Dimension reduction by the lasso method is a gradual process: applying the lasso to p-dimensional data can remove only one variable at a time; to remove a second variable, the lasso must be applied to the remaining (p-1)-dimensional data. The present lasso algorithms must be modified accordingly.

2.  A calculation formula for the optimal lasso estimate is found.

3.  This contributes to the computation of high dimensional data.

Acknowledgements

I would like to express my gratitude to all those who helped me during the writing of this paper.

References

1. R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 1996, 58(1), pp. 267-288.
2. B. Efron, T. Hastie, I. Johnstone and R. Tibshirani, "Least Angle Regression," The Annals of Statistics, 2004, 32(2), pp. 407-499.
3. J. Fan and R. Li, "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, 2001, 96(456), pp. 1348-1360.
4. K. Knight and W. Fu, "Asymptotics for Lasso-Type Estimators," The Annals of Statistics, 2000, 28(5), pp. 1356-1378.
5. H. Zou and H. Zhang, "On the Adaptive Elastic-net with a Diverging Number of Parameters," The Annals of Statistics, 2009, 37(4), pp. 1733-1751.
6. D. Donoho and I. Johnstone, "Ideal Spatial Adaptation by Wavelet Shrinkage," Biometrika, 1994, 81, pp. 425-455.
7. H. Zou and T. Hastie, "Regularization and Variable Selection via the Elastic Net," Journal of the Royal Statistical Society, Series B, 2005, 67(2), pp. 301-320.
8. R. Tibshirani, M. Saunders, S. Rosset, J. Zhu and K. Knight, "Sparsity and Smoothness via the Fused Lasso," Journal of the Royal Statistical Society, Series B, 2005, 67(1), pp. 91-108.
9. L. Wasserman and K. Roeder, "High-Dimensional Variable Selection," The Annals of Statistics, 2009, 37(5A), pp. 2178-2201.
10. E. Austin, W. Pan and X. Shen, "Penalized Regression and Risk Prediction in Genome-Wide Association Studies," Statistical Analysis and Data Mining, 2013, 6(4).
11. L. Wu, Y. Yang and H. Liu, "Nonnegative-Lasso and Application in Index Tracking," Computational Statistics and Data Analysis, 2014, 70, pp. 116-126.
12. F. Bunea, J. Lederer and Y. She, "The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms," IEEE Transactions on Information Theory, 2013, 60(2), pp. 1313-1325.
13. A. Ahrens and A. Bhattacharjee, "Two-Step Lasso Estimation of the Spatial Weights Matrix," Econometrics, 2015, 3, pp. 128-155.
