B-spline Speckman Estimator of Partially Linear Model

The partially linear model (PLM) is one of the semiparametric regression models; since it has both parametric (more than one) and nonparametric (only one) components in the same model, it is more flexible than linear regression models, which contain only parametric components. In the literature, several estimators have been proposed for this model; the main difference between these estimators is the method used to estimate the nonparametric component, since the parametric component is mostly estimated by the least squares method. The Speckman estimator is one of those commonly used for estimating the parameters of the PLM; this estimator is based on the kernel smoothing approach to estimate the nonparametric component of the model. According to the literature on nonparametric regression, the spline smoothing approach is in general more efficient than the kernel smoothing approach. Therefore, in this paper, we suggest using the basis spline (B-spline) smoothing approach to estimate the nonparametric component of the model instead of the kernel smoothing approach. To study the performance of the new estimator and compare it with other estimators, we conducted a Monte Carlo simulation study. The results of our simulation study confirmed that the proposed estimator was the best, because it has the lowest mean squared error.


Introduction
Linear regression modelling is a good form for linking variables because, in general, the parameters have some kind of meaning or interpretation. Nevertheless, it is known that the main drawback of linear regression models is their lack of flexibility. In practice, this means that some interesting relationships cannot be modelled by means of this class of models. A way to avoid that drawback is to add a nonparametric component to the linear regression model. The resulting model, known as the partially linear model (PLM), was introduced by Engle et al. [1] to study the effect of weather on electricity demand. The PLM is defined by

y_i = x_i′β + g(t_i) + ε_i;  i = 1, 2, …, n,  (1)

where y_i denotes the dependent variable, x_i = (x_{i1}, …, x_{ip})′ and t_i are the independent variables, g(·) is the nonparametric part of the model, β = (β_1, …, β_p)′ is the vector of regression parameters of the parametric part of the model, and the random errors ε_1, …, ε_n are independent and identically distributed with E(ε_i | x_i, t_i) = 0 and Var(ε_i | x_i, t_i) = σ². Note that the intercept term has been omitted from the parametric component without loss of any generality, since the first point on a nonparametric regression line plays the role of an intercept. This model has gained great popularity since it was first introduced by Engle et al. [1] and has been widely applied in economics, the social and biological sciences, and so on.
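As a concrete illustration of the model in equation (1), the following sketch generates data from a PLM with a hypothetical smooth function g; all numerical choices (sample size, coefficients, the sine shape of g, the noise level) are ours, not the paper's.

```python
import numpy as np

# Illustrative data-generating process for the PLM
# y_i = x_i' beta + g(t_i) + eps_i (equation (1)); g and all
# constants below are hypothetical choices for demonstration.
rng = np.random.default_rng(42)

n, p = 100, 2
beta = np.ones(p)                      # true parametric coefficients
X = rng.normal(size=(n, p))            # parametric covariates
t = np.sort(rng.uniform(0, 1, n))      # covariate of the nonparametric part
g = np.sin(2 * np.pi * t)              # hypothetical smooth function g(t)
eps = rng.normal(scale=0.5, size=n)    # i.i.d. errors, E[eps | x, t] = 0

y = X @ beta + g + eps                 # responses from the PLM
```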
The PLM reduces to the classical linear regression model if the nonparametric component equals zero. The goal is to estimate the unknown parameter vector β and the nonparametric function g from the data {y_i, x_i, t_i}.
Recently, Abonazel et al. [6] modified the Speckman estimator by using the spline smoothing approach, and they showed that their estimator is more efficient than the traditional Speckman estimator, which is based on the kernel smoothing approach.
In this paper, we suggest using the basis spline (B-spline) to estimate the nonparametric component of the model, and we compare the new estimator with the traditional Speckman [8] and Abonazel [6] estimators.
The rest of the paper is organized as follows. In the next two sections, we introduce the traditional Speckman [8] and Abonazel [6] estimators, respectively. Our proposed estimator is presented in Section 4. In Section 5, a Monte Carlo simulation study is conducted to compare the performance of the three estimators. Concluding remarks are given in Section 6.

Spline Smoothing Estimator
Abonazel et al. [6] modified the traditional Speckman estimator by using the spline smoothing approach to estimate the nonparametric component in the PLM. According to their approach, the fitted values are obtained by minimizing the penalized sum of squares (PSS)

PSS(β, g) = Σ_{i=1}^n [y_i − x_i′β − g(t_i)]² + λ ∫_a^b [g″(t)]² dt,  (7)

where β̂ is the estimated parametric component based on the smoother spline matrix S_λ, which depends on the smoothing parameter λ and the knot points, and the nonparametric component is estimated by ĝ = S_λ(y − Xβ̂), where ĝ is a natural cubic spline with knots at ξ_1, …, ξ_q for a fixed λ > 0 and S_λ is a well-known positive-definite smoother matrix depending on λ and the knot points. To solve equation (7), an iterative algorithm is required.
Note that the second term λ ∫_a^b [g″(t)]² dt is the penalty (or regularization) term, where g″ denotes the second derivative of g, and a and b are the minimum and maximum values of t, respectively. The first term in equation (7) measures closeness to the data, while the second term penalizes curvature in the function; the minimization is difficult to solve directly, so Green and Silverman [17] solved it under the assumption that the function is a natural cubic spline. This might seem an over-parameterized model; however, the penalty term ensures that the coefficients are shrunk towards linearity, limiting the number of degrees of freedom used. Let [a, b] be an interval and let ξ_1, …, ξ_q be q points such that a < ξ_1 < ⋯ < ξ_q < b. A continuous function g on [a, b] is a cubic spline with knots {ξ_1, …, ξ_q} if: a) g is a cubic polynomial over each of the intervals (a, ξ_1), (ξ_1, ξ_2), …, (ξ_q, b); and b) g has continuous first and second derivatives at the knots. Based on the above, a natural cubic spline can be defined as a polynomial spline g: [a, b] → ℝ of degree three with g″(a) = g″(b) = 0. Natural cubic splines are cubic splines with the constraint that they are linear in their tails beyond the boundary knots, i.e. on [a, ξ_1] and [ξ_q, b]. In general, the placement of the knots and the determination of the penalty are very important. In spline smoothing the number of knots equals the number of observations, while in penalized spline smoothing the number of knots is less than the number of observations; see [18, 19] for more details.
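To make the smoothing-spline idea concrete, the following sketch fits a cubic smoothing spline to noisy data with SciPy. Note that SciPy's UnivariateSpline controls the fidelity/roughness trade-off through a smoothing factor s rather than the penalty λ of equation (7), so this is an analogue of the approach, not the paper's exact procedure; the data and the heuristic choice s ≈ n·σ² are ours.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Minimal sketch of spline smoothing of a nonparametric component.
# UnivariateSpline trades closeness to the data against roughness via
# the smoothing factor s (SciPy's formulation, analogous in spirit to
# the penalized sum of squares in equation (7)).
rng = np.random.default_rng(0)
n = 80
t = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

spl = UnivariateSpline(t, y, k=3, s=n * 0.3**2)  # heuristic s ~ n * sigma^2
g_hat = spl(t)          # fitted smooth curve at the data points
```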

Proposed Estimator
Now we suggest using the B-spline approach instead of the kernel approach to fit the nonparametric component in this model. A B-spline is a spline function that has minimal support with respect to a given degree, smoothness, and domain partition. De Boor [20] shows that every spline function of a given degree, smoothness, and domain partition can be uniquely represented as a linear combination of B-splines of that same degree and smoothness, over that same partition.
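A B-spline basis can be built from the standard Cox-de Boor recursion, which constructs basis functions of a given degree from those of the next lower degree. The sketch below implements the recursion directly; the knot vector (a clamped cubic basis on [0, 1]) and all numerical choices are illustrative.

```python
import numpy as np

# Sketch of the Cox-de Boor recursion for B-spline basis functions.
# Knots and degree below are illustrative choices, not the paper's.
def bspline_basis(x, knots, j, degree):
    """Evaluate the j-th B-spline basis function of the given degree at x."""
    if degree == 0:
        # constant on one knot interval, zero elsewhere
        return np.where((knots[j] <= x) & (x < knots[j + 1]), 1.0, 0.0)
    # fractions with a zero denominator are defined to be zero
    left_den = knots[j + degree] - knots[j]
    right_den = knots[j + degree + 1] - knots[j + 1]
    left = 0.0 if left_den == 0 else (
        (x - knots[j]) / left_den * bspline_basis(x, knots, j, degree - 1))
    right = 0.0 if right_den == 0 else (
        (knots[j + degree + 1] - x) / right_den
        * bspline_basis(x, knots, j + 1, degree - 1))
    return left + right

# clamped cubic basis on [0, 1]: boundary knots repeated degree extra times
degree = 3
inner = np.linspace(0, 1, 6)
knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
x = np.linspace(0, 1, 200, endpoint=False)
B = np.column_stack([bspline_basis(x, knots, j, degree)
                     for j in range(len(knots) - degree - 1)])
# the basis functions sum to one everywhere on [0, 1) (partition of unity)
```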
To begin, let ξ_0 = a and ξ_{q+1} = b, where the knots ξ_1, …, ξ_q are called inner knots, while a and b are called boundary knots. Define new knots τ_1, …, τ_{q+2l+2} such that τ_1 ≤ τ_2 ≤ ⋯ ≤ τ_{l+1} ≤ ξ_0, τ_{j+l+1} = ξ_j for j = 1, …, q, and ξ_{q+1} ≤ τ_{q+l+2} ≤ ⋯ ≤ τ_{q+2l+2}. The choice of the extra knots is arbitrary; usually one takes τ_1 = ⋯ = τ_{l+1} = ξ_0 and ξ_{q+1} = τ_{q+l+2} = ⋯ = τ_{q+2l+2} (see Wasserman [19]). Given this set of knots, the B-spline basis functions of degree l can be defined recursively by

B_{j,0}(t) = 1 if τ_j ≤ t < τ_{j+1}, and B_{j,0}(t) = 0 otherwise,  (8)

B_{j,l}(t) = [(t − τ_j)/(τ_{j+l} − τ_j)] B_{j,l−1}(t) + [(τ_{j+l+1} − t)/(τ_{j+l+1} − τ_{j+1})] B_{j+1,l−1}(t),  (9)

where any fraction in (9) with a zero denominator is defined to be zero. De Boor [20] introduced an algorithm to compute B-splines of any degree from B-splines of lower degree. Because a B-spline basis function in equation (8) is just a constant on one interval between two knots, it is simple to compute B-splines of any degree, and the algorithm works for any placement of knots (equidistant or not). Note that 2l + 2 additional knots are necessary for constructing the full B-spline basis of degree l. Eilers and Marx [21] introduced the combination of B-splines and difference penalties, which they called penalized B-splines (P-splines). The P-spline estimate is obtained by minimizing the penalized sum of squares

PPSS(θ) = (y − Bθ)′(y − Bθ) + λ θ′D_d′D_d θ,

where B is an n × (q + l + 1) matrix containing the B-splines given by (9), θ = (θ_1, …, θ_{q+l+1})′ is the parameter vector of the spline function, and D_d is the d-th order difference matrix of dimension (q + l + 1 − d) × (q + l + 1). In practice, d = 2 or d = 3 is commonly used; see [22-24]. For example, for d = 1 and d = 2 the rows of D_1 and D_2 take the forms (−1, 1) and (1, −2, 1), respectively. Minimizing PPSS leads to

θ̂ = (B′B + λ D_d′D_d)^{−1} B′y,  with hat matrix  H = B(B′B + λ D_d′D_d)^{−1} B′.  (10)

Using equation (10), we can estimate the conditional expectations E(y | t) and E(X | t) by Hy and HX, so the modified variables ỹ and X̃ are ỹ = (I − H)y and X̃ = (I − H)X. Now we can estimate the parametric and nonparametric components: β̂ = (X̃′X̃)^{−1} X̃′ỹ and ĝ = H(y − Xβ̂). We can summarize our proposed estimator in the following algorithm: Step 1: Construct the B-splines as in (9).
Step 2: Estimate the parameter vector θ of the spline function using (10).
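The steps above can be sketched end to end as follows. This is a minimal illustration, not the paper's implementation: the function name, the quantile-based knot placement, the cubic degree, the second-order difference penalty, and the fixed smoothing parameter are all our assumptions, and SciPy's BSpline.design_matrix stands in for constructing the basis of equation (9).

```python
import numpy as np
from scipy.interpolate import BSpline

# Illustrative sketch of a B-spline Speckman-type estimator for the PLM.
# All tuning choices (knots, degree, penalty order d, lam) are ours.
def speckman_bspline(y, X, t, n_inner=10, degree=3, d=2, lam=1.0):
    # clamped knot vector: inner knots at quantiles of t, boundaries repeated
    inner = np.quantile(t, np.linspace(0, 1, n_inner))
    knots = np.concatenate([np.repeat(inner[0], degree),
                            inner,
                            np.repeat(inner[-1], degree)])
    # design matrix of B-spline basis functions evaluated at t
    B = BSpline.design_matrix(t, knots, degree).toarray()
    D = np.diff(np.eye(B.shape[1]), n=d, axis=0)        # d-th difference matrix
    H = B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)  # hat matrix (eq. 10)
    y_tilde = y - H @ y                                  # partial out smooth part
    X_tilde = X - H @ X
    beta_hat = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y_tilde)
    g_hat = H @ (y - X @ beta_hat)                       # nonparametric component
    return beta_hat, g_hat

# usage on simulated PLM data (true beta = (1, 1))
rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.normal(size=(n, p))
t = np.sort(rng.uniform(0, 1, n))
y = X @ np.ones(p) + np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=n)
beta_hat, g_hat = speckman_bspline(y, X, t)
```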

Monte Carlo Simulation Study
In this section, we investigate the performance of the three estimators: the spline smoothing estimator, the kernel estimator (using the biweight kernel function), and the proposed (B-spline) estimator, by conducting a Monte Carlo simulation study. The R software is used to perform our simulation study. For information about how to conduct Monte Carlo simulation studies using R, see [25, 26].
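For reference, the biweight kernel mentioned above, K(u) = (15/16)(1 − u²)² for |u| < 1 and 0 otherwise, can be used in a simple Nadaraya-Watson smoother as sketched below; the bandwidth and test data are our illustrative choices, and the paper's kernel estimator uses this kernel within the Speckman procedure rather than as a stand-alone smoother.

```python
import numpy as np

# Sketch of kernel smoothing with the biweight kernel; the bandwidth h
# and the simulated data are illustrative choices.
def biweight(u):
    return np.where(np.abs(u) < 1, 15 / 16 * (1 - u**2)**2, 0.0)

def nw_smooth(t_eval, t, y, h=0.1):
    # Nadaraya-Watson estimate: kernel-weighted local average of y
    W = biweight((t_eval[:, None] - t[None, :]) / h)
    return (W @ y) / W.sum(axis=1)

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=100)
g_hat = nw_smooth(t, t, y)     # smoothed curve at the observed points
```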
In our simulation study, Monte Carlo experiments were carried out based on equation (1). The simulated model is generated as follows: the number of parametric coefficients is p = 2, 4, and 8, where β_j = 1 for all j = 1, …, p. The results of the simulation are recorded in Tables 1-9. These tables present the average MSE (AMSE) for β̂ and ĝ using error terms with different standard deviations, different sample sizes, different shapes of the nonparametric component, and different numbers of explanatory variables. From Tables 1-9, we can summarize the behaviour of the kernel, spline, and B-spline estimators in the following points: a) as the sample size n increases, the AMSEs decrease; b) as the error standard deviation σ increases, the AMSEs increase; c) as the number of parametric variables p increases, the AMSEs increase.
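A Monte Carlo study of this kind can be skeletonized as follows: replicate data from the PLM, estimate β in each replication, and average the squared errors to obtain the AMSE. The estimator below is a simplified kernel-based Speckman stand-in for illustration (the paper compares kernel, spline, and B-spline variants), and all design constants are ours.

```python
import numpy as np

# Skeleton of a Monte Carlo AMSE computation in the spirit of the study;
# the simplified estimator and all constants below are illustrative.
def biweight(u):
    return np.where(np.abs(u) < 1, 15 / 16 * (1 - u**2)**2, 0.0)

def speckman_kernel_beta(y, X, t, h=0.15):
    W = biweight((t[:, None] - t[None, :]) / h)
    H = W / W.sum(axis=1, keepdims=True)       # row-normalized smoother matrix
    X_tilde, y_tilde = X - H @ X, y - H @ y    # kernel-residualized data
    return np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y_tilde)

rng = np.random.default_rng(3)
n, p, sigma, reps = 100, 2, 0.5, 200
beta = np.ones(p)
sq_err = np.zeros(reps)
for r in range(reps):
    X = rng.normal(size=(n, p))
    t = np.sort(rng.uniform(0, 1, n))
    y = X @ beta + np.sin(2 * np.pi * t) + rng.normal(scale=sigma, size=n)
    beta_hat = speckman_kernel_beta(y, X, t)
    sq_err[r] = np.mean((beta_hat - beta)**2)
amse = sq_err.mean()           # average MSE over all replications
```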
In general, we can conclude that the AMSEs of the B-spline estimator are smaller than those of the kernel and spline estimators in all simulation situations. Note, however, that for the parametric component the AMSEs of the kernel, spline, and B-spline estimators are relatively close.
Graphically, we illustrate the goodness of fit of the kernel, spline, and B-spline estimators for the three nonparametric functions via different simulated PLMs. These models are generated with different sample sizes and nonparametric functions, while σ = 0.5. The fitted curves of the estimators for the three nonparametric functions are shown in Figures 1-3, respectively. From Figure 1, we find that the fitted curve of the B-spline estimator is closer to the true curve than those of the kernel and spline estimators. The same conclusion can be drawn from the other figures. This means that the B-spline estimator performs better regardless of the form of the nonparametric function.

Conclusion
In this paper, we proposed a new estimator for the PLM based on the B-spline approach. The performance of our estimator and of the Speckman [8] and Abonazel [6] estimators was investigated by a Monte Carlo simulation study, conducted to evaluate and compare these estimators (based on kernel smoothing, spline smoothing, and B-spline smoothing, respectively) under different situations: different shapes of the nonparametric component, different numbers of parametric variables, different sample sizes, and different standard deviations of the error term. The simulation results confirm that our proposed estimator is more efficient than the other estimators.