Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms
Kh. Sh. Mohamed1, 2, Xiong Yan3, Y. Sh. Mohammed4, 5, Abd-Elmoniem A. Elzain5, 6, Habtamu Z. A.2,
Abdrhaman M. Adam2
1Mathematical Department, College of Science, Dalanj University, Dalanj, Sudan
2School of Mathematical Sciences, Dalian University of Technology, Dalian, China
3School of Science, Liaoning University of Science & Technology, Anshan, China
4Physics Department, College of Education, Dalanj University, Dalanj, Sudan
5Department of Physics, College of Science & Art, Qassim University, Oklat Al-Skoor, Saudi Arabia
6Department of Physics, University of Kassala, Kassala, Sudan
To cite this article:
Kh. Sh. Mohamed, Xiong Yan, Y. Sh. Mohammed, Abd-Elmoniem A. Elzain, Habtamu Z. A., Abdrhaman M. Adam. Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms. American Journal of Neural Networks and Applications. Vol. 2, No. 1, 2016, pp. 1-5. doi: 10.11648/j.ajnna.20160201.11
Received: March 14, 2016; Accepted: March 30, 2016; Published: May 10, 2016
Abstract: This paper investigates an online gradient method with an inner-penalty term for a feedforward network called the pi-sigma network. This network uses product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using fewer weights and processing units. Penalty methods have been widely used to improve the generalization performance of feedforward neural networks and to control the magnitude of the network weights. We prove the monotonicity of the error function and the boundedness of the weights under the inner-penalty term, as well as both weak and strong convergence theorems for the training iteration.
Keywords: Convergence, Pi-sigma Network, Online Gradient Method, Inner-penalty, Boundedness
1. Introduction
The pi-sigma network (PSN) is a higher-order feedforward polynomial neural network known to provide inherently more powerful mapping abilities than a traditional feedforward neural network. This network uses product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using fewer weights and processing units. Neural networks built from PSN modules have been used effectively in pattern classification and approximation problems [1, 4, 10, 11]. There are two ways of updating the weights during training. In the first approach, batch (offline) training, the weights are modified only after all training patterns have been presented to the network. In the second approach, online training, the weights are updated immediately after each training sample is fed in. A penalty term is often introduced into the training algorithm to control the magnitude of the weights and to improve the generalization performance of the network [6, 8]; here, generalization performance refers to the capacity of a neural network to give correct outputs for untrained data. In the online approach in particular, the weights can grow very large during training and over-fitting tends to occur. Adding a penalty term to the cost function has been applied successfully in this setting [3, 7, 12, 14]: it acts as a brute force to drive unnecessary weights to zero and to prevent the weights from growing too large during training. Among penalty methods, the inner-penalty (IP) term reduces the magnitude of the network weights and efficiently improves the generalization performance of the network [5, 9, 17].
In this paper, we prove the weak and strong convergence of the online gradient method with an inner-penalty term, together with the monotonicity of the error function and the uniform boundedness of the weight sequence during the training procedure.
The rest of this paper is organized as follows. The network structure and the online gradient method with inner-penalty are described in Section 2. Preliminary lemmas are presented in Section 3. The convergence results, together with rigorous proofs of the main results, are provided in Section 4. Finally, Section 5 concludes this study.
2. Pi-sigma Network and Online Gradient Method with Inner-penalty
PSN is a higher-order feedforward polynomial neural network consisting of a single hidden layer. The hidden layer has summing units, whereas the output layer has product units. PSN is thus a three-layer network consisting of input units, summation units, and one product layer. Let the weight vectors connecting the input units and the summing units be given, and write them collectively as one weight vector. We include a special input unit corresponding to the biases, with fixed value −1. The structure of PSN is shown in Figure 1.
Assume that the activation function is given. For an input, the output of the network is
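As a concrete illustration of this forward pass (summing units feeding a single product unit, followed by the activation), consider the following sketch; the sigmoid activation and all identifiers here are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pi_sigma_forward(W, x, g=sigmoid):
    """Output of a pi-sigma network.

    W : (N, p) array, one weight vector per summing unit (bias folded in).
    x : (p,) input vector whose last component is the fixed bias input -1.
    Each summing unit computes h_k = w_k . x; the single product unit
    multiplies them, and the activation g is applied to the product.
    """
    h = W @ x               # outputs of the N summing units
    return g(np.prod(h))    # product unit, then activation
```

Note that the number of adjustable weights grows only linearly in the number of summing units, which is the economy over general higher-order networks mentioned above.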
The network is supplied with a given set of training samples. The error function with an inner-penalty term is given by
where the inner-penalty coefficient is a fixed positive constant. The gradient of the error function is given by
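For concreteness, an inner-penalty error function of this kind is commonly written in the following form, where the symbols $O^{j}$, $g$, $w_k$, $x^{j}$ and the penalty coefficient $\lambda>0$ are assumed notation rather than taken from the paper:

```latex
E(w) \;=\; \frac{1}{2}\sum_{j=1}^{J}\Bigl(O^{j} - g\Bigl(\prod_{k=1}^{N} w_k \cdot x^{j}\Bigr)\Bigr)^{2}
\;+\; \lambda \sum_{k=1}^{N} \lVert w_k \rVert^{2}
```

The first term measures the squared output error over the $J$ training samples, and the second term penalizes the magnitudes of the weight vectors of the summing units.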
Given an initial weight vector, the online gradient method with inner penalty updates the weights iteratively by
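In generic form, such an online update can be written as follows, with $\eta_m$ the learning rate, $E_j$ the penalized error on the $j$-th sample, and the superscript $m$ indexing the iterations (notation assumed):

```latex
w_k^{m+1} \;=\; w_k^{m} \;-\; \eta_m \,\nabla_{w_k} E_j\bigl(w^{m}\bigr),
\qquad k = 1, \dots, N
```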
where the learning rate of each training cycle appears. We denote by ‖·‖ the usual Euclidean norm and the corresponding induced matrix norm, and the following assumptions are imposed throughout this paper.
The learning rate and the penalty parameter are chosen to satisfy the condition:
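A minimal sketch of one such online step, assuming the sigmoid activation and the squared-error-plus-penalty objective sketched earlier (all identifiers here are hypothetical, not the paper's):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def dsigmoid(t):
    s = sigmoid(t)
    return s * (1.0 - s)

def online_step(W, x, target, eta, lam):
    """One online gradient update with an inner-penalty term.

    The penalized per-sample error is assumed to be
        e(W) = 0.5 * (g(prod_k w_k.x) - target)**2 + lam * sum_k ||w_k||^2,
    and each weight vector w_k is moved along the negative gradient,
    evaluated at the current weights.
    """
    h = W @ x                             # summing-unit outputs h_k
    prod = np.prod(h)
    err = sigmoid(prod) - target
    for k in range(W.shape[0]):
        rest = np.prod(np.delete(h, k))   # product of the other units
        grad = err * dsigmoid(prod) * rest * x + 2.0 * lam * W[k]
        W[k] = W[k] - eta * grad
    return W
```

For a small enough learning rate, one such step decreases the penalized per-sample error, which is the behavior the monotonicity result below formalizes for the full error function.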
are contained in a bounded closed region, and there exist points in the set.
3. Preliminary Lemmas
The next lemma presents the monotonicity of the sequence. It is essential for the proof of the weak convergence of PSN with penalty, presented in the following theorems. For the sake of description, we denote
To begin with, we present a few lemmas in preparation for proving the theorems.
Lemma 1. Let Assumptions 1~2 be valid. Then there hold
Proof. By Assumption 2 and the Cauchy–Schwarz inequality, we have
where . Similarly, we get
Here, . This completes the proof.
Lemma 2. Suppose Assumptions 1~3 are satisfied and the weight sequence is generated by (4)~(5). Then
Proof. Applying Taylor’s formula to expand at , we have
where the intermediate points are on the line segment between and . Summing (14) by accumulation and using (2), (4), (5) and Taylor’s formula, we obtain
By Assumption 1, (4)~(5), Lemma 1 and the mean value theorem, we have
Combining this with (1) gives
Here, . By (4), (9) in Lemma 1 and the Cauchy–Schwarz inequality, we have
where . It follows from Assumptions 1~2, (2) and Taylor’s formula that
where . By Assumption 2 and (4), we obtain
where . Substituting (20)~(24) into (1) gives
This completes the proof.
Lemma 3. Suppose that is continuous and differentiable on a compact set and has only a finite number of stationary points. If a sequence satisfies , then there exists a point such that
Proof. This result is essentially the same as Theorem 14.1.5 in , and the detailed proof is thus omitted.
4. Convergence Theorems
We now state and prove the convergence theorems that we need.
Theorem 1 (Monotonicity). Let Assumptions 1~3 be valid and let the weight sequence be generated by (4)~(5). Then
Proof. By Assumption 3, we have
Combining this with (28) and Lemma 2, we have
This completes the proof of the Theorem 1.
Theorem 2 (Boundedness). Suppose the assumptions of Theorem 1 are valid. Then the weight sequence generated by (4)~(5) is uniformly bounded.
Proof. By Assumption 1 and Theorem 1, we have
From (2) and (30), we get
By (4) ~ (5), we have
Denote the second part of (33) by ; obviously . Let be the orthogonal complement space of . We divide into , where and . Then . Applying this to (33), we have
Suppose is a basis of the space . There exist coefficients such that . Then we get
Since is a basis, the coefficient determinant is not equal to zero, and the system of linear equations has a unique solution. Denote the coefficient determinant by . Then the solution is as follows
Let the maximum absolute value of all sub-determinants of rank of the coefficient determinant be ; then . By (34), we have . Denote ; then
That is, the coefficients are uniformly bounded. So from (29), we know the weights are uniformly bounded. In all, the sequence is uniformly bounded; i.e., there exists a bounded closed region such that . This completes the proof.
Theorem 3 (Weak convergence). Suppose that Assumptions 1~3 are valid and the weight sequence is generated by (4)~(5). Then
Furthermore, if Assumption 4 is also valid, we have the strong convergence result: there exists such that
Proof. By (28), setting such that , we have
Since for any , we set
Combining (3)~(5) immediately gives the weak convergence result:
Next we prove the strong convergence. It follows from (4)~(5) and (42) that
Note that the error function defined in (2) is continuously differentiable. By (43), Assumption 4 and Lemma 3, we immediately get the desired result. This completes the proof.
5. Conclusion
In this paper, the monotonicity of the error function in formula (2) and the boundedness of the weight sequence generated by (4)~(5) for the online gradient method with inner-penalty have been established. Under these conditions, both weak and strong convergence theorems are proved.
Acknowledgements
We gratefully acknowledge Dalanj University for supporting this research. We also thank the anonymous reviewers and the editors for their helpful comments and valuable suggestions during the course of this research.