Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms
Kh. Sh. Mohamed^{1,}^{ 2}, Xiong Yan^{3}, Y. Sh. Mohammed^{4,}^{ 5}, Abd-Elmoniem A. Elzain^{5,}^{ 6}, Habtamu Z. A.^{2},
Abdrhaman M. Adam^{2}
^{1}Mathematical Department, College of Science, Dalanj University, Dalanj, Sudan
^{2}School of Mathematical Sciences, Dalian University of Technology, Dalian, China
^{3}School of Science, Liaoning University of Science & Technology, Anshan, China
^{4}Physics Department, College of Education, Dalanj University, Dalanj, Sudan
^{5}Department of Physics, College of Science & Art, Qassim University, Oklat Al-Skoor, Saudi Arabia
^{6}Department of Physics, University of Kassala, Kassala, Sudan
To cite this article:
Kh. Sh. Mohamed, Xiong Yan, Y. Sh. Mohammed, Abd-Elmoniem A. Elzain, Habtamu Z. A., Abdrhaman M. Adam. Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms. American Journal of Neural Networks and Applications. Vol. 2, No. 1, 2016, pp. 1-5. doi: 10.11648/j.ajnna.20160201.11
Received: March 14, 2016; Accepted: March 30, 2016; Published: May 10, 2016
Abstract: This paper investigates an online gradient method with an inner-penalty term for a feedforward network called the pi-sigma network. This network utilizes product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using fewer weights and processing units. Penalty-term methods have been widely used to improve the generalization performance of feedforward neural networks and to control the magnitude of the network weights. The monotonicity of the error function and the boundedness of the weights under the inner-penalty term are established, and both weak and strong convergence theorems for the training iteration are proved.
Keywords: Convergence, Pi-sigma Network, Online Gradient Method, Inner-penalty, Boundedness
1. Introduction
The pi-sigma network (PSN) [2] is a higher-order feedforward polynomial neural network known to provide inherently more powerful mapping abilities than traditional feedforward neural networks. This network utilizes product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using fewer weights and processing units. Neural networks consisting of PSN modules have been used effectively in pattern classification and approximation problems [1, 4, 10, 11]. There are two ways of updating the weights during training. In the first approach, batch (offline) training [18], the weights are modified only after all training patterns have been presented to the network. In the second approach, online training, the weights are updated immediately after each training sample is fed in; see [13]. A penalty term is often introduced into network training algorithms to control the magnitude of the weights and to improve the generalization performance of the network [6, 8]; here, generalization performance refers to the capacity of a neural network to give correct outputs for untrained data. In the online approach in particular, the weights can grow very large during training and over-fitting tends to occur; adding a penalty term to the cost function has been applied successfully in this setting, see [3, 7, 12, 14], where it acts as a brute force driving unnecessary weights toward zero and preventing the weights from growing too large during training. Among penalty methods, the inner-penalty term (IP) has been shown to reduce the magnitude of the network weights and to improve the generalization performance of the network efficiently [5, 9, 17].
In this paper, we prove the weak and strong convergence of the online gradient method with inner penalty, and we show that the error function is monotone and that the weight sequence remains uniformly bounded during the training procedure.
The rest of this paper is organized as follows. The network structure and the online gradient method with inner penalty are described in Section 2. Preliminary lemmas are presented in Section 3. The convergence results and rigorous proofs of the main results are provided in Section 4. Finally, Section 5 concludes this study.
2. PSN-Algorithm
PSN is a higher-order feedforward polynomial neural network consisting of a single hidden layer. The hidden layer has summing units, whereas the output layer has product units. PSN is thus a three-layer network consisting of $N$ input units, $K$ summing units, and one product unit. Let $w_k = (w_{k1}, \dots, w_{kN})^T \in \mathbb{R}^N$ ($k = 1, \dots, K$) be the weight vectors connecting the input and summing units, and write $W = (w_1^T, \dots, w_K^T)^T$. We have included a special input unit corresponding to the biases, with fixed value $-1$. The structure of PSN is shown in Figure 1.
Here $k = 1, \dots, K$, and $g : \mathbb{R} \to \mathbb{R}$ is a given activation function. For an input $x \in \mathbb{R}^N$, the output of the network is
$y = g\big(\prod_{k=1}^{K} w_k \cdot x\big).$ (1)
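As an illustration, the forward pass (1) can be evaluated directly. The sigmoid activation, the sizes $K = 2$ and $N = 3$, and the random weights below are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def sigmoid(t):
    # Activation g; the paper only assumes some given activation function.
    return 1.0 / (1.0 + np.exp(-t))

def psn_output(W, x):
    """Pi-sigma output of eq. (1): y = g(prod_k (w_k . x)).
    W: (K, N+1) weight matrix, one row per summing unit.
    x: (N+1,) input whose last component is the fixed bias input -1."""
    sums = W @ x                   # K summing-unit values w_k . x
    return sigmoid(np.prod(sums))  # product unit, then activation

rng = np.random.default_rng(0)
K, N = 2, 3
W = rng.normal(scale=0.5, size=(K, N + 1))
x = np.append(rng.normal(size=N), -1.0)   # append the bias input -1
y = psn_output(W, x)
assert 0.0 < y < 1.0   # the sigmoid keeps the output in (0, 1)
```

Note that each summing unit computes an ordinary inner product; the higher-order behavior comes solely from the product unit multiplying the $K$ sums together.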
The network is supplied with a given set of training samples $\{x^j, O^j\}_{j=1}^{J}$. The error function with an inner-penalty term is given by
$E(W) = \frac{1}{2}\sum_{j=1}^{J}\Big(O^j - g\big(\prod_{k=1}^{K} w_k \cdot x^j\big)\Big)^2 + \lambda \sum_{k=1}^{K} \|w_k\|^2,$ (2)
where $\lambda > 0$ is the inner-penalty coefficient. The gradient of the error function with respect to $w_k$ is given by
$E_{w_k}(W) = \sum_{j=1}^{J}\Big(g\big(\prod_{l=1}^{K} w_l \cdot x^j\big) - O^j\Big)\, g'\big(\prod_{l=1}^{K} w_l \cdot x^j\big)\Big(\prod_{l \neq k} w_l \cdot x^j\Big)\, x^j + 2\lambda w_k.$ (3)
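The penalized error (2) and its gradient (3) can be sketched in a few lines; the finite-difference check at the end confirms that the coded gradient matches the coded error. The sigmoid activation, the problem sizes, and the random data are our own illustrative assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def error_ip(W, X, O, lam):
    # Penalized error of eq. (2): squared loss plus lam * sum_k ||w_k||^2.
    p = np.prod(X @ W.T, axis=1)          # product-unit input for every sample
    return 0.5 * np.sum((sigmoid(p) - O) ** 2) + lam * np.sum(W ** 2)

def grad_ip(W, X, O, lam):
    # Gradient of eq. (2) w.r.t. each w_k, matching eq. (3).
    S = X @ W.T                           # (J, K) summing-unit values
    p = np.prod(S, axis=1)
    y = sigmoid(p)
    dy = y * (1.0 - y)                    # g' for the sigmoid
    G = np.empty_like(W)
    for k in range(W.shape[0]):
        others = np.prod(np.delete(S, k, axis=1), axis=1)  # prod over l != k
        G[k] = ((y - O) * dy * others) @ X + 2.0 * lam * W[k]
    return G

# Finite-difference check that grad_ip really is the gradient of error_ip.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.3, size=(2, 4))
X = rng.normal(size=(5, 4))
O = rng.uniform(size=5)
lam = 0.01
G = grad_ip(W, X, O, lam)
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
num = (error_ip(Wp, X, O, lam) - error_ip(W, X, O, lam)) / eps
assert abs(num - G[0, 0]) < 1e-4
```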
Given an initial weight $W^0$, the online gradient method with inner penalty updates the weights iteratively: for $m = 0, 1, 2, \dots$ and $j = 1, \dots, J$,
$w_k^{mJ+j} = w_k^{mJ+j-1} + \Delta w_k^{mJ+j-1}, \quad k = 1, \dots, K,$ (4)
$\Delta w_k^{mJ+j-1} = -\eta_m\Big[\Big(g\big(\prod_{l=1}^{K} w_l^{mJ+j-1} \cdot x^j\big) - O^j\Big)\, g'\big(\prod_{l=1}^{K} w_l^{mJ+j-1} \cdot x^j\big)\Big(\prod_{l \neq k} w_l^{mJ+j-1} \cdot x^j\Big) x^j + \frac{2\lambda}{J}\, w_k^{mJ+j-1}\Big],$ (5)
where $\eta_m > 0$ is the learning rate in the $m$-th training cycle. We denote by $\|\cdot\|$ the usual Euclidean norm and the corresponding induced matrix norm. The following assumptions are imposed throughout this paper.
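A single online update of the form (4)-(5) can be sketched as follows. The sigmoid activation and the even split of the penalty over the $J$ samples of a cycle are our own assumptions for illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def online_step(W, x, o, eta, lam, J):
    """One update of eqs. (4)-(5): descend the gradient of the single-sample
    error, with the inner penalty split evenly over the J samples of a cycle
    (the even split is our assumption for this sketch)."""
    s = W @ x                      # summing-unit values for this sample
    p = np.prod(s)
    y = sigmoid(p)
    dy = y * (1.0 - y)             # g' for the sigmoid
    W_new = W.copy()
    for k in range(W.shape[0]):
        others = np.prod(np.delete(s, k))   # prod over l != k
        g_k = (y - o) * dy * others * x + 2.0 * lam * W[k] / J
        W_new[k] = W[k] - eta * g_k
    return W_new

rng = np.random.default_rng(2)
W = rng.normal(scale=0.3, size=(2, 4))
x = np.append(rng.normal(size=3), -1.0)     # bias input -1
W1 = online_step(W, x, o=0.7, eta=0.1, lam=1e-3, J=8)
assert W1.shape == W.shape and not np.allclose(W1, W)
```

All $K$ weight vectors are updated from the same pre-update summing values, so the step uses the gradient evaluated at $W^{mJ+j-1}$, as in (4)-(5).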
Assumption 1.
Assumption 2.
Assumption 3.
The learning rate and penalty parameter are chosen to satisfy the condition:
Assumption 4.
The weight sequence $\{W^n\}$ is contained in a bounded closed region $\Omega$, and the set $\Phi = \{W \in \Omega : E_W(W) = 0\}$ contains only finitely many points.
3. Preliminary Lemmas
The next lemma presents the monotonicity of the error sequence. It is essential for the proof of the weak convergence of PSN with penalty, presented in the following theorems. For the sake of description, we denote
(6)
(7)
(8)
We first present a few lemmas in preparation for proving the theorems.
Lemma 1. Let Assumptions 1-2 be valid. Then
(9)
(10)
Proof. By Assumption 2 and the Cauchy-Schwarz inequality, we have
(11)
Similarly, we get
(12)
This completes the proof.
Lemma 2. Suppose Assumptions 1-3 are satisfied and the weight sequence is generated by (4)-(5). Then
(13)
Proof. Applying Taylor's formula to expand the error function, we have
(14)
where the intermediate points lie on the line segment between the corresponding successive weights. Accumulating (14) over $j = 1, \dots, J$ and using (2), (4), (5) and Taylor's formula, we obtain
(15)
where
(16)
(17)
(18)
(19)
By Assumption 1, (4)-(5), Lemma 1 and the mean value theorem, we get
(20)
Together with (1), this gives
(21)
By (4), (9) in Lemma 1 and the Cauchy-Schwarz inequality, we have
(22)
It follows from Assumptions 1-2, (2) and Taylor's formula that
(23)
By Assumption 2 and (4), this leads to
(24)
Substituting (20)-(24) into (15) gives
(25)
This completes the proof.
Lemma 3. Suppose that $F : \Omega \to \mathbb{R}$ is continuous and differentiable on a compact set $\Omega$, and that $\Phi = \{W \in \Omega : \nabla F(W) = 0\}$ contains only finitely many points. If a sequence $\{W^n\} \subset \Omega$ satisfies $\lim_{n\to\infty} \|W^{n+1} - W^n\| = 0$ and $\lim_{n\to\infty} \|\nabla F(W^n)\| = 0$, then there exists a point $W^* \in \Phi$ such that $\lim_{n\to\infty} W^n = W^*$.
Proof. This result is essentially the same as Theorem 14.1.5 in [16], and the detailed proof is thus omitted.
4. Convergence Theorems
We now state and prove the required convergence theorems.
Theorem 1 (Monotonicity). Let Assumptions 1-3 be valid and let the weight sequence be generated by (4)-(5). Then
(26)
Proof. Let
(27)
By Assumption 3, this satisfies
(28)
Thus with (28) and Lemma 2, we have
(29)
This completes the proof of Theorem 1.
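A small numerical experiment is consistent with Theorem 1. Running the online method (4)-(5) on synthetic data drives the penalized error (2) downward over the training cycles. The sigmoid activation, the sizes $K = 2$, $N = 3$, $J = 8$, the constants $\eta = 0.05$ and $\lambda = 10^{-3}$, and the even per-sample split of the penalty are all our own illustrative assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cycle_error(W, X, O, lam):
    # Penalized error E(W) of eq. (2).
    p = np.prod(X @ W.T, axis=1)
    return 0.5 * np.sum((sigmoid(p) - O) ** 2) + lam * np.sum(W ** 2)

rng = np.random.default_rng(0)
K, N, J = 2, 3, 8
W = rng.normal(scale=0.3, size=(K, N + 1))
X = np.hstack([rng.normal(size=(J, N)), -np.ones((J, 1))])  # bias input -1
O = rng.uniform(0.2, 0.8, size=J)
eta, lam = 0.05, 1e-3

errors = [cycle_error(W, X, O, lam)]
for m in range(200):               # training cycles
    for j in range(J):             # one online pass, eqs. (4)-(5)
        s = W @ X[j]
        p = np.prod(s)
        y = sigmoid(p)
        dy = y * (1.0 - y)
        grad = np.empty_like(W)
        for k in range(K):
            grad[k] = ((y - O[j]) * dy * np.prod(np.delete(s, k)) * X[j]
                       + 2.0 * lam * W[k] / J)
        W -= eta * grad            # simultaneous update of all w_k
    errors.append(cycle_error(W, X, O, lam))

assert errors[-1] < errors[0]      # the error decreases over training
```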
Theorem 2 (Boundedness). Suppose the assumptions of Theorem 1 are valid. Then the weight sequence generated by (4)-(5) is uniformly bounded.
Proof. By Assumption 1, and Theorem 1, we have
(30)
and
(31)
From (2), (30) gives
(32)
By (4) ~ (5), we have
(33)
Denote the second part of (33) by $\sigma^n$; obviously $\{\sigma^n\}$ is bounded. Let $V = \mathrm{span}\{x^1, \dots, x^J\}$ and let $V^{\perp}$ be the orthogonal complement space of $V$. We divide $W^n$ into $W^n = U^n + \tilde{U}^n$, where $U^n \in V$ and $\tilde{U}^n \in V^{\perp}$. Applying this decomposition to (33), we have
(34)
Suppose $\{v_1, \dots, v_r\}$ is a basis of this space. There are coefficients $c_1^n, \dots, c_r^n$ expanding the corresponding component of $W^n$ in this basis. Then we get
(35)
Since $\{v_1, \dots, v_r\}$ is a basis, the coefficient determinant is not equal to zero, and the system of linear equations has a unique solution. Then the solution is as follows
(36)
Let the maximum absolute value of all the sub-determinants of the coefficient determinant be $M$. By (34), the coefficients are bounded. Then
(37)
That is, the coefficients are uniformly bounded. So from (29), we know the corresponding components are uniformly bounded. In all, we get that the weight sequence $\{W^n\}$ is uniformly bounded, i.e., there exists a bounded closed region $\Omega$ such that $\{W^n\} \subset \Omega$. This completes the proof.
Theorem 3 (Weak convergence). Suppose that Assumptions 1-3 are valid and the weight sequence is generated by (4)-(5). Then
(38)
Furthermore, if Assumption 4 is also valid, we have the strong convergence: there exists a point $W^*$ such that
(39)
Proof. By (28), we have
(40)
Since this holds for any $n$, we set
(41)
Combining (3)-(5) immediately gives the weak convergence result:
(42)
Next, we prove the strong convergence. It follows from (4)-(5) and (42) that
(43)
Note that the error function defined in (2) is continuously differentiable. By (43), Assumption 4 and Lemma 3, we immediately get the desired result. This completes the proof.
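The conclusions of Theorems 2 and 3 can likewise be checked numerically: after a long online run, the full gradient (3) should be much smaller than at initialization, while the weights remain bounded. As before, the sigmoid activation, all sizes and constants, the random data, and the even per-sample penalty split are our own illustrative assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_full(W, X, O, lam):
    # Full gradient E_W(W) of eq. (2), as in eq. (3).
    S = X @ W.T
    p = np.prod(S, axis=1)
    y = sigmoid(p)
    dy = y * (1.0 - y)
    G = np.empty_like(W)
    for k in range(W.shape[0]):
        others = np.prod(np.delete(S, k, axis=1), axis=1)
        G[k] = ((y - O) * dy * others) @ X + 2.0 * lam * W[k]
    return G

rng = np.random.default_rng(0)
K, N, J = 2, 3, 8
W = rng.normal(scale=0.3, size=(K, N + 1))
X = np.hstack([rng.normal(size=(J, N)), -np.ones((J, 1))])  # bias input -1
O = rng.uniform(0.2, 0.8, size=J)
eta, lam = 0.05, 1e-3

g0 = np.linalg.norm(grad_full(W, X, O, lam))
for m in range(500):               # training cycles
    for j in range(J):             # online pass, eqs. (4)-(5)
        s = W @ X[j]
        p = np.prod(s)
        y = sigmoid(p)
        dy = y * (1.0 - y)
        for k in range(K):
            W[k] -= eta * ((y - O[j]) * dy * np.prod(np.delete(s, k)) * X[j]
                           + 2.0 * lam * W[k] / J)
g1 = np.linalg.norm(grad_full(W, X, O, lam))

assert g1 < g0                     # gradient norm shrinks, cf. eq. (38)
assert np.linalg.norm(W) < 100.0   # weights stay bounded, cf. Theorem 2
```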
5. Conclusion
In this paper, the monotonicity of the error function (2) and the uniform boundedness of the weight sequence generated by (4)-(5) for the online gradient method with an inner-penalty term have been established; under these conditions, both weak and strong convergence theorems are proved.
Acknowledgment
We gratefully acknowledge Dalanj University for supporting this research. We also thank the anonymous reviewers and the editors for their helpful comments and valuable suggestions during the course of this research.
References