American Journal of Neural Networks and Applications
Volume 2, Issue 1, February 2016, Pages: 1-5

Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms

Kh. Sh. Mohamed1, 2, Xiong Yan3, Y. Sh. Mohammed4, 5, Abd-Elmoniem A. Elzain5, 6, Habtamu Z. A.2,
Abdrhaman M. Adam2

1Mathematical Department, College of Science, Dalanj University, Dalanj, Sudan

2School of Mathematical Sciences, Dalian University of Technology, Dalian, China

3School of Science, Liaoning University of Science & Technology, Anshan, China

4Physics Department, College of Education, Dalanj University, Dalanj, Sudan

5Department of Physics, College of Science & Art, Qassim University, Oklat Al- Skoor, Saudi Arabia

6Department Department of Physics, University of Kassala, Kassala, Sudan

Email address:

(Kh. Sh. Mohamed)
(Xiong Yan)
(Y. Sh. Mohammed)
(Abd-Elmoniem A. E.)
(Habtamu Z. A)
(A. M. Adam)

To cite this article:

Kh. Sh. Mohamed, Xiong Yan, Y. Sh. Mohammed, Abd-Elmoniem A. Elzain, Habtamu Z. A., Abdrhaman M. Adam. Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms. American Journal of Neural Networks and Applications. Vol. 2, No. 1, 2016, pp. 1-5. doi: 10.11648/j.ajnna.20160201.11

Received: March 14, 2016; Accepted: March 30, 2016; Published: May 10, 2016

Abstract: This paper investigates an online gradient method with inner- penalty for a novel feed forward network it is called pi-sigma network. This network utilizes product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using a fewer number of weights and processing units. Penalty term methods have been widely used to improve the generalization performance of feed forward neural networks and to control the magnitude of the network weights. The monotonicity of the error function and weight boundedness with inner- penalty term and both weak and strong convergence theorems in the training iteration are proved.

Keyword: Convergence, Pi-sigma Network, Online Gradient Method, Inner-penalty, Boundedness

1. Introduction

A novel higher order feedforward polynomial neural network is known to provide inherently more powerful mapping abilities than traditional feed forward neural network called the pi-sigma network (PSN) [2]. This network utilizes product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using a fewer number of weights and processing units. The neural networks consisting of the PSN modules has been used effectively in pattern classification and approximation problems [1, 4, 10, 11].There are two ways of training to updating weight: The first approach, batch (offline) training[18],the weights are modified after each training pattern is presented to the network. Second approach, online training, the weights updating immediately after each training sample is fed see [13]. The penalty term is often introduced into the network training algorithms has been widely used so as to control the magnitude of the weights and to improve the generalization performance of the network [6, 8], here the generalization performance refers to the capacity of a neural network to give correct outputs for untrained data. Specially cause, in the second approach the training weights updating become very large and over-fitting tends to occur, by adding the penalty term in into the cost function, when use second approach has been successfully application see [3, 7, 12, 14], which acts as a brute-force to drive unnecessary weights to zero and to prevent the weights from taking too large in the training process. In the work area of penalty term at the same of the inner-penalty term (IP), which have worked to reduce the magnitude of the network weights with efficiency improve the generalization performance of the network [5, 9, 17]. In this paper, we prove the (strong and weak) convergence of the online gradient with inner penalty and the monotonicity of the error function and the weight sequence are uniformly bounded during the training procedure with inner-penalty.

The rest of this paper is organized as follows. The neural network structure and the online gradient method with inner-penalty are described in Section 2. The preliminary lemmas are disruption in Section 3. The convergence results are presented and the rigorous proofs of the main results are provided in Section 4. Finally, in Section 5 we conclusions this study.

2. PSN-Algorithm

PSN is a higher order feed forward polynomial neural network consisting of a single hidden layer. The hidden layer has summing units where as the output layer has product units. PSN, which has a three-layer network consisting ofinput units,summation units, and 1 product layers. Letthe weight vectors connecting the input and summing units, and write . We have included a special input unit  corresponding to the biases  with fixed value-1. The structure of PSN is shown in Figure 1.

Figure 1. PSN structure with a single output

Where and  Assume  is a given activation function. For an input , the output of the network is


The network supplied with a given set of training samples . The error function with a inner penalty given by


Where is a inner penalty coefficient and The gradient function is given by


Given an initial weight , the online method with inner penalty updates them iteratively by the form



Where is the learning rate in the  training cycle.We denote by the usual Euclidean norm and the corresponding derived matrix norm and the following Assumption is imposed throughout this paper.

Assumption 1.

Assumption 2.

Assumption 3.

The learning rate and penalty parameter are chosen to satisfy the condition:

Assumption 4.

are contained in bounded closed region , and there are exist points in set .

3. Preliminary Lemmas

The next lemma present the montonicity of the sequence . It is essential for the proof of weakly convergence of PSN with penalty, presented in the following Theorems. For sake of  description, we denote




To begin with, first we present a few lemmas as preparation to prove Theorems

Lemma 1.Let Assumption 1~2 are valid, there hold



Proof .By Assumption 2and Cauchy- Schwartz inequality, we have


where. Similarly, we get


Here, . This is completes the proof.

Lemma 2.Suppose Assumptions 1~3 are satisfied, and the weight sequenceis generated by (4) ~ (5), then


Proof. Applying Taylor’s formula to extend  at , we have


where are on the line segment between andAfter dealing with (14) by accumulation  for  we obtain from (2), (4), (5) and Taylor’s formula we obtain







By Assumption 1, (4) ~ (5), Lemma 1and the mean value theorem gives


Thus with (1) gives



Here, . By (4), (9) in Lemma 1 and Cauchy- Schwartz inequality, we have



where .  It follows from Assumption 1~2, (2) and Taylor’s formula, we obtain


where By Assumption (2) and (4) leads


where. Collate (20) ~(24) into (1) gives


This is completes the proof.

Lemma 3.Suppose that  is continuous and differentiable on a compact set  and that  has only finite number of point. If a sequence  satisfies then , Then there exists a point  such that

Proof. This result is basically the same as Theorem 14.1.5 in [16], and the detailed proof is thus omitted.

4. Convergence Theorems

Now, we can elucidate and proofs the convergence theorems, which we needed

Theorem 1.(Monotonicity):  Let Assumption 1~3 are valid and the weight sequencebe generated by (4) ~ (5),then


Proof. Let


By Assumption 3,which satisfies


Thus with (28) and Lemma 2, we have



This completes the proof of the Theorem 1.

Theorem 2. (Boundednss): Suppose that Assumption of Theorem 1 are valid, the weight sequencebe generated by (4) ~ (5) are uniformly bounded.

Proof. By Assumption 1, and Theorem 1, we have




From (2), (30) gives


By (4) ~ (5), we have


Let the second part of above equation be , Denote  and  be the orthogonal complement space of .  Denote the second part of (33) by , obviously  . we divide  into , where  and . Then . Applying this to (33) we have


Suppose is a base of the space . There are  such that . Then  we get


Is a base, the coefficient determinant equal to zero, and the system of the linear equations has a unique solution. Assume that the coefficient determinant equals to. Then the solution is as follows

Then the solution is as follows


Let the maximum absolute value of all the sub-determinant with rank  of the coefficient determinant is , then . By (34) we have . Denote  , then


That is  are bounded uniformly bounded. So from (29), we know  are uniformly bounded. In all, we getare uniformly bounded, i.e., there exist a bounded closed region  such that .

Theorem 3. (Weak convergence): Suppose that Assumption 1~3 are valid and the weight sequencebe generated by (4) ~ (5), then


Furthermore, if Assumption  4  is also valid,  we have the strong convergence: There exists   such that


Proof .By (28) and setting  such that , we have



Since  for any . We set


Combining (3) ~ (5), immediately gives the weak convergence result:


Next we prove the strong convergence it follows from (4)~(5) and (42) that leads


Note that the error function  defined in (2) is continuously differentiable. By (43), Assumptions 4 and Lemma 3, immediately get the desired result. This completes the proof.

5. Conclusion

Through our study of this paper, the monotoncity of the error function in formula (2) and the weight sequence boundedness  via formula (4) ~ (5) for the online gradient method with inner-penalty are presented, under those condition both weakly and strongly convergence theorems are proved.


We gratefully acknowledge Dalanj University for supporting this research. And our thanks to the anonymous reviewers and the editors for their previous helpful comments and valuable suggestion for their kind helps during the period of the research.


  1. A J Hussaina and P Liatsisb, Recurrent pi-sigma networks for DPCM image coding. Neurocomputing, 55(2002) 363-382.
  2. Y Shin and J Ghosh, The pi-sigma network: An efficient higher-order neural network for pattern classification and function approximation. International Joint Conference on Neural Networks, 1(1991) 13–18.
  3. P L Bartlett, For valid generalization, the size of the weights is more important than the size of the network, Advances in Neural Information Processing Systems 9 (1997) 134–140.
  4. L J Jiang, F Xu and S R Piao, Application of pi-sigma neural network to real-time classification of seafloor sediments. Applied Acoustics, 24(2005) 346–350.
  5. R Reed, Pruning algorithms-a survey. IEEE Transactions on Neural Networks 8 (1997) 185–204.
  6. G Hinton, Connectionist learning procedures, Artificial Intelligence 40(1989)185-243.
  7. S Geman, E Bienenstock, R Doursat, Neural networks and the bias/variance dilemma,  Neural Computation 4 (1992) 1–58.
  8. S Loone and G Irwin, Improving neural network training solutions using regularisation,Neurocomputing37(2001)71-90.
  9. A S Weigend, D E Rumelhart and B A Huberman, Generalization by weight-elimination applied to currency exchange rate prediction. Proc. Intl Joint Conf. on Neural Networks 1(Seatle, 19916) 837-841.
  10. Y Shin and J Ghosh, Approximation of multivariate functions using ridge polynomial networks, International Joint Conference on Neural Networks 2 (1992) 380-385.
  11. M Sinha, K Kumar and P K Kalra, Some new neural network architectures with improved learning schemes. Soft Computing, 4 (2000) 214-223.
  12. R Setiono, A penalty-function approach for pruning feed forward neural networks, Neural Networks 9 (1997) 185–204.
  13. W Wu and Y S Xu, Deterministic convergence of an online gradient method for neural networks, Journal of Computational and Applied Mathematics 144 (1-2) (2002) 335-347.
  14. H S Zhang and W Wu, Boundedness and convergence of online gradient method with penalty for linear output feed forward neural networks, Neural Process Letters 29 (2009) 205–212.
  15. H F Lu, W Wu, C Zhang and X Yan, Convergence of Gradient Descent Algorithm for Pi- Sigma Neural Networks, Journal of Information and Computational Science 3: 3 (2006) 503-509.
  16. YX Yuan and WY Sun, Optimization Theory and Methods, Science Press, Beijing, 2001.
  17. J Kong and W Wu, Online gradient methods with a punishing term for neural networks. Northeast Math. J 1736(2001) 371-378.
  18. W Wu, G R Feng and X Z Li, Training multiple perceptrons via minimization of sum of ridge functions, Advances in Computational Mathematics 17 (2002) 331-347.

Article Tools
Follow on us
Science Publishing Group
NEW YORK, NY 10018
Tel: (001)347-688-8931