Distributed Subgradient Algorithm for Multi-Agent Convex Optimization with Global Inequality and Equality Constraints
Li Xiao^{*}, Junjie Bao, Xi Shi
Department of Mathematics and Information Engineering, Chongqing University of Education, Chongqing, PR China
Email address:
To cite this article:
Li Xiao, Junjie Bao, Xi Shi. Distributed Subgradient Algorithm for Multi-Agent Convex Optimization with Global Inequality and Equality Constraints. Applied and Computational Mathematics. Vol. 5, No. 5, 2016, pp. 213-229. doi: 10.11648/j.acm.20160505.15
Received: August 7, 2016; Accepted: October 5, 2016; Published: October 27, 2016
Abstract: In this paper, we present an improved subgradient algorithm for solving a general multi-agent convex optimization problem in a distributed way,where the agents are to jointly minimize a global objective function subject to a global inequality constraint, a global equality constraint and a global constraint set. The global objective function is a combination of local agent objective functions and the global constraint set isthe intersection of each agent local constraint set. Our motivation comes from networking applications where dual and primal-dual subgradient methods have attracted much attention in the design of decentralized network protocols. Our main focus is on constrained problems where the local constraint sets are identical. Thus, we propose a distributed primal-dual subgradient algorithm, which isbased on the description of the primal-dual optimal solutions as the saddle points of the penalty functions. We show that,the algorithm can be implemented over networks with changing topologies but satisfying a standard connectivity property, and allow the agents to asymptotically converge to optimal solution with optimal value of the optimization problem under the Slater’s condition.
Keywords: Consensus, Saddle Point, Distributed Optimization, Subgradient Algorithm
1. Introduction
In recent years, distributed optimization and control have developed rapidly, and have been welcomed in the fields of industry and national defense, including smart grid, sensor network, social network and information system (Cyber-Physical system). Distributed optimization problems of multi-agent systems appear different kinds of distributed processing issues such as distributed estimation, distributed motion planning, distributed resource allocation and distributed congestion control [1-12]. The main focus is to solve a distributed optimization problem where the global objective function is composed of a sum of local objective functions, each of which is only known by one agent. Distributed optimization problems were first studied systematically in [1] where the union of the graphs was assumed to be strongly connected among each time interval of a certain bounded length and the adjacency matrices were doubly stochastic. A distributed subgradient method was introduced to solve the distributed optimization and error bounds on the performance index functions were given. As a continuation of [1], a distributed subgradient projection algorithm was developed in [2] for distributed optimization where each agent was constrained to remain in a closed convex set and the paper gave corresponding convergence analysis on identical closed convex sets and on fully connected graphs with non-identical closed convex sets. Inspired by the works of [1, 2], the algorithms proposed in [1] and [2] were studied in the random environment [3] and [4], where the agents had the same state constraint. In [5], the communication topology was undirected and each possible communication link was functioning with a given probability. Thus, the expected communication topology is essentially fixed and undirected. Different from [1]-[5], a dual averaging subgradient algorithm was developed and analyzed for randomized graphs under the assumption that all agents remain in the same closed convex set in [6] and it was shown that the number of iterations were required by their algorithm scales inversely in the spectral gap of the network. Moreover, distributed optimization problems with asynchronous step-sizes or inequality-equality constraints or using other algorithms were studied in [7]-[12] and corresponding conditions were given to ensure the system converge to the optimal point or its neighborhood. However, as in [1]-[5], it was assumed in [6]-[12] that the state sets of agents to be identical or the objective function finally converge to only a neighborhood of the optimal set.
In this paper our work is to extend [14] to study the penalty primal-dual subgradient projection algorithm in a more general method. In [14], the authors solved a multi-agent convex optimization problem where the agents subject to a global inequality constraint, a global equality constrain and a global constraint set. In order to solved these constraints, the author in [14] presented two different distributed projection algorithms with three assumptions that the union of the graphs is assumed to be strongly connected among each time interval of a certain bounded length and the adjacency matrices were doubly stochastic and non-degeneracy. However, [14] guaranteed the edge weight matrices of graphs were doubly stochastic (i.e., for all and , and for all and ). Previous work did not perform well on the application of the distributed algorithms in multi-agent network.
Contributions: The subgradient algorithm (we proposed) is different with the approach proposed in [14] in properties and analysis. In our approach, the communication topology is without loss of generality. This paper does not recur to the assumption that the adjacency matrices are doubly stochastic, and we only require the network is weight-balanced, which makes our algorithm more practical. In this paper, we consider a general multi-agent optimization problem where the main focus is to minimize a global objective function which is a sum of local objective functions, subject to global constraints, including an inequality constraint, an equality constraint and a (state) constraint set. Each local objective function is convex and only known by one particular agent. On the other hand, the inequality (resp. equality) constraint is given by a convex (resp. affine) function and known by all agents. Each node has its own convex constraint set, and the global constraint set is defined as their intersection. Particularly, we assume that the local constraint sets are identical. Our main interest is in computing approximate saddle points of the Lagrangian function of a convex constrained optimization problem. To set the stage, we first study the computation of approximate saddle points (as opposed to asymptotically exact solutions) by using the subgradient method with a constant step-size. We consider constant step-size rule because of its simplicity and practical relevance, and because our interest is in generating approximate solutions in finite number of iterations.
The paper is organized as follows. In Section II, we give some basic preliminaries and concepts. Then, in Section III, we present our problem formulation as well as distributed consensus algorithm preliminaries. We then introduce the distributed penalty primal-dual subgradient algorithm with some supporting lemmas and continue with a convergence analysis of the algorithm in Section IV. Furthermore, the properties of the algorithm are explored by employing a numerical example in Section V. Finally, we conclude the paper with a discussion in Section VI.
2. Preliminaries and Notations
In this section, we first introduce some preliminary results about graph theory, the properties of the projection operation on a closed convex set and convex analysis (referring to [13], [14]).
A. Algebraic Graph Theory
The communication among different nodes in an information interplay network can be modeled as a weighted directed graph , where is the set of nodes with representing the th node, is the edge set, and is the weighted adjacency matrix of with nonnegative adjacency elements and zero diagonal elements. A directed edge implies that node can reach node or node can receive information from node . If an edge , then node is called a neighbor of node and . The neighbor node set of node is denoted by , while we define as the number of neighbors of node . The Laplacian matrix associated with the adjacency matrix is defined by ; , which ensures that . The Laplacian matrix has a zero eigenvalue, and the corresponding eigenvector is . Note that the Laplacian matrix of a directed graph is asymmetric. The in-degree and out-degree of node can be respectively defined by the Laplacian matrix as : and . A directed path from node to node is a sequence of edges in the directed graph with distinct nodes . A directed graph is strongly connected if for any two distinct nodes and in the set , there always exists a directed path from node to node . A graph is called an in-degrees (or out-degrees) balanced graph if the in-degrees (or out-degrees) of all nodes in the directed graph are equal. A directed graph with nodes is called a directed tree if it contains edges and there exists a root node with directed paths to every other node. A directed spanning tree of a directed graph is a directed tree that contains all the network nodes.
B. Basic Notations and Concepts
The following notion of saddle point plays a critical role in our paper.
Definition 1 (Saddle point): Consider a convex-concave function , where , and are closed convex subsets in and . We are interested in computing a saddle point of over the set , where a saddle point is defined as a vector pair that satisfies
, for all
In this paper, we do not assume the function at some points are not differentiable, and the subgradient plays the role of the gradient.
Definition 2 : For a given convex function and a point , a subgradient of the function at is a vector such that the following subgradient inequality holds for any :
Similarly, for a given concave function and a point , a supgradient of the function at is a vector such that the following supgradient inequality holds for any :
We use to denote the projection of a vector on a closed convex set , i.e.
In the subsequent development, the properties of the projection operation on a closed convex set play an important role. In particular, we use the projection inequality, i.e., for any vector
for all (1)
We also use the standard non-expansiveness property, i.e.
for any and (2)
In addition, we use the properties given in the following lemma.
Lemma 2.1: Let be a nonempty closed convex set in . Then, we have for any ,
(a) , for all .
(b) , for all .
Proof:
(a) Let be arbitrary. Then, for any , we have
By the projection inequality [cf. (1)], it follows that
implying
, for all
(b) For an arbitrary and for all , we have
By using the inequality of part (a), we obtain
, for all
Part (b) of the preceding lemma establishes a relation between the projection error vector and the feasible directions of the convex set at the projection vector.
The following notations besides those aforementioned will be used throughout this paper. denotes the set of all -dimensional real vector spaces. Given a set , we denote by its convex hull. We write or to denote the transpose of a vector or a matrix . We let the function denote the projection operator onto the non-negative orthant in . Denote and . For a vector , we denote , while is the standard Euclidean norm in the Euclidean space. In this paper, the quantities (e.g., functions, scalars and sets) associated with agent will be indexed by the superscript .
3. Problem Statement
We consider a multi-agent network model. The nodes connectively at time can be represented by a directed weighted graph , where is the set of activated edges at time , i.e., edge if and only if node can receive data from node , and is the adjacency matrix, in which is the weight assigned to the edge at time . Please note that the set is the set of edges with non-zero weights . In this paper the agents are to correspondingly solve the following optimization problem:
(3)
where is a convex objective function of agent , and is a nonempty, closed, compact and convex subset of . In particular, we study the cases where the local constraint sets are identical i.e., for each agent, and is a global decision vector. Assume that is only known by agent . The function is known by all the agents with each component , for , being convex. The inequality is component-wise; i.e., , for all , and represents a global inequality constraint. The function , represents a global equality constraint, and is known by all the agents. Let denote the optimal value of (3) and denote an optimal solution of (3). We assume that the optimal value to be finite. We also represent the optimal solution set by , i.e., . We will assume that in general is non-differentiable.
To generate optional solutions to the primal problem of Eq. (3), we consider optional solutions to its dual problem. Here, the dual problem is the one arising from penalty relaxation of the inequality constraints and equality constraints . Note that the primal problem (3) is trivially equivalent to the following:
with associated dual problem given by
Here is the penalty dual function defined by , where is the penalty function given by . We often refer to vector with as two multiplier. We denote the dual optimal value by and the dual optimal set by . We define the penalty function for each agent as follows: . In this way, we have that . We say that there is zero duality gap if the optimal value of the primal and the dual problems are equal, i.e., . As proven in the following lemma, the Slater’s condition in Assumption 3.1 ensures zero duality and the existence of penalty dual optimal solutions.
Assumption 3.1 (Slater’s Condition): There exists a vector such that and . And there exists at least one interior of , i.e. , problem (3) has finite optimal solution, and has nonempty interior point.
Lemma 3.1: Let the Slater condition holds, the values of and coincide, and is non-empty.
Proof: Define Lagrangian function as , with the associated dual problem defined by
(4)
Here, the dual function, . The dual optimal value of problem (7) is denote by and the set of dual optimal solutions is denoted by . Since is convex, and , for , are convex, and is finite and the Slater’s condition holds, we can conclude that and . We now proceed to characterize and . Pick any . Since , then
(5)
On the other hand, pick any . Then is feasible, i.e., and . It implies that holds for any and , and thus . Therefore, we have .
To prove the non-empty of , we pick any . From (5) and , we can see that and thus .
Throughout this paper, we use the following assumption for problem (3).
Assumption 3.2: Let the following conditions hold:
The set is closed and convex.
Each function is convex.
All functions have Lipschitz gradients with a constantfor all .
The gradients are bounded over the set , i.e., and there exists a constant such that for all and all .
When each has Lipschitz gradient with a constant , assumption 3.2(3) is satisfied with . When is compact, the Assumption 3.2(4) holds. We here make the following assumptions on the network communication graphs .
Assumption 3.3 (Non-degeneracy): There exists a constant such that , and , for , satisfies , for all .
Assumption 3.4 (Weight-balanced): is weight-balanced if , for all .
Assumption 3.5 (Periodical Strong Connectivity): There is a positive integer such that, for all , the directed graph is strongly connected.
Lemma 3.2 （Saddle-point Theorem）: The pair of is a saddle point of the function over if and only if it is a pair of primal and penalty dual optimal solutions and the following penalty minimax equality holds:
Based on this characterization, we will use the subgradient method of the following section for finding the saddle points of the penalty function. We denote , for each and we define the function as . Note that is convex in by using the fact that a nonnegative weighted sum of convex functions is convex. For each , we define the function as . It is easy to check that is concave in . Then the penalty function is the sum of convex-concave local functions.
Lemma 3.3 (Dynamic Average Consensus Algorithm) [21] : The following is a vector version of the first-order dynamic average consensus algorithm with :
We set for . The sequences of satisfy and . Suppose that periodical strong connectivity Assumption 3.5 holds. Assume that for all and all . Then for all .
Proof: Define
where is referred to as the reference signal (or input) of node at time .
We propose the First-Order Dynamic Average Consensus Algorithm below to reach the dynamic average consensus:
Let and be fixed. Then for every , there exists a real number such that for every integer , and , it holds that for
(6)
(7)
Without loss of generality, we only consider the case where , being identical with the proof for a general . Fixing some , it holds that
Let , we have that
(8)
Since (8) holds for all ,
(9)
Applying recursive method, it follows that
(10)
Since at every , we have that
(11)
where we are using the property of (10) in the last two inequalities. Applying repeatedly (11), we have that, for any integer , the following holds for
where .
Now we proceed by induction on . Suppose that (6) holds for some ; then we should show (6) for . By the induction hypothesis, we have that for all integer , there exists some such that the following holds for
Consequently, as in (11), we have
Following along the same lines as in (11), we obtain for all where and . This establishes (6) for . By induction, we have shown that (6) holds. The proof for (7) is analogous.
Let , then for any . By replacing and in (4) with and respectively. We have that for every
Similarly, we can see that
Combining the above two inequalities gives that
Denoting for an integer . From (9), we know that . Thus we have
where .
For any , let be the largest integer such that , and . Thus for all it follows that
(12)
Since and are input-to-output stable with ultimate bound ; i.e., there exist and such that
Choosing as initial state for all . Since , we can deduce that
(13)
It follows from (13) that and thus
Let , for any . The implementation of the Dynamic Average Consensus Algorithms ensures that . So we can conclude that
Thus, holds.
Consider the following Distributed projected subgradient algorithm proposed in [13]: Suppose is a closed and convex set. Let . Denote . The following is a slight modification of Lemma 8 and its proof in [13].
Lemma 3.4: Let the non-degeneracy Assumption 3.3, the weighted-balanced Assumption 3.4, and the periodic strong connectivity Assumption 3.5 hold. Then there exist and such that
Suppose is uniformly bounded for each , and , then we have .
4. Distributed Subgradient Methods
In this section, we introduce a distributed penalty primal-dual subgradient algorithm to solve the optimization problem (3), followed by its convergence properties.
Distributed Penalty Primal-Dual Subgradient Algorithm
We consider a set of agents. Each agent chooses any initial state , and . At any time , each agent computes the following convex combination:
and updates its estimates , and according to the following ways:
(14)
where the scalars are nonnegative weights and the positive scalars are step-sizes, is the projector onto the set . The vector is a subgradient of the agent ‘s penalty function at , where is the convex combination of dual estimates of agent and its neighbors’. keeps to the following rules:
Remark 4.1: Since , it follows that
where is the Laplacian matrix such that .
Proof: Modifying the second term on the right-hand side in the above formula, we then have
Let , one has
Since graph is balanced, then and . We can conclude that and hold under the condition that satisfies . Similarly, we obtain , and .
Assumption 4.1 (Step-size assumption): The step-sizes satisfy , , and , , .
In the following, we study the convergence behavior of the subgradient algorithm introduced in this section where the optimal solution and the optimal value is asymptotically agreed upon.
Theorem 4.1 (Convergence properties of the DPPDS algorithm): Consider the problem (3). Let the non-degeneracy Assumption 3.3, the weight-balanced Assumption 3.4 and the periodic strong connectivity Assumption 3.5 hold. Consider the sequences of and of the distributed penalty primal-dual subgradient algorithm, where the step-sizes satisfy the step-size Assumption 4.1. Then there exists a primal optimal solution such that for all . Furthermore, we have for all .
Remark 4.2: The distributed penalty primal-dual subgradient algorithm takes the equality constraint into account. The presence of the equality constrain can make unbounded. Therefore, unlike other subgradient algorithm, e.g., [15], [16], the distributed penalty primal-dual subgradient algorithm does not involve the dual projection steps onto compact sets. So we do not guarantee the subgradient not to be absolutely bounded, while the boundedness of subgradients is a standard assumption in the analysis of subgradient methods, e.g., see [6], [13], [17], [18], [19], [20]. The step-size of Assumption 3.1 is stronger than the more standard diminishing step-size scheme in [22] and this will correctly deal with the difficulty of the boundedness of . We give this condition in order to prove, in the absence of the boundedness of , the existence of a number of limits and summability of expansion toward Theorem 4.1. Finally, we adopt the penalty relaxation instead of the Lagrangian relaxation in this paper.
Remark 4.3 (Penalty subgradient inequality): Observe that and (due to the fact that is convex and ). Moreover, is a supgradient of ; i.e. the following penalty supgradient inequality holds for any and :
(15)
Proof: Observe that holds for all , , and is arbitrarily. Thus,
and
Followed by the properties of supgradient, we obtain
Remark 4.4: In this paper, we apply the harmonic series into our subgradient algorithm. It’s easy to check that satisfies the step-size Assumption 4.1 (for more details, one may refer to [14]).
A Convergence Analysis
In the following, we will prove convergence of the distributed penalty primal-dual subgradient algorithm. First, we rewrite our algorithm into the following form:
where is projection error described by
and , , are some local inputs. Denote the maximum deviations of dual estimates by and . We further denote the averages of primal and dual estimates as and . Since is compact, and and are continuous, there exist such that for all , it holds that for all , and . Since is a compact set and , , are convex, then it follows from Proposition 5.4.2 in [19] that there exist such that for all , we have that , and .
Lemma 4.1: Let . Consider the sequence defined by , where , and .
(a) If , then .
(b) If , then .
The proof of Lemma 4.1 can be referred to Lemma 5.1 in [14].
Lemma 4.2 (Diminishing and summable properties): Suppose the weighted-balanced Assumption 3.4 and the step-size Assumption 4.1 hold.
(a) It holds that , and the sequences of , and are summable.
(b) The sequences of , , , and are summable.
Proof: (a) Noticing that
Then, we show that
Recalling that ,. This implies that the following inequalities hold for all :
Then we deduce the following recursive estimate on . Repeatedly applying the above estimates yields that
(16)
where .
Similar arguments can be employed to show that
(17)
Since and , then we know that and . Noticing that
Then, the following estimate on holds:
(18)
Recalling that , and . Then we have . By (16), we obtain
It follows from the step-size Assumption 4.1 that . Similarly, one can show that . Multiplying both sides of by and square, then we deduce the following recursive estimate:
Then the summability of , and testifies that of .
(b) Noticing that
(19)
Then following from Lemma 3.4 with and , we have the summability of . Then is summable. Similarly, it holds that .
We now consider the evolution of . Recalling that . By Lemma 2.1, let , and , we get
Regrouping the above estimates, we obtain
With the above relation, from Lemma 3.4 with and , the following holds for some and :
(20)
Multiplying both side of (20) by and using (18), for all , it yields
By applying the relation of and sorting out, we get
Part (a) gives that is summable. Meanwhile, are bounded, and , then we can say that the first term on the right-hand side in the above estimate is summable. Recalling that , it’s easy to check that the second term is also summable. Part (a) gives that and is summable. Then according to the Lemma 7 in [13] with , ensure that the third term is summable. In summary, is summable. Following the same lines in (19), one can show the summability of . Similarly, and are summable.
Lemma 4.3 (Basic iteration relation): The following estimates hold for any and :
(21)
and
(22)
Proof: By Lemma 2.1, we can deduce that . Let , we have
Expanding and regrouping the above formula, we obtain
Owing to the subgradient inequality ,
it follows that:
Lemma 4.4 (Achieving consensus): Let assumption 3.3-3.5 holds. Consider the sequences of and of the distributed penalty primal-dual subgradient algorithm with the step-size sequence and the associated satisfy , . Then there exists such that for all . Furthermore, , and for all .
Proof: By Lemma 4.3, we see that
Owing to , one can show that
Since , taking the limit on in the above inequality, then it follows that:
and thus exists for any :
On the other hand, taking limits on both sides of the above inequality, we have and therefore we deduce that for all . It follows from Lemma 4.1 that for all . Combining this with the property, we can deduce that for all .
Since , is continuous, for , and , we can deduce that , , .
Claim 1: For any and , the sequences of and are summable
Proof: Observing that
Recalling that , we then have
By using the summability of and in part (b) of Lemma 4.2, we have that are summable. Similarly, the following estimates hold:
Due to , and holds for all , the following estimates hold:
Then the property of in part (b) in Lemma 4.2 implies the summability of the sequence and that of
.
Claim 2: Denote the weighted version of the local penalty function over as . The following property holds: .
Proof: Summing (21) over and replacing by , we can deduce that
(23)
The summability of in Part (b) implies that the right-hand side of (21) is finite as , and thus
(24)
On the other hand, is a saddle point of over . Since, then we have . Claim 1 and (24) renders that
and thus . On the other hand, (due to is convex) implies . Similarly, we have the following estimates . Thus .
Claim 3: Denote . And we denote the weighted version of the global penalty function over as
The following property holds: .
Proof: Noticing that
(25)
By using the boundedness of subgradients and the primal estimates, we can see that
Noticing that , and holds for all , the following estimates hold:
(26)
Then it follows from (b) in Lemma 4.2 that is summable. Notice that . Following the Claim 2, hence, .
Claim 4: The limit point in Lemma 4.4 is a primal optimal solution.
Proof: Let . By and , we get
(27)
This indicates that the sequence is non-decreasing in . Observing that is lower bounded by zero. Therefore, we give the following two cases:
Case 1: The sequence is upper bounded. Then is convergent in . Then it follows from Lemma 4.4 that for all . This implies that there exists such that for all . Recalling that in (27). Following a recursive step, we can get . Since and , we obtain . Since for all , we have for all , then and thus .
Case 2: The sequence is not upper bounded. Since is non-decreasing, then by . Recalling that and , then it follows from (a) in Lemma 4.1 that it is impossible that . Suppose that . Then we obtain
(28)
Taking limits on both sides of (28), then we get
Finally, we reacha contradiction, implying that .
In both cases, we obtain for any . Similarly, we can further prove . Since , then is feasible and thus . For another, since is a convex combination of and , then it follows from Claim 3 and (b) in Lemma 4.1 that:
Hence, we have and thus .
Lemma 4.5: It holds that .
Proof: Since , , then the following holds for any
The following can be proven by induction on for a fixed :
(29)
Let in (29) and recall that initial state for all . Then we obtain
(30)
From (30), we can obtain
(31)
Combining (31) with gives the desired result. Based on the above five Lemmas, we then finish the prove of Theorem 4.1.
5. Numerical Example
In this section, we study a simple numerical example to illustrate the effectiveness of the proposed distributed penalty primal-dual subgradient algorithm. Consider a network with five agents. Suppose each agent has a function , given by
where the global decision vector . The global inequality constraint function is given by , the global equality constraint function is give by and the global constraint set is given as: . are parameters of , whose values are randomly choosen from the intervals . Consider the optimization problem as follows:
(32)
We solve problem (32) by employing the distributed penalty primal-dual subgradient algorithm (14) with the step-size . Its simulation results are shown from Figs. 1 to 5. It can be seen from Fig. 1 that local input tends to when it achieves consensus. Fig. 2 shows the state evolutions of all five agents, which demonstrate that all agents’ takes iterates to asymptotically achieve consensus. The state evolutions of dual solution and are shown in Figs. 3 and 4, respectively. We can observe from Fig. 5 that all the agents asymptotically achieve the optimal value.
Fig. 1. Local input tends to when achieve consensus.
Fig. 2. Optimal solution of primal problem.
Fig. 3. Optimal solution of dual problem.
Fig. 4. Optimal solution of dual problem.
Fig. 5. Optimal solution of objective function .
6. Conclusion and Future Work
In this paper, we formulated a distributed optimization problem with local objective functions, a global equality, a global inequality and a global constraint set defined as the intersection of local constraint sets. In particular, we considered the local constraint sets to be identical. Then, we proposed a distributed penalty primal-dual subgradient algorithm for the constrained optimization with a convergence analysis. Moreover, we employed a numerical example to show that the algorithm was asymptotically converge to primal solutions and optimal values. Future workmay aim at the analysis that the local constraint sets of each agent are imparities. Also, we will pay attention to the convergence rates of the algorithms in this paper.
Acknowledgements
This work described in this paper was supported in part by the Natural Science Foundation Project of Chongqing CSTC under grant cstc2014jcyjA40041, in part by the Scientific and Technological Research Program of Chongqing Municipal Education Commission under KJ1501401.
References