Research on Innovating and Applying Evolutionary Algorithms Based Hierarchical Clustering and Multiple Paths Routing for Guaranteed Quality of Service on Service Based Routing
Nguyen Thanh Long^{1}, Nguyen Duc Thuy^{2}, Pham Huy Hoang^{3}
^{1}Software development division III, Informatics Center of Hanoi Telecommunications, Hoan Kiem, Hanoi, Vietnam
^{2}Center for applied research and technology development, Research institute of Posts and telecommunications, Hanoi, VietNam
^{3}Information technology institute, Ha Noi University of Science Technology, Hanoi, Vietnam
Email address:
To cite this article:
Nguyen Thanh Long, Nguyen Duc Thuy, Pham Huy Hoang. Research on Innovating and Applying Evolutionary Algorithms Based Hierarchical Clustering and Multiple Paths Routing for Guaranteed Quality of Service on Service Based Routing. Internet of Things and Cloud Computing. Vol. 3, No. 3, 2015, pp. 22-28. doi: 10.11648/j.iotcc.s.2015030601.12
Abstract: In Service Based Routing (SBR), data is transmitted from a source node to destination nodes are not depended on destination addresses. Hence, it is comfortable with new advanced technology as cloud computing and also flexible and reliable. Genetic and Queen-Bee algorithms (GA, QB) are artificial intelligence techniques for combinatorial optimization problems solving based on some natural rules. In which GA is a branch of the evolutionary strategies that uses some principles of evolution theory, such as natural selection, mutation and crossover. QB is performed based on GA for energy saving. The tree is an effective data structure that can be used to organize a hierarchical clustering network with fast establishing, updating, tuning algorithms. The usage of the Greedy algorithm to find cyclic routes or multiple paths on each trunk by multiple criterions to transmit data effectively.
Keywords: Service, Routing, Multi-Paths, Bandwidth, Energy, MANET, Cluster, Clustering, Cluster Head, Genetic, Queen Bee, Greedy
1. Introduction
1.1. Service Based Routing Model
SBR defines the routing based on the messages it carries. It classifies messages based on message content or its key-value pairs or regular expressions on its content. Therefore, in SBR, receiver determines transmission of messages, not by sender.
1.2. Cluster Based Manet
MANET is established from nodes that usually move and change very fast; no node has the role as a gateway or router for relaying network traffic between network nodes. Based on clustering technique, the whole network will be divided into sub-networks or clusters. In every cluster, it needs to find out a node that manages this cluster that is called cluster head. Routing in main network is performed mainly by network backbone of some cluster heads.
1.3. Concepts of Network Clustering
Network Cluster: A cluster is a group of some related nodes in a network; there is at least one path between any pair of nodes that satisfies conditions of quality of service in time [4].
a). Central Clustering Technique
Each cluster has an array of cluster head nodes (CH) that are individually responsible to manage its cluster, control the communication between cluster nodes. In case of having one cluster head, so if any error happens on CH that causes this cluster of the network stops it’s processing.
b). Distributed Clustering Technique
No node controls network centrally; network node manages its resources itself. It is usually used as backup case of central clustering or when network nodes are distributed over large space so if using central clustering will increase overhead dramatically.
c). Combined Clustering Technique
This technique combines two above clustering techniques. In some case, it uses central clustering, in some other case, it uses distributed clustering. For example, in hierarchical network clustering, in one basically cluster, it uses central clustering, and in an upper level cluster, it can use distributed clustering for reducing control messages or overhead.
d). Routing Technique in Clustered Network
In clustered network, the routing function is divided into two smaller functions: i) inner cluster routing; ii) inter-cluster routing. Each cluster will only maintain the routes that meet the quality of service.
1.4. Overview of Evolutionary Algorithms
Evolutionary algorithms (EAs) has been studied, experimented and applied in many fields, it solves many problems faster than traditional methods especially in solving optimization problems.
2. Some Basical Concepts
2.1. Evolutionary Algorithm’s Concepts
2.1.1. Definition
An evolutionary algorithm (EA) is a generic population based meta-heuristic optimization algorithm that consists of some of evolutionary computations.
EA used mechanisms based on biological evolution, such as selection, reproduction, mutation and recombination.
EA makes the best solution based on selecting some candidate solutions using a fitness function then applying evolutionary operations.
2.1.2. Artificial Evolution (AE)
AE is a process that involves individual EAs. In this paper, introduces the two AE algorithms that are the Genetic Algorithm (GA) and Queen-bee algorithm (QB) for network clustering in service based routing in MANET.
2.2. Hierarchical Network Clustering
2.2.1. Hierarchical Clustering Concepts
It defines a clustered network in hierarchical clustering by a recursive expression: N = {, , …, }, in that =, k is an integer variable that is a number of sub-clusters in each cluster . Can be a sub-cluster or a node. This cluster model will establish a tree network topology with multiple levels of clusters; they establish a hierarchical tree structure. At the root level, this is main cluster: root C has some children {}, each child is a node or a smaller cluster. In turn, each child cluster has some its child clusters.
R^{+} tree can be defined by a recursive expression:
={, , …, }, is a sub tree or a leaf node, High () = High() = … = High () = High() - 1.
Tree likes many other tree structures; it has a root, tree branches and leaves. In order to keep the tree being balance, each inner node of the tree has a number of child nodes in the range [m, M]. In that, the pair of m and M is the lower and upper bounds of the number of its children. There are three kinds of nodes in this tree: I) Leaf node: stores an array of cluster heads of a basically cluster at the lowest level. At one time it chooses one being the cluster’s cluster head. Each cluster head is chosen by a fuzzy logic controller (FLC) by assessing some its metrics [8]. Each cluster head is represented by a component of two parts: node identifier ID and a filter F that is composed of some logical predicates. F is a logic condition to choose a cluster head; ii) Inner/ root node: stores information of an upper level cluster that consists of a set of cluster heads and its child clusters. Its structure is similar to a leaf node, but its children are child clusters with smaller size. The algorithms are used to insert, update, delete a node in this tree are similar to which of R+ tree applied for forwarding technique [7]. All members of each cluster are also chosen by FLC. When a node takes part MANET, its information will be updated to R+ tree. It will join a basically subnet, these network topology will be replied back to the node.
2.3. Multiple Paths Routing Model
Assume that a network is defined by a weighted graph G(T)={V(T), L(T)}, V(T) and L(T) are vertex and edge sets. Dividing network into n clusters: G = {, , …, }, in each use an inner routing protocol to find a set of routes: RS = {, , …, }, is the route set between the cluster head CH and a node member i. In order to find routes between two nodes (p, r) that belong to two clusters (, ), use the formula:
= x x (1)
Where x is the Descartes operator of these two sets. Therefore we have multiple paths between node p and r.
If number of routes in , , are , , respectively, the number of routes from p up to r is:
(2)
The method is used to create multiple paths on each branch of route that can be Greedy algorithm as introducing in the next section. Assuming that the route consists of multiple trunks, each trunk is belonged to a cluster of the network. On each cluster, we find all the cyclic sub-routes by the Greedy algorithm. By using Genetic algorithm, there are a pair of the best routes that will be found after the algorithm’s execution ending. So using the mutation operator to get one route for uploading or sending and another route for downloading or replying to make a cyclic route from a source to all destination nodes. It may be reduced cost and latency for a transmission session.
3. Use Genetic Algorithm for Hierarchical Network Clustering
3.1. Definition
Genetic algorithm (GA) is a well-known adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetic operations. The GA is designed based on the principles of Charles Darwin by the use of the fittest survival. It represents random and intelligent search within a defined search space to find the optimal solution to solve a problem.
3.2. Some Concepts of Genetic Algorithm
1) Chromosome: a chromosome C contains a genome, which is usually represented by a bit string. This bit string represents nodes of each cluster, in that each node is represented by a binary digit 1 or 0. Cluster head node is represented by 1 and member node is represented by 0.
2) Population: a population P contains a set of some chromosomes: P = .
3) Operating principles:
Genetic algorithm operates based on hill-climbing algorithm that uses local searches. GA is modeled loosely on the principles of the evolution theory via natural selection.
Following is the process of executing GA:
1. It randomly generates an initial population called by M(0), 0 is the order of the first round;
2. Execute the following steps until a satisfied solution is obtained or the execution time expired;
3. Assume current round being K, M(K) is the current population, compute and save the fitness value u(m) for each individual m in M(r);
4. Define the selection probability value p(m) for each individual m, therefore p(m) is proportional to u(m). Calculate p(m) by the formula:
p(m) = (3)
5. Generate M(r+1) by probabilistically choosing individuals from the set M(r) via genetic operators:
Selection: select the n best individuals based on their selection probabilities p (m), get the set S. Usually choose n=2;
Crossover: execute this operator with probability , use the crossover operator to get the set S’: {, } = Crossover(, ): {, G_{C}, } {, G_{C}, } = {(, G_{C}, ), (, G_{C}, )} = {, }, G_{C }is a common gene and is crossover operator; (iii) Mutation: execute this operator with probability :
= Mutation () with= {, g, }. From gene g, eliminate low residual energy nodes, get .
Remove cyclic routes from and finally add to .
So get the next population: M(r+1) = (M(r) S) .
After the GA finishes, it gains M () is the result of the problem.
3.3. Use Genetic Algorithm for Hierarchical Network Clustering
We can define a cluster in the hierarchical clustering network by a recursive expression: C = {, , …, }, k is an integer: the number of clusters. In the first round, root cluster C has some child clusters, then each child cluster has a number of components. In turn, each can be a node/ cluster.
a) At the first level (0), a cluster consists of nodes’ identities that are represented by a sequence of integers. Each cluster is seen as a chromosome in GA. With the first n identities are of its headset, the last identities are its cluster members.
b) A population has several chromosomes; some chromosomes that have the best fitness will be used to generate the next population.
c) Refer to HCR [3] and ICRP: in this paper, use CH itself in the role of the base station for forwarding messages between clusters. On local cluster, use a proactive routing protocol such as OLSR for nodes communicating to establish clusters based on GA. Choose headset size, fitness function for clusters automatically tuning. For finding route to transmit messages by using the pair of message RREQ and RREP [1]. The headset can operate by round robin method or by combining two or more CHs for increasing bandwidth and availability.
d) Establish the fitness function:
Fitness function to estimate the fitness of a chromosome is designed based on some input parameters such as: (i) C: the cluster distance, the sum of all distances from nodes to CH; (ii) SD: it is standard deviation within cluster distances; (iii) E: it is estimated transfer energy that is consumed to transmit messages through the current cluster; (iv) T: the number of transmissions that pass through the cluster. The output of the F is an energy evaluation to transfer messages: F=u(m), in which m is the cluster, based on this to calculate: p(m)= by Eq. 1.
if =0: this cluster is needed to regulate; if = 1: this cluster is well organized.
e) Some needed parameters for the fitness function:
i. Direct distance metric to the CH: D = , is distance from a node member i to the cluster head C, n is the number of nodes.
ii. Standard deviation metric:
SD = (4)
Where, h is the number of clusters in the network and
(5)
so is the average cluster distance.
iii. Consumed energy metric: E is computed by the energy consumed to transfer the aggregated message between cluster members and cluster head and backwards by the formula:
E = + n (6)
Where is the energy consumed to transmit messages from a member node i to the cluster head. is energy consumed to receive a message from a cluster member, n is the number nodes of the cluster.
iv. The number of transmissions: T, for each data transmission stage, the cluster head assigns the number of transmissions denoted by T. The value of T is adjusted according to the network conditions and current energy levels.
f) The fitness function: The chromosome fitness is F, which is a function of all above parameters, that is defined by the function:
F = , {D, SD, E, T} (7)
The initial fitness parameters are assigned the arbitrary weights {}. After that, at each round the best fit chromosome is evaluated and the weights for fitness parameters are updated as follows:
(8)
where is the change in fitness parameter value of f and .
GA evaluates all clusters after each round and finds the best fittest cluster depend on its parameters.
3.4. Advantages
a) In generalization, it is easy to handle arbitrary kinds of constraints and objectives; every component is weighted by a fitness function adapting the GA scheduler in deciding solution on a very wide range of possible overall objectives.
b) Energy optimization.
4. Queen-Bee Algorithms Based on GA for Energy Efficient Clusters
The Queen-Bee algorithm (QB) [2] is based on GA in addition with some parameters and operations: i) the process for choosing headset P (t) that consists of n elements. As in fact, it divides a whole bee group into sub-groups; each sub-group has a Queen-bee that manages its own group. At first, it chooses a random set of Queen Bees that are CHs of some clusters. We get bees into each cluster by the same algorithm of the above GA algorithm. We get some clusters of the network. In each round of QB algorithm, in each cluster the Queen bee and bees communicate with each other by inner routing protocol. After the predefined time duration, assesses each cluster by calculating fitness function for each cluster. Choose the two best clusters for applying some genetic operations. Then make the new generation for the next round of the algorithm. The set P (t) is established by couples of a queen-bee Iq(t-1) and a selected bee Im(t-1) and ii) some clusters are mutated by strong mutation probability in every generation process. The QB is proved to reduce energy consumption, so increase the network lifetime. Like GA, QB adds some very important parameters as: P’m: strong mutation probability, Iq: a queen-bee, Im: selected bees. The QB is presented as below:
1) Assign t := 0;
2) initialize P(t)
3) evaluate P(t)
4) while (not termination-condition)
5) do {
a. t := t+1;
b. select P(t) from P(t-1)
c. P(t) = {Iq (t-1), Im (t-1)} /*new in QB*/
d. recombine P(t) from (b) and (c)
e. Choose bees for each cluster by the same algorithm as in Genetic Algorithm
f. do crossover
g. do mutation /*(new in QB)*/
h. for i = 1 to n
i. if (i ≤ (σ x n))
do mutation with Pm
ii. else
do mutation with P’m
i. end for
j. evaluate P (t) }
In which: t: time, n: population size, p: populations, σ: normal mutation rate, Pm: normal mutation probability. Generally, Pm is less than 0.1 and P’m is greater than Pm. Queen-bee evolution is similar to neutrality in that the queen-bee, the fittest individual in the current generation, will crossbreed with the other bees be selected as parents by the selection algorithm. The offsprings mainly depend on the crossover operation and the fittest individual. As the result, it also increases the probability of premature convergence.
5. Use Greedy Algorithm to Assess GA in Each Round
In each round, on each cluster when its occupied space that may be calculated by GPS coordination system is not large in accordance with multiple paths routing as introduced above. It is easier to use the greedy algorithm to evaluate the population after evaluating it by the fitness function based on some conditions.
5.1. Greedy Algorithm Concepts
The greedy algorithm has the typical characteristics that are the same as GA. It makes offsprings based on a set of some candidates that are used to choose the best solution with some functions: i) a selection function that chooses the local best solution in each round of the round loop to complement the temporary result set (S_{R}); ii) a feasibility function that assesses the feasibility of a member of the population to complement the candidate set; iii) a purpose function that is used to calculate the whole cost of a member of the population; iv) a evaluation function that assesses a candidate if it is the requested solution of the problem. This algorithm divides the global search region into many sub-regions and find a local optimal solution in each sub-region. It assesses the global optimal solution in each round, if it is satisfied optimal conditions then algorithm ends.
5.2. The Algorithm Operations
The main idea of this algorithm is to create some optimal cyclic routes [9] by two found parents so the found routes will be optimized in cost and bandwidth. Main idea is depended on the idea from paper [9]. Usually when a route from a source node to a destination node is found. At that time, the destination uses route information in RREQ message to create the RREP to reply back immediately to the source node. But in the new method, this destination node continuously forwards this RREQ to network. Until the predefined hop count reaches or the timeout expires that are not satisfied then when this RREQ is back to the source node. So it makes one cyclic route with guaranteed quality of service. After a predefined time interval, the source node will collect all the found routes. If no cyclic route is found then it uses normal route information to transfer data over network. In the case of using cyclic route, that it uses this route to transmit data to multiple destination nodes at one time. It avoids route-cutting situations by some traditional route finding protocols. For example, when a destination node is found in a cluster, it will find all another destination nodes in this cluster, finally making route reply message to reply back to the source node.
• Combine edges from two best parent’s solutions chosen by the fitness function.
• Create cycles by alternately selecting edges from two parents.
• Choose a subset of cycles based on some conditions.
• Generate a immediate solution by taking one parent and replacing all its edges that are in the subset of cycles of second parents’ edges in its subset of cycles. At this time, having a solution that has some sub-routes are not connected to the source node.
• Use greedy algorithms to optimize route partially. Its purpose is to make route that orginates from a source node to all destination nodes with minimum cost. It can use a greedy algorithm to choose a part of the route with the least cost.
(R)= (9)
5.3. Use Epsilon-Greedy Mutation Operator
In the paper [10], it has innovated the mutation operator for Genetic algorithm using a multiplicative variable , which is initialized by a constant value. After using crossover operator, GA executes the mutation operator normally on the given offspring. But in each round of GA, it performs mutation operator with a probability P that can be calculated by the algorithm 12 in [10]. At first, P is only depended on previous state of network information. After that the probability will depend on two elements: a part of previous network information and a part on randomly chosen information. Finally, its operation will only depend on randomly chosen information.
6. Simulation and Evaluation
6.1. Comparing between GA and QB
6.2. Simulation on Using R+ Tree for Hierarchical Clustering Routing
Setup tree to store network information, add node information to R by the algorithm in [7]. So add node to the fittest physical cluster at ’s leaf level then regulating tree. Made simulation on number of physical clusters changing in the range of . The number of members in each cluster varies in the range . In each simulation, we performed GA with crossover probability of = 0.5, mutation probability with = 0.1. In this paper, this process was simulated in 10 times, the results had been collected to draw on the below diagram:
Fig. 4. The simulation results of using and GA to make the optimal hierarchical clusters.
The above diagram shows that the time required to execute the process using tree and GA algorithm hasn’t been increased continuously, sometimes this time is reduced.
6.3. Simulate the GA Algorithm to Make Clusters
With some parameter defined above: population size in , at first chromosome size 30, number of nodes is 1000, headset size is 5. In each time of the execution of GA to make optimal clusters in 20 rounds with = 0.5 and = 0.1. In this simulation, execute 3000 times and make a diagram to represent the results:
The diagram shows that the time requires to execute the GA algorithm to make optimal clusters is rather stable. This time doesn’t increase when the number of clusters increase and when the number of times of execution increases.
6.4. Simulate to Find the Optimal Route by GA Algorithm
Population size: 20 to 30; : 0.2 to 0.9, : 0.05 to 0.2; (iii) Chromosome size: 20; Simulation: 1000 times:
Time doesn’t increase when amount of routes increase
7. Conclusion and Future Researches
In order to increase route bandwidth and availability, in this paper have mentioned the evolutionary algorithms for network clustering and genetic algorithms for finding optimal routes for transmitting data. Dividing large block of data into some smaller packages to transmit these packets on QOS multiple paths for increasing speed and reliability. It uses network clustering for reducing the energy required for transmission. Using the genetic algorithm to reduce cost and increase bandwidth to transmit data. Some above-mentioned simulations have proved all these results of these algorithms. In the near future, we will research some techniques to encode data based on data characteristics to increase security in MANET.
References