Innovating R Tree for Message Forwarding Technique and Hierarchical Network Clustering in Service Based Routing

: In service based routing (SBR) [5], the problem for storing forwarding table that includes searching predicates received from subscribers through subscription messages is an important job. When a content request happens, the subscriber will create a subscription message. The subscription message stores several kinds of information, the most important content is a filter that is a conjunction of some constraints [5]. A filter is denoted by F character, that has mathematical formula: F = (cid:1) C (cid:3) (cid:4)(cid:3)(cid:5)(cid:6) , in which C (cid:6) is service request, has the format: C (cid:6) = ‘Service_name = requested service name’. Every C (cid:3) (i= 2. . n ) is a constraint that is formed by three components: (Key, op, Value), Key is a keyword for searching, op is an operator, Value is searching condition. Key belongs to the set of name of properties of content messages that the requested service supplies. Op is an operator that depends on the type of data of the Key. Therefore the forwarding table is a set of filters which are received from all subscribers on networks. The algorithms for inserting, updating, deleting and finding filters that match content messages have been published by service providers are very important. In this paper, We mention a technique for forwarding technique on the basic of summary filter for storing and searching filter quickly. This technique is based on some previous researches. In section 8, give an algorithm for finding all network nodes that have matched filters with a content message. Section 7 introduces a cluster routing technique based on summary filter.


Subscription/ Un-Subscription Messages
The subscription/ un-subscription message consists of subscriber's address and its content request. In which, content request is a set of some constraints that have a common form: (key, operator, value). A constraint is denoted by C: C = (Key, Operator, Value).
In which: Key is a keyword to search content in content message. Operator determines the operation between a pair of a Key and a value. Value is a condition on a property of content message.
So each subscription/ un-subscription message contains a set of constraints that is called a filter that is denoted by F character. Each filter is presented by the formula: F = , n is number of constraints. Service based address of a network node is a set of some filters received from this node that is called a predicate that is denoted by P: P = . n is number of filters. The service based addresses are stored in service based routing tables. and binary search: (a) scan each property of a content message; (b) For each property [p], find all matched constraints with this p; (c) After this loop, get the set of some matched constraints: {C 1 , C 2 , …, C n }. (d) Use the SBR_Couting algorithms to find all nodes that have made filters that are satisfied; ii) By using summary filter, it will be done by algorithm that is introduced in section 7.

Summary Filter
Definition: Assume there are some filters: F 1 , F 2 , …, F n , the summary filter of these filters is a filter that is denoted by . The summary filter is determined by the formula: = . So this summary filter satifies: Therefore the value domain of covers all value domains of each filter: ⋑ F : i= 1. . . Operator ⋑ is introduced in the next section.

Level of a Filter
The number of constraints of a filter is the level of this filter. For example: F = : Level(F) = n.

Some Previous Researches
There some researches on summary filters, that focuses on increasing searching speed and network throughput. There are some researches use summary filters for network clustering [6]. They use summary filter for partitioning subscription messages on some searching nodes. It uses filter summary for maintaing the locality of subscription information. As in [6], they use inaccuracy summary filter to increase network thoughput by 200% compares to using accuracy summary filter. In their network, the node automatically adjusts the level of inaccuracy for reducing network congestion. On the other hand, a node can forward subscription messages to other nodes for avoiding repartitioning its filters. R tree have been used in some fields, for example, use R tree for spatial objects searching in Google Map. R tree also been used for storing, searching content based addresses in content based routing protocol.

Overarching Concept
Considering the overarching concept, assume that there are two filters and are content requirements of one service that arising from network. We suppose that covers by the notation: if all content messages satisfy conditions of F 2 then they satisfy . Symbolize by: In that: (4) For satisfying condition (2) must have: {K , K " , …, Suppose the constraint: P = [K op V], X is called the value domain of this constraint. If the set of values {v} are belonged to X that will satisfy the constraint that means the expression: Validate(v op V)=true.
If the condition (2) is true, at the same time with the conditions: n≤m and K ≡ K , op ≡ op , must have: That means all the keys and operators are contained in F 1 that are also contained in F 2 , domain value of each of constraint of covers domain value of each constraint of with the same key. We suppose that F 1 and F 2 overlap each other if the following terms are satisfied: Assume specific domain of each constraint has magnitude of |X|, so that specific domain of the filter has magnitude of:

Application of R Tree
When receiving a new subscription message from network interface (I) which requests service S, it has a filter that is denoted by the formula: F= . We have to check whether F is covered by an existing filter. If there is no filter, have to check whether it overlaps any existing filter. For doing this task efficiently have to apply and innovate R tree, in this chapter it is called R + , this tree nowadays is used in many different fields. For example R tree is applied for storing space objects in Google MAP, digital Map, System Paging file with high efficiently in storing and searching.

Structure of R + Tree
Structure of leaf node: it is a set S that consists of n elements, each element consists of two items P and FS: P is a pointer that points to a filter node F that is indexed in this R + tree and FS is a summary filter that covers filter F: FS ⋑ F. The filters are added to R + tree by the algorithm that will be presented in the next section. a) Each filter is indexed in R + tree that belonged to a filter node that consists of two components: IL and F, in which IL is a list of addresses of routers that have sent subscription messages which have the same filter F, F is a filter that consists of some constraints. b) Similar to the leaf node, the structure of an inner node is presented in Fig. 2: c) Each inner node stores a set of n elements, each element consists of two components: P and FS. P is a pointer that points to an inner or leaf node. FS is a summary filter that covers the whole set of n summary filters of the nodes pointed by P. This means: FS= . . The following diagram describes the covering relationship with n=6: Filter of node R covers its children's nodes' filters (R 1 to R 6 ):  = because when number of child nodes of one node reaches (M+1), We have to split this node into two nodes so that each node has an approximately equal number of child nodes. When number of child nodes of a node is less than m, have to remove this node and adjust this tree by an algorithm that will be introduce in the next section. 3. R + tree is a balance tree, this means that the height from any leaf node to the root node of the tree is equal. The proving is based on building tree. 4. Thus the height of the tree from a leaf node to the root node is satisfied the following un-equation system:

The Establishing Algorithm
At first R + tree has only one node, this is both leaf and root of tree, this node points to an indexed filter node that is created to store the filter of a new subscription message that has been received at current router. When adding a new filter F to R + tree, We have to find a leaf node N, add F to N. The process for finding node N begins from root node R: 1) If R has its child nodes are indexed filter nodes then: N=R; 2) If R has some child nodes. Assume that R consists of n components: S = { , , ..., }, C i = (Q , . ). For each : i) Find minimum filter F $ satisfies:  (2) for \ ]^ _ continuously until \ ]^ _ is leaf node. 5) Assign: N = \ ]^ _ . Check filter in each indexed filter node (`a , ) pointed by a component of N. If ≡ F, end the algorithm with conclusion: filter F has already existed in this W P tree, `a = `a ∪ cdde f . Otherwise: 6) Add new indexed filter node contains F to N.

Searching Algorithm
a) The searching algorithm finds a leaf node N that has a filter that conforms most to a given filter F.

Algorithm for Adding a Filter to o P Tree
This algorithm adds a new filter F to a W P tree: check if this filter exists in this W P tree. If it's true then add address of node that emitted F to address list of found indexed filter node and exit the algorithm. Otherwise find a leaf node in this tree to store the filter. The algorithm to insert a filter to a leaf node: 1) Create a filter node to store filter F and I, I is interface address of the router has emitted the subscription message which contained the filter F. Check the condition, whether N has no parent node: (a) If true, new node W q is created, this node has two components and . Assign: .FS=Summary( \ ), .P=\, .FS=Summary(\ ) and .P=\ . Now W q is the root of W P tree, finish algorithm (this is the proving that the root of W P tree has at least two children). (b) Otherwise, go to step (4). 4) N has a parent node, this node is denoted by Q r , Q r stores component r . r has two elements FS and P, P points to N. There are two cases: i) if \ hasn't been created in (3) then recalculate filter summary for N, assign to FS then go to step (6); ii) otherwise go to step (5). 5) Create a new component ^p of Q r , ^p has two elements P, FS. Execute two tasks: i) calculate filter summary of \ , assign to FS: ^p .FS=Summary(\ ); ii) assign P to \ : ^p .P=\ . Go to step (7). 6) Check filter summary of Q r , if it has changed, have to assign N=Q r , then go to step (4). Continue until N is root node. 7) Check number of components of Q r , if this number more than M, assign N=Q r , go to step (3). Continue until go to the root node then finish algorithm.

Algorithm for Deleting a Filter From R + Tree
Assume need to remove a filter F from R + tree. 1) Find F from R + tree by above searching algorithm. If F is found in the tree and leaf node \ contains F, continue next step. Otherwise end algorithm. 2) N has n components: { , , ..., }, for each component of N ( has two elements Q and . ): if . ∧ F ≡ F, go to next step.
3) Assume is filter of indexed filter node (\ ) pointed by Q . There are two case: (i) If ≡ F then remove the address of the node that have made the request from the list of addresses (IL) of \ . If this list is empty (IL≡ ∅): remove \ from this tree. Adjust W P by following regulating algorithm, end the algorithm. (ii) otherwise go to step (2).

Algorithm for Regulating R + Tree
1) Starting from the leaf node S from which a filter is removed. 2) If the number of components of S isn't satisfied the condition (3) then remove S from R + , add S to a set Q which stores temporarily all nodes have removed of W P and go to step (c). If it is satisfied then finish algorithm. 3) If S has parent node then assign S=S.parent. There are two cases: i) if S is not root of the tree then go to step (2); ii) otherwise add root of the tree to Q. 4) Get the last node S that has just been put to the set Q.
Scan the sub-trees of W P tree with root node is S for adding its filters to original W P tree by the above adding algorithm.

Processing Steps of the Algorithm
When number of components of one node is more than upper bound M, thus having to split this node into two nodes with each node has about ; <P = components. Assume current processing node is N: 1. Create new node \ , get ; <P = components from N to \ by one of two algorithms that are described more specific below. 2. If parent of N exists then add N 1 to the set of children of parent node of N and continue step (c); otherwise create new node that will be new root of tree R. Add N and N 1 to the set of children of R, finish the algorithm. 3. Assign N=N.Parent, if number of child nodes of N is more than M then go to step (1), otherwise finish the algorithm.

Algorithm Evaluation
The algorithm has maximum complexity in case of regulating tree from one leaf node to the root node of the tree. This is O[kst (\) * v], in which N is number nodes of the tree, K is complexity of the splitting algorithm.

Algorithm Evaluation
The exhaustive algorithm has complexity that is in proportion to the number of choices the pair of sub-sets . and . , so it is ! | Nw x y z}! .

Algorithm Specification
Assume S is a set of summary filters of current processing node that needs to be split: S={ , , ..., } satisfies equation: n=M+1. The requirement of algorithm is: splitting S into two subsets . and . with almost the same size. Require to choose two beginning elements for these two subsets by the following algorithm: Algorithm 1: 1. Give filter ]^ _ is temporary filter, jd , jd are two integers for storing two indexes of choosen filters, assign filter ]^ _ = ∅, jd =-1, jd =-1; 2. For each filter in the set S of filters execute (3); 3. For each filter € in the set S execute (4); 4. Calculate filter € that covers both filter and € : € = ∨ € ; calculate: F q = F ∖ F , F q = F ∖ F , if ~F q ∖ F q~>~F ƒ"$…~ (in which ∖ is filter subtract operator) then assign: jd =i, jd =j and ]^ _ = ~F q ∖ F q~.
When this algorithm finished, gain jd and jd are indexes of two filters ({ ` , ` } = { [ M , [ y }) for beginning filters of two sets . and . . Choose next elements for each set . and . € according to following algorithms: Algorithm 2: 1. Assume and are summary filters of . and . respectively, assign =∅, =∅. 2. For each filter of the set: S ∖ { ` , ` } execute (c). 3. Calculate F 1 '=( ∨ ) ∖ , F 2 '=( ∨ ) ∖ , in which denote ( ∨ ) is filter that is result of extending to cover . Go to step (d). 4. If ' > ' : put into S 2 and assign = ∨ (4). 5. If ' < ' : put into S 1 and assign = ∨ (5). 6. If ' = ' : put into a set that has the number of elements is less than the other set. If two sets have equal number of elements then put into a random set, recalculate (4) or (5) respectively. Go to (b) for the next item of S.

Evaluating Algorithm
The algorithm 1 has complexity of O(n 2 ), the algorithm 2 has complexity of O(n). Therefore the complexity of Quadratic algorithm is O(n 2 ).

Algorithm Specification
We have to split the set S of summary filters of the current processing node into two approximate equal sets . and . . Some assumptions: a) Each filter of S has a set of predicates, each predicate has a value domain D. b) Each value domain has lower bound ‡ and upper bound ‡ ˆ‰ , so that D=[ ‡ , ‡ ˆ‰ ]. The algorithm for finding two first elements of two subsets . and . a) For each key K, find maximum lower bound a ˆ‰ and minimum upper bound Š of all predicates of S which have key K. These bounds are belonged to two filters a ˆ‰ , Š respectively. b) For each K, also find the minimum lower bound a and maximum upper bound Š ˆ‰ of all predicates of S that have key K. These bounds are belonged to two filters a and Š ˆ‰ respectively. c) Calculate for each key: d) Choose two first elements of two subsets S 1 and S 2 from a pair of filters ( a , Š ˆ‰ ) which has ‹ that is maximum on all their keys. e) With (n-2) remaining filters, put each filter to one of two sets S 1 and S 2 in turn.

Evaluating Algorithm
The complexity of algorithm to choose two first filters of two sets S 1 and S 2 is rated by formula (| | * |.| ), | | is number of predicates of filter F, |.| is number of filters of S.
The complexity of algorithm to put (n-2) remaining filters to two sets S 1 and S 2 is O(n), n is number of filters of the set of filters S.

Cluster routing Concept
Each node builds it's own R + tree [6] when it receives subscription filter from network. The root of each tree will store filter summary of its routing table. Define a point (P) is centroid of the rectangle that is established from each filter summary. To cluster network based on centroid of subscription filter, assume that: i) the space of all filter is S, ii) S consists of some sub-spaces {S 1 , S 2 , ..., S n }. So when a node on a cluster receives one filter, it will forward this filter to appropriate cluster head based on point P of this filter.

Cluster Routing Performance
Use this cluster routing that will reduce time required to find matched subscriptions with each received content message. When a node receives a content message, this content message is a point on space S. It checks to find sub-space S that P belongs to and S i has cluster head (CH ) respectively. Then the node forwards this content message to CH . In CH all matched subscriptions will be found using above search algorithm. This cluster algorithm has complexity is O(n*m), n is number of sub-spaces and m is number of constraints of each filter, it is nearly a constant.

The Algorithm to Find Nodes Whose Filters Matched a Content Message
Define level of a filter is number of constraints of this filter. In SBR routing protocol, usually uses SBR_Counting to find nodes whose filters matched a content message (M). So the purpose of forwarding algorithm is to find filters that match M. In this section introduce an algorithm to find nodes have filters that match M based on R + tree: (i) start from root of R + tree, assign N to current node of tree. (  As the diagram shows that the time required to build R + does not increase continuously when number of filters increases. b) Build a R + tree and use the Quadratic algorithm for splitting node with the number of filters changes from 100000 to 290000 and delete filters, measure the time required in milli second to delete filters and regulate the tree to draw diagram based on the measured results:  As the diagram shows that the time required to delete filters and regulate R + does not increase continuously when number of filters increases.

Simulation for Applying R + Tree for Hierarchical
Network Clustering 1. Establish R + tree for hierarchical network clustering: The algorithms for establishing, updating R + to hierarchically cluster network as above algorithms for building R + to manage service based routing table. Some modifications: i) A leaf node \ • stores n components ( , , …, ), each component stores a pointer P to an indexed item and a summary filter FS of its child node. Each indexed item stores a pair of two elements node identifier and its filter. Leaf node stores a summary filter of its components and cluster head's ID of its group. So each leaf node give information about one basic group of the network; ii) Similarly, each inner or root node stores information about a upper group of the network. That is a cluster head's ID of its group, some components that have pointers to its child nodes and their summary filters; iii) Condition for inserting a node N to a child group GC of a current group G is: check each components of G, calculate the formula: FT = |(F • ' B • F|, ' is FS of GC, F is the filter of the inserted node. Put N to GC that gives FT is minimum result. Starting this algorithm at root node of R to a leaf node, at the leaf node actually insert N to this basic group and regulate the R tree to the root node for satisfying condition: number of elements of each group in the range [m, M]. 2. Simulate these algorithms of applying R + tree to cluster network Execute the process to insert node to the network until there are 1000 nodes. Perform this process by the regulation: when adding enough 10 nodes, then remove 1 node. Measure time in ms needed to add or remove a node, then collect the results to make a below graph: Figure 6. Simulation results of applying R + tree to insert/ remove node of clustered network.
As the graph shows that the time required to execute does not increase when the number of nodes increases. It is approximate one milli second. Sometimes this time increases for regulating R + tree, but it is lower than one second.

Future Developing
At the current time above algorithms have been simulated to get above operational results are on numeric value type of predicate, in the future applying above algorithms for value type of string.