Cooperative Transmission Scheme of Energy Harvesting Tags

Abstract: Energy harvesting tags with cooperative communication capabilities are emerging as a viable infrastructure for Internet of Things (IoT) applications. This letter studies a cooperative transmission strategy for a network of energy harvesting active networked tags (EnHANTs) that adapts to the available energy resource and to the identification requests. We consider a network of tags, each of which communicates with the reader either directly or by cooperating with neighboring tags. We formulate the problem as a Markov decision process (MDP). Simulation results show the performance of the cooperative transmission policy under various energy harvesting scenarios.


Introduction
Cooperative communication and energy harvesting are practical solutions to the battery and communication reliability problems of wireless devices. Energy-harvesting active networked tags (EnHANTs) were recently proposed as tiny devices that can be attached to commonplace objects [1], [2]. EnHANTs can also be applied to tracking and monitoring. They can communicate with one another and with EnHANT-friendly devices to cooperate and forward information to the intended destination. To utilize the random energy resource effectively and maintain reliability, efficient cooperative transmission scheduling must be designed. Wang et al. [3] proposed an optimal transmission policy for a single-link system. An EnHANT-equipped object might lack sufficient energy to respond directly to the reader when the reader is outside its communication range. In such a case, to sustain communication, the object can pass its information to a neighboring object, which forwards the information to the reader by acting as a relay [4]-[6]. We consider amplify-and-forward (AF) relaying because of its low complexity: the relay simply amplifies the received data and forwards it to the destination. In [7], the use of energy-harvesting nodes as AF cooperative relays that assist communication between a source and a destination was proposed. In [8], the authors studied energy-efficient scheduling strategies for wireless sensor networks with energy harvesting. They considered a case where a node may use either direct transmission or cooperative relay transmission and formulated the problem as a Markov decision process (MDP).
In this letter, we consider a network of EnHANTs in which a tag has two options (direct and cooperative) for communicating with the reader. The relay tag assists its neighbor without affecting its own transmission. We assume that an energy detection technique and an accurate synchronization timer are employed at each tag. We consider a cooperative transmission strategy that optimizes the long-term average throughput by taking into account both the identification request state and the energy constraints.

A. Communication System Model
We consider a network of three tags ($T_1$, $T_2$, $T_3$) and a reader (R), in which each tag communicates with the reader using either the direct or the cooperative mode. We assume that the AF relaying protocol is employed and that all copies of the relayed signal of a given tag are combined at the reader using maximum ratio combining (MRC) [7] so as to achieve diversity gain. Each tag can communicate directly with R if it has sufficient battery energy, or with the assistance of a neighboring tag if its energy cannot support direct communication. The neighboring tag cooperates only if its stored energy is sufficient to receive and relay the data.
The battery state of a tag $T_i$, $i = 1, 2, 3$, is distinguished by three battery thresholds. If the battery energy level is below the minimum (first) threshold, $T_i$ cannot respond to reader requests: the battery is either empty or holds too little energy to transmit data. If the battery level is above the first threshold and below the second threshold, $T_i$ can respond to the reader's request only with the assistance of a neighboring tag, since $T_i$ is out of reach of the reader. In contrast, if the battery energy level is above the second threshold and below the third threshold, $T_i$ can communicate with the reader independently upon request. However, a tag whose battery is in this state cannot relay for neighboring tags, because relaying requires additional energy to receive the data from the neighboring tag and forward them to the reader. Unlike in conventional radios, transmitting costs less energy than receiving in EnHANTs. If the battery energy level of a tag is above the third threshold, the tag can relay data for a neighboring tag as well as communicate its own data to the reader.
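The four battery regions described above can be illustrated with a small sketch. The names `e_min`, `e_direct` and `e_relay` are hypothetical labels for the three thresholds, and the treatment of a battery level exactly equal to a threshold is an assumption, since the text only speaks of levels "above" and "below" a threshold.

```python
from enum import Enum


class TagMode(Enum):
    SILENT = "cannot respond"
    ASSISTED = "responds only through a relaying neighbor"
    DIRECT = "responds directly, cannot relay"
    RELAY = "responds directly and can relay for a neighbor"


def classify_tag(battery, e_min, e_direct, e_relay):
    """Map a battery level to the capability region of a tag.

    e_min, e_direct and e_relay are hypothetical names for the first,
    second and third battery thresholds. A level equal to a threshold is
    assigned to the upper region (an assumption).
    """
    if not (e_min <= e_direct <= e_relay):
        raise ValueError("thresholds must be non-decreasing")
    if battery < e_min:
        return TagMode.SILENT
    if battery < e_direct:
        return TagMode.ASSISTED
    if battery < e_relay:
        return TagMode.DIRECT
    return TagMode.RELAY
```

For example, with thresholds (1, 2, 3) in arbitrary energy units, a tag holding 1.5 units would need a neighbor whose own level is at least 3 units.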
It is assumed that communication takes place in a time-slotted fashion and that battery energy parameters are exchanged between the tags and the reader before any transmission attempt. Each time interval comprises three equal time slots for the ordered transmission of $T_i$, $i = 1, 2, 3$. During cooperation, the tag with the highest battery energy level (above the third threshold) is selected to forward data for a neighboring tag. Each tag transmits packets that contain information symbols to the reader. Without loss of generality, each transmitted packet can be represented in terms of $L$ encoded PPM symbols. The $m$-th encoded PPM symbol of a packet from $T_i$ can be represented as $x_i^m = [b_{m,1}, b_{m,2}, \ldots, b_{m,J}]$, where $b_{m,n} \in \{0, 1\}$ is the $n$-th data bit of the $m$-th symbol and $J$ is the number of information bits per encoded PPM symbol. We assume that each tag uses ultra-wideband PPM [3].
The received signal at R from $T_i$ in the direct mode is represented as $y_{i,R} = \vartheta_{i,R} x_i + n_{i,R}$, where $n_{i,R}$ is the ambient Gaussian noise with zero mean and variance $N_0$, and $\vartheta_{i,R}$ is the channel coefficient from $T_i$ to R. Accordingly, the instantaneous SNR of $T_i$ at R can be written as $\gamma_{i,R} = E_i |\vartheta_{i,R}|^2 / N_0$, where $E_i$ is the transmitted signal energy of $x_i$.
When $T_i$ lacks energy for direct communication, a neighboring tag $T_j$, $j \in \{1, 2, 3\}$, $j \neq i$, with sufficient energy can assist by relaying $T_i$'s information. The cooperative mode occupies two equal transmission phases of $T_i$'s time slot. During the first phase, $T_i$ broadcasts its signal to $T_j$ and R. During the second phase, $T_i$ remains silent and $T_j$ relays the information it received from $T_i$ to R. The received signal vectors at $T_j$ and R due to the transmission from $T_i$ in the first phase are represented as $y_{i,j} = \vartheta_{i,j} x_i + n_{i,j}$ and $y_{i,R} = \vartheta_{i,R} x_i + n_{i,R}$, respectively, where $\vartheta_{i,j}$ and $\vartheta_{i,R}$ are the channel coefficients from $T_i$ to $T_j$ and to R, and $n_{i,j}$ and $n_{i,R}$ are the ambient Gaussian noises at $T_j$ and R with zero mean and equal variance $N_0$.
$T_j$ amplifies the information and forwards it to the reader. Accordingly, the received signal vector at R from $T_j$ in the second phase is $y_{j,R} = \Gamma \vartheta_{j,R} y_{i,j} + n_{j,R}$, where $\Gamma = \sqrt{E_j / (E_i |\vartheta_{i,j}|^2 + N_0)}$ is the signal amplification factor of $T_j$, $\vartheta_{j,R}$ is the channel coefficient from $T_j$ to R, $n_{j,R}$ is the ambient Gaussian noise and $E_j$ is the transmitted signal energy of $T_j$. We assume that the reader decodes the information after combining the signals received from $T_i$ and $T_j$ using MRC. Accordingly, the total end-to-end SNR at R when $T_j$ is used as an AF relay can be written as [9], [10]:
$$\gamma_{i,j,R} = \gamma_{i,R} + \frac{\gamma_{i,j}\,\gamma_{j,R}}{\gamma_{i,j} + \gamma_{j,R} + 1}, \tag{1}$$
where $\gamma_{i,j} = E_i |\vartheta_{i,j}|^2 / N_0$ is the instantaneous SNR of $T_i$ at $T_j$ and $\gamma_{j,R} = E_j |\vartheta_{j,R}|^2 / N_0$ is the instantaneous SNR of $T_j$ at R. Therefore, the SNR at the reader from $T_i$ under either mode can be written as
$$\gamma_i = \begin{cases} \gamma_{i,R}, & \text{direct mode} \\ \gamma_{i,j,R}, & \text{cooperative mode.} \end{cases} \tag{2}$$
We consider the path loss effect based on the tags' positions. Let $d_{i,R}$ and $d_{i,j}$, $i \neq j$, be the distances of the $T_i$-R and $T_i$-$T_j$ links, respectively. Without loss of generality, we can model the channel gains as $|\vartheta_{i,R}|^2 = d_{i,R}^{-\eta}$ and $|\vartheta_{i,j}|^2 = d_{i,j}^{-\eta}$, where $\eta$ is the path loss exponent. We assume that the reader processes the received PPM signal using a compressive sensing (CS) technique and signal detection methods [3], [11], which avoids the need for high-sampling-rate A/D converters. With this detection method, the probability of mis-detecting a PPM symbol, $p_m$, is given by (3) [3], where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$ and $\gamma_i$ is determined using (2). The mis-detection error (3) occurs when the reader fails to decode the data transmitted by a tag. In addition, the reader may not get any response when a tag lacks energy in its battery. The probability of this no-response error, $p_n$, is an indicator of the event that the symbol energy $e_k^i$ of $T_i$ at the $k$-th time interval is insufficient to respond, where the indicator function $1_{\{X\}}$ equals 1 if $X$ is true and 0 otherwise.
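The SNR expressions for the direct and cooperative modes can be sketched numerically as follows. This is only an illustration: the function names are hypothetical, the channel power gain is assumed to be the deterministic path loss d^(-eta) with no fading, and the Q-function is evaluated through the complementary error function.

```python
import math


def link_snr(tx_energy, distance, path_loss_exp, noise_var):
    """SNR of a single link, assuming the channel power gain is d**(-eta)."""
    channel_gain = distance ** (-path_loss_exp)
    return tx_energy * channel_gain / noise_var


def af_mrc_snr(snr_direct, snr_source_relay, snr_relay_dest):
    """End-to-end SNR of an AF relay link combined with the direct link by MRC."""
    relayed = (snr_source_relay * snr_relay_dest) / (
        snr_source_relay + snr_relay_dest + 1.0
    )
    return snr_direct + relayed


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def snr_at_reader(cooperative, snr_direct, snr_source_relay=0.0, snr_relay_dest=0.0):
    """Select the direct-mode or cooperative-mode SNR at the reader."""
    if cooperative:
        return af_mrc_snr(snr_direct, snr_source_relay, snr_relay_dest)
    return snr_direct
```

Because the relayed term is non-negative, the cooperative-mode SNR in this sketch is never below the direct-link SNR for the same transmit energy.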
Accordingly, at any given time interval $k$, the communication error probability in EnHANTs, $p_e$, is defined as the weighted combination (4) of these two error probabilities [3], where $\beta \in [0, 1]$ is a weighting factor that combines the two errors into one performance metric, $\alpha_k = 1$ indicates that the reader issues a request, and $\alpha_k = 0$ indicates no request.

B. Energy Harvesting Dynamics
We assume that each tag has a finite-capacity rechargeable battery and a light harvesting device. The reader is assumed to have no power constraint. Let $b_k^i$ be the battery energy of $T_i$ at the beginning of the $k$-th time interval and $h_k^i$ be the amount of energy harvested by $T_i$ during the $k$-th time interval. We consider a random energy arrival process and assume that $h_k^i$ takes discrete values from the set $H = \{H_1, H_2, \ldots, H_D\}$. Let $P_{h_k, h_{k+1}}$ be the state transition probability from state $h_k$ to $h_{k+1}$, and denote by $P_{H_1}, P_{H_2}, \ldots, P_{H_D}$ the steady-state energy harvesting probabilities corresponding to $H_1, H_2, \ldots, H_D$, respectively. The request of the reader at the beginning of the $k$-th time interval is modeled as $\alpha_k \sim \mathrm{Bernoulli}(r)$. The energy stored in the battery of $T_i$ at the end of time interval $k$, for use in the subsequent time interval, is determined as
$$b_{k+1}^i = \min\{\, b_k^i - a_k^i + h_k^i,\; b_{\max} \,\}, \tag{5}$$
where $a_k^i = e_k^i + c_k^i\, \delta_k^i$ is the energy consumed by $T_i$, $e_k^i$ is its symbol energy and $b_{\max}$ is the maximum battery capacity. $c_k^i = 1$ if $T_i$ cooperates, and 0 otherwise. $\delta_k^i$ includes the energy to exchange state information before a data transmission attempt and the energy to transmit and receive data from a neighboring tag during cooperation. Let $A_k = (a_k^1, a_k^2, a_k^3)$ be the joint energy consumption of $T_1$, $T_2$ and $T_3$, expressed in finite discrete values, and let $\mathcal{A}$ be the set of all possible joint energy consumptions of the tags (i.e., $\mathcal{A} = \{0, a_1, a_2, \ldots, a_p\}$).
The three battery thresholds, $e_k^i$ and $\delta_k^i$ are determined by the hardware design.
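The energy dynamics of this subsection can be sketched as follows, assuming that the battery update has the usual clipped form (stored energy minus consumption plus harvest, capped at capacity) and that the harvesting process is a finite Markov chain. All function names and numbers are illustrative and not taken from the letter.

```python
import random


def next_battery(battery, consumed, harvested, capacity):
    """One-step battery update, clipped at the battery capacity (assumed form)."""
    if consumed > battery:
        raise ValueError("a tag cannot consume more energy than it stores")
    return min(battery - consumed + harvested, capacity)


def sample_next_harvest(current_idx, transition_matrix, rng):
    """Draw the next harvesting state index from a row of the Markov chain."""
    row = transition_matrix[current_idx]
    return rng.choices(range(len(row)), weights=row)[0]


def stationary_distribution(transition_matrix, iters=1000):
    """Steady-state probabilities by power iteration (ergodic chain assumed)."""
    n = len(transition_matrix)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [
            sum(dist[s] * transition_matrix[s][t] for s in range(n))
            for t in range(n)
        ]
    return dist


if __name__ == "__main__":
    # Hypothetical two-level harvesting chain and energy values.
    P = [[0.9, 0.1], [0.5, 0.5]]
    levels = [0, 2]
    rng = random.Random(0)
    battery, h = 3, 0
    for _ in range(5):
        battery = next_battery(battery, consumed=min(1, battery),
                               harvested=levels[h], capacity=4)
        h = sample_next_harvest(h, P, rng)
    print(battery, stationary_distribution(P))
```

The steady-state probabilities quoted for each harvesting scenario in the numerical results play the role of the output of `stationary_distribution` for the corresponding transition matrix.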

A. Performance Measure
Each time slot in time interval $k$ is assumed to carry one packet. Denote the state of the tags as $S_k = (b_k^1, b_k^2, b_k^3, h_k^1, h_k^2, h_k^3, \alpha_k)$ and let $\mathcal{S}$ be the set of all possible states. The transmission policy $\pi$ is a mapping from the states $\mathcal{S}$ to the energy consumptions $\mathcal{A}$. Given the current state $S_k$ and the policy $\pi: \mathcal{S} \to \mathcal{A}$, the packet throughput of tag $T_i$ at the $k$-th time interval is given by (6), where $R_s$ is the symbol rate and $p_e$ is defined in (4). Accordingly, the long-term average throughput of the tags over all time intervals is given by (7), and the optimization problem (8) is to maximize this average packet throughput over the policies $\pi$, subject to the battery state evolution (5). This optimization problem can be cast as an MDP and solved to obtain an optimal policy.
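To make the state definition concrete, the sketch below enumerates a discretized state space and evaluates a one-step transition probability. It assumes integer energy units, the clipped battery update sketched for (5), independent per-tag harvesting chains and an independent Bernoulli request; the function names, and the exact factorization, are assumptions rather than the letter's own expression.

```python
from itertools import product


def enumerate_states(battery_levels, n_harvest_levels, n_tags=3):
    """All states (battery tuple, harvest-index tuple, request flag)."""
    return [
        (b, h, req)
        for b in product(battery_levels, repeat=n_tags)
        for h in product(range(n_harvest_levels), repeat=n_tags)
        for req in (0, 1)
    ]


def transition_probability(state, action, next_state, harvest_levels,
                           harvest_P, request_prob, capacity):
    """Probability of moving from state to next_state under a joint action.

    action[i] is the energy consumed by tag i (feasibility is not checked).
    harvest_P[i] is the harvesting transition matrix of tag i.
    """
    batteries, harvest_idx, _ = state
    next_batteries, next_harvest_idx, next_request = next_state
    prob = request_prob if next_request == 1 else 1.0 - request_prob
    for i, (b, h) in enumerate(zip(batteries, harvest_idx)):
        # The next battery level is deterministic given state and action.
        expected = min(b - action[i] + harvest_levels[h], capacity)
        if next_batteries[i] != expected:
            return 0.0
        prob *= harvest_P[i][h][next_harvest_idx[i]]
    return prob
```

With three tags, B battery levels and D harvesting levels, this enumeration yields 2 * B^3 * D^3 states, which is why a coarse energy discretization keeps the MDP tractable.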

B. Markov Decision Process Formulation
An MDP is defined by the tuple $(\mathcal{S}, \mathcal{A}, P, \rho)$, where $\mathcal{S}$ is the set of states, $\mathcal{A}$ is the set of actions, $P_a(s_i, s_j)$ denotes the transition probability from state $s_i$ to state $s_j$ when an action $a \in \mathcal{A}$ is taken, and $\rho_a(s_i, s_j)$ is the reward due to the transition from state $s_i$ to state $s_j$ when action $a$ is taken. The goal of an MDP is to choose a policy $\pi$ that assigns an action to each state and maximizes the average reward. At any time interval $k$, $T_1$, $T_2$ and $T_3$ consume the joint energy $A_k$ to send their packets to the reader either directly or by cooperating. The joint choice $A_k$ causes a state change from $S_k$ to $S_{k+1}$, yielding the immediate reward $\rho_{A_k}(S_k, S_{k+1})$, which is chosen to equal the throughput with $R_s$ normalized to one. The normalized throughput of $T_i$, denoted $\tau^i(a_k^i)$, is given by (9), and the immediate joint reward is determined as
$$\rho_{A_k}(S_k, S_{k+1}) = \tau^1(a_k^1) + \tau^2(a_k^2) + \tau^3(a_k^3). \tag{10}$$
Thus, the infinite-horizon average reward of the tags is given by (11), where $A_k = \pi(S_k)$. Comparing (7) and (11) shows that the two expressions are identical except for a scaling factor equal to $R_s$. Therefore, the optimal cooperative transmission policy can be obtained by solving the MDP problem (12): maximize the average reward (11) over the policies $\pi$, subject to the battery state evolution (5).
Because all components of $S_k$ are discrete, the number of states is finite. When the action $A_k$ is taken, the state $S_k$ makes a transition to one of the possible next states $S_{k+1}$. The state transition probability $P_{A_k}(S_k, S_{k+1})$ is the probability that the system goes to state $S_{k+1}$ when action $A_k$ is taken at state $S_k$ during the $k$-th time interval. Since the tags harvest energy independently and the reader issues requests at random, the state transition probability factorizes as given in (13). The proposed model is a unichain MDP [12].
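For such a model, there exists a deterministic stationary policy that is optimal under the average-reward criterion and induces a steady-state distribution. Thus, the optimal policy $\pi^*: \mathcal{S} \to \mathcal{A}$ can be determined by solving the optimality equation for the average expected reward criterion, which takes the standard form
$$\lambda^* + v^*(S_k) = \max_{A_k \in \mathcal{A}} \sum_{S_{k+1} \in \mathcal{S}} P_{A_k}(S_k, S_{k+1}) \left[ \rho_{A_k}(S_k, S_{k+1}) + v^*(S_{k+1}) \right], \tag{14}$$
where $P_{A_k}$ and $\rho_{A_k}$ are the transition probability and the immediate reward under action $A_k$, $\lambda^*$ is the optimal average reward, and $v^*(S_{k+1})$ are the optimal relative rewards when starting at state $S_{k+1} \in \{(b_{k+1}^1, b_{k+1}^2, b_{k+1}^3, H_1, H_1, H_1, 0), \ldots, (b_{k+1}^1, b_{k+1}^2, b_{k+1}^3, H_D, H_D, H_D, 1)\}$.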
The relative value iteration (RVI) algorithm [12] can be applied to compute the optimal cooperative transmission policy for (14).
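A minimal, generic sketch of relative value iteration for a finite unichain, aperiodic MDP is given below. It is not the letter's implementation: the array layout (`P[a][s][t]`, `r[a][s]` as the expected immediate reward) and the two-state toy battery model in the usage example are assumptions chosen only to show the update rule.

```python
def relative_value_iteration(P, r, tol=1e-9, max_iter=10_000, ref_state=0):
    """Relative value iteration for an average-reward MDP.

    P[a][s][t]: probability of moving from state s to state t under action a.
    r[a][s]:    expected immediate reward of action a in state s.
    Returns (gain, relative_values, policy).
    """
    n_actions, n_states = len(r), len(r[0])
    v = [0.0] * n_states
    gain = 0.0
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # One Bellman backup for every state-action pair.
        q = [
            [
                r[a][s] + sum(P[a][s][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        backed_up = [max(row) for row in q]
        # Subtract the value of the reference state to keep iterates bounded.
        gain = backed_up[ref_state]
        new_v = [x - gain for x in backed_up]
        change = max(abs(n - o) for n, o in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return gain, v, policy


if __name__ == "__main__":
    # Toy model: state 0 = empty battery, state 1 = charged battery.
    # Action 0 = stay idle, action 1 = transmit (only useful when charged).
    P = [
        [[0.5, 0.5], [0.0, 1.0]],  # idle
        [[0.5, 0.5], [0.5, 0.5]],  # transmit
    ]
    r = [
        [0.0, 0.0],  # idle earns nothing
        [0.0, 1.0],  # transmitting from the charged state delivers one packet
    ]
    print(relative_value_iteration(P, r))
```

In the toy model the returned gain is the long-run average number of delivered packets per interval, which corresponds to the role of the optimal average reward in the optimality equation.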

Numerical Results
We evaluate the performance of the optimal cooperative transmission policy by considering various energy harvesting scenarios. We assume that each message transmitted from the tags comprises L = 4 PPM symbols with a symbol modulation order of K = 32. The symbol energy of each tag … For performance comparison, we consider the optimal direct transmission policy [3] and the cooperative transmission policy for two tags [5].
Figure 1 compares the average throughput performance of a tag when the steady-state probabilities of ($H_1$, $H_2$, $H_3$) for $T_1$, $T_2$, and $T_3$ are all (0.33, 0.33, 0.33). In this energy-balanced scenario, all the tags harvest an equal proportion of energy from the environment and can assist each other in forwarding data. The optimal cooperative policies for the two-tag and three-tag cases achieve equal average throughput per tag because the energy harvesting environment is identical for all the tags. Both outperform the non-cooperative policy.
Figure 2 shows the second scenario, in which the steady-state probabilities of ($H_1$, $H_2$, $H_3$) for $T_1$, $T_2$, and $T_3$ are (0.33, 0.33, 0.33), (0.28, 0.44, 0.28) and (0.11, 0.39, 0.50), respectively. $T_3$ has a better energy harvesting condition and assists both $T_1$ and $T_2$. The average throughput per tag of {$T_2$, $T_3$} exceeds that of both {$T_1$, $T_2$, $T_3$} and {$T_1$, $T_2$}, mainly because of the variation in the energy harvesting conditions of the tags. The performance of {$T_1$, $T_2$} is the lowest because of the worse harvesting conditions of $T_1$ and $T_2$. In all cases, the cooperative transmission policy outperforms the direct transmission policy proposed in [3].
Figure 3 depicts the average throughput performance when the steady-state probabilities of ($H_1$, $H_2$, $H_3$) for $T_1$, $T_2$, and $T_3$ are all (0.11, 0.39, 0.50). In this energy-surplus scenario, all the tags have good energy harvesting conditions and mainly communicate with the reader independently, without helping each other. As a result, the average throughput per tag is identical for the proposed three-tag policy and the two-tag policy [5]. The optimal direct policy is also close to the proposed policy because of the good energy harvesting conditions of the tags. Figures 1 and 3 show that the average throughput per tag of the proposed cooperative transmission policy is identical for the two network sizes (two tags and three tags) under the same energy harvesting conditions.

Conclusion
In this letter, we formulated the optimal cooperative transmission problem for the case of three tags, aiming to maximize the long-term average throughput of the tags. We used the RVI algorithm to solve the resulting MDP and obtained numerical results under different energy harvesting scenarios. The proposed cooperative transmission policy achieved a higher average throughput than the direct transmission policy. However, the average throughput per tag of the three-tag and two-tag cooperative policies is identical under the same energy harvesting conditions. The results demonstrate that EnHANTs can jointly use time-varying energy resources efficiently and achieve improved communication reliability.