Methodology Article
Longest-path Algorithm to Solve Uncovering Problem of Hidden Markov Model
Loc Nguyen
Sunflower Soft Company, Ho Chi Minh City, Vietnam
To cite this article:
Loc Nguyen. Longest-path Algorithm to Solve Uncovering Problem of Hidden Markov Model. Applied and Computational Mathematics. Special Issue: Some Novel Algorithms for Global Optimization and Relevant Subjects. Vol. 6, No. 4-1, 2017, pp. 39-47. doi: 10.11648/j.acm.s.2017060401.13
Received: March 12, 2016; Accepted: March 14, 2016; Published: June 17, 2016
Abstract: The uncovering problem is one of the three main problems of the hidden Markov model (HMM); it aims to find the optimal state sequence that is most likely to have produced a given observation sequence. Although Viterbi is the best algorithm for solving the uncovering problem, I introduce a new viewpoint on how to solve it. The proposed algorithm is called the longest-path algorithm, in which the uncovering problem is modeled as a graph. The essence of the longest-path algorithm is thus to find the longest path inside the graph. The optimal state sequence, which is the solution of the uncovering problem, is constructed from such a path.
Keywords: Hidden Markov Model, Uncovering Problem, Longest-path Algorithm
1. Introduction to Hidden Markov Model (HMM)
Markov model (MM) is a statistical model used to model stochastic processes. MM is defined as below [1]:
• Given a finite set of states S = {s_{1}, s_{2},…, s_{n}} whose cardinality is n. Let ∏ be the initial state distribution where π_{i} ∈ ∏ represents the probability that the stochastic process begins in state s_{i}. We have ∑_{i=1}^{n} π_{i} = 1.
• The stochastic process is defined as a finite vector X = (x_{1}, x_{2},…, x_{T}) whose element x_{t} is a state at time point t. The process X is called the state stochastic process and x_{t} ∈ S equals some state s_{i} ∈ S. X is also called the state sequence. The state stochastic process X must fully meet the Markov property, namely: given the previous state x_{t–1} of process X, the conditional probability of the current state x_{t} depends only on the previous state x_{t–1}, not on any further past state (x_{t–2}, x_{t–3},…, x_{1}). In other words, P(x_{t} | x_{t–1}, x_{t–2}, x_{t–3},…, x_{1}) = P(x_{t} | x_{t–1}), with the note that P(.) also denotes probability in this article.
• At each time point, the process changes to the next state based on the transition probability distribution a_{ij}, which depends only on the previous state. So a_{ij} is the probability that the stochastic process changes from current state s_{i} to next state s_{j}. It means that a_{ij} = P(x_{t}=s_{j} | x_{t–1}=s_{i}) = P(x_{t+1}=s_{j} | x_{t}=s_{i}). The total probability of transitioning from any given state to some next state is 1; we have ∑_{j=1}^{n} a_{ij} = 1 for all i. All transition probabilities a_{ij} constitute the transition probability matrix A. Note that A is an n by n matrix because there are n distinct states.
Briefly, MM is the triple 〈S, A, ∏〉. In a typical MM, states are observed directly by users and the transition probabilities (A and ∏) are the only parameters. In contrast, the hidden Markov model (HMM) is similar to MM except that the underlying states become hidden from the observer; they are hidden parameters. HMM adds output parameters which are called observations. The HMM has further properties as below [1]:
• Suppose there is a finite set of possible observations Φ = {φ_{1}, φ_{2},…, φ_{m}} whose cardinality is m. There is a second stochastic process which produces observations correlating with the hidden states. This process is called the observable stochastic process, which is defined as a finite vector O = (o_{1}, o_{2},…, o_{T}) whose element o_{t} is an observation at time point t. Note that o_{t} ∈ Φ equals some φ_{k}. The process O is often known as the observation sequence.
• There is a probability distribution of producing a given observation in each state. Let b_{i}(k) be the probability of observation φ_{k} when the state stochastic process is in state s_{i}. It means that b_{i}(k) = b_{i}(o_{t}=φ_{k}) = P(o_{t}=φ_{k} | x_{t}=s_{i}). The sum of probabilities of all observations observed in a certain state is 1; we have ∑_{k=1}^{m} b_{i}(k) = 1 for all i. All probabilities of observations b_{i}(k) constitute the observation probability matrix B. It is convenient to use the notation b_{ik} instead of b_{i}(k). Note that B is an n by m matrix because there are n distinct states and m distinct observations.
Thus, HMM is the 5-tuple ∆ = 〈S, Φ, A, B, ∏〉. Note that the components S, Φ, A, B, and ∏ are often called the parameters of HMM, in which A, B, and ∏ are the essential parameters. For example, there are some states of weather: sunny, cloudy, rainy [2, p. 1]. Suppose you need to predict tomorrow's weather (sunny, cloudy, or rainy) when you know only observations about the humidity: dry, dryish, damp, soggy. We have S = {s_{1}=sunny, s_{2}=cloudy, s_{3}=rainy}, Φ = {φ_{1}=dry, φ_{2}=dryish, φ_{3}=damp, φ_{4}=soggy}. The transition probability matrix A is shown in table 1.
Table 1. Transition probability matrix A (rows: weather on the previous day, time point t–1; columns: weather on the current day, time point t).

         sunny        cloudy       rainy
sunny    a_{11}=0.50  a_{12}=0.25  a_{13}=0.25
cloudy   a_{21}=0.30  a_{22}=0.40  a_{23}=0.30
rainy    a_{31}=0.25  a_{32}=0.25  a_{33}=0.50
From table 1, we have a_{11}+a_{12}+a_{13}=1, a_{21}+a_{22}+a_{23}=1, a_{31}+a_{32}+a_{33}=1.
The initial state distribution, specified as a uniform distribution, is shown in table 2.
Table 2. Uniform initial state distribution ∏.

sunny       cloudy      rainy
π_{1}=0.33  π_{2}=0.33  π_{3}=0.33
From table 2, we have π_{1}+π_{2}+π_{3}=1.
Observation probability matrix B is shown in table 3.
Table 3. Observation probability matrix B (rows: weather state; columns: humidity observation).

         dry          dryish       damp         soggy
sunny    b_{11}=0.60  b_{12}=0.20  b_{13}=0.15  b_{14}=0.05
cloudy   b_{21}=0.25  b_{22}=0.25  b_{23}=0.25  b_{24}=0.25
rainy    b_{31}=0.05  b_{32}=0.10  b_{33}=0.35  b_{34}=0.50
From table 3, we have b_{11}+b_{12}+b_{13}+b_{14}=1, b_{21}+b_{22}+b_{23}+b_{24}=1, b_{31}+b_{32}+b_{33}+b_{34}=1.
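For concreteness, the weather model of tables 1, 2, and 3 can be encoded directly as matrices. The following is a minimal Python sketch (the variable names are mine, not from the article):

```python
# Weather HMM ∆ = <S, Φ, A, B, ∏> from tables 1, 2, and 3.
states = ["sunny", "cloudy", "rainy"]              # S
observations = ["dry", "dryish", "damp", "soggy"]  # Φ

# Table 1: A[i][j] = P(x_t = s_j | x_{t-1} = s_i)
A = [[0.50, 0.25, 0.25],
     [0.30, 0.40, 0.30],
     [0.25, 0.25, 0.50]]

# Table 3: B[i][k] = P(o_t = φ_k | x_t = s_i)
B = [[0.60, 0.20, 0.15, 0.05],
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]

# Table 2: uniform initial distribution (the article rounds 1/3 to 0.33)
pi = [0.33, 0.33, 0.33]

# Sanity checks: every row of A and B sums to 1, as stated in the text.
for row in A + B:
    assert abs(sum(row) - 1.0) < 1e-9
```

The same matrices are reused in the sketches below for the three decoding methods discussed in this article.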
There are three problems of HMM [1] [3, pp. 262-266]:
1. Given HMM ∆ and an observation sequence O = {o_{1}, o_{2},…, o_{T}} where o_{t} ∈ Φ, how to calculate the probability P(O|∆) of this observation sequence. This is the evaluation problem.
2. Given HMM ∆ and an observation sequence O = {o_{1}, o_{2},…, o_{T}} where o_{t} ∈ Φ, how to find the state sequence X = {x_{1}, x_{2},…, x_{T}} where x_{t} ∈ S such that X is most likely to have produced the observation sequence O. This is the uncovering problem.
3. Given HMM ∆ and an observation sequence O = {o_{1}, o_{2},…, o_{T}} where o_{t} ∈ Φ, how to adjust the parameters of ∆, such as the initial state distribution ∏, the transition probability matrix A, and the observation probability matrix B, so that the quality of HMM ∆ is enhanced. This is the learning problem.
This article focuses on the uncovering problem. Section 2 mentions some methods to solve the uncovering problem, among which Viterbi is the best. Section 3 is the main section, which proposes the longest-path algorithm.
2. HMM Uncovering Problem
According to the uncovering problem, it is required to establish an optimal criterion such that the state sequence X = {x_{1}, x_{2},…, x_{T}} maximizes that criterion. The simplest criterion is the conditional probability of sequence X with respect to sequence O and model ∆, denoted P(X|O,∆). We can apply a brute-force strategy: "go through all possible X and pick the one maximizing the criterion P(X|O,∆)". This strategy is infeasible if the number of states and observations is large. Another popular way is to establish a so-called individually optimal criterion [3, p. 263], which is described next.
Let γ_{t}(i) be the joint probability that the stochastic process is in state s_{i} at time point t with observation sequence O = {o_{1}, o_{2},…, o_{T}}; equation (1) specifies this probability based on the forward variable α_{t} and the backward variable β_{t}. Please read [3, pp. 262-263] to comprehend α_{t} and β_{t}. The variable γ_{t}(i) is also called the individually optimal criterion.

γ_{t}(i) = P(x_{t}=s_{i}, O | ∆) = α_{t}(i)β_{t}(i)   (1)
Because the probability P(O|∆) does not depend on the state sequence X, it is possible to remove it from the optimization criterion: maximizing P(x_{t}=s_{i} | O, ∆) = γ_{t}(i)/P(O|∆) over the states s_{i} is the same as maximizing γ_{t}(i). Thus, equation (2) specifies how to find the optimal state x_{t} of X at time point t.

x_{t} = argmax_{s_{i}} γ_{t}(i)   (2)
The procedure to find the state sequence X = {x_{1}, x_{2},…, x_{T}} based on the individually optimal criterion is called the individually optimal procedure; it includes three steps, shown in table 4.
1. Initialization step:
• Initializing α_{1}(i) = b_{i}(o_{1})π_{i} for all 1 ≤ i ≤ n.
• Initializing β_{T}(i) = 1 for all 1 ≤ i ≤ n.
2. Recurrence step:
• Calculating all α_{t+1}(j) = (∑_{i=1}^{n} α_{t}(i)a_{ij})b_{j}(o_{t+1}) for all 1 ≤ j ≤ n and t=1, 2,…, T–1.
• Calculating all β_{t}(i) = ∑_{j=1}^{n} a_{ij}b_{j}(o_{t+1})β_{t+1}(j) for all 1 ≤ i ≤ n and t=T–1, T–2,…, 1.
• Calculating all γ_{t}(i) = α_{t}(i)β_{t}(i) for all 1 ≤ i ≤ n and 1 ≤ t ≤ T.
• Determining the optimal state x_{t} of X at time point t as the one that maximizes γ_{t}(i) over all states s_{i}.
3. Final step: The state sequence X = {x_{1}, x_{2},…, x_{T}} is totally determined when its partial states x_{t}, 1 ≤ t ≤ T, are found in the recurrence step.
It is required to execute n+(5n^{2}–n)(T–1)+2nT operations for the individually optimal procedure due to:
• There are n multiplications for calculating the α_{1}(i)'s.
• The recurrence step runs T–1 times. There are 2n^{2}(T–1) operations for determining the α_{t+1}(j)'s over all 1 ≤ j ≤ n and 1 ≤ t ≤ T–1. There are (3n–1)n(T–1) operations for determining the β_{t}(i)'s over all 1 ≤ i ≤ n and t=T–1, T–2,…, 1. There are nT multiplications for determining γ_{t}(i)=α_{t}(i)β_{t}(i) over all 1 ≤ i ≤ n and 1 ≤ t ≤ T. There are nT comparisons for determining the optimal states over all 1 ≤ i ≤ n and 1 ≤ t ≤ T. In general, there are 2n^{2}(T–1) + (3n–1)n(T–1) + nT + nT = (5n^{2}–n)(T–1) + 2nT operations at the recurrence step.
Inside the n + (5n^{2}–n)(T–1) + 2nT operations, there are n + (n+1)n(T–1) + 2n^{2}(T–1) + nT = (3n^{2}+n)(T–1) + nT + n multiplications, (n–1)n(T–1) + (n–1)n(T–1) = 2(n^{2}–n)(T–1) additions, and nT comparisons.
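The three steps of table 4 can be sketched in Python as a plain transcription of the forward-backward recurrences, run here on the weather example (a sketch; function and variable names are mine):

```python
# Individually optimal procedure (table 4) for the weather HMM.
A = [[0.50, 0.25, 0.25], [0.30, 0.40, 0.30], [0.25, 0.25, 0.50]]
B = [[0.60, 0.20, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.35, 0.50]]
pi = [0.33, 0.33, 0.33]
states = ["sunny", "cloudy", "rainy"]

def individually_optimal(O):
    n, T = len(pi), len(O)
    # Forward variables: alpha[t][i] = P(o_1..o_t, x_t = s_i | ∆)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][O[t]]
                      for j in range(n)])
    # Backward variables: beta[t][i] = P(o_{t+1}..o_T | x_t = s_i, ∆)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    # gamma_t(i) = alpha_t(i) * beta_t(i); pick the argmax at each t (equation (2)).
    X = []
    for t in range(T):
        gamma = [alpha[t][i] * beta[t][i] for i in range(n)]
        X.append(states[gamma.index(max(gamma))])
    return X

# O = {soggy, dry, dryish} encoded as indices into Φ = {dry, dryish, damp, soggy}.
print(individually_optimal([3, 0, 1]))  # -> ['rainy', 'sunny', 'sunny']
```

The decoded sequence matches the result reported later in the article for this observation sequence.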
The individually optimal criterion γ_{t}(i) does not reflect the whole probability of state sequence X given observation sequence O because it focuses only on finding each partially optimal state x_{t} at each time point t. Thus, the individually optimal procedure is a heuristic method. The Viterbi algorithm [3, p. 264] is an alternative method that takes interest in the whole state sequence X by using the joint probability P(X,O|∆) of state sequence and observation sequence as the optimal criterion for determining state sequence X. Let δ_{t}(i) be the maximum joint probability of the observation sequence O and the state x_{t}=s_{i} over the t–1 previous states. The quantity δ_{t}(i) is called the joint optimal criterion at time point t, which is specified by (3).

δ_{t}(i) = max_{x_{1},x_{2},…,x_{t–1}} P(o_{1}, o_{2},…, o_{t}, x_{1}, x_{2},…, x_{t–1}, x_{t}=s_{i} | ∆)   (3)
The recurrence property of the joint optimal criterion is specified by (4).

δ_{t+1}(j) = (max_{i} δ_{t}(i)a_{ij})b_{j}(o_{t+1})   (4)
Given criterion δ_{t+1}(j), the state x_{t+1}=s_{j} that maximizes δ_{t+1}(j) is stored in the backtracking state q_{t+1}(j), which is specified by (5).

q_{t+1}(j) = argmax_{i} δ_{t}(i)a_{ij}   (5)
Note that index i is identified with state s_{i} according to (5). The Viterbi algorithm based on the joint optimal criterion δ_{t}(i) includes three steps described in table 5.
1. Initialization step:
• Initializing δ_{1}(i) = b_{i}(o_{1})π_{i} for all 1 ≤ i ≤ n.
• Initializing q_{1}(i) = 0 for all 1 ≤ i ≤ n.
2. Recurrence step:
• Calculating all δ_{t+1}(j) for all 1 ≤ j ≤ n and t=1, 2,…, T–1 according to (4).
• Keeping track of the optimal states q_{t+1}(j) for all 1 ≤ j ≤ n and t=1, 2,…, T–1 according to (5).
3. State sequence backtracking step: The resulting state sequence X = {x_{1}, x_{2},…, x_{T}} is determined as follows:
• The last state x_{T} = argmax_{s_{j}} δ_{T}(j).
• Previous states are determined by backtracking: x_{t} = q_{t+1}(x_{t+1}) for t=T–1, T–2,…, 1.
The total number of operations inside the Viterbi algorithm is 2n+(2n^{2}+n)(T–1) as follows:
• There are n multiplications for initializing the n values δ_{1}(i), when each δ_{1}(i) requires 1 multiplication.
• There are (2n^{2}+n)(T–1) operations over the recurrence step because there are n(T–1) values δ_{t+1}(j) and each δ_{t+1}(j) requires n multiplications and n comparisons for maximizing, plus 1 multiplication.
• There are n comparisons for constructing the state sequence X, namely for determining the last state x_{T}.
Inside the 2n+(2n^{2}+n)(T–1) operations, there are n+(n^{2}+n)(T–1) multiplications and n^{2}(T–1)+n comparisons. The number of operations of the Viterbi algorithm is smaller than that of the individually optimal procedure, which requires (5n^{2}–n)(T–1)+2nT+n operations. Therefore, the Viterbi algorithm is more effective than the individually optimal procedure. Besides, the individually optimal procedure does not reflect the whole probability of state sequence X given observation sequence O. The successive section describes the longest-path algorithm, which is a competitor of Viterbi.
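The three steps of table 5 can be sketched in Python as follows; on the weather example it yields the same sequence as the individually optimal procedure (a sketch; names are mine):

```python
# Viterbi algorithm (table 5) on the weather HMM.
A = [[0.50, 0.25, 0.25], [0.30, 0.40, 0.30], [0.25, 0.25, 0.50]]
B = [[0.60, 0.20, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.35, 0.50]]
pi = [0.33, 0.33, 0.33]
states = ["sunny", "cloudy", "rainy"]

def viterbi(O):
    n, T = len(pi), len(O)
    # Initialization: delta_1(i) = b_i(o_1) * pi_i, q_1(i) = 0.
    delta = [[pi[i] * B[i][O[0]] for i in range(n)]]
    q = [[0] * n]
    # Recurrence: equations (4) and (5).
    for t in range(1, T):
        row, back = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            row.append(delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]])
            back.append(best_i)
        delta.append(row)
        q.append(back)
    # Backtracking: the last state maximizes delta_T, then follow q backward.
    x = max(range(n), key=lambda i: delta[T - 1][i])
    path = [x]
    for t in range(T - 1, 0, -1):
        x = q[t][x]
        path.insert(0, x)
    return [states[i] for i in path]

print(viterbi([3, 0, 1]))  # -> ['rainy', 'sunny', 'sunny']
```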
3. Longest-path Algorithm to Solve HMM Uncovering Problem
Essentially, the Viterbi algorithm maximizes the joint probability P(X, O|∆) instead of maximizing the conditional probability P(X|O, ∆). I propose the so-called longest-path algorithm, based on the longest path of a graph, for solving the uncovering problem. This algorithm, which keeps the conditional probability P(X|O, ∆) as the optimal criterion, gives a viewpoint different from that of the Viterbi algorithm, although it is easy to recognize that the ideology of the longest-path algorithm does not go beyond the ideology of the Viterbi algorithm once you comprehend the longest-path algorithm. Following is a description of the longest-path algorithm.
The optimal criterion P(X|O, ∆) of the graphic method is:

P(X|O, ∆) = P(X, O|∆) / P(O|∆)

(Due to multiplication rule)

The joint probability P(X, O|∆) is factored as:

P(X, O|∆) = P(O|X, ∆)P(X|∆) = (∏_{t=1}^{T} P(o_{t}|x_{t})) × (P(x_{1}) ∏_{t=2}^{T} P(x_{t}|x_{t–1}))

(Due to the Markov property, the probability of the current state is only dependent on the right previous state, and because an observation is only dependent on the state at the time point when it is observed. The same factorization can also be reached by applying Bayes' rule to the probability P(o_{t}|x_{t}) in two different ways.)

By recurrence calculation on this probability, we have:

P(X|O, ∆) = c·w_{1}w_{2}…w_{t}…w_{T}

Where,

c = 1/P(O|∆), w_{1} = π_{x_{1}}b_{x_{1}}(o_{1}), and w_{t} = a_{x_{t–1}x_{t}}b_{x_{t}}(o_{t}) for t ≥ 2
Because the constant c is independent of the state transitions, maximizing the criterion P(X|O, ∆) with regard to state transitions is the same as maximizing the product w_{1}w_{2}…w_{t}…w_{T}. Let ρ be this product; ρ is the optimal criterion of the longest-path algorithm, rewritten as (6).
ρ = w_{1}w_{2}…w_{T} = ∏_{t=1}^{T} w_{t}   (6)

Where,

w_{1} = π_{x_{1}}b_{x_{1}}(o_{1}) and w_{t} = a_{x_{t–1}x_{t}}b_{x_{t}}(o_{t}) for t = 2, 3,…, T
The essence of the longest-path algorithm is to construct a graph and then find the longest path inside that graph, with the remark that the optimal criterion ρ represents the length of every path inside the graph. Interestingly, such length ρ is a product of weights instead of a sum of weights as usual. The criterion ρ is a function of state transitions, and the longest-path algorithm aims to maximize ρ. Following is a description of how to build up the graph.
Each w_{t}, representing the influence of state x_{t} on the observation sequence O = {o_{1}, o_{2},…, o_{T}} at time point t, depends on the states x_{t–1} and x_{t}. We will create a graph from these w_{t}'s. Because there are n possible values of x_{t}, the state x_{t} is decomposed into n nodes X_{t1}, X_{t2},…, X_{tn}. As there are T time points, we have nT time nodes. Let X = {X_{0}, X_{11}, X_{12},…, X_{1n}, X_{21}, X_{22},…, X_{2n},…, X_{T1}, X_{T2},…, X_{Tn}} be a set of 1+nT nodes where X_{0} is the null node. Firstly, we create n weighted arcs from node X_{0} to the n nodes X_{11}, X_{12},…, X_{1n} at the first time point. These directed arcs are denoted W_{0111}, W_{0112},…, W_{011n} and their weights are also denoted W_{0111}, W_{0112},…, W_{011n}. These weights W_{011j} at the first time point are calculated according to w_{1} (see (6)). Equation (7) determines the W_{011j}'s.

W_{011j} = π_{j}b_{j}(o_{1})   (7)
Note that, by convention, W_{0i1j} is equal to W_{011j} because the null node X_{0} has no state. Moreover, these weights W_{011j} are depicted in fig. 1.
For example, given the weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, and supposing the observation sequence is O = {o_{1}=φ_{4}=soggy, o_{2}=φ_{1}=dry, o_{3}=φ_{2}=dryish}, we have 3 weights at the initial time point as follows:

W_{0111} = π_{1}b_{1}(soggy) = 0.33×0.05 = 0.0165
W_{0112} = π_{2}b_{2}(soggy) = 0.33×0.25 = 0.0825
W_{0113} = π_{3}b_{3}(soggy) = 0.33×0.50 = 0.165
For each node X_{(t–1)i} where t > 1, we create n weighted arcs from node X_{(t–1)i} to the n nodes X_{t1}, X_{t2},…, X_{tn} at time point t. These directed arcs are denoted W_{(t–1)it1}, W_{(t–1)it2},…, W_{(t–1)itn} and their weights are also denoted W_{(t–1)it1}, W_{(t–1)it2},…, W_{(t–1)itn}. These weights W_{(t–1)itj} at time point t are calculated according to w_{t} (see (6)). Equation (8) determines W_{(t–1)itj}.

W_{(t–1)itj} = a_{ij}b_{j}(o_{t})   (8)
Moreover, these weights W_{(t–1)itj} are depicted in fig. 2.
Going back to the given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, and supposing the observation sequence is O = {o_{1}=φ_{4}=soggy, o_{2}=φ_{1}=dry, o_{3}=φ_{2}=dryish}, we have 18 weights from time point 1 to time point 3 as follows:

W_{1121} = a_{11}b_{1}(dry) = 0.30, W_{1122} = a_{12}b_{2}(dry) = 0.0625, W_{1123} = a_{13}b_{3}(dry) = 0.0125
W_{1221} = a_{21}b_{1}(dry) = 0.18, W_{1222} = a_{22}b_{2}(dry) = 0.10, W_{1223} = a_{23}b_{3}(dry) = 0.015
W_{1321} = a_{31}b_{1}(dry) = 0.15, W_{1322} = a_{32}b_{2}(dry) = 0.0625, W_{1323} = a_{33}b_{3}(dry) = 0.025
W_{2131} = a_{11}b_{1}(dryish) = 0.10, W_{2132} = a_{12}b_{2}(dryish) = 0.0625, W_{2133} = a_{13}b_{3}(dryish) = 0.025
W_{2231} = a_{21}b_{1}(dryish) = 0.06, W_{2232} = a_{22}b_{2}(dryish) = 0.10, W_{2233} = a_{23}b_{3}(dryish) = 0.03
W_{2331} = a_{31}b_{1}(dryish) = 0.05, W_{2332} = a_{32}b_{2}(dryish) = 0.0625, W_{2333} = a_{33}b_{3}(dryish) = 0.05
In general, there are (T–1)n^{2} such weights from time point 1 to time point T. Moreover, there are n weights derived from the null node X_{0} at time point 1. Let W be the set of n+(T–1)n^{2} weights on arcs from the null node X_{0} toward the nodes X_{T1}, X_{T2},…, X_{Tn} at the last time point T. Let G = 〈X, W〉 be the graph consisting of the node set X = {X_{0}, X_{11}, X_{12},…, X_{1n}, X_{21}, X_{22},…, X_{2n},…, X_{T1}, X_{T2},…, X_{Tn}} and the weight set W of n+(T–1)n^{2} weights. The graph G is called the state transition graph, shown in fig. 3.
Please pay attention to a very important point: neither the graph G nor its weights are fully determined before the longest-path algorithm is executed, because there is a huge number of nodes and arcs. The state transition graph shown in fig. 3 is only an illustrative example. Going back to the given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, and supposing the observation sequence is O = {o_{1}=φ_{4}=soggy, o_{2}=φ_{1}=dry, o_{3}=φ_{2}=dryish}, the state transition graph of this weather example is shown in fig. 4.
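Under the reconstruction of equations (7) and (8) used in this article, the weights of the state transition graph for the weather example can be computed directly. The following Python sketch enumerates all of them (variable names are mine):

```python
# Weights of the state transition graph G = <X, W> for O = {soggy, dry, dryish},
# assuming W_011j = pi_j * b_j(o_1) (equation (7)) and
# W_(t-1)itj = a_ij * b_j(o_t) (equation (8)).
A = [[0.50, 0.25, 0.25], [0.30, 0.40, 0.30], [0.25, 0.25, 0.50]]
B = [[0.60, 0.20, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.35, 0.50]]
pi = [0.33, 0.33, 0.33]
O = [3, 0, 1]  # soggy, dry, dryish as indices into Φ
n, T = 3, len(O)

# n arcs from the null node X_0 to X_11, ..., X_1n
W_init = [pi[j] * B[j][O[0]] for j in range(n)]
# (T-1)*n^2 arcs X_(t-1)i -> X_tj for t = 2, ..., T (0-based t here)
W = {(t, i, j): A[i][j] * B[j][O[t]]
     for t in range(1, T) for i in range(n) for j in range(n)}

print([round(w, 4) for w in W_init])  # [0.0165, 0.0825, 0.165]
```

As the text notes, in practice these weights would be computed lazily rather than all at once; the full enumeration is shown only because the example graph is tiny.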
The ideology of the longest-path algorithm is to solve the uncovering problem by finding the longest path of the state transition graph, where the whole length of every path is represented by the optimal criterion ρ (see (6)). In other words, the longest-path algorithm maximizes the optimal criterion ρ by finding the longest path. Let X = {x_{1}, x_{2},…, x_{T}} be the longest path of the state transition graph; the length of X is then the maximum value of the path length ρ. The path length ρ is calculated as the product of the weights W_{(t–1)itj}. By a heuristic assumption, ρ is maximized locally by maximizing the weights W_{(t–1)itj} at each time point. The longest-path algorithm is described by the pseudocode shown in table 6, with the note that X is the state sequence that is the ultimate result of the longest-path algorithm.

X is initialized to be empty, X = ∅.
Calculating n weights W_{0111}, W_{0112},…, W_{011n} according to (7).
Adding state x_{1} = s_{j} to X where j = argmax_{j} W_{011j}.
For t = 2 to T
Calculating n weights W_{(t–1)it1}, W_{(t–1)it2},…, W_{(t–1)itn} according to (8), where s_{i} = x_{t–1}.
Adding state x_{t} = s_{j} to X where j = argmax_{j} W_{(t–1)itj}.
End for
The total number of operations inside the longest-path algorithm is 2n+4n(T–1) as follows:
• There are n multiplications for initializing the n weights W_{0111}, W_{0112},…, W_{011n}, when each weight W_{011j} requires 1 multiplication. There are n comparisons due to finding the maximum weight index argmax_{j} W_{011j}.
• There are 3n(T–1) multiplications over the loop inside the algorithm because there are n(T–1) weights W_{(t–1)jtk} over the loop and each W_{(t–1)jtk} requires 3 multiplications. There are n(T–1) comparisons over the loop inside the algorithm due to finding the maximum weight indices argmax_{k} W_{(t–1)jtk}.
Inside the 2n+4n(T–1) operations, there are n+3n(T–1) multiplications and n+n(T–1) comparisons.
The longest-path algorithm is similar to the Viterbi algorithm (see table 5) in the aspect that the path length ρ is calculated accumulatively, but the computational formulas and viewpoints of the longest-path algorithm and the Viterbi algorithm are different. The longest-path algorithm is more effective than the Viterbi algorithm because it requires 2n+4n(T–1) operations while the Viterbi algorithm executes 2n+(2n^{2}+n)(T–1) operations. However, the longest-path algorithm does not produce the most accurate result because the path length ρ is maximized locally by maximizing the weights W_{(t–1)itj} at each time point, so the resulting sequence X may not be the globally longest path. In general, the longest-path algorithm is a heuristic algorithm that gives a new viewpoint of the uncovering problem by applying the graphic approach to solving the uncovering problem.
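The greedy procedure of table 6 can be sketched as a short Python function; at each time point it keeps only the heaviest outgoing arc of the current node, with the weights as reconstructed in (7) and (8) (a sketch; names are mine):

```python
# Longest-path algorithm (table 6): greedily maximize each weight W_(t-1)itj,
# assumed here to be a_ij * b_j(o_t), with pi_j * b_j(o_1) from the null node.
A = [[0.50, 0.25, 0.25], [0.30, 0.40, 0.30], [0.25, 0.25, 0.50]]
B = [[0.60, 0.20, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.35, 0.50]]
pi = [0.33, 0.33, 0.33]
states = ["sunny", "cloudy", "rainy"]

def longest_path(O):
    n, T = len(pi), len(O)
    # First arcs from the null node X_0: W_011j = pi_j * b_j(o_1), per (7).
    j = max(range(n), key=lambda k: pi[k] * B[k][O[0]])
    X = [j]
    for t in range(1, T):
        i = X[-1]
        # n weights from the chosen node X_(t-1)i, per (8); keep the largest.
        j = max(range(n), key=lambda k: A[i][k] * B[k][O[t]])
        X.append(j)
    return [states[s] for s in X]

print(longest_path([3, 0, 1]))  # -> ['rainy', 'sunny', 'sunny']
```

Only n weights are computed per time point, which reflects the 2n+4n(T–1) operation count: the graph is never materialized in full.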
Going back to the given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, and supposing the observation sequence is O = {o_{1}=φ_{4}=soggy, o_{2}=φ_{1}=dry, o_{3}=φ_{2}=dryish}, the longest-path algorithm is applied to find the optimal state sequence X = {x_{1}, x_{2}, x_{3}} as below.
At the first time point, we have:
W_{0111} = 0.0165, W_{0112} = 0.0825, W_{0113} = 0.165. The maximum weight is W_{0113}, hence x_{1} = s_{3} = rainy.
At the second time point, we have:
W_{1321} = a_{31}b_{1}(dry) = 0.15, W_{1322} = a_{32}b_{2}(dry) = 0.0625, W_{1323} = a_{33}b_{3}(dry) = 0.025. The maximum weight is W_{1321}, hence x_{2} = s_{1} = sunny.
At the third time point, we have:
W_{2131} = a_{11}b_{1}(dryish) = 0.10, W_{2132} = a_{12}b_{2}(dryish) = 0.0625, W_{2133} = a_{13}b_{3}(dryish) = 0.025. The maximum weight is W_{2131}, hence x_{3} = s_{1} = sunny.
As a result, the optimal state sequence is X = {x_{1}=rainy, x_{2}=sunny, x_{3}=sunny}. The result from the longest-path algorithm in this example is the same as the one from the individually optimal procedure (see table 4) and the Viterbi algorithm (see table 5).
The longest-path algorithm does not always produce an accurate state sequence X because it assumes that two successive nodes X_{(t–1)i} and X_{tj} are mutually independent, so the path length ρ is maximized locally by maximizing the weight W_{(t–1)itj} at each time point, while equation (6) indicates that the node X_{tj} depends on the previous node X_{(t–1)i}. However, according to the Markov property, the two alternate nodes X_{(t–1)i} and X_{(t+1)k} are conditionally independent given the middle node X_{tj}. This observation is very important; it helps us to enhance the accuracy of the longest-path algorithm. The advanced longest-path algorithm divides the path represented by ρ into a set of 2-weight intervals. Each 2-weight interval includes two successive weights W_{(t–1)itj} and W_{tj(t+1)k} corresponding to three nodes X_{(t–1)i}, X_{tj}, and X_{(t+1)k}, where the middle node X_{tj} is also called the midpoint of the 2-weight interval. The advanced longest-path algorithm maximizes the path length ρ by maximizing every 2-weight interval. Each 2-weight interval has 2n^{2} connections (sub-paths) because each of the weights W_{(t–1)itj} and W_{tj(t+1)k} has n^{2} possible values. Fig. 5 depicts an example of a 2-weight interval.
The advanced longest-path algorithm is described by the pseudocode shown in table 7.
X is initialized to be empty, X = ∅.
i = 1
For t = 1 to T step 2
// Note that time point t is increased by 2 as follows: 1, 3, 5,…
Calculating n weights W_{(t–1)it1}, W_{(t–1)it2},…, W_{(t–1)itn} according to (7) and (8).
If t = T then // T is odd and only one state remains
Adding state x_{t} = s_{j} to the longest path X where j = argmax_{j} W_{(t–1)itj}.
Else
For j = 1 to n
Calculating n weights W_{tj(t+1)1}, W_{tj(t+1)2},…, W_{tj(t+1)n} according to (8).
End for
Finding the pair (j*, k*) = argmax_{(j,k)} W_{(t–1)itj}W_{tj(t+1)k}.
Adding two states x_{t} = s_{j*} and x_{t+1} = s_{k*} to the longest path X.
i = k*
End if
End for
Because the two alternate nodes X_{(t–1)i} and X_{(t+1)k}, which are the two endpoints of a 2-weight interval, are conditionally independent given the midpoint X_{tj}, the essence of the advanced longest-path algorithm is to adjust the midpoint of each 2-weight interval so as to maximize that 2-weight interval.
The total number of operations inside the advanced longest-path algorithm is (2n^{2}+1.5n)T as follows:
• There are n multiplications for determining the weights W_{(t–1)it1}, W_{(t–1)it2},…, W_{(t–1)itn}. Shortly, there are nT/2 = 0.5nT such multiplications over the whole algorithm because the time point is increased by 2.
• There are 3n^{2} multiplications for determining the n^{2} weights W_{tj(t+1)k} at each time point, when each weight requires 3 multiplications. There are n multiplications for determining the products W_{(t–1)itj}W_{tj(t+1)k}. Shortly, there are (3n^{2}+n)T/2 = (1.5n^{2}+0.5n)T such multiplications over the whole algorithm because the time point is increased by 2.
• There are n^{2}+n comparisons for maximizing over the 2-weight intervals. Shortly, there are (n^{2}+n)T/2 = (0.5n^{2}+0.5n)T comparisons over the whole algorithm because the time point is increased by 2.
Inside the (2n^{2}+1.5n)T operations, there are (1.5n^{2}+n)T multiplications and (0.5n^{2}+0.5n)T comparisons. The advanced longest-path algorithm is not more effective than the Viterbi algorithm, because it requires (2n^{2}+1.5n)T operations while the Viterbi algorithm executes 2n+(2n^{2}+n)(T–1) operations, but it is more accurate than the normal longest-path algorithm aforementioned in table 6.
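Table 7's procedure can be sketched in Python as follows, again with the weights as reconstructed in (7) and (8); the branch for an odd tail handles the case t = T, which the weather example (T = 3) reaches (a sketch; names are mine):

```python
# Advanced longest-path algorithm (table 7): maximize 2-weight intervals,
# with weights assumed to be a_ij * b_j(o_t) (pi_j * b_j(o_1) from the null node).
A = [[0.50, 0.25, 0.25], [0.30, 0.40, 0.30], [0.25, 0.25, 0.50]]
B = [[0.60, 0.20, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.35, 0.50]]
pi = [0.33, 0.33, 0.33]
states = ["sunny", "cloudy", "rainy"]

def advanced_longest_path(O):
    n, T = len(pi), len(O)
    X, i, t = [], 0, 0
    while t < T:
        # First-layer weights W_(t-1)itj from the current node (null node at t=0).
        w1 = ([pi[j] * B[j][O[0]] for j in range(n)] if t == 0
              else [A[i][j] * B[j][O[t]] for j in range(n)])
        if t == T - 1:
            # Odd tail: only one state remains, pick its heaviest incoming arc.
            X.append(max(range(n), key=lambda k: w1[k]))
        else:
            # Maximize the 2-weight interval w1[j] * W_tj(t+1)k over midpoints j.
            best = max(((j, k) for j in range(n) for k in range(n)),
                       key=lambda jk: w1[jk[0]] * A[jk[0]][jk[1]] * B[jk[1]][O[t + 1]])
            X.extend(best)
            i = best[1]
        t += 2
    return [states[s] for s in X]

print(advanced_longest_path([3, 0, 1]))  # -> ['rainy', 'sunny', 'sunny']
```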
Going back to the given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, and supposing the observation sequence is O = {o_{1}=φ_{4}=soggy, o_{2}=φ_{1}=dry, o_{3}=φ_{2}=dryish}, the advanced longest-path algorithm is applied to find the optimal state sequence X = {x_{1}, x_{2}, x_{3}} as follows:
At t=1, we maximize the 2-weight interval with weights W_{011j} and W_{1j2k}. The product W_{0113}W_{1321} = 0.165×0.15 = 0.02475 is maximal over all midpoints j and endpoints k (for instance, W_{0112}W_{1221} = 0.0825×0.18 = 0.01485 and W_{0111}W_{1121} = 0.0165×0.30 = 0.00495 are smaller). Hence x_{1} = s_{3} = rainy and x_{2} = s_{1} = sunny.
At t=3, we have W_{2131} = 0.10, W_{2132} = 0.0625, W_{2133} = 0.025, whose maximum is W_{2131}. Hence x_{3} = s_{1} = sunny.
As a result, the optimal state sequence is X = {x_{1}=rainy, x_{2}=sunny, x_{3}=sunny}, which is the same as the one from the individually optimal procedure (see table 4), the Viterbi algorithm (see table 5), and the normal longest-path algorithm (see table 6). The resulting sequence X = {x_{1}=rainy, x_{2}=sunny, x_{3}=sunny}, which is the longest path, is drawn as a bold line from node X_{0} to node X_{13} to node X_{21} to node X_{31} inside the state transition graph, as seen in fig. 6.
4. Conclusion
The longest-path algorithm proposes a new viewpoint in which the uncovering problem is modeled as a graph. The different viewpoint derives from the fact that the longest-path algorithm keeps maximizing the conditional probability P(X|O, ∆) as the optimal criterion, whereas the Viterbi algorithm maximizes the joint probability P(X, O|∆). Moreover, the longest-path algorithm does not use the recurrence technique as Viterbi does; it only moves forward and optimizes every 2-weight interval on the path, and this is the reason that the longest-path algorithm is less effective than Viterbi, although its ideology is simpler than Viterbi's. The way the longest-path algorithm finds the longest path inside the graph shares the forward state transition with the Viterbi algorithm. Therefore, it is easy to recognize that the ideology of the longest-path algorithm does not go beyond the ideology of the Viterbi algorithm. However, the longest-path algorithm opens a potential research trend in improving the solution of the HMM uncovering problem, since the Viterbi algorithm is now the best algorithm with regard to theoretical methodology and we can only enhance Viterbi by practical techniques. For example, the authors of [4] applied a Hamming distance table to improving Viterbi. The authors of [5] proposed a fuzzy Viterbi search algorithm based on Choquet integrals and Sugeno fuzzy measures. The authors of [6] extended Viterbi by using the maximum likelihood estimate for the state sequence of a hidden Markov process. The authors of [7] proposed an improved Viterbi algorithm based on a second-order hidden Markov model for Chinese word segmentation. The authors of [8] applied temporal abstraction to speeding up Viterbi. According to the authors of [9], Viterbi can be enhanced by the parallelization technique in order to take advantage of multiple CPUs. According to the authors of [10], a fangled decoder helps the Viterbi algorithm to consume less memory, with no error detection capability. They [10] also proposed a new efficient fangled decoder with less complexity, which significantly decreases the processing time of Viterbi along with 2-bit error correction capabilities. The authors of [11] combined the posterior decoding algorithm and the Viterbi algorithm in order to produce the posterior-Viterbi (PV). According to [11], "PV is a two step process: first the posterior probability of each state is computed and then the best posterior allowed path through the model is evaluated by a Viterbi algorithm". PV achieves the strong points of both the posterior decoding algorithm and the Viterbi algorithm.
References