Methodology Article

Longest-path Algorithm to Solve Uncovering Problem of Hidden Markov Model

Loc Nguyen

Sunflower Soft Company, Ho Chi Minh City, Vietnam

Email address:

To cite this article:

Loc Nguyen. Longest-path Algorithm to Solve Uncovering Problem of Hidden Markov Model. Applied and Computational Mathematics. Special Issue: Some Novel Algorithms for Global Optimization and Relevant Subjects. Vol. 6, No. 4-1, 2017, pp. 39-47. doi: 10.11648/j.acm.s.2017060401.13

Received: March 12, 2016; Accepted: March 14, 2016; Published: June 17, 2016

Abstract: Uncovering problem is one of three main problems of hidden Markov model (HMM), which aims to find out optimal state sequence that is most likely to produce a given observation sequence. Although Viterbi is the best algorithm to solve uncovering problem, I introduce a new viewpoint of how to solve HMM uncovering problem. The proposed algorithm is called longest-path algorithm in which the uncovering problem is modeled as a graph. So the essence of longest-path algorithm is to find out the longest path inside the graph. The optimal state sequence which is solution of uncovering problem is constructed from such path.

Keywords: Hidden Markov Model, Uncovering Problem, Longest-path Algorithm

1. Introduction to Hidden Markov Model (HMM)

Markov model (MM) is the statistical model which is used to model the stochastic process. MM is defined as below [1]:

• Given a finite set of state S = {s1, s2,…, sn} whose cardinality is n. Let ∏ be the initial state distribution where πi∏ represents the probability that the stochastic process begins in state si. We have .

• The stochastic process is defined as a finite vector X=(x1, x2,…, xT) whose element xt is a state at time point t. The process X is called state stochastic process and xt S equals some state si S. X is also called state sequence. The state stochastic process X must meet fully the Markov property, namely, given previous state xt–1 of process X, the conditional probability of current state xt is only dependent on the previous state xt–1, not relevant to any further past state (xt–2, xt–3,…, x1). In other words, P(xt | xt–1, xt–2, xt–3,…, x1) = P(xt | xt–1) with note that P(.) also denotes probability in this article.

• At each time point, the process changes to the next state based on the transition probability distribution aij, which depends only on the previous state. So aij is the probability that the stochastic process changes current state si to next state sj. It means that aij = P(xt=sj | xt–1=si) = P(xt+1=sj | xt=si). The probability of transitioning from any given state to some next state is 1, we have . All transition probabilities aij (s) constitute the transition probability matrix A. Note that A is n by n matrix because there are n distinct states.

Briefly, MM is the triple 〈S, A, ∏〉. In typical MM, states are observed directly by users and transition probabilities (A and ∏) are unique parameters. Otherwise, hidden Markov model (HMM) is similar to MM except that the underlying states become hidden from observer, they are hidden parameters. HMM adds more output parameters which are called observations. The HMM has further properties as below [1]:

• Suppose there is a finite set of possible observations Φ = {φ1, φ2,…, φm} whose cardinality is m. There is the second stochastic process which produces observations correlating with hidden states. This process is called observable stochastic process, which is defined as a finite vector O = (o1, o2,…, oT) whose element ot is an observation at time point t. Note that ot Φ equals some φk. The process O is often known as observation sequence.

• There is a probability distribution of producing a given observation in each state. Let bi(k) be the probability of observation φk when the state stochastic process is in state si. It means that bi(k) = bi(ot=φk) = P(ot=φk | xt=si). The sum of probabilities of all observations which observed in a certain state is 1, we have . All probabilities of observations bi(k) constitute the observation probability matrix B. It is convenient for us to use notation bik instead of notation bi(k). Note that B is n by m matrix because there are n distinct states and m distinct observations.

Thus, HMM is the 5-tuple ∆ = 〈S, Φ, A, B, ∏〉. Note that components S, Φ, A, B, and ∏ are often called parameters of HMM in which A, B, and ∏ are essential parameters. For example, there are some states of weather: sunny, cloudy, rainy [2, p. 1]. Suppose you need to predict how weather tomorrow is: sunny, cloudy or rainy since you know only observations about the humidity: dry, dryish, damp, soggy. We have S = {s1=sunny, s2=cloudy, s3=rainy}, Φ = {φ1=dry, φ2=dryish, φ3=damp, φ4=soggy}. Transition probability matrix A is shown in table 1.

Table 1. Transition probability matrix A.

Weather current day (Time point t) | ||||

sunny | cloudy | rainy | ||

Weather previous day (Time point t –1) | sunny | a11=0.50 | a12=0.25 | a13=0.25 |

cloudy | a21=0.30 | a22=0.40 | a23=0.30 | |

rainy | a31=0.25 | a32=0.25 | a33=0.50 |

From table 1, we have a11+a12+a13=1, a21+a22+a23=1, a31+a32+a33=1.

Initial state distribution specified as uniform distribution is shown in table 2.

Table 2. Uniform initial state distribution ∏.

sunny | cloudy | rainy |

π1=0.33 | π2=0.33 | π3=0.33 |

From table 2, we have π1+π2+π3=1.

Observation probability matrix B is shown in table 3.

Table 3. Observation probability matrix B.

Humidity | |||||

dry | dryish | damp | soggy | ||

Weather | sunny | b11=0.60 | b12=0.20 | b13=0.15 | b14=0.05 |

cloudy | b21=0.25 | b22=0.25 | b23=0.25 | b24=0.25 | |

rainy | b31=0.05 | b32=0.10 | b33=0.35 | b34=0.50 |

From table 3, we have b11+b12+b13+b14=1, b21+b22+b23+b24=1, b31+b32+b33+b34=1.

There are three problems of HMM [1] [3, pp. 262-266]:

1. Given HMM ∆ and an observation sequence O = {o1, o2,…, oT} where ot Φ, how to calculate the probability P(O|∆) of this observation sequence. This is evaluation problem.

2. Given HMM ∆ and an observation sequence O = {o1, o2,…, oT} where ot Φ, how to find the state sequence X = {x1, x2,…, xT} where xt S so that X is most likely to have produced the observation sequence O. This is uncovering problem.

3. Given HMM ∆ and an observation sequence O = {o1, o2,…, oT} where ot Φ, how to adjust parameters of ∆ such as initial state distribution ∏, transition probability matrix A, and observation probability matrix B so that the quality of HMM ∆ is enhanced. This is learning problem.

This article focuses on the uncovering problem. Section 2 mentions some methods to solve the uncovering problem, in which Viterbi is the best method. Section 3 is the main one that proposes the longest-path algorithm.

2. HMM Uncovering Problem

According to uncovering problem, it is required to establish an optimal criterion so that the state sequence X = {x1, x2,…, xT} leads to maximizing such criterion. The simple criterion is the conditional probability of sequence X with respect to sequence O and model ∆, denoted P(X|O,∆). We can apply brute-force strategy: “go through all possible such X and pick the one leading to maximizing the criterion P(X|O,∆)”.

This strategy is impossible if the number of states and observations is huge. Another popular way is to establish a so-called individually optimal criterion [3, p. 263] which is described right later.

Let γt(i) be joint probability that the stochastic process is in state si at time point t with observation sequence O = {o1, o2,…, oT}, equation (1) specifies this probability based on forward variable αt and backward variable βt. Please read [3, pp. 262-263] to comprehend αt and βt. The variable γt(i) is also called individually optimal criterion.

(1)

Because the probability is not relevant to state sequence X, it is possible to remove it from the optimization criterion. Thus, equation (2) specifies how to find out the optimal state xt of X at time point t.

(2)

The procedure to find out state sequence X = {x1, x2,…, xT} based on individually optimal criterion is called individually optimal procedure that includes three steps, shown in table 4.

Table 4. Viterbi algorithm to solve uncovering problem.

1. Initialization step: |

• Initializing α1(i) = bi(o1)πi for all |

• Initializing βT(i) = 1 for all |

2. Recurrence step: |

• Calculating all αt+1(i) for all |

• Calculating all βt(i) for all |

• Calculating all γt(i)=αt(i)βt(i) for all |

• Determining optimal state xt of X at time point t is the one that maximizes γt(i) over all values si. |

3. Final step: The state sequence X = {x1, x2,…, xT} is totally determined when its partial states xt (s) where |

It is required to execute n+(5n2–n)(T–1)+2nT operations for individually optimal procedure due to:

• There are n multiplications for calculating α1(i) (s).

• The recurrence step runs over T–1 times. There are 2n2(T–1) operations for determining αt+1(i) (s) over all and . There are (3n–1)n(T–1) operations for determining βt(i) (s) over all and t=T–1, t=T–2,…, t=1. There are nT multiplications for determining γt(i)=αt(i)βt(i) over all and . There are nT comparisons for determining optimal state over all and . In general, there are 2n2(T–1)+ (3n–1)n(T–1) + nT + nT = (5n2–n)(T–1) + 2nT operations at the recurrence step.

Inside n + (5n2–n)(T–1) + 2nT operations, there are n + (n+1)n(T–1) + 2n2(T–1) + nT = (3n2+n)(T–1) + nT + n multiplications and (n–1)n(T–1) + (n–1)n(T–1) = 2(n2–n)(T–1) additions and nT comparisons.

The individually optimal criterion γt(i) does not reflect the whole probability of state sequence X given observation sequence O because it focuses only on how to find out each partially optimal state xt at each time point t. Thus, the individually optimal procedure is heuristic method. Viterbi algorithm [3, p. 264] is alternative method that takes interest in the whole state sequence X by using joint probability P(X,O|Δ) of state sequence and observation sequence as optimal criterion for determining state sequence X. Let δt(i) be the maximum joint probability of observation sequence O and state xt=si over t–1 previous states. The quantity δt(i) is called joint optimal criterion at time point t, which is specified by (3).

(3)

The recurrence property of joint optimal criterion is specified by (4).

(4)

Given criterion δt+1(j), the state xt+1=sj that maximizes δt+1(j) is stored in the backtracking state qt+1(j) that is specified by (5).

(5)

Note that index i is identified with state according to (5). The Viterbi algorithm based on joint optimal criterion δt(i) includes three steps described in table 5.

Table 5. Viterbi algorithm to solve uncovering problem.

1. Initialization step: |

• Initializing δ1(i) = bi(o1)πi for all |

• Initializing q1(i) = 0 for all |

2. Recurrence step: |

• Calculating all |

• Keeping tracking optimal states |

3. State sequence backtracking step: The resulted state sequence X = {x1, x2,…, xT} is determined as follows: |

• The last state |

• Previous states are determined by backtracking: xt = qt+1(xt+1) for t=T–1, t=T–2,…, t=1. |

The total number of operations inside the Viterbi algorithm is 2n+(2n2+n)(T–1) as follows:

• There are n multiplications for initializing n values δ1(i) when each δ1(i) requires 1 multiplication.

• There are (2n2+n)(T–1) operations over the recurrence step because there are n(T–1) values δt+1(j) and each δt+1(j) requires n multiplications and n comparisons for maximizing plus 1 multiplication.

• There are n comparisons for constructing the state sequence X, .

Inside 2n+(2n2+n)(T–1) operations, there are n+(n2+n)(T–1) multiplications and n2(T–1)+n comparisons. The number of operations with regard to Viterbi algorithm is smaller than the number of operations with regard to individually optimal procedure when individually optimal procedure requires (5n2–n)(T–1)+2nT+n operations. Therefore, Viterbi algorithm is more effective than individually optimal procedure. Besides, individually optimal procedure does not reflect the whole probability of state sequence X given observation sequence O. The successive section describes longest-path algorithm which is a competitor of Viterbi.

3. Longest-path Algorithm to Solve HMM Uncovering Problem

Essentially, Viterbi algorithm maximizes the joint probability P(X, O|∆) instead of maximizing the conditional probability P(X|O, ∆). I propose so-called longest-path algorithm based on longest path of graph for solving uncovering problem. This algorithm that maintains using the conditional probability P(X|O, ∆) as optimal criterion gives a viewpoint different from the viewpoint of Viterbi algorithm although it is easy for you to recognize that the ideology of the longest-path algorithm does not go beyond the ideology of Viterbi algorithm after you comprehend the longest-path algorithm. Following is description of longest-path algorithm.

The optimal criterion P(X|O, ∆) of graphic method is:

(Due to multiplication rule)

(Due to Markov property: the probability of current state is only dependent on the probability of right previous state)

(Because an observation is only dependent on the time point when it is observed)

By recurrence calculation on probability

We have:

Applying Bayes’ rule into the probability , we have:

(Applying Bayes’ rule into the probability )

(Applying Bayes’ rule into the probability )

(Because an observation is only dependent on the time point when it is observed, )

Applying Bayes’ rule into the probability by another way, we have:

(Because an observation is only dependent on the time point when it is observed, )

(Applying multiplication rule into the probability )

Because we had

It implies that

We have

Where,

Because the constant c is independent from state transitions, maximizing the criterion P(X|O, ∆) with regard to state transitions is the same to maximizing the product w1w2…wt…wT. Let ρ be this product and so, ρ is the optimal criterion of longest-path algorithm, re-written by (6).

(6)

Where,

The essence of longest-path algorithm is to construct a graph and then, the algorithm finds out the longest path inside such graph with attention that the optimal criterion ρ represents length of every path inside the graph. There is an interesting thing that such length ρ is product of weights instead of sequence of additions as usual. The criterion ρ is function of state transitions and the longest-path algorithm aims to maximize ρ. Following is description of how to build up the graph.

Each wt representing the influence of state xt on the observation sequence O = {o1, o2,…, oT} at time point t is dependent on states xt–1 and xt. We will create a graph from these wt (s). Because there are n possible values of xt, the state xt is decomposed into n nodes Xt1, Xt2,…, Xtn. There are T time points, we have nT time nodes. Let X = {X0, X11, X12,…, X1n, X21, X22,…, X2n,…, XT1, XT2,…, XTn} be a set of 1+nT nodes where X0 is null node. Firstly, we create n weighted arcs from node X0 to n nodes X11, X12,…, X1n at the first time point. These directed arcs are denoted W0111, W0112,…, W011n and their weights are also denoted W0111, W0112,…, W011n. These weights W011j (s) at the first time point are calculated according to w1 (see (6)). Equation (7) determines W1j (s).

(7)

Your attention please, it is conventional that W0i1j is equal to W011j, because the null node X0 has no state.

Moreover, these weights W011j (s) are depicted by fig. 1.

Figure 1. Weighted arcs from null node X0 to n nodes X11, X12,…, X1n.

For example, given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, suppose observation sequence is O = {o1=φ4=soggy, o2=φ1=dry, o3=φ2=dryish}, we have 3 weights at the initial time point as follows:

For each node X(t–1)i where t > 1, we create n weighted arcs from node X(t–1)i to n nodes Xt1, Xt2,…, Xtn at the time point t. These directed arcs are denoted W(t–1)it1, W(t–1)it2,…, W(t–1)itn and their weights are also denoted W(t–1)it1, W(t–1)it2,…, W(t–1)itn. These weights W(t–1)itj at the time point t are calculated according to wt (see (6)). Equation (8) determines W(t–1)itj.

(8)

Moreover, these weights W(t–1)itj (s) are depicted by fig. 2.

Figure 2. Weighted arcs from n node X(t–1)i to n nodes Xtj at time point t.

Going back given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, suppose observation sequence is O = {o1=φ4=soggy, o2=φ1=dry, o3=φ2=dryish}, we have 18 weights from time point 1 to time point 3 as follows:

In general, there are (T–1)n2 weights from time point 1 to time point T. Moreover, there are n weights derived from null node X0 at time point 1. Let W be set of n+(T–1)n2 weights from null node X0 to nodes XT1, XT2,…, XTn at the last time point T. Let G = <X, W> be the graph consisting of the set of nodes X = {X0, X11, X12,…, X1n, X21, X22,…, X2n,…, XT1, XT2,…, XTn} be a set of n+(T–1)n2 weights W. The graph G is called state transition graph shown in fig. 3.

Figure 3. State transition graph.

Please pay attention to a very important thing that both graph G and its weights are not determined before the longest-path algorithm is executed because there are a huge number of nodes and arcs. State transition graph shown in fig. 3 is only illustrative example. Going back given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, suppose observation sequence is O = {o1=φ4=soggy, o2=φ1=dry, o3=φ2=dryish}, the state transition graph of this weather example is shown in fig. 4.

Figure 4. State transition graph of weather example.

The ideology of the longest-path algorithm is to solve uncovering problem by finding the longest path of state transition graph where the whole length of every path is represented by the optimal criterion ρ (see (6)). In other words, the longest-path algorithm maximizes the optimal criterion ρ by finding the longest path. Let X = {x1, x2,…, xT} be the longest path of state transition graph and so, length of X is maximum value of the path length ρ. The path length ρ is calculated as product of weights W(t–1)itj (s). By heuristic assumption, ρ is maximized locally by maximizing weights W(t–1)itj (s) at each time point. The longest-path algorithm is described by pseudo-code shown in table 6 with note that X is state sequence that is ultimate result of the longest-path algorithm.

Table 6. Longest-path algorithm.

The total number of operations inside the longest-path algorithm is 2n+4n(T–1) as follows:

• There are n multiplications for initializing n weights W0111, W0112,…, W011n when each weight W011j requires 1 multiplication. There are n comparisons due to finding maximum weight index .

• There are 3n(T–1) multiplications over the loop inside the algorithm because there are n(T–1) weights W(t–1)jtk over the loop and each W(t–1)jtk requires 3 multiplications. There are n(T–1) comparisons over the loop inside the algorithm due to finding maximum weight indices: .

Inside 2n+4n(T–1) operations, there are n+3n(T–1) multiplications and n+n(T–1) comparisons.

The longest-path algorithm is similar to Viterbi algorithm (see table 5) with regard to the aspect that the path length ρ is calculated accumulatively but computational formulas and viewpoints of longest-path algorithm and Viterbi algorithm are different. The longest-path algorithm is more effective than Viterbi algorithm because it requires 2n+4n(T–1) operations while Viterbi algorithm executes 2n+(2n2+n)(T–1) operations. However, longest-path algorithm does not produce the most accurate result because the path length ρ is maximized locally by maximizing weights W(t–1)itj (s) at each time point, which leads that the resulted sequence X may not be global longest path. In general, the longest-path algorithm is heuristic algorithm that gives a new viewpoint of uncovering problem when applying graphic approach into solving uncovering problem.

Going back given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, suppose observation sequence is O = {o1=φ4=soggy, o2=φ1=dry, o3=φ2=dryish}, the longest-path algorithm is applied to find out the optimal state sequence X = {x1, x2, x3} as below.

At the first time point, we have:

At the second time point, we have:

At the third time point, we have:

As a result, the optimal state sequence is X = {x1=rainy, x2=sunny, x3=sunny}. The result from the longest-path algorithm in this example is the same to the one from individually optimal procedure (see table 4) and Viterbi algorithm (see table 5).

The longest-path algorithm does not result out accurate state sequence X because it assumes that two successive nodes X(t–1)i and Xtj are mutually independent, which leads that the path length ρ is maximized locally by maximizing weight W(t–1)itj at each time point, while equation (6) indicates that the former node X(t–1)i is dependent on the prior node Xtj. However, according to Markov property, two intermittent nodes X(t–1)i and X(t+1)k are conditional independent given the middle node Xtj. This observation is very important, which help us to enhance the accuracy of longest-path algorithm. The advanced longest-path algorithm divides the path represented by ρ into a set of 2-weight intervals. Each 2-weight interval includes two successive weights W(t–1)itj and Wti(t+1)j corresponding three nodes X(t–1)i, Xtj, and X(t+1)k where the middle node Xtj is also called the midpoint of 2-weight interval. The advanced longest-path algorithm maximizes the path ρ by maximizing every 2-weight interval. Each 2-weight interval has 2n2 connections (sub-paths) because each weight W(t–1)itj or Wti(t+1)j has n2 values. Fig. 5 depicts an example of 2-weight interval.

Figure 5. The 2-weight interval.

The advanced longest-path algorithm is described by pseudo-code shown in table 7.

Table 7. Advanced longest-path algorithm.

X is initialized to be empty, |

i = 1 |

For t = 1 to T step 2 |

// Note that time point t is increased by 2 as follows: 1, 3, 5,… |

Calculating n weights W(t–1)it1, W(t–1)it2,…, W(t–1)itn according to (7) and (8). |

For j = 1 to n |

Calculating n weights Wtj(t+1)1, Wtj(t+1)2,…, Wtj(t+1)n according to (8). |

End for |

Adding two states |

End for |

Because two intermittent nodes X(t–1)i and X(t+1)k that are two end-points of a 2-weight interval are conditional independent given the midpoint Xtj, the essence of advanced longest-path algorithm is to adjust the midpoint of 2-weight interval so as to maximize such 2-weight interval.

The total number of operations inside the longest-path algorithm is (2n2+1.5n)T as follows:

• There are n multiplications for determining weights W(t–1)it1, W(t–1)it2,…, W(t–1)itn. Shortly, there are nT/2 = 0.5nT multiplications over the whole algorithm because time point is increased by 2.

• There are 3n2 multiplications for determining n2 weights Wtj(t+1)l (s) at each time point when each weight requires 3 multiplications. There are n multiplications for determining product . Shortly, there are (3n2+n)T/2 = (1.5n2+0.5n)T multiplications over the whole algorithm because time point is increased by 2.

• There are n2+n comparisons for maximizing: and . Shortly, there are (n2+n)T/2 = (0.5n2+0.5n)T multiplications over the whole algorithm because time point is increased by 2.

Inside (2n2+1.5n)T operations, there are (1.5n2+n)T multiplications and (0.5n2+0.5n)T comparisons. The advanced longest-path algorithm is not more effective than Viterbi algorithm because it requires (2n2+1.5n)T operations while Viterbi algorithm executes 2n+(2n2+n)(T–1) operations but it is more accurate than normal longest-path algorithm aforementioned in table 6.

Going back given weather HMM ∆ whose parameters A, B, and ∏ are specified in tables 1, 2, and 3, suppose observation sequence is O = {o1=φ4=soggy, o2=φ1=dry, o3=φ2=dryish}, the advanced longest-path algorithm is applied to find out the optimal state sequence X = {x1, x2, x3} as follows:

At t=1, we have:

At t=3, we have:

As a result, the optimal state sequence is X = {x1=rainy, x2=sunny, x3=sunny}, which is the same to the one from individually optimal procedure (see table 4), Viterbi algorithm (see table 5), and normal longest-path algorithm (see table 6). The resulted sequence X = {x1=rainy, x2=sunny, x3=sunny} that is the longest path is drawn as bold line from node X0 to node X13 to node X21 to node X31 inside the state transition graph, as seen in following fig. 6.

Figure 6. Longest path drawn as bold line inside state transition graph.

4. Conclusion

The longest-path algorithm proposes a new viewpoint in which uncovering problem is modeled as a graph. The different viewpoint is derived from the fact that longest-path algorithm keeps the optimal criterion as maximizing the conditional probability P(X|O, ∆) whereas Viterbi algorithm maximizes the joint probability P(X, O|∆). Moreover the longest-path algorithm does not use recurrence technique like Viterbi does but this is the reason that longest-path algorithm is less effective than Viterbi although the ideology of longest-path algorithm is simpler than Viterbi. It only moves forward and optimizes every 2-weight interval on the path. The way longest-path algorithm finds out longest path inside the graph shares the forward state transition with Viterbi algorithm. Therefore it is easy to recognize that the ideology of longest-path algorithm does not go beyond the ideology of Viterbi algorithm. However longest-path algorithm opens a potential research trend in improving solution of HMM uncovering problem when Viterbi algorithm is now the best algorithm with regard to theoretical methodology and we only enhance Viterbi by practical techniques. For example, authors [4] applied Hamming distance table into improving Viterbi. Authors [5] propose a fuzzy Viterbi search algorithm which is based on Choquet integrals and Sugeno fuzzy measures. Authors [6] extended Viterbi by using maximum likelihood estimate for the state sequence of a hidden Markov process. Authors [7] proposed an improved Viterbi algorithm based on second-order hidden Markov model for Chinese word segmentation. Authors [8] applied temporal abstraction into speeding up Viterbi. According to authors [9], the Viterbi can be enhanced by parallelization technique in order to take advantages of multiple CPU (s). According to authors [10], fangled decoder helps Viterbi algorithm to consume less memory with no error detection capability. They [10] also proposed a new efficient fangled decoder with less complexity which decreases significantly the processing time of Viterbi along with 2 bit error correction capabilities. Authors [11] combined posterior decoding algorithm and Viterbi algorithm in order to produce the posterior-Viterbi (PV). According to [11], “PV is a two step process: first the posterior probability of each state is computed and then the best posterior allowed path through the model is evaluated by a Viterbi algorithm”. PV achieves strong points of both posterior decoding algorithm and Viterbi algorithm.

References