Mixed Hidden Markov Models for Clinical Research with Discrete Repeated Measurements

Abstract: A hidden Markov model (HMM) is a method for analyzing a sequence of transitions in a set of data by considering the outcomes Y to be output from a latent state X, which has the Markov property. The HMM has been widely applied, with applications including speech recognition, genomic analysis, and financial forecasting. The HMM was originally a method for dealing with single-process data; thus, it is a natural extension to apply it to data with a repeated-measure structure by incorporating random effects into it. The extended model is called the mixed hidden Markov model (MHMM). With this extension, the MHMM has recently been applied to clinical research data with repeated measurements.


Introduction
A hidden Markov model (HMM) is a method used to analyze sequential data with characteristic transitions by considering the sequence of outcomes to be output from a hidden state with the Markov property, for each sequence point [1]. The main benefit of this model is its ability to estimate the parameters of hidden states from the outcomes. The history of the model began in the 1960s and 1970s, when Baum et al. investigated the mathematical properties of probabilistic functions of Markov chains [2,3,4,5]. In the 1980s and 1990s, the technology expanded to wider applications, including speech recognition [6], genomic analysis [7], financial forecasting [8], and online signature recognition [9]. In the life science field, the HMM is mainly used in bioinformatics, including genetic linkage maps [10], distinguishing coding regions in DNA [11], protein modeling [12], and the estimation of genotype and copy number [13]. Other implementations include cancer segmentation in MRI data [14] and the analysis of kidney disease data [15].
The HMM was originally a method for dealing with single-process sequential data. Thus, it is natural to extend its application to repeated-measure data. Wang et al. used a hidden Markov Poisson regression (MPR) model [16], and Crespi et al. used a hidden semi-Markov model [17]. Altman made a breakthrough for this model in 2007 [18]. In that paper, she incorporated random effects into the HMM likelihood function and applied the model to multiple sclerosis data.
With this extension, the mixed hidden Markov model (MHMM) could be applied to clinical research data with repeated measurements. Kenneth et al. used this model to analyze alcohol consumption data [19]. Bartolucci et al. expanded it to develop shared parameter models to handle missing data [20], but there is room for improvement; for example, the distribution that the random effect follows is limited to a discrete distribution. Marino et al. proposed a mixed hidden Markov quantile regression model, combining the MHMM with quantile regression, and applied it to CD4 count data in HIV patients [28]. They also examined the multiple-integral problem of the Gaussian quadrature method in the MHMM [29]. DeRuiter et al. analyzed blue whale behavior using a multivariate MHMM [30].
The properties of the single-process HMM parameters have also been examined. Leroux proved the consistency of the maximum-likelihood estimator under certain regularity conditions [21]. Bickel et al. showed the asymptotic normality of the maximum-likelihood estimator (MLE) of the HMM [22]. However, for a multi-process HMM such as the MHMM, almost no conclusions about the estimator's properties have been reported. This paper reports some simulation results on this problem. In this paper, maximum-likelihood estimation via the direct maximization mentioned by Altman [18] is performed, and an approximated likelihood function that reduces the required computational resources is proposed. This approximated likelihood function is described in Section 3.
This paper consists of six sections. Section 2 gives some notation for the models and symbols used throughout this paper, Section 3 describes the inference method for the MHMM, and Section 4 reports some simulation results. Section 5 describes the actual data analyzed and shows the analysis results. In Section 6, a global discussion and some challenges for the future are provided.

Markov Chain
This section provides definitions for some basic ideas. First, throughout this paper, bold letters are used to denote vectors and small letters to denote scalars, for example, X := (X_1, X_2, ..., X_T). A Markov chain is then defined as follows. Let t, T ∈ ℕ with 1 ≤ t ≤ T represent sequence points, and let {X_t | 1 ≤ t ≤ T} be a series of discrete random variables. Then, X is called a Markov chain with state size s when it satisfies Pr(X_t | X_1, ..., X_{t-1}) = Pr(X_t | X_{t-1}) for all 1 ≤ t ≤ T, which means the current state depends only on the previous state and is independent of the earlier history. For t = 1, this probability cannot be calculated because no previous state exists; thus, initial distribution parameters π := (π_1, ..., π_s), π_j = Pr(X_1 = j), need to be specified, together with the transition probability matrix (TPM) Γ = (γ_jk), γ_jk = Pr(X_t = k | X_{t-1} = j). The initial probability can also be set by solving the simultaneous equations π = πΓ, Σ_{j=1}^{s} π_j = 1; in this case, the HMM is called stationary. In this paper, homogeneous Markov chains, whose TPM does not depend on t, are considered.
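As a small illustration of the stationarity condition, the equations π = πΓ with Σ π_j = 1 can be solved directly as a linear system. The following sketch (in Python with NumPy, rather than the SAS/IML used later in the paper; the TPM values are ours, chosen only for illustration) computes the stationary initial distribution of a 2-state chain.

```python
import numpy as np

# Illustrative 2-state transition probability matrix (TPM)
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

# Solving pi (I - Gamma + U) = 1, where U is a matrix of ones, is a
# standard way to impose both pi = pi Gamma and sum(pi) = 1 at once.
s = Gamma.shape[0]
A = np.eye(s) - Gamma + np.ones((s, s))
pi = np.linalg.solve(A.T, np.ones(s))
```

For this TPM the solution is π = (2/3, 1/3), and one can verify πΓ = π.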

HMM
Now, the HMM is defined. Let Y be the outcome and X the latent state, already defined as satisfying the Markov property; it is assumed that Y is discrete. The pair of sequences (Y, X) is called an HMM when it satisfies Pr(Y_t | X_1, ..., X_t, Y_1, ..., Y_{t-1}) = Pr(Y_t | X_t), i.e., each outcome depends only on the current hidden state. The defining characteristics of an HMM consist of three kinds of parameters: 1) state-dependent density parameters, such as λ for the mean of a Poisson distribution, 2) the TPM, and 3) the initial distribution, which represents the probability of the initial state of the Markov chain.
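The generative reading of this definition can be sketched in a few lines: states evolve by the TPM, and each outcome is drawn from the state-dependent density. The parameter values below are illustrative assumptions (Poisson emissions, as in the models used later in the paper), not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

Gamma = np.array([[0.9, 0.1],     # TPM (illustrative values)
                  [0.2, 0.8]])
pi = np.array([2/3, 1/3])         # stationary initial distribution of Gamma
lam = np.array([1.0, 5.0])        # state-dependent Poisson means

T = 50
X = np.empty(T, dtype=int)
Y = np.empty(T, dtype=int)
X[0] = rng.choice(2, p=pi)        # draw initial hidden state
Y[0] = rng.poisson(lam[X[0]])     # emission depends only on current state
for t in range(1, T):
    X[t] = rng.choice(2, p=Gamma[X[t - 1]])  # Markov transition
    Y[t] = rng.poisson(lam[X[t]])            # state-dependent emission
```

Only Y would be observed in practice; X is the hidden sequence whose parameters the HMM machinery estimates.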

MHMM
The MHMM is an extension of the HMM that incorporates random effects to explain the between-cluster variability. To define the new model, Altman used generalized linear mixed models within an exponential family framework, which treat Y_it, where 1 ≤ i ≤ N indexes clusters (e.g., patients, sites) and 1 ≤ t ≤ T indexes sequence points, as a conditional distribution: let θ := ((λ_j)_{j=1}^{s}, (β_j)_{j=1}^{s}, Σ) represent all of the model parameters, let x_it be the covariate for cluster i at point t, and let u_i be a random effect [18]. The MHMM is then the model for Y_it conditional on X_it and u_i.

Discrete Repeated Measure Data
Discrete repeated-measure data are repeated measurements whose outcomes are discrete. Repeated measurements are longitudinal data collected over time for a certain cluster [1,23,24]. The cluster is often a subject, but could also be a rat, tree, field, country, or site. One typical example is data taken from longitudinal studies. The main purpose of collecting this kind of data is to answer a scientific question, such as the efficacy of a drug.

Inference Method
This section describes the inference method for the MHMM parameters in detail. First, a likelihood function is presented. Let (Y_it, X_it) be the MHMM as previously defined, and let s be the number of states of the latent Markov chain. Let Γ be the transition probability matrix, and π the initial distribution.
Various methods can be used to estimate the HMM parameters. Generally, when estimating mixed-effects model parameters, restricted maximum likelihood (REML) [25] is used, but its application to the MHMM has not been fully investigated. Altman proposed several methods to estimate MHMM parameters [18]: 1. the expectation-maximization (EM) algorithm, 2. direct maximization, 3. the Monte Carlo expectation-maximization (MCEM) algorithm, and 4. simulated maximum likelihood. In this paper, the direct maximization method is used. Even though the EM algorithm is said to be "notoriously slow" [18], it is a very common method for estimating single-process HMM parameters. The difference is that, for repeated measurement data, the number of random-effect variables increases with the number of clusters. In the direct maximization method, the random-effect variables are marginalized using numerical integration, but this may cause an underestimation of the variance parameter value; this point can be considered in the future. Because the Bayesian method allows freer modeling, it may provide more accurate estimates, and the application of hierarchical Bayesian modeling can also be considered in the future.

Likelihood Function
Let the density function for each Y_it be f, let the hidden-state process be governed by π and Γ (with entries γ_jk), and let the density of the random effect be h. Then, the likelihood function for the MHMM can be written in the following form:

L(θ) = Π_{i=1}^{N} ∫ [ Σ_{x_1, ..., x_T} π_{x_1} f(y_i1 | x_1, u_i) Π_{t=2}^{T} γ_{x_{t-1} x_t} f(y_it | x_t, u_i) ] h(u_i) du_i.

The summation part for each i is a regular HMM likelihood function. Thus, it can be expressed more simply as a product of matrices. More specifically, letting P(y | u) := diag(f(y | 1, u), ..., f(y | s, u)) for each y and u,

L(θ) = Π_{i=1}^{N} ∫ π P(y_i1 | u_i) Γ P(y_i2 | u_i) ... Γ P(y_iT | u_i) 1' h(u_i) du_i,

where 1' is a column vector of ones. In particular, this form supposes that the random effect is unique per subject, i.e., a single random effect u_i is shared across all time points within subject i.
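The matrix-product form of the inner HMM likelihood, followed by numerical integration over the random effect, can be sketched as follows (a minimal illustration in Python, assuming Poisson emissions with a state-dependent intercept, a scalar covariate, and a normal random intercept; function and variable names are ours).

```python
import numpy as np
from scipy.stats import norm, poisson


def hmm_lik_given_u(y, u, pi, Gamma, tau, beta, x):
    """HMM likelihood for one cluster, conditional on random effect u.

    Forward recursion equivalent to pi P(y1|u) Gamma P(y2|u) ... 1'.
    """
    lam = np.exp(tau + beta * x + u)        # one Poisson mean per state
    phi = pi * poisson.pmf(y[0], lam)       # forward vector at t = 1
    for yt in y[1:]:
        phi = (phi @ Gamma) * poisson.pmf(yt, lam)
    return phi.sum()


def mhmm_cluster_lik(y, pi, Gamma, tau, beta, x, sigma, M=60, a=-3.0, b=3.0):
    """Marginal cluster likelihood: integrate u out over (a, b) numerically."""
    w = (b - a) / M
    us = a + (np.arange(M) + 0.5) * w       # equally spaced midpoints
    vals = [hmm_lik_given_u(y, u, pi, Gamma, tau, beta, x)
            * norm.pdf(u, scale=sigma) for u in us]
    return w * float(np.sum(vals))
```

The full likelihood is the product of `mhmm_cluster_lik` over clusters i; in practice its logarithm is maximized.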

Approximated Likelihood Function
Integrating a function is sometimes difficult in computer calculations. The simplest way to deal with this problem is to use a numerical integration technique. Let f be the integrand of the one-dimensional integral above, and let (p, q) be an interval outside which f is effectively zero. Then the integral can be approximated as

∫_p^q f(u; θ) du ≈ ((q - p)/M) Σ_{m=1}^{M} f(u_m; θ), u_m = p + (m - 1/2)(q - p)/M,

where M is an appropriately large positive integer.
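Two concrete schemes are sketched below (in Python; node and function names are ours): the simple equal-width midpoint rule, and 60-point Gauss-Legendre quadrature on (-3, 3), matching the 60 quadrature points and range reported for the actual computation later in the paper. The test integrand, a standard normal density, is an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm


def midpoint_integral(f, p, q, M=60):
    """Equal-width midpoint rule on (p, q) with M subintervals."""
    w = (q - p) / M
    u = p + (np.arange(M) + 0.5) * w     # node u_m = p + (m - 1/2) w
    return w * float(np.sum(f(u)))


def gauss_legendre_integral(f, p, q, M=60):
    """M-point Gauss-Legendre quadrature, rescaled from (-1, 1) to (p, q)."""
    nodes, weights = np.polynomial.legendre.leggauss(M)
    u = 0.5 * (q - p) * nodes + 0.5 * (q + p)
    return 0.5 * (q - p) * float(np.sum(weights * f(u)))


mid = midpoint_integral(norm.pdf, -3, 3)
gl = gauss_legendre_integral(norm.pdf, -3, 3)
```

For a smooth integrand like a normal density, both rules agree closely; Gaussian quadrature attains much higher accuracy per node, which matters when the likelihood must be integrated at every optimization step.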

Maximum Likelihood Estimation
For a single-process HMM, the most common method for computing the MLE seems to be the Baum-Welch algorithm [2], which is the HMM version of the usual EM algorithm. However, for the MHMM, the EM algorithm converges very slowly, because this method attempts to infer each random effect individually, which causes the number of parameters to increase with the number of clusters. In this study, direct maximization via the Newton-Raphson method is selected. This method is very reasonable because a variance-covariance matrix of the parameter estimator is obtained at the same time, by inverting the observed Fisher information matrix.
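The direct-maximization recipe can be illustrated on a deliberately simple toy problem (a plain Poisson mean on the log scale, not the full MHMM likelihood; data and names are ours): minimize the negative log-likelihood with a Newton-type optimizer, then invert the observed information at the optimum to obtain the standard error.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.poisson(3.0, size=200)     # toy data, stand-in for a real outcome


def negloglik(theta):
    """Negative Poisson log-likelihood with log link (constants dropped)."""
    lam = np.exp(theta[0])         # log link keeps the mean positive
    return -np.sum(y * np.log(lam) - lam)


res = minimize(negloglik, x0=np.array([0.0]), method="BFGS")
theta_hat = res.x[0]               # MLE of log(mean)

# Observed information via a central second difference at the optimum;
# its inverse square root is the standard error of theta_hat.
h = 1e-5
obs_info = (negloglik(res.x + h) - 2 * negloglik(res.x)
            + negloglik(res.x - h)) / h**2
se = 1.0 / np.sqrt(obs_info)
```

Here the MLE satisfies exp(theta_hat) = mean(y), so the optimizer's answer can be checked analytically; for the MHMM the same pattern applies with `negloglik` replaced by the marginal likelihood of Section 3.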

Asymptotic Normality
The biggest issue with this method is the asymptotic normality of the estimator, which has not yet been thoroughly examined. For a single-process HMM, Leroux proved the consistency of the MLE [21], and Bickel et al. showed its asymptotic normality [22] under some regularity conditions. In a later section, the asymptotic normality is examined via simulation, and the results show that this assumption is reasonable.

Statistical Test
Sometimes there is a need to test some statistics and clarify whether a certain parameter estimate is significant, such as in the case of an observational study like a drug efficacy investigation. After confirming the asymptotic normality of the MLE for the MHMM, conducting an approximate Wald-type test [23] is very reasonable. The test statistic can be denoted as z = θ̂ / SE(θ̂), where the SE of the maximum likelihood estimator is given by 1/√I(θ̂), with I(θ̂) the Fisher information of the parameter, replaced in practice by the observed Fisher information.
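The Wald-type test is a one-liner once the estimate and observed information are available; the sketch below (names and numeric values are ours, for illustration only) computes the statistic and a two-sided p-value from the standard normal approximation.

```python
import numpy as np
from scipy.stats import norm


def wald_test(theta_hat, obs_info):
    """Approximate Wald test: z = theta_hat / SE, SE = 1/sqrt(obs_info)."""
    se = 1.0 / np.sqrt(obs_info)
    z = theta_hat / se
    p = 2.0 * (1.0 - norm.cdf(abs(z)))   # two-sided normal p-value
    return z, p


# Illustrative values: estimate -0.5 with observed information 25 (SE = 0.2)
z, p = wald_test(theta_hat=-0.5, obs_info=25.0)
```

With these illustrative inputs, z = -2.5 and p ≈ 0.012, which would be declared significant at the 0.05 level.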

Simulation
This section reports simulation results that confirm the asymptotic normality and consistency of MLEs as the number of time points increases.

Model
First, the simulation details are described. The assumptions of the simulation were as follows. It was a small clinical trial to assess the efficacy of an investigational drug. The target disease was chronic, and the condition of a patient transitioned between relapse and remission, satisfying the Markov property, with the outcome following a Poisson distribution. The number of states s of the Markov chain was set to 2, and the Markov chain was supposed to be stationary. The number of time points T was 5, 10, 20, 30, or 40, and the group indicator was h ∈ {A, P}. The number of subjects was set to 10, denoted as i = 1, ..., 10; subjects i = 1 to 5 formed group A (active), and i = 6 to 10 formed group P (placebo). X_it is the latent hidden state, x_i = 0 if h = P and x_i = 1 if h = A is the covariate of each i, and β is the coefficient for the covariate. Y_it follows a Poisson distribution, as specified below.
Y_it | X_it, u_i ~ Poisson(exp(τ_1 1{X_it = 1} + τ_2 1{X_it = 2} + β x_i + u_i)). Here, u_i ~ N(0, σ²) i.i.d., and τ_1, τ_2 are fixed effects corresponding to the state indicators 1{X_it = 1}, 1{X_it = 2}. The TPM was parameterized on a transformed scale to avoid setting boundary constraints on the parameters during the estimation. The Markov chain is supposed to be stationary; thus, no initial distribution parameters are needed.
The simulation was executed as follows: 1. Construct a system that generates sample data following the MHMM, as previously mentioned. 2. Set the true values for the parameters in the system. 3. Execute the MHMM parameter estimation program for the sample data. 4. Record the estimated values. 5. Return to step 1 and iterate until the iteration count reaches a pre-specified value. The true values are specified in Table 1. After execution, the sample mean of the MLEs for each parameter was summarized, together with the 5th and 95th percentile points for the 90% confidence interval. SAS 9.4 was used for the computation, and the direct maximization was executed using the Newton-Raphson method via the NLPNRA subroutine of SAS/IML. To integrate out the random effect, the number of quadrature points of the Gaussian quadrature was set to 60, and the range was set to (-3, 3), because the likelihood function had a value of zero outside of this range. The number of iterations was set to 300.
Table 1 lists the simulation results. For T = 5, some parameters were not well estimated; for example, the sample means of τ_1, τ_2 were overestimated. With T = 10, these values improved, and with T = 20 and 30, the sample means almost coincided with the true values. For T = 40, the sample means of τ_1, τ_2 were slightly far from the true values, but the other parameters were precisely estimated. In terms of the confidence intervals, the widths became narrower as T increased for all parameters.
In addition, for T = 5, some skewness of the sample distribution was observed, with the interval from the 5th percentile to the sample mean smaller than the range from the sample mean to the 95th percentile point. However, as T increased, this disproportion tended to improve. On the other hand, the variance component of the random effect tended to be underestimated for all values of T; it is considered that this is because, if less between-patient variability is observed, the estimated value tends to be smaller. Overall, the consistency and asymptotic normality of the MHMM parameter estimates in relation to the number of time points appear very reasonable. In addition, the confidence interval of β straddles zero at T = 5, but the range decreases as T increases. At T = 10, the 5th percentile point (the lower limit of the confidence interval) is almost equal to zero, but at T = 20, 30, and 40, the 5th percentile value is clearly greater than zero. Thus, the drug efficacy can be detected, and it is concluded that, at least in the situation of this simulation, a precise estimation can be performed if the number of time points is greater than 20.
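The data-generating mechanism of this simulation can be sketched as follows (in Python rather than the SAS used for the actual study; the parameter values here are illustrative assumptions, not the true values of Table 1): 10 subjects split into groups A and P, a stationary 2-state chain, and Poisson outcomes whose log-mean combines a state effect, a group effect, and a subject-level random intercept.

```python
import numpy as np

rng = np.random.default_rng(2)

Gamma = np.array([[0.9, 0.1],    # TPM (illustrative)
                  [0.2, 0.8]])
pi = np.array([2/3, 1/3])        # stationary distribution of Gamma
tau = np.array([0.5, 1.5])       # state-dependent fixed effects (illustrative)
beta = -0.3                      # group (treatment) effect (illustrative)
sigma = 0.4                      # SD of the random intercept (illustrative)
T, n = 20, 10

Y = np.empty((n, T), dtype=int)
for i in range(n):
    x = 1 if i < 5 else 0        # first 5 subjects: group A; rest: group P
    u = rng.normal(0.0, sigma)   # subject-level random effect, shared over t
    state = rng.choice(2, p=pi)  # stationary start, no extra initial params
    for t in range(T):
        if t > 0:
            state = rng.choice(2, p=Gamma[state])
        Y[i, t] = rng.poisson(np.exp(tau[state] + beta * x + u))
```

Each simulation iteration would regenerate Y, fit the MHMM, and record the MLEs, from which the sample means and percentile points are computed.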

Background
This section gives a detailed description of an actual data application. The study was a randomized, double-blind, parallel-group, multi-center study comparing a placebo with a new anti-epileptic drug (AED), given in combination with one or two other AEDs. The datasets were obtained from the supplemental materials of Molenberghs et al. [26].
The study is described in full detail in Faught et al. [27]. The outcome of interest was the number of epileptic seizures experienced during the last week, i.e., since the last time the outcome was measured. The key research question was whether or not the additional new treatment reduced the number of epileptic seizures. Table 2 lists the descriptive statistics for the demographic variables. There was no significant difference between the placebo arm and the treatment arm. The spaghetti plot of the epileptic seizure counts for each treatment group is shown below. Verbeke et al. applied a generalized linear mixed-effect model with Poisson regression to these data [23], which included a random intercept as well as a random slope. The problem with this method is that the model assumes that the data tend to increase or decrease over time; however, in Figure 1 the transitions of the outcomes appear more stationary, so the assumption does not seem reasonable. Furthermore, there are some outlying values, and the usual Poisson regression model with identical random effects within a patient has difficulty describing that phenomenon. The MHMM can deal with this problem, and effectively explain both the stationarity and the transitions of the disease state. The treatment effects are evaluated using p-values based on Wald tests, with the level of significance set to 0.05.

Application
This section describes how the MHMM is applied to the epilepsy data. Let Y_iht be the epileptic seizure count for subject i, time point t, and treatment arm h ∈ {P, D}, where P = placebo and D = drug, and let X_iht be the hidden state. It is supposed that X_iht ∈ {1, 2}, representing the remission and relapse disease states. The distribution of Y_iht is supposed to follow a Poisson distribution with mean λ_iht, where log λ_iht = τ_1 1{X_iht = 1} + τ_2 1{X_iht = 2} + β x_h + u_i, with x_h = 0 if h = P and x_h = 1 if h = D. The parameters are set using the same method as in the simulation: the random effect u_i ~ N(0, σ²) i.i.d., and τ_1, τ_2 are fixed effects corresponding to the hidden states.

Result
The results are listed in Table 3, which shows that the treatment tended to decrease the outcome, but no significant difference was detected. In relation to the transition probabilities, as can be seen from the TPM, it was relatively difficult for a state transition to occur, so the data were almost always in the remission state. Thus, inference on the parameters associated with the relapse state may not have worked well; this point will need further investigation. Overall, the analysis results effectively explained the phenomenon.

Conclusion
To summarize, the MHMM was implemented, and the asymptotic normality and consistency of its estimators in relation to the number of time points were examined via a simulation study. The results showed that both properties were reasonable, and that if the number of time points is set to more than 20, good inference values are obtained in the situation that was set. An application to actual study data was also investigated, and it was found that the model effectively interpreted the data. Future tasks include the application of random effects to the transition probability matrix, which will cause the Markov chain to become non-homogeneous and produce some difficulty in explaining the correlation between random effects; it is easy to assume that all of the effects are independent, but this causes divergence from reality. Second, missing outcomes and subject dropout were not considered in this simulation study, but these are unavoidable in an actual study, especially in long-term observational trials. Hence, the authors are planning to handle these problems by using a shared parameter model (SPM), a model used to treat missing data in a longitudinal study in which the missing process is informative (non-ignorable).