Bayesian Analysis of Multivariate Longitudinal Ordinal Data Using Multiple Multivariate Probit Models

: Multivariate longitudinal ordinal data are often involved in longitudinal studies with each individual having more than one longitudinal ordinal measure. However, due to complicated correlation structures within each individual and no explicit likelihood functions, analyzing multivariate longitudinal ordinal data is quite challenging. In this paper, Markov chain Monte Carlo (MCMC) sampling methods are developed to analyze multivariate longitudinal ordinal data by extending multivariate probit (MVP) models for univariate longitudinal ordinal data to multiple multivariate probit models (MMVP) for multivariate longitudinal ordinal data. The identifiable MVP models require the covariance matrix of the latent multivariate normal variables underlying the longitudinal ordinal variables to be a correlation matrix, thus a Metropolis-Hastings (MH) algorithm is usually necessitated, which brings a rigorous task to develop efficient MCMC sampling methods. In contrast to the identifiable MVP models, the non-identifiable MVP models can be constructed to circumvent a MH algorithm to sample a correlation matrix by a Gibbs sampling to sample a covariance matrix, and hence improve the mixing and convergence of the MCMC components. Therefore, both the identifiable MMVP models and the non-identifiable MMVP models for multivariate longitudinal ordinal data are presented, and their corresponding MCMC sampling methods are developed. The performances of these methods are illustrated through simulation studies and an application using data from the Russia Longitudinal Monitoring Survey-Higher School of Economics (RLMS-HSE).


Introduction
Longitudinal ordinal data appear in many scientific research fields, such as medical research and health related surveys. However, these kinds of data usually involve more than one longitudinal ordinal measure collected from the same individual. For instance, in depression research studies, each individual may be asked to fill in the depression score, an ordinal variable indicating the levels of depression during follow-up visits; the sleeping disorder score may often be collected, also an ordinal variable measuring the extent of the quality of sleeping. Since those measures are collected from the same individual, it is desirable to analyze them jointly instead of separately.
Generalized linear mixed-effects models have been widely utilized to analyze multivariate longitudinal data. Gibbons and Hedeker [8]) proposed three-level mixed-effects model for binary clustered data; Liu and Hedeker [17] extended the work of Gibbons and Hedeker [8] to longitudinal multivariate ordinal data. Cagnone et al [4] modeled multivariate longitudinal ordinal data using latent variable models by incorporating random effects to account for correlated structures. Grigorova et al [9] proposed an EM algorithm for multiple ordinal outcomes using a probit model with random effects. Laffont et al [15] proposed a multivariate probit mixed effects model for multivariate longitudinal ordinal data. Grigorova [10] proposed a random effect model for two longitudinal ordinal outcomes using EM algorithm for maximum likelihood estimation. Tran et al [23] proposed a latent linear mixed model with serial correlation structures for multivariate longitudinal ordinal data.
Multivariate marginal models have also been popularly explored for multivariate longitudinal or clustered ordinal data. Due to the lack of explicit likelihood functions, generalized estimating equations (GEE) methods are at an advantage as tools to analyze multivariate non-Gaussian data including multivariate longitudinal or clustered ordinal data. Related works include Qu et al [20], Heagerty and Zeger [12], Jiang et al [13] and Spiess [22]. Formulating the estimation equation using quasi-likelihood, Cho [6] proposed a multivariate marginal model to analyze multivariate longitudinal data. Complete reviews regarding multivariate longitudinal data can be referred to Bandyopadhyay and Ganguli [1] and Verbeke et al [24].
The MCMC methods have become general computation tools in Bayesian inference and have been generally investigated for various multivariate data. Univariate longitudinal binary data has been investigated using the MVP models [5,18,25,26]. Univariate longitudinal ordinal data has been considered by Lawrence et al [16] and Zhang [27] using the non-identifiable MVP models. Dunson [7] proposed Bayesian latent variables with random effects for clustered mixed data including multivariate longitudinal ordinal data. However, multivariate longitudinal ordinal data have been seldomly inspected using marginal models. Comparing with the random-effects models, the marginal models can give direct estimation for correlations among multivariate measures. This paper is trying to fill in this gap by developing MCMC methods to analyze multivariate longitudinal ordinal data using the MMVP models. The reminder of this paper is organized as follows. Section 2 contains the identifiable MMVP model and the corresponding MCMC method for multivariate longitudinal ordinal data. The non-identifiable MMVP model is constructed in Section 3, consisting of the two proposed MCMC methods. Simulation studies regarding these proposed methods are conducted in Section 4, followed by Section 5 -an application to the RLMS-HSE data. A brief discussion is presented in Section 6.

Identifiable MMVP Model
The univariate probit model assumes that there is an underlying univariate normal variable corresponding to an ordinal variable. Specifically, suppose for each individual , = 1, ⋯ , , there is an ordinal outcome with categories and a × 1 covariate vector . Then the probit model assumes that there is a latent variable underlying , following a normal distribution with mean and variance being , denoted by ( , ), where is the × 1 regression parameter vector. The model further assumes that i.e., is the standard normal distribution function and = ( , , , ⋯ -) being the unknown cut-points. The identifiable model is usually defined by = 1 and , = −∞, = 0, and -= ∞. elements. If each longitudinal response variable, •1 , is assumed to has the same number of time points, i.e., 9 = 9 = ⋯ = 9 ; , denoted by 9 , then the number of elements for the response vector is × 9 × 8.

Metropolis-Hastings-Within-Gibbs Sampler (MH-GS)
The priors for , and 6 are assumed to be independent, i.e., ( , , 6) = ( ) × ( ) × (6). Then the posterior joint density of , , 6, and given the observed multivariate longitudinal ordinal outcome = ( , ⋯ , : ) 2 can be derived in the following: , which does not belong to any standard density function due to 6 being a correlation matrix instead of a covariance matrix. Zhang et al [26] proposed a MH algorithm to sample 6 for univariate longitudinal binary data and used Wishart distributions to derive the prior and proposal distributions for 6. For MMVP model, the proposed method for sampling 6 is extended to allow more flexible priors and proposal distributions including those based on Wishart distributions.
Although there is not an explicit formula for the marginal density of 6 derived from a Wishart distribution with a non-diagonal scale matrix and an inverse-Wishart distribution, the joint density of 6 and wcan be derive as (6, w) = (Σ) × | }→•,€ | with Jacobian matrix | }→•,€ | = |w| q^L s and (Σ) being the density function of Wishart or inverse-Wishart distribution. The weak point to use (6, w) as the prior of 6 is that it includes redundant w with diagonal elements serving as variance parameters; however, the advantage is that it allows more flexible prior specification for 6 through the specification of the scale matrix c other than a diagonal matrix. If (6, w) is used for the prior of 6, then the full conditional density function of 6 and w is (6, w| , , , ) ∝ (6, w) × ∏ Z( ; : > . As can be seen, w is involved only through the prior of (6, w), the model itself does not change and still remains identifiable. The MH algorithm is proposed to sample 6 and w as follows: Set initial value of 6 (,) , w (,) through setting Σ (,) = w (,) L s 6 (,) w (,) L s to an initial covariance matrix. Then, at iteration S Generate (6 * , w * ) by generating Σ * = w * L s 6 * w * L s from Accordingly, the MCMC sampling framework can be implemented through three Gibbs sampling steps for , and and one MH sampling step for 6, and this sampling method is denoted as Metropolis-Hastings-within-Gibbs Sampler (MH-GS).

Non-Identifiable MMVP Model
The identifiable MVP model assumes that the latent variable ~ 1 ( , 6) with 6 being a correlation matrix.
The non-identifiable MVP model can be constructed by with w being a diagonal matrix with diagonal elements ( , , ⋯ , 11 ) serving as redundant variance parameters.
The non-identifiable MVP model for univariate longitudinal ordinal data can be extended to the MMVP model for multivariate longitudinal ordinal data. The notations for the response variable = ( , ⋯ , : ) 2 , the covariate matrix for individual , and the regression parameter vector remain the same as those defined in Section 2.1. But the non-identifiable MVP model is assumed for each longitudinal ordinal response variable •1 = ( 1 , ⋯ , 2 < 1 ) 2 , there is a latent multivariate normal variable Š •1 = (Š 1 , ⋯ , Š 2 < 1 ) 2 following a multivariate normal distribution with mean vector being , Σ), and Σ is a covariance matrix with the block diagonal matrices being Σ , ⋯ , Σ ; . Assume that O1 has P O1 categories, then the cut-points for the latent variable Š O1 can be defined as follows:

Parameter-Expanded Data Augmentation Sampling Algorithms
Assume the priors for , ‹ and Σ are independent, i.e., As illustrated, the above four sampling steps are Gibbs sampling, and especially the posterior sample for Σ , a covariance matrix, can be directly sampled from an inverse-Wishart distribution, instead of a MH step to sample 6, a correlation matrix, for the identifiable model in Section 2.2. This algorithm is termed as the parameter-expanded data augmentation (PX-DA) algorithm. However, including the redundant parameters in the diagonal of w , the model becomes non-identifiable and the sampled values of and ‹ are not identifiable. Therefore, marginalizing w can be considered by jointly sampling Š, w| , ‹, 6, (sampling w| , ‹, 6, followed by sampling Š| , ‹, 6, w, ) and then jointly sampling , ‹, 6, w|Š, (sampling |6, w, Š, , 6, w| , Š, , and then ‹| , 6, w, Š, ). It is noticeable that Š| , ‹, 6, w, , |6, w, Š, ; 1> [11]. However, if c is not a diagonal matrix, sampling w necessitates a MH algorithm. Replacing (6, w| , , , ) or (6| , , , ) by (w|6), the MH sampling method in section 2.2 can be used to sample w given 6. This algorithm is termed as the parameter-expanded data augmentation with marginalization (PX-DAM) algorithm.

Simulation Studies
The MH-GS algorithm for the identifiable MMVP model is data sets are generated for each investigated scenario. Each algorithm runs 10,000 iterations with 2,000 burn-in. The MCMC convergence diagnostics were conducted using the R package-coda by Plummer et al [19] and R package-boa by Smith [21].
The averaged posterior means and standard deviations with 95% credible interval coverage probabilities for the regression parameters and cut-points are presented in Tables  1 and 2 for sample sizes being 500 and 2,000 (the results for sample size being 1,000 is presented in Appendix Table A1). As can be seen, these three methods produce similar estimated values for sample sizes being 500 and 1,000; for sample size being 2,000, the PX-DA algorithm produces biased estimation with lower 95% credible interval coverage probabilities, such as with estimated value being 4.873 and 77% coverage probability, , , with estimated value being 0.93 and 61% coverage probability and all the rest cut-points. Overall, it seems that the MH-GS algorithm gives the most precise estimated values for the cut-points; the PX-DA algorithm shows biased estimation for data with large sample sizes such as 2,000; while the PX-DAM algorithm has the largest standard deviations and thus maximum 95% interval credible coverage probabilities.   It shows that with sample size being 500 (the first row), these three methods produce similar absolute biases for those correlations; with sample size being 1,000 (the second row), the MH-GS and PX-DAM algorithms give similar absolute biases while the PX-DA produces biased estimated correlations, especially for the second longitudinal response; with sample size being 2,000 (the third row), the MH-GS and PX-DAM algorithms still give similar absolute biases while the PX-DA produces serious biased estimated correlations for both longitudinal responses. The boxplots of the absolute biases for the correlations between those two MVP models are presented in Appendix Figure A1. It shows the similarity of these three algorithms for sample sizes being 500 and 1,000 while the PX-DA algorithm has serious biases for sample size being 2,000.

True Values MH-GS PX-DA PX-DAM
The autocorrelation function (ACF) plots for selected parameters are shown in Figure 2. It is evident that in general the MH-GS algorithm has the slowest decreased ACF values while the PX-DAM algorithm shows the fastest. The ACF values for the PX-DA algorithm display that with large sample size (such as 2,000) the ACF values of the regression parameter decreased slower even than that for the MH-GS algorithm; the ACF values of the cut-points significantly decrease slower with larger sample size; the ACF values for the correlations are affected not only by sample sizes but also by the values themselves, i.e., t š 0.0625 without being affected while t oe• 0.7 being significantly affected by increasing sample sizes.

Application to the RLMS-HSE
The Russia Longitudinal Monitoring Survey-Higher Schools of Economics (RLMS-HSE) was to collect data to study the effects of Russian reforms on the health and economic welfare of households and individuals [14]. One of the main aspects focuses on the job satisfaction with the primary and secondary employments, which is a 5-categorical ordinal variable defined as "absolutely satisfied", "mostly satisfied", "neutral", "not very satisfied" and "absolutely unsatisfied". To study the relationship between the job satisfaction and other satisfaction, such as satisfaction of education, family, environment, living conditions, etc, is also another focus of the survey. This application is targeted to investigate the relationship between the job satisfaction and the satisfaction with life at present, which is another longitudinal ordinal variable with five categories. The data is chosen from 2013 to 2019 and include gender (1 being male and 0 female), marital status (1 being in a registered marriage and 0 otherwise), education (1 having university diploma and 0 otherwise), and age as the covariates. There are a total of 2704 individuals after excluding those with missing values. Then categories of "not very satisfied" and "absolutely unsatisfied" for job satisfaction are combined and categories "fully satisfied" and "rather satisfies" are combined for the satisfaction with life at present due to sparse sizes in those categories.
The model selection is conducted for job satisfaction using the MVP model: first considering the main effect model including gender, marital status, education and age (all the effects are significant); then -constructing two-way interaction model by including all two-way interaction besides the main effects (only the interaction of marital status and gender is significant); finally, considering the model with all main effects and the interaction of marital status and gender. The log likelihoods, BIC and AIC and selected the model with main effect without any interactions are calculated. The model selection for the satisfaction with life at present is also conducted with the similar conclusion as those for job satisfaction. Therefore, in the following multivariate longitudinal analysis, the main effects as the covariates for both the job satisfaction and the satisfaction with life at present are considered.
The MH-GS, PX-DA and PX-DAM algorithms are run with 20,000 iterations with 10,000 burn-in using the same prior specifications as those in Section 4. Table 3 presents the log-likelihood, the estimated posterior means with standard deviations of the regression parameters for job satisfaction and satisfaction with life at present. It shows that the MH-GS algorithm has the largest log-likelihood values, while the PX-DA algorithm has the smallest among those three algorithms. All the covariates are significant (using 95% credible intervals Specifically, female tends to have a higher job satisfaction than male, while male tend to have a high satisfaction of life at present than female; older people tend to have a high job satisfaction while have a lower satisfaction of life at present; people with married status and diplomas (institute, university, academy) tend to have a higher job satisfaction and satisfaction of life in present as well.  Tables 4, 5 and 6 present the estimated correlations of the underlying latent multivariate normal variables corresponding to job satisfaction and satisfaction of life at present (the estimated standard deviations vary from 0.02 to 0.03 for each method). It indicates that the correlations of the latent variables for satisfaction of life at present varying from 0.39 to 0.60 are a little higher than those for job satisfaction varying from 0.33 to 0.58; the correlations between those two measures vary from 0.17 to 0.40, suggesting that those two measures are modestly correlated. As can be seen, the estimated correlations using the PX-DA algorithm are unanimously smaller than the MH-GS and PX-DAM algorithms; for instance, t

Discussion
In this manuscript the MH-GS algorithm based on the identifiable MMVP model and the PX-DA and PX-DAM algorithms based on the non-identifiable MMVP model are proposed for analyzing multivariate longitudinal ordinal data.
Simulation studies show that those proposed algorithms produce similar estimated regression parameters, cut-points and correlations for data with sample size being 500, while the PX-GS algorithm tends to give biased estimation for data with large sample sizes, such as 1,000 and 2,000. However, the PX-DA and PX-DAM algorithms have faster convergences for almost all parameters than the MH-GS algorithm does, especially for sample size being 500 and 1,000. This suggests that the sampling methods based on the non-identifiable models improve the convergences of the MCMC components compared with those based on the identifiable models. However, for data with large sample sizes, marginalizing the redundant parameters in the non-identifiable models should be considered, otherwise it may produce biased estimated values.
Real data application illustrates that the MH-GS algorithm has the largest log-likelihood values among those three algorithms, suggesting that the algorithm based on the identifiable model produces more precise estimated values in comparison with those based on the non-identifiable models. The observation regarding correlations estimation, i.e., the PX-DA algorithm produces biased or underestimated correlations in comparison with the MH-GS and PX-DAM algorithms, is consistent with the findings of simulation studies.

Conclusion
The MCMC methods proposed in this manuscript can provide a joint analysis of multivariate longitudinal ordinal data and provide a direct estimation for the correlations among the underlying multivariate normal variables for multivariate longitudinal ordinal data. However, both mixed-effects models and separate analysis cannot provide a direct estimation for correlations of those multivariate measurements.
The investigation further demonstrates that MCMC sampling methods based on non-identifiable models may improve the convergences in comparison with those based on the identifiable models, while those based on the identifiable models may produce model with larger log-likelihood values than those based on the non-identifiable models. However, non-identifiable models may produce serious biased estimation without marginalization of the redundant parameters for data with large sample sizes.
Further investigation of those proposed algorithms in other multivariate longitudinal data structures, such as mixed longitudinal ordinal and continuous data, will be one of our future research focuses.