On Bayesian Estimation of Dirichlet Process Lognormal Mixture Models and Comparison of Treatments in Censoring

One current interest in medical research is the comparison of treatments in the analysis of survival times of patients. This is particularly problematic, especially for censored data, and when these data consists of several groups, where each group has distinct properties and characteristics but belong to the same distribution. There are various modeling schemes that have been contemplated to overcome these complexities inherent in the data. One such possibility is the Bayesian approach which integrates prior knowledge in analysis. In this paper, we focus on the use of Bayesian lognormal mixture model (MLNM) with related Dirichlet process (DP) prior distribution for estimating patient survival. The advances in the Bayesian paradigm have considerably bolstered the development and application of mixture modelling methodology in the field of survival analysis. The proposed MLN model is compared with the conventional parametric lognormal and the nonparametric Kaplan Meier (K-M) models used to estimate survival to establish model robustness. A simulation study that investigates the impact of censoring on these models is also described. Real data from past research is used to show the resulting Dirichlet process mixture model’s robustness in the comparison of censored treatment. The results indicate that the proposed lognormal mixtures provide a better fit to complex data. Further, the MLN models are able to estimate various survival distributions and therefore appropriate to compare treatments. Clinicians will find these models useful especially when confronted with the obstacle of choosing a suitable therapy for a disease.


Introduction
Most medical data are censored and/or arise from several homogenous subgroups relating to one or several characteristics, for example when different treatments are administered to patients. There is therefore an increasing need for efficient estimation of patient survival through comparison of treatments. Several modeling procedures have been postulated in literature. One such scheme is the Bayesian framework that incorporates prior information regarding the data without compromising the accuracy of estimates [2]. And as reported by [5], models based on Bayesian nonparametrics present more flexibility leading to ease of computation. Bayesian nonparametric models however suffer from the possible impediments of inference, particularly due to the challenges associated with prior choice. The Dirichlet Process (DP) is the prior of choice for Bayesian non-parametric models. However the DP is inadequate in these settings since the posterior is not DP but a mixture of DP. Mixture DP distributions offer the options of discreteness and flexibility especially since they consider data as represented by weighted sum of distributions, with each distribution characterized by a unique parameter set representing a subspace of the population. Given the complexities associated with these MDP models, posterior inference is feasible through tractable MCMC technique algorithms. [13], postulated that the Dirichlet Process (DP) prior for mixing portions can be handled by both a Bayesian framework through an MCMC algorithm. [9] use Bayesian techniques, with prior drawn from specific estimation points to analyze survival times data while [1] consider Dirichlet process and its extensions in boistatistics.
The main aim of this paper is to develop and apply the Bayesian Dirichlet process lognormal mixture approach to model patients' survival in cases of censoring. This aim is addressed through the estimation of a two component lognormal mixture model in a simulation study and an application to a leukemia remission dataset from [4]. A Bayesian Markov Chain Monte Carlo (MCMC) approach through Gibbs sampling and Metropolis-Hasting algorithm, assuming known number of mixing components was implemented.
In this paper we carry out posterior inference by sampling from the posterior distribution using simulation employing Markov Chain Monte Carlo (MCMC) methods. We employ the MCMC sampling algorithms through the Win BUGS [14] software. The overall aim of MCMC sampling is to simulate from a complex (posterior) density by creating a Markov chain with the posterior density as its stationary distribution. This is done by direct successive simulations from the component conditional distributions.
The rest of this paper is organized as follows. In Section 2, we define the Dirichlet process mixture model and Bayesian computational approach for parameter estimation. We also provide the method of simulation and discuss the issue of model evaluation. In Section 3, we illustrate the model using simulated datasets and the leukemia dataset. The results are discussed further in Section 4.

Model Formulation
In this section, we define the Dirichlet process lognormal mixture model for analyzing survival data. Let 1 ,... n t t be a random sample taken from a censored and possibly heterogeneous population. The two parameter lognormal distribution given by where 0 µ > is the scale parameter and 2 0 s > is the shape parameter [15]. A DPLN mixture model for t can be written as If i t is an uncensored failure time, that is, 0 i = , the full conditional mixture DP model is as given by equation (9).
where j n are the number of uncensored failure times in the th j cluster.
For this finite mixture model, we treat the number of subgroups, K representing the data under study as known. As the data size grows and data become more complicated, an infinite number of prior information is theoretically assigned for growing with data, giving a hierarchical representation. The proportion of data explained by a subgroup j is represented by the component weight j If prior distributions are specified, such that For this lognormal distribution mixture we model flexibility by conveniently assigning the following independent prior distributions [8] for the unknown parameters, and accordingly, express the MDPLN model hierarchically as In WinBUGS [12], censored data are modelled using a missing data approach through the command I(., ) as follows

Computation Method
The model described in Section 2.1 can be fitted using MCMC sampling with latent values

Simulation Algorithm
This section describes the sampling algorithm [6] for simulating from the complex posterior distribution. This is implemented by creating a Markov chain with the posterior as its stationary distribution through direct successive simulations from the component conditional distributions. We note that the Gibbs sampler and its various adaptations has been the most commonly used approach in Bayesian mixture estimation.
Our interest in this study was to estimate the parameter of Bayesian Dirichlet process lognormal mixture. We used the simulation algorithm for analyses. For this study, data were simulated from two component lognormal mixture models with the following parameter configurations: 2 ( | , , , ) 0.4 (4,0.16) 0.6 (5,0.09) The censoring levels of 30% was applied to model and a sample size of N = 100 for n=100 individuals was used for all experiments. The following steps were applied to carry out the simulations.
2. Generate censoring times by assuming that the largest C% survival times are right censored. 3. Fit the model based on the data, with 10000 iterations, 5000 burn-in. 4. Record posterior estimates of the model parameters, namely median and standard deviation.

Simulated Data
Based on the nature of the survival data, a mixture of two Lognormal [13] distributions is considered. We generated N = 100 data sets, each with n = 100 individuals. We generated a right-censored data set by first generate a variable T for true survival time from the mixture distribution. Next, we generated another variable for censoring time, C from a uniform distribution on the interval (0, 4.784) so that each individual has approximately a 30% chance of having a right-censored survival time. This was to obtain approximately 30% censoring for the data set.
Since we know very little about the true values of these parameters, we used vague Gamma priors [3] as follows These non-informative prior distributions were deployed to generate lifetime data sets resembling the nature of complex models [13], and each have a variance of 6 10 , not to influence the posterior distribution. A large prior variance is indicative of a vague distribution and therefore reflects relative ignorance about the true parameters.
We carried out a convergence diagnostic test to ensure convergence of the Markov Chains by estimating the length of the burn-in period, before taking a sample from the converged chain. The plot in Figure 1 illustrates the trace history for µ and 2 s while Table 1 confirms the accuracy of the parameter estimates in the 30% of censoring.  The figure shows quite a good mixing of the algorithm, with the mixture size moves oscillating without remaining in the same place for too long. The simulated data was used to illustrate the performance of the proposed MDPLN model with competing models in literature. We employed both graphical and quantitative methods to compare the parametric lognormal model, the non-parametric Kaplan Meier (K-M) and the proposed MDPL model. Graphical comparison was through fitting the survival functions of the three models to the data and a visual inspection as to how similar shape and behavior of the survival functions (curves) are to the true model made (Equation 21). Figure 2 shows the survival curves (plots) obtained.
From the plots in Figure 2 we see that the parametric lognormal is not capable of capturing the generated mixture distribution with long tail and thus is not a good choice for estimating the mixture survival time data. However, the MDPLN model fits the data better than both parametric lognormal and the nonparametric K-M. To facilitate a quantitative comparison, the Kolmogorov-Smirnov (KS) test, a nonparametric test for goodness-of-fit [7], was used to assess the appropriateness of the proposed models against the true mixture model. The KS test summarizes the discrepancy between observed values and the values expected under the models in question. Table 2 shows the results from the comparison.  The results in Table 2 show that the estimated CDF for the mixture model using MDPLNM has the smallest test statistics value of 0.1476 with a p-value of 0.8680>0.05. A smaller test statistics reflects a better model fit. We conclude that MDPLN model offers the best estimate and therefore most appropriate to estimate survival.

Application
Here we analyzed data from remission times of 21 pairs of 42 acute leukemia patients [4] in a clinical trial designed to test the potency of 6-Mercaptopurine (6-MP) to lengthen remission in patients randomly assigned to maintenance therapy of either 6-MP or a placebo. As in the simulated example, we used the same prior distributions and the Sampling MCMC algorithm through Win BUGS with 10000 iterations (5000 to burn-in) to fit the data. In Figure 4 we illustrate and predict the survivor functions. The Survivor functions have also been compared to the Kaplan Meier estimator. From the figure there appears to be a good correspondence between the K-M and the plots for each set of treatment observation. From figure 3, we conclude that patients who receive the 6-MP treatment have a longer survival rate than the patients in the placebo group. This justifies the effectiveness of 6-MP treatment in prolonging duration of remission for the patients.
In Table 3 we show a quantitative comparison using Kolmogorov-Smirnov test, a nonparametric test for goodness-of-fit, for testing statistical differences in survival between groups. The null hypothesis states that the leukemia patient groups have the same survival distribution against the alternative that the survival distributions are different. From these p-values for each test statistic, we conclude, at the 0.05 significance level, that patients who receive the 6-MP treatment have a longer survival rate than the patients in the placebo group. This result supports earlier findings by [4].

Conclusions and Further Developments
In this paper, we have illustrated how Bayesian methods can be used to fit a mixture of lognormal model with a known number of components to heterogeneous, censored survival data using MCMC algorithm through the Win BUGS software to estimate the survivor function. We show from simulation results and those from real data that the mixture Dirichlet Process LNM model is capable of capturing different shapes of complex survival time distributions. This indicates that the proposed model is more robust than the competing parametric lognormal and nonparametric Kaplan Meier models, and therefore shows promising potential to be applied to compare different treatments. Some extensions and modifications of the model would be extended to cases where we have unknown number of components K as data grows in complexity.