Prediction of Survival of HIV/AIDS Patients from Various Sources of Data Using AFT Models

The aim of this paper is to predict and compare the survival of HIV/AIDS patients under ART follow-up in three different hospitals in Ethiopia. Three data sets with a total of 1304 patients were considered. Three parametric accelerated failure time (AFT) distributions, lognormal, loglogistic and Weibull, are used to analyze, predict and compare survival probabilities of the patients. The results indicate that the empirical hazard rates of the three data sets are non-monotonic, each rising to a maximal peak. The patients from Arba Minch hospital seem to have the highest event intensity. The AFT loglogistic model is selected as the best fit for each of the data sets. Apart from TB infection status, which is significant at every hospital, different covariates are found to affect patients' survival at each of the hospitals. Patients with TB infection at baseline tend to have shorter survival time as compared to those with no TB infection, with significant differences in survival time between the two groups. Patients under follow-up at Shashemene hospital tend to have consistently the highest survival probabilities in both TB positive and negative groups. Patients from Bale Robe hospital tend to have the longest survival time, while those from Arba Minch hospital have the shortest survival time. Patients with bedridden status have the shortest survival time. The AFT loglogistic model is recommended for modelling the time-to-event data considered in this study. The results are unique to each hospital, implying that patients' care and intervention need to be specific.


Introduction
Antiretroviral therapy (ART) has improved the survival of HIV/AIDS patients, and the quality of life of patients has generally improved worldwide. In resource-poor countries in particular, ART has reduced the mortality rate among treated HIV/AIDS patients [1], [2]. There are many circumstances in which both a repeatedly measured biomarker outcome and the elapsed time to an event are collected on each individual in a medical study. These observed biomarker series are frequently important health indicators that represent the progression of a disease. Such data typically have additional features and complications associated with them, including the presence of treatment group indicators and baseline covariates, measurement error in the biomarkers, and right censoring of the event time with the possibility of dependent censoring [3], [4]. A study by [4] demonstrated joint modelling of longitudinal observations of CD4 counts and time-to-death using AFT models in a Bayesian setting. The authors analyzed two of the data sets with various models and reported how covariates and shared frailty affect the survival outcome of the patients.
In this study, three data sets are analyzed using AFT models, namely the lognormal, loglogistic and Weibull distributions, with a classical estimation approach. The purpose is to compare these models and predict survival probabilities in the three different populations of patients.

Data Description
Three data sets are analyzed in this study: Data 1, Data 2 and Data 3. The three data sets were collected under similar settings. Descriptions of the data are given below.

AFT Models
Survival models are important statistical methods to describe and analyze the time-to-death data of HIV/AIDS patients. An initial step in the analysis of a set of survival data is to present numerical or graphical summaries of the survival times in a particular group. In summarizing survival data, the two functions commonly applied are the survivor function and the hazard rate function.
The basic quantity employed to describe a time-to-event process is the survival function, the probability of an individual surviving beyond time t. Moreover, the distribution of survival time is characterized by three functions: the probability density function, the survivorship function, and the hazard function [5]-[9].
In survival analysis, an accelerated failure time (AFT) model is a parametric model that provides an alternative to the commonly used proportional hazards models for the analysis of survival time data. Under AFT models, we measure the direct effect of the explanatory variables on the survival time instead of on the hazard [3], [4], [6].
Let $T_i = \min(t_i, c_i)$ be the observed time for the $i$th subject, where $t_i$ is the time-to-event and $c_i$ is the censoring time, which is assumed independent of $t_i$, and where $\delta_i = 1$ if the event is observed and $\delta_i = 0$ otherwise. Let $X_i = (X_{i1}, X_{i2}, \cdots, X_{ip})$ be a vector of $p$ covariates. The corresponding log-linear form of the AFT model with respect to time is given generally as:
$$\log T_i = \beta_0 + X_i^{\top}\beta + \sigma\varepsilon_i,$$
where $\beta_0$ is the intercept, $\beta$ is the vector of unknown coefficients, $\sigma$ is the scale parameter and $\varepsilon_i$ is a random variable assumed to have a particular distribution.
The three AFT models used in this study are as defined in [4]:
Lognormal distribution, with survival and hazard functions: [equations missing in source]. The parametric link to the covariates and random effects is: $T_i \sim \text{lognormal}(\mu_i(t), \sigma^2)$, $\log\{\mu_i(t)\} =$ [right-hand side missing in source].
Loglogistic distribution, with survival and hazard functions: [equations missing in source]. The parametric link to the covariates and random effects is: [equation missing in source].
Loglogistic and lognormal distributions have non-monotonic hazard rate functions, that is, the hazard increases to reach a peak and then declines over time [3], [4], [6].
Weibull distribution, with survival and hazard functions: [equations missing in source]. The parametric link to the covariates and random effects is: [equation missing in source].
Under the AFT model, we can predict the future survival probabilities given the history data of the patients, which is given as: [equation missing in source].
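For orientation, the standard textbook forms of the three distributions under a log-linear AFT representation can be sketched as follows. The parameterization is an assumption (linear predictor $\eta_i = \beta_0 + X_i^{\top}\beta$, scale $\sigma$, standardized log-time $z_i(t)$) and may differ from the one used in [4]; the conditional survival expression at the end is one common way to express prediction of future survival for a patient still alive at time $t$, not necessarily the exact formula used by the authors.

```latex
% Assumed parameterization: \eta_i = \beta_0 + X_i^{\top}\beta,
% z_i(t) = (\log t - \eta_i)/\sigma; \Phi and \phi are the standard normal cdf and pdf.
\begin{align*}
\text{Lognormal:}\quad & S_i(t) = 1 - \Phi\{z_i(t)\}, &
  h_i(t) &= \frac{\phi\{z_i(t)\}}{\sigma t\,[1 - \Phi\{z_i(t)\}]}, \\
\text{Loglogistic:}\quad & S_i(t) = \frac{1}{1 + e^{z_i(t)}}, &
  h_i(t) &= \frac{e^{z_i(t)}}{\sigma t\,\{1 + e^{z_i(t)}\}}, \\
\text{Weibull:}\quad & S_i(t) = \exp\{-e^{z_i(t)}\}, &
  h_i(t) &= \frac{e^{z_i(t)}}{\sigma t}.
\end{align*}
% Conditional (future) survival for a patient known to be alive at time t:
\[
P(T_i > t + s \mid T_i > t) = \frac{S_i(t + s)}{S_i(t)}, \qquad s \ge 0.
\]
```

In this parameterization the Weibull hazard is proportional to $t^{1/\sigma - 1}$ and is therefore monotone in $t$, whereas the loglogistic hazard is hump-shaped when $\sigma < 1$.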

Comparison of Models
Model comparison and selection are among the most common tasks of statistical practice. The most commonly used methods of selection include the Akaike information criterion (AIC) and likelihood-based criteria. In this study, the AIC is used to compare the parametric models, defined as:
$$\mathrm{AIC} = -2\,LL + 2p,$$
where $LL$ is the log-likelihood and $p$ is the number of parameters in the model. A smaller value of AIC suggests that the model fits the data better [10].
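The AIC comparison can be illustrated with a minimal Python sketch. The log-likelihood values and parameter counts below are hypothetical placeholders, not results from this study; the function name `aic` is likewise an illustrative choice.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = -2 * LL + 2 * p."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical (log-likelihood, number of parameters) pairs for three fitted models.
fits = {
    "lognormal": (-512.4, 15),
    "loglogistic": (-510.9, 15),
    "weibull": (-511.8, 15),
}

scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}
best = min(scores, key=scores.get)  # the smallest AIC is preferred

for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} AIC = {value:.1f}")
print("preferred model:", best)
```

With equal parameter counts, as assumed here, ranking by AIC is the same as ranking by log-likelihood; the penalty term matters only when the candidate models differ in size.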

Results and Discussion
The objective of this study is to compare the survival probabilities of HIV/AIDS patients under ART follow-up in three different hospitals using three accelerated failure time models.

The Empirical Hazard Rates
The empirical hazard rate estimates of the three data sets are plotted in Figure 1. They show non-monotonic hazard rates, which suggests that the lognormal and loglogistic models are more suitable than the Weibull model for analyzing these data sets. The maximum hazard rate for Data 1 is estimated to be 0.001508 at time 56.186 months (4.682 years). The maximum hazard rate for Data 2 is 0.001971 at time 29.678 months (2.473 years). For Data 3, from Arba Minch General Hospital, the maximum hazard rate is about 0.008438 at time 64.7955 months (5.400 years). The results indicate that the patients under follow-up at Arba Minch General Hospital might have higher event intensity as compared to those at both Shashemene and Bale Robe Hospitals.
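The contrast between a hump-shaped and a monotone hazard can be checked numerically. The sketch below uses the standard scale/shape forms of the loglogistic and Weibull hazards; the parameter values are hypothetical and chosen only for illustration, and the function names are not from the paper.

```python
def loglogistic_hazard(t, alpha, kappa):
    """Loglogistic hazard with scale alpha and shape kappa."""
    u = (t / alpha) ** kappa
    return (kappa / t) * u / (1.0 + u)


def weibull_hazard(t, lam, k):
    """Weibull hazard with scale lam and shape k."""
    return (k / lam) * (t / lam) ** (k - 1.0)


def loglogistic_peak_time(alpha, kappa):
    """Time at which the loglogistic hazard peaks (a peak exists only for kappa > 1)."""
    if kappa <= 1.0:
        raise ValueError("the loglogistic hazard has no interior peak when kappa <= 1")
    return alpha * (kappa - 1.0) ** (1.0 / kappa)


# Hypothetical parameters; time is measured in months.
alpha, kappa = 60.0, 2.0
grid = list(range(1, 241))

ll_hazard = [loglogistic_hazard(t, alpha, kappa) for t in grid]
peak_on_grid = grid[ll_hazard.index(max(ll_hazard))]

wb_hazard = [weibull_hazard(t, 60.0, 1.5) for t in grid]
weibull_is_monotone = all(a <= b for a, b in zip(wb_hazard, wb_hazard[1:]))

print("loglogistic peak (closed form):", loglogistic_peak_time(alpha, kappa))
print("loglogistic peak (grid search):", peak_on_grid)
print("Weibull hazard monotone on grid:", weibull_is_monotone)
```

A Weibull hazard is always monotone (increasing, constant or decreasing depending on the shape), so it cannot reproduce a peak of the kind seen in the empirical hazard rates.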

Comparison of Survival Probabilities
To compare the time to event of two or more groups, the estimated survival functions of the groups are a good indication. To take a closer look at the estimated survival time, the Kaplan-Meier estimation technique was used. The estimated survival functions in Figure 2 show survival probabilities declining over time in each of the data sets. When one survivorship curve lies above another, the group defined by the upper curve had better survival than the group defined by the lower curve.
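The Kaplan-Meier (product-limit) estimator can be sketched in a few lines of Python. The follow-up times and event indicators below are hypothetical, and the function name `kaplan_meier` is an illustrative choice rather than the software used in the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


# Hypothetical follow-up times in months.
times = [6, 12, 12, 20, 30, 30, 45, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:3d} months, S(t) = {s:.3f}")
```

Subjects censored at a given time are counted as at risk for events occurring at that same time, which is the usual convention.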

Comparison by Sex
For the sex groups, there is no significant difference in survival up to the peak point of the hazard rate. However, we can observe a slight difference in survival probability between women and men after the peak point of the hazard rates.
The log-rank test is a non-parametric test for comparing two or more independent survival curves. To compare the survival probabilities between sex categories, we employ the log-rank test with the hypotheses $H_0$: $S_{\text{women}}(t) = S_{\text{men}}(t)$ for all $t$, versus $H_1$: $S_{\text{women}}(t) \neq S_{\text{men}}(t)$ for some $t$. The analysis shows that the log-rank test for sex in Data 1 gives $\chi^2 = 0.3$ with 1 df, p = 0.607; for Data 2, $\chi^2 = 0.2$ with 1 df, p = 0.672; and for Data 3, $\chi^2 = 0.5$ with 1 df, p = 0.491. At the 5% level of significance, there is not sufficient evidence to reject the null hypothesis that the survival probability of women patients is equal to that of men. There is thus no evidence that survival probabilities differ by sex.
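A two-group log-rank test can be sketched in plain Python as follows. The data are hypothetical, the function names are illustrative, and the statistic is computed in its usual chi-square form with 1 degree of freedom; this is a sketch of the standard test, not a reproduction of the software output behind the values reported above.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, chi2_sf_1df(chi2)


# Hypothetical follow-up times in months (1 = death, 0 = censored).
women_t, women_e = [8, 15, 22, 40, 60], [1, 0, 1, 0, 0]
men_t, men_e = [5, 12, 18, 33, 60], [1, 1, 0, 1, 0]
chi2, p_value = logrank_test(women_t, women_e, men_t, men_e)
print(f"chi-square = {chi2:.3f} on 1 df, p = {p_value:.3f}")
```

A p-value above 0.05 would be read, as in the text, as insufficient evidence against equal survival in the two groups.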

Comparison by TB Status
In all three data sets, patients with TB infection have lower survival probability as compared to those with no TB infection. The difference is significant in all cases.

Analysis for Model Comparison
The results of the model comparison for the three data sets are given in Table 3. Estimates of the total AIC for the AFT-Lognormal, AFT-Loglogistic and AFT-Weibull models are displayed. In the case of Data 1, the AFT-Weibull model has the smallest AIC. However, the Weibull hazard rate is monotonic, which does not match the hump-shaped empirical hazard rate. Thus the loglogistic model, with the next smallest AIC, is suggested for analyzing Data 1.
In the case of Data 2, the AFT-Loglogistic model has the smallest total AIC. In addition, the hazard rate of the loglogistic distribution behaves like the empirical hazard rate. Hence it is considered to be the final model for Data 2.
In the case of Data 3, we consider the loglogistic as the best fitting model, for the same reason given in the case of Data 1. Thus the AFT loglogistic model is selected to fit all three data sets.

Analysis of the Data Sets Using Loglogistic Model
Case Study of Data 1. See the results in Table 2. Under the AFT loglogistic model, the intercept term is significant at the 5% significance level. The covariates having significant effects on the survival times of patients are TB status at baseline, awareness about ART, condom use and opportunistic infection. But age, functional status, sex, weight, employment, WHO stage, tobacco, drug and alcohol use have no statistically significant effects on the survival of HIV/AIDS patients.
Case Study of Data 2. See the results in Table 3. Under the AFT loglogistic model, the intercept term is significant at the 5% significance level. The covariates having significant effects on the survival times of patients are age, TB status at baseline, functional status, condom use, weight, employment and tobacco use. However, opportunistic infection, sex, awareness about ART, WHO stage, drug and alcohol use are not significant.
Case Study of Data 3. See the results in Table 4. Under the loglogistic model, the intercept term is significant at the 5% significance level. The covariates having significant effects on the survival times of HIV/AIDS patients are opportunistic infection, TB status at baseline, functional status and condom use.

Prediction of Survival Probabilities Using Covariates
Here we predict survival probabilities and survival times for the HIV/AIDS patients given their TB and functional status. TB status is categorized as with TB infection or no infection. Functional status is categorized into three levels: working, ambulatory and bedridden. We can observe from the prediction of future survival that patients under follow-up at Shashemene hospital are predicted to have consistently higher survival probabilities as compared to those in the other hospitals. Patients under follow-up at Bale Robe hospital are predicted to have the longest survival time, while those from Arba Minch hospital are predicted to have the shortest survival time. Figure 6 displays the prediction results by functional status of the patients. The results show that patients in working status are predicted to have higher survival probabilities and longer survival times than those with ambulatory or bedridden status. Patients with bedridden status are seriously ill and are predicted to have the shortest survival time.
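Prediction from a fitted loglogistic AFT model can be sketched as below. All coefficient values, the scale, and the covariate names (`tb_positive`, `ambulatory`, `bedridden`) are hypothetical and are not taken from the tables of this paper; the survival function uses the standard loglogistic AFT form with linear predictor eta and scale sigma, which is an assumed parameterization.

```python
import math


def linear_predictor(intercept, coefs, x):
    """eta = beta_0 + x' beta, with covariates missing from x treated as 0."""
    return intercept + sum(b * x.get(name, 0.0) for name, b in coefs.items())


def loglogistic_survival(t, eta, sigma):
    """S(t | x) = 1 / (1 + (t / exp(eta)) ** (1 / sigma))."""
    return 1.0 / (1.0 + (t / math.exp(eta)) ** (1.0 / sigma))


def median_survival(eta):
    """Under the loglogistic AFT model the median survival time is exp(eta)."""
    return math.exp(eta)


# Hypothetical estimates, NOT taken from the paper's tables.
INTERCEPT, SIGMA = 5.0, 0.8
COEFS = {"tb_positive": -0.6, "ambulatory": -0.3, "bedridden": -0.9}

profiles = {
    "TB negative, working": {},
    "TB positive, working": {"tb_positive": 1.0},
    "TB negative, bedridden": {"bedridden": 1.0},
}
for label, x in profiles.items():
    eta = linear_predictor(INTERCEPT, COEFS, x)
    s60 = loglogistic_survival(60.0, eta, SIGMA)
    print(f"{label:24s} S(60 months) = {s60:.3f}, median = {median_survival(eta):.1f} months")
```

In an AFT model a coefficient acts multiplicatively on the time scale: with the hypothetical value -0.6, a TB-positive patient's median survival time is exp(-0.6) times that of an otherwise identical TB-negative patient.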

Conclusions
In this study, we aim to predict and compare the survival status of HIV/AIDS patients under ART follow-up at three different hospitals using three AFT models: lognormal, loglogistic and Weibull models. Three data sets are considered.
Based on model comparison and the behavior of the hazard rate functions, the AFT loglogistic model is selected as the best fit for all the data sets. Under this model, the results of the analysis show that different covariates affect the survival of the patients from different hospitals. For patients under follow-up at Shashemene General Hospital (Data 1), TB status at baseline, awareness about ART, condom use, and opportunistic infection significantly affect survival time at the 5% significance level. For patients under follow-up at Bale Robe hospital (Data 2), the predictors age, functional status, TB status at baseline, weight, type of employment, and tobacco use are found significant. For the third data set, where patients are under follow-up at Arba Minch hospital, the predictors functional status, TB status at baseline, condom use, and opportunistic infection significantly affect the survival time of patients.
When predicting future survival status based on TB status at baseline, a patient with TB infection at baseline has a shorter survival time as compared to one with no TB infection. This is consistent in all the data sets. The estimated differences in mean survival time between patients without and with TB infection are 19, 22 and 7 years for the three hospitals, respectively. Patients under follow-up at Shashemene hospital have consistently the highest survival probabilities in both TB positive and negative groups. Moreover, patients from Bale Robe hospital tend to have the longest survival time, while those from Arba Minch hospital have the shortest survival time. With respect to functional status, patients in working status have the highest survival probabilities and the longest survival time. Patients with bedridden status have the shortest survival time.
The AFT loglogistic model is recommended for modelling the time-to-event data considered in this study. Patients from different hospitals reveal different results in this study. Thus the results are unique to each hospital, implying that patients' care and intervention need to be specific.