Repeated Measure Analysis for the CD4+ Cell Counts of HIV-Positive Patients Initiated to ART: A Case Study at Ambo Hospital
Endale Alemayehu and Tsigereda Tilahun

Introduction: Human Immunodeficiency Virus (HIV) causes Acquired Immunodeficiency Syndrome (AIDS) by reducing a person's ability to fight infection. It attacks the CD4 cell, an immune cell responsible for the body's response to infectious agents. Antiretroviral therapy (ART) is now available to prolong patients' lives; it is given to raise CD4 counts so that the body retains its ability to resist disease.
Objectives: This study aimed to identify risk factors associated with the CD4 counts of patients on ART at a public hospital in Ethiopia, and to fit a linear mixed model while handling missing values arising during follow-up.
Method: Exploratory data analysis was conducted to examine the structure of the data. Among the familiar covariance structures, the unstructured variance-covariance structure was selected as best, and a step-by-step procedure was followed to obtain the best-fitting model.
Results: Descriptive statistics suggested that the progressive change in CD4 counts was better for females than for males. In the fitted model, the covariates significant at the 5% level were baseline CD4, time, weight, and the interactions of sex and baseline CD4 with time. Raising the significance level to 25% made most covariates significant, which may help improve patient awareness.
Conclusion: The full linear mixed model with random intercept and slope was found to be the best model. There was high variability both within patients over time and between patients, and the interaction of time with covariates was also significant. The model was fitted after handling missing values by multiple imputation.


Introduction
Human Immunodeficiency Virus (HIV) is a virus that causes Acquired Immunodeficiency Syndrome (AIDS) by reducing a person's ability to fight infection. HIV attacks an immune cell called the CD4 cell, which is responsible for the body's immune response to infectious agents [2]. HIV/AIDS is one of the major public health problems in the world, especially in Sub-Saharan Africa, and Ethiopia is among the countries most affected by the epidemic. UNAIDS estimates from 2017, however, show a slightly different trend: new adult infections are estimated to have declined by 8% between 2010 and 2015 and by 11% between 2010 and 2016 [2].
Theoretically, the progression of HIV infection from an asymptomatic stage to AIDS is directly associated with a gradual decline in the total number of CD4+ T cells in the blood. Biologists interpret this as follows: the decrease in the total number of CD4+ T cells also correlates with an increase in the number of infected T cells and in the amount of free virus in the blood [14].
However, HIV is not solely responsible for the reduction of CD4 cells in the body; other factors also contribute. For instance, older patients are at risk of a greater CD4 reduction. Regarding sex, studies have reached different conclusions. A study done in North Ethiopia reported that females had a better CD4 count response to ART than males [3], whereas others favored males [7,8]. The available literature indicates a significant positive relationship between baseline CD4 counts and CD4 counts over time [8].
HIV/AIDS is one of the major public health problems in the world, and Ethiopia is among the countries affected by the epidemic, with a high prevalence burden. Although many HIV/AIDS prevention methods have been introduced, the number of infected people continues to increase worldwide. The CD4 count measures the number of CD4 cells in the blood and is used, mostly during ART, to assess the risk of HIV progression in infected patients. To reach a common understanding, a number of researchers have modeled CD4 counts as longitudinal or otherwise correlated data. As there is not yet a medicine that cures the virus, drugs that restore depleted CD4 counts, which are key to prolonging the lives of infected people, are the focus of current investigation [4,16].
However, statisticians hold no common view on how to model CD4 counts, and each approach comes with plausible reasoning. Broadly, there are two groups: those who model CD4 counts with a Gaussian linear mixed model and those who use a non-Gaussian generalized linear mixed model. Because CD4 is naturally a count, some investigators prefer non-Gaussian models, especially the Poisson distribution under the generalized linear mixed model and its extensions [1,8]. However, the Poisson distribution has the drawback of assuming equal mean and variance, which rarely holds in practice, and even its negative binomial extension is not satisfactory for CD4 counts, since such a highly dispersed distribution is not expected in practice [5,9,15].
The general objective of this study was to model the progression of CD4+ counts of HIV-positive patients on ART over a period of four years. The burden of HIV/AIDS is a critical issue, and professionals have long worked to develop a drug that cures the virus. No one has yet succeeded, and the number of infected people increases over time. As an immediate solution, antiretroviral treatment, which compensates for lost CD4 cells, is available. Since HIV is not solely responsible for the reduction of CD4 immune cells, researchers have conducted repeated studies considering other genetic and environmental factors [11].
By addressing its specific objectives, this study offers scientific results from which different sectors and individuals can benefit. For instance, it may encourage infected individuals to adhere carefully to ART in order to achieve a regular increase in CD4 counts. The study can also serve as input for ART trainers, so that they can give attention to and follow up patients with low CD4 counts based on the factors identified here.

Source of Data
The data for this study were obtained from Ambo Hospital in Ethiopia and comprise 1528 measurements on 191 patients. All relevant information was taken from the cards of patients who started ART during the specified four-year period.

Variables in the Study
In any research study, selecting the outcome variable and the corresponding factors is crucial for obtaining results that agree with real-world facts.

Dependent Variable
The response variable of this study was the CD4 count of HIV-infected patients during four years of follow-up from the date of ART initiation. The square-root-transformed CD4 count was used to satisfy the normality assumption.

Methodology
To answer the basic research questions and realize the objectives of the study, different statistical methods were used.
The data were analyzed with R 4.0 and SAS 9.4. The data were explored using individual profile plots, the mean profile plot, variance structure and correlation structure plots, and numerical summary statistics. A linear mixed model was fitted to the data, and the model selection techniques used are described step by step below.

Longitudinal Data Analysis
Longitudinal data are repeated measurements of the same cases over time. They are common, especially in medicine and social science. Longitudinal data may come in many different forms, and measurements can be taken at different and irregular time points. The main advantage of longitudinal data analysis is that it relaxes the independence assumption of classical linear regression and handles the interdependence of observations [13].
Linear Mixed Model (LMM) Model Building: When the independence assumption of the general linear model fails because of dependence among observations or repeated measurement of subjects, a linear mixed model is used to produce proper inference by allowing residuals to be correlated.
The development of a linear mixed-effects model can be performed in two stages [10,12].
Stage 1: An ordinary linear regression model is fitted separately for each subject. This model captures the within-subject variability and is denoted by:
$$Y_i = Z_i \beta_i + \varepsilon_i$$
Where:
1. $Y_i$ is the $n_i$-dimensional vector of responses for subject $i$, and $Z_i$ is an $(n_i \times q)$ matrix of known covariates
2. $\beta_i$ is a $q$-dimensional vector of subject-specific regression coefficients
3. $\varepsilon_i \sim N(0, \Sigma_i)$ is the vector of measurement errors (residuals). In many situations, a simple structure for the covariance matrix of the measurement errors is assumed.
Stage 2: Between-subject variability is modeled by relating the subject-specific coefficients to known covariates. Thus,
$$\beta_i = K_i \beta + b_i$$
Where:
1. $K_i$ is a $(q \times p)$ matrix of known covariates. Additional covariates other than time can be adjusted for in this step.
2. $\beta$ is a $p$-dimensional vector of regression coefficients
3. $b_i \sim N(0, D)$ is a vector of random effects indicating the deviation of an individual subject's coefficients from the population average, and $D$ is the covariance matrix capturing the between-subject variability.
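The two stages can be illustrated with a minimal numerical sketch. It is not the authors' code (the analysis was run in R and SAS); the trajectories are hypothetical, the only covariate is time, and stage 2 assumes $K_i$ is the identity, so the population coefficients are simply the average of the subject-specific ones.

```python
from statistics import mean

def ols_line(times, values):
    """Stage 1 for one subject: least-squares intercept and slope over time."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

# Hypothetical sqrt(CD4) trajectories: subject id -> (visit times, responses).
# Subjects may have different numbers of visits (n_i differs).
subjects = {
    "p1": ([0, 1, 2, 3], [10.0, 11.0, 12.0, 13.0]),
    "p2": ([0, 1, 2, 3], [14.0, 14.5, 15.0, 15.5]),
    "p3": ([0, 1, 2], [8.0, 10.0, 12.0]),
}

# Stage 1: subject-specific coefficients beta_i = (intercept_i, slope_i).
stage1 = {sid: ols_line(t, y) for sid, (t, y) in subjects.items()}

# Stage 2 (assumption: K_i = identity): beta is the mean of the beta_i,
# and b_i is each subject's deviation from that population average.
pop_intercept = mean(b0 for b0, _ in stage1.values())
pop_slope = mean(b1 for _, b1 in stage1.values())
deviations = {
    sid: (b0 - pop_intercept, b1 - pop_slope) for sid, (b0, b1) in stage1.items()
}

for sid, (d0, d1) in deviations.items():
    print(f"{sid}: intercept deviation {d0:+.3f}, slope deviation {d1:+.3f}")
```

In practice the two stages are not fitted separately; the combined mixed model estimates the population coefficients and the covariance of the deviations jointly.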
Generally, the two stages combine into the linear mixed model:
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad X_i = Z_i K_i,$$
with $b_i \sim N(0, D)$ and $\varepsilon_i \sim N(0, \Sigma_i)$.
Missing Data: The impact of missing data is a subject that most analysts would like to avoid. In most cases, missing data must be dealt with before addressing the substantive questions that led to the data being collected. Missing data can be eliminated by (1) removing cases with missing data or (2) replacing missing data with reasonable substitute values. In either case, the end result is a dataset without missing values. In this study, we dealt with missing values without deleting rows with missing data. To do so, we first identified the nature of the missing values.
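The study reports using multiple imputation but does not describe how results from the imputed datasets were combined. A common choice is Rubin's rules, sketched below under that assumption; the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)           # average of the m estimates
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical time-slope estimates from m = 3 imputed datasets.
pooled, total = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(f"pooled estimate {pooled:.3f}, pooled standard error {total ** 0.5:.3f}")
```

The between-imputation term is what distinguishes multiple imputation from filling in each gap once: it carries the uncertainty about the missing values into the pooled standard error.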

The Overall Descriptive Statistics
This subsection gives an overall exploratory view of the CD4+ counts of patients registered for ART follow-up at a public hospital in Ethiopia over four consecutive years. Table 2 shows the mean CD4 count over follow-up time. The mean CD4 count at baseline was 264.4293 with a standard deviation of 218.431, which indicates large differences between patients at registration. Each patient was scheduled for 8 follow-up visits, one every six months; patients did not attend every visit, so in total 411 values of the CD4 outcome were missing. As the number of patients on ART decreased over time, the mean CD4 count increased, except for a slight decrease at the fifth follow-up.
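The figures quoted above can be put in proportion with a short calculation. It assumes that the 1528 measurements are the planned records (191 patients at 8 visits each) and that the 411 missing values are counted against that total; the variable names are illustrative.

```python
n_patients = 191
n_visits = 8
n_missing = 411
baseline_mean = 264.4293
baseline_sd = 218.431

planned = n_patients * n_visits          # scheduled CD4 measurements
observed = planned - n_missing           # measurements actually recorded
missing_share = n_missing / planned      # fraction of outcome values missing
cv = baseline_sd / baseline_mean         # coefficient of variation at baseline

print(f"planned: {planned}, observed: {observed}")
print(f"missing share: {missing_share:.1%}")
print(f"baseline coefficient of variation: {cv:.2f}")
```

A missing share of this size and a baseline standard deviation close to the mean are both reasons to handle missingness explicitly and to consider transforming the response.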

Testing the Distribution of Outcome Variable (CD4+)
The distribution of the dependent variable is an important factor in choosing the model and should be examined before starting the analysis. It gives clues about the nature and characteristics of the data and what to do next. Accordingly, we first tested the normality assumption for CD4 and for the square root of CD4 (squrtCD4) using the Shapiro-Wilk test (univariately at each follow-up time) and a q-q plot (discussed after the model was fitted).
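To see why a square-root transformation can bring count data closer to normality, the sketch below compares sample skewness before and after the transformation. This is only an illustration with hypothetical counts; it uses moment-based skewness as a simple stand-in and is not the Shapiro-Wilk test applied in the study.

```python
from math import sqrt
from statistics import mean

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (zero for symmetric data)."""
    centre = mean(values)
    m2 = mean((v - centre) ** 2 for v in values)
    m3 = mean((v - centre) ** 3 for v in values)
    return m3 / m2 ** 1.5

# Hypothetical right-skewed CD4 counts (not the study data).
cd4 = [50, 80, 120, 150, 200, 260, 300, 420, 650, 1100]
sqrt_cd4 = [sqrt(v) for v in cd4]

skew_raw = skewness(cd4)
skew_sqrt = skewness(sqrt_cd4)
print(f"skewness of raw counts: {skew_raw:.2f}")
print(f"skewness after square root: {skew_sqrt:.2f}")
```

For these values the skewness falls noticeably after the transformation, which is the behavior the normality checks are meant to confirm on the real data.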

Mean Profile Plot
The primary question under this method is how responses vary with time. The immediate impulse is to plot $Y_{ij}$ against $t_{ij}$, with the values for each individual connected by line segments. These plots can be very messy and practically useless [6]. Assessing the trend with time requires plotting some estimate of the trend, along with enough information about the distribution of values to assess its strength.
Figure 1 shows the mean profile plot of CD4 count over time using loess smoothing, with missing values and after multiple imputation, respectively. The red line in the left-hand plot suggests a slight increase in the average CD4 count, and the second plot likewise confirms an increase in the mean CD4 count, although the increase is not constant across follow-up times.
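The quantity behind a mean profile plot is the average response at each visit, computed over the patients actually observed at that visit. A minimal sketch with hypothetical records follows; it computes plain per-visit means and does not reproduce the loess smoothing used for Figure 1.

```python
from statistics import mean

# Hypothetical CD4 records: patient id -> one value per visit, None if missed.
cd4 = {
    "p1": [200, 250, None, 310],
    "p2": [150, None, 220, 260],
    "p3": [400, 420, 450, None],
}

def mean_profile(records):
    """Mean response at each visit, skipping missing values.

    Assumes every patient has the same number of visit slots and that
    each visit has at least one observed value.
    """
    n_visits = len(next(iter(records.values())))
    profile = []
    for visit in range(n_visits):
        seen = [row[visit] for row in records.values() if row[visit] is not None]
        profile.append(mean(seen))
    return profile

profile = mean_profile(cd4)
print(profile)
```

Because each visit averages over a different set of patients, a profile built this way can shift when dropout is related to CD4 level, which is one motivation for comparing it with the imputed version.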

Variance Structure Plot
The variance structure plays an important role in identifying the pattern of the response variable and gives clues about which variance-covariance structure to expect.
Figure 2 shows the variance structure of the CD4 counts. It indicates greater CD4 count variability at baseline than at time 1. The variability increases very fast from time 1 to time 2 and then slightly to time 3. After that, the pattern shows a fast decrease, except for an increase at the last follow-up time.

Statistical Model Developments
In any data analysis, model inference is a crucial part of the research. Choosing a variable selection technique and accounting for the covariance among repeated measurements over time are the basic issues to be addressed. The general procedure used to develop the model consisted of identifying the appropriate covariance and correlation structure, applying forward variable selection starting from the linear model and proceeding to the linear mixed model including interactions, and finally selecting the last model by comparing estimation techniques.

Covariance-Variance and Correlation Structure
The variance-covariance structure is an important issue in longitudinal data analysis. Table 3 indicates that the unstructured variance-covariance structure has a smaller AIC (8022.394) than the other structures considered, meaning that it describes the data better. Model selection for the full model and the effect of the time interaction on the full model are also described in Table 3 below. The AIC results indicate that the full model with time interactions is better than the full model without interactions.
In summary, without considering random effects, the best marginal model was the one including all covariates and their interactions with time. Once the best marginal model was selected, we examined the influence of random effects on the change in the square-root-transformed CD4 count.
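The AIC comparisons in this section follow the usual definition, AIC = -2 log L + 2k, where k is the number of estimated parameters and smaller values are preferred. The sketch below shows the selection step; the log-likelihoods and parameter counts are hypothetical and are not the values behind Table 3.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2 * n_params

# Hypothetical fits: structure name -> (maximized log-likelihood, parameter count).
candidates = {
    "compound symmetry": (-4100.0, 12),
    "AR(1)": (-4080.0, 12),
    "unstructured": (-3990.0, 46),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print(f"selected structure: {best}")
```

The penalty term matters here because an unstructured matrix for 8 visits estimates many more covariance parameters than the simpler structures, so it is selected only when the gain in likelihood outweighs that cost.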

Hierarchical Linear Model
In longitudinal data analysis, the effect of the random (subject) terms must be examined to determine the between-subject variability and the dispersion of individuals over time. As discussed under Figure 1, the average CD4 count appeared to change linearly over time, and since the distribution of the raw CD4 counts did not appear normal, the square-root-transformed CD4 count, which was closer to normal, was used as the response variable. Therefore, a linear mixed model including random effects was developed to capture subject-specific (hierarchical) influence, and we then checked whether these effects were significant.
Table 4 below gives the results of the full marginal model and the hierarchical random intercept model. Since the mixed model has the smaller AIC (8369.065), the random effect contributes to explaining the subject-specific variability, and the mixed model is the better of the two.

(i). Random Intercept only Model
For data with repeated follow-up, differences among subjects (patients in our case) are expected to have a significant influence on the outcome under study. The random intercept captures the between-subject variability among patients, not considering the effect of time. In fitting the random effect, the estimated random intercept variance is used to compute the intraclass correlation among patients. This is because reporting the coefficients of all the patients' random intercepts (ID) would be quite tedious, and our interest is limited to the variation in CD4 counts that they explain. In Table 5 above, the intraclass correlation is calculated to be 0.538. This means that 53.8% of the total variability in the response is attributable to differences between patients (the random intercept). Thus, ID has a significant effect on the CD4 counts of patients and was not excluded from the model.
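For reference, in a random intercept model the intraclass correlation is usually defined as the share of total variance due to the random intercept. The symbols $\sigma_b^2$ (between-patient variance) and $\sigma_\varepsilon^2$ (residual variance) are notation assumed here, since the variance components themselves are reported in Table 5 and not in the text:

$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\varepsilon^2}$$

Under this definition, the reported value of 0.538 implies $\sigma_b^2 / \sigma_\varepsilon^2 = 0.538 / 0.462 \approx 1.16$, that is, the between-patient variance is somewhat larger than the residual variance.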

(ii). Random Intercept and Slope Model
The introduction of random effects has important ramifications for the interpretation of the "fixed-effects" regression parameters. The random slope is used to assess the between-patient variability over time, that is, how subjects differ in their change over the course of follow-up.
As indicated in Table 6, the model including random effects, specifically the random slope, is better than the marginal model. Therefore, including the between-patient variability is important in explaining the data, and time has a significant effect on the progressive change of the square-root-transformed CD4 counts.
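One way to write the random intercept and slope model is shown below. The notation is assumed for illustration and is not taken from the paper: $Y_{ij}$ is the square-root CD4 count of patient $i$ at visit $j$, $t_{ij}$ is the follow-up time, $x_{ij}$ collects the remaining covariates and their interactions with time, and $\gamma$ holds their coefficients.

$$Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + x_{ij}^{\top}\gamma + \varepsilon_{ij}, \qquad \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\!\left(0,\; \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix}\right), \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$$

Here $d_{11}$ is the between-patient variance of the intercepts, $d_{22}$ the between-patient variance of the slopes over time, and $d_{12}$ their covariance. The random intercept only model is the special case $d_{22} = d_{12} = 0$, so comparing the two models amounts to asking whether patients differ in their rate of change as well as in their starting level.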

Discussions and Conclusion
The basic objective of this study was to identify the major significant factors associated with the progressive change of CD4 counts over follow-up time among patients on ART at a public hospital in Ethiopia. Before the inferential analysis, exploratory data analysis showed that the mean CD4 count increased over time, except for a decrease at the fifth follow-up. In addition, the variance structure clearly indicated high variability in the progressive change of CD4 counts between the patients under study.
Since the primary objectives of the study were to develop a linear mixed model and identify the significant factors, the relevance of variables was tested step by step by fitting different models, so as to establish the preference for the full linear mixed model. In the first step of model fitting, the full marginal model was found to be the best of all the marginal models fitted using forward variable selection, as stated in Table 6. To check the interaction of follow-up time with all other covariates, we compared it with the full model without interactions, and since its AIC was smaller, the interaction of time with the covariates was considered to have a significant effect. Under the marginal model, the interactions of time with sex and with baseline CD4 were significant. The other interesting part of the modeling is the hierarchical model with random intercept and slope. As usual, we first compared the best model from the marginal family (the full linear model with time interactions) with the linear mixed model having only a random intercept; because the latter model has the smaller AIC (8369.065), as stated in Table 4, it is the better model. Thus, the random effect has an influence on the progressive change of the square-root-transformed CD4 counts of patients. The other key concern is the effect of the random slope, which was compared with the model having only a random intercept; since the AIC of the model including the slope was smaller, it was selected as the better model under REML estimation.
Finally, we compared the two familiar estimation methods, maximum likelihood (ML) and restricted maximum likelihood (REML), under the last selected model. The full linear mixed model including both random intercept and slope with ML estimation had the smaller AIC (8286.651) and was chosen as the final model for these data.
The decline of CD4 counts in the human body puts patients at risk by reducing their immunity to disease. This study therefore aimed to identify the risk factors associated with the progressive change of CD4 counts among patients registered for ART at a public hospital in Ethiopia, followed up every six months for four years. The findings are consistent with the studies in [1,3,4].
To identify the potential covariates associated with the progressive change of CD4 counts, the variables included in the study were Age, baseCD4, Functional Status, ID, Marital status, Regimen class, Sex, Time of follow up, Weight and WHOStage. To find the best model, we compared different models from the marginal family using the forward method and from the hierarchical family considering random intercept and slope. The full linear mixed model with both random intercept and slope under maximum likelihood estimation was found to be the best model. The fitted model indicated that baseline CD4, weight, time, and the interactions of sex and baseline CD4 with time were significant at the 5% level. However, allowing a 25% significance level makes most variables significant, as observed from Table 6, and it is recommended to consider covariates that do not reach the conventional 5% level in order to increase patient awareness. The residuals also indicated that the underlying normality assumption was met. This is based on the square-root-transformed CD4 counts, since the distribution of the raw CD4 counts was not normal. With a larger sample size, more significant factors may be identified. Researchers should look for other models that handle the dependency in order to identify more factors.

Declarations
In conducting the study, the investigators make the following declarations.