Ordinal Regression Modeling of Mother to Infant HIV Transmission in Nyeri County, Kenya

The transmission of HIV from an HIV-positive mother to her child during pregnancy, delivery or breastfeeding remains of much concern. Various governments and non-governmental organizations have developed policies aimed at minimizing these transmissions. For this to be achievable, there is a need for sound statistical procedures in the analysis of mother to infant HIV transmission data. This study gives an application of ordinal regression to the modeling of such data, using the case of Nyeri County, Kenya. Logistic regression has been described as the best methodology for modeling binary response variables. However, it does not provide the best fit for an ordered categorical variable with more than two categories. This calls for extensions of logistic regression that can be used when modeling such variables; one such extension is the ordinal regression methodology. This study proposes the use of the ordinal regression methodology with probit and logit link functions to model the effect of infant feeding, ARV regimen, maternal cell count and maternal viral load on mother to infant HIV transmission in Nyeri County, Kenya, using the case of Karatina sub-county referral hospital. A useful aspect of ordinal link models is particularly emphasized: in their interpretation, the classes of the dependent variable can be regarded as arising from a partition of the range of an underlying continuous random variable. The data used are secondary data collected from Karatina sub-county referral hospital. Inference on parameters and model diagnostics are also provided.


Introduction
Mother to child HIV-1 transmission rates remain of much concern in Sub-Saharan Africa [17]. To reduce HIV-1 transmission from expectant and lactating mothers to their infants, different feeding methods and use of ARVs ought to be introduced [16,17]. The World Health Organization (WHO) recommends that national health authorities come up with infant feeding strategies and combinations of ARVs likely to provide the greatest chance of HIV-free survival for HIV exposed infants [17].
Infant feeding is crucial for child growth and development as it enables newborn infants to acquire nutrients and develop immune systems that can fight and prevent new infections. The immune system at this early stage of life is usually not fully developed, which raises the risk of new infections such as HIV-1 and calls for proper feeding programs for infants [13]. The type of infant feeding substantially affects the risk of HIV transmission, but prolonged breastfeeding remains the main cause of HIV and other infections in newborn infants [5]. Most HIV exposed infants become infected during the breastfeeding period, and this transmission rate can be reduced by giving antiretroviral (ARV) drugs to the mother and the newborn infant during that period. Infant feeding in the first six months together with the use of ARV drugs has been considered an important strategy for curbing the transmission of HIV-1 and other infections from the mother to HIV exposed infants [7].
Infants born of HIV positive mothers are more prone to HIV than those born of HIV negative mothers. It is assumed that there is a negative association between breast milk HIV shedding and use of combined oral contraceptives and thus the need for further studies on the same [13]. Further proposals are on the use of replacement feeding methods by lactating mothers so as to reduce the potential risk of HIV transmission [7,9,11].
This study notes that immunological benefits of exclusive breastfeeding far outweigh the risk of HIV transmission for the child in the context of malnutrition, diarrhea, pneumonia and other infant diseases. Despite progress on implementation of prevention of mother to child transmission (PMTCT) programs over the years, reports show that each year an estimated 13,000 new HIV infections occur among Kenyan children during the breastfeeding period [11].
Previous studies done in Kenya show poor uptake of WHO feeding guidelines with replacement feeding being a regular practice and thus responsible for the continued increase in new HIV-1 infections. However, the study notes that the risk of mother to child HIV transmissions can be reduced by a comprehensive approach during the first six months.
This study aims to investigate the effect of infant feeding, use of ARVs, infant CD4 count, maternal cell count and viral load on mother to infant HIV-1 transmission during the first six months after birth. The ordinal regression model with logistic and probit link functions was used.

Over-View of Ordinal Regression
The ordinal regression methodology is an extension of logistic regression to the modeling of response variables with more than two categories. These categories of the response ought to be ordered unambiguously. The idea behind the regression model is that as a predictor increases, the distribution of the response shifts toward one end of the spectrum of ordinal responses; that is, the probability of responding towards that end increases as the predictor variable changes in a given direction.
The ordinal logistic regression works with the log-odds transformation of the respective category probabilities. It requires a researcher to think about the cumulative probabilities instead of the individual category probabilities. Hence for statistical soundness of the methodology, the response variable ought to be ordinal. It is in this regard that the study adopted the Infant HIV-1 infections as a response variable with four categories of the infections; None, Mild, Moderate and Severe.
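To make the shift from individual to cumulative probabilities concrete, the following minimal sketch computes the cumulative proportions and their log-odds from the four category counts reported in the Summary of this paper (234, 77, 44 and 6). No covariates are involved; the variable names are illustrative only.

```python
import math

# Category counts reported in the Summary section, in increasing severity.
counts = {"None": 234, "Mild": 77, "Moderate": 44, "Severe": 6}
n = sum(counts.values())

cumulative = 0.0
cumulative_logits = {}
# The last category is skipped because P(Y <= Severe) is always 1.
for category in list(counts)[:-1]:
    cumulative += counts[category] / n
    cumulative_logits[category] = math.log(cumulative / (1.0 - cumulative))
    print(f"P(Y <= {category}) = {cumulative:.4f}, "
          f"log-odds = {cumulative_logits[category]:.4f}")
```

With four categories there are three cumulative log-odds, one for each cut between adjacent categories.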
This regression methodology rests on the following model assumptions;
i. There is only one ordinal dependent variable being regressed on a set of factors (covariates).
ii. The parallel lines assumption, in which there is one regression equation for each category of the dependent variable except the last, with these equations sharing the same slope coefficients and differing only in their intercepts (thresholds).
iii. The adequate cell count assumption, in which at least 80% of the cells must have counts greater than 5. No cell should have a zero count, since the more cells there are with low counts, the less reliable the chi-square test will be.
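The adequate cell count assumption can be checked mechanically. The sketch below uses a hypothetical cross-tabulation of feeding method by infection category: the individual cell counts are invented for illustration and are not the study data, although their column totals match the category totals reported in the Summary. The function name `check_cell_counts` is likewise an assumption.

```python
# Hypothetical cross-tabulation (rows: feeding method; columns: None, Mild,
# Moderate, Severe). Cell values are invented; only the column totals
# (234, 77, 44, 6) come from the paper.
table = {
    "exclusive breastfeeding": [150, 40, 20, 2],
    "replacement feeding": [84, 37, 24, 4],
}

def check_cell_counts(table, min_count=5, required_share=0.80):
    """Check the 80% rule and the no-zero-cell rule on a contingency table."""
    cells = [count for row in table.values() for count in row]
    share_adequate = sum(count > min_count for count in cells) / len(cells)
    has_zero_cell = any(count == 0 for count in cells)
    return {
        "share_adequate": share_adequate,
        "has_zero_cell": has_zero_cell,
        "assumption_met": share_adequate >= required_share and not has_zero_cell,
    }

result = check_cell_counts(table)
print(result)
```

In this invented table the sparse Severe column leaves only six of the eight cells above the minimum count, so the check fails; a rare top category is the usual place where this assumption comes under pressure.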

Statement of the Problem
Nyeri County has been classified under the medium incidence cluster of new HIV-1 infections according to the Kenya HIV Estimates [18]. The Nyeri County Integrated Development Plan (2013-2017) indicates that HIV and AIDS is a leading cause of morbidity and mortality in the county, with 20,797 people living with HIV and an estimated 1,307 new HIV infections reported annually [18]. Although HIV testing services (HTS) are a key intervention in prevention and treatment, approximately 30% of the population in the county do not know their HIV status. This study recognizes that the burden of HIV and AIDS continues to impact negatively on the economic performance of individuals, families and communities in the medium and long term. It is in this regard that this study aims to investigate possible mitigation measures that can be put in place to help curb new mother to infant HIV-1 infections in Nyeri County, Kenya, by the use of the ordinal logistic regression model.

General Objective
The general objective of the study is to model the mother to infant HIV transmission using the ordinal regression methodology.

Specific Objectives
The specific objectives shall be;
i. To model the mother to infant HIV-1 transmission in Nyeri County, Kenya using the ordinal regression model.
ii. To estimate the ordinal regression model parameters for the mother to infant HIV transmissions in Nyeri County.
iii. To perform the model diagnostic checks on the fitted model in modeling the mother to infant HIV transmission rates.
iv. To predict the mother to infant HIV transmissions in Nyeri County, Kenya using the fitted model.

Significance of the Study
This study is being undertaken to improve HIV free survival of HIV exposed infants by providing guidance on the use of the right infant ARV regimen combination and the correct infant feeding method depending on the maternal CD4 count. It shall create awareness among HIV infected mothers on how to reduce the risk of HIV transmission depending on their maternal CD4 count. It also seeks to add to the existing knowledge on reducing the transmission of HIV and other infections from mother to infant by use of different infant feeding methods and ARV regimens during the first six months. The study also demonstrates a useful application of ordinal regression in modeling mother to infant HIV transmissions. It includes modeling decisions on how many categories (that is, the number of thresholds) of an underlying continuous response variable (mother to child HIV transmission) fit the data better when the response is treated as ordinal.
This will help in the formulation of new strategies on infant feeding methods and infant ARV use, depending on the maternal CD4 count, maternal cell count and viral load, with regard to minimizing mother to infant infections.

Scope of the Study
The study will be carried out in Nyeri County at Karatina sub-county referral hospital, to assess the effect of infant feeding methods and ARV regimen on HIV transmission during the first six months after birth. Only HIV exposed infants will be enrolled in this study. Infant feeding methods will be either exclusive breastfeeding or replacement feeding while the ARV Regimen will depend on the infant ARV prophylaxis given.

Literature Review
This chapter reviews previous work on logistic regression and on the modeling of mother to child HIV transmission so as to identify appropriate theories and empirical evidence to substantiate this study. Thomas

Methodology
This chapter discusses the ordinal regression with logistic and probit link functions used in the modeling of mother to infant HIV-1 transmissions. The ethical considerations, study area and data used in the study are also described.

Study Area
The study was carried out in Karatina sub-county in Nyeri County, Kenya. The county has a population of 759,164 and an area of 2,361 km² (Kenya Housing and Census, 2019). It is one of the Kenyan counties established under the Kenya Constitution of 2010 and has a total of six sub-counties, of which Karatina is one, with the majority of its population grappling with high levels of mother to infant HIV infections. This formed the basis for the study, which aimed to determine the effects of infant CD4 count at six months, infant feeding programs, use of ARVs by expectant and lactating mothers, maternal cell count and viral load on mother to infant HIV-1 infections in Karatina sub-county, Nyeri County, Kenya.

Data
Data used in the study were secondary data obtained from the Karatina sub-county hospital in Nyeri County, Kenya. Permission was granted by the Karatina Health Information System (KHIS) to utilize the data purely for this specific analysis, and approval to do so was received from the Jomo Kenyatta University of Agriculture and Technology (JKUAT). Permission and ethical clearance were not given to share the data; access to the data therefore requires permission, and requests can be sent to the manager, Karatina Health Information System, Nyeri County, Kenya.

Ordinal Regression Modeling
In order to model the mother to infant HIV-1 transmissions, the dependent variable (Infant HIV Infections, denoted by $Y$) was grouped into four ordered categories, namely none, mild, moderate and severe. In this regard, the study assumed that there exists a continuous latent variable $Y^*$ which is difficult to measure and which is assessed by identifying cut points, so that infants are classified as having no disease or as having mild, moderate or severe symptoms of the disease.
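The cut points shall define the ordinal categories for which $J = 4$, with associated probabilities $\pi_1, \pi_2, \ldots, \pi_J$ for which $\sum_{j=1}^{J} \pi_j = 1$. The cumulative odds for the $j$th category shall be defined as;
$$\frac{P(Y \le j)}{P(Y > j)} = \frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}, \qquad j = 1, \ldots, J-1$$
for which the covariate effect on Infant HIV infections was defined as;
$$\log\left[\frac{P(Y \le j \mid x)}{P(Y > j \mid x)}\right] = \alpha_j - x'\beta$$
in which $x'$ are the variables regressed on Infant HIV infections for the $j = 1, \ldots, J-1$ categories, $\alpha_j$ are the cut points and $\beta$ are the regression coefficients. The probit and logistic link functions used in the ordinal regression modeling of mother to infant HIV infections are respectively given as;
$$\Phi^{-1}\left[P(Y \le j \mid x)\right] = \alpha_j - x'\beta \qquad \text{and} \qquad \log\left[\frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)}\right] = \alpha_j - x'\beta$$
where $\Phi(\cdot)$ is the cumulative normal distribution assumed for the probit link function.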
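The way cut points and a link function turn a linear predictor into four category probabilities can be sketched as follows. The sketch assumes the common latent-variable convention P(Y <= j) = F(alpha_j - eta), where eta is the linear predictor and F is the logistic or standard normal distribution function; this sign convention, the threshold values and the value of eta are all illustrative assumptions and are not taken from the fitted models.

```python
import math

def logistic_cdf(z):
    """Distribution function behind the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal distribution function behind the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probabilities(thresholds, eta, cdf):
    """Return P(Y = j) for j = 1..J, assuming P(Y <= j) = cdf(alpha_j - eta)."""
    cumulative = [cdf(alpha - eta) for alpha in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for gamma in cumulative:
        probabilities.append(gamma - previous)
        previous = gamma
    return probabilities

# Hypothetical cut points and linear predictor, for illustration only.
thresholds = [0.6, 1.8, 4.0]
eta = 0.5

logit_probs = category_probabilities(thresholds, eta, logistic_cdf)
probit_probs = category_probabilities(thresholds, eta, normal_cdf)
print("logit :", [round(p, 4) for p in logit_probs])
print("probit:", [round(p, 4) for p in probit_probs])
```

Under this convention a larger linear predictor moves probability away from the lowest category and towards the more severe ones, which is the latent-variable reading of the model described above.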

Parameter Estimation
This study estimates model parameters using the maximum likelihood (ML) approach. To describe the ML procedure, we introduce the likelihood function, $L$. For $n$ observations, with $y_{ij}$ equal to one if observation $i$ falls in category $j$ of the $J = 4$ outcome categories and zero otherwise, so that $\sum_{j=1}^{J} y_{ij} = 1$, the log likelihood is given by;
$$\log L = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log \pi_{ij}$$
where $\pi_{ij}$ is the probability that observation $i$ falls in category $j$. Maximum likelihood estimates of the $\beta$'s are those values that maximize this log likelihood equation. Because of the nonlinear nature of the parameters, there is no closed-form solution to these equations and they must be solved iteratively. The Newton-Raphson method as described in Albert and Harris (1987) is used to solve these equations.
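The iterative solution can be illustrated on the simplest possible case. The sketch below applies Newton-Raphson to a thresholds-only cumulative logit model (no covariates) fitted to the category counts reported in the Summary, using finite-difference derivatives and step halving. It is a simplified illustration under these assumptions and not the estimation routine used in the study; all function names are hypothetical.

```python
import math

COUNTS = [234, 77, 44, 6]  # None, Mild, Moderate, Severe (Summary section)

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha, counts=COUNTS):
    """Log-likelihood of a thresholds-only cumulative logit model."""
    cumulative = [logistic_cdf(a) for a in alpha] + [1.0]
    total, previous = 0.0, 0.0
    for n_j, gamma_j in zip(counts, cumulative):
        pi_j = gamma_j - previous
        if pi_j <= 0.0:
            return -math.inf  # thresholds out of order
        total += n_j * math.log(pi_j)
        previous = gamma_j
    return total

def gradient(f, x, h=1e-5):
    """Central-difference approximation to the score vector."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def hessian(f, x, h=1e-4):
    """Central-difference approximation to the matrix of second derivatives."""
    k = len(x)
    def shifted(i, j, di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)
    return [[(shifted(i, j, h, h) - shifted(i, j, h, -h)
              - shifted(i, j, -h, h) + shifted(i, j, -h, -h)) / (4.0 * h * h)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def newton_raphson(f, start, tol=1e-6, max_iter=50):
    """Maximize f with Newton steps x - H^{-1} g, halving steps that lower f."""
    x = list(start)
    for _ in range(max_iter):
        score = gradient(f, x)
        if max(abs(s) for s in score) < tol:
            break
        step = solve(hessian(f, x), score)
        t = 1.0
        candidate = [xi - t * si for xi, si in zip(x, step)]
        while f(candidate) < f(x) and t > 1e-8:
            t /= 2.0
            candidate = [xi - t * si for xi, si in zip(x, step)]
        if f(candidate) < f(x):
            break
        x = candidate
    return x

estimates = newton_raphson(log_likelihood, [0.0, 1.0, 2.0])
print([round(a, 4) for a in estimates])
```

For this covariate-free case the maximum likelihood thresholds are known in closed form (the log-odds of the empirical cumulative proportions), which gives a convenient check on the iterations. Once covariates enter the model no such closed form exists, and the iterative scheme is needed.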

Model Diagnostics
A critical step in assessing the appropriateness of any model is to assess how well the model describes the observed data and examine its fit in relation to other models. To ascertain the goodness of fit of the fitted models, the following procedures were used; the Likelihood Ratio Test, Wald Test, Deviance Residuals and the Sign-Based Statistic. Model selection was based on the use of AIC/BIC.
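The information criteria and the likelihood ratio statistic follow directly from the maximized log-likelihood: AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L for a model with k parameters and n observations, while the likelihood ratio statistic is twice the difference in log-likelihood between nested models. The sketch below uses the study's sample size of 361, but the log-likelihood values and parameter counts are hypothetical placeholders and not the figures in Table 4.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * log_lik

def likelihood_ratio_statistic(log_lik_null, log_lik_full):
    """Twice the gain in log-likelihood of the full model over the null model."""
    return 2 * (log_lik_full - log_lik_null)

n_obs = 361  # sample size reported in the paper

# Hypothetical values for illustration only (not taken from Table 4).
ll_null, k_null = -330.0, 3   # thresholds only
ll_logit, k_full = -310.0, 8  # three thresholds plus five covariates
ll_probit = -311.5            # same covariates, probit link

aic_logit = aic(ll_logit, k_full)
aic_probit = aic(ll_probit, k_full)
bic_logit = bic(ll_logit, k_full, n_obs)
lr_stat = likelihood_ratio_statistic(ll_null, ll_logit)
print(aic_logit, aic_probit, round(bic_logit, 2), lr_stat)
```

Because the two link functions are fitted with the same number of parameters, the model with the lower AIC is also the one with the higher (less negative) log-likelihood and the lower residual deviance.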

Data Analysis Results & Discussions
This chapter presents the modeling of mother to infant HIV infections by the use of the ordinal regression methodology with logistic and probit link functions. Descriptive statistics, model parameters and their inferences are provided.

Descriptive Data Analysis
Table 1 gave a summary of the mother to infant HIV infections, in which a total of three hundred and sixty one (361) lactating mothers and their newly born infants were enrolled in the study. The mother to infant HIV infection categories were defined as follows;
A. The infant having no infection of the disease (None).
B. Infected with poor weight gain (Mild).
C. Infected with persistent cough, oral thrush and diarrhea (Moderate).
D. Infected with severe pneumonia and tuberculosis (Severe).
This was given a graphical visualization as in Figure 1.

Fitted Model Parameter Estimates
In the modeling of mother to infant new HIV infections, two ordinal regression models were fitted to the data, one with the logistic link function and the other with a probit link function. The model parameter estimates are given in Tables 2 and 3.
Table 2 gave the coefficients of the ordinal regression model with logistic link. The infant CD4 count was found to be the most significant factor in the expected mother to infant new HIV infections, as it had the lowest p value.
Table 3 gave the coefficients of the ordinal regression model with probit link. The use of ARV was found to be the most significant factor in the expected mother to infant new HIV infections, as it had the lowest p value. For a unit increase in Maternal Cell Count, Infant Feeding and Use of ARV, the study expects decreases of about 0.001, 0.10 and 0.23 respectively in the expected mother to infant HIV infections on the log odds scale.
The model parameters for the ordinal regression with logistic and probit link functions give an early indication of the need for lactating and expectant mothers to use ARVs and a proper infant feeding method in order to curb mother to infant HIV infections. Expectant and lactating mothers thus ought to be encouraged to make frequent visits to medical facilities, as this helps to identify new maternal HIV infections and thereafter ways of mitigating the spread of the disease to the infant.
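Changes on the log odds scale are easier to read once exponentiated into odds ratios. The sketch below converts the three decreases quoted in the text (0.001, 0.10 and 0.23), on the assumption that they are logit-link coefficients; for probit coefficients the exponentiated values would not be odds ratios.

```python
import math

# Decreases on the log odds scale quoted in the text, per unit increase.
log_odds_decrease = {
    "Maternal Cell Count": 0.001,
    "Infant Feeding": 0.10,
    "Use of ARV": 0.23,
}

# A decrease of d in log odds multiplies the odds by exp(-d).
odds_ratios = {name: math.exp(-d) for name, d in log_odds_decrease.items()}
percent_change = {name: 100.0 * (ratio - 1.0) for name, ratio in odds_ratios.items()}

for name in log_odds_decrease:
    print(f"{name}: odds ratio {odds_ratios[name]:.4f} "
          f"({percent_change[name]:.1f}% change in odds)")
```

Read this way, a unit increase in Use of ARV corresponds to roughly a one-fifth reduction in the odds of a more severe infection category, while the effect of a one-unit change in Maternal Cell Count is very small.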

Results Discussion
The data exploration in this study was based on the use of ordinal regression with logistic and probit link functions in the modeling of mother to infant new HIV infections in Nyeri County, Kenya. The residual deviance, surrogate residuals, log-likelihood and pseudo R squared were used to aid the exploration. Table 4 gave the model diagnostic statistics, in which the ordinal regression with logistic link function had a lower AIC and a log-likelihood smaller in magnitude. This implied a better fit to the mother to infant HIV infections data. This was confirmed by the residual deviance, which was lower for the logistic ordinal regression, and by its higher pseudo R squared value compared with the probit ordinal regression model.

Fitted Model Confidence Intervals
Fitted model confidence intervals were used in the determination of consistent factors in the estimation and prediction of mother to infant HIV-1 infections. Table 5 gave the fitted model confidence intervals. From this table, the infant CD4 count and viral load (RNA) had lower confidence limits than the other factors for both the logistic and probit ordinal regression models at the 95% confidence level. With confidence limits close to zero for infant CD4 count and viral load (RNA), these two factors did not over- or under-estimate mother to infant HIV-1 infections. This implied that the infant CD4 count and viral load (RNA), unlike the other factors, were consistent in the modeling of mother to infant HIV infections. Table 6 gave the fitted model probability predictions of mother to infant new HIV infections. The predicted probabilities of an infant getting Mild, Moderate, None or Severe symptoms of the disease from an infected mother were 0.20535, 0.11885, 0.6593 and 0.016463 respectively for the logistic ordinal model. For the probit ordinal model these were 0.2113, 0.11929, 0.6521 and 0.017350 respectively.
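A quick sanity check on the predicted probabilities quoted above is that they sum to one for each link and lie close to the observed category proportions. The sketch below uses only the values quoted in the text and the category counts from the Summary; comparing against raw proportions is an illustrative check and makes no assumption about how Table 6 was produced.

```python
# Observed category counts from the Summary section.
observed_counts = {"None": 234, "Mild": 77, "Moderate": 44, "Severe": 6}

# Predicted probabilities quoted in the text for each link function.
predicted = {
    "logit": {"Mild": 0.20535, "Moderate": 0.11885, "None": 0.6593, "Severe": 0.016463},
    "probit": {"Mild": 0.2113, "Moderate": 0.11929, "None": 0.6521, "Severe": 0.017350},
}

n = sum(observed_counts.values())
observed = {category: count / n for category, count in observed_counts.items()}

totals, max_gaps = {}, {}
for link, probs in predicted.items():
    totals[link] = sum(probs.values())
    max_gaps[link] = max(abs(probs[c] - observed[c]) for c in observed)
    print(f"{link}: total = {totals[link]:.5f}, "
          f"largest gap from observed proportion = {max_gaps[link]:.4f}")
```

Both sets of predictions sum to one up to rounding and track the observed proportions closely, with the largest differences arising in the None category.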

Residual Analysis
In the analysis of fitted model residuals, the surrogate residuals were used. Table 7 gave a summary of the surrogate residuals and Figure 2 gave the Q-Q plot of the fitted model surrogate residuals. The mean and median surrogate residuals were close to zero, and the observed near symmetry of these residuals indicated unbiasedness of the fitted models in modeling mother to infant HIV-1 infections. From Figure 2, the two models gave a relatively good fit to the data, with a few outliers above and below the 45 degree reference line at the extreme ends of the surrogate residuals.
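The idea behind surrogate residuals can be sketched in a few lines: for each observation a continuous surrogate is drawn from the latent distribution truncated to the interval implied by the observed category, and the residual is the surrogate minus its expected value. The sketch below assumes a thresholds-only cumulative logit model (linear predictor fixed at zero) fitted to the category counts in the Summary, with an arbitrary random seed. It illustrates the construction only and does not reproduce Table 7 or Figure 2.

```python
import math
import random
import statistics

COUNTS = [234, 77, 44, 6]  # None, Mild, Moderate, Severe (Summary section)

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_quantile(u):
    return math.log(u / (1.0 - u))

def surrogate_residual(category, eta, thresholds, rng):
    """Draw the latent error from a logistic distribution truncated to the
    interval implied by the observed category (0-based index)."""
    lower = 0.0 if category == 0 else logistic_cdf(thresholds[category - 1] - eta)
    upper = 1.0 if category == len(thresholds) else logistic_cdf(thresholds[category] - eta)
    u = rng.uniform(lower, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return logistic_quantile(u)  # surrogate minus its mean eta

# Thresholds of the covariate-free model: log-odds of cumulative proportions.
n = sum(COUNTS)
thresholds, running = [], 0
for count in COUNTS[:-1]:
    running += count
    thresholds.append(math.log(running / (n - running)))

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
pairs = [(category, surrogate_residual(category, 0.0, thresholds, rng))
         for category, count in enumerate(COUNTS) for _ in range(count)]
residuals = [r for _, r in pairs]

mean_residual = statistics.mean(residuals)
median_residual = statistics.median(residuals)
print(round(mean_residual, 3), round(median_residual, 3))
```

When the model is adequate, these residuals behave like draws from the assumed latent distribution, which is why a mean and median near zero and a Q-Q plot close to the reference line are read as signs of a reasonable fit.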

Conclusions and Recommendations
This chapter is the final stage of the study. It gives a summary of the study, conclusions and recommendations for further studies.

Summary
Mother to infant HIV-1 infections remain of much concern in Sub-Saharan Africa. Different methodologies have been proposed as solutions for curbing the spread of HIV infections from mothers to their newly born infants. This study demonstrated the applicability of ordinal regression with logistic and probit link functions in the modeling of mother to infant HIV-1 transmissions.
A total of three hundred and sixty one (361) lactating mothers and their newly born infants were enrolled in the study. The mother to infant HIV-1 infections were categorized into None, Mild, Moderate and Severe. The infants with None, Mild, Moderate and Severe symptoms of HIV infection were 234, 77, 44 and 6 respectively. The standard probability plot was used to give a visualization of the infections. The model parameter estimates were obtained by fitting the ordinal regression model with logistic and probit link functions to the data.
The model diagnostic measures of residual deviance, AIC, log-likelihood and pseudo R squared were used in comparing the fitted models so as to determine the one that gave a better fit. In order to determine the consistent factors in the estimation and prediction of mother to infant HIV-1 infections, model confidence intervals were used at the 95% confidence level. The ordinal regression residuals were evaluated via the surrogate residuals, for which a summary and the surrogate Q-Q plots of the fitted models were obtained.

Conclusion
In the modeling of categorical response variables with more than two ordered categories, the logistic and probit ordinal regression models should be used. In this study, the logistic ordinal regression model gave a better fit than the probit model in the modeling of mother to infant HIV infections, which were categorized into None, Mild, Moderate and Severe.
The AIC, residual deviance and log-likelihood ought to be used as model comparison and selection techniques. A lower AIC or residual deviance, or a higher (less negative) log-likelihood, implies a better model fit. The surrogate residuals should be used in the analysis of ordinal regression model residuals, and the model confidence intervals used to identify consistent factors in the estimation of the response variable.
In the modeling of the mother to infant HIV infections data, the logistic ordinal model had the lowest AIC and the log-likelihood smallest in magnitude, implying that it gave a better fit than the probit ordinal model. Infant CD4 count and viral load were the only consistent factors in the estimation and prediction of mother to infant HIV infections, as they had confidence intervals close to zero. This agrees with Fondoh et al [3], Izudi et al [13] and Obsa et al [8], who used a multivariate logistic approach to the modeling of mother to infant HIV infections.
The surrogate Q-Q plots indicated that the fitted models gave a good fit to the data, with little over- or under-fitting at the extremes. Use of ARV had the highest unit effect on mother to infant HIV-1 infections.

Recommendations
This study gave an application of the ordinal regression methodology with logistic and probit link functions to the modeling of a categorical response variable (mother to infant HIV-1 transmissions). Further studies ought to be done as an extension of the existing literature on the modeling of categorical variables such as mother to infant HIV infections. Other ordinal link functions such as the log-log, complementary log-log and cauchit can be considered, as can a multivariate ordinal regression model. These techniques can then be applied to other fields such as insurance, economics and political science.