Modelling Cases of Spontaneous Abortion Using Logistic Regression

Spontaneous abortion is the expulsion of a foetus before the 28th week of gestation. Studies estimate that 10-25% of pregnancies are lost to miscarriage. Its aetiology remains poorly understood, so the cause of an individual miscarriage is difficult to establish. Furthermore, many pregnant women realize they have conceived only later in the gestation period, and some start antenatal care late in the pregnancy. In Kenya, the total fertility rate has decreased over the last three decades from 8.1 to 3.9. However, even as the total fertility rate has fallen, the prevalence of maternal mortality and morbidity factors has greatly affected pregnancy outcomes. Among these factors is spontaneous abortion. This study used secondary data from Kenyatta National Hospital and employed logistic regression to model the risk factors for miscarriage: to investigate socio-demographic and lifestyle factors, to investigate interactions among the identified risk factors, and to fit a predictive model. The significant socio-demographic factors identified were age and recurrent miscarriage. A woman who had experienced a prior miscarriage had a 7.5-fold risk. The lifestyle factors identified were body mass index, diabetes mellitus and HIV. Underweight women had a 13.2-fold risk. There were significant interactions between gravidity and previous miscarriage, and between diabetes and body mass index. A predictive model was fitted. The model is significant, has a good measure of separability and has a classification accuracy of 80%.


Introduction
The expectation of a pregnant woman is to hold her baby when the gestation period is over. However, some pregnant women are deprived of this expectation through pregnancy loss, which is among the adverse complications of pregnancy. Spontaneous abortion, also known as miscarriage, is the main type of pregnancy loss [16]. Spontaneous abortion is an irreversible process which affects 10-25% of clinically recognized pregnancies [14]. However, the actual rate of miscarriage is even higher, as many women have very early miscarriages without ever realizing that they are pregnant. About 80% of miscarriages happen in the first trimester, and the risk declines as gestation progresses.
Miscarriage is classified according to how often it occurs in a woman's pregnancies, as either sporadic or recurrent.
Sporadic miscarriage is a single occurrence of pregnancy loss and affects 50% of women. Recurrent miscarriage is three or more occurrences of pregnancy loss and affects 1% of women. Furthermore, spontaneous abortion can be subdivided into threatened, incomplete and complete miscarriage. Threatened miscarriage is abnormal vaginal bleeding with or without abdominal pain, and affects 20% of pregnancies. Spotting has been associated with a 5.5%-42.7% risk of subsequent complete miscarriage [10]. Inevitable miscarriage arises when severe cramps persist and are accompanied by opening of the cervix.
Spontaneous abortion has negative effects on the affected woman, and these can extend to her partner. It magnifies emotional distress, grief and anxiety, which can eventually lead to depression in affected women [4]. This weighs heavily on the affected woman, draining her energy and will to live. Moreover, the heightened emotional distress and grief can lead to family breakdown and divorce. Miscarriage is also an economic burden to couples who want to salvage the pregnancy and to affected women who attend therapy and counselling sessions. The etiology of spontaneous abortion is unknown [6]. It is not possible to state a cause with certainty because the etiology is heterogeneous and complex. However, over the past years, studies have identified some factors which promote the occurrence of miscarriage. Common risk factors are extremes of age (both paternal and maternal), diabetes mellitus, alcohol consumption, smoking, caffeine intake, extremes of body mass index (BMI), anti-phospholipid syndrome, hypertension, low serum progesterone levels, infections and stress.

Study Area and Data Source
Kenya is a country located in East Africa. Kenyatta National Hospital is located in Nairobi County. The data were obtained from Kenyatta National Hospital.

Logistic Regression
The logistic function was originally devised to describe population growth. Logistic regression is a special case of the generalized linear model whose link function is the logarithm of the odds (the logit). It was first suggested by [3] for the analysis of biological experiments, and [5] later broadened its application. It is popular in studies where the outcome variable (Y) is binary or dichotomous, in part because it makes few distributional assumptions about the explanatory variables [1]. In this study, the dependent variable Y is binary, with Y=1 if the response is "yes" and Y=0 if the response is "no". Let π be the probability of the event Y=1, let x_1, …, x_p be the explanatory variables, and let β_j, j=0, 1,…,p, be the regression coefficients. The logistic regression model is defined as:

$$\ln\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
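The relationship between the log-odds and the probability π described above can be illustrated with a minimal Python sketch. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Logistic function: map a linear predictor eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(intercept, coefficients, x):
    """Probability of the event for one covariate vector x."""
    eta = intercept + sum(b * xj for b, xj in zip(coefficients, x))
    return inverse_logit(eta)

# Hypothetical coefficients, for illustration only (not study estimates).
beta0 = -2.0
betas = [0.7, 1.1]

# Probability for a subject with x_1 = 1 and x_2 = 0.
p = predicted_probability(beta0, betas, [1, 0])

# exp(beta_j) is the odds ratio for a one-unit increase in x_j.
odds_ratios = [math.exp(b) for b in betas]
```

Exponentiating a coefficient gives the odds ratio for a one-unit change in the corresponding variable, which is how fold-changes are usually read off a logistic model.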

Univariate Analysis
For nominal and ordinal variables, contingency tables of the outcome Y against each independent variable, and of the independent variables against one another, were constructed and tested with the likelihood-ratio chi-square test and Fisher's exact test. For continuous variables, univariate logistic regression was fitted. Any candidate variable with a p-value < 0.05 was considered statistically significant.
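As an illustration of the exact test used for 2x2 contingency tables, the following Python sketch computes a two-sided Fisher's exact p-value from the hypergeometric distribution. The function name and the counts are hypothetical; the study itself ran its tests in R, so this is only a sketch of the calculation and not the routine that was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical counts, for illustration only (not study data):
# rows = exposed / unexposed, columns = miscarriage yes / no.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

With the significance level of 0.05 used in this study, a table whose p-value falls below 0.05 would be flagged as showing an association.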

Multivariate Analysis
Forward stepwise selection was used to decide which variables to include in the model. At every stage, a likelihood ratio test of significance was conducted for each candidate variable. The process stopped when all significant variables (p < 0.05) had been included in the model. All variables were subjected to stepwise regression in R.
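A single step of forward selection based on the likelihood ratio test can be sketched in Python as follows. The sketch assumes that each candidate adds exactly one parameter, so that the statistic is compared with a chi-square distribution with 1 degree of freedom; a categorical variable with more than two levels would need more degrees of freedom. The function names, variable names and log-likelihood values are hypothetical.

```python
import math

def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic G = -2 (l_reduced - l_full)."""
    return -2.0 * (loglik_reduced - loglik_full)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def forward_step(loglik_current, candidate_logliks, alpha=0.05):
    """Pick the candidate with the smallest LR-test p-value below alpha.

    candidate_logliks maps a variable name to the maximized log-likelihood of
    the current model plus that one variable (one extra parameter assumed).
    Returns (name, p_value), or (None, None) if no candidate is significant.
    """
    best_name, best_p = None, None
    for name, loglik in candidate_logliks.items():
        g = max(lr_statistic(loglik_current, loglik), 0.0)
        p = chi2_sf_1df(g)
        if p < alpha and (best_p is None or p < best_p):
            best_name, best_p = name, p
    return best_name, best_p

# Hypothetical log-likelihoods, for illustration only (not study results).
current = -100.0
candidates = {"var_a": -99.5, "var_b": -95.0}
chosen, p_value = forward_step(current, candidates)
```

Repeating this step, refitting the candidate models each time, until it returns no variable corresponds to the stopping rule described above.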

Estimating Parameters
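The model coefficients are unknown quantities and must be estimated. The method of maximum likelihood is used to estimate them. The likelihood for a given model is interpreted as the joint probability of the observed outcomes expressed as a function of the chosen regression model [7]. The log-likelihood function to be maximized has the form:

$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \ln \pi_i + (1 - y_i) \ln (1 - \pi_i) \right]$$

where N is the number of observations, y_i is the outcome of the i-th observation and π_i is its probability of the event under the model. Differentiating the log-likelihood with respect to each coefficient and setting the result to zero gives p+1 non-linear equations, which are solved iteratively using R statistical software. The fitted values of the logistic model are:

$$\hat{\pi}_i = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip})}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip})}$$

with 100(1-α)% confidence intervals for the coefficients given by:

$$\hat{\beta}_j \pm z_{1-\alpha/2}\, \widehat{SE}(\hat{\beta}_j)$$

for j=0,1,…,p.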
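To show what an iterative solution of the likelihood equations involves, the following Python sketch applies Newton-Raphson to a model with an intercept and a single predictor. It is a simplified illustration under stated assumptions, not the routine run in R for this study: the function names and the data are hypothetical, and the two-parameter case is chosen so that the linear system can be solved by hand.

```python
import math

def fit_logistic_1d(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(pi) = b0 + b1 * x.

    Returns ((b0, b1), (se0, se1)).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1.0 - pi)
            g0 += yi - pi
            g1 += (yi - pi) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information) * delta = score for the 2x2 case.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard errors from the inverse information, evaluated at the last
    # iterate before the final (negligible) step.
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return (b0, b1), (se0, se1)

def wald_interval(estimate, se, z=1.96):
    """Approximate 95% Wald confidence interval for a coefficient."""
    return estimate - z * se, estimate + z * se

# Hypothetical data, for illustration only (not study data):
# 20 unexposed women with 4 events, 10 exposed women with 5 events.
x = [0] * 20 + [1] * 10
y = [1] * 4 + [0] * 16 + [1] * 5 + [0] * 5
(b0, b1), (se0, se1) = fit_logistic_1d(x, y)
ci_low, ci_high = wald_interval(b1, se1)
```

For a single binary predictor the maximum likelihood estimates have a closed form (the slope is the log of the sample odds ratio), which makes this a convenient case for checking that the iteration has converged to the right values.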

Model Assessment
It is necessary to assess the adequacy and appropriateness of the model. To this end, the likelihood ratio test is used to assess the importance of each explanatory variable. The test statistic is:

$$G = -2 \ln\left(\frac{L_0}{L_1}\right) = -2(\ell_0 - \ell_1)$$

where L_0 and L_1 are the maximized likelihoods of the model without and with the variable under test, and ℓ_0 and ℓ_1 are the corresponding log-likelihoods.

The Hosmer-Lemeshow test is commonly used to assess the goodness of fit of the model. The estimated probabilities are grouped by percentile into ten groups. Grouping by percentiles is preferred to fixed cut points, especially when the estimated probabilities are small. With ten groups, the Hosmer-Lemeshow statistic has approximately a chi-square distribution with 8 degrees of freedom.

Table 1 shows the frequencies of women who experienced miscarriage (YES) and those who did not (NO). Miscarriage cases made up 12.3% of the records, far fewer than the women who did not miscarry. This is consistent with the findings of [14].

Figure 1 shows that the average age of women who experienced miscarriage was higher than that of those who did not. Overall, the mean age was 30 years, with the youngest woman aged 20 and the oldest 45. Age was categorized into three groups: 20-30, 30-40 and 40+. The figure also shows that the average BMI of women who experienced miscarriage was higher than that of those who did not. The mean BMI was 22.21. This continuous variable was subdivided into underweight, normal, overweight and obese. Gravidity was treated as a continuous variable with a mean of 1.35≈1, a minimum of 0 and a maximum of 5. It was also subdivided into two categories, nulligravid and multigravid. The remaining predictor variables are dichotomous, with two factor levels each.
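The Hosmer-Lemeshow grouping described in this section can be sketched in Python as follows. The sketch assumes ten groups of roughly equal size formed by sorting the estimated probabilities, and it uses the closed-form upper tail of the chi-square distribution that is valid for an even number of degrees of freedom. The function names and the toy data are hypothetical and are not taken from this study.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p_hat, groups=10):
    """Return (statistic, degrees of freedom, p-value).

    Assumes at least `groups` observations and that no group has a mean
    estimated probability of exactly 0 or 1.
    """
    if len(y) < groups:
        raise ValueError("need at least one observation per group")
    # Sort by estimated probability only, so ties keep their original order.
    pairs = sorted(zip(p_hat, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Toy data, for illustration only: perfectly calibrated probabilities.
y_toy = [0, 1] * 10
p_toy = [0.5] * 20
stat, df, p_value = hosmer_lemeshow(y_toy, p_toy)
```

A p-value above 0.05 is read as no evidence of poor fit, which is the interpretation applied to the fitted model in this study.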

Results
Women above 40 years had the highest risk, followed by those aged 30-40 years, compared with the youngest category. Women who had previously miscarried had a 5.4-fold risk compared with those who had not. Multigravid women had an almost 3-fold risk compared with nulligravid women. Obese women were by far the most vulnerable: 12 of them lost their pregnancies, which constituted half of the miscarriages. They had the highest risk, followed by underweight and overweight women. Lifestyle factors included smoking, alcohol consumption, caffeine intake, diabetes and HIV infection. Diabetic women and HIV-infected women had about a 3-fold risk. These results are summarized in Table 2. Overall, the significant variables (p < 0.05) were age, previous miscarriage, BMI, diabetes and HIV status. Interactions among predictor variables were also assessed. Significant interactions were observed between gravidity and previous miscarriage (r=0.26, p<0.001) and between BMI and diabetes (r=0.31, p<0.001).
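Fold-risks of this kind are commonly reported as crude odds ratios from a 2x2 table. Assuming that is the measure intended here, the calculation with an approximate 95% Wald confidence interval can be sketched in Python as follows. The function name and the counts are hypothetical and are not taken from Table 2.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and approximate 95% Wald interval for a 2x2 table.

    a: exposed with event,   b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical counts, for illustration only (not study data).
or_hat, or_low, or_high = odds_ratio_ci(5, 5, 4, 16)
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level.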
A multiple logistic regression was conducted to obtain the adjusted odds ratios of the variables. Women above 40 had the greatest risk, followed by those aged 30-40 years, compared with the youngest age group. Women who had previously experienced miscarriage had an almost 8-fold risk compared with those who had never experienced it. Obese women had the highest risk, and underweight women had a 13-fold risk compared with women with normal BMI. Alcohol consumption tripled the risk, although this effect was not statistically significant. Diabetic women had an almost 10-fold risk. HIV-positive women had a 6-fold risk. These results are summarized in Table 3.

Predictive Model
Statistically significant variables were included in the multiple logistic model through stepwise regression on the basis of the likelihood ratio test. Only five variables were found to be significant (p < 0.05): age, previous miscarriage, BMI, diabetes and HIV status. The logistic regression output is shown in the table below. Several statistics were calculated to assess the accuracy and nature of the model, and Table 5 summarizes them. From Table 5, the model is significant. The p-value of the Hosmer-Lemeshow statistic is greater than 0.05, which is interpreted as no evidence that the model fits poorly. The area under the ROC curve measures the level of separability, and its value is close to one. The model has a misclassification error of 20%, which is acceptable.
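The two accuracy measures mentioned above, the area under the ROC curve and the misclassification error, can be computed as in the following Python sketch. The classification cutoff of 0.5 is an assumption, since the cutoff used in this study is not stated, and the function names, outcomes and scores are hypothetical.

```python
def auc(y, scores):
    """Area under the ROC curve as the proportion of (event, non-event)
    pairs in which the event has the higher score; ties count one half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def misclassification_error(y, scores, cutoff=0.5):
    """Share of observations whose predicted class differs from the outcome."""
    wrong = sum(1 for yi, s in zip(y, scores) if (s >= cutoff) != (yi == 1))
    return wrong / len(y)

# Hypothetical outcomes and fitted probabilities, for illustration only.
y_obs = [0, 0, 0, 0, 1, 0, 1, 1]
p_fit = [0.1, 0.2, 0.3, 0.6, 0.4, 0.45, 0.7, 0.9]

area = auc(y_obs, p_fit)
error = misclassification_error(y_obs, p_fit)
accuracy = 1.0 - error
```

An area of 0.5 corresponds to no separability and an area of 1 to perfect separability, and the classification accuracy is one minus the misclassification error.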

Discussion
From this study, we found the significant risk factors for miscarriage to be age, previous miscarriage, body mass index, diabetes and HIV status.
Maternal age is a major risk factor. Maternal aging is associated with increased infertility, miscarriage and poor prognosis of pregnancy, and studies on miscarriage consistently identify it as a major risk factor. In this study, older women had a higher risk than younger expectant women. Moreover, advanced age has been linked to ovarian aging, which increases the rate of meiotic errors in the oocyte [9]. These errors result in fetal anomalies, which account for more than 50% of spontaneous abortions. Maternal age and previous miscarriages independently decrease the live birth rate of a subsequent pregnancy [15]. Recurrent miscarriage is a risk factor which affects 1% of pregnancies. In this study, women with a history of miscarriage had a significantly higher risk than those without such a history. In addition, a significant interaction between gravidity and history of miscarriage was observed.
Low BMI (<20 kg/m²) is a risk factor for miscarriage in the early weeks of gestation [2]. Obesity is associated with lower progesterone levels among expectant women [8]. Miscarriage risk is higher in underweight and obese women than in those with normal BMI [12]. This was evident in our study.
HIV infection has been reported to increase the risk of miscarriage by 6% annually [13]. In this study, HIV infection was a significant risk factor. Diabetes mellitus was also a significant risk factor, and there was a significant positive interaction between diabetes and BMI. This is consistent with previous studies.
Logistic regression is used mainly to predict the response variable rather than to estimate a probability. It was therefore the best approach to adopt in this study. Its classification accuracy is best when it is applied to small and moderately sized datasets [11]. On assessment, the predictive model was significant, showed no evidence of poor fit and had a classification accuracy of 80%.

Conclusion and Recommendation
Miscarriage is a frequent adverse outcome of pregnancy, as more than 12% of all recognized pregnancies end in spontaneous abortion. Age and previous miscarriage were found to be significant socio-demographic risk factors. Women aged forty and above are more vulnerable to miscarriage. Hence, women should be advised to have children at a younger age, within the legal age, rather than later in life. Recurrence of miscarriage also increases the risk. Gravidity was found to interact with previous miscarriage.
The significant lifestyle risk factors were body mass index, diabetes and HIV. Low BMI and obesity are contributory risk factors for miscarriage, as this study showed. Moreover, there is a positive relationship between BMI and diabetes, which suggests that body mass index may contribute to diabetes mellitus. Diabetes is itself a significant risk factor for miscarriage, and miscarriage was more common among diabetic women. HIV infection also increases miscarriage risk.
In studies where the response variable is binary and the dataset is small to moderately sized, logistic regression is the best approach to adopt. This study had both characteristics. Since it is challenging to estimate the probability of a pregnancy being lost to miscarriage, logistic regression was used to predict the response variable. The application of logistic regression in building a predictive model was a success: the predictive model has a classification accuracy of 80% and a good measure of separability of 85.5%, and it shows no evidence of poor fit.