Time Series Analysis and Forecasting of Caesarian Section Births in Ghana

Caesarian Section (CS) rates are known to vary geographically. The purpose of this paper was to determine Ghana’s situation (regional trend) and to provide two-year forecast estimates for the ten (10) regions of Ghana. The data were longitudinal and comprised monthly CS records of women from 2008 to 2017. The dataset was divided into training and testing datasets. A total of eighty-four (84) months were used as the training dataset and the remaining thirty-six (36) months were used as the testing dataset. The ARIMA methodology was applied in the analysis. The Augmented Dickey-Fuller (ADF), KPSS and Phillips-Perron (PP) unit root tests were employed to test for stationarity of the series. The KPSS test (which is known to give more robust results) and the PP test consistently showed that the series was stationary (p < 0.05) for all ten (10) regions, although there were some conflicting results with the ADF test for some regions. Tentative models were formulated for each region and the model with the lowest AIC was selected as the “best” model fit for the respective regions of Ghana. The “best” model fits for the Greater Accra, Central and Eastern regions were, respectively, SARIMA (2, 0, 0) (0, 1, 1)12, SARIMA (2, 0, 0) (0, 1, 1)12 with drift and SARIMA (1, 1, 1) (0, 1, 1)12. Additionally, the best model fits for the Northern and Volta regions were SARIMA (3, 0, 2) (0, 1, 1)12 with drift and SARIMA (0, 1, 1) (0, 1, 1)12, respectively. The models for the Ashanti, Upper East and Western regions failed the Jarque-Bera (JB) normality test for the residuals. The models for the Upper West and Brong Ahafo regions were not suitable for forecasting because the residuals failed to depict white noise and failed the ARCH test, respectively. The best model fits were used to forecast for 2019 and 2020. The results showed that regional variations in CS exist in Ghana. The study recommends that future studies apply methods that allow forecasting for the regions which failed the diagnostic tests under the methods used in this study.


Introduction
The main conditions which may warrant the use of the Caesarean Section (CS) method of delivery include previous CS, breech presentation, fetal distress, dystocia and multiple pregnancy. However, there is major concern about the overuse of CS delivery, evidenced by the fact that rates have exceeded the 15% recommended by the WHO. Repeated CS accounts for 33% of all cases and 68% of all new cases. Dystocia, like previous CS, is a significant contributor to CS rates [1]. Contemporary CS rates make it obvious that factors other than strict medical need influence the decision to use the CS method. This increasing trend has motivated research to identify workable interventions to control its excessive use. Although postpartum pain is common among women who deliver either vaginally or by caesarean section, many studies have reported that postpartum pain persists over an extended period among women who have undergone caesarean delivery [1,2]. CS delivery has also been associated with a higher risk of severe acute maternal morbidity (SAMM) than vaginal delivery, particularly in women who are 35 years or older [3]. Studies recommend controlling CS rates through the use of the Robson classification or policy instruments at the health system level [4]. There is little or no hard evidence to explain the escalating increase in CS over the last two decades. The likelihood of CS is known to be directly proportional to the mother's age and inversely proportional to parity [5].
Time series techniques have been applied in many research areas including engineering, epidemiology, education and health. Onwuka et al. (2013) [6] modeled vaginal births in North Western Nigeria. ARIMA (0,1,2) was selected as the best model (the model with the lowest AIC) and was used to perform a two-year forecast, which showed a marginal increase in vaginal births for the years 2011 and 2012. The study used monthly data from 2008 to 2010 for the analysis. KSS and ADF tests were performed to check stationarity of the series. The residual ACF and PACF plots and the Q-Q plot indicated that the residuals passed the normality test, since the lags lay within the confidence interval. Another study by Essuman et al. (2017) [7] applied the Box-Jenkins time series methodology to 11 years of data in Ghana spanning 2004 to 2014. Tentative models were fitted, and the best model (the one with the lowest AICc) was selected and used to forecast for the year 2015. SARIMA (2,1,1) × (1,0,1) was the best model. The ADF test was used to decide on the stationarity of the series. Yet another study applied time series methods to find an appropriate model to forecast total fertility rates in Malaysia from 2013 to 2040. Various tentative models were formulated, but the best model was ARAR with an MAE of 0.075, an RMSE of 0.083 and a MAPE of 3.292%. This model was used to forecast and revealed a decline which slowly leveled off and was expected to reach roughly 1.2 (average number of children per woman) in the year 2040. The 95% confidence interval for the year 2040 was 0.5 to 1.9 children per woman [8].
Although CS delivery has increased worldwide, there is evidence of disparity in country rates, that is, international and geographical variation [6], which requires that Ghana's situation be established clearly. Since medical indications may vary internationally, there is a need to know the CS delivery situation in Ghana, as well as projected rates, so as to gauge its severity and put in place interventions to reduce it without compromising quality of care. A thorough understanding of the variations in the overall CS rates would require separate (region-based) models of CS in Ghana.

Dataset
The data for the study are a longitudinal data set comprising vital maternal variables of interest, such as caesarean section, antenatal clinic registration of pregnant women, IPTp uptake, age of pregnant women, family planning, number of visits to the clinic, number of pregnancies and births by pregnant women, distance from their homes to the hospital, and male partner attendance at the clinic, which were recorded over a period of 10 years (from January 2008 to January 2017). These events were recorded on a monthly basis.
The dataset was divided into training and testing datasets. A total of eighty-four (84) months were used as the training dataset and the remaining thirty-six (36) months were used as the testing dataset.
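The chronological split described above can be sketched as follows. This is a minimal illustration only: the function name split_train_test and the placeholder values are assumptions introduced here, not the study's code or data. The point it shows is that a time series is split by position (first 84 months, then the remaining 36) rather than by random sampling.

```python
def split_train_test(series, n_train):
    """Split a chronologically ordered series into a training head and a testing tail."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must be between 1 and len(series) - 1")
    return series[:n_train], series[n_train:]


# Placeholder values standing in for 120 monthly CS counts (NOT the study's data).
monthly_counts = list(range(120))

# First 84 months for training, remaining 36 months for testing.
train, test = split_train_test(monthly_counts, n_train=84)
print(len(train), len(test))
```

Keeping the test months strictly after the training months means forecasts are always evaluated on observations the model has not seen.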

Method
George Box and Gwilym Jenkins (1970) developed the ARIMA methodology, which is therefore also named after them (the Box-Jenkins methodology). This methodology makes no assumption about the presence of a particular pattern in the historical data of the series to be forecast. The underlying assumption of time series analysis is that a variable's own past can explain its current value; hence, before exogenous explanatory variables are even considered, it is necessary first to model the series' own past so that its endogenous dynamics are captured. ARIMA models are particularly appropriate for time series exceeding 50 observations [9].
ARIMA models are generally written as ARIMA (p,d,q), representing p autoregressive (AR) terms, d orders of integration (the number of times the series must be differenced to achieve stationarity), and q lagged moving-average terms. In ARIMA modeling, it is first required to ascertain whether the time series variable is stationary. The stationarity assumption is a fundamental property for ARIMA processes. A stationary series has certain basic statistical properties: (1) its mean does not vary with time; (2) its variance does not vary with time; (3) the series has no trend. Stationarity is achieved by differencing a nonstationary series. The Augmented Dickey-Fuller (ADF) test is commonly applied to ascertain whether a series is stationary. When the ADF test indicates that a series is non-stationary, the series has a 'unit root' and would require integration of order one, I(1). Differencing is an important step in the model-building process since it is the standard way for the series to achieve stationarity. The series may require integration of order two, I(2), if it still remains nonstationary after first differencing. Several versions of the ADF test exist. Further, a variable may require log transformation when it is not stationary in its variance [10].
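The role of the differencing order d can be illustrated with a short sketch. The helper name difference and the toy series are assumptions made for illustration; they are not taken from the study. A series with a linear trend becomes constant after one difference (d = 1), while a series with a quadratic trend needs two (d = 2).

```python
def difference(series, order=1):
    """Apply first differencing `order` times, i.e. y[t] - y[t-1] repeatedly."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Toy series (illustrative only): a linear trend and a quadratic trend.
linear = [10, 12, 14, 16, 18]
quadratic = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25

d1_linear = difference(linear, order=1)        # trend removed after one difference
d1_quadratic = difference(quadratic, order=1)  # still trending after one difference
d2_quadratic = difference(quadratic, order=2)  # trend removed after two differences
print(d1_linear, d1_quadratic, d2_quadratic)
```

Each round of differencing shortens the series by one observation, which is one reason not to difference more often than the stationarity tests require.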
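After obtaining a stationary series, one can proceed with determining the AR and moving average (MA) terms. An AR model produces a new predictor variable by making use of the Y variable lagged one or more periods. In standard Box-Jenkins notation, the general form of an AR model of order p is given by equation (1):
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t \qquad (1)$$
where $\varepsilon_t$ is the error term at time t.
Similarly, the general form of the moving average (MA) model of order q is given by equation (2):
$$Y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} \qquad (2)$$
A mixture of the pure AR and MA models yields a model known as the autoregressive moving average (ARMA) model, with the general form:
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} \qquad (3)$$
This is the ARMA(p, q) model, meaning that the most recent p observations and the most recent q error component values are used as regressors [11].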
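To make the ARMA(p, q) structure concrete, the sketch below computes a one-step-ahead forecast from the most recent p observations and the most recent q errors. It assumes the common Box-Jenkins sign convention, Y_t = phi0 + sum(phi_i * Y_{t-i}) + e_t - sum(theta_j * e_{t-j}); some software reports MA coefficients with the opposite sign. The function name and all numeric values are hypothetical.

```python
def arma_one_step(y_hist, e_hist, phi0, phi, theta):
    """One-step-ahead ARMA(p, q) forecast.

    Assumed convention:
        Y_t = phi0 + sum(phi[i] * Y_{t-1-i}) + e_t - sum(theta[j] * e_{t-1-j})
    The future shock e_t has mean zero, so it drops out of the forecast.
    """
    p, q = len(phi), len(theta)
    if len(y_hist) < p or len(e_hist) < q:
        raise ValueError("not enough history for the requested orders")
    ar_part = sum(phi[i] * y_hist[-1 - i] for i in range(p))
    ma_part = sum(theta[j] * e_hist[-1 - j] for j in range(q))
    return phi0 + ar_part - ma_part


# Hypothetical ARMA(2, 1) example: last two observations and last two errors.
forecast = arma_one_step(
    y_hist=[100.0, 104.0],   # Y_{t-2}, Y_{t-1}
    e_hist=[1.0, -2.0],      # e_{t-2}, e_{t-1}
    phi0=5.0,
    phi=[0.5, 0.25],         # phi_1, phi_2
    theta=[0.5],             # theta_1
)
print(forecast)
```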
The main goal is to model the series' own past correctly, using all the information this past contains. Unused information is revealed as autocorrelation in the residuals. In that case, the residuals do not resemble white noise, a term for a signal or process that is uncorrelated over time and consists of independent random values with a normal distribution. To this end, the residuals of the model must resemble white noise (no autocorrelation), which is tested using the Ljung-Box test.
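The Ljung-Box statistic itself is simple to compute: Q = n(n + 2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the residuals. The sketch below is a plain-Python illustration under that textbook definition; the function names and the toy residuals are assumptions, and a statistics package would normally also supply the p-value.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy residuals that alternate in sign, so they are strongly autocorrelated
# (illustrative values, not residuals from the study's models).
alternating = [1.0, -1.0] * 10
q_stat = ljung_box_q(alternating, max_lag=1)

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_95_DF1 = 3.841
print(q_stat, q_stat > CHI2_95_DF1)
```

Q is compared with a chi-square distribution; for residuals of a fitted ARMA model the degrees of freedom are usually reduced by the number of estimated AR and MA parameters. A Q above the critical value (equivalently, a small p-value) signals remaining autocorrelation, so the residuals are not white noise.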
Model building is done chiefly by comparing the ACF and PACF of the original series with the patterns typical of different AR and MA models. In practice, it is often not easy to distinguish between candidate models from the ACF and PACF plots alone, so additional criteria are used to choose the best-fit model: the lowest values of the AIC (Akaike Information Criterion) and the Schwarz Bayesian Criterion indicate the most parsimonious model.
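Selection by information criterion can be sketched as below, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The candidate log-likelihoods and parameter counts are hypothetical values chosen for illustration; they are not results from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: model label -> (maximized log-likelihood, parameter count).
candidates = {
    "SARIMA(0,1,1)(0,1,1)12": (-352.0, 3),
    "SARIMA(1,1,1)(0,1,1)12": (-351.5, 4),
    "SARIMA(2,0,0)(0,1,1)12": (-356.0, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC wins
print(best, scores[best])
```

In this toy example the second model has a slightly higher likelihood, but its extra parameter costs more than the fit it buys, so the simpler model has the lower AIC.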

Descriptive Analysis - Time Series Plots for All Regions
The time series plot for the three regions in Figure 1 shows an upward trend in the incidence of caesarean sections in the Brong Ahafo, Eastern and Central regions. The plot also shows a seasonal pattern for Eastern, where the incidence of caesarean sections peaks in January. For Central and Brong Ahafo, there are cyclic movements in the incidence of caesarean sections.
The time series plot for the three regions in Figure 2 shows an upward trend in the incidence of caesarean sections in the Greater Accra, Northern and Upper East regions. The series for Greater Accra lies above those of the other two regions because the region is more highly populated. The plot shows a seasonal pattern for Greater Accra, where the incidence of caesarean sections peaks in May.
The time series plot for the four regions in Figure 3 shows an upward trend in the incidence of caesarean sections in the Upper West, Western, Volta and Ashanti regions. The plot shows two strands of seasonal pattern for Ashanti, where the incidence of caesarean sections peaks in May and October.

Test for Stationarity
In checking for the stationarity of the variables, the Augmented Dickey-Fuller (ADF) unit root test, the KPSS test and the Phillips-Perron (PP) test were employed. According to [12,13], the PP and ADF unit root tests, which are designed on the basis of the null hypothesis that a series has a unit root, have low power to reject that null. These authors [12,13] suggest that the KPSS test eliminates the possible low power against stationary alternatives that occurs in the ADF and PP tests. Therefore, where results conflict or are inconsistent, the KPSS test results should be used, as they are more robust.
In this regard, this study used the KPSS results for the unit root test and hence concludes that the variables attained stationarity at level; that is, the p-values of the original series were smaller than 5%, which indicates that we can reject the null hypothesis and conclude that the data are stationary.
Taking cognizance of the fact that ARIMA models are atheoretic, the estimated parameters for the selected models were tabulated. The figures capture the ACF and PACF plots of the residuals of the five selected models. All the selected models had seasonality, and some of this seasonality was accounted for by significant lags in both the ACF and PACF plots.
[...] seasonal moving average (sma1) (Tables 5 and 6) were significant in the SARIMA (0, 1, 1) (0, 1, 1)12 models.
Lastly, SARIMA (3, 0, 2) (0, 1, 1)12 with drift was selected as the best model for the Northern region, with an AIC value of 711.57, a mean percent error of 2.6 and a Ljung-Box test p-value of 0.69. A p-value > 0.05 indicates that there was no significant autocorrelation between residuals at different lags and that the residuals were white noise. The estimated parameters, namely the autoregressive terms at lag one (ar1) and lag three (ar3), the first and second moving average terms (ma1 and ma2) and the drift or linear trend (Table 7), were significant in SARIMA (3, 0, 2) (0, 1, 1)12 with drift.

Parameter Estimation and Model Validation
Results of the model diagnostics (Table 2) suggest that the models for the Ashanti, Upper East and Western regions failed the Jarque-Bera (JB) normality test for the residuals. The implication is that these models cannot be used for forecasting. It must be noted that other methods may be applied in such a situation. One such method is intervention time series analysis, but this was not considered in the present study. The main reason for not considering this method is that there is no evidence, in any of the regions or in Ghana as a whole, of an intervention targeting caesarean section.
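For reference, the Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals as JB = (n / 6) * (S^2 + (K - 3)^2 / 4) and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below follows that textbook definition; the function name and the toy residuals are assumptions for illustration, not the study's residuals. It also shows how a single extreme value can push JB far above the critical value.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_95_DF2 = 5.991

# Toy residuals (illustrative only).
symmetric = [-1.0, 1.0] * 10        # no skew, but much flatter than a normal
with_outlier = [0.0] * 19 + [20.0]  # one extreme value dominates the moments

jb_symmetric = jarque_bera(symmetric)
jb_outlier = jarque_bera(with_outlier)
print(jb_symmetric, jb_outlier, jb_outlier > CHI2_95_DF2)
```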
Another approach could have been to add exogenous variables to the model to facilitate the forecasting process. This approach was also not considered. We suggest that these approaches be considered in future time series studies of the incidence of caesarean section in Ghana. We also suspect that some of the unusual values or outliers recorded in the data set may be due to data entry errors. Our conviction stems from the fact that some unusual values, specifically the low values of 68, 92 and 98 recorded for the Western region, triggered multiple (three) point interventions. It is worth noting that these three low values (outliers) in the Western data set were the highest number of low outliers for any region in the entire data set.
The residuals for the Upper West Region did not behave as white noise, and the model was therefore not used for forecasting. Lastly, the postulated model for the Brong Ahafo Region exhibited an ARCH effect (it failed the ARCH test) and may be better suited to an ARCH/GARCH model, which was not considered in the present study.

Conclusions
The study employed the ARIMA methodology to determine Ghana's situation (regional trend) and to provide two-year forecast estimates for the ten (10) regions of Ghana. The study formulated tentative models for each region, and the model with the lowest AIC was selected as the "best" model fit for the respective regions of Ghana. The study used the best model fits to forecast for 2019 and 2020. The results showed that regional variations in CS exist in Ghana. The study recommends that future studies apply methods that allow forecasting for the regions which failed the diagnostic tests under the methods used in this study.
To lower the incidence of CS in the regions of Ghana, we recommend that non-clinical educational interventions, such as educational games, materials and meetings, be introduced to groups of low-risk women. We also recommend that opinion leaders in the various communities be engaged to help reduce caesarean rates among groups of low-risk women.