Modelling the Sex-Specific Prevalence of Cancer Types in Mpumalanga and Eastern Cape Provincial Hospitals in South Africa

Cancer is a major public health concern in many societies, and particularly so in a developing country such as South Africa. The epidemiology of cancer has been described, although it remains under study. This research set out to understand the prevalence of different cancers and to suggest preventive measures that reduce the burden of the disease and, through timely intervention, its harm to those affected. The study was cross-sectional, and the data were collected by questionnaire. The questionnaire recorded counts of breast cancer, cervical cancer, oesophageal cancer and other cancer types. The data were analysed with descriptive and inferential methods: descriptive statistics and T-test comparisons, complemented in some cases by error bar plots and box-plots. The results were tabulated and interpreted. Some of the observed results were: Breast Cancer: Mean (201.4545), Std Dev (18.62452), 95% CI (164.21, 238.70); Kaposi Sarcoma: Mean (29.4167), Std Dev (6.76163), 95% CI (15.89, 42.94); Prostate Cancer: Mean (7.7500), Std Dev (.71217), 95% CI (-1.67, 17.17); Lung Cancer: Mean (6.9167), Std Dev (.67848), 95% CI (1.56, 12.27); Choriocarcinoma: Mean (5.3333), Std Dev (2.77434), 95% CI (-0.22, 0.88). The research establishes some important outcomes. Of greatest significance, breast cancer continued to be the most destructive cancer among women in the communities where the data were collected. The results further show that cervical cancer is also on the rise, with a high prevalence rate in these communities.


Background
Cancer has become one of the deadliest and most silent killers in the world. It kills children, women and men of all backgrounds, including cancer specialists themselves. It does not discriminate between countries, which makes it a serious concern for all of them, nor does it discriminate by gender: both males and females have breasts [1]. This calls for the attention of medical professionals. Hospitals deserve credit as custodians of medical professionalism, staffed by well-trained doctors and nurses whose principal objective is to care for the sick and for others who need medical and related professional services, including cancer care [2]. Medical professionals therefore deal with both simple and complex medical situations resulting from cancer. Most hospitals have a hierarchical structure with well-defined roles and regulations. The distribution of power and authority depends on placement in the hierarchy, and responsibilities are well defined for all members [3]. This article presents a quantitative analysis, together with existing information, of the effects of cancer as a terminal disease and of its prevalence in selected parts of South Africa. It also describes the types and severity of the cancers observed. The results derive from collected data and statistical computations. According to the IARC and WHO databases, breast cancer is the most common cancer in women [4]. This study has established similar information with regard to breast cancer [5].
Cancer, like any other disease, requires a collaborative approach to its management. Information management and decision-making at all levels of an organisation are interconnected [6]. For example, if patient care needs change at the operational level, this information is communicated to the tactical and strategic levels, which might lead to changes in resource allocation within or between units.
Within the 27 countries of the European Union (EU27), the highest European age-standardised female breast cancer mortality rate for 2008 was estimated for Ireland (31.1 deaths per 100 000 women) [7][8][9][10][11][12][13], while the lowest was for Spain (18.4 deaths per 100 000 women). Breast cancer is by far the most frequent cancer among women, with an estimated 1.38 million new cancer cases diagnosed in 2008 (23% of all cancers), and it ranks second overall (10.9% of all cancers) [14][15][16][17][18][19][20]. The incidence rates vary from 19.3 per 100 000 women in Eastern Africa to 89.7 per 100 000 women in Western Europe, and are high (greater than 80 per 100 000) in developed regions of the world (except Japan) and low (less than 40 per 100 000) in most of the developing regions [16,17]. Overall, breast cancer ranks as the fifth cause of death from cancer (458 000 deaths), but it is still the most frequent cause of cancer death in women in both developing (269 000 deaths, 12.7% of total) and developed countries. Furthermore, in developed regions, the estimated 189 000 deaths are almost equal to the estimated number of deaths from lung cancer (188 000 deaths) [21][22][23].
The data for this study were collected from South African government hospitals, including the Nelson Mandela Academic Hospital. Data were collected on all available cancer types in these hospitals by research assistants using questionnaires. The variables collected included counts of the different cancer types, counts of hormonal treatments, counts of chemotherapy treatments, the number of deaths from cancer-related causes, and the numbers of referrals to other hospitals and to social workers. These can be observed in the accompanying analysis, displayed in tables and charts.
The analysis consists of general descriptive statistics, analysis of variance, the T-test of the difference between two selected population means, and correlation analysis. Charts and tables are included for clarity, and the analyses are supplemented with additional plots. Overall, this study aimed to understand the harm caused by cancers and the future effects of uncontrolled cancer types, to find better solutions for those affected, and to identify new approaches to early detection and to prevention.

Methods
This was a cross-sectional study. Several hospitals were involved in collecting the data on different cancer types. The data were collected using a stratified random sampling design. The variables included cancer counts in women, cancer counts in men, and the use of different cancer management procedures, including chemotherapy and hormonal therapy. Other variables included counts of deaths from cancer, counts of children with cancer, and treatment by radiotherapy, in which high-energy rays are used to damage cancer cells and stop them from growing and dividing.
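As an illustration only, a stratified random draw can be sketched as follows. The study does not state its strata or sampling fraction, so the use of the hospital as the stratum, the 20% fraction, the record layout and the helper name `stratified_sample` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction of records at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Take at least one record per stratum (an assumption of this sketch).
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical patient records; the hospital labels are placeholders.
records = (
    [{"hospital": "Hospital A", "id": i} for i in range(10)]
    + [{"hospital": "Hospital B", "id": i} for i in range(20)]
)
sample = stratified_sample(records, lambda r: r["hospital"], fraction=0.2)
print(len(sample), "records drawn")
```

Sampling within each stratum separately keeps every hospital represented in proportion to its size, which a simple random draw over all records does not guarantee.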
The team used a questionnaire to collect the required data. The participants were patients, who were guided by either doctors or professional nurses. The questionnaire was designed by experienced practising doctors, professional nurses who teach at medical schools, and doctors with experience of questionnaire-based research. The items in the questionnaire were carefully considered and the results thoroughly discussed. The research team is multidisciplinary, with members specialised in different fields.
The data covered all four scales of measurement: nominal, ordinal, interval and ratio. They were collected so that comparisons could be made to detect differences, significant relationships could be tested, and other meaningful summaries could be produced to support informed decisions. Plots and tables were included to add value to the analyses.
The participation of the selected patients posed no problem, either for the study objectives or for the statistical principle of randomisation. The data were in line with the projected objectives, and randomisation was taken to be satisfied by the random arrival of patients at a hospital: when patients leave home to travel to a hospital, nobody has asked them to be sick or to attend, so no bias is involved. It is a natural occurrence. Records were therefore completed without any influence. The research team met weekly to conceptualise the research issues and to discuss the data, their capture and the analysis performed by the Statistician, which was usually accompanied by the Statistician's comments.
The calculations include simple descriptive statistics, inferential analyses, and plots suited to the data and to the research questions.
Comparative analyses of cancer management in the affected hospitals

Correlation analyses among pairs of selected cancer types
The following table presents the output of the correlation analysis for the stated variables. The table contains the Pearson correlation coefficients, the p-values and the sample sizes. Significant correlations are marked with one or two asterisks: one asterisk denotes significance at the 0.05 level, and two asterisks denote significance at the 0.01 level. The table holds only a few correlations, for some selected pairs of variables. Significance is determined by comparing the observed p-value with the chosen level of significance. The rule is that the null hypothesis is rejected if the observed p-value is smaller than the level of significance, and not rejected otherwise.

Oncology comparative data analyses
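The correlation output described above (Pearson coefficient, p-value, sample size and significance asterisks) can be reproduced in outline as follows. This is a minimal standard-library sketch, not the software used in the study: the function names are invented for illustration, the monthly counts are hypothetical, and the p-value is obtained by numerically integrating the t density, which is adequate for moderate values of t.

```python
import math
from statistics import mean


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson's rule on the t density."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def correlation_test(x, y):
    """Return (r, p, n, stars) for two equally long lists of counts."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if 1 - r * r <= 0:  # perfect correlation: the t statistic is unbounded
        p = 0.0
    else:
        p = t_two_sided_p(r * math.sqrt(df / (1 - r * r)), df)
    stars = "**" if p < 0.01 else "*" if p < 0.05 else ""
    return r, p, n, stars


# Hypothetical monthly counts for two cancer types (not the study data).
breast = [190, 205, 210, 198, 220, 185, 201, 215]
cervix = [95, 110, 108, 99, 120, 90, 100, 112]
r, p, n, stars = correlation_test(breast, cervix)
print(f"r = {r:.3f}{stars}, p = {p:.4f}, n = {n}")
```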

Two-variable periodical comparisons of cancer prevalence
This study made two-variable comparisons to determine whether the selected variables differ with regard to cancer. Comparisons were made between two adjacent months and between the two recorded genders, using the independent-samples T-test, which compares the means of two populations. Comparing the gender means within the same period, and the means of adjacent periods, shows the influence of time and of gender on the prevalence of cancer. The test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $t$ is the test statistic; $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance, with $S_1^2$ and $S_2^2$ the two sample variances; $\bar{X}_1$ and $\bar{X}_2$ are the means of the samples drawn from the two populations being compared; and $n_1$ and $n_2$ are the two sample sizes.
Here $t$ follows the T-distribution with $n_1 + n_2 - 2$ degrees of freedom.
In practice, the decision is based on either the t test statistic or the observed p-value. The former is compared with the tabulated t-value and the latter with the chosen level of significance.
For the present comparisons, the researchers compared the p-value with the level of significance (0.05). An observed mean difference is significant if the calculated p-value is smaller than the level of significance, which leads to the rejection of the null hypothesis. However, if the p-value is not smaller than the level of significance, the null hypothesis is not rejected (Table 3).
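The decision rule above can be sketched with a pooled-variance independent T-test. This is a standard-library illustration under stated assumptions, not the study's own computation: the function names are invented, the two sets of counts are hypothetical, and the p-value comes from numerical integration of the t density with $n_1 + n_2 - 2$ degrees of freedom.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson's rule on the t density."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def pooled_t_test(x, y):
    """Independent two-sample T-test assuming equal population variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


# Hypothetical counts for two adjacent months (not the study data).
month_a = [118, 125, 130, 121, 116]
month_b = [112, 119, 117, 110, 117]
alpha = 0.05
t, df, p = pooled_t_test(month_a, month_b)
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}: {decision}")
```

The pooled form is appropriate only when the two populations can be assumed to have equal variances; otherwise an unequal-variance (Welch) test would be the usual alternative.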

Charts Used to Compare Population Means Using Error Bars
The chart below compares cancer prevalence in April 2019 and May 2019. The error bars show that the April mean of 122 is higher than the May mean of 115, a difference of 7. A T-test determines whether a difference of 7 is significant. The accompanying figure presents box-plots for the same two months, showing the median values. The May 2019 median is higher than the April 2019 median.
The next chart compares cancer prevalence in June 2019 and July 2019. The error bars show that the July mean of 141 is higher than the June mean of 114, a difference of 27, whose significance remains to be assessed with the T-test. The accompanying box-plots show that the July 2019 median is higher than the June 2019 median. Comparing the whisker distances from the median shows that the July 2019 data were more dispersed than the June data.
The next chart compares cancer prevalence in August 2019 and September 2019. The error bars show that the August mean of 115 is higher than the September mean of 113, a difference of 2. It can be shown that a mean difference of 2 under the conditions of this test is not significant.
The following error bar plot compares the cancer counts for October and November 2019. The average count is 132 for October and 130 for November, a difference of just 2, which is very unlikely to be significant. The accompanying box-plots compare the quartiles for October and November. Both are right-skewed, but the October data were more spread out than the November data.
The following box-plot compares the quartiles for December 2019 and January 2020. Both box-plots are right-skewed, but the January data were far more spread out than the December data. The median was 15 counts for December and 60 counts for January. The quartiles vary from month to month: January 2020 shows greater variability overall and December 2019 less. Each month stands out in one respect: January has the higher maximum observation, whereas December has the lower variance.
The following box-plot compares the quartiles for February and March 2020. The figure has much in common with the previous one and is interpreted in the same way. The median is 30 counts for February 2020 and 60 counts for March 2020. March 2020 shows greater variability overall and February 2020 less. Again, each month stands out in one respect: March has the higher maximum observation, whereas February has the lower variance.
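The quantities read off the box-plots above (median, quartiles, whiskers, spread and direction of skew) can be computed as in the sketch below. The counts are hypothetical, the helper name `box_summary` is invented, and both the 1.5 x IQR whisker rule and the quartile-based skew check are common conventions assumed here rather than settings confirmed by the study.

```python
from statistics import quantiles


def box_summary(values):
    """Summarise the numbers a box-plot displays for one month of counts."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,  # spread of the middle half of the data
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Rough heuristic: upper half of the box longer than the lower half.
        "right_skewed": (q3 - med) > (med - q1),
    }


# Hypothetical counts for two months (not the study data).
month_1 = [10, 12, 15, 18, 20, 30, 45, 60, 90]
month_2 = [40, 45, 50, 55, 60, 62, 70, 75, 80]
for label, counts in (("month 1", month_1), ("month 2", month_2)):
    s = box_summary(counts)
    print(label, "median:", s["median"], "IQR:", s["iqr"],
          "right-skewed:", s["right_skewed"])
```

Comparing the two IQR values gives a numerical counterpart to the visual statement that one month is more dispersed than the other.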

Discussion
This study was exploratory. One of its main objectives, stated earlier, was to compare the prevalence of the different cancer types. This knowledge helps to promote a deeper understanding of the individual cancer types. Which cancer types should be prioritised for deeper study is decided by directly comparing the observed statistics from the analysis. In the table below, breast cancer is the most prevalent, with an observed mean of 201.46, a standard deviation of 18.63 and an estimated 95% confidence interval of (164.21, 238.70). The second most prevalent is cervical cancer, which averaged 101.58 with a standard deviation of 22.08 and a 95% confidence interval estimate of (57.43, 145.74). The third cancer type in order of decreasing average is Kaposi sarcoma, with a mean of 29.42, a standard deviation of 6.76 and a 95% confidence interval estimate of (15.89, 42.94). Other statistics can be read from the table below. These findings are supported by earlier work reporting that breast cancer and cervical cancer contribute most to cancer among women. It is documented, however, that oesophageal cancer is the least prevalent [5], and this research confirms that claim. The high prevalence of breast cancer, followed by cervical cancer, is strongly supported by [1], which reports that breast cancer is the most common cancer and mostly affects women.
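As an arithmetic check on the intervals quoted above, the sketch below recomputes two of them from the unrounded means and standard deviations given in the abstract. It rests on an assumption: the source does not state how the intervals were formed, but the reported bounds coincide, to two decimals, with the mean plus or minus twice the reported standard deviation. The multiplier `k = 2` and the helper name `interval` are therefore illustrative, not the authors' documented method.

```python
def interval(mean_value, spread, k=2.0):
    """Return (lower, upper) = mean -/+ k * spread; k = 2 is an assumption."""
    half_width = k * spread
    return mean_value - half_width, mean_value + half_width


# Mean and "Std Dev" values exactly as reported in the abstract.
reported = {
    "Breast cancer": (201.4545, 18.62452),
    "Kaposi sarcoma": (29.4167, 6.76163),
}

for name, (m, s) in reported.items():
    low, high = interval(m, s)
    print(f"{name}: ({low:.2f}, {high:.2f})")
```

Because the half-width is a fixed multiple of the reported dispersion value, the width of each interval reflects only how variable the counts of that cancer type were.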
The following box-plot compares the quartiles for October and November. Both box-plots are right-skewed, but the October data were more spread out than the November data. The median is 50 counts for October and 70 counts for November. The quartiles vary from month to month: November shows greater consistency, whereas October shows greater variability, so each month stands out in a different respect.

Conclusion
This research has established some important outcomes. Of greatest significance, breast cancer continued to be destructive to women in the communities where the data were collected. Cervical cancer ranked second to breast cancer. Breast cancer has affected men as well, though the data collected were not sufficient to establish sound comparative results. Different treatments have been compared. Most inferential analyses using the T-test across gender and period showed no significant difference at the 0.05 level of significance, and the error bar plots and box-plots are consistent with this. In response to emerging questions, the questionnaire has been revised to include other variables of importance.

Authors' Contributions
WC, JSN and ZJ designed the study and analysed the statistical data. BS, SS, OM, NW and LMB supervised the study. All authors have read and approved the final revised version of the manuscript.

Conflict of Interest
The authors declare no conflict of interest.