Classification of Incident Types of Hematologic Malignancy Using Discriminant Analysis at Kinshasa University Clinics, DR Congo

The objective of this study was to identify important biomarker differences between absence of hematologic malignancy (HM) and the expected morphopathologic types of HM. This retrospective study included adult patients aged ≥ 20 years who were classified on cytologic grounds as having a normal myelogram or an HM type between 2009 and 2015. Out of 105 patients, 63 (60%) experienced incident HM, while 42, 14, 18, 10, 10, 6, and 5 patients had normal myelogram, multiple myeloma (MM), acute myeloid leukaemia (AML), myelodysplastic syndromes (MDS), chronic myeloid leukaemia (CML), chronic lymphoid leukaemia (CLL) and acute lymphoid leukaemia (ALL), respectively. In Discriminant Analysis (DA), only levels of transfusion, Hb, and WCC discriminated significantly (Wilks' lambda = 0.159; P < 0.0001) between the study groups through Function 1 [eigenvalue (EV) = 2.591; cumulative variance (CV) = 78.7%; canonical correlation (CC) = 0.849], Function 2 (EV = 0.619; CV = 97.5%; CC = 0.618), and Function 3 (EV = 0.081; CV = 100%; CC = 0.274). The highest Mahalanobis distance (Min D Squared = 0.162) was observed between CML and MDS. In support of early diagnosis, precision medicine, and good practice in hematologic oncology, DA separated CML, MDS, MM, AML, CLL, and ALL from normal myelogram in Congolese patients.


Introduction
Globally, malignant tumors are becoming a leading cause of morbidity, mortality and disability in both rich and disadvantaged countries [1][2][3][4][5]. Specifically, hematologic malignancy (HM) subtypes are emerging, whilst their epidemiologic features [5][6][7][8] and diagnosis remain challenging in low-income settings such as Asia and sub-Saharan Africa [8][9][10][11]. Lymphoid types, found in 70.96% of patients [12], define the burden of hematologic malignancy classification. However, there is no comprehensive classification of hematologic malignancy subtypes among patients managed at Kinshasa University Clinics (KUC), a tertiary academic hospital in the Democratic Republic of the Congo (DRC). Therefore, the objective of this study was to identify independent variables that are significantly associated with hematologic malignancies, using discriminant analysis (DA), among black patients diagnosed with anemia at KUC.

Materials and Methods
This was a retrospective study of adult patients aged ≥ 20 years diagnosed with anemia. Patients were investigated further using cytology in order to either confirm or rule out the presence of an underlying HM subtype. The study was conducted between 2009 and 2015 at KUC, Department of Chemical Pathology. It was undertaken because the African literature provides insufficient and inaccurate diagnostic and other information on incident HM [1,13].
The sample size was calculated as $N_i = 4 Z_\alpha^2 \, IC\,(1 - IC) / W^2$, where $Z_\alpha = 1.96$ is the standard normal deviate for a two-sided α = 0.05 (1 − 0.05 = 95% confidence level), IC is the expected proportion, and W = 0.20 is the total width of the confidence interval. The total number of patients with anemia required was therefore $N_i = 96$, rounded to 100; adding 20% for potential misses gave 120.
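As a quick check on the arithmetic, the sample size formula can be evaluated directly. This is a minimal sketch: the expected proportion of 0.5 is an assumption (the text does not state it), chosen because it reproduces the reported figure of 96, and the function name is illustrative.

```python
def sample_size(z: float, proportion: float, width: float) -> float:
    """N = 4 * z^2 * p * (1 - p) / W^2, with W the total width of the confidence interval."""
    return 4 * z**2 * proportion * (1 - proportion) / width**2


# Assumption: proportion = 0.5 is not given in the text; it is the value
# that reproduces the reported N = 96.
n_required = sample_size(z=1.96, proportion=0.5, width=0.20)

# The text rounds 96 up to 100 and then adds 20% for potential misses.
n_rounded = 100
n_final = n_rounded + n_rounded * 20 // 100

print(round(n_required), n_final)
```

With these inputs the formula gives about 96.04, consistent with the 96 patients stated, and the 20% allowance brings the target to 120.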
The variables of interest were gender, age, year of initial diagnosis, outcome (fatal or non-fatal), clinical features, blood counts, peripheral blood films and bone marrow morphology, including cytochemical staining techniques. Results from the latter tests were read and interpreted by two independent senior pathologists in the department. Where the two pathologists reported discrepant findings, a third opinion was sought.

Definitions
Anemia was defined by hemoglobin < 12 g/dL in men and hemoglobin < 11 g/dL in women. Blood variables included hematologic parameters such as hemoglobin, hematocrit, white cell count (WCC), platelets, and erythrocyte sedimentation rate (ESR) at 1 hour (1H). Hematological malignancy morphological subtypes were diagnosed according to the French-American-British (FAB) Classification (2008), supplemented by the WHO criteria for hematologic cancers (2008).

Statistical Analysis
Continuous variables were expressed as means ± standard deviation (SD) when normally distributed, while categorical variables were presented as frequency (count=n) and proportions (%).
In univariate analysis, Student's t-test was used to assess differences between 2 groups and analysis of variance (ANOVA) to compare means between ≥ 3 groups. Multiple comparisons of means were performed using the Bonferroni post hoc test for pairs of means, with a type I error rate of 0.05. The Chi-square test was used to compare percentages of categorical variables between groups.
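To make the Bonferroni step concrete, the sketch below shows how the overall type I error rate is shared across pairwise comparisons. It assumes that all pairs among the 7 study groups (normal myelogram plus 6 HM types) are compared, which the text does not state explicitly; the helper name is illustrative.

```python
from math import comb

n_groups = 7    # normal myelogram + 6 HM types, as in this study
alpha = 0.05    # overall type I error rate stated in the text

# Assumption: every pair of groups is compared.
n_pairs = comb(n_groups, 2)
alpha_per_test = alpha / n_pairs   # Bonferroni-adjusted threshold per comparison


def bonferroni_adjust(p_value: float, m: int) -> float:
    """Equivalent view: multiply the raw P-value by m and cap it at 1."""
    return min(1.0, p_value * m)


print(n_pairs, round(alpha_per_test, 5))
```

Either form gives the same decision: a raw P-value is compared with `alpha_per_test`, or the adjusted P-value is compared with 0.05.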
In multivariate analysis, DA was used as a conventional classification technique to discriminate between the categories of a single categorical variable (normal myelogram and the different subtypes of hematological cancer) using multiple attributes. DA used canonical variables that would maximally differentiate (classify) group membership among patients with anemia. The important underlying assumptions of DA were as follows: (i) each predictive variable was normally distributed; (ii) there must be homogeneity of covariance between subtypes and normal myelogram; (iii) there must be at least 2 groups, with each case belonging to only one group, so that the groups were mutually exclusive and collectively exhaustive; (iv) the groups should be defined before collecting the data; (v) the predictive variables chosen to separate the groups should discriminate clearly between them, so that overlap between categories was non-existent or minimal; and (vi) the group sizes of the dependent variable should not be grossly different and should be at least 5 times the number of independent variables.
Box's test of equality of covariance matrices was used to check the assumption of homogeneity of covariance across the categories.
Mahalanobis distances between group centroids were computed to support the classification of canonical variates into distinct subtypes and normal myelogram and to determine the degree of segregation, with a Wilks' lambda value closer to zero being evidence of well-separated groups. The Mahalanobis distance is a multi-dimensional generalization of the idea of measuring how many standard deviations a point P lies from the mean of a distribution D.
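The idea of "standard deviations away from the mean" can be illustrated with a small sketch of the squared Mahalanobis distance, D² = (x − μ)ᵀ Σ⁻¹ (x − μ), restricted to two variables so that the covariance matrix can be inverted by hand. The function name and all numbers are hypothetical and are not taken from the study data.

```python
def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance for two variables.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


# Hypothetical example: variance 4 (SD = 2) on the first variable, 1 on the second.
# A point 2 units from the mean along the first variable is exactly 1 SD away.
d_sq = mahalanobis_sq_2d(point=(2.0, 0.0), mean=(0.0, 0.0), cov=((4.0, 0.0), (0.0, 1.0)))
print(d_sq)
```

When the variables are uncorrelated, as in this example, the measure reduces to a sum of squared z-scores; the off-diagonal covariance terms adjust it when the variables are correlated.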
A P-value < 0.05 was considered statistically significant. All analyses were computed using the Statistical Package for the Social Sciences (SPSS) version 22.0.

Results
In total, 105 patients were diagnosed with anemia. Some patients had an underlying infectious syndrome (fever, leucopenia, and hyperleucocytosis), whilst HM was suspected of being the underlying cause of anemia among the rest. All patients were managed at KUC. Proportions of HM types did not vary (P > 0.05) between males and females (results not shown). The median age was 50 years for HM patients. Table 1 presents the comparison of mean values of conventional markers across HM types using ANOVA in univariate analysis. Compared with cases without HM, patients with HM were older irrespective of the type of HM. The lowest hemoglobin level was found in patients with AML, the highest number of previous blood transfusions was found in MM cases, and, as expected, the highest white cell counts (WCC) were found in patients with CML. Paradoxically, WCC was elevated in patients diagnosed with CLL but was lowest in cases with ALL and AML. The highest levels of ESR were found in patients with MM.
Furthermore, the Bonferroni post hoc test did not show significant differences between HM groups for the majority of markers. However, it did show that the means for each HM type differed significantly from the corresponding levels in the normal bone marrow group.
In multivariate DA, Tables 2-6 summarize the tests of equality of group means with Wilks' lambda, Box's test of equality of covariance matrices, the summary of canonical discriminant functions, the canonical discriminant function coefficients, and the classification function coefficients. As a precaution, the first 3 canonical discriminant functions were used in the analysis, as shown in Table 4.

Discussion
This is the first sub-Saharan study to use ANOVA and discriminant function analysis to distinguish among HM subtypes based on some conventional hematologic and therapeutic markers at KUC, DRC.

Univariate Analysis
The present study showed significant associations between HM and aging, a decrease in hemoglobin, and marked increases in blood transfusion, white cell count, platelet count, and ESR. However, sex had no impact on these incident HM types. Thus, the present study confirmed the role of longevity in HM epidemics in both developed countries [9,14] and low- and middle-income countries, including African countries such as DRC [8,15]. The concurrent presence of anemia and aging is explained by emerging cancers, as reported by previous authors [16,17]. Aging itself is a strong oxidant process [18]. There is also evidence of an insignificant relationship between aging and the pathobiology of the clonal myeloid diseases (leukemias) [19].
This study confirmed the highest proportions of AML and MM, as previously reported in Kinshasa, DRC [15] and in other parts of Africa [20].

Multivariate Analysis
The present study showed that multivariate mathematical functions obtained by combining the number of previous blood transfusions, hemoglobin levels and WCC were capable of discriminating between the different types of HM (MM, AML, MDS, CML, CLL, and ALL) and patients with normal bone marrow findings (absence of HM), with a dummy-coded dependent variable.

Explanation and Prediction
In this study, DA identified which set of conventional markers was most strongly and significantly related to these HM subtypes and provided a way of predicting HM subtype membership.
The maximum number of discriminant functions that can be derived is the smaller of two quantities: one less than the number of groups in the dependent variable (7 groups, HM subtypes or bone marrow normality, giving 6) and the number of independent variables (3: blood transfusion, hemoglobin and WCC). Three functions were therefore derived.

Function by Function
The first discriminant function (cumulative variance = 78.7%) derived from the data explained most of the between-group variance; the second discriminant function (cumulative variance = 97.5%) explained the next largest share of the variance, and the third discriminant function (cumulative variance = 100%) explained the remainder. Successive discriminant functions are not correlated with each other. Furthermore, all the markers were entered in the DA models using a stepwise strategy, starting with the most discriminating markers.

Eigenvalues
Eigenvalues and their related canonical correlations were used to judge the most discriminating diagnostic indicators (previous blood transfusion, hemoglobin levels and WCC) and hematologic markers (diagnosis and severity markers).
The eigenvalues represented the amount of variance explained by each discriminant function: eigenvalue = 2.591 for Function 1, eigenvalue = 0.619 for Function 2, and eigenvalue = 0.081 for Function 3.
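The link between the reported eigenvalues, the cumulative variance and the canonical correlations can be checked with a few lines of arithmetic. The sketch below uses the standard relations (share of variance = eigenvalue divided by the sum of eigenvalues; canonical correlation = sqrt(eigenvalue / (1 + eigenvalue))) and the three eigenvalues quoted in the text; the variable names are illustrative.

```python
from math import sqrt

eigenvalues = [2.591, 0.619, 0.081]   # Functions 1-3, as reported in the text

# Cumulative percentage of between-group variance explained.
total = sum(eigenvalues)
running = 0.0
cumulative_pct = []
for ev in eigenvalues:
    running += ev
    cumulative_pct.append(100 * running / total)

# Canonical correlation of each function with group membership.
canonical_corr = [sqrt(ev / (1 + ev)) for ev in eigenvalues]

for i, (cv, cc) in enumerate(zip(cumulative_pct, canonical_corr), start=1):
    print(f"Function {i}: cumulative variance = {cv:.1f}%, canonical correlation = {cc:.3f}")
```

These relations reproduce the reported cumulative variances (78.7%, 97.5%, 100%) and canonical correlations (0.849, 0.618, 0.274).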
As in multiple regression, each marker was weighted, and those weights produced a discriminant score for each subject, which was compared with the group centroid (the mean of the discriminant scores of a given subtype).
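The weighting and the comparison with centroids can be pictured with a deliberately simplified sketch: a single discriminant function and two groups. Every coefficient, centroid and patient value below is hypothetical and is not taken from the study tables; the actual analysis used three functions and seven groups, and SPSS classifies with all functions together.

```python
# All numbers below are hypothetical and are NOT taken from the study tables.
coefficients = {"transfusions": 0.40, "hemoglobin": -0.30, "wcc": 0.02}
constant = 1.5

# Hypothetical group centroids (mean discriminant score of each group).
centroids = {"normal myelogram": -1.0, "HM": 1.0}


def discriminant_score(patient: dict) -> float:
    """Weighted sum of the markers plus a constant."""
    return constant + sum(coefficients[name] * patient[name] for name in coefficients)


def nearest_centroid(score: float) -> str:
    """Assign the subject to the group whose centroid is closest to the score."""
    return min(centroids, key=lambda group: abs(score - centroids[group]))


# Hypothetical patient: 3 previous transfusions, Hb 7.0 g/dL, WCC 50.0.
patient = {"transfusions": 3, "hemoglobin": 7.0, "wcc": 50.0}
score = discriminant_score(patient)
print(round(score, 2), nearest_centroid(score))
```

In this toy case the score of about 1.6 lies closer to the hypothetical HM centroid, so the subject would be assigned to that group.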
Both factor analysis and DA used principal component analysis on a matrix of indices rather than on the markers themselves.
Varimax rotation was then used to increase the interpretability of the functions obtained in this study.

Wilks' Lambda
Wilks' lambda was used to measure the association between all markers (independent variables) and HM subtypes (dependent variable) in the present study. The discriminant scores were used to classify the members of each HM subtype, and the performance of the process was judged from the percentages of correct and incorrect classifications.
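For the full set of discriminant functions, Wilks' lambda can be recovered from the eigenvalues as the product of 1 / (1 + eigenvalue). The sketch below applies this standard relation to the three eigenvalues reported in this study; the variable names are illustrative.

```python
from math import prod

eigenvalues = [2.591, 0.619, 0.081]   # Functions 1-3, as reported in the text

# Overall Wilks' lambda: the share of variance in the discriminant scores
# that is NOT explained by differences between groups.
wilks_lambda = prod(1 / (1 + ev) for ev in eigenvalues)

print(f"Wilks' lambda = {wilks_lambda:.3f}")
```

The result, about 0.159, matches the value reported for the model; values close to zero indicate well-separated groups.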
DA produced raw coefficients (like b in multiple linear regression), standardized coefficients (like betas), and structure coefficients (as in canonical correlation). The raw coefficients served to calculate the discriminant scores. The standardized coefficients were used to express the relative importance of the independent markers. However, like betas, standardized coefficients are unstable, particularly when the markers are correlated, and were interpreted with caution.
Structure coefficients, or loadings, represented the correlation between the discriminant scores and the scores on each original marker. The square of a structure coefficient was the proportion of variance in that marker explained by the discriminant function.
The discriminant function estimates were very sensitive to the assumption of normality. Hosmer and Lemeshow calculated the magnitude of the overestimated association.

Implication for Chemical Pathology and Perspectives for Public Health
The present findings will have an impact on routine practice, training, capacity building, and research related to HM subtypes at KUC, DRC, a country facing complex social, economic, and political crises. Therefore, integrated and collaborative research, advocacy and prevention are needed between Congolese universities, African states and international organizations (American Society of Hematology, NCI, West African College of Physicians, African Organisation for Research and Training in Cancer) in hemato-oncology [1].
Indeed, DRC and other low- and middle-income countries experience diagnostic challenges, since their capacity in hematopathology remains limited and the scarcity of pathologists is of major concern owing to critical shortages of health professionals [1].
Further in-depth research on HM subtypes will be carried out to improve early diagnosis and treatment of HM at KUC, DRC.

Strength and Limitations
The use of discriminant analysis with a calculated sample size was the strength of the present study. However, the study had some limitations, including the lack of cytogenetic data, the lack of detailed clinical information, and poverty. The present findings cannot be generalized to other Congolese hospitals or to the general Congolese population.

Conclusion
This is the first African study of aging and emerging HM subtypes among anemic patients, with the highest proportions observed for acute myeloid leukemia and multiple myeloma. Early diagnosis, precision medicine, and good practice in hematologic oncology are warranted. DA was found to be promising in separating CML, MM, CLL, MDS, AML, and ALL from normal myelogram in Congolese patients.