Mammogram Quantitative Features Associated with Histological High-Grade Breast Cancer

High-grade breast cancer is recognized as a more aggressive cancer type and has the worst survival prognosis. This study aimed to explore the association of quantitative features extracted from mammograms with histological high-grade breast cancer. We conducted a retrospective study using open-source data obtained from the figshare repository. These anonymized data were collected and used for a study approved by the institutional review board. We used cranio-caudal (CC) and medio-lateral oblique (MLO) mammograms and their tumor segmentation images from 66 patients subdivided into two groups: high histological grade (n=23) and low grade (low and intermediate, n=41). From the breast cancer image segmentations, we extracted 480 features using the Python radiomics package Pyradiomics 2.2. The features extracted from CC and MLO images were used separately to select features relevant to histological high-grade breast cancer. We performed univariate feature selection based on the ANOVA test using the Python machine learning package sklearn. A feature was considered relevant when its p-value was less than 0.05. Finally, we drew boxplots of the distribution of the low- and high-grade subjects for each relevant feature selected. Twenty (20) CC image features were selected: seventeen (17) were based on wavelets and three (3) were from the original image. Their p-values ranged between 0.017 and 0.046. For MLO, the four (04) relevant features were exclusively based on wavelets, with p-values between 0.006 and 0.046. These results suggest that wavelet-based quantitative mammogram features could be useful for identifying high-grade breast cancer on mammographic images. In this study we explored the association between IBSI 2D quantitative features from mammograms and histological high-grade breast cancer. We recorded twenty (20) relevant features from the CC projection and four from the MLO projection. Wavelet-based features were the most represented among the relevant quantitative features.


Introduction
Breast cancer is the most common cancer in women and a leading cause of cancer death worldwide [1]. Management of breast cancer relies on the availability of robust clinical and pathological prognostic and predictive factors to guide patient decision making and the selection of treatment. Histological grade is one of the important prognostic factors. It reflects the degree of differentiation of the tumor tissue and is based on the evaluation of three morphological features: (a) degree of tubule or gland formation, (b) nuclear pleomorphism, and (c) mitotic count. It is used to categorize breast cancer patients into three clinical groups: grade I (low), grade II (intermediate) and grade III (high) [2]. High-grade breast cancer is recognized as a more aggressive cancer type and has the worst survival prognosis [3,4].
Motivated by the description of high-grade breast cancer characteristics on medical images, Lamb et al. found in their study that its classical appearance on mammography is a round mass [5]. Shin et al. also attempted to describe its morphological aspect on mammograms, because mammography is one of the primary breast imaging modalities used in breast cancer diagnosis. They found that grade I (low-grade) and grade II (intermediate-grade) tumors, which develop fairly slowly, present a stromal reaction that appears as spicules on imaging, while high-grade tumors, which evolve rapidly, do not develop a stromal reaction and have a round shape [6]. These findings suggest that high-grade breast cancer presents particular characteristics on mammograms.
With the advent of radiomics, quantitative features have been used to describe breast cancer characteristics. These features are created from computer analysis of images, either alone or with the guidance of a reader to assist in segmentation. These imaging variables must be well defined to limit inter- or intra-observer variability [7]. In this context, several features and different software packages have been proposed for use in tumor biological phenotyping [8]. Among these many proposals, the Image Biomarker Standardization Initiative (IBSI) [9] made a selection in order to contribute to research reproducibility. Some software packages, such as LIFEx and Pyradiomics, have been developed based on IBSI.
In this study, we explored the association of IBSI quantitative features extracted from mammograms with histological high-grade breast cancer. We used Pyradiomics for feature extraction and a univariate feature selection method to identify relevant features.

Patients Data
We conducted a retrospective study using open-source data downloaded from the figshare repository [10]. These anonymized data were collected and used within a study that was approved by the institutional review board. That study aimed to establish an association between digital mammography radiomic features and the breast cancer OncotypeDX and PAM50 (Prosigna Breast Cancer Prognostic Gene Signature Assay) recurrence scores. It includes a total of 71 breast cancer cases with clinicopathologic information (age; tumor size, regional lymph node status, and distant metastasis staging; estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 status), digital mammograms (cranio-caudal, CC, and medio-lateral oblique, MLO), microarray data and tumor segmentations on the mammogram images. A digital mammography system (Selenia, Hologic, Bedford, MA) with automatic intensity adjustment was used to acquire mammograms at 70 microns per pixel with 12-bit grayscale encoding. Manual segmentation of the tumors was performed by an experienced breast radiologist [11]. Five (05) patients were excluded because their histological grading status was missing. Amongst the sixty-six (66) patients of our cohort, twenty-three (23) were high grade, three (03) were intermediate grade, and six (06) had low histological grading status, with respective mean ages of 50, 50.5 and 54 years.

Radiomic Morphological Features Extraction
DICOM mammograms and tumor segmentation images were decompressed with the open-source DICOM viewer software MicroDicom 2.7.9. Tumor segmentation images were rescaled to a grayscale range between 0 and 1 with the Python package SimpleITK [12]. The tumor region segmented from each mammogram view was used to extract 480 features using the Python radiomics package Pyradiomics 2.2 [13] (Figure 1). The features extracted from CC and MLO images were used separately to select features relevant to histological high-grade breast cancer.
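The extraction step described above can be sketched as follows. This is a minimal illustration and not the authors' actual script: the file paths, the `force2D` and `label` settings, the choice of enabled image types and the helper names `extract_features` and `count_by_image_type` are all assumptions made for the example. It also assumes the mammogram and its segmentation share the same image geometry.

```python
from collections import Counter


def extract_features(image_path, mask_path):
    """Extract radiomic features from one mammogram view (hypothetical helper).

    SimpleITK and pyradiomics are imported here so that the small helper
    below stays usable even when these packages are not installed.
    """
    import SimpleITK as sitk
    from radiomics import featureextractor

    image = sitk.ReadImage(image_path)
    mask = sitk.ReadImage(mask_path)

    # Rescale the segmentation to the 0..1 range so the tumor region becomes
    # label 1, then cast to an integer type as expected for a label map.
    mask = sitk.Cast(sitk.RescaleIntensity(mask, 0, 1), sitk.sitkUInt8)

    # Assumed settings: slice-wise (2D) extraction on label 1.
    extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True, label=1)
    extractor.enableImageTypeByName("Original")
    extractor.enableImageTypeByName("Wavelet")
    extractor.enableAllFeatures()

    result = extractor.execute(image, mask)
    # Drop the diagnostic entries and keep only numeric feature values.
    return {
        name: float(value)
        for name, value in result.items()
        if not name.startswith("diagnostics_")
    }


def count_by_image_type(feature_names):
    """Count features per image type, e.g. 'original' versus 'wavelet'.

    Pyradiomics names look like 'original_firstorder_Mean' or
    'wavelet-LH_glcm_Contrast'; the text before the first '_' (and before
    any '-') identifies the image type.
    """
    return Counter(
        name.split("_", 1)[0].split("-", 1)[0] for name in feature_names
    )


# Illustrative feature names only (not taken from the study's results).
example_names = [
    "original_firstorder_Mean",
    "wavelet-LH_glcm_Contrast",
    "wavelet-HL_glcm_Contrast",
]
example_counts = count_by_image_type(example_names)
print(dict(example_counts))
```

Running `extract_features` once per view (CC and MLO) would give two separate feature tables, matching the separate handling of the two projections described in the text.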

Statistical Analysis
We performed univariate feature selection based on the ANOVA test using the Python machine learning package sklearn [14]. This method has been used by other authors for relevant feature selection [15,16]. A feature was considered relevant when its p-value was less than 0.05. Finally, we drew boxplots of the distribution of the low- and high-grade subjects for each relevant feature selected.

Results
Twenty (20) CC image features were selected: seventeen (17) were based on wavelets and three (3) were from the original image. Their p-values ranged between 0.017 and 0.046 (Figure 2). For MLO, the four (04) relevant features were exclusively based on wavelets, with p-values ranging between 0.006 and 0.046 (Figure 3). The differentiation capacity of the selected features was demonstrated with low- and high-grade boxplots for each feature (Figures 4 and 5). The means of the low- and high-grade groups were statistically different for each selected feature.
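The univariate ANOVA selection described in the Statistical Analysis section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the feature names are placeholders, and the helper `select_relevant_features` is hypothetical. It uses scikit-learn's `f_classif` and keeps the features whose p-value is below 0.05, in line with the reported p-values (0.006 to 0.046).

```python
import numpy as np
from sklearn.feature_selection import f_classif


def select_relevant_features(X, y, feature_names, alpha=0.05):
    """Return (name, F score, p-value) for features with ANOVA p-value < alpha."""
    f_scores, p_values = f_classif(X, y)
    keep = np.flatnonzero(p_values < alpha)
    return [
        (feature_names[i], float(f_scores[i]), float(p_values[i])) for i in keep
    ]


# Synthetic stand-in for the radiomic feature table (assumption: the real
# table would have one row per patient and one column per extracted feature).
rng = np.random.default_rng(0)
n_low, n_high, n_features = 41, 23, 10
y = np.array([0] * n_low + [1] * n_high)  # 0 = low grade, 1 = high grade
X = rng.normal(size=(n_low + n_high, n_features))
X[y == 1, 0] += 3.0  # make the first feature differ between the two groups

names = [f"feature_{i}" for i in range(n_features)]
relevant = select_relevant_features(X, y, names)
for name, f_score, p_value in relevant:
    print(f"{name}: F={f_score:.2f}, p={p_value:.4f}")
```

With two groups, the one-way ANOVA F-test is equivalent to a two-sample t-test with equal variances. The selected columns could then be passed to a plotting routine to draw one low-grade versus high-grade boxplot per feature.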

Discussion
In this study, we assessed the association of IBSI 2D quantitative imaging features with high-grade breast cancer. Our work represents a preliminary study of high-grade breast cancer identification using quantitative imaging features extracted from mammograms. We found that some features, especially those based on wavelets, were relevant to histological grade. These results suggest that wavelet-based quantitative mammogram features could be useful for identifying high-grade breast cancer on mammographic images. This finding agrees with the study by Huang et al., which used quantitative features extracted from PET and MRI for decoding breast cancer histological grade [17]. They extracted 104 features from the images of both modalities for each patient. Ten (10) features were selected as the most relevant for predicting breast cancer histological grade. Those relevant features came only from PET images and achieved the best predictive performance (AUC=0.76). In a recent study, Fan et al. also used the same kind of quantitative features, extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), for breast cancer histological grade prediction [18]. They selected 24 optimal features out of a total of 130 extracted image features to achieve their optimal predictive performance (AUC=0.82). Following those previous studies, our study shows that mammography, which is a morphological medical imaging modality, could be used for decoding high-grade breast cancer. This finding could be helpful in developing countries, because mammography is more accessible than MRI and PET.
There are limitations to this study. Manual segmentation does not always allow the breast cancer margin to be found accurately on mammograms, mainly in young subjects, who have denser breasts. Mammography is a planar medical imaging modality, which leads to the superposition of several glandular structures on the breast tumor. These two factors may sometimes make the real breast cancer margin inaccessible. The small size of our cohort is also a limitation.

Conclusion
In this study we explored the association between IBSI 2D quantitative features from mammograms and histological high-grade breast cancer. We recorded twenty (20) relevant features from the CC projection and four from the MLO projection. Wavelet-based features were the most represented among the relevant quantitative features.