Synthesis of Mammographic Images Based on the Fractional Brownian Motion

Abstract: This paper presents a new approach for synthesizing breast tissue images based on a random fractal process, the fractional Brownian motion (fBm). The work models Regions of Interest (ROIs) of mammographic images. Diverse synthetic ROIs were generated for fatty and dense tissue: healthy ones and others with microcalcifications. Microcalcifications were injected in several arrangements in order to model benign and malignant cases. The study has two aims: (1) to generate synthetic mammographic images with which researchers and radiologists can test their tools and guide the choice of their parameters to enhance diagnostic accuracy; and (2) to compare two microcalcification segmentation approaches: the 'Sq-Sq' approach, based on multifractal analysis, and the 'MM' approach, based on Mathematical Morphology. The results were evaluated qualitatively by an expert and quantitatively using the Area Overlap Measure (AOM) and the Dice coefficient. They showed that the 'Sq-Sq' method can detect microcalcifications in different arrangements for any type of tissue. The 'Sq-Sq' approach yielded a mean of 0.8±0.06 for AOM and 0.8446 for the Dice coefficient over all segmented images.


Introduction
Breast cancer is among the leading causes of death for women [1]. The first signs of breast cancer are microcalcifications, which are small calcium deposits in the breast tissue. Mammography is the most widely used technique for their detection [2,3]. Clinical studies have confirmed that the survival rate is considerably increased if anomalies are detected at early stages [4]. Detecting microcalcifications is a hard task in medical imaging due to several factors, such as their irregular form and their small size, which varies from 0.1 to 1 mm [4,5]. They are approximately nodular, but with irregularly shaped arrangements. Furthermore, their local contrast is often low and varies according to the breast tissue type. Nevertheless, these clusters must be detected to establish a correct diagnosis.
There are several types of breast tissue [6], but from the perspective of x-ray attenuation the breast can be modeled as consisting solely of two types of tissue, fatty and dense, which differ in their breast density [6]. They are also characterized by several physical properties and various distributions of the grayscale level. Such diversity produces different degrees of complexity when detecting microcalcifications in the mammogram, especially in the case of dense tissue. For these reasons, microcalcification detection is not easy even for trained radiologists, and microcalcifications may go undetected. Therefore, the clinical interpretation of mammograms remains rather subjective and diagnosis is often debatable [7]. This has encouraged us to build synthetic ROIs of mammograms.
The variety of background tissue structure, the irregular shapes and the various arrangements of microcalcifications led to the use of a random fractal process: the fractional Brownian motion (fBm). This process has three main properties that closely resemble the natural texture of mammographic images: fractal dimension, scale invariance and self-similarity [8,9], which make it appropriate for modeling this type of image [8]. The fBm model proposed by Mandelbrot et al. [10] is used to describe natural fractal phenomena. Moreover, it has been applied [11-14] to generate synthetic images and compare the accuracy of several fractal methods.
In this work, the fBm was investigated to synthesize ROIs of mammographic images. Four groups of synthetic images were generated: a healthy group and a group with microcalcifications for both types of tissue (fatty and dense). These images can help radiologists test their tools and guide the choice of their parameters in order to enhance the diagnostic accuracy of breast cancer at an early stage. They also help researchers rate the accuracy of their approaches in detecting anomalies. Besides, these mammogram models can be used to compare microcalcification segmentation approaches.
In this paper, the synthetic images were used to compare two microcalcification segmentation algorithms: the new 'Sq-Sq' approach [28] and the reference 'MM' approach [29]. The 'Sq-Sq' method uses a novel multifractal spectrum measure based on the q-structure function [33], which is a well-known tool for analyzing an object's irregularity. The 'MM' approach [29] is based on Mathematical Morphology and the Otsu algorithm [34]. These approaches were evaluated using the Area Overlap Measure (AOM) [35] and the Dice coefficient [36]. This paper is organized as follows: in section 2, the fBm characteristics are explained and the generation of synthetic images of mammograms is detailed. An overview of the segmentation methods applied to the synthetic images of mammograms is presented in section 3. The results are shown and discussed in sections 4 and 5. Finally, a conclusion highlights our contribution and presents our perspectives.

Breast Tissue Modeling
Other works have also generated synthetic images of mammograms, such as [37] and more recently [38]; both are based on the fractional Brownian motion and reported good results. Developing synthetic images of breast tissue with microcalcifications remains complex as well as interesting because of the variety of background tissues and the irregular shapes of microcalcifications. As previously mentioned, the fBm can be adequate to simulate ROIs of mammograms [8]. In this section, some basic notions about the fBm that justify this choice are presented. In this work, the fBm process was used to create different classes of tissue: a class of healthy subjects and a class of patients (i.e., with microcalcifications) for both types of tissue (fatty and dense).

Fractional Brownian Motion (fBm)
fBm is a generalization of classical Brownian motion. It is a stochastic fractal process with long-range dependence and self-similar behavior [10]. The Hurst coefficient H (0<H<1) is the single parameter of interest of the fBm process. It describes the roughness of the resulting motion: the higher H is, the smoother the motion.
The spectral representation of the fBm B(t) is given, up to a normalizing constant, by [39]:
$$B(t) = \int_{\mathbb{R}} \frac{e^{it\xi} - 1}{|\xi|^{H + 1/2}} \, dW(\xi)$$
where W is a complex-valued Brownian measure. B is a Gaussian, continuous, centered and non-stationary second-order process which starts at zero (B(0) = 0) and has the following covariance function:
$$E[B(t)\,B(s)] = \frac{\sigma^2}{2} \left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right)$$
where σ² = E[B(1)²].
To model the background tissue of mammograms, which is a self-similar object [8], Stein's method [40] was used. It is a fast and exact approach for simulating fractional Brownian surfaces [40].
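As a rough illustration of how H controls the roughness of a simulated surface, the following minimal sketch generates an approximate fractional Brownian surface by Fourier filtering of white noise. It is not the exact Stein method [40] used in this work: it is a simpler spectral approximation whose output is periodic. The function name `fbm_surface`, its parameters and the amplitude law f^-(H+1) are assumptions made for this example.

```python
import numpy as np


def fbm_surface(size, hurst, seed=None):
    """Approximate fBm surface of shape (size, size), normalized to [0, 1].

    Spectral synthesis (an assumption of this sketch, not Stein's method):
    white noise is filtered so that its amplitude spectrum decays as
    f**-(hurst + 1), i.e. a power spectrum proportional to f**-(2*hurst + 2).
    """
    rng = np.random.default_rng(seed)
    freq = np.fft.fftfreq(size)
    radius = np.sqrt(freq[None, :] ** 2 + freq[:, None] ** 2)
    radius[0, 0] = 1.0                      # placeholder to avoid dividing by zero
    amplitude = radius ** (-(hurst + 1.0))
    amplitude[0, 0] = 0.0                   # drop the mean (zero-frequency) term
    noise = rng.standard_normal((size, size))
    surface = np.fft.ifft2(np.fft.fft2(noise) * amplitude).real
    surface -= surface.min()
    surface /= surface.max()
    return surface


rough = fbm_surface(128, 0.3, seed=0)    # H in the range reported for fatty tissue
smooth = fbm_surface(128, 0.65, seed=0)  # H in the range reported for dense tissue
```

With the same seed, the surface generated with the lower H shows visibly stronger pixel-to-pixel variation than the one generated with the higher H.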

Healthy Synthetic ROIs
Kestener et al. [41] showed that normal regions in digitized mammograms are characterized by a Hurst coefficient H=0.3±0.1 in fatty tissue, whereas dense regions have H=0.65±0.1.
Based on the study [41], synthetic mammographic ROIs were generated using the values of H cited above. The background tissues (dense and fatty) were generated with the Stein method [40]. They were simulated with the corresponding Hurst coefficient H and the grayscale variation of ROIs selected from Mini-MIAS [42]. The Hurst coefficient H of each selected ROI was estimated with the quadratic variation method [43].
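To make the idea of estimating H from increments concrete, here is a minimal sketch of a simple increment-variance estimator. It is not the quadratic variation method of [43], whose details are not given here. It relies only on the fBm scaling property that the mean squared increment at lag k grows as k^(2H), so the ratio of the lag-2 to the lag-1 mean squared increment equals 2^(2H). The name `estimate_hurst` is hypothetical.

```python
import numpy as np


def estimate_hurst(values, axis=-1):
    """Estimate H from the ratio of lag-2 to lag-1 mean squared increments."""
    values = np.asarray(values, dtype=float)
    n = values.shape[axis]
    lag1 = np.diff(values, axis=axis)
    lag2 = (np.take(values, np.arange(2, n), axis=axis)
            - np.take(values, np.arange(0, n - 2), axis=axis))
    v1 = np.mean(lag1 ** 2)
    v2 = np.mean(lag2 ** 2)
    # v2 / v1 = 2**(2H) for an fBm, hence H = 0.5 * log2(v2 / v1)
    return 0.5 * np.log2(v2 / v1)


# Sanity check on ordinary Brownian motion, for which H = 0.5 in theory.
rng = np.random.default_rng(0)
brownian = np.cumsum(rng.standard_normal(200_000))
print(estimate_hurst(brownian))  # should be close to 0.5
```

For a two-dimensional ROI, the same function can be applied along rows (`axis=1`) or columns (`axis=0`); averaging the two estimates is one possible choice.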
The synthetic ROIs are sized (128×128) pixels with a resolution of 0.2 mm. The gray level 'NG' of each pixel is normalized between 0 and 1. Table 1 shows some examples of healthy real ROIs from Mini-MIAS and the corresponding models.
As shown in Table 1, these synthetic images (third column) closely model real mammographic backgrounds (second column). H is the Hurst coefficient and NG is the interval of the normalized gray level. For the rest of this work, the variability of the gray level of the background tissue was controlled according to ROIs selected randomly from healthy mammograms of the Mini-MIAS dataset [42], which allows discrimination between fatty and dense tissue. After analyzing 100 healthy ROIs for each type of tissue, it was concluded that the mean grayscale level of fatty tissue was in the interval [115,200] and that dense tissue was characterized by a grayscale level usually in the interval [160,222].
Although scanned and digital mammograms are generally acquired in 12-bit and usually stored as 16-bit images, no unfavorable effects were observed when they were reduced to 8-bit. To allow faster computational analysis [19,27,44,45] and to reduce storage demands [46], the synthetic ROIs were generated as 8-bit grayscale images.
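The conversion of a normalized ROI to an 8-bit image within a tissue-specific gray interval can be sketched as follows. This is only an illustration: it treats the intervals reported above as the target range of the whole ROI, which is an assumption of this sketch, and the names `to_gray8`, `FATTY_RANGE` and `DENSE_RANGE` are hypothetical.

```python
import numpy as np

# Gray-level intervals reported in the text, used here as target ranges
# (an assumption of this sketch).
FATTY_RANGE = (115, 200)
DENSE_RANGE = (160, 222)


def to_gray8(roi, gray_range):
    """Linearly map a ROI onto [low, high] and store it as 8-bit."""
    low, high = gray_range
    roi = np.asarray(roi, dtype=float)
    span = roi.max() - roi.min()
    if span > 0:
        unit = (roi - roi.min()) / span
    else:
        unit = np.zeros_like(roi)           # constant ROI: map to the low bound
    return np.rint(low + unit * (high - low)).astype(np.uint8)


normalized_roi = np.linspace(0.0, 1.0, 128 * 128).reshape(128, 128)
dense_roi = to_gray8(normalized_roi, DENSE_RANGE)
```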

Synthetic ROIs with Microcalcifications
Microcalcifications have different sizes and irregular arrangements [47], which help differentiate between benignity and malignancy. If they are round, oval, or slightly lobular, the anomaly is probably benign. If the microcalcifications are arranged in an irregular or tubular shape, they are suggestive of malignancy. Microcalcifications are small bright details that are barely visible within the background tissue.
According to radiologists [47], microcalcifications usually have circular forms. For this reason, microcalcifications are modeled as small circles with a radius of 1 pixel, obtained from the same background tissue but slightly brighter. The difference in brightness is especially small in the case of dense tissue, so anomalies are harder to see in dense cases. Each healthy synthetic ROI is injected with such circles to obtain synthetic ROIs with microcalcifications. Synthetic microcalcifications are arranged in several ways to obtain malignant and benign cases. Malignant cases are generally characterized by an almost linear arrangement of microcalcifications [41], whereas benign cases have an approximately circular arrangement [47]. Figure 1 presents a model of a whole part of a mammogram sized (512×512) pixels, which can contain anomalies. Clearly, it is difficult to detect microcalcifications with the naked eye. To simplify the study, ROIs sized (128×128) pixels were modeled. Table 2 shows two real ROIs and their corresponding synthetic images, which are generated with the same parameters as the real ROIs (H coefficient and grayscale variability). As shown, the synthetic images look similar to the real ones. In addition, they have the same mathematical characteristics.
According to radiologists, dense tissue with the brightest grayscale is the hardest case. Since a random process, the fBm, was used, an unlimited number of images can be generated. Table 3 shows other synthetic images of benign and malignant cases for fatty and dense tissue. Synthetic microcalcifications are small circles (radius equal to 1 pixel) selected from the same tissue as the background but with slightly brighter pixel intensity.
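The injection step described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used to build the dataset. The brightness increment `gain`, the spacing of the clusters and all function names are hypothetical, and the ROI is assumed to be normalized to [0, 1].

```python
import numpy as np


def disk_offsets(radius=1):
    """Pixel offsets (dy, dx) of a discrete disk of the given radius."""
    r = int(np.ceil(radius))
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    keep = ys ** 2 + xs ** 2 <= radius ** 2
    return ys[keep], xs[keep]


def inject_microcalcifications(roi, centers, radius=1, gain=0.08):
    """Brighten small disks of a normalized ROI at the given (row, col) centers."""
    out = np.asarray(roi, dtype=float).copy()
    dy, dx = disk_offsets(radius)
    for cy, cx in centers:
        rows = np.clip(cy + dy, 0, out.shape[0] - 1)
        cols = np.clip(cx + dx, 0, out.shape[1] - 1)
        out[rows, cols] = np.minimum(out[rows, cols] + gain, 1.0)
    return out


def linear_cluster(start, step, count):
    """Almost linear arrangement, used here for a malignant-like case."""
    return [(start[0] + k * step[0], start[1] + k * step[1]) for k in range(count)]


def circular_cluster(center, ring_radius, count):
    """Approximately circular arrangement, used here for a benign-like case."""
    angles = 2.0 * np.pi * np.arange(count) / count
    return [(int(round(center[0] + ring_radius * np.sin(a))),
             int(round(center[1] + ring_radius * np.cos(a)))) for a in angles]


background = np.full((128, 128), 0.5)     # stand-in for a synthetic fBm ROI
malignant = inject_microcalcifications(background, linear_cluster((40, 40), (6, 6), 5))
benign = inject_microcalcifications(background, circular_cluster((64, 64), 10, 6))
```

Because the injected pixels are derived from the background values themselves, the local texture is preserved and only the contrast changes; a smaller `gain` would mimic the harder dense-tissue cases.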

Microcalcification Segmentation Methods Applied on the Synthetic Images
In the present work, the generated synthetic images were used to evaluate and compare the 'Sq-Sq' approach [28], based on multifractal analysis, and a reference approach based on morphological operators, the 'MM' approach [29]. This helps guide the choice of their initial parameters in order to improve their results in future works.

Approach based on Multifractal Analysis
In a previous work [28], a segmentation approach based on the combination of multifractal analysis with the k-means algorithm followed by morphological operators, denoted 'Sq-Sq', was proposed. This method was applied to real mammograms from the reference database Mini-MIAS [42]. This segmentation approach consists mainly of four steps. After the construction of the 'α_image' and the 'ƒ(α)_image', based respectively on multifractal and fractal analysis, the k-means algorithm followed by morphological operators (closing and opening) is applied to the 'ƒ(α)_image' to segment the anomalies. The first two steps enhance the visualization of microcalcifications and facilitate their extraction. Figure 2 shows the flowchart of the 'Sq-Sq' approach.

Approach based on Mathematical Morphology 'MM'
The microcalcification segmentation approach 'MM' [29] is based on morphological operators and Otsu's method. The authors of [29] applied a pre-processing step with top-hat operators, which enhance the contrast and reduce the background noise. In [29], the microcalcifications were then selected automatically with Otsu's method, which finds the most adequate grayscale threshold to segment the image.
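The two stages described above (top-hat enhancement followed by Otsu thresholding) can be sketched as follows. This is not the implementation of [29]: the square structuring element, its size, the histogram resolution and the function names are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(image, size, reducer):
    """Apply min or max over a (size x size) square window; size must be odd."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))


def white_tophat(image, size=5):
    """Image minus its grayscale opening: keeps bright details smaller than the window."""
    eroded = _window_reduce(image, size, np.min)
    opened = _window_reduce(eroded, size, np.max)
    return image - opened


def otsu_threshold(values, bins=256):
    """Threshold maximizing the between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight0 = np.cumsum(hist)
    weight1 = weight0[-1] - weight0
    valid = (weight0 > 0) & (weight1 > 0)
    if not valid.any():
        return values.max()                 # constant image: nothing to separate
    cum_mean = np.cumsum(hist * centers)
    mean0 = cum_mean[valid] / weight0[valid]
    mean1 = (cum_mean[-1] - cum_mean[valid]) / weight1[valid]
    between = weight0[valid] * weight1[valid] * (mean0 - mean1) ** 2
    # Return the upper edge of the last bin assigned to the background class.
    return edges[1:][valid][np.argmax(between)]


def segment_mm(image, size=5):
    """Boolean mask of bright details: top-hat enhancement, then Otsu threshold."""
    enhanced = white_tophat(np.asarray(image, dtype=float), size)
    return enhanced > otsu_threshold(enhanced)
```

The window must be larger than the details to extract, so that the opening removes them; with microcalcifications of radius 1 pixel (3 pixels across), a 5×5 window is one plausible choice.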

Results
The database of generated synthetic images contains 300 images: 100 images showing healthy tissue (dense and fatty) and 200 images with microcalcifications (benign and malignant). To the best of the authors' knowledge, an unlimited number of models can be generated since a stochastic process was used.
The segmentation methods [28,29] were applied to the simulated images in order to compare them. Table 4 shows the segmentation results for some synthetic images of the two types of tissue (fatty and dense). For each approach, Table 4 shows the original image and the same image with contour lines superimposed around the segmented microcalcifications.
According to radiologists' experience, the 'Sq-Sq' approach [28] provides good segmentation results, whereas the 'MM' method [29] succeeded in segmenting microcalcifications in only two dense cases. Consequently, the 'MM' method is able to detect very small microcalcifications only in smooth backgrounds, i.e., with a high Hurst coefficient and a high gray level. These incomplete segmentation results may also be due to the size of the images. Since Duarte et al. [19,29] used small ROIs of 41×41 pixels in all their works, other synthetic images were generated (see Table 5) to further check the behavior of the segmentation approaches 'Sq-Sq' [28] and 'MM' [29]. As the exact size 41×41 pixels cannot be obtained with the Stein method [40] (the image side can only be 2^(n-1), with n an integer), these synthetic images are sized 32×32 pixels (n=6).
Note that the approach of Duarte et al. [19] was not applied to the simulated images because it is based on the Geodesic Active Contours (GAC) segmentation method, which needs a seed point. Furthermore, this method is sensitive to the choice of the seed point: if the seed is selected in different places on the image, the results will differ, so it must be placed carefully near the centroid of the lesion to segment. Moreover, the 'MM' method [29] was reported to be much more robust than GAC for segmenting microcalcifications [19]. Table 5 shows examples of models sized (32×32) pixels. Table 6 presents some examples of dense synthetic images with a smooth background, to which the 'Sq-Sq' [28] and 'MM' [29] approaches were applied.
According to Table 5 and Table 6, the 'MM' approach [29] can segment microcalcifications only in a smooth background, whatever the image size, whereas the 'Sq-Sq' approach [28] can extract anomalies whatever the type of tissue. Good results are obtained for models sized (128×128) pixels, but the results are incomplete for smaller dense models sized (32×32) pixels. This can be explained by the Sq equation (i.e., equation (1) in [28]), which uses the grayscale of the neighboring pixels: the larger the image, the better the results.

Table 6. Segmentation of dense models (benign and malignant cases). Panels: original models, 'Sq-Sq' segmentation and 'MM' segmentation, each for benign and malignant cases.

Statistical evaluation
Besides the qualitative evaluation, a quantitative evaluation of the two approaches [28,29] based on the Area Overlap Measure (AOM) [35] and the Dice similarity coefficient [36] was conducted. The 200 synthetic ROIs with injected microcalcifications were manually delineated (i.e., segmented) with the GIMP 2.8 software. The sizes as well as the locations of the microcalcifications were known in advance. These delineated images were considered as the ground truth for calculating the AOM and the Dice coefficient.

Area Overlap Measure (AOM)
The AOM is expressed as:
$$\mathrm{AOM} = \frac{|G_{Th} \cap IM_{Seg}|}{|G_{Th} \cup IM_{Seg}|}$$
In this equation, G_Th denotes the manually delineated microcalcification and IM_Seg represents the segmentation obtained using the evaluated method. The symbol ∩ denotes the intersection, i.e., the pixels common to G_Th and IM_Seg, the symbol ∪ represents the union of the G_Th and IM_Seg areas, and |·| is the number of pixels. So, if there is no overlap between the delineated microcalcification and the one obtained from the evaluated method, AOM = 0.

Dice Similarity Coefficient
The Dice similarity coefficient measures the similarity between the two sets G_Th and IM_Seg. It is calculated as in the following equation:
$$\mathrm{Dice} = \frac{2\,|G_{Th} \cap IM_{Seg}|}{|G_{Th}| + |IM_{Seg}|}$$
Table 7 shows the statistical evaluation of the 'Sq-Sq' approach [28] and the 'MM' approach [29] based on the AOM and the Dice coefficient.
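Both overlap measures are straightforward to compute from a pair of binary masks. The following minimal sketch shows one way to do it; the function names and the convention of returning 1.0 when both masks are empty are assumptions of this example.

```python
import numpy as np


def area_overlap_measure(ground_truth, segmentation):
    """AOM: pixels in the intersection divided by pixels in the union."""
    gt = np.asarray(ground_truth, dtype=bool)
    seg = np.asarray(segmentation, dtype=bool)
    union = np.logical_or(gt, seg).sum()
    if union == 0:
        return 1.0                          # assumed convention for two empty masks
    return np.logical_and(gt, seg).sum() / union


def dice_coefficient(ground_truth, segmentation):
    """Dice: twice the intersection divided by the sum of the two areas."""
    gt = np.asarray(ground_truth, dtype=bool)
    seg = np.asarray(segmentation, dtype=bool)
    total = gt.sum() + seg.sum()
    if total == 0:
        return 1.0                          # assumed convention for two empty masks
    return 2.0 * np.logical_and(gt, seg).sum() / total


# Toy example: two 2x2 squares that share one column of 2 pixels.
g_th = np.zeros((4, 4), dtype=bool)
g_th[1:3, 1:3] = True
im_seg = np.zeros((4, 4), dtype=bool)
im_seg[1:3, 2:4] = True
aom = area_overlap_measure(g_th, im_seg)    # 2 / 6
dice = dice_coefficient(g_th, im_seg)       # 4 / 8
```

For a single pair of masks the two measures are tied by Dice = 2·AOM / (1 + AOM), so the Dice coefficient is never smaller than the AOM computed on the same pair.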

Discussion
The present work deals with the modeling of mammographic ROIs based on the fBm process. Synthetic ROIs with microcalcifications and healthy ones were generated. Two types of tissue (fatty and dense) and two types of severity (benign and malignant) were considered. The synthetic ROIs were modeled with the Stein method [40]. The Hurst coefficients were chosen according to [41]: H=0.3±0.1 for fatty tissue and H=0.65±0.1 for dense tissue. The discrimination between tissue types was controlled according to the grayscale deduced from the study of real cases, namely mammographic images of the Mini-MIAS database [42]. In order to obtain synthetic ROIs similar to the real cases, a large number of models were generated randomly with different values of the Hurst coefficient H, diverse grayscale variations and several arrangements of microcalcifications.
Two published microcalcification segmentation approaches, the 'Sq-Sq' [28] and the 'MM' [29] methods, were applied to the developed synthetic images in order to compare them. The evaluation was carried out qualitatively, according to the opinion of a skilled radiologist, as well as quantitatively, based on two evaluation criteria: the AOM and the Dice coefficient. The results showed that the 'Sq-Sq' method was able to detect anomalies on synthetic ROIs sized (128×128) pixels independently of the tissue type. However, for models sized (32×32) pixels, the 'Sq-Sq' method succeeded in extracting microcalcifications only from fatty models. This limit can be explained by the fact that the 'Sq-Sq' approach is based on calculating the difference between the grayscale values of neighboring pixels.
The 'MM' method [29] yielded incomplete results for the segmentation of models sized (128×128) pixels, especially for fatty tissue (small value of H: rough surface). The authors of [29] usually used ROIs sized (41×41) pixels, which prompted us to construct another set of models sized (32×32) pixels in order to check whether the failures were related to the image size or to the type of tissue. These results also showed incomplete segmentation of microcalcifications in rough surfaces (fatty tissue). The 'MM' approach [29] gave good segmentation results in a smooth background (dense tissue) independently of the size. The 'Sq-Sq' approach yielded pertinent findings for large fatty and dense models. However, for models sized (32×32) pixels, the 'Sq-Sq' approach succeeded only in fatty models (low value of H). Because it relies on grayscale differences between neighboring pixels, its segmentation failed in small dense models sized (32×32) pixels.
The 'Sq-Sq' approach yielded an average of 0.8±0.06 for the AOM and a Dice coefficient of 0.8446. Its performance was good regardless of the type of tissue and the type of severity, but it depended on the size of the images. The 'MM' approach gave satisfactory results for images with a smooth background regardless of the size. It achieved an average of 0.7±0.05 for the AOM and a Dice coefficient of 0.5423. However, the 'MM' approach could give pertinent segmentation, even for very small microcalcifications, only for models having a smooth background, i.e., a high Hurst coefficient and a high gray level.

Conclusion and perspectives
In this work, a novel approach for generating synthetic images based on the fBm process was proposed. The objective was twofold: (1) to offer reference images with which researchers and radiologists can test, respectively, the accuracy of their algorithms and the precision of their diagnostic tools; and (2) to compare two microcalcification segmentation approaches, 'Sq-Sq' [28] and 'MM' [29], which helps guide the choice of their parameters so as to obtain better results. Segmented images were evaluated both qualitatively and quantitatively. Satisfactory segmentation was achieved by applying the 'Sq-Sq' method to the synthetic images: the approach gave good results for both fatty and dense tissue. According to experts, the generated synthetic ROIs of mammographic images can help radiologists test the precision of their diagnosis.
It can be concluded that the synthetic ROIs can be used with several microcalcification segmentation approaches to test their parameters and check whether they have any gaps.
As perspectives, we suggest generating a larger number of synthetic images and further diversifying our examples, mainly by varying the arrangement of the injected microcalcifications. This would allow us to study other synthetic images of mammograms, especially for benign and malignant cases. We also propose modeling a whole mammogram with the various types of tissue.