Prevalence of Mutations of the MED12 and CYP17A1 Genes in Mammary Fibroadenomas in Senegalese Women
Gueye Rokhaya et al.

Fibroadenoma is the most common benign breast tumor in women under 30 years of age. This study aimed to contribute to the knowledge of the genetic factors involved in the occurrence and progression of mammary fibroadenomas. MED12 and CYP17A1 were sequenced in fibroadenoma tissue and blood from 43 Senegalese women. The Alamut-Visual software, which integrates the pathogenicity prediction tools SIFT, Polyphen2 and MutationTaster, was used to search for mutations. DnaSP version 5.10.01, MEGA version 7.0.14 and Arlequin version 3.5.1.3 were used to determine phylogenetic parameters, including indices of genetic variability and diversity and genetic differentiation parameters. A deletion in the poly-A tail of MED12 was identified in our study population. An alteration of Methionine (M1) was observed in exon 1 of CYP17A1. Our results also show that most of the variants found in exon 2 of MED12 and exon 1 of CYP17A1 are predicted by the three pathogenicity prediction tools to be capable of contributing to the appearance of breast fibroadenomas. We found 23 new variants in the MED12 gene and 109 new variants in the CYP17A1 gene. The amino acid frequency distribution between blood and fibroadenomas shows a statistically significant difference in Glycine, Arginine and Valine for MED12 and in Cysteine, Phenylalanine, Histidine, Asparagine, Arginine, Tryptophan and Tyrosine for CYP17A1. In addition, the selection test shows that codon 20 of exon 1 of CYP17A1, which codes for Arginine (p.20Arg), is under positive selection in mammary fibroadenomas. Genetic differentiation parameters show a clear difference between blood and breast fibroadenomas. These results show for the first time the involvement of the CYP17A1 gene in breast fibroadenomas and confirm the involvement of MED12. Codon 20 of exon 1 of CYP17A1, being under positive selection, could be used as a biomarker in breast fibroadenomas.


Introduction
The breast can be the site of different types of benign or malignant tumors, including fibroadenoma, which is the most common benign breast tumor in women under 30 years of age. It represents 68% of all breast masses and 44% to 94% of breast biopsies in adolescents [1]. A recent pathological review indexed fibroadenomas as the most common lesion, followed by cystosarcoma phyllodes and fibrocystic disease [2]. In the United States, more than 1.5 million breast biopsies are evaluated each year [3], most of which are not malignant but show a number of pathological lesions that constitute benign breast disease. A study in Africa, specifically in Nigeria, showed that fibroadenoma accounted for 55.6% of all benign breast disorders, which is much higher than in western populations [4]. In Senegal, a study of benign tumors showed that fibroadenoma is the most common benign tumor [5]. Although these tumors are benign, their pathology differs markedly between adolescents and adults, between pregnant or breastfeeding women and menopausal women, and between Black and white women. Thus, age, physiological status, reproductive factors and body mass index (BMI) appear to influence the prevalence of fibroadenomas [2,6,7].
Despite efforts to study the etiological factors involved in the occurrence of mammary fibroadenoma, understanding of the underlying mechanisms is far from complete. Nevertheless, genetic factors appear to be the most likely determinants. Since estrogens and their metabolites are both inducers and promoters of breast tumor growth [8], genes encoding enzymes involved in estrogen metabolism have been hypothesized to be involved in breast tumor pathogenesis. In this study, we contribute to the knowledge of the genetic factors involved in the occurrence and progression of mammary fibroadenomas by investigating mutations of the MED12 and CYP17A1 genes, assessing the variability of these genes and determining the degree of involvement of these two genes in the occurrence and/or progression of breast fibroadenomas.

Sample Collection
This study involved Senegalese women with breast fibroadenoma. The patients were recruited from the surgical department of the Joliot Curie Institute at Aristide Le Dantec Hospital. Each patient scheduled for surgery was interviewed. A duly completed and signed informed consent form was required for admission to the study, after which a structured questionnaire was administered. After approval, each patient provided a blood sample, forming the control group named B, and a breast fibroadenoma tissue sample, forming the group named BF. Blood samples were placed in EDTA tubes and stored at -4°C. Tumor tissue samples were stored in 96% alcohol. The samples were sent to the BIOPASS (Biology of Sahelo-Sudanian Animal Populations) laboratory of the IRD (Research Institute for Development) for the various stages of genetic analysis. Ethical approval for this study was obtained after review in accordance with the rules laid down by the National Ethics Committee for Health Research (NECHR) of Senegal and with the procedures established by the University Cheikh Anta DIOP of Dakar (UCAD) for any research involving human participants.

DNA Extraction and Sequencing of MED12 and CYP17A1 Genes
Total DNA was extracted from the patients' tissue and blood samples using the DNeasy Blood and Tissue Kit (Qiagen). Exon 2 of MED12 and its flanking regions were amplified using the forward 5'-GCCCTTTCACCTTGTTTTCCTT-3' and reverse 5'-TGTCCCTATAAGTCTTCCCAACC-3' primers. Exon 1 and the 5' UTR region of CYP17A1 were amplified using the forward 5'-CCACAAGGCAAGAGAGAGAGATAACA-3' and reverse 5'-AGGGTAAGCAGCAGCAAGAGAGAGC-3' primers. Electrophoretic migration on a 1.5% agarose gel was performed to confirm amplification. Sequencing was performed on 30 µl of the PCR product, with the forward primer for the MED12 gene and the reverse primer for the CYP17A1 gene. Sequencing reactions were run in an MJ Research PTC-225 Peltier thermal cycler with ABI PRISM BigDye Terminator Cycle kits. Fluorescent fragments were purified with the BigDye XTerminator purification protocol. The samples were suspended in distilled water and subjected to electrophoresis in an ABI 3730xl sequencer (Applied Biosystems).

Molecular Analysis
To search for mutations, chromatograms obtained with Mutation Surveyor version 5.0.1 (DNA Variant Analysis Software) were submitted to Alamut-Visual version 2.12.0 (Interactive Biosoftware) [9]. In this study, the software provided the location of each variant, its predicted protein change, its accession number if already listed in the dbSNP variant database [10], its clinical significance through ClinVar [11] and a prediction of its pathogenicity through SIFT [12], Polyphen2 [13] and MutationTaster [14]. Any variant not listed in the variant database was considered new.
The sequences obtained were cleaned and corrected using BioEdit software version 7.1.9 [15]. They were then aligned using the ClustalW algorithm [16] to highlight similarities and thus show the position of insertions, deletions or substitutions. Phylogenetic analyses were performed, including the determination of indices of variability and genetic diversity and of genetic differentiation parameters. Genetic variability parameters, including the number of sites (N), the number of invariable and variable sites, the number of haplotypes (h), the average number of nucleotide differences (k), the number of mutations (Eta), the nature of mutations (% transitions; % transversions), the estimated Transition/Transversion bias (R), the amino acid frequencies and the codon selection test, were obtained with DnaSP version 5.10.01 [17] and MEGA version 7.0.14 [18]. The Transition/Transversion bias (R) was estimated under the Kimura 2-parameter model.
For the frequency distribution of amino acids, RStudio version 1.0.153 [19] was used. The Shapiro-Wilk normality test was performed to check whether the data follow a normal distribution. In the case of a normal distribution, Student's t-test was used to compare means; otherwise the Wilcoxon test was used. A significance threshold of 5% was applied.
The codon selection test, performed with MEGA 7.0.14, was applied to MED12 exon 2 and CYP17A1 exon 1. For each codon, the numbers of synonymous (dS) and nonsynonymous (dN) substitutions were estimated. These estimates were produced using the joint Maximum Likelihood (ML) reconstructions of ancestral states under a Muse-Gaut model of codon substitution and a Tamura-Nei model of nucleotide substitution. The test statistic dN - dS is used to detect codons under positive selection. A positive value of the test statistic indicates an overabundance of nonsynonymous substitutions. In this case, the probability of rejecting the null hypothesis of neutral evolution (P-value) is calculated. P-values below 0.05 are considered significant at the 5% level.
To determine the degree of genetic differentiation between controls and fibroadenomas, the Nei genetic distance obtained with MEGA version 7.0.14 and the genetic differentiation factor (Fst) obtained with Arlequin version 3.5.1.3 [20] were used. Here too, P-values below 0.05 are considered significant at the 5% level.
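To make the idea of genetic differentiation between two groups of sequences concrete, the following minimal Python sketch computes an Fst-like index from pairwise nucleotide differences. It uses the simple estimator 1 - Hw/Hb (mean within-group differences over mean between-group differences), which is an assumption chosen for illustration and is not the AMOVA-based procedure implemented in Arlequin. The function names and the toy sequences are hypothetical and do not come from the study data.

```python
from itertools import combinations, product
from statistics import mean


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within(group: list[str]) -> float:
    """Mean pairwise difference inside one group (needs at least 2 sequences)."""
    return mean(hamming(a, b) for a, b in combinations(group, 2))


def simple_fst(group_a: list[str], group_b: list[str]) -> float:
    """Fst-like index: 1 - (mean within-group diff / mean between-group diff)."""
    h_within = (mean_within(group_a) + mean_within(group_b)) / 2
    h_between = mean(hamming(a, b) for a, b in product(group_a, group_b))
    if h_between == 0:
        return 0.0  # identical groups: no differentiation
    return 1 - h_within / h_between


# Toy aligned sequences (hypothetical, for illustration only).
blood = ["ACGTACGT", "ACGTACGT", "ACGTACGC"]
tumor = ["ACGAACGA", "ACCAACGA", "ACGAATGA"]
fst = simple_fst(blood, tumor)
print(f"Fst-like index: {fst:.3f}")
```

Values near 0 indicate that sequences differ as much within groups as between them, while values approaching 1 indicate that most of the variation separates the two groups. Assessing significance would additionally require a permutation test, which this sketch omits.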

Variants of MED12
A total of 26 variants were found in the MED12 gene: 7 (26.92%) in intron 1, 17 (65.38%) in exon 2 and 2 (7.69%) in intron 2. Only 3 of these variants are listed in the variant database (dbSNP): c.131G>A and c.131G>T, both located at codon 44 of exon 2, and c.204+25G>T, located in intron 2. The remaining 23 variants are considered new. All exon 2 variants are predicted to influence disease by the three pathogenicity prediction programs (Table 1).
Legend of Table 1: * mutation already listed in the variant database (dbSNP). Frequency of a mutation (%) = (number of sequences in which the mutation is found / total number of sequences (37)) x 100.
(Table 2)
Legend of Table 2: * mutation already listed in the variant database (dbSNP). Frequency of a mutation (%) = (number of sequences in which the mutation is found / total number of sequences (26)) x 100.
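The percentage calculation given in the table legends can be sketched in a few lines of Python. The helper name `percentage` is hypothetical. The variant counts per region (7, 17 and 2 out of 26) are taken from the text above, whereas the carrier count used in the last example is an assumed value chosen only to illustrate the formula with the stated total of 37 MED12 sequences.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    if not 0 <= part <= whole:
        raise ValueError("part must lie between 0 and whole")
    return round(100.0 * part / whole, 2)


# Share of the 26 MED12 variants found in each region (counts from the text).
region_counts = {"intron 1": 7, "exon 2": 17, "intron 2": 2}
total_variants = sum(region_counts.values())
region_share = {
    region: percentage(count, total_variants)
    for region, count in region_counts.items()
}
print(region_share)

# Frequency of one mutation across sequences, as defined in the table legends.
# The carrier count of 8 is an assumption used for illustration.
example_frequency = percentage(8, 37)
print(example_frequency)
```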

Indices of Variability and Genetic Diversity
The values of the variability and genetic diversity parameters of MED12 and CYP17A1 in fibroadenomas are shown in Table 3. These parameters indicate that there is slight variability in the controls, especially for CYP17A1. The nature of the mutations shows that transversions are far more frequent than transitions. The estimated Transition/Transversion bias (R) confirms the predominance of transversions. The polymorphism analysis revealed a high value of haplotypic diversity (Hd) and a low value of nucleotide diversity (Pi). The average number of nucleotide differences (k) is very high for CYP17A1. The amino acid frequencies of the MED12 and CYP17A1 genes in controls and fibroadenomas are shown in Table 4. For MED12, Glycine, Arginine and Valine are the only amino acids with a statistically significant difference. Isoleucine is absent in controls and present in tumor tissue. For CYP17A1, Cysteine, Phenylalanine, Histidine, Asparagine, Arginine, Tryptophan and Tyrosine are the only amino acids with a statistically significant difference.
Legend of Table 4: P-value < 0.05 indicates a statistically significant difference (* intensity of significance).
Analysis of the intragroup genetic distance shows that there is almost no difference between controls for the MED12 gene. For the CYP17A1 gene, there is a slight difference between the controls. Tumor tissues, on the other hand, show genetic differentiation. In addition, a statistically significant differentiation is noted between controls and tissues (Table 6).
Legend of Table 6: * significant P-value < 0.05.

Discussion
The overall objective of this study is to contribute to research on the prevalence of mutations of the MED12 and CYP17A1 genes in mammary fibroadenomas in Senegalese women. These two genes were chosen because of their involvement in estrogen synthesis. We therefore searched for mutations and determined the degree of involvement of these two genes in the occurrence of fibroadenomas. The variability, genetic diversity and genetic differentiation of these genes were also studied.
The sequencing of MED12, which codes for a protein involved in transcription regulation, showed 26 variants, including 25 (96.15%) substitutions and 1 (3.85%) deletion. This deletion is located in the poly-A tail of intron 1: c.100-28del. This result is consistent with that of Kénémé et al. [21], who showed for the first time a deletion located in the poly-A tail of intron 1. This mutation could influence the alternative splicing mechanism of the intronic regions, resulting in an aberrant MED12 protein. Further studies will therefore be needed to determine the exact function of this region in the synthesis of the MED12 protein. Our results also show that the variants found in MED12 exon 2 are predicted to be involved in the pathogenicity of breast fibroadenoma. The variants found in intron 1 are the most frequent, being found in several patients: c.100-33A>G (18.92%), c.100-28del (21.62%) and c.100-12T>G (21.62%).
Variants c.131G>A (21.62%) and c.131G>T (13.51%) are already listed in the variant database. They have been found in uterine fibroids [22,23]. This supports the assumption that fibroadenomas and uterine fibroids may share a common genetic etiology. Moreover, mutations of codon 44, which is the most conserved codon of exon 2, are found in both pathologies, showing the important role they play in their occurrence. A study by Bourbon [24], involving different species, showed that codon 44 is the most conserved codon of the MED12 gene, which suggests that this codon plays an important role in the normal functioning of the protein. Thus, the missense mutations observed at codon 44 in particular may render the translated protein non-functional or result in a gain of function in tumor tissue, indicating the specific importance of this amino acid for MED12 function. Moreover, according to the work of Turunen et al. [25], the cyclin C binding domain resides in the N-terminal region encoded by exons 1 and 2 of the MED12 gene, and codon 44 would play a role in this binding.
The sequencing of CYP17A1, which is involved in estrogen metabolism, showed 131 variants, all of them substitutions. Seven variants have a recorded clinical significance, including 2 located in the 5' UTR region (c.-34T>A and c.-34T>C) that are classified as benign in ClinVar. The c.-34T>C mutation, located in the 5' promoter region 34 bp upstream of the translation initiation site and 27 bp downstream of the transcription initiation site, has been found in several pathologies. This mutation creates a new CCACC box site and therefore an additional promoter. Carey et al. [26] first identified this T27C SNP in the 5' UTR of CYP17α and hypothesized that the C allele could upregulate gene expression, thereby increasing serum hormone levels, including androstenedione and estradiol (E2). This polymorphism is common: the CC genotype is present in 11-19% of white North American women and 6-16% of African Americans [27]. Several studies have hypothesized that the C allele of CYP17α may be a marker for increased steroidogenesis [28,29]. Sun et al. [30] concluded that rs743572 (-34T>C) may increase the risk of breast cancer in postmenopausal women. Kaur et al. [31] showed that the -34T>C polymorphism is associated with polycystic ovary syndrome in North India. This syndrome affects women of childbearing age and causes menstrual disorders, infertility and overproduction of androgens by the ovaries.
According to the three pathogenicity prediction programs, almost all exon 1 mutations are predicted to have an impact on the appearance of fibroadenoma, except for variants that do not change the amino acid. The variant c.51G>A (26.92%), which replaces Tryptophan with a premature stop codon at codon 17 (p.W17X), is listed in ClinVar and is considered pathogenic. The same variant was found in the study by Suzuki et al. [32] and was reported to induce the genetic disorder of cytochrome P450c17 resulting in 17-alpha-hydroxylase/17,20-lyase deficiency. The variant c.81C>A (19.23%) is also pathogenic and was found in the study of Müssig et al. [33], where it caused 17-alpha-hydroxylase/17,20-lyase deficiency. The variants c.195G>A (n=25; 96.15%) and c.195G>T (n=1; 3.85%) are recognized as benign in ClinVar and have been identified in congenital adrenal hyperplasia.
The mutations c.-58G>A, c.8A>C, c.24G>A, c.68C>T, c.100C>A, c.187A>G, c.188T>C, c.191A>C and c.195G>A are also found in over 90% of our population, indicating an altered function of the CYP17A1 gene in mammary fibroadenomas. Indeed, CYP17A1 possesses both the 17-alpha-hydroxylase activity necessary for the production of glucocorticoids and the 17,20-lyase activity, a key activity of steroidogenesis that produces progestins, androgens and estrogens. Mutations in this gene are associated with isolated steroid 17-alpha-hydroxylase deficiency and with 17-alpha-hydroxylase/17,20-lyase deficiency. This deficiency in enzyme activity is thought to inhibit the conversion of progesterone to 17-hydroxyprogesterone and thus to lead to an accumulation of progesterone in tumor tissues. Progesterone has been shown to play an important role in the developmental physiology of breast fibroadenomas.
Analysis of the variability of MED12 and CYP17A1 in mammary fibroadenomas shows that transversions are more frequent than transitions. In other words, the majority of mutations are expected to induce a conformational change in the 3D structure of the mutated protein. This is in accordance with the general characteristics of nuclear DNA mutations, which tend to be qualitative mutations. Indeed, nuclear DNA is in a protected environment, surrounded by the nuclear envelope and subject to repair mechanisms. The high number of transversions therefore points to a role of the MED12 and CYP17A1 mutations in the occurrence of fibroadenomas.
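The distinction between transitions and transversions can be illustrated with a short Python sketch. A transition exchanges two purines (A, G) or two pyrimidines (C, T), whereas a transversion exchanges a purine for a pyrimidine or the reverse. The function names are hypothetical, and the five substitutions used as input are only a handful of the MED12 variants named in the text, so the resulting counts do not reproduce the percentages reported for the full data set.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-nucleotide substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def parse_substitution(variant: str) -> tuple[str, str]:
    """Extract (ref, alt) from a coding-DNA substitution such as 'c.131G>A'."""
    left, right = variant.split(">")
    return left[-1], right[0]


# A few MED12 substitutions quoted in the text (not the complete variant list).
variants = ["c.131G>A", "c.131G>T", "c.100-33A>G", "c.100-12T>G", "c.204+25G>T"]
counts = Counter(classify_substitution(*parse_substitution(v)) for v in variants)
print(dict(counts))
```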
The polymorphism of MED12 and CYP17A1 in breast fibroadenoma in Senegalese women is characterized by high haplotypic diversity (Hd = 0.988 ± 0.016 and Hd = 0.996 ± 0.013, respectively) and low nucleotide diversity (Pi = 0.0199 ± 0.0023 and Pi = 0.0469 ± 0.0034, respectively). This suggests a rapid evolution of the tumor pathology in patients, due to an accumulation of genetic mutations that differ from one patient to another. This could be explained by the heterogeneity of breast fibroadenoma. There are different types of fibroadenoma, ranging from simple to complex, the latter containing cysts, sclerosing adenosis, fibrosis and epithelial calcifications. Some may therefore evolve more rapidly than others, resulting in differences in size and histology. It would be interesting to include the type of fibroadenoma in our parameters in order to understand its involvement in the evolution of breast fibroadenoma.
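The combination of high haplotypic diversity and low nucleotide diversity can be made concrete with a minimal Python sketch of the two standard estimators: Hd = n/(n-1) x (1 - sum of squared haplotype frequencies) and Pi = average number of pairwise differences per site. This is a simplified illustration, not the DnaSP implementation (it ignores gaps and missing data), and the toy alignment is hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs: list[str]) -> tuple[float, float]:
    """Return (k, pi): mean pairwise differences and the same value per site."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal aligned length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    k = total_diffs / len(pairs)
    return k, k / length


# Toy alignment (hypothetical): three haplotypes that differ at very few sites.
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC", "ACGTACGTAC"]
hd = haplotype_diversity(alignment)
k, pi = nucleotide_diversity(alignment)
print(f"Hd = {hd:.3f}, k = {k:.2f}, Pi = {pi:.3f}")
```

In this toy case most sequences are distinct haplotypes, so Hd is high, yet each pair differs at only one or two sites, so Pi stays low. This is the same qualitative pattern as the one described above.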
The frequency distribution of amino acids for MED12 shows a statistically significant difference in Glycine, Arginine and Valine between controls and tissues. For the CYP17A1 gene, Cysteine, Phenylalanine, Histidine, Asparagine, Arginine, Tryptophan and Tyrosine show a statistically significant difference between controls and benign tumors. Arginine is a conditionally essential amino acid, which means that the body can usually synthesize sufficient amounts of Arginine to meet basal metabolic demands, but in some cases it must be supplied through the diet. Many studies have shown that Arginine is necessary for cell growth and can become limiting in rapidly growing states such as malignancy. Given its vital role in cell growth, proliferation and immune responses, Arginine has been investigated as a potential target for cancer treatments [34]. Glycine is a nonessential amino acid because it can be synthesized by the body. RNA, DNA, creatine, serine and heme are generated by several pathways using Glycine. It acts as a neurotransmitter in the central nervous system and plays many roles as an antioxidant, anti-inflammatory, cryoprotective and immunomodulatory agent in peripheral and nervous tissues [35]. It has recently been shown that the absorption of Glycine and its catabolism are capable of promoting tumorigenesis and malignancy, suggesting that Glycine metabolism could in principle be a target for therapeutic intervention [36]. Valine is an essential amino acid and therefore must be provided by the diet. It plays a basic role in brain neurotransmitter function. Cysteine is a nonessential amino acid; its use is important for maintaining homeostasis and cancer cell survival [37]. Phenylalanine is an essential amino acid that is involved in the nervous system and in neurotransmitters. It has been shown that any increase in Tryptophan and Phenylalanine levels in cancerous tissue may be correlated with an increased risk of breast cancer [38].
Histidine is an essential amino acid involved in protein construction as well as in several metabolic functions of the body. Asparagine is a nonessential amino acid. It is involved in the stability and biosynthesis of proteins and enzymes. Krall et al. [39] showed that Asparagine is an important regulator of amino acid homeostasis, anabolic metabolism and proliferation in cancer cells. Tryptophan plays an important role in the proliferation of T cells, which are the main actors in immune rejection reactions that can lead to the elimination of tumor tissue. The absence or decrease of Tryptophan in tumor tissues may be a risk factor for tumor progression. Tyrosine, which is a nonessential amino acid, is present only at low levels in fibroadenoma. A particular observation concerns Methionine, which is encoded by the codon that initiates protein translation. It is absent from exon 2 of MED12, and an alteration of it was observed in exon 1 (codon 1) of CYP17A1 during the search for mutations. This reflects the importance of these amino acids in mammary fibroadenoma. A much more in-depth study of the role of these amino acids in mammary fibroadenoma would therefore be necessary.
The codon selection test showed that codon 20 of the CYP17A1 gene, which codes for Arginine, is under positive selection (p=0.0076), reflecting an excess of nonsynonymous over synonymous mutations. In other words, all the mutations affecting codon 20 cause an amino acid change and can thus lead to an aberrant function of the CYP17A1 protein. At codon 20, Arginine is mutated into Glycine (variant c.58A>G, found in 61.54% of our study population), Lysine (c.59G>A, 38.46%) and Isoleucine (c.59G>T, 11.54%). In addition, the enzyme CYP17A1 has 17,20-lyase activity, which allows it to convert dehydroepiandrosterone (DHEA) into estrogen, and alterations in this region could confer a gain of function on the enzyme and thus an overproduction of sex steroid hormones.
Although the direct role of estrogen in the incidence of mammary fibroadenoma has not been specified and none of the steroid hormone receptors have been found to be expressed in breast fibroadenoma [40,41], estrogen dependence has been suggested for their growth. In fact, the estrogen receptor (ER-β) is the only hormone receptor expressed by the breast fibroadenoma stroma at both the protein and mRNA levels [42]. Cases of breast fibroadenoma in young patients with highly ER-β-positive stromal cells indicate that a hormone-receptor mechanism is involved in growth regulation [40]. In addition, epidemiological studies show that 90% of mammary pathologies could be caused by environmental pollution and that the main development factor is cumulative exposure to endogenous and exogenous sources of estrogens [43], as well as to ligands of the aryl hydrocarbon receptor (AhR) such as dioxins and polychlorinated biphenyls (PCBs), which mimic estrogenic activities [44]. Compared to control cells, breast cells exposed to estrogens over the long term had higher levels of AhR in vitro [45]. Recent studies by Bidgoli et al. [41] have shown that higher AhR levels in young premenopausal women with fibroadenoma deregulate the expression of other tumor proliferation genes and increase the risk of tumor growth.
The study of differentiation parameters shows a differentiation between controls and breast fibroadenomas. This differentiation could be explained by the fact that breast fibroadenomas are sensitive to hormones. When they occur in teenagers or pregnant patients, they can grow remarkably large, due to the rapid increase in hormonal stimulation.

Conclusion
Advances in tumor pathology research technologies have enabled researchers to detect several single nucleotide polymorphisms (SNPs) that influence the development of certain tumors. This led to the idea of studying the genes involved in estrogen metabolism in malignant and benign tumors. Following this logic, our study had the general objective of contributing to the knowledge of the genetic factors involved in the occurrence and progression of breast fibroadenomas. Our results show that, in addition to the SNP -34T>C of CYP17A1 and the mutations of codon 44 of MED12, 23 new variants of the MED12 gene and 109 new variants of the CYP17A1 gene were found. Our results also open up prospects for further studies on mammary fibroadenomas in order to better understand their evolution and prevalence. Mutations of the Arginine at codon 20 appear to influence the pathogenicity of this disease. To further investigate the role of CYP17A1 in mammary fibroadenomas, codon 20 would be a good target for establishing biomarkers. It would also be interesting to study this codon for therapeutic perspectives. Further elucidation of the involvement of Arginine in mammary fibroadenoma tumorigenesis would be of interest. Further studies are clearly necessary to determine the factors influencing the development of mammary fibroadenomas. To this end, broadening the study population and including clinico-pathological parameters would be a good way to better understand their evolutionary process, with a view to finding a treatment other than surgical excision.