Inter-species Transferability of Microsatellite Markers Derived from Wild Relatives to Cultivated Species of Finger Millet

Access to public sequence information has paved the way for the development of new genomic resources and their cross-transferability among closely related genera. In the present study, nucleotide and EST sequences derived from nine different species of Eleusine were used to identify microsatellite markers and to test their transferability in E. coracana. The frequency and distribution of di-, tri-, tetra- and penta-nucleotide repeat motifs were compared across species. The nucleotide/EST sequences, classified by function, were mostly involved in abiotic stress, followed by carbohydrate biosynthesis, in all the species. Of the 2133 primers designed, tri-nucleotide repeats were the most abundant (1043), followed by compound repeats (963). The highest number of primer pairs (1660) was identified in E. coracana subsp. coracana, nearly 50% of which contained compound repeats, mostly comprising di-nucleotides. The frequency of microsatellite repeats and the number of primers designed per sequence were highest in E. kigeziensis (138%), followed by E. floccifolia (126%), and lowest in E. coracana subsp. africana (53%). While the transferability of microsatellites derived from other Eleusine species to the cultivated coracana subspecies ranged from 50 to 100%, the primers derived from the cultivated species were more informative than those derived from wild relatives.


Introduction
When genomic resources for a species are limited, information from closely related species and wild relatives plays a pivotal role in developing and applying genomic resources. Comparative mapping is a powerful tool for integrating genetic data among related taxa [1]. Studies have shown that certain chromosome segments are conserved among related taxa, allowing the development of common markers. The transferability of such common markers from one species to another has been observed in many species of Brassicaceae, Fabaceae, Solanaceae [1], and Poaceae [2]. Further, genetic maps developed in one species can be compared with those of closely related species by means of orthologous markers, enabling map-based prediction of candidate genes and QTLs controlling economically important traits among species of the same genus. Recent developments in genome sequencing using high-throughput next-generation sequencing (NGS) platforms have led to the discovery of thousands of single nucleotide polymorphism (SNP) markers in many crop species. Despite advances in DNA markers such as SNPs, microsatellite markers, also called simple sequence repeats (SSRs), are still the markers of choice for various molecular genetic studies, including genome mapping, DNA fingerprinting, genetic diversity, population structure and phylogeny studies. This preference is due to their high reproducibility, extensive genome coverage, neutrality, hyper-variability that detects multiple alleles per locus, and ease of use with simple PCR techniques. With conventional SSR markers becoming less popular due to cost and resource constraints, computationally derived expressed sequence tag (EST)-based SSRs offer a cheaper and better platform.
Finger millet is an allotetraploid species with two genomes ('A' and 'B'); the most probable donors of the 'A' genome are E. indica and E. tristachya, while the donor of the 'B' genome is not yet clear [3]. The crop is nutritionally rich, with high calcium and iron content along with essential amino acids such as methionine and tryptophan that are absent in regular starch-based diets. The polyphenol and high fibre content of the grain confers anti-cancer and anti-diabetic activity [4]. India is the largest producer of finger millet, with a collection of 22,583 genotypes, which is more than 65% of the world's collection [5]. From the perspective of genomic research, finger millet is still an orphan crop, with much of the genetic diversity in its wild relatives untapped. Although large and diverse germplasm collections are available in both public and private organizations, no concerted efforts have been made to assess the diversity of this species at the molecular level and to integrate these results with morphological characterization in breeding programmes for higher yield and disease resistance [6]. The past decade has witnessed continuous efforts towards developing both conventional and EST-SSR markers in finger millet [5,7,8,9]. Studies on comparative genomics have indicated high genomic colinearity between rice and finger millet [10], and studies on the cross-transferability of SSR markers to E. coracana showed 57% reproducible cross-species/genus amplicons from major cereal crops [11] and 73-95% cross-amplification of pearl millet SSRs in finger millet [12].
The recent whole-genome sequencing study in finger millet highlights the abundance of repetitive content, with the discovery of 114,083 SSRs distributed across the genome; validation of these markers in wild and cultivated accessions revealed high genetic diversity in wild species. However, that study further stresses the need for in-depth studies on wild relatives of E. coracana [13] to understand its probable progenitors. One study on genetic variation within Eleusine species using genomic SSRs indicated higher intra-specific polymorphism in the cultivated africana (32%) and coracana (17%) subspecies than in the wild species E. intermedia, E. indica, E. multiflora, and E. floccifolia [14]. Nevertheless, there are no studies on the inter-species transferability of EST-SSRs originating from wild relatives to cultivated finger millet of Indian origin. Therefore, the present work examines the occurrence of SSR repeats in wild and cultivated species of finger millet and the transferability of SSR markers from wild species to cultivated E. coracana subsp. coracana.

Microsatellite Repeat Mining and Primer Designing
A total of 2333 sequences covering nine different species of Eleusine were obtained from both the EST and nucleotide domains of the National Center for Biotechnology Information (NCBI) (Table 1). Repeat mining was carried out using the microsatellite identification tool (MISA) for di-, tri-, tetra- and penta-nucleotide repeats, with minimum repeat numbers of 5, 3, 3 and 3 for the respective motifs [15]. Primers were designed in SSR-containing sequences using the Primer3 tool with the default standards of 50% GC content, a melting temperature of 58 °C, a primer length of 20 bases and a PCR product size ranging from 150 to 300 bases [16].
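The repeat thresholds described above can be illustrated with a minimal Python sketch. This is not MISA itself: it is a regular-expression approximation that finds only perfect repeats, and the function name `find_ssrs` and the `MIN_REPEATS` table are hypothetical names introduced here for illustration. Only the minimum repeat numbers (5, 3, 3 and 3) are taken from the text.

```python
import re

# Minimum repeat counts per motif length, as stated in the text:
# di = 5, tri = 3, tetra = 3, penta = 3.
MIN_REPEATS = {2: 5, 3: 3, 4: 3, 5: 3}


def find_ssrs(sequence):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs.

    Coordinates are 0-based and `end` is exclusive.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures one unit; \1{m,} demands at least m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AAA" or "AGAG"), so each locus is reported once.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    toy = "TTT" + "AG" * 6 + "CCGT" + "AAG" * 4 + "T"
    for start, end, motif, repeats in find_ssrs(toy):
        print(start, end, motif, repeats)
```

On the toy sequence, the sketch reports one (AG)6 and one (AAG)4 locus. A run of only four AG units is not reported because it falls below the di-nucleotide minimum of five.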

DNA Isolation and PCR Reactions
Genomic DNA was isolated from seven commercial varieties of E. coracana subsp. coracana of Indian origin, viz., Indaf-5, Indaf-7, MR-1, MR-6, KMR-204, KMR-301 and ML-365, using the CTAB method [17]. DNA was quantified against standard DNA markers on a 0.8% agarose gel. Polymerase chain reaction (PCR) was performed with template DNA (20 ng), primers (5 pM), 10X PCR buffer, dNTPs (1 µM), MgCl2 (1.5 mM) and AmpliTaq Gold DNA polymerase (0.3 U) in a reaction volume of 20 µL. The PCR profile, with step-down annealing conditions (TA), consisted of initial denaturation of template DNA at 94 °C for 5 min, followed by 35 cycles of 94 °C for 60 sec, 57-58 °C for 30 sec and 72 °C for 60 sec, and a final extension at 72 °C for 8 min. The amplicons were resolved on 2% high-resolution agarose alongside size standards, and allelic data were extracted based on amplicon size.

Sequence Analysis and Classification
A total of 2333 sequences reported in nine different Eleusine species were retrieved from both the EST and nucleotide domains of the NCBI database. Of these, the largest number (1663) came from subspecies coracana, followed by subspecies africana (412) of cultivated E. coracana. Among wild species, E. indica had the highest number of sequences (112), followed by the other wild types. The sequences from all the species were categorized based on their origin, irrespective of their functional categories in gene ontology. In general, the ESTs from subspecies coracana were functionally related to abiotic stress, including drought/salt stress tolerance, followed by developing-seed cDNAs. This is expected, as finger millet is a drought-tolerant crop with numerous reports on abiotic stress tolerance [18]. In subspecies africana and the other Eleusine species, by contrast, the majority of the ESTs were related to carbohydrate biosynthesis, including both photosynthesis and starch biosynthesis, followed by internal transcribed spacer (ITS) regions of ribosomal RNAs (Figure 1).

Frequency and Distribution of Tandem Repeats
Among all repeat types, trinucleotide repeats were the most abundant (1043), followed by di-, tetra- and penta-nucleotide repeats in all the species. The abundance of trinucleotide repeats relative to other repeats has been reported previously in finger millet [19], although the recent genome-wide study reported 58.5% di- and 35.5% tri-nucleotide repeats in the finger millet genome [13]. However, di-nucleotide repeats were virtually absent in all the wild species, except in a few compound formats where they occurred in close association with other repeats (Figure 2). Among the repeat motifs, AG/CT and GA/TC were the most frequent, with GT (37) and CT (17) showing the highest repeat lengths in the coracana and africana types, respectively. Among the other repeat types, AGG and AAG (tri-), AAAG (tetra-) and CGTCA (penta-nucleotide) were the most frequent in the rest of the species. These results are in accordance with previous reports in finger millet and other major crops, in which mining for repeats in genomic regions yielded 39 to 66% AG/CT repeats [9,13]. The comparatively low number of di-nucleotide SSRs is attributed to these repeats often lying in close vicinity to other repeats, especially penta-nucleotides, and thus forming compound SSRs. About 18 to 32% of the SSRs identified were functionally related to abiotic stress tolerance in the coracana type and 27% of SSRs were involved in carbohydrate synthesis in the africana type, but in other species the majority of SSRs were found in genic regions responsible for carbohydrate synthesis and lipid metabolism.
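Motif frequencies such as AG/CT and GA/TC are usually tallied per motif class, since the same repeat reads differently depending on the starting base and the strand. The sketch below shows one common way to do this, by reducing every motif to the alphabetically smallest of its rotations and reverse-complement rotations. The source does not say how motifs were grouped, so this convention is an assumption, and the function names `canonical_motif` and `motif_class_counts` are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Map a motif to one representative of its class.

    The class contains every rotation of the motif and every rotation of
    its reverse complement; the alphabetically smallest member is returned.
    """
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def motif_class_counts(motifs):
    """Count motifs after collapsing them into canonical classes."""
    return Counter(canonical_motif(m) for m in motifs)


if __name__ == "__main__":
    # AG, CT, GA and TC all fall into the same class under this convention.
    print(motif_class_counts(["AG", "CT", "GA", "TC", "AAG", "CTT"]))
```

Under this convention AG, GA, CT and TC are counted together, as are AAG and CTT, so frequencies are not split across equivalent spellings of one repeat.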

Microsatellite Primers and Transferability in E. coracana
From a total of 2333 sequences derived from nine species, 2133 primer pairs were successfully designed, nearly 50% of which targeted tri-nucleotide repeats owing to their abundance. Among the nine species, the most SSRs were found in subspecies coracana (1660 pairs), of which compound repeat-based SSRs were the most numerous (829 pairs), followed by tri-nucleotide repeats (718 pairs) (Table 1). Compound SSRs were frequent because the distance between two or more repeats was often less than 100 bases. Most of the di-nucleotide repeats occurred within either simple or complex compound repeats, which accounts for the high number of compound SSRs. The average success rate of primer design was 91% across all species. Among the nine species, E. kigeziensis and E. floccifolia had the highest repeat densities, 1.38 and 1.26 SSRs per sequence respectively, whereas E. coracana subsp. africana had the lowest (0.53), followed by E. jaegeri (0.74). In general, SSRs were found at a higher rate in wild species (97%) than in cultivated types (76%), indicating a greater prevalence of repetitive sequences in wild relatives than in domesticated species. The details of the thirty-one primers that were successfully verified in E. coracana coracana are given in Table 2.
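The 100-base rule for compound SSRs can be sketched as a grouping step over the repeats found on one sequence. This is an illustration under stated assumptions rather than the authors' pipeline: the function name `group_compound`, the tuple layout and the strict "less than 100" comparison are choices made here, with only the 100-base figure taken from the text.

```python
MAX_GAP = 100  # bases; from the "less than 100 bases" statement in the text


def group_compound(ssrs, max_gap=MAX_GAP):
    """Group SSRs on one sequence into loci.

    `ssrs` is an iterable of (start, end, motif) tuples with 0-based starts
    and exclusive ends. Neighbouring SSRs separated by fewer than `max_gap`
    bases are placed in the same group. A group with more than one member
    is treated as a compound SSR, a group with one member as a simple SSR.
    """
    groups = []
    for ssr in sorted(ssrs):
        # Gap is measured from the end of the previous SSR in the group.
        if groups and ssr[0] - groups[-1][-1][1] < max_gap:
            groups[-1].append(ssr)
        else:
            groups.append([ssr])
    return groups


if __name__ == "__main__":
    toy = [(10, 22, "AG"), (60, 72, "AAG"), (400, 412, "AAAG")]
    for group in group_compound(toy):
        kind = "compound" if len(group) > 1 else "simple"
        print(kind, group)
```

In the toy input the first two repeats lie 38 bases apart and form one compound locus, while the third is 328 bases further on and stays a simple SSR.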
Table 2
Origin species  Primer ID  Forward seq. (5'-3')  Reverse seq. (5'-3')  TA (°C)*  Size (bp)  Transferability E. coracana coracana
                ECML001    CTGGATCATGCACGAGTACC  CTTCCTCCTCCTTTGCAGTG  58        180        +
                ECML004    TGCATGCAAGCTTTCCCTAT  ACACCACCAACCCACACA    58        260        +
                ECML005    CTAGTAGCAGAATCACGCCC  TGTCATCATCACTCGCATGG  58        223        +
                ECML006    CCTCCCTAGCAGAATCAGGT  TCAGCCTTATACACCGTTGC  58        164        +
                ECML007    TTGCTTCTGATGCATCTCCC  CCCTGACACAGAAACAGGAC  58        257        +
                ECML010    GGATCCAAACACCCCATCAA  TTGTCGTGGTAAGCTTTGCT  58        173        +
                ECML011    CTTTGGTGCTGGTGTAGAGG  TTCATCTCAGCAGCAGCAAA  58        274        +
                ECML015    GAGGATGACTCCCATTGCTG  CTCTCGCCCACCATAGTAGA  58        209        +
                ECML021    GCGTAAGTCTTGCGGAGTAT  CCCTCACCCTCCTATGGTAG  58        277        +
                ECML024    AGGTGACCTCAAGACCAAGA  TGGATCCACAGTTGAAGCAC  58

Perhaps the most important feature of gene-derived SSRs is that they are transferable across distantly related species, as demonstrated in previous studies [20]. To test transferability in E. coracana coracana, a set of 32 primers derived from both cultivated and wild species was tested in seven commercial cultivars of Indian origin, which indicated 50 to 100% transferability with an average of 82% (Figure 3). Nearly 10% of the primers derived from both cultivated and wild species amplified alleles from both genomes of subspecies coracana. Overall, the primers derived from wild species had a higher transferability rate (88.3%) than those derived from cultivated species (63.3%), a trend similar to the results of microsatellite identification in this study. However, the microsatellites derived from cultivated species were more informative than those derived from the wild types, with a higher average polymorphic information content (PIC) value of 0.34 as against 0.29. Hittalmani et al. (2017) reported higher polymorphism in wild species of finger millet than in E. coracana germplasm using genomic SSRs derived from a cultivated finger millet variety.
The microsatellites derived from E. tristachya, E. multiflora, E. floccifolia and E. jaegeri were completely transferable to E. coracana, but only 50% of the primers derived from E. indica cross-amplified in E. coracana. Only 60% of SSRs derived from E. coracana coracana were transferable among the seven genotypes; 30% of the primers produced no amplicons, while 10% produced multiple amplicons of unexpected sizes. Similarly, Ramakrishnan et al. [5] found a set of 87 genomic SSRs to be more informative in Indian cultivars than in non-Indian cultivars. The low success rate of EST-derived SSRs is a possible concern, as null alleles can arise from either sequence variation at the priming site or disruption of the priming site by unrecognized intron splicing [21]; such issues with EST-SSRs can be readily addressed with the recently available genome sequence information in finger millet [13]. Further, unlike in other major crops, more systematic efforts are required in finger millet for molecular characterization of wild relatives and their utilization in trait improvement for biotic and abiotic stress tolerance [22].
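The transferability percentages and PIC values discussed above can be computed from scored gel data along the following lines. This is a hedged sketch: the source does not state which PIC formula was used, so the widely used formula of Botstein et al. (1980) is assumed here, and the function names `pic` and `transferability` as well as the toy allele sizes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphic information content for one marker.

    `alleles` holds one scored allele (e.g. amplicon size in bp) per
    genotype; None marks a genotype with no amplicon and is ignored.
    Assumed formula: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    scored = [a for a in alleles if a is not None]
    if not scored:
        return 0.0
    counts = Counter(scored)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


def transferability(amplified):
    """Percentage of primers that produced an amplicon in the target species."""
    flags = list(amplified)
    return 100.0 * sum(flags) / len(flags)


if __name__ == "__main__":
    # Toy scores for one marker across seven cultivars (sizes in bp).
    print(round(pic([180, 180, 180, 180, 180, 190, 190]), 4))
    # Toy case: two primers from one donor species, one of which amplified.
    print(transferability([True, False]))
```

A marker with a single allele across all cultivars gives a PIC of 0, and two equally frequent alleles give 0.375, so averages of 0.34 and 0.29 are consistent with markers that mostly show two alleles of unequal frequency.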

Conclusions
Publicly available finger millet sequences have previously been used for developing EST-derived SSR markers, but no study has focused on species-wise classification of tandem repeats, identification of EST-SSRs and their transferability in E. coracana. The present study highlights the frequency and distribution of EST-based microsatellites derived from both wild and cultivated species of finger millet. The transferability of these markers was tested in commercial cultivars of finger millet of Indian origin, and the results indicate the importance of using sequence information from wild relatives for designing markers when genomic resources for marker-assisted selection-based crop improvement are scarce. Additionally, the high transferability of SSRs from wild relatives to E. coracana would be informative for assessing genetic diversity and phylogenetic relationships between wild and cultivated finger millet. While the current data show promising rates of microsatellite transferability from wild species to cultivated coracana, the study is limited in that the SSRs were not tested in other wild Eleusine species. Nevertheless, the results reveal valuable information on the diversity of microsatellites across nine finger millet species, which may be useful in future for characterizing the evolutionary pattern of microsatellites between wild and cultivated Eleusine species, given the availability of the whole genome sequence of today's tetraploid species.