The Complete Chloroplast Genome Sequence of Viburnum odoratissimum and Phylogenetic Relationship with Other Close Species in the Adoxaceae Family

The chloroplast genome structure and gene content are highly conserved among land plants, providing valuable information for studies of taxonomy and plant evolution. Viburnum odoratissimum is a well-known evergreen shrub widely distributed in Asia. It has valued medicinal properties and is used in traditional medicine for menstrual, stomach, and kidney cramps. In this study, the complete chloroplast genome (cpDNA) of V. odoratissimum is reported and compared with those of five closely related Viburnum species and an outgroup. The cpDNA of V. odoratissimum is 158,744 bp in length and contains 130 genes, with 17 genes duplicated in the inverted repeat region. The gene content, gene organization, and GC content of V. odoratissimum are highly similar to those of other Viburnum species. A total of 270 tandem repeats are found in these plastomes, most of which are distributed in intergenic spaces. Differences in the location of the IR/SC boundaries reflect expansions and contractions of the IR regions in all species studied. Phylogenetic analyses based on complete chloroplast genomes and on the combination of barcodes indicate a sister relationship between V. odoratissimum and V. brachybotryum. Furthermore, a comparative cpDNA analysis identifies three DNA regions (trnC-petN-psbM, trnH-psbA, ndhC-trnV) that are highly divergent among the seven studied species and could be used as potential phylogenetic markers in taxonomic studies.


Introduction
The genus Viburnum comprises about 200 species of deciduous and evergreen shrubs and small trees, which are broadly distributed in the subtropical and temperate Northern Hemisphere and extend into the mountain regions of South Asia, South America, Mexico, and Colombia [1,2]. Viburnum, together with Sambucus and Adoxa, was formerly placed in Caprifoliaceae but was recently moved to a new group, Adoxaceae, on the basis of phylogenetic analyses [3]. Most Viburnum species have become popular ornamental plants because of their berries and their eye-catching, lightly fragrant flowers. Moreover, many species in the genus have been used in traditional folk medicine in China, Russia, and Ukraine for a number of ailments, such as menstrual cramps, hypertension, flu, tuberculosis, renal infection, stomach ache, and duodenal ulcers [4][5][6]. These species contain a considerable number of secondary metabolites, including monoterpenes, sesquiterpenes, diterpenes [6], diterpenoids, triterpenoids, iridoids [7,8], chlorogenic acid [5], amyrin, and lupeol [4], which underlie many biological activities, including anti-inflammatory, antibacterial, antioxidant, and antitussive activities [5,9,10].
Phylogenetic relationships within the genus Viburnum have been extensively studied using not only morphological characters [11] but also nuclear DNA regions, such as the nuclear ribosomal internal transcribed spacer (ITS) and the granule-bound starch synthase gene (GBSSI) [1,12]. More recently, chloroplast sequences such as trnK, matK, rbcL, psbA-trnH, and rpl32-trnL have also proved useful in deciphering phylogenetic relationships within the genus [1,13,14]. However, molecular phylogenetic studies based on only a few chloroplast markers leave a number of issues unresolved and can produce misleading estimates of relationships [15]. Complete chloroplast genomes have been widely used in phylogeny reconstruction to overcome this problem because they provide valuable information on plant evolution and a rich source of data for estimating phylogenetic relationships [16].
In this study, we report the complete chloroplast genome sequence of Viburnum odoratissimum, along with a comparative analysis with other species in the genus Viburnum. A comparison with other published chloroplast genomes from related families is performed to expand understanding of plastid genome diversity in Viburnum species. Furthermore, some new DNA barcodes with high nucleotide divergence are identified. These hotspots could be considered as potential molecular markers for phylogenetic tree reconstruction within the genus Viburnum.

Sampling and Sequencing
The sample of V. odoratissimum was collected from the National Institute of Biological Resources, Incheon, Korea (NIBRGR0000081148).
Approximately 5 g of leaves was used to isolate total genomic DNA following a modified CTAB method [17]. The quality of the extracted DNA was assessed by spectrophotometry and by electrophoresis on a 1% (w/v) agarose gel. A total of 10 µg of purified genomic DNA was used to sequence the chloroplast genome on the PacBio RS II system. The raw data were quality-checked and low-quality reads were removed. The published complete chloroplast genome of V. erosum (MN641480.1) was downloaded from NCBI for comparison.

Chloroplast Genome Assembly and Annotation
The filtered subreads were mapped to the reference genome using the BWA aligner [18]. The matched subreads were selected for de novo assembly with Canu version 1.8 [19]. All contigs were checked for overlapping regions using nucmer and mummerplot. The assembled chloroplast genome was annotated and visualized with the annotation tool GeSeq [20]. Finally, the circular gene map was drawn with OGDraw version 1.3.1 [21]. The complete chloroplast genome of V. odoratissimum was deposited in GenBank under accession number MN836381.
Tandem Repeats Finder version 4.09 [22] was used to search for tandem repeats. Additionally, simple sequence repeats (SSRs) were detected with MISA [23] using the following minimum numbers of repetitions: 10 for mono-, 6 for di-, and 5 for tri-, tetra-, penta-, and hexanucleotide motifs.
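The repetition thresholds above can be illustrated with a minimal Python sketch. This is not MISA itself and does not reproduce its handling of compound or interrupted repeats; it only finds perfect SSRs with a regular expression under the stated thresholds. The function name `find_ssrs` and the toy sequence are hypothetical and introduced here for illustration only.

```python
import re

# Minimum repeat counts per motif length, taken from the settings quoted above:
# 10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide motifs.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Start positions are 0-based. This is an illustrative sketch, not MISA.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of the given length followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those belong to a shorter motif class.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical) with one mono-, one di-, and one trinucleotide SSR.
    toy = "GC" + "A" * 12 + "CGC" + "AT" * 6 + "GG" + "AAT" * 5 + "C"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Counting the hits per motif length over each plastome would give the kind of per-species SSR totals reported in the Repeat Structure section, although exact numbers may differ from MISA output.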
The seven complete chloroplast genomes were aligned and visualized with the online comparison tool mVISTA [24] using V. odoratissimum as the reference. To analyze nucleotide variability, the seven chloroplast genomes were aligned using ClustalX 1.81 [25], and a sliding window analysis was then conducted using DnaSP version 6.10.03 [26].
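As a rough illustration of the sliding window analysis, the sketch below computes nucleotide diversity (Pi) as the average number of pairwise differences per site within each window of an alignment. It is not DnaSP's implementation. The window length (600) and step size (200) are assumptions, since the text does not state them, and columns containing gaps or ambiguous bases are simply skipped. The function names are hypothetical.

```python
from itertools import combinations


def window_pi(aligned, start, end):
    """Nucleotide diversity (Pi) for alignment columns in [start, end).

    Pi is the mean number of pairwise differences per site. Columns with
    gaps or ambiguous characters are ignored (a simplifying assumption).
    """
    n_pairs = len(aligned) * (len(aligned) - 1) // 2
    diffs = 0
    sites = 0
    for col in range(start, end):
        column = [seq[col].upper() for seq in aligned]
        if any(base not in "ACGT" for base in column):
            continue
        sites += 1
        diffs += sum(a != b for a, b in combinations(column, 2))
    if n_pairs == 0 or sites == 0:
        return 0.0
    return diffs / (n_pairs * sites)


def sliding_pi(aligned, window=600, step=200):
    """Return (window_midpoint, Pi) pairs along an alignment.

    The default window and step are assumed values, not taken from the text.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append(((start + end) // 2, window_pi(aligned, start, end)))
    return results
```

Plotting the returned Pi values against the window midpoints gives a profile of the kind used to locate divergence hotspots; peaks mark candidate marker regions.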

Characteristics of the Viburnum odoratissimum Chloroplast Genome
The complete cpDNA of V. odoratissimum is 158,744 bp in size, with a pair of inverted repeat regions (IRs) of 26,494 bp that separate a large single-copy (LSC) region of 87,348 bp from a small single-copy (SSC) region of 18,267 bp (Figure 1). The total GC content is 38.1%, with the highest content in the IR regions (43%), followed by the LSC (36.4%) and the SSC (32.1%). Together, the protein-coding and tRNA gene sequences of the V. odoratissimum cp genome comprise 26,278 codons. Leucine is the most frequent amino acid, accounting for 10.5% (2768) of the codons, and cysteine is the least frequent, accounting for 1.1% (294).
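The GC content and codon shares reported above are simple proportions, and a short Python sketch makes the arithmetic explicit. The helper names `gc_content` and `percent_share` are hypothetical; the codon counts are those quoted in the text, and ignoring ambiguous bases in the GC calculation is an assumption.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def percent_share(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total


# Codon counts quoted in the text above.
TOTAL_CODONS = 26278
leucine_share = percent_share(2768, TOTAL_CODONS)
cysteine_share = percent_share(294, TOTAL_CODONS)

if __name__ == "__main__":
    print(round(leucine_share, 1), round(cysteine_share, 1))
```

Applying `gc_content` separately to the LSC, SSC, and IR sequences would give per-region values comparable to those listed above.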
The V. odoratissimum cp genome encodes 129 genes, consisting of 84 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Among these, 17 genes are duplicated in the IR regions: six functional genes, seven tRNA genes, and four rRNA genes. In total, there are 22 intron-containing genes, 19 of which contain one intron and three of which contain two introns (clpP, rps12, and ycf3). The largest intron is in trnK-UUU, which itself contains the matK gene.

Repeat Structure
In the studied chloroplast genomes, a total of 270 tandem repeat sequences are identified, with each accession containing 31-47 repeats (Figure 2A). The length of the repeated sequences ranges mainly from 31 to 50 bp, consistent with reports in other angiosperms [28]. These repeats are primarily distributed in the intergenic spaces, but a few are located in coding regions (rpoC1, rps18, ycf2, ycf1, psaA) or in an intron (clpP). In terms of the quadripartite structure, tandem repeats are about equally distributed in the LSC and IR regions, accounting for around 42% each, while the SSC region has only 16.3%.

Figure 2. Length and distribution of tandem repeats.
Simple sequence repeats (SSRs) are repeating short DNA motifs of 1-6 nucleotides that are excellent molecular markers in plant genetics and polymorphism research [29]. Herein, the types and quantities of SSRs are analyzed using MISA software. We found a total of 296 SSRs in the seven studied species, with each accession containing 39-50 SSRs. Most of these SSRs are distributed in the single-copy regions. Mononucleotide repeats are the most frequent, accounting for approximately 94.6% of all SSRs, followed by dinucleotide (4.0%) and trinucleotide (1.4%) repeats. A total of 274 mononucleotide repeats (97.9%) are A/T repeats, and all di- and trinucleotide repeats are AT/AT and AAT/ATT repeats, respectively.

IR Contraction and Expansion
The contraction and expansion of the IR/SC boundary regions of the seven studied species are presented in Figure 3. The rps19 gene, located in the LSC region, extends into the IRb region by 245-247 bp in all Viburnum species, whereas in Tetradoxa omeiensis the rps19 gene lies 97 bp from the border. The IRb/SSC boundary region is highly similar among these species: the trnN and ndhF genes are located on either side of this boundary, separated by 1,372 bp (V. brachybotryum) to 1,977 bp (V. japonicum). The ycf1 gene spans the SSC/IRa junction in all seven species, with 4,273 bp (V. betulifolium) to 4,733 bp (T. omeiensis) located in the SSC region. The IRa/LSC boundary is quite conserved among Viburnum species. The trnH gene is located in the LSC region, 0-80 bp from the IRa/LSC junction in Viburnum and 330 bp from it in T. omeiensis.

Divergence Hotspot Regions
To determine the divergent regions that could be applied in phylogenetic studies, the seven chloroplast genomes were aligned with mVISTA (Figure 4). The comparison shows that the IR regions are less divergent than the single-copy regions and that the non-coding regions contain more hypervariable regions than the coding regions. The most divergent regions among these species include trnH-psbA, atpH-atpI, trnC-petN-psbM, rbcL-accD, psbE-petL, and ndhF-rpl32-trnL.
The nucleotide variability (Pi) values across all seven accessions were calculated with DnaSP software to quantify diversity at the sequence level (Figure 5). The Pi value ranges from 0 to 0.277, indicating partial divergence among these plastomes. As expected, the LSC and SSC regions show higher divergence than the IR regions. The region trnC-petN-psbM is the most divergent, with a Pi value of 0.277. We also detect other regions that differ among the seven studied species, including the intergenic spacers trnH-psbA, ndhC-trnV, trnE-trnT, and ndhF-rpl32-trnL and the coding regions rpl16 and rpl22. These regions with a high degree of nucleotide variation could be used as potential molecular markers to reconstruct a phylogenetic tree of the genus Viburnum. Choi et al. used the regions trnK, matK, and rbcL to distinguish Viburnum species [13], but our study shows that these sequences exhibit low divergence in the Viburnum chloroplast genomes.

Phylogenetic Analysis
To identify the phylogenetic relationship between V. odoratissimum and other species within the genus Viburnum, a maximum likelihood analysis was performed with RAxML (Randomized Axelerated Maximum Likelihood) based on the plastid genomes of 10 species, with T. omeiensis and Sambucus nigra used as outgroups. The resulting phylogenetic tree is shown in Figure 6.
DNA barcodes have proven to have an expanding range of applications in taxonomic studies. In plants, most DNA barcoding regions are located in the chloroplast genome and a few are in the ITS regions of nuclear ribosomal genes [30]. Several chloroplast-derived barcodes have been identified and recommended for species discrimination, including coding regions (matK, rpoB, ycf1, accD, rbcL, and ndhJ) and noncoding regions (trnH-psbA, atpF-atpH) [31][32][33]. However, no single DNA region is a suitable candidate for all plants. As a result, a combination of more than one barcode should typically be used to provide more accurate species identification [34]. In the genus Viburnum, a single barcode (trnK), a combination of rbcL, matK, and trnH-psbA, and the ITS region [1,13] have been used to discriminate species, but these regions still leave some unresolved issues and conflicting relationships. In this study, a combination of three hotspot regions (trnH-psbA, trnC-petN, and ndhC-trnV) that exhibit high divergence in the sliding window analysis of the Viburnum species was used to construct a phylogenetic tree of the above 10 species. The result (Figure 7) shows a pattern similar to that obtained from the complete chloroplast genomes, revealing the high discriminatory power of this combination, which could be a promising genetic marker for studies of phylogenetic relationships.
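Combining several barcode regions for tree building usually means concatenating their alignments into a single supermatrix. The sketch below shows one way this could be done in Python, assuming each region has already been aligned separately. The function name, the input layout, and the gap-filling of taxa missing from a region are illustrative assumptions, not the procedure used in this study.

```python
def concatenate_alignments(regions, order):
    """Concatenate per-region alignments into one supermatrix.

    regions: {region_name: {taxon: aligned_sequence}} (hypothetical layout)
    order:   list of region names giving the concatenation order

    Returns (supermatrix, partitions), where supermatrix maps each taxon to
    its concatenated sequence and partitions lists (region, start, end)
    with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in regions.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for name in order:
        aln = regions[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("region %s is empty or not aligned" % name)
        length = lengths.pop()
        for taxon in taxa:
            # Taxa missing from a region are padded with gaps (an assumption).
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, offset + 1, offset + length))
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions
```

The returned partition coordinates can be written to a partition file so that a maximum likelihood program can fit a separate model to each barcode region.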

Conclusion
In this study, using the PacBio RS II sequencing system, we report the complete chloroplast genome of V. odoratissimum. Compared with the other Adoxaceae genomes, the V. odoratissimum plastid genome and its coding regions are the largest, but the gene content and organization are highly similar, except for the abundance of pseudogenes in V. brachybotryum. The divergence hotspot analysis provides a combination of barcodes (trnH-psbA, trnC-petN, and ndhC-trnV) that can be used as a potential genetic marker for discrimination of Viburnum species and for phylogenetic tree reconstruction.