Identification of Nigella sativa Seed and Its Adulterants Using DNA Barcode Marker

Adulteration, misidentification, and substitution are the biggest challenges in maintaining the safety and therapeutic efficacy of medicinal herbs. Nigella sativa seed is a well-known medicinal herb that is susceptible to adulteration or substitution because of its great therapeutic value. Adulteration and substitution by morphologically similar seeds are the primary concern for commercially available Nigella sativa seed. In this study, we used DNA barcode markers to detect adulteration, misidentification, and substitution of Nigella sativa seed sold in various markets. We collected 10 samples labelled as Black seed/Nigella sativa seed from open markets in India (1), Pakistan (1), Saudi Arabia (1), Egypt (2), Turkey (1), Syria (1), Tunisia (2) and Oman (1). All samples were studied morphologically. Although some samples were quickly identified as Nigella sativa seeds, others were difficult to identify and differentiate accurately. This is where the DNA barcode marker proved useful. Plant DNA obtained from the seed coat cells of the samples was amplified by PCR with forward and reverse rbcL and matK primers as recommended by CBOL (the Consortium for the Barcode of Life). PCR amplification of the plastid genome with the matK primers was not very successful, while amplification with the rbcL primers was quite successful, so we used the rbcL sequences for alignment and further analysis. PCR products were checked by electrophoresis on a 1.5% agarose gel and sent to Macrogen (Seoul, South Korea) for DNA sequencing. The rbcL reads were aligned and analyzed for nucleotide composition, conserved sites, variable sites, singleton sites, parsimony-informative sites, genetic distance and phylogenetic tree using MEGA 7. The phylogenetic tree was constructed using the UPGMA method.
NCBI BLAST, together with the phylogenetic tree and nucleotide characteristics, was used to identify Nigella sativa seeds from different geographies and to discriminate two adulterants, Allium cepa seed and Clitoria guianensis seed. Both adulterants differ from Nigella sativa seed in their active medicinal contents and therapeutic utility. This study demonstrated the utility of DNA markers, especially the rbcL locus, in accurately identifying a medicinal herb and its adulterants.


Introduction
Worldwide trade in medicinal herbs is worth about $60 billion annually. About 1000 companies from different countries are involved in trading medicinal herbs, and the business is growing at a rate of 15 to 20% per year [1]. This growth is driven by significant demand for natural, safe and reliable therapeutic agents: patients want safer and more natural ways of treating disease.
Nigella sativa seed has been used by humankind for centuries as a herbal medicine and spice. It is commonly known as Black Seed, Fennel Flower, Black Cumin, Love-in-a-mist, Nutmeg Flower, Roman Coriander, Barakah, Shooneez, Habba Sauda, Habb al-barka, Krishana-Jiraka, Upakunchika and Kalonji. It is an annual flowering plant belonging to the buttercup (Ranunculaceae) family. Nigella sativa is native to South and Southwest Asia but is also cultivated in Europe.
Prophet Mohammed described Black seed as the seed of blessings, which can cure all ailments except death [2]. This belief has triggered a great deal of research aimed at establishing its therapeutic utility. Although Prophet Mohammed did not name a particular seed, Nigella sativa seed has been regarded as the cited seed because of its significant therapeutic utility. Nigella sativa seed has been reported to have antimicrobial, antioxidant, anti-aging, hair-growth-promoting, sun-protective, anti-cancer, cardiovascular, anti-inflammatory, immunomodulatory, antioxytocic and wound-healing activity [3,4]. Because of its high medicinal value, it is used in raw form as well as in other forms such as seed oil, seed paste and various extracts. This extensive use in commercial formulations keeps the commodity in high demand, and hence it is prone to commercial adulteration and substitution [5]. In some geographies, because of its nomenclature, it can be confused with morphologically similar but biologically different seeds [6]. Hence it is necessary to identify Nigella sativa accurately before use.
Herbal medicinal products can be identified by the following methods [7]: 1) microscopic and macroscopic analysis, in which a botanical expert identifies the herb or compares it with standard specimen samples; and 2) phytochemical profiling, in which a series of physical and chemical tests is performed. Both methods have limitations: they require an expert taxonomist, take a long time, and are prone to error when samples are crushed, in slurry form, aged, or exposed to environmental factors that alter the anatomical features or marker chemical composition of the medicinal herb [8,9]. Hence a new approach to the identification of medicinal herbs is needed. DNA barcoding is a recent technology that can help in the accurate identification of plant and animal species [10,11]. Species-level identification can be made from a small fragment of plastid or genomic DNA. These DNA fragments (loci) are highly specific to a particular species and include ITS2, ITS, trnH-psbA, rbcL, and matK. DNA barcode identification can help the industry overcome the problem of adulteration and substitution, and can serve as a method for accurate identification of herbal medicinal ingredients and their adulterants or substitutes. In the recent past, DNA barcodes have been used as an authentication tool for Crocus sativus [12], Tulipa edulis [13], Cinnamomum species [14] and Ricinus communis [15], which encouraged us to undertake the identification of Nigella sativa seeds and their adulterants using the DNA barcode markers rbcL and matK. CBOL considers the matK and rbcL genes standard loci that generate quality sequences and provide a high level of species discrimination [16].
We collected Nigella sativa seed samples from markets in India, Pakistan, Saudi Arabia, Egypt, Turkey, Syria, Tunisia and Oman. This study reports the utility of the rbcL and matK DNA barcode markers for identifying substitution and adulteration in Nigella sativa seed from various geographies.

Samples
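Samples of Nigella sativa seeds were collected from open markets in various geographies: India (1), Pakistan (1), Saudi Arabia (1), Egypt (2), Turkey (1), Syria (1), Tunisia (2) and Oman (1).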

DNA Extraction and PCR Amplification
Plant DNA was extracted from seed coat cells using a plant/fungi DNA isolation kit (Norgen Biotek, Canada; DNA Isolation Kit Product # 26200) following the manufacturer's protocol. Purified DNA was stored at -20°C until further use. The extracted DNA was examined by 0.8% agarose gel electrophoresis with ethidium bromide staining.

DNA Amplification and Sequencing
The target DNA regions, rbcL and matK, were amplified with the respective universal DNA barcoding primers prescribed by the CBOL Plant Working Group, 2009 [16]. The universal primers used were rbcLa-F: ATGTCACCACAAACAGAGACTAAAGC and rbcLa-R: GTAAAATCAAGTCCACCRCG for the rbcL gene, and matK-KIM1R: ACCCAGTCCATCTGGAAATCTTGGTTC and matK-KIM3F: CGTACAGTACTTTTGTGTTTACGAG for the matK gene. For each gene, PCR was performed in a total reaction volume of 50 µl: 25 µl of Taq PCR Master Mix (Norgen Biotek, Canada), 22 µl distilled water, 1 µl forward primer (10 µM), 1 µl reverse primer (10 µM) and 1 µl of DNA template (50-80 ng/µl). The PCR conditions were one cycle of 94°C for 3 min; 35 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1 min; and one cycle of 72°C for 7 min. Amplified rbcL and matK PCR products (5 µl each) were checked for the respective bands by 1.5% agarose gel electrophoresis and sent to a specialized research laboratory, Macrogen (Seoul, South Korea), for DNA sequencing. From the above figures, it is clear that PCR amplification with the rbcL primers was of good quality, while amplification with the matK primers was not; the DNA bands obtained with the matK primers were not well separated. As high-quality reads were obtained in a single direction, the markers were sequenced in a single direction only.
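The reaction set-up above can be checked with a few lines of Python. The following is only an illustrative sketch: the primer sequences, per-reaction volumes and cycling times are taken from the text, while the function names (`expand_degenerate`, `scale_mix`, `programmed_minutes`) are hypothetical helpers introduced here. It expands the degenerate base R (A or G in the IUPAC code) in rbcLa-R, scales the 50 µl mix to several reactions, and totals the programmed hold times (ramping time between steps is ignored).

```python
from itertools import product

# Primer sequences exactly as listed in the text (5'->3').
PRIMERS = {
    "rbcLa-F": "ATGTCACCACAAACAGAGACTAAAGC",
    "rbcLa-R": "GTAAAATCAAGTCCACCRCG",
    "matK-KIM3F": "CGTACAGTACTTTTGTGTTTACGAG",
    "matK-KIM1R": "ACCCAGTCCATCTGGAAATCTTGGTTC",
}

# Small subset of the IUPAC nucleotide codes; R is the only
# degenerate base that occurs in the primers above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# Per-reaction volumes in microlitres, as given in the text.
MIX_UL = {
    "Taq PCR Master Mix": 25,
    "distilled water": 22,
    "forward primer (10 uM)": 1,
    "reverse primer (10 uM)": 1,
    "DNA template": 1,
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def scale_mix(n_reactions):
    """Scale the per-reaction volumes to n_reactions (no pipetting excess)."""
    return {reagent: vol * n_reactions for reagent, vol in MIX_UL.items()}


def programmed_minutes(cycles=35):
    """Sum of hold times: 3 min start, three 1 min steps per cycle, 7 min end."""
    return 3 + cycles * (1 + 1 + 1) + 7


if __name__ == "__main__":
    print(expand_degenerate(PRIMERS["rbcLa-R"]))
    print(scale_mix(10))
    print(programmed_minutes(), "min of programmed holds")
```

In practice a small excess is usually added when preparing a pooled mix; that is left out here because the text does not specify one.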
The sequences obtained were aligned with MUSCLE [17], which generates multiple alignments of amino acid and nucleotide sequences. MUSCLE has been reported to compare favourably with T-Coffee, MAFFT, and CLUSTALW in speed and accuracy. The MUSCLE-aligned sequences were used to locate conserved, variable, singleton and parsimony-informative sites, and were compared with the other sequences obtained from Nigella sativa seed and its adulterant samples using MEGA 7 [18]. Primary sequence analysis of nucleotide composition, conserved sites, variable sites, singleton sites, parsimony-informative sites and the phylogenetic tree provided adequate information to discriminate Nigella sativa seeds from adulterants. All aligned sequences were then submitted to the NCBI (National Center for Biotechnology Information) website and identified using the blastn application (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
From the rbcL sequence data, we constructed the phylogenetic tree using the UPGMA method [20]. Bootstrap support for each clade was scored by running 500 standard bootstrap replicates of the data. The evolutionary distances were computed using the Kimura 2-parameter method [19] and are in units of the number of base substitutions per site.
The evolutionary history was inferred using the UPGMA method [21]. The optimal tree, with a sum of branch lengths of 0.15209382, was drawn. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches [22]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Kimura 2-parameter method [23] and are in units of the number of base substitutions per site.
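To make the clustering step concrete, here is a minimal pure-Python sketch of UPGMA (average-linkage clustering on a distance matrix). It is not the MEGA 7 implementation used in the study; the function name `upgma`, the labels and the toy distances are assumptions made purely for illustration. At each step the two closest clusters are merged at a height of half their distance, and the distance from the merged cluster to every other cluster is the size-weighted average of the two old distances.

```python
def upgma(labels, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns nested tuples (left, right, height), where height is the
    node height (half the distance between the two merged clusters).
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)

    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)

        # Size-weighted average distance to every remaining cluster.
        new_dist = {}
        for k in clusters:
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            new_dist[(k, next_id)] = (n_a * d_ak + n_b * d_bk) / (n_a + n_b)

        # Drop distances that involve the two merged clusters.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        dist.update(new_dist)

        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), n_a + n_b)
        next_id += 1

    (tree, _), = clusters.values()
    return tree


if __name__ == "__main__":
    # Toy distances (not data from this study): two close sequences
    # and one distant sequence.
    toy_labels = ["seqA", "seqB", "seqC"]
    toy_matrix = [
        [0.00, 0.01, 0.10],
        [0.01, 0.00, 0.12],
        [0.10, 0.12, 0.00],
    ]
    print(upgma(toy_labels, toy_matrix))
```

With the toy matrix, the two close sequences are joined first and the distant one attaches last, which is the same pattern one would expect for an adulterant against a set of near-identical Nigella sativa sequences.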

Sequence Characteristic
The length of the rbcL sequence for all 13 sequences (10 samples and 3 reference accessions) was 533 bp. The average nucleotide content was T(U) 28.4%, C 21.4%, A 26.5% and G 23.7%. From Table 1 and Figure 5, it is clear that the nucleotide composition of two samples (NS06 and NS08) differs from that of the other 11 sequences, mainly in A and G content, which indicates that these two samples have a different genetic makeup and hence could belong to a different plant genus or species. Samples NS01, NS02, NS03, NS04, NS05, NS07, NS09 and NS10 and accessions KU499880.1, KM360895.1 and FJ626586.1 show very similar nucleotide composition, which indicates that these sequences belong to a single plant genus or species. Across the entire data set, the nucleotide pair frequencies indicate the diversity in genetic makeup among the samples. Within the NS group, the nucleotide pair frequencies show the highest percentage of identical sites and the lowest rate of transversional pairs.
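Nucleotide composition of the kind reported above is simply the percentage of each base in a sequence. The short sketch below shows the calculation on a made-up sequence; the function name `base_composition` and the example string are hypothetical, and ambiguous or gap characters are skipped as an assumption about how such positions should be handled.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of T, C, A and G in a nucleotide sequence.

    Characters other than A, C, G and T (gaps, N, ambiguity codes)
    are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return {base: 100.0 * counts[base] / total for base in "TCAG"}


if __name__ == "__main__":
    # Made-up fragment for illustration only.
    for base, pct in base_composition("ATGTCACCACAAACAGAGAC").items():
        print(f"{base}: {pct:.1f}%")
```

Comparing these percentages across samples is a quick first screen; a sample whose A and G content departs clearly from the rest is a candidate for closer inspection.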

(.) Identical site
Samples NS01, NS02, NS03, NS04, NS05, NS07, NS09 and NS10 and accessions KU499880.1 and KM360895.1 showed a high level of conserved sites, except at site 405, where sample NS07 has G in place of T. In sample NS02, G is replaced by T at site 405 and G is replaced by C at site 527. This suggests that NS07 and NS02 are varieties of Nigella sativa. Accession FJ626586.1 has five variable sites, at 33 (G replaced by C), 74 (C replaced by G), 81 (A replaced by G), 123 (T replaced by G) and 276 (A replaced by G). This accession is reported as a separate species, Nigella damascena. Samples NS06 and NS08 have 52 and 45 variable sites respectively, which shows wide genetic variation; they can therefore be considered a different plant genus or species from NS01, NS02, NS03, NS04, NS05, NS07, NS09 and NS10 and accessions KU499880.1, KM360895.1 and FJ626586.1. This demonstrates the capability of the rbcL gene to discriminate plant genera and species.

Singleton Sites
A singleton site contains at least two types of nucleotides with, at most, one occurring multiple times. MEGA identifies a site as a singleton site if at least three sequences contain unambiguous nucleotides or amino acids.


Parsimony-Informative Site
A site is parsimony-informative if it contains at least two types of nucleotides (or amino acids), and at least two of them occur with a minimum frequency of two.

Sample  Label            Nucleotides
NS06    NS Adulterant 1  C A A T T A G T T T A G C T T C C C C T C T C C A
NS08    NS Adulterant 2  C A A T T A G T T T A G C T T C C C C T C T C C A
The parsimony-informative sites indicate that NS01, NS02, NS03, NS04, NS05, NS07, NS09 and NS10 and accessions KU499880.1 and KM360895.1 are quite similar in genetic makeup, while NS06 and NS08 are quite different.
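The site categories defined above (conserved, singleton, parsimony-informative) can be expressed as a simple rule applied to each alignment column. The sketch below follows the definitions as worded in this section; it is not MEGA's code, the function names are hypothetical, the toy alignment is made up, and MEGA's additional requirement on the number of unambiguous sequences is not modelled.

```python
from collections import Counter


def classify_site(column):
    """Classify one alignment column using the definitions in the text.

    conserved:             only one type of nucleotide
    parsimony-informative: at least two types, at least two of which
                           occur two or more times
    singleton:             variable, but at most one type occurs
                           more than once
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) < 2:
        return "conserved"
    repeated_types = sum(1 for n in counts.values() if n >= 2)
    return "parsimony-informative" if repeated_types >= 2 else "singleton"


def classify_alignment(sequences):
    """Classify every column of a list of equal-length aligned sequences."""
    return [classify_site("".join(col)) for col in zip(*sequences)]


if __name__ == "__main__":
    # Made-up four-sequence alignment for illustration only.
    toy_alignment = ["ACGTA", "ACGTA", "ACTTC", "AGTTC"]
    for position, label in enumerate(classify_alignment(toy_alignment), 1):
        print(position, label)
```

Variable sites are the union of the singleton and parsimony-informative categories, so counting the non-conserved labels gives the number of variable sites for an alignment.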

Estimation of Genetic Distance Between Sequences
Genetic distance between sequences was estimated as the number of base substitutions per site between each pair of sequences. Analyses were conducted using the Kimura 2-parameter model [23]. From the above chart, it is clear that the intraspecific genetic distance ranges from 0.0000 to 0.01325, which is very small, while the interspecific genetic distance reaches about 0.1161, which is considerably higher. These distances further helped in identifying the adulterants: the sequence from sample NS06 showed genetic distances of 0.10720 to 0.11617 and that from sample NS08 showed 0.09013 to 0.09870, both far above the intraspecific maximum of about 0.013.
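For readers unfamiliar with the Kimura 2-parameter model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P is the proportion of sites showing a transition (A<->G or C<->T) and Q the proportion showing a transversion. The sketch below implements that formula directly; the function name `k2p_distance` and the example sequences are hypothetical, and positions with gaps or ambiguity codes are skipped, which is an assumption about gap handling rather than a statement of the MEGA settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Made-up sequences: one transition in ten sites.
    print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCC"), 5))
```

Because the correction grows with divergence, K2P distances are slightly larger than the raw proportion of differing sites; for very similar sequences the two are almost equal.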

Phylogenetic Tree with UPGMA Method
The evolutionary history was inferred using the UPGMA method [20]. In this study, we wanted to understand whether the adulterants form a separate cluster in the phylogenetic tree. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches. The evolutionary distances were computed using the Kimura 2-parameter method [19] and are in units of the number of base substitutions per site. In Figures 6 and 7, it can be clearly seen that samples NS06 and NS08 formed a separate clade, since they are genetically different and evolved from a different ancestor. This observation further confirms the species discrimination power of rbcL sequences.
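The idea behind the bootstrap percentages can be illustrated with a deliberately simplified sketch. Alignment columns are resampled with replacement, and for each replicate we ask whether a chosen group of sequences still hangs together. Note the simplification: instead of rebuilding a UPGMA tree for every replicate as MEGA does, this stand-in uses uncorrected p-distances and counts a replicate as supporting the group when every within-group distance is smaller than every distance between the group and the remaining sequences. The function names, the toy alignment and the fixed random seed are all assumptions for illustration.

```python
import random


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_support(alignment, group, replicates=500, seed=0):
    """Fraction of bootstrap replicates in which `group` stays cohesive.

    alignment: dict mapping sequence name -> aligned sequence
    group:     set of names; must leave at least one sequence outside
    """
    rng = random.Random(seed)
    names = list(alignment)
    inside = [n for n in names if n in group]
    outside = [n for n in names if n not in group]
    length = len(alignment[names[0]])

    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        rep = {n: "".join(alignment[n][c] for c in cols) for n in names}

        within = max(
            (p_distance(rep[a], rep[b])
             for i, a in enumerate(inside) for b in inside[i + 1:]),
            default=0.0,
        )
        between = min(
            p_distance(rep[a], rep[b]) for a in inside for b in outside
        )
        hits += within < between
    return hits / replicates


if __name__ == "__main__":
    # Made-up alignment: two similar sequences and two divergent ones.
    toy = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAT",
        "x1": "TTGAACCTGC",
        "x2": "TTGAACCTGA",
    }
    print(bootstrap_support(toy, {"s1", "s2"}))
```

A value near 1.0 means the grouping survives almost any resampling of the sites, which is the intuition behind a high bootstrap percentage on a branch.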
In addition to the above sequence analysis, the aligned rbcL sequences of the seed samples were submitted individually to BLAST on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for identification. The blastn tool identified the rbcL sequences of two seed samples, NS06 and NS08, as Allium cepa and Clitoria guianensis respectively. These samples were then compared with the standard morphological features of Allium cepa and Clitoria guianensis, and the morphological features were found to match the physical samples used in the study.

Discussion
In Islamic literature, Prophet Mohammed described black seed as the seed of blessings, which has the property of curing any disease of humankind. In view of its therapeutic utility, Nigella sativa seed can be considered the black seed referred to.
Nigella sativa seed is one of the noble herbs and is extensively used as a medicine and spice in the Middle East, South East Asia, and Europe. It is one of the great spices used for culinary purposes. Recently, many researchers have demonstrated the therapeutic uses of Nigella sativa seeds and their extracts. Because of these benefits, demand is growing, and the seed is therefore susceptible to substitution and adulteration for commercial gain. Hence there is an immediate need for a quick, reliable and reproducible method of identifying Nigella sativa seeds and their adulterants. Traditional methods of identifying herbal medicines have shortcomings: the result depends on the expertise of the individual examiner, the analysis takes a long time, and samples in powdered, crushed, aged or slurry condition are difficult to identify. DNA barcoding has been suggested as a way to overcome these problems. DNA barcode markers have a number of applications in plant taxonomy and in the identification and authentication of herbal medicinal ingredients. DNA barcoding has recently been used successfully to identify herbal medicinal ingredients; examples include the identification of Achyranthis Bidentatae Radix [24], DNA barcoding of 347 medicinal plants using the rbcL marker [25], identification of Physalis (Solanaceae) from its adulterants [26], identification of botanicals in herbal medicine and dietary supplements [27] and DNA-based identification of Gentiana robusta and related species [28].
In fact, the 2010 edition of the Chinese Pharmacopoeia adopted allele-specific diagnostic PCR as a new method of identifying Zaocys dhumnades (Cantor). Recently, the US Food and Drug Administration approved DNA barcoding as a method of seafood identification [29].
In this study we collected 10 samples of Black seed from India (1), Pakistan (1), Saudi Arabia (1), Egypt (2), Turkey (1), Syria (1), Tunisia (2) and Oman (1). In these countries, Nigella sativa seeds are regularly consumed as a spice and herbal medicine. Morphological and microscopic study of the seeds allowed us to identify eight samples quickly, but the remaining samples were difficult to identify. We therefore decided to check the identity of the seed samples with DNA barcode markers.
Many molecular markers are available, including plastid DNA regions (atpF-atpH spacer, rbcL gene, matK gene, rpoB gene, rpoC1 gene, trnH-psbA spacer and psbK-psbI spacer), but on the basis of the assessment conducted by the CBOL Plant Working Group, the rbcL and matK plant barcodes were selected for their recoverability, sequence quality, and levels of species discrimination. Although matK is considered the most rapidly evolving plastid coding region, with consistently high discrimination power among angiosperm species, we obtained very poor amplification and a low level of identification with it. This is in line with its reported low routine success [30] and patchier recovery [31,32]. The performance of rbcL was entirely satisfactory in terms of amplification and band separation, and it also provided good discriminating power.
The DNA sequence reads obtained with rbcL were of high quality. MEGA 7 provided modules to examine nucleotide composition, conserved sites, variable sites, singleton sites and parsimony-informative sites, which were used to discriminate Nigella sativa seeds from adulterants.
The rbcL DNA sequences were submitted to BLAST on the NCBI website. Blastn identified the query sequence of adulterant number 1 as Allium cepa seed and that of adulterant number 2 as Clitoria guianensis seed.
Following the molecular identification, we checked the morphology of these adulterant species, which matched the study samples.
Both adulterants differ from Nigella sativa seeds in their active medicinal contents and therapeutic utility.

Conclusion
Based on the present study, we conclude that DNA barcode markers, especially rbcL, are useful for the quick and accurate identification of herbal medicines such as Nigella sativa seeds and their adulterants. We recommend including the DNA barcode marker method of identification in the monographs of official herbal medicine standards. This would improve the quality of Nigella sativa available in various markets and help avoid the consequences of consuming adulterated Nigella sativa seeds.