Evolutionary Relationship of Genomic Insulin Sequence in Different Mammalian Species: A Computational Approach
M. A Hashem1, *, Neena Islam2, Md. Moinul Abedin Shuvo1, Md. Arifuzzaman1
1Department of Biochemistry and Biotechnology, University of Science and Technology Chittagong (USTC), Foy’s Lake, Chittagong, Bangladesh
2National Centre for Control of Rheumatic Fever and Heart Diseases, Sher-E-Bangla Nagar, Dhaka, Bangladesh
To cite this article:
M. A Hashem, Neena Islam, Md. Moinul Abedin Shuvo, Md. Arifuzzaman. Evolutionary Relationship of Genomic Insulin Sequence in Different Mammalian Species: A Computational Approach. Ecology and Evolutionary Biology. Vol. 1, No. 2, 2016, pp. 23-28. doi: 10.11648/j.eeb.20160102.13
Received: August 17, 2016; Accepted: September 5, 2016; Published: September 22, 2016
Abstract: Genomic insulin is located on the short arm of chromosome 11 in the human genome. Insulin is a well-studied polypeptide hormone whose precursor consists of 110 amino acids: a signal peptide (residues 1-24), the B-chain (residues 25-54), the C-peptide (residues 55-89) and the A-chain (residues 90-110). Insulin, produced by the beta cells of the pancreas in response to glucose stimuli, binds rapidly to its receptor, triggers receptor autophosphorylation and primarily regulates nutrient metabolic pathways. In this study we used computational biology to explore the evolutionary conservation rate, structure and phylogenetic relationships of the insulin molecule among eight mammalian species: Homo sapiens (Human), Bos taurus (Cattle), Cavia porcellus (Guinea pig), Canis lupus familiaris (Dog), Gorilla gorilla (Western gorilla), Ovis aries (Sheep), Pan troglodytes (Chimpanzee) and Pongo pygmaeus (Orangutan). The analysis of physico-chemical characteristics, together with secondary and 3-D structure prediction of insulin in the different species, identified the phylogenetically most closely related species. The major findings are that genomic insulin from Human and Dog has the lowest genetic distance (0.13) of the mammalian species studied. Human and Guinea pig have the next lowest genetic distance (0.39) and are 69.1% identical at the amino acid level, whereas Human and Western gorilla have a genetic distance of 0.00, are 100% identical at the amino acid level and share a common node on the phylogenetic tree. The physico-chemical study also shows that these sequences have a high leucine content (18.2%) and a high instability index (>40), except for Sheep and Cattle, which have a lower leucine content and an instability index below 40. The sequence analysis among species has allowed us to infer the manner in which insulin has evolved over a million-year period. The results of this study provide rapid, comprehensive information for relating amino acid sequences to evolutionary conservation rates as well as molecular phylogenetics.
Keywords: Molecular Phylogenetics, Genomic Insulin, Multiple Sequence Alignment, Pairwise Distances, Physico-chemical Characteristics, Secondary Structure, 3D Structure Prediction
1. Introduction
Since the first appearance of life on Earth, development has proceeded from simple life forms to more complex ones. Progressive change over time, across many generations, has produced different species from a common ancestor. In evolutionary biology, the phylogenetic tree represents the evolutionary histories of organisms using molecular evidence. Evolutionary evidence is obtained from comparative studies of specific features and homologous characteristics of existing organisms, and is used to determine the evolutionary relationships among different species. Analyses of the evolutionary relationships of central molecular networks uncover underlying ancestral variation that biomedical studies can target to develop insights into, and interventions for, disease. The study of evolution changed dramatically with the discovery of genomic insulin, which serves as evolutionary evidence. Genomic sequences of insulin and insulin-like signaling molecules are rapidly evolving within amniotes. Evolutionary relationships among different mammalian species can therefore be studied by comparing their genomic insulin sequences.
Insulin is a very old protein that may have originated more than a billion years ago. Insulins share the same modular organization in their precursors, comprising an N-terminal signal peptide followed by three domains (A, B and C domains). Insulin is synthesized within the beta cells of the islets of Langerhans as the precursor preproinsulin. After removal of the signal peptide, proinsulin folds into its correct tertiary structure, and the C-peptide is then removed by endoproteolytic cleavage. The resulting A and B chains are covalently linked to form mature insulin.
The mature structure of insulin consists of two polypeptide chains (a B-chain of 30 amino acids and an A-chain of 21 amino acids) cross-linked by disulfide bonds. The domains of genomic insulin in different mammalian species are structural building blocks that define the function of the protein through their various combinations, and they also serve as units of the evolutionary history of proteins and of the genomes that contain them.
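The segment boundaries given in the abstract (signal peptide 1-24, B-chain 25-54, C-peptide 55-89, A-chain 90-110) can be checked against the chain lengths quoted above. The following is a minimal sketch in Python; the function name `split_preproinsulin` and the placeholder string are illustrative assumptions and do not represent a real insulin sequence.

```python
# Segment boundaries (1-based, inclusive) as stated in the abstract.
SEGMENTS = {
    "signal_peptide": (1, 24),
    "B_chain": (25, 54),
    "C_peptide": (55, 89),
    "A_chain": (90, 110),
}


def split_preproinsulin(seq):
    """Cut a 110-residue preproinsulin sequence into its named segments."""
    if len(seq) != 110:
        raise ValueError("expected 110 residues, got %d" % len(seq))
    # Convert 1-based inclusive coordinates to 0-based half-open slices.
    return {name: seq[start - 1:end] for name, (start, end) in SEGMENTS.items()}


# Hypothetical placeholder of the right length (NOT a real sequence).
placeholder = "".join(chr(ord("a") + i % 26) for i in range(110))
parts = split_preproinsulin(placeholder)
lengths = {name: len(segment) for name, segment in parts.items()}
print(lengths)
```

With these coordinates the B-chain and A-chain come out at 30 and 21 residues, matching the mature chain lengths stated in the text, and the four segments together cover all 110 positions.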
Tracing the entire evolutionary history of each insulin domain, from the genesis of a new domain or domain combination to the loss or transfer of a domain from a specific genome, can further our understanding of some of the fundamental unsolved problems in evolutionary relationships, such as those concerning the early evolution of life.
2. Methods and Materials
2.1. Obtaining Genomic Insulin Sequences
Genomic insulin sequences of eight different species were retrieved from the NCBI protein database in FASTA format [http://www.ncbi.nlm.nih.gov/]. The accession numbers of these sequences are Homo sapiens (AAA59172.1), Bos taurus (ACD35246.1), Cavia porcellus (AAA37041.1), Canis lupus familiaris (P01321.1), Gorilla gorilla (Q8HXV2.1), Ovis aries (AAB60625.1), Pan troglodytes (P30410.1) and Pongo pygmaeus (Q8HXV2.1).
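Once the records have been downloaded, FASTA text can be read into memory with a few lines of code. The sketch below is a minimal parser written for illustration; the function name `parse_fasta` and the two toy records are assumptions and are not the database entries listed above.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text.

    The identifier is the first whitespace-delimited token after '>'.
    """
    chunks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks[current].append(line)
    return {name: "".join(lines) for name, lines in chunks.items()}


# Toy records for illustration only (NOT real insulin sequences).
example = """>toyA placeholder record
ACDEFG
HIKLMN
>toyB placeholder record
ACDQFG
"""
seqs = parse_fasta(example)
print(seqs)
```

Keying the result by the first token of the header keeps later lookups short while the full description line remains available in the source file.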
2.2. Multiple Sequence Alignment
These genomic insulin sequences were analyzed with ClustalW [http://www.ebiac.uk/clustalw/] for multiple sequence alignment. The sequences were also analyzed using Geneious 7.1.2, in which the ClustalW algorithm was used to align multiple sequences in parallel.
2.3. Construction of Phylogenetic Tree
The sequences were aligned with ClustalW in MEGA5, and the output file of this program was used to construct the phylogenetic tree by the Maximum Likelihood method, with 10000 replicates for the bootstrap statistical test.
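The bootstrap test rests on resampling alignment columns with replacement; a tree is then rebuilt from each pseudo-alignment, and the share of replicates that recover a given clade is reported as its support. The sketch below illustrates only the resampling step and does not reproduce MEGA5; the function names and the toy alignment are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same column indices are used for every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of pseudo-alignments; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment for illustration only (NOT the insulin alignment).
toy = {"A": "MKLV-A", "B": "MKIV-A", "C": "MRLVGA"}
reps = bootstrap_replicates(toy, n_replicates=100)
print(reps[0])
```

Each replicate has the same number of columns as the input, which is why some original columns appear several times and others not at all.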
2.4. Pairwise Distances
Genetic distances among these sequences were measured using MEGA5.
2.5. Physico-chemical Characteristics
The ExPASy ProtParam tool was used to compute different properties of genomic insulin, including the theoretical isoelectric point [pI], the numbers of positively [Arg + Lys] and negatively charged [Asp + Glu] amino acids, the extinction coefficient, the instability index, the aliphatic index and the Grand Average of Hydropathicity [GRAVY].
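Two of these quantities, the amino acid composition and the counts of charged residues, follow directly from counting letters in the sequence. The sketch below shows that counting step only and is not ProtParam itself; the function names and the toy peptide are assumptions.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each amino acid present in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def charged_residue_counts(seq):
    """Return (negative, positive) counts: Asp+Glu and Arg+Lys."""
    counts = Counter(seq)
    negative = counts["D"] + counts["E"]
    positive = counts["R"] + counts["K"]
    return negative, positive


# Toy peptide for illustration only (NOT an insulin sequence).
toy = "MKLLDELLRA"
print(composition_percent(toy)["L"])
print(charged_residue_counts(toy))
```

The leucine percentage reported in the results is the `"L"` entry of such a composition table.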
The crystallization tendency of these sequences was determined with the CRYSTALP2 web server [http://biomine-ws.ece.ulberta.ca/CRYSTALP2.html].
2.6. Characterization of Secondary Structure
Secondary structure properties of these sequences, including Alpha helix, 310 Helix, Pi helix, Beta bridge, Extended strand, Beta turn, Bend region, Random coil, Ambiguous states and other states, were analyzed with SOPMA [http://npsapbil.ibcp.fr/cgibin/npsa_automat.pI?page=/NPSA/npsa_sopma.html].
2.7. 3D Structure Prediction
The 3D structures of genomic insulin from these mammalian species were predicted using the 3D-JIGSAW [version 3] comparative modeling server [http://bmm.cancerresearchuk.org/~populus/populus_submit.html].
3. Results & Discussion
3.1. Multiple Sequence Alignment
In the multiple sequence alignment, green, green-brown and red boxes mark positions that are 100%, 30-99% and 0-29% identical between species, respectively (Figure 1).
Genomic insulin was found to be highly conserved among the eight mammalian species. The sequences of Homo sapiens (Human) and Gorilla gorilla (Western gorilla) were found to be the most highly conserved.
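The three identity classes used for the colored boxes can be approximated column by column. The sketch below is an assumption-laden illustration: it scores each column by the share of sequences carrying the most common residue, which may differ from the identity measure that Geneious uses, and the toy alignment is not the insulin alignment.

```python
from collections import Counter


def column_identity(alignment):
    """Percent of sequences sharing the most common character, per column.

    Gap characters are counted like any other character in this sketch.
    """
    seqs = list(alignment.values())
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(n_cols):
        column = [s[i] for s in seqs]
        top_count = Counter(column).most_common(1)[0][1]
        result.append(100.0 * top_count / len(seqs))
    return result


def identity_class(percent):
    """Map a column identity to the three classes used in Figure 1."""
    if percent == 100.0:
        return "100%"
    if percent >= 30.0:
        return "30-99%"
    return "0-29%"


# Toy alignment for illustration only.
toy = {"s1": "MKLV", "s2": "MKIA", "s3": "MRLG", "s4": "MKLS"}
identities = column_identity(toy)
classes = [identity_class(p) for p in identities]
print(list(zip(identities, classes)))
```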
3.2. Phylogenetic Connection
Today, molecular phylogenetics helps to construct and evaluate hypotheses about historical patterns of ancestry, divergence and descent in the form of a phylogenetic tree. The tree shows the evolutionary relationships predicted from the multiple sequence alignment.
In the phylogenetic tree based on genomic insulin sequences, the eight mammalian species, Homo sapiens (Human), Bos taurus (Cattle), Cavia porcellus (Guinea pig), Canis lupus familiaris (Dog), Gorilla gorilla (Western gorilla), Ovis aries (Sheep), Pan troglodytes (Chimpanzee) and Pongo pygmaeus (Orangutan), form phylogenetically related clusters or sub-groups (Figure 2). The tree revealed two monophyletic groups among six major phylogenetic groups. These monophyletic groups are (i) Homo sapiens (Human) and Gorilla gorilla (Western gorilla) and (ii) Bos taurus (Cattle) and Ovis aries (Sheep); the members of each group share a relatively recent common ancestor and are therefore phylogenetically closer to each other. The result also exhibits a fundamental diversity among the genomic insulins: Homo sapiens (Human) is more closely related to Gorilla gorilla (Western gorilla) than to Pan troglodytes (Chimpanzee) and Pongo pygmaeus (Orangutan), and Bos taurus (Cattle) is more closely related to Ovis aries (Sheep) than to Canis lupus familiaris (Dog). The remaining species, Cavia porcellus (Guinea pig), forms the out-group and is the most distantly related of the eight mammalian species.
3.3. Pairwise Distances
The pairwise distance analysis among the eight mammalian species revealed the percent identity and divergence of each sequence pair. Table 1 shows that genomic insulin from Homo sapiens (Human) and Canis lupus familiaris (Dog) has the lowest genetic distance (0.13) of the mammalian species studied. Homo sapiens (Human) and Cavia porcellus (Guinea pig) have the next lowest genetic distance (0.39) and are 69.1% identical at the amino acid level, whereas Homo sapiens (Human) and Gorilla gorilla (Western gorilla) have a genetic distance of 0.00 and are 100% identical at the amino acid level (Table 1).
The number of amino acid substitutions per site between sequences is shown. Standard error estimates are shown above the diagonal and were obtained by a bootstrap procedure (10000 replicates). Analyses were conducted using the Poisson correction model. The analysis involved 8 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 105 positions in the final dataset. Evolutionary analyses were conducted in MEGA5.
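Under the Poisson correction model, the distance between two aligned sequences is d = -ln(1 - p), where p is the proportion of differing sites after columns with gaps or missing data have been removed. The sketch below illustrates that calculation under those stated assumptions; it is not the MEGA5 implementation, and the function names and toy alignment are hypothetical.

```python
import math


def complete_deletion(alignment, gap_chars="-?"):
    """Drop every column in which any sequence has a gap or missing-data symbol."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    keep = [i for i in range(n_cols)
            if all(alignment[name][i] not in gap_chars for name in names)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def p_distance(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy alignment for illustration only (NOT the insulin alignment).
toy = {"A": "MKLV-AGT", "B": "MKIVQAGT", "C": "MRLVQAGS"}
clean = complete_deletion(toy)
d_ab = poisson_distance(clean["A"], clean["B"])
identity_ab = 100.0 * (1.0 - p_distance(clean["A"], clean["B"]))
print(round(d_ab, 4), round(identity_ab, 1))
```

Because the correction grows faster than p, the distance slightly exceeds the raw proportion of differences, and identical sequences give a distance of zero.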
3.4. Physico-chemical Characteristics
The physico-chemical characteristics (Table 2) show that all of the genomic insulins are mildly acidic molecules, as indicated by the theoretical pI, which reflects the numbers of negatively (Asp+Glu) and positively (Arg+Lys) charged amino acids. Notably, the instability index indicates that all of the genomic insulins are unstable, with a high leucine content (18.2%) (Table 3), except those of Bos taurus (Cattle) and Ovis aries (Sheep).
| Genomic insulin | Theoretical pI | '-' charged residues (Asp+Glu) | '+' charged residues (Arg+Lys) | Extinction coefficient | Instability index | Aliphatic index | GRAVY | Crystallization coefficient |
| Canis lupus familiaris | 5.61 | 11 | 9 | 17335 | 44.73 | 107.36 | 0.207 | 0.728 |

| Canis lupus familiaris | 6.4 | 4.5 | 3.6 | 4.5 | 5.5 | 8.2 | 5.5 | 9.1 | 2.7 | 1.8 | 16.4 | 1.8 | 3.6 | 2.7 | 3.6 | 3.6 | 6.4 | 1.8 | 2.7 | 5.5 |
Additionally, a high aliphatic index indicates greater thermal stability. The aliphatic indices of Canis lupus familiaris (107.36), Pan troglodytes (103.73), Homo sapiens (102.91), Gorilla gorilla (102.91) and Pongo pygmaeus (102.00) classify them as the most thermostable, closely followed by the other mammalian species, Ovis aries (97.62), Bos taurus (95.81) and Cavia porcellus (93.03).
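The aliphatic index is commonly computed from the mole percentages of alanine, valine, isoleucine and leucine as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)). The sketch below assumes that this standard formula is the one behind the reported values; the function name and the toy peptide are hypothetical.

```python
def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu.

    ASSUMPTION: uses the commonly cited coefficients 2.9 (Val) and
    3.9 (Ile + Leu); the source does not state the formula explicitly.
    """
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide: 10% each of A, V, I and L, the rest glycine.
toy = "AVILGGGGGG"
print(aliphatic_index(toy))
```

Because leucine carries the largest coefficient, a leucine-rich sequence such as insulin tends to score highly on this index.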
Furthermore, the Grand Average of Hydropathicity (GRAVY) index indicates the hydrophobic or hydrophilic character of a protein. The GRAVY values indicate that all of the genomic insulins are hydrophobic except that of Cavia porcellus (-0.017). The extinction coefficients of all sequences are high, except for that of Cavia porcellus (Guinea pig). This prediction is useful for studying protein-protein interactions.
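GRAVY is the mean hydropathy value over all residues, so a positive score points to a hydrophobic protein and a negative score to a hydrophilic one. The sketch below assumes the Kyte-Doolittle hydropathy scale, which is the scale usually associated with GRAVY; the function name and toy peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values (ASSUMPTION: the scale used for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Sum of hydropathy values divided by the number of residues."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Toy peptides for illustration only.
print(gravy("AG"))   # mostly non-polar residues, positive score
print(gravy("RK"))   # charged residues, negative score
```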
The crystallization coefficient values of the genomic insulins range from 0.406 to 0.751 (Table 2).
3.5. Characterization of Secondary Structure
The secondary structure analysis of the eight mammalian genomic insulin sequences shows a predominance of random coil, followed by alpha helix, extended strand and beta sheet, respectively (Table 4). The high proportion of random coil bears crucial significance for protein tertiary structure and related functions. For Canis lupus familiaris (Dog), the alpha helix content (52.73%) exceeds the random coil content.
| Genomic insulin | α helix | 310 Helix | Pi Helix | β bridge | Extended strand | β turn | Bend region | Random coil | Ambiguous states |
| Canis lupus familiaris | 52.73 | 0.00 | 0.00 | 0.00 | 11.82 | 10.00 | 0.00 | 25.45 | 0.00 |
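Percentages such as those in the row above can be derived from a per-residue string of predicted states. The sketch below is an illustration under stated assumptions: the one-letter state codes are assumed rather than taken from the source, and the state string is hypothetical, built only so that its counts over 110 residues reproduce the Canis lupus familiaris row.

```python
from collections import Counter

# ASSUMPTION: one-letter codes for the states named in the table header.
STATE_NAMES = {
    "h": "alpha helix", "g": "310 helix", "i": "pi helix",
    "b": "beta bridge", "e": "extended strand", "t": "beta turn",
    "s": "bend region", "c": "random coil", "?": "ambiguous states",
}


def state_percentages(states):
    """Percentage of residues in each state, rounded to two decimals."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states.lower())
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError("unknown state codes: %s" % sorted(unknown))
    n = len(states)
    return {name: round(100.0 * counts[code] / n, 2)
            for code, name in STATE_NAMES.items()}


# Hypothetical arrangement: only the counts (58, 13, 11, 28) are meaningful.
toy_states = "h" * 58 + "e" * 13 + "t" * 11 + "c" * 28
percentages = state_percentages(toy_states)
print(percentages)
```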
3.6. 3D Structure of Genomic Insulin
The 3D structure prediction produced five models for each of the given sequences, ranked according to their Ramachandran plot scores. The highest-ranked 3D model was selected for each sequence (Figures 3-10).
4. Conclusion
Molecular phylogeny is useful for explaining similarities and differences between organisms and for establishing relationships among populations, which can be considered further evidence of evolution. This study shows that mammalian species whose genomic insulin sequences are more similar are more closely related than species with fewer sequence similarities, and it has allowed us to infer the manner of evolution over a million-year period. The investigation of these mammalian genomic insulins provides rapid, comprehensive information for relating amino acid sequences to evolutionary divergence rates as well as molecular phylogenetics.
Acknowledgements
The authors express their respect and gratitude to the founder and Vice-Chancellor, Professor Nurul Islam, University of Science and Technology Chittagong, Bangladesh, and to Professor Nurul Absar, Head, and all the teachers of the Department of Biochemistry and Biotechnology, University of Science and Technology Chittagong, Bangladesh, for their helpful suggestions and useful comments regarding this research.