Functional Annotation and Classification of the Hypothetical Proteins of Neisseria meningitidis H44/76
Archana Singh1, #, Bharti Singal2, #, Onkar Nath2, Indrakant Kumar Singh2, *
1Department of Botany, Hans Raj College, University of Delhi, New Delhi, India
2Molecular Biology Research Laboratory, Department of Zoology, Deshbandhu College (University of Delhi), Kalkaji, New Delhi, India
To cite this article:
Archana Singh, Bharti Singal, Onkar Nath, Indrakant Kumar Singh. Functional Annotation and Classification of the Hypothetical Proteins of Neisseria meningitidis H44/76. American Journal of Bioscience and Bioengineering. Vol. 3, No. 5, 2015, pp. 57-64. doi: 10.11648/j.bio.20150305.16
Abstract: Neisseria meningitidis is a parasitic Gram-negative bacterium of the family Neisseriaceae (Proteobacteria) that causes several human diseases, including meningitis and septicemia. One of its strains, H44/76, is naturally transformable; it is therefore important to identify possible novel drug targets and to develop serogroup B vaccines against this opportunistic pathogen. The complete genome of N. meningitidis strain H44/76 contains 1961 coding genes, of which 544 encode hypothetical proteins (HPs). Because of their low homology to other known proteins, HPs may serve as potential drug targets. We performed an extensive functional analysis of these HPs with bioinformatics tools and assigned functions to 235 HPs, 202 of them with high confidence and 33 with lower confidence. In this study, we used a combination of recent tools to obtain information about the conserved regions, families, pathways, interactions, localization and virulence of each protein. We also categorized these proteins as transporters, regulators, enzymes, binding proteins and virulent proteins. The outcome of this study may contribute to a comprehensive understanding of pathogenesis, drug resistance, adaptability to the host and epidemic causes, and to drug discovery for the treatment of these diseases.
Keywords: Neisseria meningitidis, Hypothetical Proteins, Functional Annotation, Drug Targets
1. Introduction
N. meningitidis is a parasitic bacterium and an obligate nasopharyngeal human pathogen that causes severe diseases such as septicemia and meningitis [1,2]. In children and infants, it can reach the brain by invading the respiratory epithelium and then crossing the blood-brain barrier. The common symptoms are high fever, lethargy, confusion, nausea, neck stiffness, vomiting and petechial rash. Surveillance is of utmost importance for a better understanding of meningococcal diseases, as they may lead to epidemics and outbreaks. The strain H44/76 is closely related to strain MC58, which also belongs to serogroup B. N. meningitidis can be classified into 13 serogroups on the basis of the immunological reactivity of its capsular polysaccharide, of which 5 (A, B, C, Y and W) are the most common causes of disease.
The high natural transformation efficiency of this strain makes it important for the development of serogroup B vaccines. A vaccine against serogroup B has recently been developed that has the potential to reduce the mortality and morbidity associated with diseases caused by serogroup B strains. N. meningitidis has an enormous capacity to alter its surface structures, which enables it to escape the host's defense mechanisms.
Whole-genome sequencing has accelerated with the use of high-throughput techniques, and it is becoming necessary to give meaningful direction to the growing reservoir of genomic information available on the web. Computational sequence analysis tools play a crucial role in annotating novel genes. There is a huge repertoire of hypothetical proteins (HPs: proteins predicted by translating nucleic acid sequences but not yet characterized functionally or biochemically), which need to be characterized to gain adequate knowledge of the complete genomic and proteomic content of an organism. Functional annotation of these HPs not only helps in understanding the unknown metabolic pathways in which they are involved, but also helps in identifying unfamiliar functions of previously annotated proteins. Functionally annotated HPs can be used as targets in novel drug discovery and as potential biological markers [10, 11].
The genome of H44/76 is 2.18 Mb in size, with 2,480 reads. It encodes 1961 proteins, of which 544 are putative HPs. In this study, the HPs of N. meningitidis strain H44/76 were analyzed using advanced bioinformatics tools. A wide range of tools was used to predict the physicochemical properties, subcellular localization, domains, motifs, helices and protein families of the HPs. We successfully annotated 235 HPs of strain H44/76. The HPs analyzed here account for approximately 27% of the whole proteome of N. meningitidis strain H44/76, a remarkably large fraction whose function was unknown. This shows that the annotation of HPs should not be underestimated as a means of identifying probable targets for pharmacological studies.
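The "approximately 27%" figure can be checked against the counts stated in this paper. Taking the 525 unique HPs as a fraction of the 1961 proteins gives 26.8%, while the 544 annotated HPs give a slightly higher 27.7%; only the division is added here.

```latex
\frac{525}{1961} \approx 0.268 \;(26.8\%), \qquad \frac{544}{1961} \approx 0.277 \;(27.7\%)
```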
2. Materials and Methods
The URLs of all tools, servers and databases used for the functional annotation of the HPs in the genome of N. meningitidis H44/76 (NCBI Genome BioProject PRJNA61079) are given in Table 1.
| S.No. | Tool/Server/Database | URL |
|---|---|---|
| Sequence Homology Search | | |
| 1 | BLAST: Basic Local Alignment Search Tool | http://www.ncbi.nlm.nih.gov/BLAST/ |
| 2 | ExPASy – ProtParam tool | http://web.expasy.org/protparam/ |
| 3 | PSORT B v3.0 | http://www.psort.org/psortb/ |
| 18 | Conserved Domain Database | http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi |
2.1. Sequence Retrieval
Of the 1961 proteins of this strain, 544 have been designated as HPs; on closer analysis, only 525 of these were found to be unique, so further analysis was performed on these 525 unique HPs only. The Gene IDs of these 525 HPs of H44/76 were retrieved from the NCBI Genome database (http://www.ncbi.nlm.nih.gov/genome) and queried against the UniProt database (http://www.uniprot.org/uniprot) to retrieve their UniProt IDs, primary accession numbers and protein sequences in FASTA format.
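Reducing the 544 retrieved HP records to the 525 unique ones can be sketched as follows. This is a minimal Python sketch, not the authors' pipeline: it assumes the sequences were downloaded as UniProt-style FASTA (headers of the form `db|ACCESSION|ENTRY_NAME ...`) and treats two records as duplicates when their sequences are identical, which is an assumption about what "unique" meant here. The accession check uses UniProt's documented accession pattern, which also catches identifiers typed with the letter O in place of the digit 0 (e.g. `FOMNK9` instead of `F0MNK9`).

```python
import re
from typing import Dict, Iterable, List, Tuple

# UniProt accession pattern (6- or 10-character accessions), as documented by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(acc: str) -> bool:
    """True if the whole string is a well-formed UniProt accession."""
    return ACCESSION_RE.fullmatch(acc) is not None


def accession_from_header(header: str) -> str:
    """Extract the accession from a UniProt-style header, e.g. 'tr|E6MU35|ENTRY_NAME desc'."""
    return header.split("|")[1]


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unique_by_sequence(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the accessions that share it."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        groups.setdefault(seq, []).append(accession_from_header(header))
    return groups
```

The number of keys in the returned dictionary is the count of unique proteins; any key mapped to more than one accession marks a duplicated record.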
2.2. Homology Search
Functional annotation of HPs can be performed by homology search, i.e. by inferring function from regions conserved between the sequences of different organisms. Here, we used BLASTp, in which protein sequences are queried against a database to find homologs across organisms. BLASTp searches were run against the non-redundant (nr) protein sequence database with an E-value cutoff of 0.005. The homologs of each HP were analyzed, and the best hits were selected for function prediction on the basis of their percent query coverage and identity. The results are presented in Supplementary Table A.3.
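The hit-selection step can be sketched in Python, assuming BLAST+ `blastp` was run with a custom tabular output (`-outfmt "6 qseqid sseqid pident qcovs evalue bitscore"`) so that query coverage is available per hit. The column choice and the ranking by query coverage first and identity second are assumptions based on the description above; the sketch does not reproduce any manual curation of the hits.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple

# Assumed invocation (BLAST+), producing the six columns parsed below:
#   blastp -query hps.fasta -db nr -evalue 0.005 \
#          -outfmt "6 qseqid sseqid pident qcovs evalue bitscore" -out hits.tsv
EVALUE_CUTOFF = 0.005


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qcovs: float
    evalue: float
    bitscore: float


def read_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        q, s, pident, qcovs, evalue, bits = row[:6]
        yield Hit(q, s, float(pident), float(qcovs), float(evalue), float(bits))


def best_hits(hits: Iterable[Hit], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Per query, keep the hit with the highest coverage, then identity, among hits passing the E-value cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > cutoff:
            continue
        current = best.get(hit.qseqid)
        if current is None or (hit.qcovs, hit.pident) > (current.qcovs, current.pident):
            best[hit.qseqid] = hit
    return best
```

Queries with no hit at or below the cutoff are simply absent from the result, which corresponds to HPs without detectable homologs.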
2.3. Physicochemical Characterization
Physicochemical characterization was carried out with the ProtParam server on ExPASy. It was used to compute theoretical physicochemical properties such as molecular weight (Mw), isoelectric point (pI), extinction coefficient (M⁻¹ cm⁻¹), instability index classification, aliphatic index and grand average of hydropathicity (GRAVY). The results of this analysis are summarized in Supplementary Table A.1.
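Two of these indices have simple published definitions: GRAVY is the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index (Ikai) weights the mole percentages of Ala, Val, Ile and Leu. The following stdlib sketch assumes sequences contain only the 20 standard residues; ProtParam's own handling of ambiguous residues is not reproduced. Biopython's `ProteinAnalysis` class (in `Bio.SeqUtils.ProtParam`) also provides a `gravy()` method.

```python
# Kyte-Doolittle hydropathy values, the scale behind GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _checked(seq: str) -> str:
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    return seq


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = _checked(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = _checked(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))
```

A negative GRAVY indicates an overall hydrophilic sequence, and a high aliphatic index indicates a large share of Ala, Val, Ile and Leu.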
2.4. Sub-cellular Localization
Prediction of protein subcellular localization is important because it shows where a protein resides in the cell, which in turn aids in predicting its function. This information can also be used to refine the list of drug targets in a cell; surface proteins are of particular importance for this purpose. Features of the primary structure, such as transmembrane helices or signal peptides, determine the subcellular localization of a protein. We used PSORT B v3.0, PSLpred and CELLO for subcellular localization prediction. SignalP 4.1, which is based on artificial neural networks, was used to predict the presence and location of signal peptides. SecretomeP was used to predict non-classically secreted (signal peptide-independent) proteins by considering post-translational and localization information in the sequence. SOSUI was used to classify proteins as membrane or soluble proteins and to predict the number of transmembrane helices. HMMTOP and TMHMM [21,22] were used to predict transmembrane helices in the HPs. The results of these tools are presented in Supplementary Table T2.
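The text does not state how disagreements among PSORT B, PSLpred and CELLO were resolved. One simple possibility, shown here as a hypothetical sketch rather than the authors' procedure, is a majority vote over labels that have first been mapped to the five Gram-negative compartments. The label strings, the tool keys and the `min_votes` threshold are all assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_location(predictions: Dict[str, Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Majority vote over per-tool location labels; None if there is no clear winner.

    Hypothetical rule: a location is accepted only if at least `min_votes`
    tools agree and no other location has the same number of votes.
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    if top_count < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between two locations
    return top_label
```

With three predictors and `min_votes=2`, an HP receives a location only when at least two tools agree; proteins with three different predictions stay unresolved.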
2.5. Functional Prediction
We used a variety of tools to obtain precise and accurate functional predictions for the 525 unique HPs. The databases and tools used for this purpose are SMART, INTERPRO, MOTIF [25-28], CATH, SUPERFAMILY, PANTHER, Pfam, Conserved Domain Database [33, 34], ScanProsite, HAMAP, ProtoNet and SVMprot. SMART was used to detect the presence of domains; it can also identify signal peptides, coiled-coil regions, transmembrane helices and compositionally biased regions. The results are summarized in Supplementary Table T3.2.
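The section lists the databases but not how their outputs were merged into the high- and low-confidence calls reported in the abstract. The sketch below shows one hypothetical agreement rule: a function is accepted with high confidence when at least `high_min` tools support it and with low confidence when at least `low_min` do. Both thresholds, and the assumption that each tool's output has already been normalized to a shared functional label, are illustrative only.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def assign_function(
    calls: Dict[str, Optional[str]], high_min: int = 3, low_min: int = 2
) -> Tuple[Optional[str], str]:
    """Return (function, confidence) from per-tool calls using hypothetical thresholds."""
    votes = Counter(call for call in calls.values() if call)
    if not votes:
        return None, "unassigned"
    ranked = votes.most_common()
    function, support = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == support:
        return None, "conflicting"  # two functions are equally supported
    if support >= high_min:
        return function, "high"
    if support >= low_min:
        return function, "low"
    return None, "unassigned"
```

Requiring agreement across independent databases (domain, family and motif resources) is a common way to reduce false annotations from any single tool.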
2.6. Virulence Analysis
Virulence factors are gene products that enable a microorganism to establish itself as a pathogen by mediating its interaction with host cells. Virulence factors are numerous and include cell surface proteins and carbohydrates (for attachment and protection, respectively), bacterial toxins and hydrolytic enzymes. To identify such factors in N. meningitidis H44/76, we used VICMpred and VIRULENTpred. Both methods are based on Support Vector Machines (SVMs), with reported accuracies of 70.75% and 81.8%, respectively. The results are presented in Supplementary Table A.6.
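SVM-based predictors such as VIRULENTpred and VICMpred classify numeric encodings of each sequence; the VIRULENTpred results in Table 3 include amino acid composition and dipeptide composition among the input features. The following stdlib sketch shows those two encodings as percentages (20 and 400 features). The exact normalization used by the servers is an assumption here, and the SVM itself (for example scikit-learn's `SVC`) is left out.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq: str) -> List[float]:
    """20 features: percentage of each standard residue in a non-empty sequence."""
    seq = seq.upper()
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """400 features: percentage of each overlapping dipeptide.

    Pairs containing non-standard residues count toward the total but have
    no feature of their own, so such sequences sum to less than 100.
    """
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * pairs[dp] / total for dp in DIPEPTIDES]
```

Dipeptide composition captures local residue order, which plain amino acid composition discards; this is one reason such predictors offer both encodings.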
2.7. Protein-Protein Interaction Prediction
System-level understanding of cellular functions requires correct annotation of the functional interactions among proteins, and several computational tools are available to detect interacting partners. Here, we used the STRING 9.1 database to search for known and predicted protein interaction partners (Supplementary Table A.5). These associations can be physical or functional. STRING is an integrated resource that draws on genomic context, high-throughput experiments, co-expression and conservation data, and prior knowledge mined from the literature.
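STRING reports a combined score that integrates its evidence channels. The sketch below uses a simplified form of that integration, treating channels as independent so that the combined score is 1 - ∏(1 - s); STRING's actual scheme also removes and re-adds a prior probability, which is omitted here, so values will differ slightly. The 0.7 cutoff corresponds to STRING's standard "high confidence" setting. Protein and partner names in the usage are hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple

# STRING's standard confidence cut-offs: low 0.15, medium 0.4, high 0.7, highest 0.9.
HIGH_CONFIDENCE = 0.7


def combine_channels(channel_scores: Dict[str, float]) -> float:
    """Simplified combination 1 - prod(1 - s) over channels, assuming independence.

    STRING's prior correction is deliberately left out of this sketch.
    """
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


def confident_partners(
    edges: Dict[Tuple[str, str], Dict[str, float]], threshold: float = HIGH_CONFIDENCE
) -> List[Tuple[str, str, float]]:
    """Return (protein, partner, combined score) for edges at or above threshold, best first."""
    scored = [(a, b, combine_channels(ch)) for (a, b), ch in edges.items()]
    kept = [edge for edge in scored if edge[2] >= threshold]
    return sorted(kept, key=lambda edge: edge[2], reverse=True)
```

Two moderate, independent lines of evidence (for example 0.5 from co-expression and 0.5 from text mining) combine to 0.75 and therefore pass a high-confidence filter that neither would pass alone.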
3. Results and Discussion
3.1. Sequence Analysis
BLASTp was used to analyze the HP sequences on the basis of homology. Of the 525 HPs analyzed, 334 were found to have homologs (Supplementary Table A.3). These results were combined with those of the other analyses for precise functional annotation of the HPs of N. meningitidis. For example, HP E6MU35 was found to be similar to alanine racemase, and HP E6MU60 was found to be homologous to cyanate hydratase.
3.2. Physicochemical Characterization
On the basis of the instability index computed by ProtParam, 311 proteins were predicted to be stable (Supplementary Table A.1). The remaining HPs had instability indices ranging from 40.01 to 82.07 and were classified as unstable, since ProtParam treats proteins with an instability index above 40 as unstable. ProtParam was also used for the other physicochemical characterizations of the HPs. pI prediction helps in developing the buffer systems that are crucial for isoelectric focusing. The extinction coefficient was predicted from the numbers of Cys, Trp and Tyr residues in the protein sequence. Knowledge of a protein's extinction coefficient is important in drug development: it allows protein concentration to be estimated from absorbance at 280 nm, and it helps in studying protein-ligand and protein-protein interactions. The aliphatic index depends on the proportion of aliphatic residues (Ala, Val, Ile and Leu) in the sequence; a higher proportion of aliphatic residues increases the thermal stability of globular proteins, so the higher the aliphatic index, the greater the protein's thermal stability. The GRAVY value indicates the extent of protein-water interaction: HPs with low GRAVY scores are likely to interact better with water.
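The extinction-coefficient and stability calculations described above can be sketched as follows. The molar coefficients at 280 nm (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹) are the values ProtParam uses, and it reports two figures: one assuming all Cys residues form cystines and one assuming all are reduced. The instability index itself needs Guruprasad's 400-entry dipeptide weight table, which is not reproduced here, so the sketch takes a precomputed index as input (Biopython's `ProteinAnalysis.instability_index()` computes it). The Beer-Lambert helper is added for illustration.

```python
from typing import Tuple

# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as used by ProtParam.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125


def extinction_coefficients(seq: str) -> Tuple[int, int]:
    """Return (all Cys paired as cystines, all Cys reduced)."""
    seq = seq.upper()
    reduced = seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR
    cystines = seq.count("C") // 2  # each cystine is a pair of Cys residues
    return reduced + cystines * EPS_CYSTINE, reduced


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


def stability_class(instability_index: float) -> str:
    """ProtParam convention: an instability index above 40 predicts an unstable protein."""
    return "unstable" if instability_index > 40 else "stable"
```

A sequence with no Trp, Tyr or Cys has a predicted extinction coefficient of zero, so its concentration cannot be estimated from absorbance at 280 nm.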
3.3. Subcellular Localization Analysis
We classified the 525 HPs into five subcellular locations, namely cytoplasmic, inner membrane, outer membrane, periplasmic and extracellular, which are characteristic of Gram-negative bacteria. Of these 525 HPs, 219 are predicted to be cytoplasmic, 84 to be membrane-bound (inner or outer membrane), 56 to be periplasmic and the rest to be extracellular (Supplementary Table A.2). We found 150 HPs with transmembrane helices (TMHs). Prediction of membrane proteins is important because they are vital for survival. TMHs form the membrane-spanning regions of these proteins, which are key components in cell-cell signaling, ion and solute transport and self-recognition. Membrane-bound receptors are highly significant to the pharmaceutical industry.
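Counting HPs with at least one predicted TMH can be sketched as a parser for TMHMM 2.0's one-line-per-protein "short" output, which reports the number of predicted helices in a `PredHel=` field. The field layout assumed here and the example lines in the usage are illustrative, not output from this study.

```python
import re
from typing import Dict, Iterable, List

PRED_HEL = re.compile(r"\bPredHel=(\d+)")


def helix_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each protein ID (first whitespace-separated field) to its predicted number of TMHs."""
    counts: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = PRED_HEL.search(line)
        if match:
            counts[line.split()[0]] = int(match.group(1))
    return counts


def tmh_proteins(counts: Dict[str, int], min_helices: int = 1) -> List[str]:
    """IDs predicted to carry at least `min_helices` transmembrane helices, sorted."""
    return sorted(pid for pid, n in counts.items() if n >= min_helices)
```

Raising `min_helices` narrows the list to polytopic membrane proteins, which are typical of transporters.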
3.4. Functional Analysis
We analyzed the 525 hypothetical protein sequences of N. meningitidis strain H44/76 further by searching for functional motifs, domains, families, superfamilies and clusters. The results were inferred from the predictions of recent bioinformatics tools (Supplementary Tables A.3 and A.4). We assigned functions to 202 proteins with high confidence and categorized them into 91 enzymes, 25 binding proteins, 28 transporters, 9 immunity proteins, 6 bacteriophage-related proteins and 43 regulatory proteins, as illustrated in Fig. 1. Well-conserved domains were found in 167 HPs. Many HPs contain domains with enzymatic activities and are predicted to be hydrolases, oxidoreductases, transferases, isomerases, lyases, ligases, kinases, phosphatases, etc.
3.4.1. Enzymes
In every organism, enzymes affect metabolic processes directly or indirectly. Enzymes produced by bacteria are crucial for their survival in the host, where pathogens use nutrients taken from the host to trigger pathogenicity and to survive, persist and grow. Knowledge of these enzymes provides a broad understanding of the interactions between pathogen and host.
| S.No. | Sub-category | Predicted hypothetical proteins |
|---|---|---|
| 1. | Oxidoreductases | HP E6MWI9, HP E6N024, HP E6MTY1, HP F0MKA2, etc. |
| 2. | Transferases | HP E6MY14, HP F0MJ56, HP E6MZG2, etc. |
| 3. | Hydrolases | HP E6MZ19, HP F0MM28, HP F0MKI8, HP E6MUL4, HP E6MX06, HP F0MIS3, etc. |
| 4. | Lyases | HP E6MU60, HP E6MWH8, HP E6MXD6, HP F0MNU9 |
| 5. | Isomerases | HP E6MYS2, HP E6MYS3, HP F0ML13, HP E6MWT7, HP E6MZY1, HP E6MU35 |
| 6. | Ligases | HP E6MV50, HP E6MY93 |
We identified 91 HPs as enzymes and classified them further into subcategories; a few examples are listed in Table 2.
3.4.2. Transporters
The survival of the pathogen N. meningitidis H44/76 also depends on its ability to extract essential nutrients from the host (http://mmbr.asm.org/content/68/1/154.full). Import and export of these substances take place with the help of transporter proteins. In our analysis of strain H44/76, we annotated 28 HPs as transporters, functioning as membrane transporters, carriers and receptors. HP E6MWB7 and HP E6MXV5 were predicted to be divalent cation transporters. HP E6MX75, HP E6MUK8 and HP E6MZ77 were identified as members of the sulfite exporter TauE/SafE family.
3.4.3. Binding Proteins
We identified four HPs, namely HP E6MYN4, HP E6MYR5, HP E6MUN8 and HP E6MYY5, containing tetratricopeptide repeats. These repeats mediate protein-protein interactions and the assembly of multiprotein complexes, and proteins carrying them are known to participate in cell cycle regulation, mitochondrial and peroxisomal protein transport, transcriptional control and protein folding. HP F0MNK9, HP F0MJK0, HP F0MJK3, HP F0MJW3, HP F0MJW5 and HP F0MJW8 were identified as having hemagglutinin repeats. Such proteins act as adhesins, filamentous haemagglutinins or haem/haemopexin-binding proteins.
3.4.4. Regulatory Proteins
Some proteins are involved in cellular processes such as the cell cycle, signaling pathways, replication, transcription and translation. These proteins are crucial to the pathogenesis of the organism. HP E6MX02, HP F0MXL4 and HP E6N0E7 were identified as having a Hedgehog/intein (Hint) domain. Hedgehog proteins, which are required for embryonic cell differentiation, are released as inactive precursors with an N-terminal signaling domain and a C-terminal autoprocessing domain.
Four HPs were identified as transcriptional regulators. HP E6MUE0 was predicted by several tools to belong to the IclR family of transcriptional regulators. Proteins of this family share a winged helix-turn-helix DNA-binding domain, which is responsible for their activity. IclR itself acts as a repressor of the acetate operon in Escherichia coli and Salmonella typhimurium.
HP E6MWQ9 was predicted to be the lipoate regulatory protein YbeD. YbeD is homologous to the regulatory domain of D-3-phosphoglycerate dehydrogenase and is thought to regulate the synthesis of lipoic acid.
3.4.5. Virulent Proteins
We identified 26 HPs as virulence factors (Table 3) on the basis of VIRULENTpred and VICMpred predictions (Supplementary Table A.6). Virulence-associated proteins can serve as potential drug targets in the drug discovery process. Lipooligosaccharides are found in the outer membrane of N. meningitidis and are responsible for septic shock and hemorrhage, which is attributed to the destruction of red blood cells by this endotoxin. The polysaccharide capsule and fimbriae of the pathogen also contribute to its virulence [52,53].
| S.No. | HP | Amino acid composition | Dipeptide composition | Higher order dipeptide composition | PSI-BLAST PSSM profiles | Cascade of SVMs and PSI-BLAST | VICMpred functional class |
|---|---|---|---|---|---|---|---|
| 21 | F0MJ95 | Virulent | Non-Virulent | Non-Virulent | Virulent | Virulent | Information and storage |
| 22 | F0MM93 | Virulent | Virulent | Virulent | Virulent | Virulent | Information and storage |
| 26 | E6MUE2 | Non-Virulent | Virulent | Virulent | Virulent | Virulent | Information and storage |
Six HPs were found to be associated with Mu-like prophage FluMu proteins, which remain functionally uncharacterized. Of these six, three (HP E6MWY8, HP E6MUE2 and HP E6MUE1) were predicted to be virulent. We also found nine HPs that function as immunity proteins. These nine proteins, E6MWC3, F0MJ74, F0MJ88, F0MJ91, F0MJ94, F0MJ95, F0MM93, E6MVL2 and F0MLM6, are predicted to participate in bacterial polymorphic toxin systems and are located next to toxin genes. Such immunity proteins are characterized by an all-alpha-helical fold and a conserved proline residue, and they usually contain a Tox-REase-1 or Tox-REase-6 family domain. VIRULENTpred predicted HP E6MVL2 to be non-virulent.
4. Conclusion
With the advent of new genomic data, it becomes essential to annotate the HPs encoded in the genomes of parasitic pathogens. Their structural and functional characterization will help us understand the role of these HPs in pathogenicity and thereby find new drug targets. We analyzed 525 HPs of N. meningitidis strain H44/76, which may help in prioritizing targets for experimentation among this reservoir of proteins. The 202 HPs whose functions were predicted with high confidence can be used in the development of drugs targeted at the pathogen, while the other 33 HPs need further analysis to reach the same level of confidence. Characterization on the basis of subcellular localization and physicochemical properties will help to distinguish membrane proteins and transporters specifically. Through docking studies, the structures of these proteins could aid in finding possible inhibitors that act on these targets. Furthermore, the 26 virulence proteins identified here are of utmost importance for understanding the pathways in which they are involved and how they help the pathogen to survive. These findings and annotations should support further characterization studies.
Acknowledgements
We express our sincere thanks to Dr. Ajay K. Arora, Principal, Deshbandhu College, University of Delhi, Kalkaji, New Delhi-110019, for providing the infrastructure and computational facilities required for this work.