In Silico Analysis of Occurrence of Tricorn Protease and Its Homologs

Tricorn protease is an archaeal protease that acts downstream of the proteasome and, together with its interacting aminopeptidases, degrades oligopeptides to free amino acids, thus playing an important role in protein turnover. This study reports a wide distribution of tricorn protease and its homologs in archaea and bacteria. The homologs were identified through a combination of PSI-BLAST, orthology clustering and domain predictions. Functionally important sites were identified through multiple sequence alignment with MAFFT v. 7. The aligned sequences were used to infer the phylogenetic relationships of tricorn protease and its homologs using MEGA v. 7. The functional associations of tricorn protease were predicted with the STRING database v. 10.0. This study identified tricorn protease homologs in archaea and in all bacterial phyla, complete with β-propeller, PDZ and catalytic domains. In eukaryotes, however, tricorn protease-like homologs appeared limited to Viridiplantae, stramenopiles and a basal metazoan, and were classified as non-peptidase homologs of unknown function. Conserved domain architecture retrieval revealed detectable homology between the C-terminal half of tricorn protease and the carboxyl-terminal proteases, which carry similar PDZ domains. This study therefore predicts functional conservation of the tricorn core catalytic domain in prokaryotes; given its role in cellular functions, targeting this protein or its functional homologs in prokaryotic pathogens could lead to the development of alternative therapeutic agents.


Introduction
The degradation of cytosolic proteins is mainly carried out by ATP-dependent proteases that employ molecular sieving. These include the proteasomes, which remove misfolded or unneeded proteins; however, the peptides they release are 7 to 9 amino acids long and therefore require further processing before they are of any use to the cell [1]. Studies have shown that these peptides are rapidly cleared, since their accumulation in the cytosol could interfere with important protein-protein interactions, and their degradation supplies amino acids for the synthesis of new proteins, which is essential for cell viability [2]. In Thermoplasma acidophilum, tricorn protease and its interacting factors F1, F2 and F3 act in vivo downstream of the proteasome, degrading proteasomal products to free amino acids for other metabolic processes [3]. Tricorn protease is a 720 kDa hexameric protease that can assemble into a giant icosahedral capsid, which might serve as the organizing center of a multi-proteolytic complex [4, 5]. The 120 kDa monomer has a mosaic structure comprising two open Velcro β-propellers of six and seven blades, a helical bundle, a PDZ domain and an α-β sandwich (PDB ID: 1K32, chain A). These five domains make up each of the six subunits, which assemble into a core particle with 3-2 symmetry [6, 7]. The N-terminal region of tricorn protease folds as a six-bladed β-propeller (β6) followed by a seven-bladed β-propeller (β7). β6 represents a gated exit from, and β7 a passage route into, the catalytic chamber, while the PDZ domain is interspersed between two carboxyl-terminal mixed α-β domains, C1 and C2 [7]. The nucleophilic serine is positioned at a helix entrance within subdomain C2, and the arrangement of the active-site tetrad suggests that peptide bonds are hydrolyzed as in the classical trypsin-like serine proteases [7].
Tricorn protease acts as a carboxypeptidase with di- and tripeptidase activity [8]; because it appears limited to some archaea and eubacteria, functional analogues exist in other organisms. Indeed, tetrahedral aminopeptidase (TET) has been shown to be a protein degradation machinery complementary to tricorn protease in Pyrococcus horikoshii, degrading peptides to free amino acids [9]. In eukaryotes, tripeptidyl peptidase II (TPP II) is a functional analogue of tricorn protease and has been shown to act downstream of the proteasome [10-12], where it is implicated in processing peptides for MHC class I antigen presentation [13, 14]. Since peptides for antigen presentation are generated by degrading proteasomal products, parasite-secreted oligopeptidases targeting such peptides would limit the host immune response and thus aid immune evasion [15].
The C-terminal region of tricorn protease has detectable structural homology with several C-terminal processing proteases (CTPs) of bacterial and eukaryotic origin [7], for example the eukaryotic D1 protease, which cleaves an 8-16-residue peptide from the C-terminus of the D1 protein, allowing light-driven assembly of the tetranuclear Mn cluster responsible for photosynthetic water oxidation in PSII [6]. C-terminal processing proteases contribute to virulence in some pathogenic bacteria, for instance Staphylococcus aureus [16]. In Escherichia coli, a tail-specific protease, also known as periplasmic protease, is involved in the C-terminal processing of penicillin-binding protein 3, and mutations in its gene result in altered cell morphology and increased susceptibility to thermal and osmotic stress owing to reduced cell-wall integrity [16]. In Brucella suis, CTP has been associated with protection against osmotic pressure, determination of cell morphology and survival during the acute and chronic phases of infection [17]. This study therefore investigated the occurrence of tricorn protease and its homologs in archaea, bacteria and eukaryotes.

BLAST Identification of Tricorn Protease Homologs
A position-specific iterated basic local alignment search tool (PSI-BLAST) search (https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE=Proteins&PROGRAM=blastp&RUN_PSIBLAST=on) [18] was initiated with the T. acidophilum tricorn protease sequence (AAC44621.1) against NCBI's non-redundant (NR) protein database. The search was used to determine patterns of conservation that aid in recognizing distant similarities, with a cut-off e-value of 1e-5 for significant matches.
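The e-value cut-off can also be applied programmatically once hits are exported. The Python sketch below assumes, as a hypothetical workflow rather than the one used in this study, that PSI-BLAST hits were saved in BLAST+ tabular format (`-outfmt 6`, twelve default columns with the e-value in column 11); the example rows and subject names are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, max_evalue=1e-5):
    """Yield hits whose e-value is at or below the cut-off (1e-5 in this study)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= max_evalue:
            yield hit


# Hypothetical rows; accessions and values are placeholders, not study data.
example = io.StringIO(
    "AAC44621.1\thitA\t54.0\t1050\t480\t10\t1\t1050\t1\t1048\t0.0\t1100\n"
    "AAC44621.1\thitB\t31.2\t980\t600\t40\t20\t990\t15\t970\t2e-40\t160\n"
    "AAC44621.1\thitC\t22.0\t120\t90\t5\t400\t515\t30\t150\t0.3\t35\n"
)
kept = [h["sseqid"] for h in significant_hits(example)]
print(kept)  # ['hitA', 'hitB']
```

Filtering on the parsed float rather than the string keeps values such as `0.0` and `2e-40` comparable.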
Orthologous groups were identified through the OrthoMCL database (http://orthomcl.org/orthomcl/) [19], which uses the Markov Cluster algorithm to group putative orthologs. The procedure involved BLASTp comparisons of the query sequence (AAC44621.1) with sequences from eukaryotic genomes (fungi, plants, animals, protists and protozoans). Between-species and within-species relationships of the putative orthologs were identified as reciprocal best similarity pairs with a cut-off e-value of 1e-5.
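The reciprocal best-pair criterion can be illustrated with a simplified sketch. It covers only between-species reciprocal best hits; OrthoMCL additionally handles within-species (in-paralog) pairs, normalizes scores and clusters the resulting graph with MCL, none of which is reproduced here. The protein and species names are hypothetical.

```python
def best_hits(hits, max_evalue=1e-5):
    """For each (query, target species), keep the subject with the highest bit score.

    `hits` is an iterable of tuples:
    (query, query_species, subject, subject_species, evalue, bitscore).
    Within-species hits are skipped because orthologs are defined between species.
    """
    best = {}
    for query, q_species, subject, s_species, evalue, bits in hits:
        if evalue > max_evalue or q_species == s_species:
            continue
        key = (query, s_species)
        if key not in best or bits > best[key][1]:
            best[key] = (subject, bits)
    return {key: subject for key, (subject, _) in best.items()}


def reciprocal_best_pairs(hits, max_evalue=1e-5):
    """Return unordered pairs (a, b) where a's best hit is b and b's best hit is a."""
    hits = list(hits)
    species = {query: q_species for query, q_species, *_ in hits}
    best = best_hits(hits, max_evalue)
    pairs = set()
    for (query, _target_species), subject in best.items():
        # Keep the pair only if the subject's best hit back in the query's species is the query.
        if best.get((subject, species[query])) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Hypothetical all-against-all hits between two species, "Ta" and "Tv".
example_hits = [
    ("tri_Ta", "Ta", "tri_Tv", "Tv", 0.0, 1100.0),
    ("tri_Ta", "Ta", "ctp_Tv", "Tv", 1e-20, 90.0),
    ("tri_Tv", "Tv", "tri_Ta", "Ta", 0.0, 1100.0),
    ("ctp_Tv", "Tv", "tri_Ta", "Ta", 1e-20, 90.0),
    ("ctp_Tv", "Tv", "ctp_Ta", "Ta", 1e-50, 300.0),
    ("ctp_Ta", "Ta", "ctp_Tv", "Tv", 1e-50, 300.0),
]
print(sorted(reciprocal_best_pairs(example_hits)))
# [('ctp_Ta', 'ctp_Tv'), ('tri_Ta', 'tri_Tv')]
```

Note that `ctp_Tv` hits `tri_Ta` significantly, but the pair is rejected because each protein has a better-scoring partner in the other species.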
Protein homology based on domain architecture was determined with the Conserved Domain Architecture Retrieval Tool (CDART) (https://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi) [20]. The hits obtained represented proteins whose domain architecture is significantly similar to that of tricorn protease.
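Comparing domain architectures amounts to asking which domains two proteins share and whether they occur in the same order. The sketch below is a simplification, not CDART's algorithm (CDART searches conserved-domain profiles); it treats an architecture as an ordered list of domain labels, and the labels and the CTP-like architecture are hypothetical simplifications of the domains described in the Introduction.

```python
def compare_architectures(query, other):
    """Compare two domain architectures given as ordered lists of domain labels.

    Simplification: a repeated domain is matched by its first occurrence only.
    """
    shared = [d for d in query if d in other]
    # Positions of the shared domains in `other`, taken in query order.
    positions = [other.index(d) for d in shared]
    same_order = positions == sorted(positions)
    missing = [d for d in query if d not in other]
    return {"shared": shared, "missing": missing, "same_order": same_order}


# Hypothetical labels: tricorn order beta6, beta7, C1, PDZ, C2 as described in the text.
tricorn = ["beta6", "beta7", "C1", "PDZ", "C2"]
ctp_like = ["PDZ", "C2"]  # hypothetical CTP-like protein lacking the propellers
print(compare_architectures(tricorn, ctp_like))
# {'shared': ['PDZ', 'C2'], 'missing': ['beta6', 'beta7', 'C1'], 'same_order': True}
```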

Multiple Sequence Alignment
Multiple sequence alignment was performed with MAFFT version 7.0 (Multiple Alignment using Fast Fourier Transform; http://mafft.cbrc.jp/alignment/server/index.html) [21] using the BLOSUM62 substitution matrix, which scores alignments between evolutionarily divergent protein sequences. The MAFFT default gap opening penalty (1.53) and a cut-off e-value of 1e-5 were used to retain only significant matches.
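To show what a substitution matrix and a gap penalty contribute, the sketch below scores two already-aligned sequences with a small subset of BLOSUM62 and a fixed penalty per gap run. It is only a conceptual illustration: MAFFT applies its matrix and its 1.53 gap opening penalty on its own internal scale with FFT-based heuristics, so reusing 1.53 here as the default is an illustrative assumption, not MAFFT's scoring.

```python
# A small, symmetric subset of BLOSUM62 covering only the residues used below.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("G", "G"): 6, ("S", "S"): 4, ("T", "T"): 5, ("D", "D"): 6, ("H", "H"): 8,
    ("A", "G"): 0, ("A", "S"): 1, ("A", "T"): 0, ("A", "D"): -2, ("A", "H"): -2,
    ("G", "S"): 0, ("G", "T"): -2, ("G", "D"): -1, ("G", "H"): -2,
    ("S", "T"): 1, ("S", "D"): 0, ("S", "H"): -1,
    ("T", "D"): -1, ("T", "H"): -2, ("D", "H"): -1,
}


def substitution(a, b):
    """Look up a residue pair in either order (KeyError if outside the subset)."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def score_pairwise(seq1, seq2, gap_open=1.53):
    """Score two pre-aligned sequences: BLOSUM62 for residue pairs, a fixed penalty per gap run."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    in_gap = (False, False)  # which sequence (if any) is currently in a gap run
    for a, b in zip(seq1, seq2):
        if a == "-" and b == "-":
            continue  # column of gaps only: ignored
        if a == "-" or b == "-":
            gap_state = (a == "-", b == "-")
            if gap_state != in_gap:
                total -= gap_open  # a new gap run opens
            in_gap = gap_state
            continue
        in_gap = (False, False)
        total += substitution(a, b)
    return total


print(score_pairwise("GSA", "GSA"))  # 14.0
print(round(score_pairwise("GS-A", "GTDA"), 2))  # 9.47
```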

Protein-Protein Interactions
Interactions of tricorn protease with other proteins were analyzed through the STRING database version 10.0 (Search Tool for the Retrieval of Interacting Genes/Proteins; http://string-db.org) [22] to determine their associations in terms of co-expression, text mining, protein homology and gene neighborhood. Searches were performed at high confidence (interaction score ≥ 0.7) and included only first-shell interactions. A protein-protein interaction enrichment p-value of ≤ 0.05 was considered significant.
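The confidence threshold and the first-shell restriction can be expressed as a small filter over an edge list. STRING applies these settings server-side; the sketch below only mirrors the idea, assuming edges with a combined score on a 0-1 scale (STRING's bulk link files use a 0-1000 scale, where the equivalent cut-off is 700). The protein names and scores are hypothetical.

```python
def first_shell_network(edges, query, min_score=0.7):
    """Return the nodes and edges of the query plus its first-shell interactors.

    `edges` holds (protein_a, protein_b, combined_score) with scores on a 0-1 scale.
    """
    confident = [(a, b, s) for a, b, s in edges if s >= min_score]
    # First shell: proteins directly linked to the query by a confident edge.
    shell = {b if a == query else a for a, b, _ in confident if query in (a, b)}
    nodes = shell | {query}
    # Keep every confident edge whose two endpoints are both in the network.
    kept = [(a, b, s) for a, b, s in confident if a in nodes and b in nodes]
    return nodes, kept


# Hypothetical scores for illustration only.
edges = [("tri", "F1", 0.9), ("tri", "F2", 0.65), ("F1", "F3", 0.8),
         ("tri", "F3", 0.75), ("F2", "F3", 0.95)]
nodes, kept = first_shell_network(edges, "tri")
print(sorted(nodes))  # ['F1', 'F3', 'tri']
print(len(kept))  # 3
```

F2 is excluded because its only link to the query falls below 0.7, even though it has a confident link to F3.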

Constructing the Phylogenetic Tree of Tricorn Protease and Homologs
The amino acid sequences of tricorn protease and its homologs were aligned using MAFFT version 7.0 (http://mafft.cbrc.jp/alignment/server/index.html) [21]. Phylogenetic analysis of the aligned sequences was performed with the MEGA version 7.0 package [23]. The phylogenetic tree was constructed using the Maximum Likelihood method [24], and bootstrap resampling (1000 replicates) was used to assess the robustness of the groupings obtained.
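Bootstrap resampling draws alignment columns with replacement to create pseudo-alignments of the same length; a tree is then inferred from each replicate (Maximum Likelihood in MEGA here, which this sketch does not reproduce). The minimal Python sketch below shows only the resampling step, using a hypothetical toy alignment.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=None):
    """Generate bootstrap pseudo-alignments by resampling columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        # The same sampled columns are applied to every sequence, so each column stays homologous.
        yield {n: "".join(alignment[n][c] for c in cols) for n in names}


# Hypothetical toy alignment; the study used 1000 replicates.
toy = {"a": "ACDE", "b": "AC-E"}
replicates = list(bootstrap_replicates(toy, n_replicates=5, seed=1))
print(len(replicates))  # 5
```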

BLAST Identification of Tricorn Protease Homologs
BLAST homology searches identified 159 sequences in archaea, 62,710 in bacteria, 609 in plants, 302 in protists, 128 in animals, 85 in fungi and 9 in viruses. The archaeal sequences were mainly annotated as tricorn proteases and tricorn-like proteases, while the bacterial hits included tricorn protease homologs and C-terminal proteases. The hits in plants included D1 processing proteases, C-terminal proteases, peptidase S41 family members, proteases, tricorn protease homologs and predicted proteins. The hits in animals included M1 family zinc metalloproteases, CtpA-like serine proteases, membrane alanyl aminopeptidase, glutamyl aminopeptidase, endoplasmic reticulum aminopeptidase, aminopeptidase N, aminopeptidase Q, PDZ-containing proteins, WD40-like containing proteins and C-terminal proteases. The fungal hits included the tricorn protease N-terminal domain (6-bladed beta-propeller domain) and tricorn protease domain II (7-bladed beta-propeller domain), while all the hits in viruses were hypothetical proteins with WD40 domains. Further analysis of tricorn protease-like sequences through PSI-BLAST showed low sequence similarity among the homologs: the highest similarity (54%) was to the tricorn protease of Thermoplasma volcanium, a close relative of Thermoplasma acidophilum, while other archaeal hits, for example the tricorn protease of Ferroplasma acidarmanus, had 39% sequence identity to the query sequence (Table 1). These alignments were obtained from PSI-BLAST with tricorn protease (AAC44621.1) run against NCBI's non-redundant protein database.
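Percent identity depends on which columns form the denominator. The sketch below, an illustration rather than BLAST's own computation, counts identical residues over columns where neither sequence has a gap; BLAST reports identities over the full alignment length including gap columns, so the two conventions can give different values for the same pair. The sequences are hypothetical.

```python
def percent_identity(aligned1, aligned2):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned1, aligned2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACDE", "ACDF"))  # 75.0
print(percent_identity("AC-E", "ACDE"))  # 100.0 (the gap column is excluded)
```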
The BLAST results also revealed a wide distribution of tricorn protease homologs across bacteria, with significant hits in all bacterial phyla. These hits included hypothetical proteins in Aminicenantes bacterium (Table 1). Sequence similarity of the bacterial homologs was remarkably low, within the range of 29-34%, but with significant E-values (Table 1). The similarity of these bacterial homologs to the archaeal tricorn protease was also reflected in domain composition and organization (Figure 1).
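Low identity and a significant E-value are compatible because the E-value depends on the normalized (bit) score of the whole alignment and on the search-space size, not on percent identity directly. In the Karlin-Altschul framework, E = m · n · 2^(-S'), where m is the query length, n the database length and S' the bit score. The sketch below applies this simplified formula; BLAST itself uses effective lengths corrected for edge effects, and the numbers are hypothetical.

```python
def evalue_from_bitscore(bit_score, query_length, database_length):
    """Expected number of chance hits: E = m * n * 2**(-S'), with S' the bit score.

    Simplified: real BLAST uses effective (edge-corrected) lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


print(evalue_from_bitscore(10, 1, 1024))  # 1.0
# A long alignment of modest identity can still have a high bit score:
print(evalue_from_bitscore(160, 1000, 1e11) < 1e-5)  # True
```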
Significant hits were also present in some eukaryotes, for example in Viridiplantae: a tricorn protease in Ostreococcus tauri, a protease in Bathycoccus prasinos and a predicted protein in Micromonas pusilla. Predicted proteins in Thalassiosira pseudonana and Phaeodactylum tricornutum, as well as a hypothetical protein in Nematostella vectensis, a basal metazoan, also had significant matches with the archaeal tricorn protease (Table 1). In terms of domain organization, the Viridiplantae hits had domain composition and organization similar to those of the archaeal tricorn protease, whereas the stramenopile (Thalassiosira pseudonana and Phaeodactylum tricornutum) and basal metazoan hits lacked the N-terminal domain (Figure 1).
Orthology predictions through the OrthoMCL database placed the hits into three ortholog groups. OG5_164838 included the archaeal tricorn protease and bacterial and eukaryotic peptidase homologs; OG5_204488 included non-peptidase homologs in Thalassiosira pseudonana and Ostreococcus tauri; and OG5_130275 included the carboxyl-terminal proteases, carboxyl-terminal processing proteases and tail-specific proteases widely distributed in other bacteria and higher eukaryotes (Table 2 and Appendix 1).

Multiple Sequence Alignment
Multiple sequence alignment revealed conservation of the active-site tetrad in close homologs, with the major variation occurring in the hypothetical protein of Nematostella vectensis, which appeared to lack both the N-terminal domain and catalytic domain 1 (C1) (Figure 2). The active-site nucleophile S965 (numbered according to the Thermoplasma acidophilum tricorn protease) was conserved in all the homologs (Figure 3).
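Checking a residue such as S965 across homologs requires mapping a reference residue number to an alignment column, skipping gaps in the reference row. The minimal sketch below does this; in practice the reference row would be the T. acidophilum sequence and the residue number 965, while the toy alignment and names are hypothetical.

```python
def column_for_residue(aligned_ref, residue_number):
    """Return the 0-based alignment column of residue `residue_number` (1-based) in the reference."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds reference length")


def residues_at(alignment, ref_name, residue_number):
    """Residue found in each aligned sequence at the reference position."""
    col = column_for_residue(alignment[ref_name], residue_number)
    return {name: seq[col] for name, seq in alignment.items()}


# Hypothetical toy alignment; reference residue 3 is the S in "G-AS".
toy = {"ref": "G-AS", "h1": "GTAS", "h2": "G-AT"}
found = residues_at(toy, "ref", 3)
print(found)  # {'ref': 'S', 'h1': 'S', 'h2': 'T'}
print(set(found.values()) == {"S"})  # False: not conserved in h2
```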

Protein-Protein Interactions
The STRING protein-protein interaction network of Thermoplasma acidophilum tricorn protease had 11 nodes and 12 edges, with an average node degree of 2.18 and a clustering coefficient of 0.273. Only 2 edges were expected, so the network had significantly more interactions than expected (protein-protein interaction enrichment p-value 1.52e-06). Based on gene ontology annotations, the network's major functional enrichment was proteolysis (GO:0051603) as biological process, with an e-value of 1.47e-06, and cytoplasm (GO:0005737) as cellular component. The major KEGG pathway for the network was the proteasome (03050) (Figure 4).
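The reported average node degree follows from the edge count: each edge adds one to the degree of both endpoints, so the average degree is 2E/N = 2 × 12 / 11 ≈ 2.18. The sketch below computes this and the average local clustering coefficient for an undirected edge list. It counts nodes with fewer than two neighbors as 0, as NetworkX's `average_clustering` does by default; whether STRING uses exactly the same convention is an assumption. The four-node graph is hypothetical.

```python
from itertools import combinations


def network_stats(edges, nodes=()):
    """Average node degree and average local clustering coefficient of an undirected graph.

    `nodes` can list isolated nodes that do not appear in any edge.
    """
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    avg_degree = sum(len(v) for v in neighbours.values()) / n
    coeffs = []
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # fewer than two neighbours: coefficient taken as 0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return avg_degree, sum(coeffs) / n


print(round(2 * 12 / 11, 2))  # 2.18, the average degree for 11 nodes and 12 edges
# Hypothetical graph: a triangle a-b-c plus a pendant node d attached to c.
avg_degree, avg_cc = network_stats([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(avg_degree)  # 2.0
print(round(avg_cc, 3))  # 0.583
```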

Constructing the Phylogenetic Tree of Tricorn Protease and Homologs
The phylogenetic tree shows the inferred evolutionary relationships, with each node marking a divergence. Node values are bootstrap percentages that measure support for the node; 100 represents maximal support, meaning that the sequences to the right of the node clustered together, to the exclusion of all others, in every replicate. The tree showed two main clades: tricorn protease clustered with the homologs carrying the Peptidase_S41_TRI domain, while the carboxyl-terminal proteases carrying the Peptidase_S41_CPP domain clustered together (Figure 5).
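A node's bootstrap value is the percentage of replicate trees that contain the same clade. The sketch below computes that percentage once each replicate tree has been reduced to its set of clades (frozensets of taxon names); extracting clades from the trees themselves, for example by parsing Newick output, is omitted, and the taxon names are hypothetical.

```python
def clade_support(replicate_clades, clade):
    """Percentage of bootstrap replicate trees that contain `clade`.

    `replicate_clades` has one set of clades per replicate tree,
    each clade being a frozenset of taxon names.
    """
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical clade sets from four replicate trees.
replicates = [
    {frozenset({"Ta", "Tv"}), frozenset({"Ta", "Tv", "Fa"})},
    {frozenset({"Ta", "Tv"})},
    {frozenset({"Ta", "Fa"})},
    {frozenset({"Ta", "Tv"})},
]
print(clade_support(replicates, {"Ta", "Tv"}))  # 75.0
```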

Discussion
This study predicts a wide distribution of tricorn protease in archaea, together with bacterial homologs, which could indicate the importance of this protein in cellular functions. Proteasomes are ubiquitous in archaea, which also seem to have AAA ATPases closely related to the 19S regulatory proteins of eukaryotes [3]. However, studies have shown that the proteasome is dispensable under normal conditions in Thermoplasma acidophilum [25]. This could indicate the existence of alternative pathways that contribute to the pool of oligopeptides in the cytosol, which may account for the well-developed tricorn protease machinery in these organisms [26]. In bacteria, a genuine proteasome has been reported only in actinomycetes, with the majority of bacteria having the ATP-dependent proteasome-related protease HslUV [8].
The identification criteria used in this study did not find tricorn protease or its homologs in methanogens, Desulfurococcales, Nanoarchaeota or Korarchaeota, which have been shown to have complementary pathways [9].
However, this study predicts the existence of tricorn protease or tricorn protease homologs in all bacterial phyla, with the complete set of functional domains (beta-propeller, PDZ and catalytic domains). Indeed, some of these homologs have been characterized [27]. In bacterial species where the tricorn gene appeared to be missing, for example E. coli, a carboxyl-terminal protease functional analogue was present. The beta-propeller domain was, however, not well conserved in these carboxyl-terminal protease analogues, as also reported in other studies [27]. This suggests functional conservation of the tricorn core catalytic domain, since the beta-propeller domains mainly serve as gated routes for substrates into, and products out of, the catalytic chamber [7].
Other studies have suggested that tricorn analogues may be difficult to detect in eukaryotes: because tricorn is built from five folding domains, its analogues might assemble non-covalently from different gene products and could have additional functionalities [6]. Consistent with this, based on sequence and structural homology, this study predicted tricorn protease homologs in only a few eukaryotic groups; these seemed to lack the beta-propeller domain, and some also lacked the PDZ domain. An isolated tricorn protease C2 subdomain, the subdomain associated with the catalytic residue, was also found in the basal metazoan Nematostella vectensis. Other eukaryotes, such as fungi, appeared to have tricorn protease beta-propeller domains, which also seemed to occur in kinetoplastids. These organisms also seemed to have genes encoding homologs of the tricorn interacting factors that cooperate with tricorn in degrading proteasomal products [28, 29]. This further raises the possibility that tricorn analogues assemble from different gene products in eukaryotes.

Conclusion
Given the predicted wide distribution of tricorn protease and its homologs in archaea and bacteria, this protease machinery may have been conserved during evolution and may therefore be essential for cellular protein degradation. Targeting it or its functional homologs in pathogens could thus lead to the development of alternative therapeutic agents.