Quantifying Steric and Hydrophobic Influence of Non-Standard Amino Acids in Proteins That Undergo Post-Translational Modifications

Non-standard amino acids in protein post-translational modifications aid in a wide variety of biological functions and processes, furnishing expansion from the genome to the proteome. First, from structural examinations of unmodified proteins with only standard amino acids, this work empirically obtains numeric relations that reveal how instructions are transferred between native-state structural levels. Next, from these relations, the influence of non-standard amino acids inside post-translationally modified proteins is quantified by successfully predicting the contents of large and hydrophobic residues in helices and β-strands in the 210 inspections performed. This suggests a twofold molecular mechanism based on the fundamental biophysicochemical properties (residue volume and hydrophobicity), and leads to the conclusion that the utilized non-standard amino acids have limited global influence at the residue level. Our prediction method provides a better underlying understanding of molecular interactions and mechanisms, and is particularly promising for surveying further modified proteins.


Introduction
After biosynthesis from genetic instructions, many peptide and protein species undergo post-translational modifications on the endoplasmic reticulum in order to exert their specialized biological purposes, control varied cellular processes, modulate chemical reactions, and interact with other molecules. These modifications may alter the chemical properties, folding events, macromolecular stabilities, activity states, and subcellular locations [1][2] of peptides and proteins. Consequently, the proteome of a living organism is two to three orders of magnitude more diversified than its encoding genome [3], which makes such modifications a relevant topic for computational biochemistry, molecular biophysics, and molecular-cellular biology of macromolecules. There are several post-translational modifications, including those made through additions of non-standard amino acids to one or more standard amino acid residues to redirect the protein chain in the proper direction, thus influencing or assisting its charge, hydrophobicity, conformation, stability, and function [4].
Among the additions of non-standard amino acids, phosphorylation (insertion of a phosphate group into an amino acid side chain) [5], acetylation (and amination) by inclusion of a small acetyl radical ACE (and an amino univalent radical NH2) at the extremities of the primary sequence, and hydroxylation of proline and lysine are very common. Some native proteins greatly increase their resistance against conformational degradation by acetylation/amination of the extreme amino/carboxy positions, since many proteases, especially the aminopeptidases and carboxypeptidases, require either an amino or a carboxy terminal to act selectively [6][7]. Acetylation and amination may also be used to eliminate the influence of charged groups at terminal portions [8][9]. Post-translational succinylation (insertion of a succinyl group (SIN) at the ends of a chain) in the fragmentation of biopolymers can solubilize the peptide derivatives, block the action of enzymes [10], and inhibit hydrolysis [11].
In modified proteins with covalent post-translational alterations, selective uses of non-standard compounds or amino acids (such as uncommon residues, cofactors, and prosthetic groups) furnish proteome expansion and diversification [3]. Non-standard compounds usually have specific aims, as shown by the following cases: alpha-aminobutyric acid (ABA), whose parent amino acid is alanine, is used for selective replacement of half-cystines [12][13]; 3-amino-alanine (DNP) may alter the affinity of the amino acids involved in potassium channel binding [14]; D-proline (DPR) can nucleate β-turns of appropriate stereochemistry and type II conformations [15]; alloisoleucine (IIL), a stereoisomer of isoleucine, orients the side chain into a trans χ1 dihedral angle [16]; norleucine (NLE) sometimes produces only local perturbations that are reflected in an increased folding rate [17]; and pyroglutamic acid (PCA), whose parent amino acid is glutamic acid, plays useful functional roles in the N-terminal tail and reduces the susceptibility to aminopeptidases [18][19]. In addition to the above-mentioned post-translational alterations, several other chemical compounds are added to protein chains [20].
With the main purpose of quantifying the global steric and hydrophobic influence and reaching a better underlying understanding of non-standard amino acids in modified proteins, this study makes use of: (i) an empirical approach taking data directly from experimentally derived proteins; (ii) a quantitative formulation for the proteins selected through numerical relations and prediction rules from the relationship between primary and secondary structures; (iii) validations of the utilized relations and rules; and (iv) computer algorithms to facilitate and automatically operate data processing for the items (i)-(iii). The remainder of this study is arranged as follows. In the Materials and Methods section, we establish the benchmark dataset, binary codes for amino acids, and the accuracy scale for residue content predictions. In the Results and Discussion section, we obtain numerical rules from unmodified proteins, and use these rules for inspecting mechanisms and influence of non-standard residues in two target subgroups of post-translationally modified proteins. This study is concluded in the Conclusions and Future Developments section.

The Benchmark Dataset
The proteins for our benchmark dataset are carefully selected under the following conditions: for a specific extension with an equal number of beads or residues, the modified and unmodified (only with standard or L-α-amino acids) exemplars should each number approximately twenty or more; and in each utilized extension, the chosen exemplars must be non-redundant, with either low-similarity residue sequences (less than 25% identity) or differences in helices or strands of at least four residues. Under these restrictive conditions, and among several sequence extensions that were extensively investigated, we opt for a systematic analysis of 35-residue modified (target subgroup I) and unmodified (template group) proteins deposited in the Protein Data Bank (PDB) [21]. The target subgroups I and II, with modified proteins from 35 to 40 residues, and their non-standard amino acids are displayed in Table 1 of the Supplementary Data Appendices. Proteins with 35 residues have diversified residue dispositions, varied structures and functions, and come from many biological sources. The 35-residue unmodified proteins consist of 39 exemplars, while the modified ones include 16 exemplars and residue sequences with at least one of the nine non-standard chemical compounds (ABA, ACE, DNP, DPR, IIL, NH2, NLE, PCA, and SIN), whose roles [6][7][8][9][10][11][12][13][14][15][16][17][18][19] were previously described in Section 1. Structures of proteins with an equal number $N$ of residues are usually compared by their compactness utilizing the radius of gyration $R_G$, defined as:

$$R_G^2 = \frac{1}{2N^2} \sum_{k=1}^{N} \sum_{l=1}^{N} r_{k,l}^2 \tag{1}$$

where $r_{k,l}$ is the Euclidean distance between the mass centers of residues "k" and "l" from the atomic coordinates deposited in the PDB library.
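As an illustration of the compactness measure, the following minimal Python sketch computes a radius of gyration from residue mass centers, assuming the standard pairwise-distance form (the sum of squared distances over all residue pairs, divided by 2N², then square-rooted). The function name `radius_of_gyration` and the toy coordinates are hypothetical and are not taken from the dataset.

```python
import math

def radius_of_gyration(centers):
    """Radius of gyration from residue mass centers (x, y, z), in the
    pairwise form R_G^2 = (1 / (2 N^2)) * sum_k sum_l r_kl^2."""
    n = len(centers)
    if n == 0:
        raise ValueError("at least one residue center is required")
    total = 0.0
    for k in range(n):
        for l in range(n):
            # Squared Euclidean distance between residues k and l
            total += math.dist(centers[k], centers[l]) ** 2
    return math.sqrt(total / (2.0 * n * n))

# Hypothetical toy chain: three residue centers on a line, 1 Å apart.
toy_chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(toy_chain)
print(f"R_G = {rg:.3f} Å")
```

For equal residue weights, this pairwise form is equivalent to the root-mean-square distance of the residue centers from their centroid, which is why no centroid needs to be computed explicitly.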

Amino Acid Types, Primary and Secondary Structures, and Prediction Accuracy
Protein polymers employ a rich repertoire of covalently linked residues (non-standard and 20 standard amino acids) that cover a wide range of shapes and sizes in many atomic and molecular interactions. This residue diversity consequently determines the broad variety of biophysicochemical properties that are fundamental in ascertaining macromolecular structures and functional activities [22][23]. Among such properties, volume and hydrophobicity have been considered the two primary components [24][25]. The residue volumes (steric contributions) and hydrophobicities (hydrophobic effect and interactions) are dominant throughout the selection and maintenance of three-dimensional folded configurations under varied physiological conditions and environmental contexts [26][27].
The amino acids are denoted only by their volumes or sizes [28][29] and hydrophobicities [30], via the coarse-grained binary codes large-small (LS) and hydrophobic-polar (HP), respectively. The standard amino acids and, by association, the non-standard ones are large-hydrophobic (F, H, I, L, M, V, W, Y; IIL, NLE), large-polar (E, K, Q, R), small-hydrophobic (A, C, P, T; ABA, DPR, PCA), and small-polar (D, G, N, S; ACE, DNP, NH2, SIN). These reduced codes, LS and HP, compress the information contained in biomolecular structures without great loss, capturing many of their essential and basic features [22]. The LS code expresses the steric constraints and packing efficiency [23][24]; the HP code embodies the intra-chain and medium-chain interactions [25][30][31]. In the 55 analyzed 35-residue proteins, the standard amino acids are much more frequent than the specific non-standard compounds, with 1899 and 26 residues, respectively. Among the 20 standard amino acids, the large and hydrophobic sub-components dominate, each having 12 members; consequently, these sub-components are taken into account for the outputs shown below.
A one-dimensional residue sequence can be suitably represented by its total numbers of large ($n_L$) and hydrophobic ($n_H$) residues in the primary structure of the native chain. Each $n_i$ may be associated with no, one, or many proteins in the dataset, where the subscript "i" stands for the large or bulky (L) and hydrophobic or apolar (H) residues at the primary and secondary levels.
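The binary coding and the counts of large and hydrophobic residues can be sketched in Python as follows. The class memberships are copied from the lists given in the text; the function name `count_large_hydrophobic` and the example fragment are hypothetical, chosen only for illustration.

```python
# Binary classes as listed in the text: one-letter codes for standard amino
# acids, three-letter codes for the non-standard compounds of subgroup I.
LARGE = set("FHILMVWYEKQR") | {"IIL", "NLE"}
HYDROPHOBIC = set("FHILMVWYACPT") | {"IIL", "NLE", "ABA", "DPR", "PCA"}

def count_large_hydrophobic(residues):
    """Return (n_L, n_H): numbers of large and of hydrophobic residues."""
    n_large = sum(1 for res in residues if res in LARGE)
    n_hydrophobic = sum(1 for res in residues if res in HYDROPHOBIC)
    return n_large, n_hydrophobic

# Hypothetical fragment (not a dataset sequence), capped by ACE and NH2.
fragment = ["ACE", "G", "L", "K", "A", "NLE", "D", "E", "NH2"]
n_L, n_H = count_large_hydrophobic(fragment)
print(n_L, n_H)
```

A residue that is both large and hydrophobic (such as L or NLE) contributes to both counts, since the LS and HP codes are applied independently.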
Omnipresent steric and hydrophobic interactions represented by binary codes (large and hydrophobic residues) are suitable key tools to observe the global biophysicochemical influence of the non-standard amino acids in two periodic structural motifs (helices, and β-sheets formed by strands). Other resolution levels (e.g., codes with more letters or atomic approaches) should be utilized for sharper measurements of non-standard amino acids. Furthermore, we only evaluate motifs with overall lengths of more than five residues ($L_j>5$) to provide additional reliability to our comparative structural studies, where the subscript "j" accounts for the helices (h) and strands (e). For unmodified proteins of the template group, the actual contents (designated by $t_{i,j}$) of large and hydrophobic sub-components in secondary motifs give rise to percentage fractions $p_{i,j}$ given by:

$$p_{i,j} = 100\,\frac{t_{i,j}}{L_j} \tag{2}$$

where $p_{i,j}$ varies from 0 (when $L_j$ contains no large or hydrophobic residues, $t_{i,j}=0$) to 100% (when $L_j$ consists solely of these residues, $t_{i,j}=L_j$).
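A short Python sketch of the percentage fraction described above (the content of large or hydrophobic residues in a motif, divided by the motif length and expressed in percent) is given below. The function name `percent_fraction` and the example numbers are hypothetical; the guard on the motif length mirrors the restriction to motifs longer than five residues.

```python
def percent_fraction(content, motif_length):
    """Percentage of a motif occupied by large (or hydrophobic) residues."""
    if motif_length <= 5:
        raise ValueError("only motifs longer than five residues are evaluated")
    if not 0 <= content <= motif_length:
        raise ValueError("content must lie between 0 and the motif length")
    return 100.0 * content / motif_length

# Hypothetical helix of 12 residues containing 6 large residues.
p_large_helix = percent_fraction(content=6, motif_length=12)
print(p_large_helix)
```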
In the modified proteins of the target subgroups I and II, the efficiency of the predictions for the estimated contents $t_{i,j}$ of large and hydrophobic residues in periodic motifs is measured by observing the dissimilarity with the actual values through:

$$\Delta t_{i,j} = \left| t_{i,j}^{\mathrm{estimated}} - t_{i,j}^{\mathrm{actual}} \right| \tag{3}$$

where $\Delta t_{i,j}$ is given in absolute value, at the residue level, and can range from zero (at best) to $L_j$ (at worst). More specifically, the prediction accuracy is considered excellent (when […]), and bad ($\Delta t_{i,j} \gtrsim 2.0$ as long as $\Delta t_{i,j} - 0.1L_j > 1.0$).
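The dissimilarity measure and the criterion for a bad prediction can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the approximate condition "greater than or about 2.0" is read here as "greater than 1.5", and only the bad class is coded because it is the only class whose criterion is fully stated.

```python
def dissimilarity(estimated_t, actual_t):
    """Absolute difference between estimated and actual residue contents."""
    return abs(estimated_t - actual_t)

def is_bad_prediction(delta_t, motif_length):
    """True when delta_t is large AND exceeds one-tenth of the motif length
    by more than one residue (delta_t - 0.1 * L_j > 1.0)."""
    # Assumption: the approximate threshold of about 2.0 is taken as > 1.5.
    return delta_t > 1.5 and (delta_t - 0.1 * motif_length) > 1.0

# Hypothetical estimate of 7.2 residues against an actual content of 10.
delta = dissimilarity(estimated_t=7.2, actual_t=10)
print(round(delta, 1), is_bad_prediction(delta, motif_length=20))
```

With these hypothetical numbers the same dissimilarity is tolerated in a 20-residue motif but would count as bad in an 8-residue motif, which shows how the length term relaxes the criterion for long motifs.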

Prelusive Measures in the Template Group and Target Subgroup I
Modified and unmodified 35-residue proteins are amply diversified structurally (Figure 1a-c), as seen by their varied compactness ($R_G$ in (1)) and overall lengths of $3_{10}$-, α- and π-helices ($L_h$) and of strands ($L_e$) inside parallel and antiparallel β-sheets. In the inset of Figure 1c, the fraction $p_{L,e}$ (2) of large sub-components in strands is displayed for $L_e>5$. Motifs with total length $L_j \le 5$ have few stabilizing interactions, and structural variations that are often aggravated by conformational changes [33], helical distortions [34], and configurational instability [35].
The functional native conformations are distributed in compactness from very compact ($R_G \le 9.0$ Å, Figure 1a), through rather tight, to swollen or less densely packed ($R_G > 11.0$ Å); the cutoff threshold values of 9.0 and 11.0 Å are strategically used for more precise linear adjustments in the following subsection. The modified and unmodified protein conformations are proportionally compatible, as in $R_G$ and $L_h$ (Figure 1a

Relationship Between Primary and Secondary Structures in the Template Group
According to the compactness and cutoff values (9.0 and 11.0 Å) of $R_G$ (1), the 39 unmodified protein chains are categorized into 11 highly compact, 10 reasonably tight, and 18 less packed chains. For every chain with $L_j>5$, one separately computes the extent to which the conformational compactness degree is related to the contributions of the large and hydrophobic residues from the primary sequence ($n_i$) to its secondary structural elements ($p_{i,j}$), although $p_{i,j}$ as defined in (2) does not seem to be connected with $n_i$. The values of $p_{i,j}$ in relation to $n_i$ (Figure 2) are plotted for 29 data points of helices and 11 of strands in their respective graphs, totaling 80 structural inspections, and adjusted by linear fits, whose general equation is expressed as:

$$p_{i,j} = m\,n_i + b \tag{4}$$

where $m$ and $b$ are the slope and intercept, and $R$ is the linear correlation coefficient of the fit.
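A linear adjustment of this kind can be sketched in plain Python as follows, assuming an ordinary least-squares fit (the text does not state which fitting procedure was used). The function name `linear_fit` and the data points are hypothetical and are not read from Figure 2.

```python
import math

def linear_fit(n_values, p_values):
    """Least-squares line p = m*n + b and Pearson correlation coefficient R."""
    count = len(n_values)
    if count != len(p_values) or count < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_n = sum(n_values) / count
    mean_p = sum(p_values) / count
    s_nn = sum((n - mean_n) ** 2 for n in n_values)
    s_pp = sum((p - mean_p) ** 2 for p in p_values)
    s_np = sum((n - mean_n) * (p - mean_p) for n, p in zip(n_values, p_values))
    if s_nn == 0.0:
        raise ValueError("all n values are identical; the slope is undefined")
    m = s_np / s_nn
    b = mean_p - m * mean_n
    r = s_np / math.sqrt(s_nn * s_pp) if s_pp > 0.0 else 0.0
    return m, b, r

# Hypothetical (n_L, p_L,h) points lying exactly on one line.
n_large = [14, 16, 18, 20]
p_large_helix = [30.0, 35.0, 40.0, 45.0]
m, b, r = linear_fit(n_large, p_large_helix)
print(m, b, r)
```

Real data points would scatter around the line, giving a correlation coefficient below 1; the perfectly collinear toy points are used only so that the fitted slope and intercept are easy to verify by hand.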
The 80 data points of 40 unmodified samples are sufficiently scattered and give rise to their corresponding linear regressions ((5)-(9) in Figure 2), expressing the noticeable efficacy of the large and hydrophobic sub-components of the residues in functional conformations. The fractions $p_{i,j}$ are slightly dependent on the residue sub-component type: the hydrophobic residues (7)-(8) have steeper regression lines than the large ones (5)-(6), as expressed by their greater slopes $m$ (and negative intercepts $b$), except for $p_{H,e}$ (9) of less closely packed configurations. Most unmodified samples have both points around the straight lines, indicating an efficient and simultaneous use of both sub-components from primary to secondary structures by means of a double effective molecular mechanism.
Some special samples possess a residue sub-component that is more selective and compensatory than the other, by a single effective mechanism. For instance, the single-underlined chains 1PXQ and 1ROO are farthest from the linear fit for $p_{L,h}$ (5), but at the same time they are near to $p_{H,h}$ (7), as designated by arrows, and therefore using a single […] (8). These samples present particular structural features [36][37][38][39], such as reasonable conformational flexibility, relatively short secondary structure motifs ($L_j<10$), and motifs with residues in segments close to flexible N- or C-terminal domains. Different from the first three cases ($p_{i,h}$ and $p_{L,e}$, (5)-(7)), the fractions $p_{H,e}$ of hydrophobic residues in strands depend on the packing density: more compact globules ($R_G \le 9.0$ Å) assume an upward-sloping regression line ((8), like (5)-(7)), whereas the less dense forms ($R_G > 11.0$ Å) are better described by an almost horizontal line (9). The dependence of $p_{H,e}$ on the conformational packing may concurrently result from long-range interactions in hydrophobic exclusion and non-local bonds in strands forming β-pleated sheets. The partially tight sample 1E4R (9.0 Å < $R_G \le 11.0$ Å) fits reasonably well in both linear relations (8)-(9), leaving these relations fairly unchanged.
The linear relations $p_{i,j}$ (5)-(9) are validated by partitioning our benchmark dataset into separate subsamples, similar to cross-validation tests in statistical analyses [40]. In different partitions, the $p_{i,j}$ relations remain almost unaltered and, therefore, they should be considered reliable and well constituted. For example, in three sub-collections of the present dataset (compact non-toxins [41], $R_G \le 11.0$ Å; unmodified samples, Figure 2; modified and unmodified samples together [42]) for helices ($p_{L,h}$ (5)) with 24, 29 and 41 data points, the slopes $m$ are equal to 2.6, 2.4 and 2.5; for $p_{H,h}$ (7), $m$ is 3.2, 3.2 and 3.2; and for strands ($p_{L,e}$ (6)) with 4, 11 and 17 points, $m$ is 2.3, 2.3 and 2.6, respectively. Eqs. (5)-(9) are re-validated by predictions in protein samples of the target subgroups I and II, as shown in the following subsections.

Influence of Non-Standard Amino Acid Residues in the Target Subgroup I
The sequence-structure relationship between $p_{i,j}$ and $n_i$, articulated by the five crucial rules (5)-(9) from unmodified samples, may or may not remain valid when applied to modified samples with non-standard amino acids. In addition, whenever the length $L_j$ is previously known, one can predict the most probable contents (or estimated $t_{i,j}$) of large and hydrophobic residues in secondary structure topologies from (2) and (4), according to the estimation equation given as:

$$t_{i,j}^{\mathrm{estimated}} = \frac{L_j}{100}\,\left(m\,n_i + b\right) \tag{10}$$

Utilizing the previous numerical expressions (5)-(10), residue content predictions are made for 12 tubular helices and 6 extended strands in their corresponding graphs, totaling 36 estimated $t_{i,j}$ that are compared with their 36 actual $t_{i,j}$ from Promotif (Figure 3). For hydrophobic interplays in β-strands ($p_{H,e}$) of slightly compact samples, the linear relation from very tight globular conformations (8) is arbitrarily employed.
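The estimation step can be illustrated with a small Python sketch: the predicted percentage from a linear relation is converted back into a residue content by multiplying by the motif length and dividing by 100. The function name `estimated_content` and the slope and intercept used here are hypothetical placeholders, not the fitted values of relations (5)-(9).

```python
def estimated_content(motif_length, n_i, slope, intercept):
    """Estimated number of large (or hydrophobic) residues in a motif:
    (L_j / 100) * (m * n_i + b)."""
    predicted_percent = slope * n_i + intercept
    return motif_length * predicted_percent / 100.0

# Hypothetical relation p = 2.5 * n - 5.0 applied to a 12-residue helix of a
# chain with 18 large residues in its primary sequence.
t_estimated = estimated_content(motif_length=12, n_i=18, slope=2.5, intercept=-5.0)
print(round(t_estimated, 1))
```

The estimate is generally fractional, so it is compared with the integer actual content through the absolute dissimilarity rather than by exact agreement.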
In the current detailed case study, one observes the strategic action of the steric and hydrophobic interactions by means of molecular mechanisms as follows: (a) in 26 out of 36 estimates (Figure 3), 13 modified samples make use of a double effective mechanism, with both residue sub-components being simultaneously used in the native conformations, as seen by the close proximity of actual and estimated $t_{i,j}$ (both $\Delta t_{i,j} \lesssim 1.5$ from (3)); (b) seemingly, in eight estimates, four single-underlined samples (1BDE, 1C4E, 1RH4 and 1WY3) employ a single mechanism with a looser sub-component (only one $\Delta t_{i,j} \lesssim 1.5$) in the helical or β-sheet topologies, mainly regarding the hydrophobic selectivity; and (c) in the two remaining estimates, one double-underlined sample (1JY4) has both $\Delta t_{i,e} \gtrsim 2.0$. Before attributing such unexpected outputs ($\Delta t_{i,j} \gtrsim 2.0$ in items (b)-(c)) to the occurrence of non-standard amino acids, a deeper survey for re-evaluation of these outputs (Figure 4) is necessary.
Two probable reasons for the 6 apparently undesirable predictions $t_{i,j}$ in 5 underlined samples (items (b)-(c) above, Figure 4) are sizable extents of $L_j$ (first term in (10)), greater than or approximately equal to 10, or the performance of the linear relations ($m n_i + b$, second term in (10)). Some values of estimated $t_{i,j}$ are related to $L_j \gtrsim 10$ and allow differences $\Delta t_{i,j} - 0.1L_j \le 1.0$; this occurs for four predictions, in 1JY4 ($t_{L,e}$) and in 1BDE, 1RH4 and 1WY3 ($t_{H,h}$), with $\Delta t_{i,j}$ equal to 2.4, 2.3, 2.2 and 1.7, resulting from $L_j$ equal to 21, 27, 16 and 25, and so with $\Delta t_{i,j} - 0.1L_j$ equal to 0.3, -0.4, 0.6 and -0.8, respectively. Hence, when $L_j \gtrsim 10$, the strict accuracy requirement of $\Delta t_{i,j} \approx 0.0$ or 1.0 is re-evaluated, and $\Delta t_{i,j} \approx 2.0$ is considered plausible, that is, in the vicinity of one-tenth of $L_j$, as long as $\Delta t_{i,j} - 0.1L_j \le 1.0$.

[Figure caption, partially recovered: biological assemblies of 1JY4 and 1RH4 with quaternary structures are shown, and non-standard amino acid residues are seen in a ball-and-stick model. Native configurations were created with the Jmol viewer [43].]
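The length-adjusted differences quoted for the four re-evaluated predictions can be reproduced with a few lines of Python. The dissimilarities and motif lengths are the values stated in the text; the variable names are hypothetical.

```python
# (PDB code, content type, dissimilarity, motif length) as quoted in the text.
cases = [
    ("1JY4", "t_L,e", 2.4, 21),
    ("1BDE", "t_H,h", 2.3, 27),
    ("1RH4", "t_H,h", 2.2, 16),
    ("1WY3", "t_H,h", 1.7, 25),
]

# Length-adjusted difference: dissimilarity minus one-tenth of the motif length.
adjusted = {pdb: round(delta_t - 0.1 * length, 1)
            for pdb, _, delta_t, length in cases}

for pdb, value in adjusted.items():
    print(pdb, value, "tolerable" if value <= 1.0 else "bad")
```

All four adjusted values stay at or below 1.0, which is the condition under which a dissimilarity of about two residues is treated as plausible for long motifs.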
In the two remaining predictions (1C4E and 1JY4) with seemingly inconvenient estimations, both estimated $t_{H,e}$ are improved by the appropriate application of the linear relation ($m n_i + b$) from less densely folded structures (9) instead of the arbitrarily utilized relation (8). Thus, the highly tight 1C4E ($L_e=6$) reduces its $\Delta t_{H,e}$ from 1.8 (Figure 3) to 0.3. The marginally compact 1JY4 ($L_e=21$) also reduces its $\Delta t_{H,e}$ from a bad value of 4.7 (Figure 3) to 2.4, with $\Delta t_{H,e} - 0.1L_e$ equal to 0.3. Therefore, all underlined samples of items (b)-(c) above should be considered as employing a double effective mechanism; furthermore, 1C4E and 1JY4 reveal that the almost horizontal straight line for $p_{H,e}$ (9) can be predominant, but not exclusive, in less dense forms.
1JY4 stands alone with both $\Delta t_{i,e} \approx 2.0$, which may be due to some interactional and structural particularities [15][44]. 1JY4 is a two-fold symmetric precursor peptide arising from a 70-residue multistranded polypeptide; it has hydrophobic interactions crossing its dimer interface (residues Y with I), contains one covalent disulfide bond stabilizing antiparallel β-strands at the boundary of two four-stranded β-sheets, and utilizes three non-standard DPR residues, able to nucleate β-turns, at positions 9, 17 and 27. The single mechanism of unmodified samples (Figure 2) was not detected in modified samples because, for the former, we used a different quantity and a coarser metric ($p_{i,j}$ of 0-100%), with the different goal of quantifying the sequence-structure relationship (5)-(9). In subgroup I, only the double mechanism was identified. New inspections and outputs for additional modified samples (subgroup II) are shown next.
Thus far, 35-residue proteins in the template group and target subgroup I, with a total of 116 structural inspections (Figures 2-3), were selected and analyzed. However, the linear estimation equations (2), (4)-(10) are applicable to other post-translationally modified proteins of $N$ residues. For new sequence and structure surveys, we only alter the slopes $m$ (multiplying them by 35) and normalize the numbers $n_i$ (dividing them by $N$) in (5)-(9) for 89 further modified proteins in the target subgroup II, with 29 non-standard amino acids that are not encoded by the genetic code. This subgroup contains other 35-residue proteins as well as all the non-redundant ones from 36 to 40 residues currently available in the PDB database.
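The rescaling to chains of N residues can be sketched as follows: the slope obtained for 35-residue proteins is multiplied by 35 and the residue count is divided by N, so that the relation acts on the fraction of large or hydrophobic residues in the sequence. The function name `percent_for_chain_length` and the slope and intercept are hypothetical placeholders, not fitted values.

```python
def percent_for_chain_length(n_i, chain_length, slope_35, intercept):
    """Predicted percentage for an N-residue chain:
    p = (35 * m) * (n_i / N) + b, with m fitted on 35-residue proteins."""
    return (35.0 * slope_35) * (n_i / chain_length) + intercept

# Hypothetical relation p = 2.5 * n - 5.0 fitted on 35-residue chains.
p_35 = percent_for_chain_length(n_i=18, chain_length=35, slope_35=2.5, intercept=-5.0)
p_40 = percent_for_chain_length(n_i=18, chain_length=40, slope_35=2.5, intercept=-5.0)
print(round(p_35, 3), round(p_40, 3))
```

For N = 35 the rescaled expression reduces to the original relation, while for a longer chain the same residue count corresponds to a smaller sequence fraction and hence a lower predicted percentage when the slope is positive.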

Non-Standard Residues and Molecular Mechanisms in the Target Subgroup II
Five (ACE, NH2, NLE, PCA and SIN) of the nine non-standard amino acids are present in the modified proteins of both subgroups I and II. The other 24 non-standard amino acids [20] employed here occur exclusively in subgroup II. For more details on the 33 non-standard amino acids and the proteins of subgroups I and II, see Table 1 in the Supplementary Data Appendices.
In the 89 modified proteins of subgroup II, there are 7, 73 and 9 proteins with both, one and no secondary elements, respectively, giving rise to 87 samples formed by 71 helix and 16 strand samples. The 71 samples with helices, summing 142 $\Delta t_{i,h}$ (Figure 5a), are inspected first. Furthermore, a careful analysis is made when a sample has one $\Delta t_{i,h} \gtrsim 2.0$ (Figure 5b) or both $\Delta t_{i,h} \gtrsim 2.0$ (Figure 5c).
Among the 71 samples with helical structures, 49 have both points $\Delta t_{i,h} \lesssim 1.5$ inside translucent rectangles (Figure 5a), thus utilizing a double effective mechanism, in contrast to the other 22 underlined samples (17 single-underlined and 5 double-underlined ones, with one and both $\Delta t_{i,h} \gtrsim 2.0$, respectively). In order to more thoroughly investigate these 22 special cases, the biophysicochemical compositions $\Delta t_{i,h}$ and the lengths $L_h$ through $\Delta t_{i,h} - 0.1L_h$ (Figure 5b […] Figure 6c). For the hydrophobic residues in strands, we show the best predictions using both $p_{H,e}$ relations (8) and (9) in each estimated $t_{H,e}$ (10).
Among the 16 strand samples, 11 have $\Delta t_{i,e} \lesssim 1.5$ inside translucent rectangles (Figure 6a); together with sample number 14, with one $\Delta t_{i,e} \lesssim 1.5$ and another $\Delta t_{i,e} - 0.1L_e \le 1.0$ (Figure 6a-b), and sample number 5, with both $\Delta t_{i,e} - 0.1L_e \le 1.0$ (Figure 6c), a total of 13 samples make use of a double effective mechanism. The other three bold-faced numbers (6, 7 and 15, with $\Delta t_{L,e} - 0.1L_e > 1.0$ (Figure 6b)) work with a single mechanism by the hydrophobic selectivity.
In summary, all the 210 residue content predictions (Figures 3, 5-6) for 105 modified samples are successful from the five linear relations (5)-(9) inside the estimation equation (10), and indicate a twofold molecular mechanism by the sub-components (large and/or hydrophobic) of the residues. Specifically, we observe: (a) in 96 out of 105 samples, a double effective mechanism leading to 192 excellent, good and/or acceptable predictions; and (b) in 9 bold-faced samples (six with helices (Figure 5b-c), and three with strands (Figure 6b)), a single mechanism that furnishes one excellent, good or acceptable output and one bad output per sample, for 18 predictions. Dissimilarities $\Delta t_{i,j} \approx 2.0$ or 3.0 (as long as $\Delta t_{i,j} - 0.1L_j \le 1.0$) should be tolerable as molecular fluctuations, since proteins are functional macromolecules of intrinsically dynamic nature, have non-ideal secondary arrangements, and can contain non-standard amino acids to aid specific tasks, besides other intra- and inter-molecular interactions influencing the steric and hydrophobic driving forces measured by the contents $t_{i,j}$ and dissimilarities $\Delta t_{i,j}$.
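The bookkeeping behind this summary can be checked with a short Python sketch. The counts are those quoted in the text; the function `mechanism` is a hypothetical helper that labels a sample by how many of its two predictions (large and hydrophobic) are bad, using the terms double, single and malfunctioning as they appear in the discussion.

```python
def mechanism(bad_large, bad_hydrophobic):
    """Label a sample by the number of bad predictions among its two
    sub-components: 0 -> double, 1 -> single, 2 -> malfunctioning."""
    labels = ("double", "single", "malfunctioning")
    return labels[int(bad_large) + int(bad_hydrophobic)]

# Sample counts quoted in the text.
samples_subgroup_1 = 12 + 6      # helix and strand samples, subgroup I
samples_subgroup_2 = 71 + 16     # helix and strand samples, subgroup II
total_samples = samples_subgroup_1 + samples_subgroup_2

double_samples = 18 + (71 - 6) + (16 - 3)   # all samples minus single-mechanism ones
single_samples = 6 + 3                      # six helix and three strand samples

# Each sample carries two predictions (large and hydrophobic contents).
total_predictions = 2 * total_samples
print(total_samples, double_samples, single_samples, total_predictions)
```

No sample in the tally falls into the malfunctioning class, which is the case the text identifies as a possible sign of a stronger global influence of non-standard residues.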
The sequence-based predictions and twofold molecular mechanism (items (a)-(b) above) further suggest that the non-standard amino acids do not significantly modify the secondary structure contents of the standard amino acids in specific native conformations. Non-standard amino acids at the residue level act in reasonable harmony with the biological roles exerted by the 20 naturally occurring standard amino acids, since they work locally and have limited global influence, as measured by the content comparisons $\Delta t_{i,j}$ that remain oscillating inside a bearable threshold of $\Delta t_{i,j} \lesssim 1.5$ or of $\Delta t_{i,j} - 0.1L_j \le 1.0$. The existence of a bad or malfunctioning mechanism by both large and hydrophobic sub-components could indicate a more pronounced global influence of the non-standard residues; such a mechanism was not observed here, but it is hypothetically possible in other proteins not yet analyzed.
The current approach based on two coarse-grained (LS and HP residues) models is suitable for quantitative analyses and when atomic or molecular details can be suppressed or are of no interest, but it is rather insufficient for sharper or local measurements, or for very specific differences between non-standard and standard residues, as frequently occurs in other applications utilizing coarse-grained models [45][46][47][48]. However, detailed approaches (such as semi-empirical or atomic models) are also limited in many respects, since they are very CPU demanding. Furthermore, the complexity of realistic proteins and unknown factors of molecular interactions and cellular environments that detailed approaches strategically use are not yet fully understood [49][50]. In addition, our prediction expressions (3), (5)-(10) are increasingly being used for other (un)modified proteins of different extensions and functions, with opportune and promising preliminary outputs [41][42].
This study therefore establishes the foundations and protocols of a future web server to determine quantitatively how the steric and hydrophobic selectivity and the molecular mechanism operate in secondary structural motifs of a specific modified protein.

Conclusions and Future Developments
From 80 preliminary structural examinations in unmodified protein samples, the present article succeeded in quantifying the global influence of non-standard compounds in modified samples through 210 inspections in cylindrical helices and flat β-sheets. The unmodified samples reveal a direct and linear synchronization between the opportune use ($p_{i,j}$) and the objective availability ($n_i$) of the large and hydrophobic residues through five special linear relations ($p_{i,j}$ vs. $n_i$ in (5)-(9), Figure 2). In most cases, both large and hydrophobic sub-components act concurrently by a double effective molecular mechanism. In some cases, one sub-component is more efficient or selective than the other through a single mechanism, but no sample has $p_{i,j}$ simultaneously very far from the linear fits for both sub-components, which would constitute a malfunctioning mechanism. The linear relations display the following: the simple and strategic rules employed for the transmission of orientation from primary to secondary levels, the interaction specificity in the protein structural organization, and the remarkable efficiency and combination of the residue biophysicochemical properties (volume and hydrophobicity) in folded configurations under varied conditions and contexts [51][52][53].
The post-translationally modified samples have suitable predictions for all the 210 compositions (as measured by $t_{i,j}$ and $\Delta t_{i,j}$, Figures 3, 5-6), confirm the double and single mechanisms already indicated by unmodified samples, and lead to a new prediction method. Such outputs reveal the balanced dependence between functional conformations and the residue sequences, as also shown by other structure prediction methods [54][55][56], and establish the thorough contributions of the steric and hydrophobic selectivity. The successful predictions, besides the absence of a malfunctioning molecular mechanism, indicate that at the residue level the non-standard chemical compounds considered so far do not drastically alter the secondary structure contents suggested by the 20 standard amino acids. Consequently, such compounds work locally as additional, complementary and harmonic partners to the standard amino acids and favor the complexity and robustness of the native-state conformations for skilled cellular and physiological services, as well as the molecular diversification from the genome to the proteome.
Taken together, our exhaustive quantitative analyses for modified and unmodified proteins are striking given the high conformational and functional complexities of these proteins, including their residue compositions, compactness degrees, structural classes, and biological duties. These analyses can provide a deeper understanding of steric constraints and hydrophobic/hydrophilic interactions in sequence-based analysis problems, including the protein folding mechanism [57][58], configurational flexibility and stability [59], and sequence design [60]. Here, we worked on the systematic development and protocols of a new computational prediction method for basic insights and knowledge on molecular interactions and mechanisms, and for quantification of the global influence of non-standard amino acids. In the future, we shall supply a user-friendly and freely accessible web server for our method, contributing to and collaborating with other useful databases, specialized servers, and web resources [20][61][62] for post-translationally modified proteins.

Supplementary Data Appendices
This work has independently surveyed two target subsets (subgroups I and II) of modified proteins with non-standard amino acids. The first, previously examined in subsection 3.3, comprises sixteen 35-residue proteins and residue sequences with at least one non-standard amino acid in 9 frameworks (Table 1). The second, inspected in subsection 3.4, consists of 89 modified proteins with 35 to 40 residues and at least one non-standard amino acid among 29 non-standard compounds (Table 1). Table 1 has columns with the code and name of the 33 non-standard amino acids in alphanumerical order, the molecular structure and formula of these amino acids, a parent standard amino acid (if any), the binary code for the volume and hydrophobicity, the target subgroup (I and/or II), and the PDB code for modified proteins, respectively.