Density Functional Theory (B3LYP/6-31G*) Study of Toxicity of Polychlorinated Dibenzofurans

Quantitative Structure Toxicity Relationship (QSTR) study was applied to a dataset of 35 polychlorinated dibenzofurans (PCDFs) to investigate the relationship between toxicities of the compounds and their structures by employing Density Functional Theory (DFT) (B3LYP/6-31G*) method to compute their quantum molecular descriptors. The model was built using Genetic Function Algorithm (GFA) approach. The model (N= 24, Friedman LOF = 0.361, squared correlation coefficient (R 2 ) = 0.963, R 2 adj = 0.955, cross-validation correlation coefficient (Q 2 ) = 0.889, external prediction ability (R 2 pred) = 0.8286, P-value of optimization at P95% ˂ 0.05) of the best statistical significance was selected. The accuracy of the model was evaluated through Leave one out (LOOV) cross-validation, external validation using test set molecules, Y-randomization and applicability domain techniques. The results of the present study are expected to be useful to the environmental regulatory agencies locally and internationally in the area of environmental risk assessment of toxicity of Polychlorinated dibenzofurans (PCDFs) and other related Polychlorinated aromatic compounds/ pollutants that fall within the model’s applicability domain.


Introduction
One of the important aspect of modern toxicology research is the prediction of toxicity of environmental pollutants from their molecular structure in which a quantitative risk assessment becomes increasingly important in the modern society and is slowly incorporated into legislation of different countries. For instance, the European Union (EU) has introduced the Registration, Evaluation and Authorization of Chemicals (REACH) program for assessment of human and environmental risk of all chemicals that are produced or imported in the amount greater than 1 ton per year.
One of the important aspect of modern toxicology research is the prediction of toxicity of environmental pollutants from their molecular structure. The potential toxicity of compounds could be assessed on the basis of a wide variety of physicochemical and biological properties [1]. These physicochemical and biological properties of molecules constitute their molecular descriptors.
Polychlorinated dibenzofurans (PCDFs) are polychlorinated aromatic compounds that represent a group of environmental contaminants known by their ubiquitous distribution, resistance to biological and chemical degradation, high toxicity and bioaccumulation [2]. They can have a significant impact on the health and well-being of human and animals [2]. Some of the health effect at long exposure to these compounds include liver enlargement and lesions, immunotoxicity, a wasting syndrome, spleen atrophy, carcinogenesis, endocrine disruption, and extreme cases, death [3]. In addition to these, several persistent organic pollutants (e.g. PCDFs) are suspected to contribute to the increasing prevalence and risk of type 2 diabetes [4].
Polychlorinated dibenzofurans (PCDFs) are mainly formed or produced from as byproducts of various industrial processes and incomplete combustion of wastes such as medical or municipal wastes incineration, including burning of many materials that contain chlorine and polychlorinated chemicals [5][6].
Quantitative structure activities relationship (QSAR/QSTR) as an important area of chemo metrics has been the subject of a series of investigations [7]. In order to ensure a safer environments and quicker estimation of the environmental behaviors of PCDFs, quantitative structuretoxicity relationship (QSTR) models, which correlate and predict toxicity data of compounds from their molecular structural descriptors have been developed over the years, providing valuable approach in research into the toxicity of compounds without necessarily embarking on the conventional laborious, time consuming and expensive experiments. QSTR has been widely applied to evaluate and predict toxicity of chemicals [8]. Previous studies have shown that reliable QSTR models are not only applied to predict toxicity and provide basic data to risk assessment, but also used to explain the toxicity mechanisms [9].
The alternative hypothesis to this study includes: The magnitude of the observed toxicity log (1/EC 50 ) of Polychlorinated dibenzofurans (PCDFs) are direct function of the empirical property (ies) or the theoretical parameter(s) which makes the descriptor of the total chemical structure of the compounds under investigation.
The null hypothesis to this research includes; The observed toxicity log (1/EC 50 ) of Polychlorinated dibenzofurans (PCDFs) is independent of the descriptors of their total chemical structures.
The aim of this study is to build a robust, reliable and rational Genetic Function Algorithm approximation (GFA) based QSTR models to predict the toxicity of Polychlorinated dibenzofurans (PCDFs) by exploring the correlations between the experimental log (1/EC 50 ) of the compounds and their calculated molecular descriptors. It is expected that the information in this study would provide a fast, economical, more environmental friendly approach and less time consuming techniques of accessing the toxicity of Polychlorinated dibenzofurans (PCDFs) and other related toxic Polychlorinated aromatic compounds and Organic pollutants that could endanger our environment.

Materials and Methods
The materials used in this study include; Dell ® computer system (Intel Pentium), 4.80 GHz processor, 8GB RAM size on Microsoft windows 7 Ultimate operating system, Spartan

Data Collection
A data set of Polychlorinated dibenzofurans (35 PCDFs) used for the QSTR analysis was selected from the literature [10]. The Chemical structures and experimental log (1/EC 50 ) values for studied compounds are represented in Table 1.

Molecular Optimization and Descriptors Calculation
Optimization is the process of finding the equilibrium or lowest energy geometry of molecules. The chemical structure of each compound was drawn with ChemDraw ultra version [11] 12.02 module of the program and subsequently imported into Wave function program Spartan ''14'' version [12] 1.2.2 for structural minimization. The geometries of all the compounds (35 PCDFs) were optimized by means of Density functional theory (DFT) using B 3 LYP level of theory and 6-31G* as the basis set. The molecular descriptors were calculated by using paDel descriptor tool kit and Spartan "14" software. The most significant descriptors were identified using the Genetic Function Approximation (GFA) algorithm. Molecular descriptors simply refer to arithmetical values that describe properties of molecules obtained from a well-defined algorithm or experimental procedure [13]. The various 0D, 1D, 2D and 3D descriptors were calculated.

Data Set Division into Training and Test Set
The training set comprises of molecules used in model development while the test set is made up of molecules not used in building the model that are used in the external validation of the model i.e. evaluation of its prediction abilities. Dataset Division GUI v 1.2 software was used to divide the data set of the studied compounds into a training set of 24 PCDFs (70%) and a prediction set (test set) of 11 PCDFs (30%) respectively.

Genetic Function Algorithm and Model Building
In this study, a statistical technique of analysis by Genetic function approximation algorithm was employed to build the models. Genetic function approximation (GFA) algorithm is a search method to find exact or approximation solution to optimization and search problems which is based on the principles of Darwinian evolution [14].
A peculiar features of Genetic function approximation (GFA) algorithm is that it generate a population of equations rather than a single equation as do most other statistical methods. The range of variations in this population gives added information on the quality of fit and importance of the descriptors [15]. The fitness function or Lack of Fit (LOF) used to estimate the quality of the model here was the leave one out gross validated correlation coefficient (Q 2 LOO ) and is calculated by this equation * Where c is the number of basic functions, d is the smoothing parameter, M is the number of samples in the training set, LSE is the least square error and P is the number of features contained in all basis functions [16].

Validation of Developed Model
The predictive ability of the developed QSTR model were evaluated using both internal and external statistical validation parameters. The validation parameters were compare with the minimum recommended value for a generally acceptable QSAR/QSTR model proposed by Revinchandran et al [17] shown in Table 2.

APPLICABILITY DOMAIN (AD)
The model was further validated by applying the Williams plot, the plot of the standardized residuals versus the leverage as shown in Fig. 2. This was exploited to visualize the applicability domain (AD) [18]. (Leverage indicates a compound's distance from the centroid of X. The leverage of a compound in the original space is defined as; (2) where is the descriptor vector of the considered compound and X is the descriptor matrix derived from the training set descriptor values.
The warning leverage (h*) is defined as: Where n = number of training compounds, p = number of predictor variables  Based on statistical significance, model 1 is chosen as the best model.

Discussions
Model 1 gives the best GFA derived QSTR model for predict the p1/EC 50 of PCDFs. The result of the GFA QSTR model is in conformity with the standard shown in Table 2 as N = 24, Friedman LOF = 0.361, R 2 = 0.963, R 2 adj. = 0.955 R 2 cv = 0.889, R 2 pred = 0.8286, P 95% ˂ 0.05. This confirms the robustness of the model. Figure 2 reveal the agreement between the experimental and the predicted values of p1/EC 50 of molecules in the test set. The high Linearity of this plot indicate a sound agreement between the experimental and predicted values indicative of the high internal accuracy of the model. Likewise, Figure 3 gives a combine plot of the experimental and the predicted values of p1/EC 50 training and test set molecules. The high linearity of the plot is indicative of an excellent external predictive power of the model. The comparison of experimental and predicted p1/EC 50 of the compounds is presented in Table 8. The predictability of model 1 is evidenced by the low residual values observed in the Table. The P-value of the optimization model at 95% confidence level shown has α value ˂ 0.05. This reveals that the alternative hypothesis that the magnitude of the observed toxicity of PCDFs is a direct function of the descriptors of their total chemical structures takes preference over the null hypothesis which states otherwise.
The statistical significance of the relationship between the toxicity of PCDFs and their molecular descriptors was further demonstrated by Y-randomization procedure. The results of Y-randomization test as well as the random models parameters are shown in Tables 9 and 10 respectively. The low R 2 and Q 2 values obtained shows that the optimization model is robust and was not obtained due to a chance correlation. The fact that the value of cR 2 p of the model is > 0.5 as reported in the Table 10 is a good confirmation that the model is robust and very reliable. 18 Since the model 1 cannot predict the toxicity of all compounds in the universe, its applicability domain was determined using William's plot shown in Fig. 4. All the compound in the test set fall inside the domain of the GFA model (the warning leverage h* =0.40). There are only two compounds in the training set which have the leverage higher than the warning h* value as shown in the plot, thus they can be regarded as structural outliers. This implies that the models can be successfully applied to this series of Polychlorinated dibenzofurans. The few compounds with higher leverage than h* are most likely to be structural outliers.

Significance of the Descriptors in the Model 1
The positive coefficient of the descriptors; MATS2e, SpDiam_Dzv, SHBa reveal that the toxicity of PCDFs increases with increase in the values of these descriptors. Thus, the higher the values of these descriptors in a PCDFs, the more the toxicity of the molecule and vice versa. Also, the negative coefficient of XLogP descriptor as an indication that the value of this descriptor in PCDFs varies inversely with its toxicity. The percentage contribution of each descriptor in the model include; 33.50% (MATS2e), 15 [19] in which the observed toxicity of aromatic nitro-derivatives was influence by a descriptor of molecular electronegativity, X1. SpDiam_Dzv (Spectral diameter from Barysz matrix weighted by van der Waals volumes) is a descriptor of molecular size. Its positive coefficient in the model reveals that the toxicity of PCDFs varies directly with the value of this descriptor in the molecule. This is agreement with the findings Falandysz et al. (2001) [20] in which vander waal's volume (size descriptor) of dioxins has a pronounce influence on the observed toxicity of the molecules. Also in agreement is the results of the QSTR modelling by Hassan et al. (2016) [21] in which ETA-dAlpha-B (a measure of electronic features of the molecules relative to molecular size) was found to influence the toxicity of the studied dioxins. The increase in toxicity with increase in molecular size may be due to the possibility of the molecule been largely confined to the plasma compartment because of their too large size affecting its distribution via out the body.
XLogP is a descriptor of lipophilicity of molecules. Its negative coefficient in the model is an indication that the value of this descriptor varies inversely with the toxicity of the molecules and vice versa.
SHBa (Sum of E-States for (strong) hydrogen bond acceptors), just as the name implies is a descriptor of hydrogen bond acceptor ability of a molecule. Its positive coefficient in the model reveals that the toxicity of PCDFs varies directly with the value of this descriptor in the molecule. This is in agreement with the findings of Lipinski et al. (2001) [22] and van de et al. (2003) [23]. The increase in toxicity of PCDFs with increase in values of hydrogen bond acceptor descriptors may be due to the possibility of this descriptor eliciting some interaction of the toxic molecules with biological macromolecules such as enzymes or cellular receptors.

Conclusion
In this study, a model to predict toxicity of Polychlorinated dibenzofurans (35 PCDFs) in exploring the structural features (descriptors) that are responsible for its toxicity was successfully computed using Genetic Function Algorithm (GFA) approximation approach at B 3 LYP level of theory and 6-31G* as basis set. The observed log (1/EC 50 ) of the Polychlorinated dibenzofurans (PCDFs) was found to be predominantly influenced by MATS2e, SpDiam_Dzv, SHBa and XLogP descriptors. The robustness, reliability, stability and applicability of the model was established by internal and external validation techniques (N= 24, Friedman LOF = 0.361, squared correlation coefficient (R 2 ) = 0.963, R 2 adj = 0.955, cross-validation correlation coefficient (Q 2 ) = 0.889, external prediction ability (R 2 pred) = 0.8286, P-value of optimization at P 95% ˂ 0.05).
It is believed that the information in this model would be useful to the environmental regulatory agencies locally and internationally in the area of environmental risk assessment of toxicity of Polychlorinated dibenzofurans (PCDFs) and other related Polychlorinated aromatic toxic compounds/ pollutants which are being frequently released into our environment as a result of increasing industrial activities and incomplete combustion of various processes such as medical and domestic wastes incineration.