Discover the Dehydration Response Genes in Boea hygrometrica Transcriptome Using Bayesian Network Approach

“Drying without dying” is a remarkable feature in land plant evolution. Boea hygrometrica is an important resurrection plant model. Current genome and transcriptome analyses have revealed that some biological processes may contribute to its dehydration tolerance, but which genes play pivotal roles in the dehydration response remains unclear. The Bayesian network approach is a powerful tool for transcriptome data analysis and biological network reconstruction. In this work, using the Bayesian network approach, we first reconstructed a gene regulatory network from the B. hygrometrica transcriptome data. The network contains 1292 genes. Next, we defined the hub node genes in the network and focused on their functions in order to understand how B. hygrometrica responds to dehydration stress. Finally, by an association analysis, we inferred the function of the unknown gene Bhs126_021, which has a degree of 84 in the network. The data-driven strategy applied in this work not only recovers the findings of the knowledge-driven analysis but also provides novel findings from the B. hygrometrica transcriptome. Our findings give insight into the control genes of land plants under dehydration stress. The same data-driven strategy can also be used to analyze other similar transcriptome data sets efficiently.


Introduction
Boea hygrometrica is a homoiochlorophyllous dicot in Gesneriaceae that grows in rocky areas throughout most of China [1]. It is an important plant model for understanding responses to dehydration. In 2015, the draft genome of B. hygrometrica was sequenced. The genome is about 1.69 Gb in size and encodes 23,250 genes. Experiments on dehydration-induced changes in gene expression identified 9,888 differentially expressed genes (DEGs) [2]. Knowledge-based analysis of the transcriptome revealed three major clusters of genes involved in the dehydration stress response. Cluster 1 was primarily associated with photosynthesis. Cluster 2 mainly comprised genes for ABA metabolism and signaling, late embryogenesis abundant proteins (LEAs), and components of ROS protection and detoxification pathways. Cluster 3 primarily encoded proteins for nucleic acid metabolism. However, the knowledge-based analysis did not identify the genes that play key roles in B. hygrometrica under dehydration stress. The key controlling genes for dehydration tolerance in B. hygrometrica remain unknown.
The Bayesian network approach [3] is a promising tool for transcriptome data analysis [4][5][6][7] and biological network reconstruction [8][9][10][11][12]. It is a data-driven analysis method: it does not depend on prior knowledge and can mine novel knowledge from the dataset alone.
To investigate the mechanisms of dehydration tolerance in B. hygrometrica, in this work we reconstructed the B. hygrometrica gene regulatory network using the Bayesian network approach and identified pivotal control genes in the response of B. hygrometrica to dehydration stress. The pipeline used in this work can also be adapted to analyze other transcriptome data.
In this work, we extracted the data of the 1292 two-fold differentially expressed genes to reconstruct the Bayesian network.
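The two-fold selection step can be illustrated with a minimal sketch. The source does not state how fold change was computed, so the two-condition comparison, the `pseudocount` parameter, the function name `two_fold_genes`, and the toy gene IDs below are all assumptions made for illustration only.

```python
import math

def two_fold_genes(control, treated, pseudocount=1.0):
    """Return IDs of genes whose expression changes at least two-fold.

    control, treated: dicts mapping gene ID -> expression value
    (a hypothetical layout; the source does not describe its data format).
    pseudocount: assumed smoothing term that avoids division by zero.
    """
    selected = []
    for gene, base in control.items():
        if gene not in treated:
            continue  # skip genes not measured in both conditions
        log2_fc = math.log2((treated[gene] + pseudocount) / (base + pseudocount))
        # |log2 fold change| >= 1 corresponds to a two-fold up- or down-regulation
        if abs(log2_fc) >= 1.0:
            selected.append(gene)
    return sorted(selected)

# Toy expression values (hypothetical, not from the B. hygrometrica data)
control = {"geneA": 10.0, "geneB": 50.0, "geneC": 7.0}
treated = {"geneA": 45.0, "geneB": 55.0, "geneC": 1.0}
selected = two_fold_genes(control, treated)
print(selected)
```

In this toy input, geneA is up-regulated and geneC is down-regulated by more than two-fold, while geneB barely changes and is dropped.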

Reconstruction of B. hygrometrica Gene Regulatory Network Using Bayesian Network Approach
A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). It consists of two components: a directed acyclic graph, and a set of parameters that quantify the network [9]. Together, the two components define a joint probability distribution over the variables that factorizes along the graph, with each variable conditioned on its parents. In this study, we used the R package bnlearn (http://cran.r-project.org/) to learn the Bayesian network structure.
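The formal definition is not reproduced in the text above. As a hedged reconstruction, the standard textbook form is given below; the symbols $G$, $\Theta$, $X_i$, and $\mathrm{Pa}_G(X_i)$ are conventional notation assumed here, not taken from the source.

```latex
% A Bayesian network is a pair B = (G, \Theta):
%   G       : a directed acyclic graph whose nodes are the variables X_1, ..., X_n
%   \Theta  : the parameters of each conditional distribution
%   Pa_G(X_i): the parents of X_i in G
\[
  P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}_G(X_i)\bigr)
\]
```

In the gene regulatory setting, each $X_i$ would be the expression level of one gene, and an edge into $X_i$ marks a gene on which its expression is modeled as directly dependent.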

Visualization of the Gene Regulatory Network
We used Cytoscape 2.8.3 (http://www.cytoscape.org/) to visualize the gene regulatory network [13].
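One simple way to hand a learned edge list to Cytoscape is its plain-text SIF (simple interaction format), in which each line holds a source node, an interaction type, and a target node. The sketch below is only an illustration: the source does not say which import format was used, and the function name `to_sif`, the interaction label `regulates`, and the toy gene names are hypothetical.

```python
def to_sif(edges, interaction="regulates"):
    """Serialize directed edges as tab-delimited SIF text.

    edges: iterable of (source, target) pairs.
    interaction: assumed label for the edge type; any token without
    whitespace would work.
    """
    lines = [f"{src}\t{interaction}\t{dst}" for src, dst in edges]
    return "\n".join(lines) + "\n"

# Toy edges (hypothetical gene names)
edges = [("gene_a", "gene_b"), ("gene_a", "gene_c")]
sif_text = to_sif(edges)
print(sif_text, end="")
```

The resulting text can be saved with a `.sif` extension and imported as a network.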

Gene Regulatory Network of B. hygrometrica
First, we reconstructed the B. hygrometrica gene regulatory network using the Bayesian network approach. The gene regulatory network includes 1292 nodes and 8969 edges (Figure 1). The distribution of node degree showed that only 114 nodes (less than 10%) have a degree greater than 25 (Figure 2). The distribution follows a power law, suggesting that the B. hygrometrica gene regulatory network is a scale-free network. Scale-free networks are remarkably resistant to accidental failures [14]. Because the hub nodes (i.e., nodes with a high degree of connectivity) dominate the overall connectivity of a scale-free network, they play important roles in maintaining the stability of the network [15].
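The degree distribution and a rough check of power-law behavior can be sketched as follows. This is a crude illustration under stated assumptions: degree is taken as the total number of edges touching a node (in-degree plus out-degree), and the exponent is estimated by ordinary least squares on the log-log distribution, which is a common but not rigorous test. The source does not describe how the power-law fit was assessed, and all function names and toy edges are hypothetical.

```python
import math
from collections import Counter

def node_degrees(edges):
    """Count, for every node, the edges that touch it (direction ignored)."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return degree

def degree_distribution(degree):
    """Map each degree k to the fraction of nodes having that degree."""
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k.

    For P(k) proportional to k^(-gamma), the slope approximates -gamma.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(p) for p in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy network (hypothetical): one well-connected node and four peripheral ones
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
degrees = node_degrees(edges)
distribution = degree_distribution(degrees)
slope = loglog_slope(distribution)
print(distribution, slope)
```

A strongly negative slope with an approximately linear log-log plot is consistent with a scale-free topology, although a toy network of this size cannot demonstrate it.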

Dehydration Response Genes in B. hygrometrica
After reconstructing the gene regulatory network, we focused our analysis on the hub nodes, aiming to identify the key controlling genes involved in dehydration tolerance in B. hygrometrica.
In this work, we defined hub node genes as genes with a degree equal to or greater than 40. We found 58 hub nodes in the B. hygrometrica gene regulatory network. Table 1 shows the 18 hub nodes with degree above 60. Hub nodes play a pivotal role in a network. Therefore, these hub node genes are key genes involved in the dehydration response in B. hygrometrica.
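The hub definition above amounts to a degree threshold on the network. A minimal sketch follows, assuming that degree counts all edges touching a node regardless of direction; the function name `hub_nodes` and the toy edges are hypothetical, and the toy call lowers the threshold from 40 to 3 so that the small example has a hub.

```python
from collections import Counter

def hub_nodes(edges, min_degree=40):
    """Return (gene, degree) pairs with degree >= min_degree, highest first.

    edges: iterable of (source, target) pairs from the learned network.
    min_degree: hub cutoff; 40 is the value used in the text.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    hubs = [(gene, d) for gene, d in degree.items() if d >= min_degree]
    # Sort by descending degree, then by gene ID for a stable order
    return sorted(hubs, key=lambda item: (-item[1], item[0]))

# Toy network (hypothetical gene IDs)
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
hubs = hub_nodes(edges, min_degree=3)
print(hubs)
```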
A previous study has shown that three clusters of genes are involved in the dehydration response [2]. The first cluster is associated with photosynthesis. In this work, using the Bayesian network approach, we independently discovered that the hub node gene Bhs3_009, with a degree of 79, is associated with photosynthesis (Table 1). The second cluster mainly comprises genes for ABA metabolism and signaling. In the gene regulatory network, we discovered two hub node genes, Bhs6354_003 (52) and Bhs109_092 (43), that participate in the abscisic acid (ABA)-mediated signaling pathway (Table 1); the number after each gene is its degree in the network, here and hereafter. The third cluster encodes proteins for nucleic acid metabolism. In the gene regulatory network, we discovered four genes involved in nucleic acid metabolism: Bhs3_009 (79), Bhs211_042 (50), Bhs3_092 (43), and Bhs6835_002 (42) (Table 1). Thus, by adopting the data-driven strategy, the Bayesian network not only confirmed the three previously reported gene clusters but also identified key controlling genes in the dehydration response of B. hygrometrica.

The Possible Function of the Gene Bhs126_021
The gene Bhs126_021 has the highest degree, 84, in the gene regulatory network, suggesting that it may be the most important gene in the dehydration response of B. hygrometrica (Table 1). Unfortunately, the function of Bhs126_021 is unknown. We attempted to infer its function via an association analysis. Since Bhs126_021 regulates seven genes in the regulatory network (Table 2), we investigated the functions of these seven genes and found that they are involved in two of the three clusters related to the dehydration response: photosynthesis and RNA metabolic process. This result suggests that Bhs126_021 is indeed involved in the dehydration response.
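The association analysis can be sketched as collecting the genes that a regulator points to in the directed network and tallying their functional annotations. This is an illustration only: the source does not describe its implementation, and the function name `target_function_counts`, the annotation dictionary, and the toy gene names are hypothetical.

```python
from collections import Counter

def target_function_counts(edges, annotations, regulator):
    """List the direct targets of a regulator and count their annotations.

    edges: iterable of directed (source, target) pairs.
    annotations: dict mapping gene ID -> functional category (assumed layout).
    regulator: gene whose outgoing edges are examined.
    """
    targets = sorted({dst for src, dst in edges if src == regulator})
    counts = Counter(annotations.get(gene, "unknown") for gene in targets)
    return targets, counts

# Toy data (hypothetical; not the genes listed in Table 2)
edges = [("reg", "t1"), ("reg", "t2"), ("reg", "t3"), ("other", "t1")]
annotations = {
    "t1": "photosynthesis",
    "t2": "RNA metabolic process",
    "t3": "photosynthesis",
}
targets, counts = target_function_counts(edges, annotations, "reg")
print(targets, dict(counts))
```

Functional categories that recur among the targets are the ones a regulator of unknown function is most plausibly associated with.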