Principal Component Analysis of Early Generation Drought Tolerant Tef Genotypes for Yield-contributing Traits

The present investigation was carried out to determine the relationships among, and the genetic variability of, 49 tef inbred lines for drought-prone areas using principal component analysis. To improve tef productivity, farmers need high-yielding and drought tolerant tef cultivars. The objective of this research was to evaluate genetic diversity among drought tolerant tef inbred lines for yield and yield-contributing traits. In this study, component I received contributions from days to heading, days to physiological maturity, plant height, panicle length, culm length, number of spikelets per panicle, number of primary panicle branches per main shoot, lodging index, above-ground biomass and harvest index, and accounted for 40% of the total variability. Grain filling period, number of total tillers per plant, number of fertile tillers per plant, days to maturity, peduncle length, number of florets per spikelet and thousand-seed weight contributed 14% of the total variability in component II. The remaining variability of 13%, 7% and 6% was captured by components III, IV and V, respectively, through traits such as days to seedling emergence, above-ground biomass, harvest index, and number of total and fertile tillers per plant. The first five axes explained a cumulative 79% of the total variation among the 18 characters. Thus, the principal component analysis revealed that wide genetic variability exists in these drought tolerant tef inbred lines. Drought tolerance traits with high genetic variability are expected to offer good scope for gene transfer in breeding programs.


Introduction
Tef [Eragrostis tef (Zucc.) Trotter] is endemic to Ethiopia, and its domestication is estimated to have occurred between 4000 and 1000 BC [26]. Tef is also cultivated in very small quantities in Eritrea and, more recently, in the USA, the Netherlands and Israel [2]. Ethiopia, where tef is the main cereal crop and food shortage is a recurring phenomenon, imposed an export ban on tef, which increased interest in growing tef outside Ethiopia.
Currently, the crop is receiving increasing global attention for its nutritional advantages. Consumers prefer tef for its high protein and mineral content, for the good quality of "injera", a pancake-like soft bread [7], and for the absence of gluten [24], which makes it an alternative food for people suffering from celiac disease. Owing to this "life-style" nature of the crop, it has been heralded as a super food or super grain for humans [11,22]. It contains 11% protein, 80% complex carbohydrates and 3% fat [20].
Tef grows under a wide range of ecological conditions, from sea level up to 3000 meters above sea level (m.a.s.l.). Tef has the genetic potential to yield up to 6 t ha⁻¹ [23]. Despite its numerous relative advantages and economic importance, the productivity of tef in Ethiopia is low, amounting to 1.85 t ha⁻¹ [6]. The major yield-limiting factors in tef production are the lack of cultivars tolerant to lodging and drought [13], as well as small seed size. Yield losses are estimated to reach up to 40% under severe moisture stress [18].
To maximize selection benefits, breeding for drought tolerance requires the accumulation of additive genes, a controlled stress screening environment, and high throughput selection methods [4]. While early research in tef revealed significant genotypic differences in drought tolerance in relation to root depth and osmotic adjustment [17], information on drought tolerance based on grain yield is scarce. Drought, salinity, and heat, as well as climate change, have a significant impact on crop production and food security. It has been proposed that future studies should concentrate on improving resistance or tolerance to these environmental calamities [28].
The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables [12]. Principal component analysis is an important breeding tool commonly used by breeders to identify traits that could be used to discriminate crop genotypes [27]. Establishing suitable selection criteria for identifying genotypes with desirable traits is useful in developing improved varieties. In order to plan an effective breeding program, it is critical to analyze trait variability and understand the relationships between traits that contribute to yield [16].
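As a minimal sketch of the idea described above (assuming NumPy is available; the data are random numbers, not the tef measurements), PCA on the correlation matrix reduces to an eigendecomposition: the eigenvectors hold the component coefficients, the eigenvalues give the variance of each component, and sorting by eigenvalue orders the components so that the first ones retain the most variation. The function name `pca_correlation` is illustrative and not part of any library.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = genotypes, columns = traits).

    Returns eigenvalues in descending order and the matching eigenvectors
    (one column per principal component).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # trait-by-trait correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh: R is symmetric, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # reorder so the largest variance comes first
    return eigvals[order], eigvecs[:, order]


# Illustrative random data only: 49 "genotypes" x 5 "traits"
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 5))
X[:, 1] += 0.8 * X[:, 0]                    # make two traits correlated

eigvals, eigvecs = pca_correlation(X)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized traits
scores = Z @ eigvecs                                # principal component scores
print(np.round(eigvals, 2))                         # variances of PC1..PC5
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # close to identity: PCs are uncorrelated
```

Using `numpy.linalg.eigh` rather than `numpy.linalg.eig` exploits the symmetry of the correlation matrix and guarantees real eigenvalues; the eigenvalues always sum to the number of traits because each standardized trait has variance one.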
Principal component analysis (PCA) analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated. Its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components [1]. [21] found that four PCs explained 80% of the variation in 13 traits for tef landraces grown under greenhouse conditions. In their study, grain color, days to maturity, number of panicles and number of internodes per plant, second culm internode diameter, plant height, shoot biomass, grain yield and harvest index were well correlated with PC1, which explained 40% of the variation. [15] reported that five PCs explained 71% of the variation in 17 quantitative traits measured on 320 tef lines and 35 landraces evaluated at two locations in central Ethiopia. These authors found that PC1 was correlated with the number of productive tillers per plant, grain yield and harvest index.
Principal components are interpreted by finding which variables are most strongly correlated with each component. According to [8], eigenvalues greater than one were obtained only for the first three PCs, which together explained 75% of the observed variation. [21] observed four principal components (PCs) with eigenvalues between 5.16 and 1.12. According to [9], the first three PCs, with eigenvalues greater than one, contributed 78.3% of the entire phenotypic variation observed among 36 tef genotypes. The present study aimed to determine the variation among drought tolerant tef genotypes and to classify the inbred lines into groups, taking into account several characteristics and the relationships between them, with the help of principal component analysis [3]. The region experiences poor rainfall distribution (500 mm to 750 mm) coupled with relatively high temperatures (15°C to 30°C), which makes the area vulnerable to moisture stress.

Experimental Sites, Designs and Experimental Materials
For this study, 49 genotypes were used, comprising 42 drought tolerant advanced lines (Dtt), four parents of the advanced lines, two varieties and a local check. Dtt 2 (drought tolerant tef 2) and Dtt 13 (drought tolerant tef 13) were obtained from ethyl methanesulfonate (EMS) mutagenized populations of Tseday using the Targeting Induced Local Lesions IN Genomes (TILLING) method at the Institute of Plant Sciences of the University of Bern, through the Tef Improvement Project supported by the Syngenta Foundation for Sustainable Agriculture. These lines showed excellent performance under moisture scarcity. The distinctive morphological difference between the Dtt lines and the original parental line Tseday (DZ-Cr-37) lies in the size and number of stomata: the stomata on the adaxial (upper) leaf surface of the two Dtt lines are both smaller and fewer than those of the original parental line [5].
Debre Zeit Agricultural Research Center provided seeds for all genotypes. The experiment was laid out in a 7 × 7 simple lattice design. Each experimental plot was 1 m² (1 m × 1 m) and consisted of five rows spaced 20 cm apart. The distance between incomplete blocks and between plots within incomplete blocks was 1 m, and that between replications was 1.5 m. Seeds were sown at the recommended rate of 15 kg ha⁻¹, amounting to 1.5 g of seed per plot. The recommended full doses of urea (21.74 kg ha⁻¹) and blended NPS fertilizer (158 kg ha⁻¹) were applied at both locations.
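The seeding-rate arithmetic can be checked with a small unit conversion: 15 kg ha⁻¹ is 15,000 g spread over 10,000 m², i.e. 1.5 g m⁻², so a 1 m² plot receives 1.5 g in total. The plot size and the five rows come from the text; splitting the seed evenly across rows is an assumption made only for illustration, and `seed_per_plot` is a hypothetical helper.

```python
M2_PER_HA = 10_000   # square metres in one hectare
ROWS_PER_PLOT = 5    # from the plot layout described above


def seed_per_plot(rate_kg_per_ha, plot_area_m2):
    """Grams of seed per plot for a seeding rate given in kg/ha (hypothetical helper)."""
    grams_per_m2 = rate_kg_per_ha * 1000 / M2_PER_HA
    return grams_per_m2 * plot_area_m2


plot_g = seed_per_plot(15, 1.0)
row_g = plot_g / ROWS_PER_PLOT   # assumes an even split of seed across rows
print(f"{plot_g:.2f} g per plot, {row_g:.2f} g per row")
```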

Data Collection and Analysis
Data were recorded for days to 50% seedling emergence, days to 50% heading, days to 90% physiological maturity, grain filling period, plant height, panicle length, peduncle length, culm length, number of spikelets per panicle, number of primary panicle branches per main shoot, number of florets per spikelet, number of total tillers per plant, number of fertile tillers per plant, lodging index (%), total above-ground biomass, total grain yield, harvest index (%) and thousand grain weight. The principal components are defined as linear combinations of the original variables $X_1, X_2, \ldots, X_p$, and the table of extracted eigenvectors provides the coefficients of the equation [19]:

$$Y_k = a_{1k}X_1 + a_{2k}X_2 + \cdots + a_{pk}X_p$$

where $Y_k$ is the $k$-th principal component and $a_{1k}, \ldots, a_{pk}$ are its coefficients. To determine the number of PCs to retain, the criterion suggested by [10] was employed: when a correlation matrix is used, components whose variance (eigenvalue) is less than one are disregarded. Using the correlation matrix also means that the data were standardized to a mean of zero and a variance of one before the principal component analysis was computed. The correlation-based principal component analysis was performed using MINITAB software.
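The score equation $Y_k = a_{1k}X_1 + \cdots + a_{pk}X_p$ and the eigenvalue-greater-than-one rule can be sketched together as follows. This is an illustrative re-implementation with NumPy on random data, not the MINITAB procedure used in the study; the function name `pc_scores_and_retained` is hypothetical. Here the $X_i$ in the equation are the standardized traits $z_i$.

```python
import numpy as np


def pc_scores_and_retained(X):
    """Standardize traits, compute PC scores, and flag PCs kept by the eigenvalue > 1 rule."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # each trait: mean 0, variance 1
    R = np.cov(Z, rowvar=False)                        # equals the correlation matrix of X
    eigvals, A = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, A = eigvals[order], A[:, order]           # column k holds a_1k, ..., a_pk
    Y = Z @ A                                          # Y[:, k] = a_1k*z_1 + ... + a_pk*z_p
    keep = eigvals > 1.0                               # retain components with eigenvalue > 1
    return eigvals, A, Y, keep


# Illustrative random data only (not the tef measurements)
rng = np.random.default_rng(1)
X = rng.normal(size=(49, 6))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 0]

eigvals, A, Y, keep = pc_scores_and_retained(X)
print("eigenvalues:", np.round(eigvals, 2))
print("PCs retained (eigenvalue > 1):", int(keep.sum()))
```

Computing the covariance of the standardized data gives exactly the correlation matrix, which is why the eigenvalue threshold of one has a natural meaning: a retained component carries more variance than any single standardized trait.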

General Principal Component Analysis (PCA)
The principal component analysis revealed that five principal components with eigenvalues greater than unity accounted for 79 percent of the gross variability in 18 phenomorphic characters (Table 1). [10] suggested that components whose eigenvalues are less than 1 can be ignored when a correlation matrix is used. Similarly, [14,15] reported that about 71-79 percent of the variation in 320 tef germplasm lines was explained by five PCs. Likewise, [25] reported that 76 percent of the total variation among 49 tef varieties evaluated for 23 traits was explained by six PCs. The cumulative variance of 79 percent (Table 1) explained by the first five axes, each with an eigenvalue > 1.0, indicates that the traits loading on these axes have a strong influence on the phenotype of the germplasm lines.
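Because the analysis uses the correlation matrix, the total variance equals the number of traits ($p = 18$), so a component explaining a share $s$ of the variance has eigenvalue $\lambda = s \cdot p$. The check below uses the rounded percentages reported in the text, so the resulting eigenvalues are only approximate and are not the values of Table 1. It suggests that all five components clear the threshold of one, with PC5 sitting close to it.

```python
P_TRAITS = 18  # number of traits analysed

# Rounded shares of total variance reported for PC1..PC5
reported_share = {"PC1": 0.40, "PC2": 0.14, "PC3": 0.13, "PC4": 0.07, "PC5": 0.06}

# Correlation-matrix PCA: total variance = p, so eigenvalue ~ share * p
implied = {pc: s * P_TRAITS for pc, s in reported_share.items()}
for pc, lam in implied.items():
    print(f"{pc}: eigenvalue ~ {lam:.2f}, above 1: {lam > 1}")
```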

Scree Plot of Principal Component Analysis (PCA)
The scree plot of the PCA (Figure 1) shows that the first five eigenvalues account for most of the variance in the dataset. The first principal component alone explained 40 percent of the total variation, while PC2, PC3, PC4 and PC5, in that order, accounted for 14, 13, 7 and 6 percent of the gross observed variation among the test tef genotypes. The first three PCs together accounted for a cumulative 67 percent of the total variation, indicating that much of the variability among the test genotypes originated from the traits included in these PCs.
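The cumulative figures behind a scree plot are running sums of the per-component percentages. The sketch below uses the rounded values given in the text and draws a crude text-only scree plot instead of a graphic. Summing the rounded values gives 80 rather than the reported 79 percent at PC5, a difference that can arise when each component's share is rounded separately.

```python
from itertools import accumulate

shares = [40, 14, 13, 7, 6]          # percent of variance, PC1..PC5 (rounded, as reported)
cumulative = list(accumulate(shares))  # running total of explained variance
print(cumulative)

# Text-only scree plot: one '#' per 2 percentage points
for k, s in enumerate(shares, start=1):
    print(f"PC{k} {'#' * (s // 2)} {s}%")
```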

Principal Component Analysis for Yield-Contributing Traits
Among the 18 traits studied, 10 contributed strongly to the first PC: days to heading, days to physiological maturity, plant height, panicle length, culm length, number of spikelets per panicle, number of primary panicle branches per main shoot, lodging index, above-ground biomass and harvest index. The second component predominantly reflects variation in grain filling period, number of total tillers per plant, number of fertile tillers per plant, days to maturity, peduncle length, number of florets per spikelet and thousand-seed weight. The third principal component was chiefly accounted for by variation in days to seedling emergence, culm length, peduncle length, lodging index, above-ground biomass yield, grain yield, harvest index, and number of total and fertile tillers per plant. The fourth principal component was associated with high variation in grain filling period, number of fertile tillers per plant, number of total tillers per plant, days to seedling emergence, lodging index, grain yield, harvest index and thousand-seed weight. The fifth principal component, which accounted for about 6 percent of the total variation, was due mainly to high variation in days to seedling emergence, number of spikelets per panicle, number of primary panicle branches per main shoot and number of florets per spikelet (Table 1).
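Deciding which traits "contribute" to a component usually means comparing the absolute values of its loadings against a cut-off. The text does not state which cut-off was used, so the threshold of 0.30 below is an assumption, and the trait names and loadings are hypothetical placeholders, not the values in Table 1.

```python
# Hypothetical loadings for illustration only; NOT the values in Table 1.
loadings = {
    "trait_a": 0.41, "trait_b": -0.36, "trait_c": 0.05, "trait_d": 0.29, "trait_e": -0.12,
}
THRESHOLD = 0.30  # assumed cut-off; sign is ignored because both directions contribute


def high_contributors(loadings, threshold=THRESHOLD):
    """Traits whose absolute loading meets the threshold, largest magnitude first."""
    selected = [(t, w) for t, w in loadings.items() if abs(w) >= threshold]
    return sorted(selected, key=lambda tw: abs(tw[1]), reverse=True)


print(high_contributors(loadings))  # [('trait_a', 0.41), ('trait_b', -0.36)]
```

Ignoring the sign matters: a trait with a large negative loading varies as strongly along the component as one with a large positive loading, only in the opposite direction.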

Correlation Among Yield Contributing Traits
The PCA made it possible to assess the relationships among the tef traits used to analyze the observed diversity. The most prominent relations shown in Figure 2 are strong positive associations between GY and SBM, between PL and PH, between GY and PL, among NSPP, CL, PH and PL, and between NFTPP and NTTP, as indicated by the small acute angles between their vectors (r = cos 0° = +1). There was a near-zero correlation between HI and NFPS, between LI and NFPS, and between HI and DTE (Figure 2), as indicated by the nearly perpendicular vectors (r = cos 90° = 0). There was a negative correlation between LI and GY, between DFTPPE and NFPS, and between NTTP and NFPS (Figure 2), as indicated by angles of approximately 180° (r = cos 180° = -1).

Table 1.
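The angle rule used to read the biplot can be made concrete: the cosine of the angle between two trait vectors is their dot product divided by the product of their lengths. The vectors below are made-up coordinates, not the loadings behind Figure 2, and the cosine only approximates the true correlation to the extent that the two plotted components represent both traits well.

```python
import math


def cos_angle(u, v):
    """Cosine of the angle between two biplot vectors (trait loading arrows)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 2-D arrows (PC1, PC2 coordinates) for illustration
print(round(cos_angle((1.0, 0.2), (0.9, 0.25)), 3))   # small angle: close to +1
print(round(cos_angle((1.0, 0.0), (0.0, 1.0)), 3))    # perpendicular: 0
print(round(cos_angle((1.0, 0.5), (-1.0, -0.5)), 3))  # opposite directions: -1
```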

Conclusion
We conclude that significant diversity exists among drought tolerant tef genotypes for the traits studied. Five principal components with eigenvalues greater than unity accounted for 79 percent of the gross variability observed for 18 traits across 49 tef genotypes. The first principal component alone explained 40 percent of the total variation, while PC2, PC3, PC4 and PC5, in that order, accounted for 14, 13, 7 and 6 percent of the gross observed variation among the drought tolerant tef genotypes tested. PCA and factor analysis are statistical techniques useful for describing the relationships among the characteristics of drought tolerant tef genotypes. The uncorrelated principal components obtained here may be used in further analyses that require the absence of collinearity among variables.