Use of Alpha-Designs in Oil Palm Breeding Trials

In oil palm breeding trials the palms are planted at the corners of equilateral triangles with sides of 9 m. A plot consists of 6 × 6 = 36 palms, so a plot is a rectangle of 46.8 m × 54 m. Usually 20 to 40 varieties are tested, mostly with 3 replications. With such large plots an incomplete block design is needed, and alpha-designs provide connected incomplete block designs. Current oil palm planting material consists of DxP hybrids, obtained by crossing selected dura palms (female parents) with pisifera palms (male parents) to produce tenera palms with thin-shelled fruits. A crossing scheme of A dura and B pisifera is an incomplete diallel if the number of crosses C is smaller than A × B, and an alpha-design can be used to make such a crossing scheme connected. In the analysis of an oil palm breeding trial, an additive model of dura and pisifera effects is applied to estimate the general combining ability (GCA) of the parents, after removing the fixed effect of replications and the random effects of blocks within replications. The analysis can be done with the MIXED procedure of SAS or IBM SPSS Statistics, or in R with the package lme4.


Variety Trials of Field Crops
In 1966 Poland wished to reorganize the variety testing of field crops. The director of the newly founded (1966) Polish "Centre for Research on Varieties of Agricultural Crops" (COBORU = Centralny Osrodek Badania Odmian Roslin Uprawnych) in Słupia Wielka, Dr. Eugeniusz Bilski, invited J. Hogen Esch, the deputy director of the Netherlands State "Institute for Research on Varieties of Field Crops" in Wageningen (a specialist in potato varieties), and Rob Verdooren, the head of the Statistical Department of that Institute, to advise COBORU on the set-up of variety testing and on the design of variety trials suited to the climatic regions of Poland. In early July 1967 we both went to Słupia Wielka. The statistical advisor of COBORU, Dr. Tadeusz Caliński, statistics lecturer at the Academia Rolnicza (Agricultural University) in Poznań, visited some variety trials in the surrounding area with me. Meanwhile we discussed the set-up of variety testing. My advice to COBORU was to apply incomplete block designs in the variety trials, because the number of varieties was too large to lay out in homogeneous complete blocks, and to use not more than three replications per experimental site but to go for more experimental sites instead. This was in contrast to the practice of the "Bundessortenamt" of West Germany (the West German state variety testing organization). Combining the results of the variety trials across the climatic zones of Poland also turned out to produce large incomplete two-way layouts.
At that time, however, variety trials in incomplete two-way layouts were difficult to analyze by hand. The Best Linear Unbiased Estimators (BLUE) of varietal contrasts were very cumbersome to calculate, because solving the normal equations required matrix inversion, and no mainframe computers were available. In the Netherlands, at the Agricultural University of Wageningen, we had already done this BLUE analysis by hand with an iterative method, using electromechanical calculators. Back in 1948 W. L. Stevens [14] published an iterative solution for the BLUE in incomplete two-way layouts, but he did not prove that the procedure converges and always gives a solution. In 1952 Dr. Nico H. Kuiper, professor of mathematics at the Agricultural University of Wageningen, gave a proof for incomplete two-way layouts using projections of vectors [7]. His first staff member, Leo Corsten, obtained his PhD at the Agricultural University of Wageningen in 1958 under the supervision of Prof. Dr. N. H. Kuiper, with a thesis in which he extended the iterative method to incomplete three-way classification designs [4]. The iterative Kuiper-Corsten method for the analysis of incomplete blocks, however, had a missing link: the calculation of the standard error of varietal contrasts.
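The iterative principle can be sketched in Python. This is a minimal sketch under stated assumptions, not a reproduction of the computing schemes of Stevens [14] or of Kuiper and Corsten: it assumes the additive model y = variety effect + block effect + error, and alternately re-estimates the variety effects from the data adjusted for the current block effects and the block effects from the data adjusted for the new variety effects. The function name `fit_two_way_iterative` and the small noise-free layout (4 varieties in 4 blocks of 3 plots) are hypothetical. At convergence the residuals sum to zero within every variety and within every block, which are exactly the normal equations, so in a connected layout the differences between the variety estimates are the BLUEs.

```python
from collections import defaultdict


def fit_two_way_iterative(data, tol=1e-10, max_iter=10_000):
    """Alternating (Gauss-Seidel type) fit of y = variety + block + error.

    data: list of (variety, block, y) tuples from an incomplete two-way layout.
    Only differences between variety effects are meaningful; they are
    estimable when the layout is connected.
    """
    varieties = sorted({v for v, _, _ in data})
    blocks = sorted({b for _, b, _ in data})
    var_eff = {v: 0.0 for v in varieties}
    blk_eff = {b: 0.0 for b in blocks}
    for _ in range(max_iter):
        # Variety effects: mean of y after removing the current block effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for v, b, y in data:
            sums[v] += y - blk_eff[b]
            counts[v] += 1
        new_var = {v: sums[v] / counts[v] for v in varieties}
        # Block effects: mean of y after removing the new variety effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for v, b, y in data:
            sums[b] += y - new_var[v]
            counts[b] += 1
        new_blk = {b: sums[b] / counts[b] for b in blocks}
        change = max(
            max(abs(new_var[v] - var_eff[v]) for v in varieties),
            max(abs(new_blk[b] - blk_eff[b]) for b in blocks),
        )
        var_eff, blk_eff = new_var, new_blk
        if change < tol:
            break
    return var_eff, blk_eff


# Hypothetical noise-free layout: variety effects A=10, B=12, C=11, D=14 and
# block effects 0, +1, -1, +2; each block lacks one variety.
LAYOUT = [
    ("A", 1, 10.0), ("B", 1, 12.0), ("C", 1, 11.0),
    ("A", 2, 11.0), ("B", 2, 13.0), ("D", 2, 15.0),
    ("A", 3, 9.0), ("C", 3, 10.0), ("D", 3, 13.0),
    ("B", 4, 14.0), ("C", 4, 13.0), ("D", 4, 16.0),
]

v, _ = fit_two_way_iterative(LAYOUT)
print({name: round(v[name] - v["A"], 6) for name in sorted(v)})
# {'A': 0.0, 'B': 2.0, 'C': 1.0, 'D': 4.0}
```

Because the yields are noise-free, the estimated differences recover the generating variety effects exactly; with real data they are the least-squares (BLUE) differences.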
During my visit to Poland I gave a lecture, "The use of incomplete block designs in agricultural research and its analysis", at the Statistical Department of the Agricultural University (Academia Rolnicza) in Poznań. I demonstrated how to find the BLUE by hand for a small variety-trial example with the iterative Kuiper-Corsten method, and mentioned the missing link in this iterative procedure: the calculation of the standard error of varietal contrasts. Alpha-designs, discussed below, are very useful for variety testing because they make it easier to find designs for a large number of varieties and for different (even small) sizes of incomplete blocks. COBORU adopted these designs immediately.
In variety testing we often have a large number C of varieties to be compared in just 3 or 4 replications. A randomized complete block design, with block size k = C, often cannot be used when C > 10, because such large blocks are rarely homogeneous. To account for heterogeneous growing conditions in an experimental field one can use a randomized incomplete block design: with a smaller block size k < C, homogeneous parts of the field can be found for the blocks. Well-known incomplete block designs, such as balanced incomplete block designs, often require too many replications. Initially, incomplete block designs such as lattices (for C = k × k) and rectangular lattices (for C = k × (k + 1)) were used; see [3]. Often, however, the number C of tested progenies does not fit a lattice or a rectangular lattice. The studies [9,10] extended the incomplete block designs by introducing, for binary connected incomplete block designs, the so-called alpha-designs. They start with a rectangular array of the varieties 1, …, C with columns of length k (the size of the incomplete blocks) and shift the columns according to a generating array. For many combinations of the number of progenies C and the block size k they give a procedure to construct alpha-designs. (The name alpha refers to the Greek letter α, the counterpart of the "a" of "array".) The computer program CycDesigN [5] is now available to generate incomplete block designs such as alpha-designs and cyclic designs.
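The shifting construction can be sketched in Python. This is a minimal sketch of the cyclic construction generally described for alpha-designs, assuming C = v = k·s varieties, a generating array of k rows and r columns with entries 0, …, s − 1, and one replicate per column of that array. It is not the algorithm of CycDesigN, which also searches for efficient generating arrays. The function names and the generating array below (v = 20, k = 4, s = 5, r = 2) are hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations


def alpha_design(alpha, s):
    """Resolvable alpha-design from a k x r generating array with entries mod s.

    Varieties are numbered 1..k*s. Column j of `alpha` generates replicate j,
    which has s blocks of size k: block m contains, for each row i, the
    variety s*i + (alpha[i][j] + m) % s + 1 (row i cyclically shifted).
    """
    k, r = len(alpha), len(alpha[0])
    design = []
    for j in range(r):
        replicate = [
            sorted(s * i + (alpha[i][j] + m) % s + 1 for i in range(k))
            for m in range(s)
        ]
        design.append(replicate)
    return design


def concurrences(design):
    """Count how often each pair of varieties occurs together in a block."""
    counts = Counter()
    for replicate in design:
        for block in replicate:
            counts.update(combinations(block, 2))
    return counts


# Hypothetical generating array for v = 20 varieties in blocks of k = 4 (s = 5), r = 2.
ALPHA = [[0, 0], [0, 1], [0, 2], [0, 3]]

design = alpha_design(ALPHA, s=5)
for j, replicate in enumerate(design, start=1):
    print(f"replicate {j}:", replicate)
print("max concurrence:", max(concurrences(design).values()))
```

Each replicate contains every variety exactly once, so the design is resolvable. With this array, for every pair of rows the differences between their entries take different values in the two replicates, so no pair of varieties shares a block more than once.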
In variety trials one wants to use resolvable incomplete block designs, in which the design can be divided into r groups (= replications) such that each group contains each of the C varieties exactly once. Resolvable incomplete block designs, particularly the so-called generalized lattices (GL) or alpha-designs, have proved very suitable for crop variety trials [11,13]. The program CycDesigN can generate such resolvable incomplete block designs. All the designs mentioned above are connected; in a connected incomplete block design all differences between the varieties can be estimated. Now, with personal computers and statistical packages such as SAS, IBM SPSS Statistics or R, the iterative Kuiper-Corsten method is no longer needed: the varietal effects are estimated by least squares, solving the normal equations as $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ (with a generalized inverse in place of $(\mathbf{X}'\mathbf{X})^{-1}$ when $\mathbf{X}$ is not of full column rank), and the variance of a varietal contrast $\mathbf{p}'\mathbf{b}$ is $\sigma^2\,\mathbf{p}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{p}$.
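As a small numerical illustration of these formulas, the Python sketch below (assuming NumPy) fits the intrablock model y = block effect + variety effect + error by least squares. Because $\mathbf{X}'\mathbf{X}$ is singular in this parameterization, it uses the Moore-Penrose pseudo-inverse `numpy.linalg.pinv` as the generalized inverse; for an estimable contrast $\mathbf{p}'\mathbf{b}$ neither the estimate nor $\mathbf{p}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{p}$ depends on which generalized inverse is chosen. The helper names and the data (a hypothetical noise-free balanced incomplete block layout of 4 varieties in 4 blocks of 3 plots) are illustrative; for this layout the textbook variance of a pairwise difference is $2k\sigma^2/(\lambda v) = 0.75\,\sigma^2$ with k = 3, λ = 2, v = 4.

```python
import numpy as np


def design_matrix(blocks, varieties, n_blocks, n_varieties):
    """Indicator columns for blocks followed by indicator columns for varieties."""
    X = np.zeros((len(blocks), n_blocks + n_varieties))
    for row, (blk, var) in enumerate(zip(blocks, varieties)):
        X[row, blk] = 1.0
        X[row, n_blocks + var] = 1.0
    return X


def contrast(X, y, p):
    """Least-squares estimate of p'b and its variance factor p'(X'X)^- p (times sigma^2)."""
    XtX_ginv = np.linalg.pinv(X.T @ X)  # generalized inverse of the singular X'X
    b = XtX_ginv @ X.T @ y              # one solution of the normal equations
    return float(p @ b), float(p @ XtX_ginv @ p)


# Hypothetical layout: 4 blocks (0-3) of 3 plots, varieties 0-3 (A-D), noise-free yields.
BLOCKS = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
VARIETIES = [0, 1, 2, 0, 1, 3, 0, 2, 3, 1, 2, 3]
Y = np.array([10, 12, 11, 11, 13, 15, 9, 10, 13, 14, 13, 16], dtype=float)

X = design_matrix(BLOCKS, VARIETIES, n_blocks=4, n_varieties=4)
p = np.zeros(X.shape[1])
p[4 + 1], p[4 + 0] = 1.0, -1.0          # contrast: variety B minus variety A
estimate, var_factor = contrast(X, Y, p)
print(f"B - A = {estimate:.3f}, variance = {var_factor:.3f} * sigma^2")
```

In practice the package (SAS, IBM SPSS Statistics or R) does this bookkeeping; the point of the sketch is only that the standard error comes directly from the same generalized inverse that gives the estimates.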

Oil Palm Breeding Trials
In July 1987 Harrisons Fleming Advisory Services (HFAS) sent me for two weeks to the Oil Palm Research Station (OPRS) at Dami, on the island of New Britain, Papua New Guinea. There was a breeding trial with oil palm (Elaeis guineensis Jacquin) with 40 DxP hybrids: crosses of selected dura palms, as female parents, with pisifera palms, as male parents, producing tenera palms with thin-shelled fruits. Statistical analysis of yield and its oil component did not show significant differences between the crosses. The experimental design was a randomized complete block design with three replications and 6 × 6 palms per plot. The palms were planted at the corners of equilateral triangles with sides of 9 m. The distance between the rows is then the height of this triangle, $9\sin(\pi/3) = 9 \cdot \tfrac{\sqrt{3}}{2} \approx 7.8$ m. Hence the length of a plot is $6 \times 7.8 = 46.8$ m and its width is $6 \times 9 = 54$ m. To measure the yielding capacity of a cross, the yield of the 4 × 4 = 16 inner palms is recorded. Together with the English agronomist I walked about two kilometers over the hilly trial to the end of one replication and back again through the other replications. After this visit I asked him for his impression of the experimental field. He said the plots were well maintained. I agreed, but explained that in an experimental design a block is supposed to have homogeneous growing conditions. In England, with its centuries of agricultural history on flat land, a strip along a ditch may have uniform growing conditions suitable for a block in a field trial, and a parallel strip next to this first block also has uniform growing conditions, but with a different water level. This is why English books on the design of experiments always showed figures with rectangular blocks.
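The plot geometry above can be checked with a few lines of Python (standard library only). The helper name and its default arguments are illustrative; note that the exact row distance is 7.794 m, which the text rounds to 7.8 m, giving a plot length of about 46.8 m.

```python
import math


def plot_layout(spacing_m=9.0, rows=6, palms_per_row=6):
    """Dimensions of a triangular-spaced oil palm plot (hypothetical helper)."""
    row_distance = spacing_m * math.sin(math.pi / 3)  # height of the equilateral triangle
    length = rows * row_distance                      # one row distance per row
    width = palms_per_row * spacing_m                 # one palm spacing per palm
    recorded = (rows - 2) * (palms_per_row - 2)       # inner palms, guard palms excluded
    return row_distance, length, width, recorded


row_distance, length, width, recorded = plot_layout()
print(f"row distance {row_distance:.3f} m, plot {length:.1f} m x {width:.0f} m, "
      f"{recorded} recorded palms")
# row distance 7.794 m, plot 46.8 m x 54 m, 16 recorded palms
```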
Most agronomists in tropical countries were taught experimental design from those English books and therefore applied rectangular blocks even on freshly cleared forest land, where the large trees had been removed by excavation or dynamite. Neighboring plots could have a different soil structure. In the oil palm trial at Dami, plots on hilltops will have different growing conditions from those on the slopes and in the valleys. This variation within the complete blocks inflates the residual error and thus reduces the chance of finding significant differences between the crosses. The trial should therefore be laid out in incomplete blocks.
Back in the office, I made a post-mortem analysis using incomplete blocks of 8 adjacent plots, so that each replication consists of 5 incomplete blocks. I had brought my own FORTRAN program for analyzing incomplete two-way classifications with the Kuiper-Corsten iteration method, which could be run on the office PC. The rows of the two-way classification were the 40 DxP crosses and the columns the 3 × 5 = 15 incomplete blocks. Now the analysis showed highly significant differences between the crosses.
HFAS had sent a colleague of mine to set up a database of all the oil palm crosses at the OPRS in Dami, and me to solve the problems of the oil palm breeding trials. They assumed that each of us would need two weeks to complete the assignment. My colleague needed these two weeks, but after one week I had finished my report and a lecture to the staff explaining the main goals of the breeding trials. Because I had nothing else to do, I asked the OPRS oil palm breeder about the main goal of the breeding program. He confirmed that it was to find the best dura × pisifera combinations to maximize the oil yield of the resulting tenera palms. I solved this problem in the second week of my stay.
First, some background on oil palm breeding. The main economic product is palm oil extracted from the mesocarp, i.e. the fruit flesh surrounding the pit (or stone) of the oil palm fruit. Shell thickness is therefore an important characteristic, as it determines the proportion of the fruit available for the oil-bearing mesocarp. Shell thickness is determined by a single gene. One homozygote, the pisifera, is shell-less; many pisifera palms fail to set fruit, so the pisifera is not grown for commercial use. The other homozygote, the dura, has a thick shell. The heterozygote from the dura × pisifera cross, the tenera, has a thin shell. The tenera is the fruit form preferred for commercial use because of its larger proportion of oil-bearing mesocarp. As pisifera palms are predominantly female sterile (their fruit bunches abort early), the dura is used as the female parent and the pisifera as the male parent (its male inflorescences produce fertile pollen) of tenera planting material. The search is thus for dura and pisifera parents that transmit high bunch yield and high oil-and-kernel extraction per hectare to their tenera offspring. Fortunately, as in other crops, the performance of the offspring depends largely on additive effects of the parents; in quantitative genetics these additive effects are termed General Combining Ability (GCA) values. Reliability of selection can therefore be greatly improved by selecting parents according to GCA values estimated from the results of dura × pisifera crosses. The additive model, however, does not fully predict the performance of the tenera offspring: crosses may perform better or worse than predicted by adding the GCA values of the parents. This deviation is due to the Specific Combining Ability (SCA), but the SCA effect is usually much smaller than the GCA effect. A model for the expected yield of the tenera offspring $D_i \times P_j$ is then $E(y_{ij}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$, where $\alpha_i$ and $\beta_j$ are the GCA values of dura $D_i$ and pisifera $P_j$, and $\gamma_{ij}$ is the SCA of the cross $D_i \times P_j$.
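For the special case of a complete diallel in which every cross mean is based on the same number of plots, the estimates in this model (with the GCA and SCA effects constrained to sum to zero) are simple differences of means, which the Python sketch below illustrates. The function name and the 3 × 2 table of cross means are hypothetical; for incomplete crossing schemes the estimates must instead come from a least-squares or mixed-model fit.

```python
def gca_sca(table):
    """Split a complete table of cross means (rows: dura, columns: pisifera)
    into mu, dura GCA, pisifera GCA and SCA, using sum-to-zero constraints."""
    n_d, n_p = len(table), len(table[0])
    mu = sum(map(sum, table)) / (n_d * n_p)
    gca_dura = [sum(row) / n_p - mu for row in table]
    gca_pis = [sum(table[i][j] for i in range(n_d)) / n_d - mu for j in range(n_p)]
    sca = [[table[i][j] - mu - gca_dura[i] - gca_pis[j] for j in range(n_p)]
           for i in range(n_d)]
    return mu, gca_dura, gca_pis, sca


# Hypothetical mean yields of the crosses D1..D3 x P1..P2.
TABLE = [[14.0, 15.0],
         [16.0, 18.0],
         [12.0, 13.0]]

mu, gca_dura, gca_pis, sca = gca_sca(TABLE)
print(round(mu, 3), [round(g, 3) for g in gca_dura], [round(g, 3) for g in gca_pis])
print([[round(x, 3) for x in row] for row in sca])
```

The SCA values sum to zero over every row and every column of the table, which is why they measure only the departure of each cross from the additive prediction.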
To exploit both GCA and SCA effects the parents must be crossed. If we have A dura and B pisifera and all A × B dura × pisifera crosses are made, we have a complete diallel crossing scheme; if only C < A × B crosses are made, we have a partial or incomplete diallel crossing scheme. To compare the entire set of A dura and B pisifera on the basis of their GCA values, the parents must be crossed according to a so-called connected crossing scheme. A crossing scheme is connected if for each pair of dura $(D_h, D_i)$ there is a chain of dura from $D_h$ to $D_i$ in which each pair of adjacent links has been crossed with the same pisifera; otherwise the crossing scheme is disconnected. Likewise, the crossing scheme is connected if for each pair of pisifera $(P_k, P_j)$ there is a chain of pisifera from $P_k$ to $P_j$ in which each pair of adjacent links has been crossed with the same dura. Another way to check whether a crossing scheme is connected is to form a two-way table of the crosses with the A dura as rows and the B pisifera as columns: the crossing scheme is connected if the table cannot be split into separate tables by interchanging rows and columns. Let us elucidate this with a small example of C = 8 crosses made from A = 4 dura and B = 4 pisifera. The realized crosses are indicated by an asterisk (*) in the two-way table given in Table 1. From the cross of dura $D_1$ with pisifera $P_1$, $D_1 \times P_1$, we can make a chain to the cross $D_3 \times P_1$; from $D_3 \times P_1$ we can go to $D_3 \times P_3$, from $D_3 \times P_3$ to $D_1 \times P_3$, and then back to $D_1 \times P_1$. This chain never reaches dura $D_2$ and $D_4$. Hence this crossing scheme is disconnected.
If we rearrange the two-way table by interchanging $P_3$ with $P_2$ and $D_3$ with $D_2$, we see directly that there are two disconnected sets of four crosses each; see Table 2. The first set contains the 4 connected crosses $D_1 \times P_1$, $D_1 \times P_3$, $D_3 \times P_1$ and $D_3 \times P_3$; the second set contains the 4 connected crosses $D_2 \times P_2$, $D_2 \times P_4$, $D_4 \times P_2$ and $D_4 \times P_4$. In such a disconnected crossing scheme no unbiased estimate can be made of the difference in GCA effect between, for example, dura $D_1$ and $D_2$, or between pisifera $P_3$ and $P_4$.
A more practical method of checking whether a crossing scheme is connected is to draw a chain from one cross to another following a horizontal or vertical direction only. If all the crosses are connected by one continuous chain the crossing scheme is connected.
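The chain rule amounts to asking whether a graph is connected: each dura and each pisifera is a node, and each realized cross is an edge between its two parents. The Python sketch below (standard library only; function names hypothetical) checks this with a union-find structure, using the eight crosses of the disconnected example above. It also explains the necessary condition on C discussed next: a connected graph on A + B nodes needs at least A + B − 1 edges, i.e. crosses.

```python
def components(crosses):
    """Groups of parents linked by chains of crosses; crosses are (dura, pisifera) pairs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for d, p in crosses:
        parent[find(("D", d))] = find(("P", p))
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def is_connected(crosses, n_dura, n_pisifera):
    """True if every dura and every pisifera is linked to all others by a chain."""
    groups = components(crosses)
    all_parents = ({("D", i) for i in range(1, n_dura + 1)}
                   | {("P", j) for j in range(1, n_pisifera + 1)})
    return len(groups) == 1 and groups[0] == all_parents


# The eight crosses D1xP1, D1xP3, D2xP2, D2xP4, D3xP1, D3xP3, D4xP2, D4xP4.
EXAMPLE = [(1, 1), (1, 3), (2, 2), (2, 4), (3, 1), (3, 3), (4, 2), (4, 4)]

print(components(EXAMPLE))
print(is_connected(EXAMPLE, 4, 4))             # False: two separate sets
print(is_connected(EXAMPLE + [(1, 2)], 4, 4))  # True: D1xP2 links the two sets
```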
A necessary (but not sufficient) condition for a connected design is that C is at least equal to the sum of the degrees of freedom of the intercept, the dura and the pisifera: $C \ge 1 + (A-1) + (B-1) = A + B - 1$. In the example above, A = 4 and B = 4, so at least $C = 4 + 4 - 1 = 7$ crosses are needed for a connected design; the example itself shows that 8 crosses are not sufficient on their own. Table 3 gives a connected crossing scheme with C = 8 crosses. This crossing scheme remains connected with C = 7, for example if the cross $D_4 \times P_1$ is not made.
In the past the author found many large crossing schemes that were not connected, because the oil palm breeders only looked at including previously good dura and pisifera parents. The easiest check is to generate random yields for the planned crosses of the A dura and B pisifera parents and to analyze this two-way classification on a personal computer with a statistical package such as SAS, IBM SPSS Statistics or R, requesting the Type III Sums of Squares (SS) option in the Analysis of Variance (the SS of dura after correction for pisifera and the SS of pisifera after correction for dura). If the degrees of freedom (df) of dura equal A − 1 and the df of pisifera equal B − 1, the crossing scheme is connected; otherwise it is disconnected.

Now we discuss the construction of a good crossing design when we have A dura and B pisifera and want to use C crosses in a connected incomplete diallel scheme, where $A + B - 1 \le C \le A \times B$. Several connected mating designs can best be compared on the standard errors of the estimators of the differences in GCA value for all pairs of dura and all pairs of pisifera. The standard error of the estimator of the difference in GCA value between two dura parents $D_i$ and $D_j$ is $S_{Dij}\,\sigma$, and between two pisifera parents $P_i$ and $P_j$ it is $S_{Pij}\,\sigma$, where σ is the residual standard deviation and the coefficients $S_{Dij}$ and $S_{Pij}$ depend solely on the mating scheme. The value of σ depends on the trait studied (e.g. yield), the variation between the plots in the experimental field and the plot size. For complete crossing schemes (a complete diallel) with A dura and B pisifera, where each cross occurs on r plots, the standard error of the estimator of the difference between the GCA values is the same for all pairs of dura, with $S_{Dij} = \sqrt{2/(B r)}$, and the same for all pairs of pisifera, with $S_{Pij} = \sqrt{2/(A r)}$.
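The coefficients $S_{Dij}$ and $S_{Pij}$ can be computed directly from a crossing scheme. The Python sketch below (assuming NumPy; function name hypothetical) builds the design matrix of the additive model $y = \mu + \alpha_i + \beta_j + e$ with each cross on r plots, checks connectedness through the rank of that matrix (rank A + B − 1 corresponds to A − 1 df for dura and B − 1 df for pisifera), and takes $S = \sqrt{\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{c}}$ for each pairwise contrast $\mathbf{c}$, using the pseudo-inverse as the generalized inverse. For a complete diallel it reproduces $\sqrt{2/(Br)}$ and $\sqrt{2/(Ar)}$.

```python
from itertools import combinations

import numpy as np


def se_coefficients(crosses, n_dura, n_pis, r=1):
    """S_Dij and S_Pij for the additive model y = mu + alpha_i + beta_j + e.

    crosses: (dura, pisifera) pairs with 1-based indices; each cross on r plots.
    Columns: 0 = mu, 1..n_dura = dura, n_dura+1..n_dura+n_pis = pisifera.
    """
    rows = []
    for d, p in crosses:
        x = np.zeros(1 + n_dura + n_pis)
        x[[0, d, n_dura + p]] = 1.0
        rows.extend([x] * r)
    X = np.array(rows)
    if np.linalg.matrix_rank(X) != n_dura + n_pis - 1:
        raise ValueError("crossing scheme is disconnected")
    G = np.linalg.pinv(X.T @ X)  # generalized inverse of X'X

    def coefficient(col_i, col_j):
        c = np.zeros(X.shape[1])
        c[col_i], c[col_j] = 1.0, -1.0
        return float(np.sqrt(c @ G @ c))

    s_d = {(i, j): coefficient(i, j)
           for i, j in combinations(range(1, n_dura + 1), 2)}
    s_p = {(i, j): coefficient(n_dura + i, n_dura + j)
           for i, j in combinations(range(1, n_pis + 1), 2)}
    return s_d, s_p


# Complete diallel of 4 dura x 3 pisifera, each cross on r = 2 plots.
complete = [(d, p) for d in range(1, 5) for p in range(1, 4)]
s_d, s_p = se_coefficients(complete, n_dura=4, n_pis=3, r=2)
print(min(s_d.values()), (2 / (3 * 2)) ** 0.5)  # S_Dij = sqrt(2/(B r))
print(min(s_p.values()), (2 / (4 * 2)) ** 0.5)  # S_Pij = sqrt(2/(A r))
```

The rank test plays the same role as checking the Type III degrees of freedom in a statistical package, and no random yields are needed for it.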
For incomplete mating designs the standard error of the estimator of the difference in GCA values varies between pairs of parents. The quality of such mating designs can be measured by the average and the range of the standard errors of the estimators of the differences between the GCA values of pairs of dura parents and of pairs of pisifera parents. As shown above, such a quality evaluation can be based solely on the coefficients $S_{Dij}$ and $S_{Pij}$.
To find a good mating design one can search for balanced or partially balanced incomplete mating designs, and for these one can use incomplete block designs. In an incomplete block design v varieties (or treatments) are compared in blocks of k plots, where the block size k < v. Well-known incomplete block designs are lattices, where v = k × k, and rectangular lattices, where v = k × (k + 1). For values of v other than k × k or k × (k + 1) there are the so-called alpha-designs (see [9,10]). To use an incomplete block design as a mating design, the dura play the role of the treatments and the pisifera the role of the incomplete blocks. So we must look for incomplete block designs with A treatments and B blocks, and block size k = C/B, where C is the number of crosses. If no incomplete block design fits these requirements, we can always start from a smaller design and add some extra treatments (= dura) to the blocks (= pisifera).
As an example we give here three mating designs with C = 40 crosses among A = 20 dura and B = 10 pisifera. In these designs each dura must be crossed with two pisifera, and each pisifera with four dura. Two designs (I and II) were chosen intuitively, on the basis of symmetry, by two experienced oil palm breeders; design III was chosen by the author as an alpha-design with v = 20 treatments (= dura), block size k = 40/10 = 4, b = 10 blocks (= pisifera) and r = 2 replications, where the first replication consists of blocks 1–5 and the second of blocks 6–10, hence a resolvable design.
Table 4. Three crossing schemes with C = 40 crosses of 20 dura and 10 pisifera.

[Table 4: Design I, Design II and Design III, each a two-way table of dura (rows) by pisifera 1–10 (columns)]
It can be seen directly that all three mating designs are connected. Table 5 gives, for the three designs, the minimum, maximum and average of the coefficients $S_{Dij}$ and $S_{Pij}$ of the standard errors of the estimators of the differences between the GCA values of pairs of dura and of pairs of pisifera parents. From Table 5 it is clear that design III (the alpha-design), which has the smallest average values of $S_{Dij}$ and $S_{Pij}$ and moreover the smallest range (max − min) of $S_{Dij}$ and $S_{Pij}$, is the mating design to be preferred. Hence it is worthwhile to use an alpha-design, which always gives a connected mating design, and to be careful not to rely too much on "experience"!
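The quality measures used in Table 5 can be computed for any candidate scheme. The Python sketch below (assuming NumPy; all names hypothetical) turns an alpha-design into a crossing scheme by reading its treatments as dura and its blocks as pisifera, and reports the minimum, average and maximum of $S_{Dij}$ and $S_{Pij}$ under the additive model with one plot per cross. The generating array is a hypothetical choice for v = 20, k = 4, r = 2, not necessarily the array behind design III, so its numbers need not match Table 5.

```python
from itertools import combinations

import numpy as np

# Hypothetical generating array: k = 4 rows, r = 2 columns, entries mod s = 5.
ALPHA = [[0, 0], [0, 1], [0, 2], [0, 3]]
S_LEVELS = 5


def alpha_crossing_scheme(alpha, s):
    """(dura, pisifera) crosses: treatments -> dura, blocks -> pisifera (1-based)."""
    crosses, pisifera = [], 0
    for j in range(len(alpha[0])):          # each column gives one replicate
        for m in range(s):                  # s blocks (pisifera) per replicate
            pisifera += 1
            for i in range(len(alpha)):
                crosses.append((s * i + (alpha[i][j] + m) % s + 1, pisifera))
    return crosses


def s_summary(crosses, n_dura, n_pis):
    """(min, mean, max) of S_Dij and of S_Pij, one plot per cross."""
    X = np.zeros((len(crosses), 1 + n_dura + n_pis))
    for row, (d, p) in enumerate(crosses):
        X[row, [0, d, n_dura + p]] = 1.0
    if np.linalg.matrix_rank(X) != n_dura + n_pis - 1:
        raise ValueError("crossing scheme is disconnected")
    G = np.linalg.pinv(X.T @ X)

    def stats(offset, n):
        values = []
        for i, j in combinations(range(1, n + 1), 2):
            c = np.zeros(X.shape[1])
            c[offset + i], c[offset + j] = 1.0, -1.0
            values.append(float(np.sqrt(c @ G @ c)))
        return min(values), sum(values) / len(values), max(values)

    return stats(0, n_dura), stats(n_dura, n_pis)


crosses = alpha_crossing_scheme(ALPHA, S_LEVELS)
dura_stats, pis_stats = s_summary(crosses, n_dura=20, n_pis=10)
print("S_D min/mean/max:", [round(x, 3) for x in dura_stats])
print("S_P min/mean/max:", [round(x, 3) for x in pis_stats])
```

Running the same `s_summary` on the crosses of competing designs gives a like-for-like comparison of their averages and ranges.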

Analysis of an Oil Palm Breeding Trial
Let us consider the case that we have made C = 10 tenera crosses $T_1, T_2, \ldots, T_{10}$ from A = 5 dura mothers and B = 5 pisifera fathers in a connected crossing scheme. Table 6 gives the crossing scheme as a two-way table, in which a dot (•) indicates a cross that has not been made. A palm plot consists of 6 rows of 6 palms, planted at the corners of equilateral triangles with 9 m sides. To measure the yielding capacity of a cross, the yield of the 4 × 4 = 16 inner palms is recorded. Suppose that the experimental field is very heterogeneous, so that homogeneous growing conditions (blocks) can only be found for groups of five adjacent plots.
A resolvable alpha-design with block size k = 5 and r = 4 replications is used. Table 7 gives the randomized resolvable alpha-design generated by the program CycDesigN, in terms of the indices of the tenera crosses $T_i$. After laying out the design of Table 7 in the field, we recorded the yield y (t/ha) after one year. We then made the data file of the results of this example (see Table 8) for IBM SPSS Statistics, using consecutive block numbers 1–8 for the blocks within the replications. The IBM SPSS Statistics syntax to find the GCA values of the dura and the pisifera and the adjusted tenera means (EMMEANS = Estimated Marginal Means) is given in Table 9.
Table 9. IBM SPSS Statistics syntax file for the analysis of the data file of Table 8.
The SAS syntax to find the GCA values of the dura and the pisifera and the adjusted tenera means (LSMEANS = Least Squares Means, which are the same as what SPSS calls EMMEANS) is given in Table 9. The output shows that dura (P-value = 0.005), pisifera (P-value = 0.001) and tenera (P-value < 0.001) are all significant.
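The SCA (Specific Combining Ability) of a tenera is calculated as LSMEAN(tenera) − additive mean, where the additive mean is the prediction from the estimated overall mean plus the GCA values of its dura and pisifera parents; for tenera 1 the SCA is 15.375 − 15.0284 = 0.3466. The SCA of every other tenera is calculated in the same way.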
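The SCA step can be sketched in Python (assuming NumPy). As a simplification, this hypothetical helper fits the additive model $\mu + \alpha_i + \beta_j$ by unweighted least squares to the tenera means of a connected crossing scheme and returns each mean minus its additive prediction. This is not the plot-level mixed-model analysis in SAS or IBM SPSS Statistics, so its values need not equal those from that output, and the ten cross means below (5 dura, 5 pisifera) are invented for illustration, not the data of Tables 6 to 8.

```python
import numpy as np


def sca_from_means(means, n_dura, n_pis):
    """SCA = tenera mean - additive mean (mu + alpha_i + beta_j), least-squares fit.

    means: {(dura, pisifera): mean yield of that tenera}, 1-based indices.
    """
    crosses = sorted(means)
    X = np.zeros((len(crosses), 1 + n_dura + n_pis))
    for row, (d, p) in enumerate(crosses):
        X[row, [0, d, n_dura + p]] = 1.0
    y = np.array([means[c] for c in crosses])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution
    additive = X @ coef                            # fitted values are unique
    return {c: float(obs - fit) for c, obs, fit in zip(crosses, y, additive)}


# Hypothetical means (t/ha) of 10 tenera crosses in a connected 5 x 5 scheme.
MEANS = {(1, 1): 15.4, (1, 2): 14.8, (2, 2): 16.1, (2, 3): 15.0, (3, 3): 13.9,
         (3, 4): 14.6, (4, 4): 15.7, (4, 5): 16.3, (5, 5): 15.2, (5, 1): 14.9}

for cross, value in sca_from_means(MEANS, n_dura=5, n_pis=5).items():
    print(f"D{cross[0]} x P{cross[1]}: SCA = {value:+.4f}")
```

Although the GCA estimates themselves depend on the chosen solution of the singular normal equations, the additive means (and hence the SCA values) do not.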
Chapter 12, "Field Experimentation", of a recent handbook on oil palm breeding [12] also treats the use of incomplete block designs for oil palm breeding trials.
For statistical selection procedures to select the best set of dura and pisifera parents, see the indifference-zone approach of Bechhofer [1] or the subset selection procedure of Gupta [6,8].

Conclusions
In oil palm breeding trials the breeder must use a connected incomplete crossing scheme of the dura (female parents) and the pisifera (male parents) to produce the tenera hybrids. Such a connected incomplete diallel is easily obtained from an alpha-design by using the dura as the treatments and the pisifera as the incomplete blocks.
Because the field plots of the tenera palms, which are planted at the corners of equilateral triangles with sides of 9 m, are quite large, the breeder must use an incomplete block design. For the field trial of the tenera crosses an alpha-design is recommended, because it readily gives a design for the given number of tenera with a small incomplete-block size.
With the program CycDesigN the required alpha-designs are easily found, both for the incomplete diallel and later for the experimental design in the field.
The analysis can be done on a personal computer with a statistical package that can analyze a mixed model. Syntax is given for SAS and IBM SPSS Statistics, but the analysis can also be done with the free software R and its package lme4.