Documentation for YTDESIGN:

Yield Trial Design

Hugh G. Gauch, Jr.

April 1995.

This freeware program, YTDESIGN, and its documentation may be copied and shared without charge. Archive versions for both IBM-compatible PC and Macintosh computers can be downloaded with a web browser or via ftp at:

ftp://ftp.lightlink.com///pub/mcp/pub/YTDesign.sit for Macintosh

ftp://ftp.lightlink.com///pub/mcp/pub/_YTDesig.exe for PC

o Address questions to:

: Hugh G. Gauch, Jr.
: Soil, Crop, and Atmospheric Sciences
: 1021 Bradfield Hall
: Cornell University
: Ithaca, NY 14853

: Telephone: 1-607/255-7764
: FAX: 1-607/255-8615

o Introduction

This program, YTDESIGN, gives agronomists and plant breeders guidance in choosing the optimal number of replications in their selection experiments. This documentation sketches the relevant statistical theory, explains how to use the program, and provides a sample output. The yield increases achieved through breeding and yield-trial experiments depend not only on the number of crosses and the number of yield plots, but also on the statistical design and analysis of the experiments. Indeed, improving an experiment's design and analysis is frequently the most cost-effective means available for increasing research payoffs and yield gains. An efficient experiment with 400 plots can outperform an unwise experiment with 2000 plots while at the same time costing far less. Progress depends on both the size and the efficiency of a research project.

The key concept of YTDESIGN is to strike an optimal balance between genotypes and replications. Increasing the number of genotypes includes more of the markedly superior entries, while increasing the number of replications improves accuracy and hence promotes correct identification of the truly superior entries. Since the cost of an experiment rises roughly as the number of plots (number of genotypes times the number of replications), an inevitable tradeoff is imposed between genotypes and replications. For example, with 400 plots and roughly the same cost, one could test 400 genotypes with 1 replication, 200 with 2, 133 with 3, or 100 with 4, and so on. Which choice leads to the highest-yielding selections? The answer to that question depends on exactly two input parameters: the number of plots and the signal-to-noise (S/N) ratio, which is defined as the genotype variance divided by the error variance (Gauch and Zobel 1995b). For an experiment with a given number of plots and S/N ratio, YTDESIGN calculates the optimal number of replications. It also quantifies the deleterious impact of suboptimal choices.
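As a small illustration of this tradeoff, the following Python sketch lists the whole-number genotype counts that fit a fixed plot budget. The function name `genotype_replication_splits` is hypothetical and is not part of YTDESIGN; it assumes that the number of genotypes is simply the number of plots divided by the number of replications, rounded down.

```python
def genotype_replication_splits(plots, max_reps):
    """Return (replications, genotypes) pairs that fit within a plot budget.

    Assumption: every genotype receives the same number of replications,
    and any leftover plots (when plots is not divisible by reps) go unused.
    """
    return [(reps, plots // reps) for reps in range(1, max_reps + 1)]


if __name__ == "__main__":
    # The 400-plot example from the text: 400 x 1, 200 x 2, 133 x 3, 100 x 4.
    for reps, genotypes in genotype_replication_splits(400, 4):
        used = genotypes * reps
        print(f"{genotypes} genotypes x {reps} replications = {used} plots")
```

Which of these splits gives the best selections is exactly the question that the number of plots and the S/N ratio answer.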

Very early-generation trials are often conducted with small plots in only one environment (site-year) because of small seed supplies and related limitations. But for late-generation trials, in recent decades there has been a marked trend toward more environments at the cost of fewer replications in order to quantify genotype-by-environment (GE) interactions better (Bradley et al. 1988).

The Additive Main effects and Multiplicative Interaction (AMMI) model combines an additive model (analysis of variance, AOV) for main effects and a multiplicative model (principal components analysis, PCA) for interaction effects. It is applicable to trials with at least three genotypes and three environments. AMMI has proven useful for (1) understanding and visualizing complex GE interactions, and (2) gaining accuracy and thereby improving selections (Gauch 1992, 1993b; Gauch and Zobel 1995a).

AMMI is implemented by MATMODEL (Gauch and Furnas 1991; Gauch 1993a). For replicated data, MATMODEL can calculate the AMMI gain factor, which is the effective number of replications with AMMI analysis divided by the actual number of replications. For example, when 10 replications without AMMI are required to match the accuracy of 4 replications with AMMI, the AMMI gain factor is 10/4 or 2.5. For a wide range of yield trials, MATMODEL typically achieves a gain factor of 2 to 4 (Gauch and Zobel 1988; Gauch 1992, 1993a, 1993b). YTDESIGN can calculate the optimal number of replications both with and without AMMI analysis. Clearly, this optimal number is usually different and smaller when MATMODEL is used. That is, the design and the analysis of an experiment interact, so the best design depends on the efficiency of the analysis. Efficient experiments require good designs and good analyses, appropriately integrated.

o Motivation

Why bother with careful design and aggressive analysis of yield-trial experiments? The exact advantages from careful design by YTDESIGN and aggressive analysis by MATMODEL vary from experiment to experiment. Also, the advantages are a mixture of quantifiable factors, like an AMMI gain factor of 2.15, and qualitative factors, like a deeper understanding of complex GE interactions. As an overall conservative estimate, I suggest that aggressive data analysis boosts yield increases by an additional 20% per year beyond that achieved with older analyses. Particularly because most breeding programs have several concurrent stages of selection and advantages compound, the actual figure in specific cases is frequently larger.

In other words, routinely some of the potential benefit from yield-trial data is not captured by conventional analyses and is wasted. As Nielsen (1992) commented in his presidential address to the American Society of Agronomy, ``We still rely on statistical methods developed more than one-half century ago,'' such as ``analysis of variance and regression techniques.'' Indeed, the regression analysis of Yates and Cochran (1938), subsequently popularized by Finlay and Wilkinson (1963), is still by far the most common method for analyzing GE interactions. Yet the empirical comparison by Yau (1995), based on numerous data sets, showed that AMMI usually captures several times as much of the interaction sum of squares as does regression (and mathematical theory proves that the reverse outcome is never possible).

To reject the present thesis of a 20% boost from careful design and aggressive analysis, one would need to defend the counter thesis that conventional statistical methods are already extracting nearly 100% of the information in yield data, leaving little if any room for improvement. I suggest that this counter thesis, rather than the present thesis, is the one that would prove exceedingly difficult to defend. It is not plausible that statistics has made no meaningful advance over the past one-half century.

Furthermore, in a breeding context, a 20% boost each year accumulates over the years. After just several years, the difference between researchers using modern statistics and those using older methods will become substantial. Most importantly, statistical advantages become embodied in germplasm advantages reflecting a history of superior selections. Eventually, a slowly advancing breeding population falls so far behind another more rapidly advancing population that the future offers virtually no prospect of ever catching up again.

During the past few decades, breeders have been increasing yields for many crops by about 0.5% to 2% per year. With no increase in the size of field experiments but rather with addition of experimental design using YTDESIGN and data analysis using MATMODEL, these figures should increase to 0.6% to 2.4% per year. On the other hand, if it turns out for some crops that past gains soon begin to make future gains harder to achieve, the benefit from aggressive statistics may be to maintain increases of 0.5% to 2% for another decade or two, rather than suffering a drop to 0.4% to 1.6%. Whether or not we are approaching a leveling off of the yield curve for a given crop, efficient research is critical in a manner that compounds from year to year. Efficient research makes a modest difference the next year, but a huge difference the next decade.

It will not be possible for farmers to increase crop yields commensurate with expected world population increases, nor will it be possible for seed producers to maintain competitive advantages or increase market shares, if agricultural researchers conduct inefficient experiments. Bad design or bad analysis is equivalent to paying for yield data, and then throwing away half or three-quarters of the data. An inefficient experiment has excessive costs relative to the benefits it delivers. The combination of YTDESIGN for efficient design and MATMODEL for efficient analysis can allow expensive yield-trial experiments to deliver better selections, faster yield increases, and deeper understandings of the genotypes, environments, and GE interactions.

o Statistical Theory

Three publications explain the statistical theory underlying selection experiments and provide numerous citations to the literature. Obviously, greater accuracy implies better selections, but Gauch and Zobel (1989) quantify this relationship. Chapter 5 of Gauch (1992) gives a basic account of order statistics and explores implications for selection experiments. Gauch and Zobel (1995b) give the theory and equations used in YTDESIGN. This documentation merely sketches the relevant theory, leaving a fuller account to these three publications.

Envision taking N random draws from the standard normal distribution (with mean 0 and variance 1) and then ranking or ordering these values from largest to smallest. By definition, the expected or average value of the largest draw is the first order statistic. For example, the first order statistic for N=10, 100, or 500 draws is 1.54, 2.51, or 3.04, respectively. Denote the first order statistic for N draws by X(N).
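One way to compute X(N) is to integrate x times the density of the largest of N draws, which is N times the normal density times the normal cumulative distribution raised to the power N-1. The Python sketch below does this with a simple midpoint rule. It is only an illustration under that assumption; the function name `expected_max` and the integration settings are choices made here, and this is not necessarily the method YTDESIGN itself uses.

```python
from statistics import NormalDist


def expected_max(n, lo=-8.0, hi=8.0, steps=8000):
    """Expected value of the largest of n independent standard normal draws.

    The density of the maximum is n * pdf(x) * cdf(x)**(n - 1); this function
    integrates x times that density over [lo, hi] with the midpoint rule.
    """
    std = NormalDist()  # mean 0, standard deviation 1
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * n * std.pdf(x) * std.cdf(x) ** (n - 1)
    return total * width


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(f"X({n}) = {expected_max(n):.2f}")
```

For N=10, 100, and 500 this gives approximately 1.54, 2.51, and 3.04.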

Assume that the true means of the genotypes are distributed normally, as are the errors. Then the expected true yield of the best of G genotypes is simply the grand mean plus the genotypic standard deviation times X(G). But the empirical yield of the trial's apparent winner, selected for having the highest empirical yield, is larger because of noise. Define the phenotypic standard deviation as the square root of the sum of two terms: the genotypic variance, and the error variance divided by the product of the number of replications and the AMMI gain factor. Then the expected empirical yield of the trial's apparent winner equals the grand mean plus the phenotypic standard deviation times X(G). Finally, the expected true yield of the trial's apparent winner, which is the quantity of greatest interest in selection experiments, equals the grand mean plus X(G) times the ratio of the genotypic variance to the phenotypic standard deviation. Gauch and Zobel (1995b) explain these equations in greater detail, and YTDESIGN solves these equations.
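The relationships in the preceding paragraph can be written compactly as follows. The symbols are notation assumed here for illustration and are not taken from the program: mu is the grand mean, sigma_G and sigma_E are the genotypic and error standard deviations, r is the number of replications, a is the AMMI gain factor, and G is the number of genotypes.

```latex
\begin{align*}
\sigma_P &= \sqrt{\sigma_G^2 + \frac{\sigma_E^2}{r\,a}}
  && \text{(phenotypic standard deviation)} \\
E[\text{true yield of best genotype}] &= \mu + \sigma_G \, X(G) \\
E[\text{empirical yield of apparent winner}] &= \mu + \sigma_P \, X(G) \\
E[\text{true yield of apparent winner}] &= \mu + X(G)\,\frac{\sigma_G^2}{\sigma_P}
\end{align*}
```

As a check against the first experiment in the sample output (mu = 2000, sigma_G = 150, sigma_E = 200, r = 2, a = 1, G = 200), sigma_P is the square root of 22500 + 40000/2, or about 206.16. With X(200) of about 2.746, the three expectations are about 2411.9, 2566.1, and 2299.7, which match the values the program reports.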

Tables in Gauch and Zobel (1995b) provide general results for several S/N ratios from 0.3 to 3.0 and several numbers of plots from 50 to 5000. YTDESIGN provides exact results for any choices of S/N ratio and number of plots.

o Program Use

This program is invoked from a DOS prompt by typing ``YTDESIGN'' and hitting the enter key. It first asks for a name for the output file. If the intended directory for the output file is not the current default directory, include the necessary path with the filename. After entering each set of input parameters, brief output goes to the screen and a more extensive output goes to the output disk file. After each analysis, the user is asked: ``Enter another analysis? <Y, N>.'' The reply ``Y'' allows the user to analyze another experimental setup, whereas ``N'' terminates the program run.

Each analysis of an experimental setup requires specification of seven input parameters: (1) the yield trial's grand mean, (2) the standard deviation of the genotypes' true yields, (3) the standard deviation of the errors, (4) the AMMI gain factor, (5) the number of yield plots, (6) the number of genotypes, and (7) the number of replications. The standard deviation of the genotypes' true yields is the square root of the genotype variance, and similarly the standard deviation of the errors is the square root of the error mean square. Estimation of variance components is explained in many standard plant breeding or statistics texts, such as Searle (1992). The AMMI gain factor for a particular replicated experiment can be obtained from MATMODEL analysis. For the purpose of designing future yield trials, a reasonable procedure is to use a conservative estimate from similar previous yield trials. For example, if a certain kind of yield trial is run repeatedly and numerous past experiments had AMMI gain factors of 2 to 3, then a future expectation of at least 2 is plausible. If AMMI analysis of the data is not planned, which must be the case when a trial has fewer than three genotypes or environments, then the AMMI gain factor should be set to 1. The program checks that the number of yield plots equals the number of genotypes times the number of replications. If not, these three numbers are requested again.
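The seven inputs and the consistency check on plots, genotypes, and replications can be sketched as a small Python data structure. The class name `TrialSetup`, its field names, and the choice to raise an error (rather than ask for the numbers again, as the program does) are assumptions made for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialSetup:
    """The seven input parameters described above (names are hypothetical)."""
    grand_mean: float
    genotype_sd: float   # square root of the genotype variance
    error_sd: float      # square root of the error mean square
    ammi_gain: float     # use 1.0 when no AMMI analysis is planned
    plots: int
    genotypes: int
    replications: int

    def __post_init__(self):
        # Mirror the program's check: plots = genotypes x replications.
        if self.plots != self.genotypes * self.replications:
            raise ValueError("plots must equal genotypes times replications")
        # Extra guard added here so the S/N ratio below is well defined.
        if self.error_sd <= 0:
            raise ValueError("error_sd must be positive")

    @property
    def signal_to_noise(self):
        """Genotype variance divided by error variance."""
        return self.genotype_sd ** 2 / self.error_sd ** 2


if __name__ == "__main__":
    setup = TrialSetup(2000.0, 150.0, 200.0, 1.0, 400, 200, 2)
    print(f"S/N ratio = {setup.signal_to_noise:.4f}")
```

With a genotype standard deviation of 150 and an error standard deviation of 200, the S/N ratio is 22500/40000, or 0.5625.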

After entering these seven input values, several results appear on the screen: the S/N ratio and the expected values of the top selection's true yield and empirical yield and of the best genotype's true yield. The output file has, in addition, a table of expected values for the top selection's true yield were the number of replications set to each value from 1 to 10. An asterisk marks the optimal number of replications, the one with the highest expected yield. It is rare for the optimal number of replications to exceed 10, but if that is the case, the program continues the search up to 70 replications and an extra line gives the optimal results when this extended search succeeds. This table also gives the variance ratio, which equals the S/N ratio multiplied by the number of replications and by the AMMI gain factor (for example, .5625 times 2 times 1 gives the 1.1250 shown for 2 replications in the first sample table). For each true yield value (resulting for each number of replications, ordinarily 1 to 10), the best experiment is specified by its numbers of plots, genotypes, and replications. It is the smallest, most efficient experiment that can deliver the specified expected true yield. Finally, each experiment's efficiency is listed, which equals the number of plots in the best experiment divided by the number of plots in the actual experiment. This quantifies the penalty for suboptimal experimental designs. For example, when the experimental efficiency is 40%, that means that a wiser experiment with only 40% as many plots could deliver selections just as good as the original experiment.
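The search over replications can be sketched in Python as follows, using the input values from the sample output in this documentation. This is an illustration, not the program's own code: the function names are hypothetical, X(G) is obtained by simple numerical integration, the number of genotypes is assumed to be the number of plots divided by the replications and rounded down, and the best-experiment and efficiency columns are not reproduced. The variance ratio is computed as the S/N ratio times the replications times the AMMI gain factor, which is the form that reproduces the Variance Ratio column of the sample output.

```python
from math import sqrt
from statistics import NormalDist


def expected_max(n, lo=-8.0, hi=8.0, steps=8000):
    """Expected largest of n standard normal draws (midpoint-rule integral)."""
    std = NormalDist()
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * n * std.pdf(x) * std.cdf(x) ** (n - 1)
    return total * width


def replication_table(grand_mean, genotype_sd, error_sd, gain, plots, max_reps=10):
    """Return (reps, variance ratio, expected true yield of top selection) rows."""
    s_n = genotype_sd ** 2 / error_sd ** 2
    rows = []
    for reps in range(1, max_reps + 1):
        genotypes = plots // reps  # assumption: leftover plots go unused
        pheno_sd = sqrt(genotype_sd ** 2 + error_sd ** 2 / (reps * gain))
        true_yield = grand_mean + expected_max(genotypes) * genotype_sd ** 2 / pheno_sd
        rows.append((reps, s_n * reps * gain, true_yield))
    return rows


def optimal_replications(rows):
    """Number of replications with the highest expected true yield."""
    return max(rows, key=lambda row: row[2])[0]


if __name__ == "__main__":
    rows = replication_table(2000.0, 150.0, 200.0, 1.0, 400)
    best = optimal_replications(rows)
    print("Nrep  Variance Ratio  True Yield")
    for reps, ratio, true_yield in rows:
        mark = "*" if reps == best else " "
        print(f"{reps:4d}  {ratio:14.4f}  {true_yield:10.3f} {mark}")
```

With these inputs the sketch selects 4 replications without AMMI and 2 replications with an AMMI gain factor of 2.5, in agreement with the sample output. Small differences from the program's printed yields can arise from the rounding of genotype counts and from the numerical approximation of X(G).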

o Sample Output

The following sample file, analyzing two experimental setups, exemplifies YTDESIGN output. Note that the first experimental design has an efficiency of only 75%. It has 400 plots (200 genotypes with 2 replications), but a wiser experiment with only 300 plots (75 genotypes with 4 replications) would perform just as well. Alternatively, one could keep the experiment's size at the current 400 plots, but fix the design problem by switching from 2 to 4 replications (and correspondingly from 200 to 100 genotypes). This improved design increases the top selection's improvement above the grand mean by 4.4%. The single difference in the second experiment is that AMMI analysis is now planned, with an expected AMMI gain factor of about 2.5. In that case the present 2 replications are optimal. The combination of optimal design and AMMI analysis has increased the expected value of the top selection's true yield from 2299.7050 to 2353.7850, so the increase above the grand mean of 2000 has been improved by 18% with no increase in the size and cost of the field experiment. In a breeding program, such advantages compound over the years.
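The percentages quoted in this paragraph follow directly from the numbers in the sample output. The short Python sketch below repeats the arithmetic; the function names are hypothetical and the yields are copied from the sample output.

```python
def improvement_pct(new_yield, old_yield, grand_mean):
    """Percent change in the top selection's gain above the grand mean."""
    return 100.0 * ((new_yield - grand_mean) / (old_yield - grand_mean) - 1.0)


def efficiency_pct(best_plots, actual_plots):
    """Plots in the best experiment as a percentage of the actual plots."""
    return 100.0 * best_plots / actual_plots


if __name__ == "__main__":
    grand_mean = 2000.0
    current = 2299.705     # 2 replications, no AMMI
    redesigned = 2312.973  # 4 replications, no AMMI
    with_ammi = 2353.785   # 2 replications, AMMI gain factor 2.5

    print(f"Efficiency of current design: {efficiency_pct(300, 400):.0f}%")
    print(f"Gain from redesign: {improvement_pct(redesigned, current, grand_mean):.1f}%")
    print(f"Gain with AMMI: {improvement_pct(with_ammi, current, grand_mean):.0f}%")
```

The gain above the grand mean rises from 299.705 to 312.973 with the better design (about 4.4%) and to 353.785 with AMMI analysis (about 18%).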

o YTDESIGN File: ytdesign.lst

Grand Mean           2000.0000
Genotype SD           150.0000
Error SD              200.0000
AMMI Gain Factor        1.0000
Yield Plots                400
Genotypes                  200
Replications                 2

Signal-to-Noise Ratio                   .5625
Top Selection's True Yield          2299.7050
Top Selection's Empirical Yield     2566.1100
Best Genotype's True Yield          2411.9060
Optimal  4 Reps True Yield          2312.9730

Expectations for    400 plots and S/N=   .5625.  Best Experiment
Nrep  Variance Ratio  True Yield    Efficiency   Plots   Gens  Reps
  1         .5625     2267.13700       37.50      150     50     3
  2        1.1250     2299.70500       75.00      300     75     4
  3        1.6875     2309.97300       94.00      376     94     4
  4        2.2500     2312.97300 *    100.00      400    100     4
  5        2.8125     2312.65200      100.00      400    100     4
  6        3.3750     2310.30600       95.00      380     95     4
  7        3.9375     2308.04900       90.00      360     90     4
  8        4.5000     2305.15500       85.00      340     85     4
  9        5.0625     2301.39600       78.00      312     78     4
 10        5.6250     2298.65400       73.00      292     73     4

Grand Mean           2000.0000
Genotype SD           150.0000
Error SD              200.0000
AMMI Gain Factor        2.5000
Yield Plots                400
Genotypes                  200
Replications                 2

Signal-to-Noise Ratio                   .5625
Top Selection's True Yield          2353.7850
Top Selection's Empirical Yield     2479.5750
Best Genotype's True Yield          2411.9060
Optimal  2 Reps True Yield          2353.7850

Expectations for    400 plots and S/N=   .5625.  Best Experiment
Nrep  Variance Ratio  True Yield    Efficiency   Plots   Gens  Reps
  1        1.4063     2340.36400       73.50      294    147     2
  2        2.8125     2353.78500 *    100.00      400    200     2
  3        4.2188     2351.71000       95.50      382    191     2
  4        5.6250     2346.59800       85.00      340    170     2
  5        7.0313     2340.60000       74.00      296    148     2
  6        8.4375     2334.05700       64.00      256    128     2
  7        9.8438     2328.66500       57.00      228    114     2
  8       11.2500     2323.29800       50.50      202    101     2
  9       12.6563     2317.51800       44.50      178     89     2
 10       14.0625     2313.17200       40.50      162     81     2

o References

1. Bradley, J.P., Knittle, K.H., and Troyer, A.F. 1988. Statistical methods in seed corn product selection. Journal of Production Agriculture 1:34-38.

2. Finlay, K.W., and Wilkinson, G.N. 1963. The analysis of adaptation in a plant-breeding programme. Australian Journal of Agricultural Research 14:742-754.

3. Gauch, H.G. 1992. Statistical Analysis of Regional Yield Trials: AMMI Analysis of Factorial Designs. Elsevier, Amsterdam, The Netherlands.

4. Gauch, H.G. 1993a. MATMODEL Version 2.0: AMMI and Related Analyses for Two-way Data Matrices. Microcomputer Power, 111 Clover Lane, Ithaca, New York 14850.

5. Gauch, H.G. 1993b. Prediction, parsimony, and noise. American Scientist 81:468-478.

6. Gauch, H.G., and Furnas, R.E. 1991. Statistical analysis of yield trials with MATMODEL. Agronomy Journal 83:916-920.

7. Gauch, H.G., and Zobel, R.W. 1988. Predictive and postdictive success of statistical analyses of yield trials. Theoretical and Applied Genetics 76:1-10.

8. Gauch, H.G., and Zobel, R.W. 1989. Accuracy and selection success in yield trial analyses. Theoretical and Applied Genetics 77:473-481.

9. Gauch, H.G., and Zobel, R.W. 1995a. AMMI analysis of yield trials. In: Kang, M.S., and Gauch, H.G. (Editors). Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida. [Until published around October 1995, preprints are available upon request to Hugh Gauch at the above address.]

10. Gauch, H.G., and Zobel, R.W. 1995b. Optimal replication in selection experiments. [Submitted to Crop Science in February 1995; until published, preprints are available upon request to Hugh Gauch at the above address.]

11. Nielsen, D.R. 1992. Global agronomic opportunities. Agronomy Journal 84:131-132.

12. Searle, S.R. 1992. Variance Components. John Wiley, New York, New York.

13. Yates, F., and Cochran, W.G. 1938. The analysis of groups of experiments. Journal of Agricultural Science, Cambridge 28:556-580.

14. Yau, S.K. 1995. Regression and AMMI analyses of genotype x environment interactions: An empirical comparison. Agronomy Journal 87:121-126.