Abstract

Genomic prediction integrates statistical, genomic, and computational tools to improve the estimation of breeding values and increase genetic gain. Due to the wide diversity in mating systems, breeding schemes, propagation methods, and units of selection, no universal genomic prediction approach can be applied in all crops. In a genome-wide family prediction (GWFP) approach, the family is the basic unit of selection. We tested GWFP in two loblolly pine (Pinus taeda L.) datasets: a breeding population composed of 63 full-sib families (5–20 individuals per family), and a simulated population with the same pedigree structure. In both populations, phenotypic and genomic data were pooled at the family level in silico. Marker effects were estimated to compute genomic estimated breeding values (GEBV) at the individual and family (GWFP) levels. Fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Tested across different scenarios, GWFP predictive ability was higher than that for GEBV in both populations. Validation sets composed of families with similar phenotypic mean and variance as the training population yielded predictions consistently higher and more accurate than other validation sets. Results revealed potential for applying GWFP in breeding programs whose selection unit is the family, and for systems where families can serve as training sets. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot level, but it can be extended to other breeding programs. The higher predictive ability obtained with GWFP would motivate the application of genomic prediction in these situations.

Introduction

Genomic (Elshire et al. 2011), statistical (Meuwissen et al. 2001; Gianola et al. 2009), and computational advances have allowed significant increases in genetic gain by applying genomic prediction in breeding programs across several species (e.g., Hayes et al. 2009; et al. 2015, 2016; Gezan et al. 2017; Amadeu et al. 2020; de Bem Oliveira et al. 2020). Taking advantage of the ever-decreasing cost of molecular markers (Wetterstrand 2020), the concept of genomic prediction was derived (Meuwissen et al. 2001) as an alternative method to marker-assisted selection. Genomic prediction utilizes a dense panel of molecular markers covering the whole genome to predict genomic estimated breeding values (GEBV) of individuals with no phenotypic records (Meuwissen et al. 2001). Traditional genomic prediction pipelines involve developing a training set, for which the available genotypic and phenotypic data are fitted to build a prediction model. This model is afterward used to predict the GEBV of selection candidates in a validation set, composed of individuals that are genotyped but not phenotyped. Cross-validation schemes are implemented by taking sub-samples from the training set to calibrate the model and then fitting the model to the remaining part of the training set to estimate and evaluate its predictive ability, i.e., the correlation between GEBVs and phenotypic values (Pérez-Cabal et al. 2012).

Genomic prediction has been rapidly adopted in animal breeding (Hayes et al. 2009) due to readily accessible genomic data, large reference populations with accurate pedigree records, and the impossibility of phenotyping sex-linked traits (Stock and Reents 2013). In dairy cattle, genomic prediction can double the genetic gain compared with selection based on progeny tests (Xu et al. 2020). In contrast, the application of genomic prediction in plants has been lagging behind due to less accessible high-throughput genotyping methods, the lack of accurate pedigree records, and the wide range of variation in life cycle, ploidy level, and mating systems found in plants (Hough et al. 2013). All these plant-specific characteristics are key factors affecting predictive ability in genomic prediction due to their influence on breeding methods, effective population size, population structure, and linkage disequilibrium (Lin et al. 2014). Pioneer studies implementing genomic prediction in plants were performed in major crop species with traditional hybrid selection such as maize (Combs and Bernardo 2013; Massman et al. 2013) and trees (Kumar et al. 2012; Resende et al. 2012), or variety selection in self-pollinating species (Poland et al. 2012). Genomic prediction has proven to be a powerful tool to achieve higher genetic gain in plant breeding in many other species (Crossa et al. 2017; Lara et al. 2019; de Bem Oliveira et al. 2020; Esfandyari et al. 2020). Large commercial breeding companies have been applying genomic prediction; however, the success of the process depends strongly on the species and the breeding program scheme (Voss-Fels et al. 2019; Xu et al. 2020).

Several species are bred as populations of large full- or half-sib families, and commercially used as populations of different levels of relationship (i.e., synthetic cultivars), as in some forage species such as alfalfa (Medicago sativa L.; Annicchiarico et al. 2015; Biazzi et al. 2017) and ryegrass (Lolium perenne L.; et al. 2016; Cericola et al. 2018). In those species, the family (full- or half-sibs) is the basic unit for phenotyping (e.g., plot-level measurement for yield rather than plant level) and selection. Thus, due to the nature of the mating system (allogamy), individual plants are of limited interest because commercial varieties represent a homogeneous population composed of heterozygous individuals (Poehlman 1987). Also, it is not straightforward to link phenotypic data collected on individual spaced plants to plot-based swards in crops such as forage and turfgrass, which are mostly allogamous (Poehlman 1987), and single-plant performance has been shown to poorly predict plot-based data (Wang et al. 2016). Therefore, the application of genome-wide family prediction (GWFP) would be advantageous for traits that are phenotyped using family pools in swards or plots. The phenotypic data collection at the plot level could be extended to other organisms grown and evaluated in families, such as turfgrasses (L. perenne L.), forages (M. sativa L.), sugarcane (Saccharum officinarum L.), cassava (Manihot esculenta L.), honey bees, and aquaculture species such as shrimp (Litopenaeus vannamei; Barbosa et al. 2012; Wang et al. 2017; Jia et al. 2018; Pembleton et al. 2018; Brascamp and Bijma 2019; Torres et al. 2019). The application of GWFP has already been reported for crops that are bred and farmed as family pools, such as cross-pollinated forage species (Annicchiarico et al. 2015; et al. 2015, 2016; Biazzi et al. 2017; Cericola et al. 2018; Guo et al. 2018; Jia et al. 2018).

The GWFP approach considers family pools as the measurement unit. Here, both allele frequencies and phenotypic records are expressed as a single average record of a given family. Therefore, the additive genetic variance among full-sib families is half of the additive variance between individuals. Full-sibs share the same parents, hence the mean genotypic value of a full-sib family is equal to the mean breeding value of the two parents, and its variance is ¼(VA + VA′) = ½VA. This ½VA represents the additive genetic variance among full-sib families, whereas the other ½VA is the variance within a family, i.e., the variance between individuals of the same family (i.e., only 50% of the genetic variation is exploited in GWFP; Falconer and Mackay 1996). As a result, higher predictive ability was reported in family pools when compared with GEBV (Ashraf et al. 2014). Despite the initial efforts to test the predictive ability of GWFP using empirical data, there is a need to explore further implementation of GWFP in breeding schemes. As a first aspect, it is essential to compare the predictive ability of GEBV vs GWFP models, and to develop strategies to combine both approaches. For this, datasets that contain family structures but are genotyped and phenotyped at the single-plant level are ideal. Another aspect is understanding the influence that family/pool size and phenotypic variances in training/validation sets have on the predictive ability for diverse traits.
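The variance partition stated above can be written out as a short derivation. This is a sketch under the standard assumptions of Falconer and Mackay (1996): the parents (labeled s and d here, a notation not used in the original) are unrelated, non-inbred, and drawn from the same random-mating population, so their breeding values are uncorrelated and each has variance V_A.

```latex
\begin{align}
\bar{A}_{sd} &= \tfrac{1}{2}\,(A_s + A_d) \\
\operatorname{Var}(\bar{A}_{sd})
  &= \tfrac{1}{4}\,\bigl[\operatorname{Var}(A_s) + \operatorname{Var}(A_d)\bigr]
   = \tfrac{1}{4}\,(V_A + V_A) = \tfrac{1}{2}\,V_A \\
V_{A,\text{within}} &= V_A - \tfrac{1}{2}\,V_A = \tfrac{1}{2}\,V_A
\end{align}
```

The first line is the expected breeding value of a full-sib family, the second is the among-family additive variance available to GWFP, and the third is the remaining within-family (Mendelian sampling) variance that pooling cannot exploit.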

In order to evaluate these aspects, two loblolly pine (Pinus taeda L.) populations were studied: (1) an observed breeding population composed of 63 families (CCLONES_real), and (2) a simulated population that reproduced the same pedigree as CCLONES_real. The objectives of this study are: (i) to identify the minimum number of individuals per family required to calculate allele frequency and phenotypic mean values with reasonable accuracy; (ii) to investigate the effect of contrasting phenotypic mean and variance between training and validation sets on predictive ability; and (iii) to assess the predictive ability of GEBV and GWFP. Loblolly pine is not normally bred in family pools, but existing real and simulated datasets were used to compare the GEBV and GWFP approaches.

Materials and methods

Loblolly pine real population data

The phenotypic data from the loblolly pine (P. taeda L.) population known as "comparing clonal lines on experimental sites" (CCLONES_real), which have previously been used for predicting the performance of individual trees (Resende et al. 2012), were used to assess the efficiency of GWFP. In this study, GWFP was tested by pooling individual trees belonging to the same full-sib family. The population is composed of 923 individuals from 70 full-sib families obtained by crossing 32 parents in a circular diallel mating design with additional off-diagonal crosses (Baltunis et al. 2007). The number of individuals per family ranged from 1 to 20, with an average of 13 trees per family (standard deviation = 5). In this study, families with fewer than five individuals were removed, and 63 full-sib families were used for analyses. Data collection was described in detail in Resende et al. (2012) and Munoz et al. (2014). In summary, all 923 genotypes from CCLONES_real were phenotypically characterized in three replicated studies and were genotyped using an Illumina Infinium assay (Illumina, San Diego, CA, USA; Eckert et al. 2010) with 7216 SNPs, each representing a unique pine EST contig. In this study, four traits representing growth, quality, and disease were selected based on their narrow-sense heritability and genetic architecture as reported by Resende et al. (2012). These correspond to: (i) lignin concentration (Lignin) (h2 = 0.11, polygenic trait), (ii) tree stiffness (Stiffness) at year 4 (km2/s2) (h2 = 0.37, polygenic trait), (iii) rust susceptibility (Rust) caused by Cronartium quercuum Berk. Miyabe ex Shirai f. sp. fusiforme (h2 = 0.21, oligogenic trait), and (iv) diameter at breast height (Diameter) at year 6 (cm) (h2 = 0.31, polygenic trait).

Simulated population

A simulated population (CCLONES_sim) exhibiting similar genetic properties as CCLONES_real was also considered in this study, aiming to assess the efficiency of GWFP for two different traits and to predict the performance of the next generation. The description of the simulation, and the results for genomic prediction approaches using individual trees (GEBV), were previously reported for this synthetic population (de Almeida Filho et al. 2016, 2019). In summary, the base population was created (G0 = 1000 diploid individuals) by randomly sampling 2000 haplotypes from a population with an effective size of Ne = 10,000 (Johnson et al. 2001) and a mutation rate of 2.5 × 10−8. Then, the individuals with the 10% highest phenotypic values in G0 were selected and randomly mated to generate the first breeding generation (G1). From G1, 42 individuals were selected and used in a circular diallel mating design that reproduced the pedigree of CCLONES_real (G2), comprising 923 individuals and 71 full-sib families. However, only the 63 families with more than five individuals were used in this study. Subsequently, 42 individuals were selected from G2 and used in crosses to produce the next generation (G3, CCLONES_sim_prog), a population composed of 1176 individuals and 71 families. Only the 63 families with more than five individuals were used for analyses.

The simulated genome had 12 chromosomes, each with 100 cM, and 10,000 polymorphic loci were randomly selected to represent the entire genome. Only the scenario exhibiting an absence of dominance (d2 = 0.0) and h2 = 0.25 was used for analyses in this study. Two traits with different genetic architectures were simulated: (i) oligogenic: the effects of 30 QTL were sampled from a gamma distribution with rate 1.66 and shape 0.4, with positive or negative sign (Meuwissen et al. 2001), and (ii) polygenic: 1000 QTL were used, and their additive effects were sampled from a standard normal distribution (Hickey and Gorjanc 2012). The simulations were run using MaCS (Chen et al. 2009) and in the software R using scripts developed by the authors.

Pooling phenotypic and genotypic data at the family level

In both populations, phenotypic and genotypic data were pooled at the family level in silico. We assumed that the family phenotype was the average of all individuals in a family. Hence, the phenotypic values of the individuals were pooled at the family level in silico by calculating the family mean, without considering the experimental design. The average phenotypic value per family was then used as the response for all analyses.
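The phenotypic pooling step amounts to a group-wise mean. The following is a minimal sketch in plain Python; the function name `family_means`, the record layout, and the choice to skip missing phenotypes (`None`) are illustrative assumptions, since the text does not specify how missing records were handled.

```python
from collections import defaultdict
from statistics import mean


def family_means(records):
    """Average individual phenotypes within each family.

    records: iterable of (family_id, phenotype) pairs.
    Missing phenotypes (None) are skipped; this handling is an assumption.
    """
    by_family = defaultdict(list)
    for family_id, value in records:
        if value is not None:
            by_family[family_id].append(value)
    # One pooled record per family, ignoring the experimental design.
    return {fam: mean(values) for fam, values in by_family.items()}


# Hypothetical toy data: two families with individual-tree phenotypes.
trees = [("F1", 10.0), ("F1", 14.0), ("F2", 9.0), ("F2", None), ("F2", 11.0)]
pooled = family_means(trees)
print(pooled)
```

Each family then contributes a single response value, which is what the GWFP models take as input.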

In the case of the genomic data, the allele frequency (p) was calculated for each SNP per family, considering the reference allele (A) as follows:

$$p_{ij} = \frac{2\,n_{AA_{ij}} + n_{Aa_{ij}}}{2\,N_{ij}}$$

where $p_{ij}$ refers to the allele frequency for SNP i in family j; $n_{AA_{ij}}$ and $n_{Aa_{ij}}$ are the numbers of individuals with genotypes AA and Aa, respectively, for SNP i in family j; and $N_{ij}$ is the number of individuals in family j with non-missing genotype data for SNP i. Missing values for allele frequency were imputed at the family level using the average allele frequency for that given SNP across families. Markers were excluded from analyses when more than 50% of the families exhibited missing values, and SNPs were not removed based on minor allele frequency. A total of 4740 polymorphic SNPs (CCLONES_real) and an average of 5000 polymorphic SNPs for CCLONES_sim and CCLONES_sim_prog (average across simulated replicates) were used in the analyses.
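The genotype pooling, imputation, and marker filter described above can be sketched as follows. This is an illustrative sketch only: the function names, the coding of genotypes as reference-allele dosages (0, 1, 2, with `None` for missing calls), and the `max_missing` parameter are assumptions rather than the authors' scripts.

```python
def family_allele_freq(dosages):
    """Reference-allele frequency for one SNP in one family.

    dosages: reference-allele counts per individual (0, 1, 2) or None.
    The sum of dosages equals 2*n_AA + n_Aa, so this is
    (2*n_AA + n_Aa) / (2*N) over individuals with non-missing calls.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        return None  # no genotype data for this SNP in this family
    return sum(called) / (2 * len(called))


def pool_snp(freqs, max_missing=0.5):
    """Filter and impute one SNP across families.

    freqs: per-family allele frequencies (None where missing).
    Returns None if more than max_missing of the families are missing;
    otherwise fills gaps with the across-family average frequency.
    """
    missing = sum(f is None for f in freqs)
    if missing / len(freqs) > max_missing:
        return None  # marker excluded from the analysis
    observed = [f for f in freqs if f is not None]
    average = sum(observed) / len(observed)
    return [average if f is None else f for f in freqs]


# Hypothetical example: one family with genotypes AA, Aa, aa, and one missing call.
print(family_allele_freq([2, 1, 0, None]))
print(pool_snp([0.5, None, 0.25]))
```

Note that no minor allele frequency filter appears in the sketch, in line with the text.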

Minimum number of individuals per family to estimate allele frequency and family phenotypic mean

A total of 10 families from CCLONES_real and CCLONES_sim with at least 15 individuals were selected to evaluate the minimum number of individuals required to estimate allele frequency and phenotypic family means with reasonable accuracy. Families were specifically selected to represent segregation ratios (1:1 and 1:2:1) for 10 SNPs. Allele frequencies per family and family phenotypic means were calculated varying the number of individuals per family from 1 to 15. These values were used to compute the squared deviations between the mean value obtained with i individuals (i = 1–15) and the mean value obtained with the entire family (15 individuals), under the assumption that 15 individuals per family provide accurate estimates of allele frequencies and phenotypic mean in our families. This assumption can be validated using the concept of genetic representativeness, given by the effective population size (Ne) (Vencovsky and Crossa 2003). The estimator of Ne within a full-sib family is given by Ne = 2n/(n + 1) (Resende and Barbosa 2006). The maximum Ne within a full-sib family (when n tends to infinity) is 2. With n equal to 15 individuals, Ne is 1.88, which is 94% of this maximum of 2.
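The two calculations in this subsection can be checked with a few lines of Python. This is a sketch under stated assumptions: `ne_full_sib` implements the estimator quoted above, while `squared_deviations` takes the first i individuals in the order given, which is an assumption because the text does not say how the subsets of i individuals were drawn.

```python
from statistics import mean


def ne_full_sib(n):
    """Effective population size within a full-sib family of n individuals."""
    return 2 * n / (n + 1)


def squared_deviations(values):
    """Squared deviation between the mean of the first i values and the
    mean of the whole family, for i = 1..len(values)."""
    full_mean = mean(values)
    return [(mean(values[:i]) - full_mean) ** 2
            for i in range(1, len(values) + 1)]


ne_15 = ne_full_sib(15)    # 2 * 15 / 16
share_of_max = ne_15 / 2   # fraction of the limiting value of 2
print(ne_15, share_of_max)
```

For n = 15 the estimator gives 30/16 = 1.875 (reported as 1.88), i.e., 93.75% of the maximum, which matches the rounded 94% in the text. The last element returned by `squared_deviations` is always zero because the full family is compared with itself.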

Statistical methods for genomic prediction

Marker effects were estimated at the individual (GEBV) and family (GWFP) levels with two distinct whole-genome regression approaches using the package BGLR (Pérez and de los Campos 2014) in R (R Development Core Team 2018): (1) Bayes B, which considers that markers have heterogeneous variances, i.e., many loci have no genetic variance and a few loci explain a large portion of the genetic variation (Meuwissen et al. 2001; Pérez and de los Campos 2014); and (2) Bayes RR, a Bayesian method that assumes a common variance across all loci; therefore, SNPs with the same allele frequency explain the same proportion of variance and have the same shrinkage effect (Gianola 2013; Pérez and de los Campos 2014).

In total, 20,000 Markov chain Monte Carlo iterations were used, of which the first 5000 were discarded as burn-in, and every third sample was kept for parameter estimation. We fitted the following model for the individual and family models:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{m} + \mathbf{e}$$

where y is the vector of phenotypes averaged by family in the case of GWFP and by individual across its multiple clones in the case of GEBV, µ is the overall mean fitted as a fixed effect, m is the vector of random marker effects, e is the vector of random error effects, 1 is a vector of ones, and Z is the incidence matrix containing allele frequencies in the case of GWFP (ranging from 0 to 1) and marker dosages (0, 1, and 2) for GEBV.

After fitting the model described above for each trait, the GWFP and GEBV of family/individual j ($g_j$) were obtained using the following expression:

$$\hat{g}_j = \sum_{i=1}^{p} Z_{ij}\,\hat{m}_i$$

where $Z_{ij}$ is the allele frequency/marker dosage of the ith marker in family/individual j, p is the total number of markers, and $\hat{m}_i$ is the estimated effect of the ith SNP.
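The prediction step is a weighted sum of marker effects. The sketch below assumes the marker effects have already been estimated (in the study this was done with BGLR in R; the values used here are made up for illustration), and stores one row of Z per family or individual, with one column per marker.

```python
def predict(Z, m_hat):
    """Genomic prediction for each row of Z.

    Z: list of rows, one per family (allele frequencies, 0 to 1) or
       per individual (marker dosages 0, 1, 2); one column per marker.
    m_hat: estimated marker effects, one per marker.
    Returns the sum over markers of Z[j][i] * m_hat[i] for each row j.
    """
    return [sum(z * m for z, m in zip(row, m_hat)) for row in Z]


# Hypothetical marker effects for three SNPs.
m_hat = [0.5, -1.0, 2.0]

family_freqs = [[0.5, 0.25, 1.0]]   # GWFP: allele frequencies of one family
tree_dosages = [[1, 0, 2]]          # GEBV: dosages of one individual

print(predict(family_freqs, m_hat))
print(predict(tree_dosages, m_hat))
```

The same function serves both approaches; only the coding of Z changes. The overall mean µ is not added here, following the expression given in the text.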

Cross-validation schemes

The prediction models for GEBV and GWFP were validated using 10-fold cross-validation and leave-one-out approaches, for both populations and all traits. For the 10-fold cross-validation, the data were randomly partitioned into 10 subsets; training sets were created with 90% of the families/individuals, whereas the remaining 10% of families/individuals were used as the validation set. This scheme was repeated until each of the 10 subsets had been used as the validation set. In the leave-one-out approach, models were constructed using NT − 1 families (where NT is the total number of families) in the training set. The validation set was the single family not included in the training set. This scheme was repeated NT times until all 63 families had been used as the validation set.
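Both validation schemes can be sketched as generators of (training, validation) partitions. The function names, the seed, and the round-robin assignment of shuffled units to folds are illustrative assumptions; the text only states that the partition was random.

```python
import random


def k_fold_splits(units, k=10, seed=0):
    """Yield (training, validation) lists for k-fold cross-validation."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled units into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield training, validation


def leave_one_out_splits(units):
    """Yield (training, validation) lists, holding out one unit at a time."""
    units = list(units)
    for i, held_out in enumerate(units):
        yield units[:i] + units[i + 1:], [held_out]


families = [f"Fam{i:02d}" for i in range(1, 64)]   # 63 hypothetical family IDs
ten_fold = list(k_fold_splits(families))
loo = list(leave_one_out_splits(families))
print(len(ten_fold), len(loo))
```

With 63 families, the 10 folds hold six or seven families each, so each training set contains roughly 90% of the families, and the leave-one-out scheme fits 63 models on 62 families each.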

Each time the models were fitted using a different validation set, the model's predictive ability was estimated by calculating Pearson's correlation between the observed/simulated phenotypes and the GWFP/GEBV estimates for the families/individuals included in the validation set.
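Predictive ability as defined above is a plain Pearson correlation, which can be computed without external libraries. The sketch below is illustrative; the function name `pearson` and the toy values are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical validation set: observed family means vs. GWFP estimates.
observed = [10.2, 11.5, 9.8, 12.1, 10.9]
predicted = [10.0, 11.0, 10.1, 11.8, 11.2]
predictive_ability = pearson(observed, predicted)
print(predictive_ability)
```

Replacing the observed phenotypes with the true simulated breeding values in the same call gives a prediction accuracy instead of a predictive ability, which is only possible for simulated data.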

Creating training/validation sets using contrasting phenotypes

To assess the effect that the validation set structure has on the predictive ability of the models, both populations were divided into three different phenotypic classes for each trait: the smallest 10%, the largest 10%, and values between both extremes. Five validation sets were created for each trait using these phenotypic classes: (1) Low: the 10% of families with the lowest phenotypic values; (2) High: the 10% of families with the highest values; (3) Low+High: combining four families from Low and three families from High; (4) Middle: seven families showing phenotypes around the population mean; and (5) Combined: two families from Low, two families from High, and three families from Middle. For the validation sets Low+High (3), Middle (4), and Combined (5), three replicates were created by taking random samples from each phenotypic class. The other 56 families were used as training sets to build prediction models.
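The construction of the five validation sets can be sketched as below. Several details are assumptions made for illustration: the extreme classes are fixed at seven families (10% of 63, rounded up, as in the figure legend), and the Middle set is drawn uniformly at random from all families between the two extremes, whereas the text describes it as families with phenotypes around the population mean.

```python
import random


def contrasting_validation_sets(family_pheno, n_extreme=7, seed=0):
    """Build the five validation sets from family mean phenotypes.

    family_pheno: dict mapping family ID to its mean phenotype.
    Returns a dict mapping set name to a list of seven family IDs.
    """
    rng = random.Random(seed)
    ranked = sorted(family_pheno, key=family_pheno.get)
    low, high = ranked[:n_extreme], ranked[-n_extreme:]
    middle = ranked[n_extreme:-n_extreme]
    return {
        "Low": low,
        "High": high,
        "Low+High": rng.sample(low, 4) + rng.sample(high, 3),
        "Middle": rng.sample(middle, 7),
        "Combined": rng.sample(low, 2) + rng.sample(high, 2) + rng.sample(middle, 3),
    }


# Hypothetical phenotypes for 63 families.
pheno = {f"Fam{i:02d}": float(i) for i in range(63)}
sets = contrasting_validation_sets(pheno)
training = {name: [f for f in pheno if f not in val] for name, val in sets.items()}
print({name: len(val) for name, val in sets.items()})
```

Calling the function with different seeds gives the replicates for the three randomly sampled sets; in every scenario the remaining 56 families form the training set.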

Split-families as training/validation sets

Two scenarios were created to explore the ability of the GWFP models to predict the performance of individuals and family pools. All families with more than 10 individuals (59 families in total) were randomly split into two groups of equal size. For one group of individuals, phenotypic and genotypic data were pooled at the family level and used as the training set for GWFP models. The other group of individuals was used as the validation set based on two approaches: (1) predicting the performance of individual trees not included in the training set (GWFP_Fam_Ind), and (2) pooling individuals at the family level to predict the performance of families composed of individuals not included in the training set (GWFP_Fam_Fam).
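The within-family split can be sketched as follows. The function name, the input layout, and the handling of odd family sizes (the extra individual goes to the validation half) are assumptions; the text only states that each family was randomly split into two groups of equivalent size.

```python
import random
from collections import defaultdict


def split_families(tree_family, min_size=11, seed=0):
    """Randomly split each sufficiently large family into two halves.

    tree_family: dict mapping individual ID to family ID.
    Families with fewer than min_size individuals are dropped
    ("more than ten individuals" is read here as min_size = 11).
    Returns (training, validation): dicts of family ID to individual IDs.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for tree, family in tree_family.items():
        members[family].append(tree)
    training, validation = {}, {}
    for family, trees in members.items():
        if len(trees) < min_size:
            continue
        rng.shuffle(trees)
        half = len(trees) // 2
        training[family], validation[family] = trees[:half], trees[half:]
    return training, validation


# Hypothetical data: family A has 12 trees, family B only 5.
toy = {f"A{i}": "A" for i in range(12)}
toy.update({f"B{i}": "B" for i in range(5)})
train_half, valid_half = split_families(toy)
print(sorted(train_half), len(train_half["A"]), len(valid_half["A"]))
```

The training half is pooled to family level in both scenarios. The validation half is either kept as individual trees (GWFP_Fam_Ind) or pooled to family level as well (GWFP_Fam_Fam).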

Prediction in the following generation using GEBV and GWFP in the simulated population

The genomic prediction models were developed using the G2 CCLONES_sim population as the training set. These trained models were validated in the G3 generation using GEBV and GWFP, and were assessed by calculating predictive ability and prediction accuracy. Predictive ability was estimated by calculating Pearson's correlation between the phenotypic values and the estimated breeding values, and prediction accuracy was estimated by computing Pearson's correlation between the true breeding value and the estimated breeding value.

Results

Minimum number of individuals per family to estimate allele frequency and family phenotypic mean

The minimum number of individuals per family was determined by assessing allele frequency and phenotypic mean deviations using families with at least 15 individuals. For genotypic and phenotypic data, the lowest number of individuals needed to accurately estimate allele frequency and family means was six (Figure 1). Allele frequency deviations (Figure 1, A–D) and mean phenotypic deviations (Figure 1, E and F) indicated that families with fewer than six individuals did not provide accurate estimates of the family's genotypic and phenotypic means in either population. We assumed that the observed values based on 15 individuals per family provide a reasonable estimate of allele frequency and phenotypic mean for a diploid species. Therefore, all 63 families with six or more individuals were used for further analyses in this study. Both populations showed similar trends for the genotypic and phenotypic estimates (Figure 1). The average allele frequency deviations were lower for SNPs exhibiting a 1:1 ratio in both populations (Figure 1, A and B), compared with SNPs segregating in a 1:2:1 ratio (Figure 1, C and D). For phenotypic data, CCLONES_sim showed slightly smaller deviations, especially for a lower number of individuals (Figure 1F), compared with CCLONES_real for the trait diameter (Figure 1E). Other traits in CCLONES_real exhibited similar behavior (data not shown).

Figure 1

Average allele frequency deviation (A–D) and family mean phenotypic deviation (E and F) in CCLONES_real (real breeding population composed of 63 families) (A, C, and E) and CCLONES_sim (simulated breeding population exhibiting similar genetic properties of CCLONES_real) (B, D, and F) calculated by increasing the number of individuals from 1 to 15. Five families exhibiting genotypic segregation ratios 1:1 (A and B) and 1:2:1 (C and D) for single nucleotide polymorphisms were included in the analysis. The CCLONES_real phenotypic deviation is for the trait stem diameter (E).

Predictive ability of statistical methods for genomic prediction and for different cross-validation schemes

Two Bayesian statistical methods (Bayes B and Bayes RR) and two cross-validation approaches were used to test the predictive ability of GWFP in four traits measured in CCLONES_real (Figure 2). Both statistical methods yielded high and similar predictive abilities for the four traits (Figure 2, A and B). However, standard errors for predictive ability were larger with the leave-one-out approach (Figure 2, A and B). Additionally, GWFP predictive abilities obtained with the leave-one-out approach were slightly lower than those for the 10-fold cross-validation scheme (except for the trait Stiffness) (Figure 2, A and B). Therefore, the 10-fold cross-validation approach was selected to perform further analyses.

Figure 2

Average predictive ability using family pools (GWFP) in four traits in the loblolly pine breeding population CCLONES obtained with 10-fold and leave-one-out cross-validation schemes using Bayes B (A) and Bayes RR (B).

Predictive ability of GWFP using training/validation sets with contrasting phenotypes

The effect of phenotypic information on the predictive ability of GWFP was explored by creating five validation sets using contrasting sets of phenotypic data between the training set and the validation set (Figure 3A). The predictive ability of GWFP for all traits was least accurate and had larger standard errors when the validation set was composed of families exhibiting small or large phenotypic values (bottom and top classes; Figure 3B). When validation sets were composed of families exhibiting phenotypes corresponding to the middle class, predictive ability increased for all traits, but standard errors were still large (Figure 3B). As expected, there was an increase in predictive ability and a large reduction in standard errors when validation sets were composed of families showing similar phenotypic mean and variance to the training set, corresponding to the classes "Low+High" and "Combined" (Figure 3B).

Figure 3

Phenotypic distribution for testing (orange) and validation (white) sets for four traits measured in the CCLONES_real population and two traits simulated using CCLONES_sim (A). Average predictive ability obtained with Bayes B using GWFP for four traits in CCLONES_real (lignin, stiffness, rust, and diameter), and two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim population (B). Five scenarios were tested by creating training (56 families) and validation (7 families) populations using phenotypic data: (i) Low: validation set is composed of the seven families with the lowest phenotypic records; (ii) High: validation set is composed of the seven families with the highest phenotypic records; (iii) Middle: validation set is composed of seven families with phenotypic records similar to the family mean; (iv) Combined: two families from Low, two families from High, and three families from Middle; and (v) Low + High: four families from Low and three families from High.

Predictive ability of GEBV and GWFP

Predictive ability obtained with Bayes B using different methods and schemes (Table 1) is presented in Figure 4 for the 63 families from both populations. The traditional genomic prediction approach, with individuals in the training set and validation set (GEBV), was contrasted with the predictive ability obtained with the family-based (GWFP) method following a 10-fold cross-validation scheme. The scenarios GWFP_Fam_Ind and GWFP_Fam_Fam were run only once because CCLONES (real and simulated) had a limited number of individuals per family.

Figure 4

Average predictive ability obtained with Bayes B for four traits in CCLONES_real (lignin, tree stiffness, rust, and stem diameter), and two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim populations using different genomic prediction methods. GEBV: genomic estimated breeding values for individual trees; GWFP_Fam_Ind: genome-wide family prediction using 59 family pools as the training set, while different individuals from the same families were used as the validation set; GWFP_Fam_Fam: genome-wide family prediction using 59 family pools as the training and validation populations, but different full-sib individuals were pooled in each set; GWFP: genome-wide family prediction using 63 family pools in a 10-fold cross-validation scheme. Narrow-sense heritability (h2) was estimated at the individual level (Resende et al. 2012).

Table 1

Scenarios implemented to design training and validation sets to test predictive ability of genomic prediction models

Scenario         Training set       Validation set
GEBV             830 individuals    93 individuals
GWFP             56 families        7 families
GWFP_Fam_Ind     59 families        422 individuals
GWFP_Fam_Fam     59 families        59 families
GWFP_Low         56 families        7 families with lowest phenotypic values
GWFP_High        56 families        7 families with highest phenotypic values
GWFP_Low_High    56 families        7 families (4 low and 3 high phenotypic values)
GWFP_Middle      56 families        7 families with values similar to the overall mean
GWFP_Combined    56 families        7 families (2 low, 2 high, and 3 middle scenarios)

GEBV, genomic estimated breeding value; GWFP, genome-wide family prediction; CV, cross-validation.

Predictive ability was always greater for GWFP methods in both populations and for all traits, except for the scenario GWFP_Fam_Ind, which showed similar or lower accuracy than GEBV for most traits (Figure 4). Additionally, predictive ability was greater for traits with higher heritability (Figure 4). Specifically, GWFP provided predictive abilities at least 40% greater than traditional GEBV for most of the traits in both populations. Moreover, GWFP_Fam_Fam exhibited similar or greater predictive ability than GWFP for most traits in both populations, except for rust (Figure 4). Both sets of traits from the simulated CCLONES population exhibited very similar accuracies for all schemes (Figure 4).

Predictive ability and accuracy of GEBV and GWFP in the following generation

Accuracy and predictive ability of GEBV and GWFP were obtained with the prediction models built with the CCLONES_sim (G2) population as the training set, and models were validated in the following generation (G3). The GEBV showed higher accuracy than GWFP for the oligogenic trait, and similar accuracy for the polygenic trait (Figure 5). Predictive ability for the oligogenic and polygenic traits was higher for GWFP (Figure 5). Additionally, greater predictive ability and accuracy were observed for the oligogenic trait, and the difference between accuracy and predictive ability was greater for the oligogenic trait (Figure 5).
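The two metrics differ only in what the predictions are correlated with: phenotypes for predictive ability, and true breeding values (known only in simulation) for accuracy. The following is a minimal Python sketch, assuming a plain Pearson correlation; the function names and the toy values are illustrative assumptions and are not data or code from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def predictive_ability(predicted, phenotype):
    """Correlation between predicted values and observed phenotypes."""
    return pearson(predicted, phenotype)

def accuracy(predicted, true_bv):
    """Correlation between predicted values and true breeding values."""
    return pearson(predicted, true_bv)

# Toy values (hypothetical): phenotypes are true breeding values plus noise.
true_bv = [1.0, 2.0, 3.0, 4.0, 5.0]
phenotype = [1.5, 1.2, 3.9, 3.1, 5.6]
predicted = [1.2, 2.1, 2.7, 4.2, 4.6]

print("predictive ability:", round(predictive_ability(predicted, phenotype), 3))
print("accuracy:", round(accuracy(predicted, true_bv), 3))
```

Because phenotypes carry non-genetic noise, the correlation with phenotypes in this toy example is lower than the correlation with true breeding values.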

Figure 5

Average predictive ability and accuracy obtained with Bayes B for two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim_progeny population, obtained with individual (GEBV) and family-pooled (GWFP) genomic prediction methods. Predictive ability, calculated as the correlation between estimated breeding values and phenotypic values, is denoted as _Pheno, and accuracy, calculated as the correlation between estimated and true breeding values, is denoted as _BV.

Discussion

We quantified the predictive ability of GWFP in real and simulated loblolly pine breeding populations for different traits and cross-validation approaches. Moderate to low predictive ability values were obtained with the traditional genomic prediction approach, as previously reported for both populations, using individual trees as the basic phenotypic and genotypic unit (Resende et al. 2012; de Almeida Filho et al. 2016). In general, GWFP outperformed GEBV in predictive ability for most traits, including the oligogenic and polygenic traits in CCLONES_sim when using the following generation (G3) as the validation set.

Effect of family size in genomic prediction

The size and structure of the training population affect the accuracy of genomic prediction models (VanRaden et al. 2009; Daetwyler et al. 2010; Habier et al. 2010; Grattapaglia and Resende 2011; Edwards et al. 2019; de Bem Oliveira et al. 2020). In our study, the size of the training set refers to the number of families and the number of individuals within a family. The number of families was fixed and limited to 70 families, so we did not focus on studying the effect of a variable number of families. However, the minimum number of individuals per family needed to obtain reasonably accurate estimates of family allele frequency and family phenotypic mean was found to be six. When studying the effect of size and composition of the training population in blueberry (Vaccinium spp.), de Bem Oliveira et al. (2020) found a high predictive ability using six individuals per family for some traits. Thus, in their study, family variance was accurately represented with six individuals per family in this autotetraploid species. Using the estimator of the effective population size (Ne) within a full-sib family, given by Ne = 2n/(n + 1) (Resende and Barbosa 2006), the maximum Ne within a full-sib family (as n goes to infinity) is 2. With n equal to 6 individuals, Ne is 1.71, which is 86% of the maximum of 2. So, n = 6 appears adequate to genetically represent a full-sib family, corroborating our results.
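The arithmetic behind the Ne argument can be checked with a short Python sketch. It simply evaluates Ne = 2n/(n + 1) for several family sizes; the function name and the chosen values of n are illustrative assumptions.

```python
def effective_size_full_sib(n):
    """Effective size within a full-sib family of n sampled individuals: 2n / (n + 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 2 * n / (n + 1)

MAX_NE = 2.0  # limit of 2n / (n + 1) as n grows without bound

# Family sizes below are arbitrary illustration points, not sizes tested in the study.
for n in (2, 4, 6, 10, 20):
    ne = effective_size_full_sib(n)
    print(f"n = {n:2d}  Ne = {ne:.2f}  ({100 * ne / MAX_NE:.0f}% of maximum)")
```

The gain in Ne from each additional individual shrinks quickly, which is consistent with a modest family size already capturing most of the within-family representation.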

The effect of the number of individuals within families on the accuracy of genomic prediction models was also demonstrated in perennial ryegrass (Pembleton et al. 2016, 2018). The authors stated that 48–60 individuals per population are necessary to accurately represent the genetic diversity within a ryegrass population. Because perennial ryegrass is an allogamous species, multiple parents are used to create synthetic populations; hence, multiple individuals with a high number of heterozygous loci contribute to the variation in the synthetic population. Perennial ryegrass is commonly bred using families, and GWFP has been exploited in the species for various traits (Fé et al. 2015, 2016; Cericola et al. 2018; Guo et al. 2018).

Simulation studies with variable numbers of families and individuals per family would help identify the optimum training population sizes for GWFP. Generally, a larger training population (more families in the training population) yields higher accuracy (Voss-Fels et al. 2019; de Bem Oliveira et al. 2020), but this is associated with higher costs. Therefore, defining the optimum number of families and the number of individuals per family is a crucial point for the genomic prediction process. Fé et al. (2015) studied the effect of the number of families on the accuracy of genomic prediction for heading date in ryegrass; the authors found high accuracies with a low number of families (<100). The authors showed that increasing the number of families to 500 led to higher accuracy, and that more than 500 families did not yield a significant improvement.

Efficiency of statistical methods and cross-validation schemes

Models considering different Bayesian methods were similar in predicting GEBV for traits measured in the real breeding population and the simulated population in this study. Resende et al. (2012) reported a slightly greater predictive ability in the real population for rust incidence with Bayesian methods over RR-BLUP, because fewer genes with large effects control this trait. de Almeida Filho et al. (2016), using the simulated population, reported a slightly lower predictive ability for the oligogenic trait using Bayes RR than using Bayes B. In this study, Bayes B and Bayes RR were tested to compare their performance in GWFP because the prior distributions and assumptions of the two methods are contrasting (Pérez and de Los Campos 2014). Our results showed that both Bayesian methodologies were very similar in predicting family pools, even for rust incidence in the real population and for the oligogenic trait in the simulated population.

Both cross-validation schemes, leave-one-out and 10-fold, produced similar results in predicting GWFP, with a slight advantage for the 10-fold scheme due to the large variation in the leave-one-out scheme. Resende et al. (2012) reported similar results with the real data set for GEBV, wherein 10-fold and leave-one-out schemes resulted in no significant differences in predictive ability. Similar predictive abilities between the 10-fold and leave-one-out schemes have also been reported in wheat (Triticum aestivum L.) (Edwards et al. 2019).
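The two schemes differ only in how families are partitioned, which can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: the family labels, the random seed, and the function name are assumptions, and leave-one-out is expressed as the special case in which the number of folds equals the number of families.

```python
import random

def k_fold_family_splits(family_ids, k=10, seed=0):
    """Yield (training, validation) lists of family IDs for each of k folds."""
    ids = list(family_ids)
    if not 2 <= k <= len(ids):
        raise ValueError("k must be between 2 and the number of families")
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, validation in enumerate(folds):
        training = [f for j, fold in enumerate(folds) if j != i for f in fold]
        yield training, validation

# 63 hypothetical family labels, matching the number of family pools in the study.
families = [f"fam{i:02d}" for i in range(1, 64)]

ten_fold = list(k_fold_family_splits(families, k=10))
leave_one_out = list(k_fold_family_splits(families, k=len(families)))

print(sorted({len(val) for _, val in ten_fold}))       # validation sizes per fold
print(sorted({len(val) for _, val in leave_one_out}))  # always a single family
```

With 63 families, a 10-fold split cannot give equal folds, so validation sets hold either six or seven families in this sketch. Each leave-one-out fold validates on a single family, so a per-fold correlation is undefined and predictions would have to be collected across folds before computing one.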

Predictive ability of GWFP using contrasting phenotypes

When the families in the validation set had phenotypic values outside the range of phenotypes present in the training set (bottom and top classes), lower and much more variable predictive abilities were obtained. Interestingly, higher predictive abilities were obtained when families in the validation set had the same phenotypic range as the training set. The impact of phenotypic variance on prediction was demonstrated by Edwards et al. (2019), who reported that the accuracy of genomic prediction in wheat was higher for crosses (validation set) with higher phenotypic variance. Würschum et al. (2017) reported equivalent results in triticale (× Triticosecale Wittmack), in which higher accuracy was detected for plant height and biomass when families with a large phenotypic variation were included in the training/validation set population.

The differences in predictive ability among the scenarios for phenotypic values in the validation set could also be related to the composition of the training sets. For the extreme scenarios (Low and High), the training sets did not contain the extreme phenotypic values and allele frequencies, which could have resulted in poor estimates of marker effects. Studying the optimization process for genomic prediction in wheat, Norman et al. (2018) showed that, when the training set and validation set are not related, genomic prediction accuracy could be improved by increasing the genetic diversity in the training set.
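The way the phenotype-based validation sets leave the training set without its extremes can be illustrated with a Python sketch. It is only a sketch under stated assumptions: the source does not specify which of the Low, High, and Middle families enter the Combined and Low + High sets, so taking the most extreme (or most central) ones is an assumption, as are the function names and the toy family means.

```python
def validation_scenarios(family_means, n_val=7):
    """Map scenario name -> validation family IDs, given {family_id: phenotypic mean}."""
    if len(family_means) <= n_val:
        raise ValueError("need more families than the validation set size")
    ranked = sorted(family_means, key=family_means.get)  # ascending phenotype
    overall = sum(family_means.values()) / len(family_means)
    by_distance = sorted(family_means, key=lambda f: abs(family_means[f] - overall))
    low = ranked[:n_val]
    high = ranked[-n_val:][::-1]   # most extreme high family first
    middle = by_distance[:n_val]   # closest to the overall mean first
    return {
        "Low": low,
        "High": high,
        "Middle": middle,
        # Assumption: the most extreme / most central families are the ones combined.
        "Combined": low[:2] + high[:2] + middle[:3],
        "Low_High": low[:4] + high[:3],
    }

def training_set(family_means, validation):
    """All families not held out for validation."""
    held_out = set(validation)
    return [f for f in family_means if f not in held_out]

# Toy family means (hypothetical): 63 families with evenly spaced values.
means = {f"fam{i:02d}": float(i) for i in range(1, 64)}
scenarios = validation_scenarios(means)
for name, val in scenarios.items():
    train = training_set(means, val)
    lo, hi = min(means[f] for f in train), max(means[f] for f in train)
    print(f"{name:9s} training families: {len(train)}  training range: {lo}-{hi}")
```

In the Low and High scenarios, the printed training range no longer covers the validation families, which mirrors the extrapolation problem described above.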

Predictive ability of GEBV and GWFP for different traits and scenarios

Predictive ability was always greater for GWFP methods than for GEBV in both the real and simulated populations and for all traits, except when the model was built with family pools and individual performance was predicted (GWFP_Fam_Ind) (Figure 4). Although the full-sib family average exploits only one-half of the additive genetic variance, the error variance is reduced by the larger number of observations due to progeny replication, compared with single observations (Hallauer et al. 2010). Thus, this higher precision of the phenotypic value in family bulks could explain the higher accuracy in genomic prediction of families.

The higher accuracy of the GWFP method was expected, since the additive genetic variance exploited in this method is only 50% of the additive genetic variance exploited with GEBV. The genotypic value of a family is equal to the mean breeding value of the two parents, whose variance is ¼(Va + Va') = ½Va (ignoring dominance and epistasis effects). The additive variance among full-sib families is therefore only 50% of the total additive variance, whereas the other 50% represents the variance within a family, which leads to higher accuracy and heritability (Casler and Brummer 2008; Ashraf et al. 2014). In addition, relatedness between the training set and the validation set influences the predictive ability. The relationship between the training set and the validation set has a crucial role in the predictive ability of the model (Lorenz and Smith 2015; de Bem Oliveira et al. 2020), which can help explain the higher predictive ability found in GWFP_Fam_Fam and GWFP compared with GEBV and GWFP_Fam_Ind.
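The trade-off between halving the additive variance and averaging out the error can be made concrete with a small Python sketch. It computes the share of the variance of full-sib family means that is due to the among-family additive variance (½Va), under simplifying assumptions that are ours and not the authors': no dominance, epistasis, or common-environment effects, and equal family sizes. The function name and the variance values are hypothetical.

```python
def family_mean_heritability(va, ve, n):
    """Share of the variance of full-sib family means due to among-family additive variance.

    va: total additive variance; ve: residual variance; n: individuals per family.
    Assumes no dominance, epistasis, or common-environment effects.
    """
    if n < 1 or va < 0 or ve < 0 or va + ve == 0:
        raise ValueError("need n >= 1 and non-negative variances that are not both zero")
    among = 0.5 * va        # additive variance among full-sib families
    within = 0.5 * va + ve  # within-family additive variance plus residual
    return among / (among + within / n)

va, ve = 0.3, 0.7  # hypothetical variances giving an individual-level h2 of 0.3
for n in (1, 3, 6, 10, 20):
    print(f"n = {n:2d}  family-mean value = {family_mean_heritability(va, ve, n):.2f}")
```

With these illustrative values, a single individual per family gives a value of half the individual-level heritability, while averaging over several full sibs raises it above the individual-level value, which is the pattern the paragraph above relies on.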

Nevertheless, the predictive ability obtained with the GWFP_Fam_Ind scheme was of the same order of magnitude as that of GEBV for most traits, except for stiffness and rust. Using the numbers from this study as an example, DNA extraction and genotyping are needed for 56 families (training set for GWFP) instead of 844 individuals (training set for GEBV), a significant reduction in costs. The GWFP_Fam_Ind approach could therefore still be an affordable option for implementing genomic prediction in breeding programs that select individual plants but have limited budgets to phenotype and genotype all individuals in the training set.

Compared with GEBV, the GWFP approach can deliver higher predictive ability with a reduced investment in the implementation of genomic prediction. A larger number of families can be included in the models, which, for the present population, would likely result in higher predictive abilities, as reported in perennial ryegrass for heading date (Fé et al. 2015). Additionally, including more than 10 individuals per family will reduce the sampling variability of the allele frequency and phenotypic mean, resulting in higher genomic accuracies (de Bem Oliveira et al. 2020).

Application of GWFP in a breeding program

Genomic prediction can shorten the time of a breeding process, which leads to a higher genetic gain per unit time, and can allow a reduction in phenotyping effort and costs (Grattapaglia and Resende 2011; Crossa et al. 2017; Voss-Fels et al. 2019). However, in some cases, breeders need to genotype a large number of individuals (>10,000) to implement genomic prediction in their programs, increasing costs significantly (Voss-Fels et al. 2019). The high genotyping costs due to large population sizes can make it impracticable to implement genomic prediction in minor crops, especially in public breeding programs.

For breeding programs with limited budgets, GWFP can be an alternative to GEBV due to the reduction in phenotypic and genotypic costs to develop prediction models. GWFP has been used in several forage species that are bred in family bulks and whose phenotyping for critical traits is conducted at the sward/plot level (Fé et al. 2015, 2016; Annicchiarico et al. 2015; Biazzi et al. 2017; Jia et al. 2018; Cericola et al. 2018; Guo et al. 2018). In a GEBV approach, the information (phenotypic and genotypic) is collected at the individual level and models are built to estimate the performance of individuals (Figure 6A; Resende et al. 2012; de Almeida Filho et al. 2016, 2019). The GEBV approach requires significantly more resources (labor, economic, and computational) to collect and analyze data. Under a GWFP approach, the number of genotypic samples (bulked DNA and a single sequencing effort per family) will equal the number of families, representing a significant reduction in the number of samples compared with the traditional GEBV process (Figure 6B). The phenotyping process will also be performed at the family/plot level, which is the ideal scenario for critical traits in some crops, such as forage and turfgrass species.
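The reduction from individual records to one record per family can be sketched in Python as an in silico pooling step. This is an illustrative sketch rather than the pipeline used in the study: it assumes diploid genotypes coded as 0/1/2 copies of a reference allele, and the record layout, function name, and toy data are hypothetical.

```python
from collections import defaultdict

def pool_by_family(records):
    """Pool individual records into one record per family.

    records: iterable of (family_id, phenotype, genotypes), with genotypes coded
    as 0/1/2 allele counts per marker (diploid coding is an assumption).
    Returns {family_id: (mean_phenotype, [allele frequency per marker])}.
    """
    groups = defaultdict(list)
    for family_id, phenotype, genotypes in records:
        groups[family_id].append((phenotype, genotypes))
    pooled = {}
    for family_id, members in groups.items():
        n = len(members)
        n_markers = len(members[0][1])
        if any(len(g) != n_markers for _, g in members):
            raise ValueError(f"marker count differs within family {family_id}")
        mean_pheno = sum(p for p, _ in members) / n
        # Each diploid individual carries two alleles per marker.
        freqs = [sum(g[m] for _, g in members) / (2 * n) for m in range(n_markers)]
        pooled[family_id] = (mean_pheno, freqs)
    return pooled

# Toy data (hypothetical): two families, two individuals each, three markers.
records = [
    ("famA", 10.0, [0, 1, 2]),
    ("famA", 12.0, [1, 1, 2]),
    ("famB", 8.0, [2, 0, 0]),
    ("famB", 9.0, [2, 1, 0]),
]
pooled = pool_by_family(records)
for family_id, (mean_pheno, freqs) in pooled.items():
    print(family_id, mean_pheno, freqs)
```

After pooling, the model is fitted on one phenotypic mean and one vector of allele frequencies per family, so the number of training records equals the number of families.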

Figure 6

Scheme for the different genomic prediction scenarios: (A) GEBV: genomic estimated breeding values for individual trees; (B) GWFP_Fam_Fam: genome-wide family prediction applied in the prediction of families; (C) GWFP_Fam_Ind: genome-wide family prediction applied in the selection of individuals.

Breeders may also be interested in employing the GWFP_Fam_Ind approach, where family bulks are used as the training set, but individuals are the selection unit (Figure 6C). In this study, the GWFP_Fam_Ind approach showed similar accuracy to GEBV for most traits, with the added benefit of requiring less phenotypic and genotypic data for model development. Finally, GWFP models could be exploited in scenarios where remnant seeds are available for the same family, and the goal is to predict the performance of the family or of individuals within the family. The remaining seeds from the selected families can be used later to test their merits in further replicated field trials. For perennial allogamous crops, families used in the training set can be used as a new crossing block to start a new selection cycle.

Conclusion

Despite the limitation in the number of families and the number of individuals per family tested in this study, fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Validation sets with a phenotypic mean and variance similar to those of the training set showed greater predictive ability and more accurate predictions consistently across traits. These results revealed great potential for using GWFP in breeding programs that use family bulks as the selection unit. GWFP is well suited for crops that are routinely genotyped and phenotyped at the plot level. The GWFP approach can also be extended to breeding schemes where family bulks can serve as training sets, while individuals are the selection target.

Data availability

All phenotypic and genotypic data utilized in this study have been previously published as a standard data set for the development of genomic prediction methods (Resende et al. 2012; de Almeida Filho et al. 2016). Simulated data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.3126v.

Conflicts of interest

None declared.

Literature cited

Amadeu RR, Ferrão LFV, Oliveira IDB, Benevenuto J, Endelman JB, et al. 2020. Impact of dominance effects on autotetraploid genomic prediction. Crop Sci. 60:656–665.

Annicchiarico P, Nazzicari N, Li X, Wei Y, Pecetti L, et al. 2015. Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics. 16:1020.

Ashraf BH, Jensen J, Asp T, Janss LL. 2014. Association studies using family pools of outcrossing crops based on allele-frequency estimates from DNA sequencing. Theor Appl Genet. 127:1331–1341.

Baltunis BS, Huber DA, White TL, Goldfarb B, Stelzer HE. 2007. Genetic gain from selection for rooting ability and early growth in vegetatively propagated clones of loblolly pine. Tree Genet Genomes. 3:227–238.

Barbosa MHP, Resende MDV, Dias LADS, Barbosa GVDS, Oliveira RAD, et al. 2012. Genetic improvement of sugar cane for bioenergy: the Brazilian experience in network research with RIDESA. Crop Breed Appl Biotechnol. 12:87–98.

Biazzi E, Nazzicari N, Pecetti L, Brummer EC, Palmonari A, et al. 2017. Genome-wide association mapping and genomic selection for alfalfa (Medicago sativa) forage quality traits. PLoS One. 12:e0169234.

Brascamp EW, Bijma P. 2019. A note on genetic parameters and accuracy of estimated breeding values in honey bees. Genet Sel Evol. 51:1–6.

Casler MD, Brummer EC. 2008. Theoretical expected genetic gains for among-and-within-family selection methods in perennial forage crops. Crop Sci. 48:890–902.

Cericola F, Lenk I, Fé D, Byrne S, Jensen CS, et al. 2018. Optimized use of low-depth genotyping-by-sequencing for genomic prediction among multi-parental family pools and single plants in perennial ryegrass (Lolium perenne L.). Front Plant Sci. 9:369.

Chen GK, Marjoram P, Wall JD. 2009. Fast and flexible simulation of DNA sequence data. Genome Res. 19:136–142.

Combs E, Bernardo R. 2013. Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers. Plant Genome. 6:1–7.

Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, et al. 2017. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 22:961–975.

Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics. 185:1021–1031.

de Almeida Filho JE, Guimarães JFR, Silva FFE, de Resende MDV, Muñoz P, et al. 2016. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity (Edinb). 117:33–41.

de Almeida Filho JE, Guimarães JFR, Silva FFE, de Resende MDV, Muñoz P, et al. 2019. Genomic prediction of additive and non-additive effects using genetic markers and pedigrees. G3 (Bethesda). 9:2739–2748.

de Bem Oliveira I, Amadeu RR, Ferrão LFV, Muñoz PR. 2020. Optimizing whole-genomic prediction for autotetraploid blueberry breeding. Heredity (Edinb). 125:437–448.

Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, et al. 2010. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics. 185:969–982.

Edwards SM, Buntjer JB, Jackson R, Bentley AR, Lage J, et al. 2019. The effects of training population design on genomic prediction accuracy in wheat. Theor Appl Genet. 132:1943–1952.

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 6:e19379.

Esfandyari H, Fé D, Tessema BB, Janss L, Jensen J. 2020. Effects of different strategies for exploiting genomic selection in perennial ryegrass breeding programs. G3 (Bethesda). 10:3783–3795.

Falconer DS, Mackay FC. 1996. Introduction to quantitative genetics. In: Introduction to Quantitative Genetics. New York: John Wiley & Sons.

Fé D, Cericola F, Byrne S, Lenk I, Ashraf BH, et al. 2015. Genomic dissection and prediction of heading date in perennial ryegrass. BMC Genomics. 16:921.

Fé D, Ashraf BH, Pedersen MG, Janss L, Byrne S, et al. 2016. Accuracy of genomic prediction in a commercial perennial ryegrass breeding program. Plant Genome. 9:1–12.

Grattapaglia D, Resende MD. 2011. Genomic selection in forest tree breeding. Tree Genet Genomes. 7:241–255.

Gezan SA, Osorio LF, Verma S, Whitaker VM. 2017. An experimental validation of genomic selection in octoploid strawberry. Hortic Res. 4:1–9.

Gianola D. 2013. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics. 194:573–596.

Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. 2009. Additive genetic variability and the Bayesian alphabet. Genetics. 183:347–363.

Guo X, Cericola F, Fé D, Pedersen MG, Lenk I, et al. 2018. Genomic prediction in tetraploid ryegrass using allele frequencies based on genotyping by sequencing. Front Plant Sci. 9:1165.

Habier D, Tetens J, Seefried FR, Lichtner P, Thaller G. 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol. 42:5.

Hallauer AR, Carena MJ, Miranda Filho JB. 2010. Quantitative Genetics in Maize Breeding. Springer, New York, USA: Springer Science & Business Media.

Hayes BJ, Daetwyler HD, Bowman P, Moser G, Tier B, et al. 2009. Accuracy of genomic selection: comparing theory and results. Proc Assoc Advmt Anim Breed Genet. 18:34–37.

Hickey JM, Gorjanc G. 2012. Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods. G3 (Bethesda). 2:425–427.

Hough J, Williamson RJ, Wright SI. 2013. Patterns of selection in plant genomes. Annu Rev Ecol Evol Syst. 44:31–49.

Jia C, Zhao F, Wang X, Han J, Zhao H, et al. 2018. Genomic prediction for 25 agronomic and quality traits in alfalfa (Medicago sativa). Front Plant Sci. 9:1220.

Johnson R, Clair BS, Lipow S. 2001. Genetic conservation in applied tree breeding programs. In: Bart A, Thielges BA, Sastrapradja SD, Rimbawanto A (eds) Proceedings of the ITTO conference on in situ and ex situ conservation of commercial tropical trees. ITTO, Yokohama, Japan, pp. 215–230.

Kumar S, Chagné D, Bink MC, Volz RK, Whitworth C, et al. 2012. Genomic selection for fruit quality traits in apple (Malus × domestica Borkh.). PLoS One. 7:e36674.

Lara LAdC, Santos MF, Jank L, Chiari L, Vilela MDM, et al. 2019. Genomic selection with allele dosage in Panicum maximum Jacq. G3 (Bethesda). 9:2463–2475.

Lin Z, Hayes BJ, Daetwyler HD. 2014. Genomic selection in crops, trees and forages: a review. Crop Pasture Sci. 65:1177–1191.

Lorenz AJ, Smith KP. 2015. Adding genetically distant individuals to training populations reduces genomic prediction accuracy in barley. Crop Sci. 55:2657–2667.

Massman JM, Jung HJG, Bernardo R. 2013. Genomewide selection versus marker-assisted recurrent selection to improve grain yield and stover-quality traits for cellulosic ethanol in maize. Crop Sci. 53:58–66.

Meuwissen THE, Hayes BJ, Goddard ME. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 157:1819–1829.

Munoz PR, Resende MFR, Huber DA, Quesada T, Resende MDV, et al. 2014. Genomic relationship matrix for correcting pedigree errors in breeding populations: impact on genetic parameters and genomic selection accuracy. Crop Sci. 54:1115–1123.

Norman A, Taylor J, Edwards J, Kuchel H. 2018. Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 (Bethesda). 8:2889–2899.

Pembleton LW, Drayton MC, Bain M, Baillie RC, Inch C, et al. 2016. Targeted genotyping-by-sequencing permits cost-effective identification and discrimination of pasture grass species and cultivars. Theor Appl Genet. 129:991–1005.

Pembleton LW, Inch C, Baillie RC, Drayton MC, Thakur P, et al. 2018. Exploitation of data from breeding programs supports rapid implementation of genomic selection for key agronomic traits in perennial ryegrass. Theor Appl Genet. 131:1891–1902.

Pérez P, de Los Campos G. 2014. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 198:483–495.

Pérez-Cabal MA, Vazquez AI, Gianola D, Rosa GJ, Weigel KA. 2012. Accuracy of genome-enabled prediction in a dairy cattle population using different cross-validation layouts. Front Genet. 3:27.

Poehlman JM. 1987. Breeding cross-pollinated and clonally propagated crops. In: Breeding Field Crops. Dordrecht: Springer, p. 214–236.

Poland J, Endelman J, Dawson J, Rutkoski J, Wu SY, et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome. 5:103–113.

R Core Team. 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/.

Resende MDVD, Barbosa MHP. 2006. Selection via simulated individual BLUP based on family genotypic effects in sugarcane. Pesq Agropec Bras. 41:421–429.

Resende MF, Muñoz P, Resende MD, Garrick DJ, Fernando RL, et al. 2012. Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics. 190:1503–1510.

Stock KF, Reents R. 2013. Genomic selection: status in different species and challenges for breeding. Reprod Dom Anim. 48:2–10.

Torres LG, Vilela de Resende MD, Azevedo CF, Fonseca e Silva F, de Oliveira EJ. 2019. Genomic selection for productive traits in biparental cassava breeding populations. PLoS One. 14:e0220245.

VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, et al. 2009. Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 92:16–24.

Vencovsky R, Crossa J. 2003. Measurements of representativeness used in genetic resources conservation and plant breeding. Crop Sci. 43:1912–1921.

Voss-Fels KP, Cooper M, Hayes BJ. 2019. Accelerating crop genetic gains with genomic selection. Theor Appl Genet. 132:669–686.

Wang Q, Yu Y, Yuan J, Zhang X, Huang H, et al. 2017. Effects of marker density and population structure on the genomic prediction accuracy for growth trait in Pacific white shrimp Litopenaeus vannamei. BMC Genet. 18:1–9.

Wang J, Cogan NO, Forster JW. 2016. Prospects for applications of genomic tools in registration testing and seed certification of ryegrass varieties. Plant Breed. 135:405–412.

Würschum T, Maurer HP, Weissmann S, Hahn V, Leiser WL. 2017. Accuracy of within- and among-family genomic prediction in triticale. Plant Breeding. 136:230–236.

Xu Y, Liu X, Fu J, Wang H, Wang J, et al. 2020. Enhancing genetic gain through genomic selection: from livestock to plants. Plant Commun. 1:100005.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Editor: A E Lipka
