Genomic prediction in family bulks using different traits and cross-validations in pine

Esteban F Rios

Agronomy Department, University of Florida, Gainesville, FL 32611, USA

Corresponding author: Agronomy Department, University of Florida, 2005 SW 23rd Street, Building 350 Off 5, Gainesville, FL 32608, USA. Email: estebanrios@ufl.edu

Abstract

Genomic prediction integrates statistical, genomic, and computational tools to improve the estimation of breeding values and increase genetic gain. Due to the wide diversity in mating systems, breeding schemes, propagation methods, and unit of selection, no universal genomic prediction approach can be applied in all crops. In a genome-wide family prediction (GWFP) approach, the family is the basic unit of selection. We tested GWFP in two loblolly pine (Pinus taeda L.) datasets: a breeding population composed of 63 full-sib families (5–20 individuals per family), and a simulated population with the same pedigree structure. In both populations, phenotypic and genomic data were pooled at the family level in silico. Marker effects were estimated to compute genomic estimated breeding values (GEBV) at the individual and family (GWFP) levels. Fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Tested across different scenarios, GWFP predictive ability was higher than that of GEBV in both populations. Validation sets composed of families with phenotypic mean and variance similar to those of the training population yielded predictions consistently higher and more accurate than other validation sets. Results revealed potential for applying GWFP in breeding programs whose selection unit is the family, and for systems where families can serve as training sets. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot level, but it can be extended to other breeding programs. The higher predictive ability obtained with GWFP would motivate the application of genomic prediction in these situations.

Introduction

Genomic (Elshire et al. 2011), statistical (Meuwissen et al. 2001; Gianola et al. 2009), and computational advances have allowed significant increases in genetic gain by applying genomic prediction in breeding programs across several species (e.g., Hayes et al. 2009; Fè et al. 2015, 2016; Gezan et al. 2017; Amadeu et al. 2020; de Bem Oliveira et al. 2020). Taking advantage of the ever-decreasing cost of molecular markers (Wetterstrand 2020), the concept of genomic prediction was derived (Meuwissen et al. 2001) as an alternative to marker-assisted selection. Genomic prediction uses a dense panel of molecular markers covering the whole genome to predict genomic estimated breeding values (GEBV) of individuals with no phenotypic records (Meuwissen et al. 2001). Traditional genomic prediction pipelines involve developing a training set, for which the available genotypic and phenotypic data are fitted to build a prediction model. This model is then used to predict the GEBV of selection candidates in a validation set, composed of individuals that are genotyped but not phenotyped. Cross-validation schemes are implemented by taking sub-samples from the training set to calibrate the model and then applying the model to the remaining part of the training set to estimate and evaluate its predictive ability, i.e., the correlation between GEBVs and phenotypic values (Pérez-Cabal et al. 2012).

Genomic prediction has been rapidly adopted in animal breeding (Hayes et al. 2009) due to readily accessible genomic data, large reference populations with accurate pedigree records, and the impossibility of phenotyping sex-linked traits (Stock and Reents 2013). In dairy cattle, genomic prediction can double the genetic gain compared with selection based on progeny tests (Xu et al. 2020). In contrast, the application of genomic prediction in plants has lagged behind due to less accessible high-throughput genotyping methods, a lack of accurate pedigree records, and the wide range of variation in life cycle, ploidy level, and mating systems found in plants (Hough et al. 2013). All these plant-specific characteristics are key factors affecting predictive ability in genomic prediction because of their influence on breeding methods, effective population size, population structure, and linkage disequilibrium (Lin et al. 2014). Pioneering studies implementing genomic prediction in plants were performed in major crop species with traditional hybrid selection such as maize (Combs and Bernardo 2013; Massman et al. 2013) and trees (Kumar et al. 2012; Resende et al. 2012), or with variety selection in self-pollinating species (Poland et al. 2012). Genomic prediction has proven to be a powerful tool to achieve higher genetic gain in plant breeding in many other species (Crossa et al. 2017; Lara et al. 2019; de Bem Oliveira et al. 2020; Esfandyari et al. 2020). Large commercial breeding companies have been applying genomic prediction; however, the success of the process depends strongly on the species and the breeding program scheme (Voss-Fels et al. 2019; Xu et al. 2020).

Several species are bred as populations of large full- or half-sib families, and commercially used as populations with different levels of relationship (i.e., synthetic cultivars), as in some forage species such as alfalfa (Medicago sativa L.; Annicchiarico et al. 2015; Biazzi et al. 2017) and ryegrass (Lolium perenne L.; Fè et al. 2016; Cericola et al. 2018). In those species, the family (full- or half-sibs) is the basic unit for phenotyping (e.g., plot-level measurement for yield rather than plant-level measurement) and selection. Thus, due to the nature of the mating system (allogamy), individual plants are of limited interest because commercial varieties represent a homogeneous population composed of heterozygous individuals (Poehlman 1987). Also, it is not straightforward to link phenotypic data collected on individual spaced plants to plot-based swards in crops such as forages and turfgrasses, which are mostly allogamous (Poehlman 1987), and single-plant performance has been shown to poorly predict plot-based data (Wang et al. 2016). Therefore, the application of genome-wide family prediction (GWFP) would be advantageous for traits that are phenotyped using family pools in swards or plots. Phenotypic data collection at the plot level could be extended to other organisms grown and evaluated in families, such as turfgrasses (L. perenne L.), forages (M. sativa L.), sugarcane (Saccharum officinarum L.), cassava (Manihot esculenta L.), honey bees, and aquaculture species such as shrimp (Litopenaeus vannamei; Barbosa et al. 2012; Wang et al. 2017; Jia et al. 2018; Pembleton et al. 2018; Brascamp and Bijma 2019; Torres et al. 2019). The application of GWFP has already been reported for crops that are bred and farmed as family pools, such as cross-pollinated forage species (Annicchiarico et al. 2015; Fè et al. 2015, 2016; Biazzi et al. 2017; Cericola et al. 2018; Guo et al. 2018; Jia et al. 2018).

The GWFP approach considers family pools as the measurement unit. Here, both allele frequencies and phenotypic records are expressed as a single average record for a given family. Therefore, the additive genetic variance among full-sib families is half of the additive variance between individuals. Full-sibs share the same parents, hence the mean genotypic value of a full-sib family is equal to the mean breeding value of the two parents, whose variance is ¼(V_a + V_a') = ½V_a. This ½V_a represents the additive genetic variance among full-sib families, whereas the other ½V_a is the variance within a family, i.e., the variance between individuals (thus, only 50% of the genetic variation is exploited in GWFP; Falconer and Mackay 1996). As a result, higher predictive ability was reported in family pools when compared with GEBV (Ashraf et al. 2014). Despite the initial efforts to test the predictive ability of GWFP using empirical data, there is a need to explore further implementation of GWFP in breeding schemes. As a first aspect, it is essential to compare the predictive ability of GEBV vs GWFP models, and to develop strategies to combine both approaches. For this, datasets that contain family structures but are genotyped and phenotyped at the single-plant level are ideal. Another aspect is understanding the influence that family/pool size and phenotypic variances in training/validation sets have on predictive ability for diverse traits.

To evaluate these aspects, two loblolly pine (Pinus taeda L.) populations were studied: (i) an observed breeding population composed of 63 families (CCLONES_real), and (ii) a simulated population that reproduced the same pedigree as CCLONES_real. The objectives of this study are: (i) to identify the minimum number of individuals per family required to calculate allele frequency and phenotypic mean values with reasonable accuracy; (ii) to investigate the effect of contrasting phenotypic mean and variance between training and validation sets on predictive ability; and (iii) to assess the predictive ability of GEBV and GWFP. Loblolly pine is not normally bred in family pools, but existing real and simulated datasets were used to compare the GEBV and GWFP approaches.

Materials and methods

Loblolly pine real population data

The phenotypic data from the loblolly pine (P. taeda L.) population known as "comparing clonal lines on experimental sites" (CCLONES_real), which have previously been used for predicting the performance of individual trees (Resende et al. 2012), were used to assess the efficiency of GWFP. In this study, GWFP was tested by pooling individual trees belonging to the same full-sib family. The population is composed of 923 individuals from 70 full-sib families obtained by crossing 32 parents in a circular diallel mating design with additional off-diagonal crosses (Baltunis et al. 2007). The number of individuals per family ranged from 1 to 20, with an average of 13 trees per family (standard deviation = 5). In this study, families with fewer than five individuals were removed, and 63 full-sib families were used for analyses. Data collection was described in detail in Resende et al. (2012) and Munoz et al. (2014). In summary, all 923 genotypes from CCLONES_real were phenotypically characterized in three replicated studies and were genotyped using an Illumina Infinium assay (Illumina, San Diego, CA, USA; Eckert et al. 2010) with 7216 SNPs, each representing a unique pine EST contig. In this study, four traits representing growth, quality, and disease were selected based on their narrow-sense heritability and genetic architecture as reported by Resende et al. (2012). These correspond to: (i) lignin concentration (Lignin) (h² = 0.11, polygenic trait), (ii) tree stiffness (Stiffness) at year 4 (km²/s²) (h² = 0.37, polygenic trait), (iii) rust susceptibility (Rust) caused by Cronartium quercuum Berk. Miyabe ex Shirai f. sp. fusiforme (h² = 0.21, oligogenic trait), and (iv) diameter at breast height (Diameter) at year 6 (cm) (h² = 0.31, polygenic trait).

Simulated population

A simulated population (CCLONES_sim) exhibiting genetic properties similar to those of CCLONES_real was also considered in this study, aiming to assess the efficiency of GWFP for two different traits and to predict the performance of the next generation. The description of the simulation, and the results for genomic prediction approaches using individual trees (GEBV), were previously reported for this synthetic population (de Almeida Filho et al. 2016, 2019). In summary, the base population (G0 = 1000 diploid individuals) was created by randomly sampling 2000 haplotypes from a population with an effective size of N_e = 10,000 (Johnson et al. 2001) and a mutation rate of 2.5 × 10^−8. Then, the individuals with the top 10% of phenotypic values in G0 were selected and randomly mated to generate the first breeding generation (G1). From G1, 42 individuals were selected and used in a circular diallel mating design that reproduced the pedigree of CCLONES_real (G2), comprising 923 individuals and 71 full-sib families. However, only the 63 families with more than five individuals were used in this study. Subsequently, 42 individuals were selected from G2 and used in crosses to produce the next generation (G3, CCLONES_sim_prog), a population composed of 1176 individuals and 71 families. Only the 63 families with more than five individuals were used for analyses.

The simulated genome had 12 chromosomes, each 100 cM long, and 10,000 polymorphic loci were randomly selected to represent the entire genome. Only the scenario exhibiting an absence of dominance (d² = 0.0) and h² = 0.25 was used for analyses in this study. Two traits with different genetic architectures were simulated: (i) oligogenic: 30 QTL were sampled from a gamma distribution with rate 1.66 and shape 0.4, with positive or negative QTL effects (Meuwissen et al. 2001), and (ii) polygenic: 1000 QTL were used, and their additive effects were sampled from a standard normal distribution (Hickey and Gorjanc 2012). The simulations were run using MaCS (Chen et al. 2009) and in the software R using scripts developed by the authors.
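The two genetic architectures described above can be illustrated with a minimal Python sketch. It is not the authors' MaCS/R pipeline: the function names, the equal probability of positive and negative signs, and the way environmental noise is scaled to reach a target heritability are all assumptions made for illustration. Note that NumPy parameterizes the gamma distribution by scale, so a rate of 1.66 becomes a scale of 1/1.66.

```python
import numpy as np

def sample_qtl_effects(architecture, rng):
    """Sample additive QTL effects for one simulated trait."""
    if architecture == "oligogenic":
        # 30 QTL, magnitudes from Gamma(shape=0.4, rate=1.66).
        magnitude = rng.gamma(shape=0.4, scale=1.0 / 1.66, size=30)
        # Assumption: each effect is positive or negative with equal probability.
        sign = rng.choice([-1.0, 1.0], size=30)
        return magnitude * sign
    if architecture == "polygenic":
        # 1000 QTL with standard normal additive effects.
        return rng.standard_normal(1000)
    raise ValueError(f"unknown architecture: {architecture}")

def add_environmental_noise(breeding_values, h2, rng):
    """Hypothetical helper: add noise so var(BV) / var(phenotype) is about h2."""
    bv = np.asarray(breeding_values, dtype=float)
    var_e = np.var(bv) * (1.0 - h2) / h2
    return bv + rng.normal(0.0, np.sqrt(var_e), size=bv.shape)
```

With h² = 0.25, as in the scenario used here, the environmental variance in this sketch is three times the additive variance.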

Pooling phenotypic and genotypic data at the family level

In both populations, phenotypic and genotypic data were pooled at the family level in silico. We assumed that the family phenotype was the average of all individuals in a family. Hence, the phenotypic values of individuals were pooled at the family level in silico by calculating the family mean, without considering the experimental design. The average phenotypic value per family was then used as the response for all analyses.

In the case of the genomic data, the allele frequency (p) was calculated for each SNP per family, considering the reference allele (A) as follows:

$$p_{ij} = \frac{2 n_{AA_{ij}} + n_{Aa_{ij}}}{2 N_{ij}}$$

where $p_{ij}$ refers to the allele frequency for SNP i in family j; $n_{AA_{ij}}$ and $n_{Aa_{ij}}$ are the numbers of individuals with genotypes AA and Aa, respectively, for SNP i in family j; and $N_{ij}$ is the number of individuals in family j with non-missing genotype data for SNP i. Missing values for allele frequency were imputed at the family level using the average allele frequency for that SNP across families. Markers were excluded from analyses when more than 50% of the families exhibited missing values, and SNPs were not removed based on minor allele frequency. A total of 4740 polymorphic SNPs (CCLONES_real) and an average of 5000 polymorphic SNPs for CCLONES_sim and CCLONES_sim_prog (average across simulated replicates) were used in the analyses.
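The in silico pooling described in this section can be sketched in Python with NumPy. This is an illustration rather than the authors' code: the function names are hypothetical, and genotypes are assumed to be coded as reference-allele dosages (0, 1, 2) with `np.nan` for missing calls, so that the sum of dosages within a family equals twice the number of AA individuals plus the number of Aa individuals.

```python
import numpy as np

def family_means(y, family):
    """Family phenotype = plain average of its individuals (no design effects)."""
    y = np.asarray(y, dtype=float)
    family = np.asarray(family)
    fams = np.unique(family)
    means = np.array([np.nanmean(y[family == f]) for f in fams])
    return fams, means

def family_allele_freq(dosage, family):
    """Return (families, P) with P[j, i] = frequency of allele A at SNP i in family j."""
    dosage = np.asarray(dosage, dtype=float)
    family = np.asarray(family)
    fams = np.unique(family)
    P = np.full((len(fams), dosage.shape[1]), np.nan)
    for j, f in enumerate(fams):
        d = dosage[family == f]
        n = np.sum(~np.isnan(d), axis=0)   # individuals with non-missing genotype
        s = np.nansum(d, axis=0)           # 2 * n_AA + n_Aa
        ok = n > 0                         # stays NaN if the whole family is missing
        P[j, ok] = s[ok] / (2.0 * n[ok])
    return fams, P

def impute_and_filter(P, max_missing=0.5):
    """Drop SNPs missing in more than max_missing of families, then mean-impute."""
    keep = np.isnan(P).mean(axis=0) <= max_missing
    P = P[:, keep].copy()
    col_mean = np.nanmean(P, axis=0)       # average frequency across families
    rows, cols = np.where(np.isnan(P))
    P[rows, cols] = col_mean[cols]
    return P, keep
```

The resulting family-by-SNP matrix of frequencies (values between 0 and 1) plays the role that the dosage matrix plays in individual-level models.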

Minimum number of individuals per family to estimate allele frequency and family phenotypic mean

A total of 10 families from CCLONES_real and CCLONES_sim with at least 15 individuals were selected to evaluate the minimum number of individuals required to estimate allele frequency and phenotypic family means with reasonable accuracy. Families were specifically selected to represent segregation ratios (1:1 and 1:2:1) for 10 SNPs. Allele frequencies per family and family phenotypic means were calculated varying the number of individuals per family from 1 to 15. These values were used to compute the squared deviations between the mean value obtained with i individuals (i = 1–15) and the mean value obtained with the entire family (15 individuals), under the assumption that 15 individuals per family provide accurate estimates of allele frequencies and phenotypic means in our families. This assumption can be validated using the concept of genetic representativeness, given by the effective population size (N_e) (Vencovsky and Crossa 2003). The estimator of N_e within a full-sib family is given by N_e = 2n/(n + 1) (Resende and Barbosa 2006). The maximum N_e within a full-sib family (as n goes to infinity) is 2. With n equal to 15 individuals, N_e is 1.88, which is 94% of this maximum of 2.
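The effective-size arithmetic and the deviation curve described above can be reproduced with a short Python sketch. The source does not state how subsets of i individuals were drawn, so the repeated random subsampling without replacement used here (and the names `squared_deviation_curve` and `n_rep`) are assumptions for illustration only.

```python
import numpy as np

def ne_full_sib(n):
    """Effective population size within a full-sib family of n individuals."""
    return 2.0 * n / (n + 1.0)

def squared_deviation_curve(values, rng, n_rep=200):
    """Mean squared deviation between the mean of k sampled individuals
    (k = 1..len(values)) and the mean of the whole family.

    values: per-individual records for one family, e.g. dosages at one SNP
    or phenotypes of its 15 trees.
    """
    values = np.asarray(values, dtype=float)
    full_mean = values.mean()
    curve = []
    for k in range(1, len(values) + 1):
        devs = [
            (rng.choice(values, size=k, replace=False).mean() - full_mean) ** 2
            for _ in range(n_rep)
        ]
        curve.append(np.mean(devs))
    return np.array(curve)

if __name__ == "__main__":
    for n in (6, 15):
        ne = ne_full_sib(n)
        print(n, round(ne, 2), f"{100 * ne / 2:.0f}% of the maximum")
```

For allele frequencies, `values` would be the dosages divided by 2, so the family mean equals the allele frequency.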

Statistical methods for genomic prediction

Marker effects were estimated at the individual (GEBV) and family (GWFP) levels with two distinct whole-genome regression approaches using the package BGLR (Pérez and de los Campos 2014) in R (R Development Core Team 2018): (1) Bayes B, which considers that markers have heterogeneous variances, i.e., many loci have no genetic variance and a few loci explain a large portion of the genetic variation (Meuwissen et al. 2001; Pérez and de los Campos 2014); and (2) Bayes RR (Bayesian ridge regression), a Bayesian method that assumes a common variance across all loci; therefore, SNPs with the same allele frequency explain the same proportion of variance and have the same shrinkage effect (Gianola 2013; Pérez and de los Campos 2014).

In total, 20,000 Markov chain Monte Carlo iterations were used, of which the first 5000 were discarded as burn-in, and every third sample was kept for parameter estimation. We fitted the following model for individual and family models:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{m} + \mathbf{e}$$

where $\mathbf{y}$ is the vector of phenotypes, averaged by family in the case of GWFP and by individual across the multiple clones in the case of GEBV; $\mu$ is the overall mean fitted as a fixed effect; $\mathbf{m}$ is the vector of random marker effects; $\mathbf{e}$ is the vector of random error effects; $\mathbf{1}$ is a vector of ones; and $\mathbf{Z}$ is the incidence matrix containing allele frequencies in the case of GWFP (ranging from 0 to 1) and marker dosages (0, 1, and 2) for GEBV.

After fitting the model described above for each trait, the GWFP and GEBV of family/individual j ($\hat{g}_j$) were obtained using the following expression:

$$\hat{g}_j = \sum_{i=1}^{p} Z_{ij}\hat{m}_i$$

where $Z_{ij}$ is the allele frequency/marker dosage of the ith marker in family/individual j, p is the total number of markers, and $\hat{m}_i$ is the estimated effect of the ith SNP.
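To make the regression and the prediction expression concrete, the sketch below fits marker effects with ordinary ridge regression and then sums marker effects weighted by the entries of Z. This is a deliberately swapped-in technique, not BGLR and not the Bayes B or Bayes RR samplers used in the study: ridge regression with a fixed penalty `lam` is only a rough, non-Bayesian analogue of a common-variance model. The function names and the value of `lam` are assumptions.

```python
import numpy as np

def fit_ridge_markers(Z, y, lam=1.0):
    """Estimate (mu_hat, m_hat) for y = 1*mu + Z m + e with a ridge penalty on m.

    Z: (n_units, n_markers) allele frequencies (GWFP) or dosages (GEBV).
    Uses the dual form, which is cheap when markers outnumber units.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    z_bar = Z.mean(axis=0)
    Zc = Z - z_bar                          # centering leaves the mean unpenalized
    y_bar = y.mean()
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(len(y)), y - y_bar)
    m_hat = Zc.T @ alpha                    # estimated marker effects
    mu_hat = y_bar - z_bar @ m_hat          # intercept on the uncentered scale
    return mu_hat, m_hat

def genomic_values(Z, m_hat):
    """g_hat_j = sum_i Z_ij * m_hat_i for every family/individual j."""
    return np.asarray(Z, dtype=float) @ m_hat
```

Because predictive ability is a correlation, adding or omitting the constant `mu_hat` does not change it.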

Cross-validation schemes

The prediction models for GEBV and GWFP were validated using 10-fold cross-validation and leave-one-out approaches, for both populations and all traits. For the 10-fold cross-validation, data were randomly partitioned into 10 subsets, and training sets were created with 90% of the families/individuals, whereas the remaining 10% of families/individuals were used as the validation set. This scheme was repeated until each of the 10 subsets had been used as the validation set. In the leave-one-out approach, models were constructed using N_T − 1 families (where N_T is the total number of families) in the training set. The validation set was the single family not included in the training set. This scheme was repeated N_T times until all 63 families had been used as the validation set.

Each time the models were fitted using a different validation set, the model's predictive ability was estimated by calculating Pearson's correlation between the observed/simulated phenotypes and the GWFP/GEBV estimates for the families/individuals included in the validation set.
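The two validation schemes can be sketched as follows. The fitting step is a plain ridge regression used as a stand-in for the Bayesian models (an assumption, as are all function names). For leave-one-out, a correlation cannot be computed from a single validation unit, so this sketch pools the held-out predictions before correlating them with the observations; the source does not spell out this step, so treat it as one possible reading.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def ridge_fit_predict(Z_tr, y_tr, Z_va, lam=1.0):
    """Stand-in model: ridge regression on markers, predictions for Z_va."""
    mu, z_bar = y_tr.mean(), Z_tr.mean(axis=0)
    Zc = Z_tr - z_bar
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(len(y_tr)), y_tr - mu)
    return mu + (Z_va - z_bar) @ (Zc.T @ alpha)

def cv_predictive_ability(Z, y, k, rng, fit_predict=ridge_fit_predict):
    """Pearson correlation between predictions and observations in each fold."""
    n = len(y)
    abilities = []
    for val in kfold_indices(n, k, rng):
        train = np.setdiff1d(np.arange(n), val)
        pred = fit_predict(Z[train], y[train], Z[val])
        abilities.append(np.corrcoef(pred, y[val])[0, 1])
    return np.array(abilities)

def loo_predictive_ability(Z, y, fit_predict=ridge_fit_predict):
    """Leave-one-out: predict each unit from all others, then correlate once."""
    n = len(y)
    pred = np.empty(n)
    for j in range(n):
        train = np.delete(np.arange(n), j)
        pred[j] = fit_predict(Z[train], y[train], Z[j:j + 1])[0]
    return float(np.corrcoef(pred, y)[0, 1])
```

With 63 families and k = 10, `np.array_split` produces folds of six or seven families, close to the 56/7 split listed for the GWFP scenario.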

Creating training/validation sets using contrasting phenotypes

To assess the effect that the validation set structure has on the predictive ability of the models, both populations were divided into three phenotypic classes for each trait: the smallest 10%, the largest 10%, and values between both extremes. Five validation sets were created for each trait using these phenotypic classes: (1) Low: the 10% of families with the lowest phenotypic values; (2) High: the 10% of families with the highest values; (3) Low+High: four families from Low combined with three families from High; (4) Middle: seven families showing phenotypes around the population mean; and (5) Combined: two families from Low, two families from High, and three families from Middle. For the sets Low+High (3), Middle (4), and Combined (5), three replicates were created by taking random samples from each phenotypic class. The other 56 families were used as training sets to build prediction models.
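The five validation sets can be assembled from family means as in the sketch below. The function names are hypothetical, rounding 10% of 63 families up to seven is an assumption chosen to match the set sizes reported here, and drawing the Middle set at random from the middle class is one reading of "around the population mean".

```python
import numpy as np

def phenotype_classes(fam_means, frac=0.10):
    """Split family indices into low, middle, and high phenotypic classes."""
    order = np.argsort(fam_means)
    n_ext = int(np.ceil(frac * len(fam_means)))   # 63 families -> 7 per extreme
    return order[:n_ext], order[n_ext:-n_ext], order[-n_ext:]

def pick(pool, k, rng):
    """Draw k distinct families at random from a class."""
    return rng.choice(pool, size=k, replace=False)

def make_validation_sets(fam_means, rng):
    """Return {scenario: (training indices, validation indices)}."""
    low, middle, high = phenotype_classes(fam_means)
    val_sets = {
        "Low": low,
        "High": high,
        "Low+High": np.concatenate([pick(low, 4, rng), pick(high, 3, rng)]),
        "Middle": pick(middle, 7, rng),
        "Combined": np.concatenate(
            [pick(low, 2, rng), pick(high, 2, rng), pick(middle, 3, rng)]
        ),
    }
    all_idx = np.arange(len(fam_means))
    return {name: (np.setdiff1d(all_idx, val), val) for name, val in val_sets.items()}
```

Calling `make_validation_sets` three times with the same generator yields three replicates of the randomly sampled scenarios, while Low and High stay fixed.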

Split-families as training/validation sets

Two scenarios were created to explore the ability of the GWFP models to predict the performance of individuals and family pools. All families with more than ten individuals (59 families in total) were randomly split into two groups of equal size. For one group of individuals, phenotypic and genotypic data were pooled at the family level and used as the training set for GWFP models. The other group of individuals was used as the validation set based on two approaches: (1) predicting the performance of individual trees not included in the training set (GWFP_Fam_Ind), and (2) pooling individuals at the family level to predict the performance of families composed of individuals not included in the training set (GWFP_Fam_Fam).
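A minimal sketch of the within-family split, assuming a vector of family labels with one entry per tree. The function name is hypothetical, and for families with an odd number of trees the extra tree is assigned to the validation half, which is an arbitrary choice not specified in the source.

```python
import numpy as np

def split_families(family, rng, min_size=11):
    """Randomly split each family with more than ten individuals into two halves.

    Returns boolean masks (train_mask, valid_mask) over individuals.
    Individuals from smaller families are in neither mask.
    """
    family = np.asarray(family)
    train_mask = np.zeros(len(family), dtype=bool)
    valid_mask = np.zeros(len(family), dtype=bool)
    for f in np.unique(family):
        idx = np.flatnonzero(family == f)
        if len(idx) < min_size:
            continue
        idx = rng.permutation(idx)
        half = len(idx) // 2
        train_mask[idx[:half]] = True    # pooled by family to train GWFP
        valid_mask[idx[half:]] = True    # kept individual (Fam_Ind) or pooled (Fam_Fam)
    return train_mask, valid_mask
```

The training half is then pooled to family means and allele frequencies, while the validation half is either left at the individual level (GWFP_Fam_Ind) or pooled in the same way (GWFP_Fam_Fam).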

Prediction in the following generation using GEBV and GWFP in the simulated population

The genomic prediction models were developed using the G2 CCLONES_sim population as the training set. These trained models were validated in the G3 generation using GEBV and GWFP, and models were assessed by calculating predictive ability and prediction accuracy. Predictive ability was estimated by calculating Pearson's correlation between the phenotypic values and the estimated breeding values, and prediction accuracy was estimated by computing Pearson's correlation between the true breeding value and the estimated breeding value.
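The distinction between the two metrics is easy to state in code. In the sketch below (hypothetical names), both are Pearson correlations of the same estimated breeding values; only the reference vector changes. True breeding values are available only in simulation, which is why accuracy is reported for CCLONES_sim alone.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def ability_and_accuracy(estimated_bv, phenotype, true_bv):
    """Return (predictive ability, prediction accuracy).

    predictive ability  = cor(estimated breeding value, phenotype)
    prediction accuracy = cor(estimated breeding value, true breeding value)
    """
    return pearson(estimated_bv, phenotype), pearson(estimated_bv, true_bv)
```

Because phenotypes contain environmental noise in addition to the breeding value, predictive ability is usually lower than accuracy for the same predictions.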

Results

Minimum number of individuals per family to estimate allele frequency and family phenotypic mean

The minimum number of individuals per family was determined by assessing allele frequency and phenotypic mean deviations using families with at least 15 individuals. For genotypic and phenotypic data, the lowest number of individuals needed to accurately estimate allele frequency and family means was six (Figure 1). Allele frequency deviations (Figure 1, A–D) and mean phenotypic deviations (Figure 1, E and F) indicated that families with fewer than six individuals did not provide accurate estimates of the family's genotypic and phenotypic means in either population. We assumed that the observed values based on 15 individuals per family provide a reasonable estimate of allele frequency and phenotypic mean for a diploid species. Therefore, all 63 families with six or more individuals were used for further analyses in this study. Both populations showed similar trends for the genotypic and phenotypic estimates (Figure 1). The average allele frequency deviations were lower for SNPs exhibiting a 1:1 ratio in both populations (Figure 1, A and B), compared with SNPs segregating in a 1:2:1 ratio (Figure 1, C and D). For phenotypic data, CCLONES_sim showed slightly smaller deviations, especially for a lower number of individuals (Figure 1F), compared with CCLONES_real for the trait diameter (Figure 1E). Other traits in CCLONES_real exhibited similar behavior (data not shown).

Figure 1

Average allele frequency deviation (A–D) and family mean phenotypic deviation (E and F) in CCLONES_real (real breeding population composed of 63 families) (A, C, and E) and CCLONES_sim (simulated breeding population exhibiting genetic properties similar to those of CCLONES_real) (B, D, and F), calculated by increasing the number of individuals from 1 to 15. Five families exhibiting genotypic segregation ratios of 1:1 (A and B) and 1:2:1 (C and D) for single nucleotide polymorphisms were included in the analysis. The CCLONES_real phenotypic deviation is for the trait stem diameter (E).

Predictive ability of statistical methods for genomic prediction and for different cross-validation schemes

Two Bayesian statistical methods (Bayes B and Bayes RR) and two cross-validation approaches were used to test the predictive ability of GWFP in four traits measured in CCLONES_real (Figure 2). Both statistical methods yielded high and similar predictive abilities for the four traits (Figure 2, A and B). However, standard errors for predictive ability were larger with the leave-one-out approach (Figure 2, A and B). Additionally, GWFP predictive abilities obtained with the leave-one-out approach were slightly lower than for the 10-fold cross-validation scheme (except for the trait Stiffness) (Figure 2, A and B). Therefore, the 10-fold cross-validation approach was selected to perform further analyses.

Figure 2

Average predictive ability using family pools (GWFP) in four traits in the loblolly pine breeding population CCLONES obtained with 10-fold and leave-one-out cross-validation schemes using Bayes B (A) and Bayes RR (B).

Predictive ability of GWFP using training/validation sets with contrasting phenotypes

The effect of phenotypic information on the predictive ability of GWFP was explored by creating five validation sets using contrasting sets of phenotypic data between the training set and the validation set (Figure 3A). The predictive ability of GWFP for all traits was lowest and had larger standard errors when the validation set was composed of families exhibiting small or large phenotypic values (bottom and top classes; Figure 3B). When validation sets were composed of families exhibiting phenotypes corresponding to the middle class, predictive ability increased for all traits, but standard errors were still large (Figure 3B). As expected, there was an increase in predictive ability and a large reduction in standard errors when validation sets were composed of families showing phenotypic mean and variance similar to those of the training set, corresponding to the classes "Low+High" and "Combined" (Figure 3B).

Figure 3

Phenotypic distribution for testing (orange) and validation (white) sets for four traits measured in the CCLONES_real population and two traits simulated using CCLONES_sim (A). Average predictive ability obtained with Bayes B using GWFP for four traits in CCLONES_real (lignin, stiffness, rust, and diameter), and two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim populations (B). Five scenarios were tested by creating training (56 families) and validation (7 families) populations using phenotypic data: (i) Low: validation set is composed of seven families with the lowest phenotypic records; (ii) High: validation set is composed of seven families with the highest phenotypic records; (iii) Middle: validation set is composed of seven families with phenotypic records similar to the family mean; (iv) Combined: two families from Low, two families from High, and three families from Middle; and (v) Low + High: four families from Low and three families from High.

Predictive ability of GEBV and GWFP

Predictive ability obtained with Bayes B using different methods and schemes (Table 1) is presented in Figure 4 for the 63 families from both populations. The traditional genomic prediction approach with individuals in the training set and validation set (GEBV) was contrasted with the predictive ability obtained with the family-based (GWFP) method following a 10-fold cross-validation scheme. The scenarios GWFP_Fam_Ind and GWFP_Fam_Fam were run only once because CCLONES (real and simulated) had a limited number of individuals per family.

Figure 4

Average predictive ability obtained with Bayes B for four traits in CCLONES_real (lignin, tree stiffness, rust, and stem diameter), and two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim populations using different genomic prediction methods. GEBV: genomic estimated breeding values of individual trees; GWFP_Fam_Ind: genome-wide family prediction using 59 family pools as the training set, while different individuals from the same families were used as the validation set; GWFP_Fam_Fam: genome-wide family prediction using 59 family pools as the training and validation populations, but different full-sib individuals were pooled in each set; GWFP: genome-wide family prediction using 63 family pools in a 10-fold cross-validation scheme. Narrow-sense heritability (h²) was estimated at the individual level (Resende et al. 2012).

Table 1

Scenarios implemented to design training and validation sets to test predictive ability of genomic prediction models

Scenario	Training set	Validation set
GEBV	830 individuals	93 individuals
GWFP	56 families	7 families
GWFP_Fam_Ind	59 families	422 individuals
GWFP_Fam_Fam	59 families	59 families
GWFP_Low	56 families	7 families with lowest phenotypic values
GWFP_High	56 families	7 families with highest phenotypic values
GWFP_Low_High	56 families	7 families (4 low and 3 high phenotypic values)
GWFP_Middle	56 families	7 families with values similar to the overall mean
GWFP_Combined	56 families	7 families (2 low, 2 high, and 3 middle)

GEBV, genomic estimated breeding value; GWFP, genome-wide family prediction; CV, cross-validation.

Predictive ability was always greater for GWFP methods in both populations and all traits, except for the scenario GWFP_Fam_Ind, which showed similar or lower predictive ability than GEBV for most traits (Figure 4). Additionally, predictive ability was greater for traits with higher heritability (Figure 4). Specifically, GWFP provided predictive abilities at least 40% greater than traditional GEBV for most of the traits in both populations. Moreover, GWFP_Fam_Fam exhibited similar or greater predictive ability than GWFP for most traits in both populations, except for rust (Figure 4). Both traits from the simulated CCLONES population exhibited very similar predictive abilities for all schemes (Figure 4).

Predictive ability and accurateness of GEBV and GWFP in the following generation

Accuracy and predictive ability of GEBV and GWFP were obtained with prediction models built using the CCLONES_sim (G2) population as the training set, and the models were validated in the following generation (G3). GEBV showed higher accuracy than GWFP for the oligogenic trait, and similar accuracy for the polygenic trait (Figure 5). Predictive ability for both the oligogenic and polygenic traits was higher for GWFP (Figure 5). Additionally, greater predictive ability and accuracy were observed for the oligogenic trait, and the difference between accuracy and predictive ability was greater for the oligogenic trait (Figure 5).

Figure 5

Average predictive ability and accuracy obtained with Bayes B for two traits with different genetic architecture (Oligogenic and Polygenic) in the CCLONES_sim_progeny population, obtained with individual (GEBV) and family-pooled (GWFP) genomic prediction methods. Predictive ability, calculated as the correlation between estimated breeding values and phenotypic values, is denoted as _Pheno, and accuracy, calculated as the correlation between estimated and true breeding values, as _BV.
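
The two metrics reported in Figure 5 differ only in what the predictions are correlated with. The sketch below illustrates that distinction with simulated values; the function names, sample size, and variances are assumptions chosen for illustration and are not taken from the study.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def prediction_metrics(predicted, phenotype, true_bv):
    """Return (predictive ability, accuracy) for one validation set."""
    predictive_ability = pearson(predicted, phenotype)  # the "_Pheno" metric
    accuracy = pearson(predicted, true_bv)              # the "_BV" metric
    return predictive_ability, accuracy

# Hypothetical validation set: true breeding values are only known in simulation.
rng = np.random.default_rng(1)
true_bv = rng.normal(0.0, 1.0, 500)
phenotype = true_bv + rng.normal(0.0, 1.5, 500)   # phenotype = BV + environmental noise
predicted = true_bv + rng.normal(0.0, 1.0, 500)   # imperfect genomic prediction
ability, acc = prediction_metrics(predicted, phenotype, true_bv)
print(f"predictive ability = {ability:.2f}, accuracy = {acc:.2f}")
```

Because the phenotype carries environmental noise that no marker model can predict, the correlation with phenotypes is expected to be lower than the correlation with true breeding values in a simulation of this kind.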

Discussion

We quantified the predictive ability of GWFP in real and simulated loblolly pine breeding populations for different traits and cross-validation approaches. Moderate to low predictive ability values were obtained with the traditional genomic prediction approach, as previously reported for both populations, using individual trees as the basic phenotypic and genotypic unit (Resende et al. 2012; de Almeida Filho et al. 2016). In general, GWFP outperformed GEBV in predictive ability for most traits, including the oligogenic and polygenic traits in CCLONES_sim when the following generation (G3) was used as the validation set.

Effect of family size in genomic prediction

The size and structure of the training population affect the accuracy of genomic prediction models (VanRaden et al. 2009; Daetwyler et al. 2010; Habier et al. 2010; Grattapaglia and Resende 2011; Edwards et al. 2019; de Bem Oliveira et al. 2020). In our study, the size of the training set refers to the number of families and the number of individuals within a family. The number of families was fixed and limited to 70 families, so we did not focus on studying the effect of a variable number of families. Yet, the minimum number of individuals per family needed to obtain reasonably accurate estimates of family allele frequency and family phenotypic mean was found to be six. When studying the effect of the size and composition of the training population in blueberry (Vaccinium spp.), de Bem Oliveira et al. (2020) found a high predictive ability using six individuals per family for some traits. Thus, in their study family variance was accurately represented with six individuals per family in this autotetraploid species. Using the estimator of the effective population size (N_e) within a full-sib family, N_e = 2n/(n + 1) (Resende and Barbosa 2006), the maximum N_e within a full-sib family (as n goes to infinity) is 2. With n equal to 6 individuals, N_e is 1.71, which is 86% of the maximum of 2. So, n = 6 appears adequate to represent a full-sib family genetically, corroborating our results.
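
The N_e arithmetic above can be checked with a few lines of code. This is a minimal sketch of the estimator N_e = 2n/(n + 1) quoted in the text; the function name and the list of family sizes are illustrative choices.

```python
def effective_size_full_sib(n):
    """N_e = 2n / (n + 1) for n individuals sampled from one full-sib family."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2.0 * n / (n + 1)

MAX_NE = 2.0  # limit of 2n / (n + 1) as n grows without bound

for n in (2, 4, 6, 10, 20):
    ne = effective_size_full_sib(n)
    print(f"n = {n:2d}  N_e = {ne:.2f}  ({100 * ne / MAX_NE:.0f}% of maximum)")
```

For n = 6 this gives N_e = 12/7, or about 1.71, which is 86% of the maximum, matching the values in the text. The gain from each additional individual shrinks quickly as n increases.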

The effect of the number of individuals within families on the accuracy of genomic prediction models was also demonstrated in perennial ryegrass (Pembleton et al. 2016, 2018). The authors stated that 48–60 individuals per population are necessary to accurately represent the genetic diversity within a ryegrass population. Because perennial ryegrass is an allogamous species, multiple parents are used to create synthetic populations; hence, multiple individuals with a high number of heterozygous loci contribute to the variation in the synthetic population. Perennial ryegrass is commonly bred using families, and GWFP has been exploited in the species for various traits (Fè et al. 2015, 2016; Cericola et al. 2018; Guo et al. 2018).

Simulation studies with variable numbers of families and individuals per family would help identify the optimum training population sizes for GWFP. Generally, a larger training population (more families in the training population) yields higher accuracy (Voss-Fels et al. 2019; de Bem Oliveira et al. 2020), but this is associated with higher costs. Therefore, defining the optimum number of families and the number of individuals per family is a crucial point in the genomic prediction process. Fè et al. (2015) studied the effect of the number of families on the accuracy of genomic prediction for heading date in ryegrass; the authors found high accuracies with a low number of families (<100). The authors showed that increasing the number of families to 500 led to higher accuracy, and that more than 500 families did not yield a significant improvement.

Efficiency of statistical methods and cross-validation schemes

Models based on different Bayesian methods performed similarly in predicting GEBV for traits measured in the real breeding population and the simulated population in this study. Resende et al. (2012) reported a slightly greater predictive ability in the real population for rust incidence with Bayesian methods over RR-BLUP, because fewer genes with large effects control this trait. de Almeida Filho et al. (2016), using the simulated population, reported a slightly lower predictive ability in the oligogenic trait using Bayes RR than Bayes B. In this study, Bayes B and Bayes RR were tested to compare their performance in GWFP because the prior distributions and assumptions of the two methods are contrasting (Pérez and de Los Campos 2014). Our results showed that both Bayesian methodologies were very similar in predicting family pools, even for rust incidence in the real population and for the oligogenic trait in the simulated population.

Both cross-validation schemes, leave-one-out and 10-fold, produced similar results in predicting GWFP, with a slight advantage for the 10-fold scheme due to the large variation in the leave-one-out scheme. Resende et al. (2012) reported similar results with the real data set for GEBV, wherein 10-fold and leave-one-out resulted in no significant differences in predictive ability. Similar predictive abilities between the 10-fold and leave-one-out schemes have also been reported in wheat (Triticum aestivum L.) (Edwards et al. 2019).
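
The two cross-validation schemes differ only in how families are assigned to validation folds. The sketch below shows this with simulated family allele frequencies; it swaps in a plain ridge regression for the Bayesian models used in the study, and the data, the penalty `lam`, and the choice to pool out-of-fold predictions before computing the correlation are all assumptions made for illustration.

```python
import numpy as np

def kfold_indices(n_units, k, rng):
    """Split shuffled unit indices into k nearly equal validation folds."""
    return np.array_split(rng.permutation(n_units), k)

def ridge_fit_predict(X_train, y_train, X_val, lam=1.0):
    """Ridge regression on centred markers (stand-in for Bayes B / Bayes RR)."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    # Dual form: solve an n x n system, cheaper when markers outnumber units.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu_y)
    beta = Xc.T @ alpha                       # estimated marker effects
    return mu_y + (X_val - mu_x) @ beta

def cross_validate(X, y, folds):
    """Predict each fold from the remaining units; correlate with observed values."""
    pred = np.empty_like(y, dtype=float)
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        pred[val] = ridge_fit_predict(X[train], y[train], X[val])
    return float(np.corrcoef(pred, y)[0, 1])

rng = np.random.default_rng(0)
n_fam, n_markers = 63, 300                    # hypothetical dimensions
X = rng.uniform(0.0, 1.0, size=(n_fam, n_markers))   # family allele frequencies
y = X @ rng.normal(0.0, 1.0, n_markers) + rng.normal(0.0, 2.0, n_fam)

ten_fold = cross_validate(X, y, kfold_indices(n_fam, 10, rng))
loo = cross_validate(X, y, kfold_indices(n_fam, n_fam, rng))  # one family per fold
print(f"10-fold: {ten_fold:.2f}, leave-one-out: {loo:.2f}")
```

Leave-one-out is simply the limiting case in which the number of folds equals the number of families, so the same code path serves both schemes.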

Predictive ability of GWFP using contrasting phenotypes

When the families in the validation set had phenotypic values outside the range of phenotypes represented in the training set (bottom and top classes), lower and much more variable predictive abilities were obtained. Interestingly, higher predictive abilities were obtained when families in the validation set had the same phenotypic range as the training set. The impact of phenotypic variance on prediction was demonstrated by Edwards et al. (2019), who reported that the accuracy of genomic prediction in wheat was higher for crosses (validation set) with higher phenotypic variance. Würschum et al. (2017) reported equivalent results in triticale (× Triticosecale Wittmack), in which higher accuracy was detected for plant height and biomass when families with a large phenotypic variation were included in the training/validation set population.

The differences in predictive ability among the scenarios for phenotypic values in the validation set could also be related to the composition of the training sets. For the extreme scenarios (Low and High), the training sets did not contain the extreme phenotypic values and allele frequencies, which could have resulted in poor estimates of marker effects. Studying the optimization process for genomic prediction in wheat, Norman et al. (2018) showed that, when the training set and validation set are not related, genomic prediction accuracy could be improved by increasing the genetic diversity in the training set.
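
A short sketch makes the extrapolation problem in the Low and High scenarios concrete. It builds validation sets of seven families from ranked family means, following the scenario names used in this study; the simulated means and the helper names are hypothetical, and the exact selection rules used by the authors may differ.

```python
import numpy as np

def contrasting_validation_sets(family_means, n_val=7):
    """Return family indices for the Low, High, and Middle validation sets."""
    order = np.argsort(family_means)  # ascending phenotypic mean
    centre = np.argsort(np.abs(family_means - family_means.mean()))
    return {
        "GWFP_Low": order[:n_val],       # lowest phenotypic values
        "GWFP_High": order[-n_val:],     # highest phenotypic values
        "GWFP_Middle": centre[:n_val],   # closest to the overall mean
    }

def training_set(n_families, validation):
    """All families not held out for validation."""
    return np.setdiff1d(np.arange(n_families), validation)

rng = np.random.default_rng(42)
family_means = rng.normal(10.0, 2.0, 63)  # hypothetical family phenotypic means
sets = contrasting_validation_sets(family_means)
train_low = training_set(63, sets["GWFP_Low"])

# In the Low scenario no training family is as poor as any validation family,
# so the model has to predict below the phenotypic range it was trained on.
print(family_means[sets["GWFP_Low"]].max(), "<=", family_means[train_low].min())
```

In the Middle scenario, by contrast, the training set still spans both tails of the distribution, which is consistent with the higher predictive abilities reported for that scenario.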

Predictive ability of GEBV and GWFP for different traits and scenarios

Predictive ability was always greater for GWFP methods than for GEBV in both the real and simulated populations and for all traits, except when the model was built with family pools and individual performance was predicted (GWFP_Fam_Ind) (Figure 4). Although the full-sib family average exploits only half of the additive genetic variance, the error variance is reduced by the larger number of observations due to progeny replication, compared with single observations (Hallauer et al. 2010). Thus, the higher precision of the phenotypic value in family bulks could explain the higher accuracy in genomic prediction of families.

The higher accuracy of the GWFP method was expected, since the additive genetic variance explored in this method is just 50% of the additive genetic variance explored with GEBV. The genotypic value of a family is equal to the mean breeding value of the two parents, so its variance is ¼(V_a + V_a') = ½V_a (ignoring dominance and epistatic effects). The additive variance among full-sib families is therefore only 50% of the total additive variance, whereas the other 50% represents the variance within a family, which leads to higher accuracy and heritability (Casler and Brummer 2008; Ashraf et al. 2014). In addition, relatedness between the training set and the validation set also influences predictive ability. The relationship between the training set and the validation set has a crucial role in the predictive ability of the model (Lorenz and Smith 2015; de Bem Oliveira et al. 2020), and it can help explain the higher predictive ability found in GWFP_Fam_Fam and GWFP compared with GEBV and GWFP_Fam_Ind.
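
The trade-off described above, half the additive variance but a much smaller error variance, can be illustrated with the textbook expression for the heritability of a full-sib family mean. This is a sketch under simplifying assumptions (purely additive effects, unrelated non-inbred parents, no common family environment); the variance values are hypothetical and the calculation is not taken from the study.

```python
def individual_heritability(v_a, v_e):
    """Narrow-sense heritability of a single observation."""
    return v_a / (v_a + v_e)

def family_mean_heritability(v_a, v_e, n):
    """Heritability of a full-sib family mean based on n individuals.

    Among-family variance is V_a / 2; within-family variance is V_a / 2 + V_e
    and is divided by n when individuals are averaged.
    """
    among = 0.5 * v_a
    within = 0.5 * v_a + v_e
    return among / (among + within / n)

v_a, v_e = 1.0, 3.0  # hypothetical variances giving an individual h2 of 0.25
print(f"individual: {individual_heritability(v_a, v_e):.2f}")
for n in (1, 6, 20):
    print(f"family mean, n = {n:2d}: {family_mean_heritability(v_a, v_e, n):.2f}")
```

With these values the heritability rises from 0.25 for an individual to about 0.46 for the mean of six full sibs, even though only half of the additive variance is expressed among families.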

Nevertheless, the predictive ability for most traits obtained with the GWFP_Fam_Ind scheme was of the same order of magnitude as that obtained with GEBV, except for the traits stiffness and rust. Therefore, using the numbers from this study as an example, and considering the significant reduction in the costs of DNA extraction and genotyping for 56 families (training set for GWFP) instead of 844 individuals (training set for GEBV), the GWFP_Fam_Ind approach could still be an affordable option for implementing genomic prediction in breeding programs that select individual plants but have limited budgets to phenotype and genotype all individuals in the training set.

Genomic prediction can be implemented with a smaller investment and higher predictive ability using the GWFP approach compared with GEBV. A larger number of families can be included in the models, which, for the present population, would likely result in higher predictive abilities, as reported in perennial ryegrass for heading date (Fè et al. 2015). Additionally, including more than 10 individuals per family will reduce the sampling variability of the allele frequency and phenotypic mean, resulting in higher genomic accuracies (de Bem Oliveira et al. 2020).

Application of GWFP in a breeding program

Genomic prediction has the power to shorten the time of the breeding process, which leads to a higher genetic gain per unit time, and can allow a reduction in phenotyping effort and costs (Grattapaglia and Resende 2011; Crossa et al. 2017; Voss-Fels et al. 2019). However, in some cases, breeders need to genotype a large number of individuals (>10,000) to implement genomic prediction in their programs, increasing costs significantly (Voss-Fels et al. 2019). The high genotyping costs due to large population sizes can make it impracticable to implement genomic prediction in minor crops, especially in public breeding programs.

For breeding programs with limited budgets, GWFP can be an alternative to GEBV due to the reduction in phenotypic and genotypic costs to develop prediction models. GWFP has been used in several forage species that are bred in family bulks and whose phenotyping for critical traits is conducted at the sward/plot level (Fè et al. 2015, 2016; Annicchiarico et al. 2015; Biazzi et al. 2017; Jia et al. 2018; Cericola et al. 2018; Guo et al. 2018). In a GEBV approach, the information (phenotypic and genotypic) is collected at the individual level and models are built to estimate the performance of individuals (Figure 6A; Resende et al. 2012; de Almeida Filho et al. 2016, 2019). The GEBV approach requires significantly more resources (labor, economic, and computational) to collect and analyze data. Under a GWFP approach, the number of genotypic samples (bulked DNA and a single sequencing effort per family) will be exactly the number of families, representing a significant reduction in the number of samples compared with the traditional GEBV process (Figure 6B). The phenotyping process will also be performed at the family/plot level, which is the ideal scenario for critical traits in some crops such as forage and turfgrass species.
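
In this study the family-level data were pooled in silico from individual records. The sketch below shows one plausible way to do that, assuming diploid genotypes coded as allele dosages (0, 1, 2); the function name, the coding, and the toy data are assumptions, and the authors' exact pooling procedure is described in their methods rather than here.

```python
import numpy as np

def pool_by_family(genotypes, phenotypes, family_ids):
    """Pool individual records into family allele frequencies and phenotypic means.

    genotypes: (n_individuals, n_markers) allele dosages coded 0, 1, 2 (diploid).
    Returns the family labels, an (n_families, n_markers) frequency matrix,
    and a vector of family phenotypic means.
    """
    families = np.unique(family_ids)
    freq = np.vstack(
        [genotypes[family_ids == f].mean(axis=0) / 2.0 for f in families]
    )
    means = np.array([phenotypes[family_ids == f].mean() for f in families])
    return families, freq, means

# Toy example: four individuals, two markers, two families.
genotypes = np.array([[0, 2], [2, 2], [1, 0], [1, 2]])
phenotypes = np.array([1.0, 3.0, 2.0, 6.0])
family_ids = np.array(["A", "A", "B", "B"])

families, freq, means = pool_by_family(genotypes, phenotypes, family_ids)
print(families, freq, means)
```

Each family is reduced to a single row of allele frequencies and a single phenotypic value, which is why the number of samples entering the model equals the number of families.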

Figure 6

Scheme for the different genomic prediction scenarios: (A) GEBV: genomic estimated breeding values for individual trees; (B) GWFP_Fam_Fam: genome-wide family prediction for the prediction of families; (C) GWFP_Fam_Ind: genome-wide family prediction applied to the selection of individuals.

Breeders may also be interested in employing the GWFP_Fam_Ind approach, where family bulks are used as the training set, but individuals are the selection unit (Figure 6C). In this study, the GWFP_Fam_Ind approach showed accuracy similar to GEBV for most traits, with the added advantage of requiring less phenotypic and genotypic data for model development. Finally, GWFP models could be exploited in scenarios where remnant seed is available for the same family, and the goal is to predict the performance of the family or of individuals within the family. The remaining seeds from the selected families can be used later to test their merit in further replicated field trials. For perennial allogamous crops, families used in the training set can be used as a new crossing block to start a new selection cycle.

Conclusion

Despite the limited number of families and number of individuals per family tested in this study, we found that fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Validation sets with a phenotypic mean and variance similar to those of the training set consistently showed greater predictive ability and more accurate predictions across traits. These results revealed great potential for using GWFP in breeding programs that use family bulks as the selection unit. GWFP is well suited for crops that are routinely genotyped and phenotyped at the plot level. The GWFP approach can also be extended to breeding schemes where family bulks serve as training sets while individuals are the selection target.

Data availability

All phenotypic and genotypic data utilized in this study have been previously published as a standard data set for development of genomic prediction methods (Resende et al. 2012; de Almeida Filho et al. 2016). Simulated data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.3126v.

Conflicts of interest

None declared.

Literature cited

Amadeu, Ferrão LFV, Oliveira IDB, Benevenuto, Endelman, et al. 2020. Impact of dominance effects on autotetraploid genomic prediction. Crop Sci. 656–665.

Annicchiarico, Nazzicari, Wei, Pecetti L, et al. 2015. Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics. 1020.

Ashraf, Jensen, Asp, Janss LL. 2014. Association studies using family pools of outcrossing crops based on allele-frequency estimates from DNA sequencing. Theor Appl Genet. 127:1331–1341.

Baltunis, Huber, White, Goldfarb, Stelzer HE. 2007. Genetic gain from selection for rooting ability and early growth in vegetatively propagated clones of loblolly pine. Tree Genet Genomes. 227–238.

Barbosa MHP, Resende MDV, Dias LADS, Barbosa GVDS, Oliveira RAD, et al. 2012. Genetic improvement of sugar cane for bioenergy: the Brazilian experience in network research with RIDESA. Crop Breed Appl Biotechnol.

Biazzi E, Nazzicari N, Pecetti, Brummer, Palmonari, et al. 2017. Genome-wide association mapping and genomic selection for alfalfa (Medicago sativa) forage quality traits. PLoS One. e0169234.

Brascamp, Bijma. 2019. A note on genetic parameters and accuracy of estimated breeding values in honey bees. Genet Sel Evol.

Casler, Brummer EC. 2008. Theoretical expected genetic gains for among-and-within-family selection methods in perennial forage crops. Crop Sci. 890–902.

Cericola, Lenk, Fè, Byrne, Jensen, et al. 2018. Optimized use of low-depth genotyping-by-sequencing for genomic prediction among multi-parental family pools and single plants in perennial ryegrass (Lolium perenne L.). Front Plant Sci. 369.

Chen, Marjoram, Wall JD. 2009. Fast and flexible simulation of DNA sequence data. Genome Res. 19:136–142.

Combs, Bernardo. 2013. Accuracy of genomewide selection for different traits with constant population size, heritability, and number of markers. Plant Genome.

Crossa, Pérez-Rodríguez, Cuevas, Montesinos-López, Jarquín, et al. 2017. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 961–975.

Daetwyler HD, Pong-Wong, Villanueva, Woolliams JA. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics. 185:1021–1031.

de Almeida Filho, Guimarães JFR, Silva FFE, de Resende MDV, Muñoz, et al. 2016. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity (Edinb). 117.

de Almeida Filho, Guimarães JFR, Silva FFE, de Resende MDV, Muñoz, et al. 2019. Genomic prediction of additive and non-additive effects using genetic markers and pedigrees. G3 (Bethesda). 2739–2748.

de Bem Oliveira, Amadeu, Ferrão LFV, Muñoz PR. 2020. Optimizing whole-genomic prediction for autotetraploid blueberry breeding. Heredity (Edinb). 125:437–448.

Eckert, van Heerwaarden, Wegrzyn, Nelson, Ross-Ibarra, et al. 2010. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics. 185:969–982.

Edwards, Buntjer, Jackson, Bentley, Lage, et al. 2019. The effects of training population design on genomic prediction accuracy in wheat. Theor Appl Genet. 132:1943–1952.

Elshire, Glaubitz, Sun, Poland, Kawamoto, et al. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 6:e19379.

Esfandyari, Fè, Tessema, Janss, Jensen. 2020. Effects of different strategies for exploiting genomic selection in perennial ryegrass breeding programs. G3 (Bethesda). 3783–3795.

Falconer, Mackay FC. 1996. Introduction to quantitative genetics. In: Introduction to Quantitative Genetics. New York: John Wiley & Sons.

Fè, Cericola, Byrne S, Lenk, Ashraf, et al. 2015. Genomic dissection and prediction of heading date in perennial ryegrass. BMC Genomics. 921.

Fè, Ashraf, Pedersen, Janss, Byrne, et al. 2016. Accuracy of genomic prediction in a commercial perennial ryegrass breeding program. Plant Genome.

Grattapaglia, Resende MD. 2011. Genomic selection in forest tree breeding. Tree Genet Genomes. 7:241–255.

Gezan, Osorio, Verma, Whitaker VM. 2017. An experimental validation of genomic selection in octoploid strawberry. Hortic Res.

Gianola. 2013. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics. 194:573–596.

Gianola, de los Campos, Hill, Manfredi, Fernando. 2009. Additive genetic variability and the Bayesian alphabet. Genetics. 183:347–363.

Guo, Cericola, Fè, Pedersen, Lenk, et al. 2018. Genomic prediction in tetraploid ryegrass using allele frequencies based on genotyping by sequencing. Front Plant Sci. 1165.

Habier, Tetens, Seefried, Lichtner, Thaller G. 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol.

Hallauer, Carena, Miranda Filho JB. 2010. Quantitative Genetics in Maize Breeding. New York, USA: Springer Science & Business Media.

Hayes, Daetwyler, Bowman, Moser G, Tier, et al. 2009. Accuracy of genomic selection: comparing theory and results. Proc Assoc Advmt Anim Breed Genet.

Hickey, Gorjanc. 2012. Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods. G3 (Bethesda). 425–427.

Hough, Williamson, Wright SI. 2013. Patterns of selection in plant genomes. Annu Rev Ecol Evol Syst.

Jia, Zhao, Wang, Han, Zhao, et al. 2018. Genomic prediction for 25 agronomic and quality traits in alfalfa (Medicago sativa). Front Plant Sci. 9:1220.

Johnson, Clair, Lipow S. 2001. Genetic conservation in applied tree breeding programs. In: Bart A, Thielges BA, Sastrapradja SD, Rimbawanto A (eds) Proceedings of the ITTO conference on in situ and ex situ conservation of commercial tropical trees. ITTO, Yokohama, Japan, pp. 215–230.

Kumar S, Chagné, Bink, Volz, Whitworth, et al. 2012. Genomic selection for fruit quality traits in apple (Malus × domestica Borkh.). PLoS One. 7:e36674.

Lara LAdC, Santos, Jank, Chiari, Vilela MDM, et al. 2019. Genomic selection with allele dosage in Panicum maximum Jacq. G3 (Bethesda). 2463–2475.

Lin, Hayes, Daetwyler HD. 2014. Genomic selection in crops, trees and forages: a review. Crop Pasture Sci. 1177–1191.

Lorenz, Smith KP. 2015. Adding genetically distant individuals to training populations reduces genomic prediction accuracy in barley. Crop Sci. 2657–2667.

Massman, Jung HJG, Bernardo. 2013. Genomewide selection versus marker-assisted recurrent selection to improve grain yield and stover-quality traits for cellulosic ethanol in maize. Crop Sci.

Meuwissen THE, Hayes, Goddard ME. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 157:1819–1829.

Munoz, Resende MFR, Huber, Quesada, Resende MDV, et al. 2014. Genomic relationship matrix for correcting pedigree errors in breeding populations: impact on genetic parameters and genomic selection accuracy. Crop Sci. 1115–1123.

Norman, Taylor, Edwards, Kuchel. 2018. Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 (Bethesda). 2889–2899.

Pembleton, Drayton, Bain, Baillie, Inch, et al. 2016. Targeted genotyping-by-sequencing permits cost-effective identification and discrimination of pasture grass species and cultivars. Theor Appl Genet. 129:991–1005.

Pembleton, Inch, Baillie, Drayton, Thakur, et al. 2018. Exploitation of data from breeding programs supports rapid implementation of genomic selection for key agronomic traits in perennial ryegrass. Theor Appl Genet. 131:1891–1902.

Pérez, de Los Campos G. 2014. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 198:483–495.

Pérez-Cabal, Vazquez, Gianola, Rosa, Weigel KA. 2012. Accuracy of genome-enabled prediction in a dairy cattle population using different cross-validation layouts. Front Genet. 3.

Poehlman JM. 1987. Breeding cross-pollinated and clonally propagated crops. In: Breeding Field Crops. Dordrecht: Springer, p. 214–236.

Poland, Endelman, Dawson, Rutkoski, et al. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome. 103–113.

R Core Team. 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/.

Resende MDVD, Barbosa MHP. 2006. Selection via simulated individual BLUP based on family genotypic effects in sugarcane. Pesq Agropec Bras. 421–429.

Resende, Muñoz, Resende, Garrick, Fernando, et al. 2012. Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics. 190:1503–1510.

Stock, Reents. 2013. Genomic selection: status in different species and challenges for breeding. Reprod Dom Anim.

Torres, Vilela de Resende MD, Azevedo, Fonseca e Silva, de Oliveira EJ. 2019. Genomic selection for productive traits in biparental cassava breeding populations. PLoS One. e0220245.

VanRaden, Van Tassell, Wiggans, Sonstegard, Schnabel, et al. 2009. Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 16.

Vencovsky, Crossa. 2003. Measurements of representativeness used in genetic resources conservation and plant breeding. Crop Sci. 1912–1921.

Voss-Fels, Cooper, Hayes BJ. 2019. Accelerating crop genetic gains with genomic selection. Theor Appl Genet. 132:669–686.

Wang, Yuan, Zhang, Huang, et al. 2017. Effects of marker density and population structure on the genomic prediction accuracy for growth trait in Pacific white shrimp Litopenaeus vannamei. BMC Genet. 18.

Wang, Cogan, Forster JW. 2016. Prospects for applications of genomic tools in registration testing and seed certification of ryegrass varieties. Plant Breed. 135:405–412.

Würschum, Maurer, Weissmann S, Hahn, Leiser WL. 2017. Accuracy of within-and among-family genomic prediction in triticale. Plant Breeding. 136:230–236.

Liu X, Wang, et al. 2020. Enhancing genetic gain through genomic selection: from livestock to plants. Plant Commun. 100005.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Source: https://academic.oup.com/g3journal/article/11/9/jkab249/6321952
