| Title: | Mixed-Effects Models Enhancing Functions |
|---|---|
| Description: | Special functions that enhance other mixed effect model packages by creating overlayed, reduced rank, and reduced model matrices together with multiple data sets to practice the use of these models. For more details see Covarrubias-Pazaran (2016) <doi:10.1371/journal.pone.0156744>. |
| Authors: | Giovanny Covarrubias-Pazaran [aut, cre]
|
| Maintainer: | Giovanny Covarrubias-Pazaran <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.1.1 |
| Built: | 2026-05-21 09:30:26 UTC |
| Source: | https://github.com/covaruber/enhancer |
enhancer is a collection of functions useful in the fitting of linear mixed effect models and general data manipulation. As more of my packages use the same special functions I decided to have the special functions as its own package to facilitate the maintenance.
The package provides kernels to the estimate additive (A.matr)
relationship matrix for diploid and polyploid organisms. A good converter from
letter code to numeric format is implemented in the function atcg1234
and atcg1234BackTransform,
which supports higher ploidy levels than diploid. Additional functions for genetic
analysis have been included such as build a genotypic hybrid marker matrix
(build.HMM). If you need to use pedigree you need to convert your
pedigree into a relationship matrix (use the 'getA' function from the pedigreemm package).
Also the LD.decay and the neMarker functions can be used
to calculate linkage disequilibrium and effective population size.
Recently, spatial modeling has been added added to enhancer using the two-dimensional
spline tps function. Based on bbasis
Available are add.diallel.vars, leg,
overlay, rrm, redmm,
simGECorMat function.
Available are adiag1, imputev, propMissing
replaceValues, stan, logspace
function.
Simple coloring functions available are jet.colors, bathy.colors,
transp
The enhancer package is updated on CRAN every 4-months due to CRAN policies but you can find the latest source at https://github.com/covaruber/enhancer. This can be easily installed typing the following in the R console:
library(devtools)
install_github("covaruber/enhancer")
This is recommended if you reported a bug, was fixed and was immediately pushed to GitHub but not in CRAN until the next update.
If you have any questions or suggestions please post it in https://stackoverflow.com or https://stats.stackexchange.com
I'll be glad to help or answer any question. I have spent a valuable amount of time developing this package. Please cite this package in your publication. Type 'citation("enhancer")' to know how to cite it.
Giovanny Covarrubias-Pazaran
Calculates the realized additive relationship matrix (R implementation).
A.matr(X,min.MAF=NULL)A.matr(X,min.MAF=NULL)
X |
Matrix ( |
min.MAF |
Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed. |
For vanraden method: the marker matrix is centered by subtracting column means where ms is the coumn means. Then , where , the mean value of the diagonal values of the portion.
If return.imputed = FALSE, the additive relationship matrix is returned.
If return.imputed = TRUE, the function returns a list containing
the A matrix
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
## random population of 200 lines with 1000 markers X <- matrix(rep(0,200*1000),200,1000) for (i in 1:200) { X[i,] <- ifelse(runif(1000)<0.5,-1,1) } A <- A.matr(X) ## take a look at the Genomic relationship matrix colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow")) hv <- heatmap(A[1:15,1:15], col = colfunc(100),Colv = "Rowv") str(hv)## random population of 200 lines with 1000 markers X <- matrix(rep(0,200*1000),200,1000) for (i in 1:200) { X[i,] <- ifelse(runif(1000)<0.5,-1,1) } A <- A.matr(X) ## take a look at the Genomic relationship matrix colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow")) hv <- heatmap(A[1:15,1:15], col = colfunc(100),Colv = "Rowv") str(hv)
'add.diallel.vars' adds 4 columns to the provided diallel dataset. Specifically, the user provides a dataset with indicator variables for who is the male and female parent and the function returns the same dataset with 4 new dummy variables to allow the model fit of diallel models.
add.diallel.vars(df, par1="Par1", par2="Par2",sep.cross="-")add.diallel.vars(df, par1="Par1", par2="Par2",sep.cross="-")
df |
a dataset with the two indicator variables for who is the male and female parent. |
par1 |
the name of the column indicating who is the first parent (e.g. male). |
par2 |
the name of the column indicating who is the second parent (e.g. female). |
sep.cross |
the character that should be used when creating the column for cross.id. A simple paste of the columns par1 and par2. |
A new data set with the following 4 new dummy variables to allow the fit of complex diallel models:
returns a 0 if is a self and a 1 for a cross.
returns a 0 if is a cross and a 1 is is a self.
returns a -1 for a direct cross, a 0 for a self and a 1 for a reciprocal cross.
returns a column psting the par1 and par2 columns.
Giovanny Covarrubias-Pazaran
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
DT = data.frame(Block=c( rep(1,64), rep(2,64) ), Par1=c( sort(rep(1:8,8)), sort(rep(1:8,8)) ), Par2=rep(1:8, 16), Ftime=rnorm(128) ) DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2") head(DT2)DT = data.frame(Block=c( rep(1,64), rep(2,64) ), Par1=c( sort(rep(1:8,8)), sort(rep(1:8,8)) ), Par2=rep(1:8, 16), Ftime=rnorm(128) ) DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2") head(DT2)
Array generalization of blockdiag()
adiag1(... , pad=as.integer(0), do.dimnames=TRUE)adiag1(... , pad=as.integer(0), do.dimnames=TRUE)
... |
Arrays to be binded together |
pad |
Value to pad array with; note default keeps integer status of arrays |
do.dimnames |
Boolean, with default |
Binds any number of arrays together, corner-to-corner. Because the function is associative provided pad is of length 1, this page discusses the two array case.
If x=adiag1(a,b) and dim(a)=c(a_1,...,a_d),
dim(b)=c(b_1,...,b_d); then all(dim(x)==dim(a)+dim(b)) and
x[1:a_1,...,1:a_d]=a and
x[(a_1+1):(a_1+b_1),...,(a_d+1):(a_d+b_d)]=b.
Dimnames are preserved, if both arrays have non-null dimnames, and
do.dimnames is TRUE.
Argument pad is usually a length-one vector, but any vector is
acceptable; standard recycling is used. Be aware that the output array
(of dimension dim(a)+dim(b)) is filled with (copies of)
pad before a and b are copied. This can be
confusing.
Returns an array of dimensions dim(a)+dim(b) as described above.
In adiag1(a,b), if a is a length-one vector, it is coerced
to an array of dimensions rep(1,length(dim(b))); likewise
b. If both a and b are length-one vectors, return
diag(c(a,b)).
If a and b are arrays, function adiag1() requires
length(dim(a))==length(dim(b)) (the function does not guess which
dimensions have been dropped; see examples section). In particular,
note that vectors are not coerced except if of length one.
adiag1() is used when padding magic hypercubes in the context
of evaluating subarray sums.
Peter Wolf with some additions by Robin Hankin
a <- array( 1,c(2,2)) b <- array(-1,c(2,2)) adiag1(a,b) ## dropped dimensions can count: b2 <- b1 <- b dim(a) <- c(2,1,2) dim(b1) <- c(2,2,1) dim(b2) <- c(1,2,2) dim(adiag1(a,b1)) dim(adiag1(a,b2)) ## dimnames are preserved if not null: a <- matrix(1,2,2,dimnames=list(col=c("red","blue"),size=c("big","small"))) b <- 8 dim(b) <- c(1,1) dimnames(b) <- list(col=c("green"),size=c("tiny")) adiag1(a,b) #dimnames preserved adiag1(a,8) #dimnames lost because second argument has none. ## non scalar values for pad can be confusing: q <- matrix(0,3,3) adiag1(q,q,pad=1:4) ## following example should make the pattern clear: adiag1(q,q,pad=1:36) # Now, a use for arrays with dimensions of zero extent: z <- array(dim=c(0,3)) colnames(z) <- c("foo","bar","baz") adiag1(a,z) # Observe how this has # added no (ie zero) rows to "a" but # three extra columns filled with the pad value adiag1(a,t(z)) adiag1(z,t(z)) # just the pad valuea <- array( 1,c(2,2)) b <- array(-1,c(2,2)) adiag1(a,b) ## dropped dimensions can count: b2 <- b1 <- b dim(a) <- c(2,1,2) dim(b1) <- c(2,2,1) dim(b2) <- c(1,2,2) dim(adiag1(a,b1)) dim(adiag1(a,b2)) ## dimnames are preserved if not null: a <- matrix(1,2,2,dimnames=list(col=c("red","blue"),size=c("big","small"))) b <- 8 dim(b) <- c(1,1) dimnames(b) <- list(col=c("green"),size=c("tiny")) adiag1(a,b) #dimnames preserved adiag1(a,8) #dimnames lost because second argument has none. ## non scalar values for pad can be confusing: q <- matrix(0,3,3) adiag1(q,q,pad=1:4) ## following example should make the pattern clear: adiag1(q,q,pad=1:36) # Now, a use for arrays with dimensions of zero extent: z <- array(dim=c(0,3)) colnames(z) <- c("foo","bar","baz") adiag1(a,z) # Observe how this has # added no (ie zero) rows to "a" but # three extra columns filled with the pad value adiag1(a,t(z)) adiag1(z,t(z)) # just the pad value
This function was designed to help users to transform their data in letter format to numeric format. Details in the format are not complex, just a dataframe with markers in columns and individuals in rows. Only markers, NO extra columns of plant names etc (names of plants can be stored as rownames). The function expects a matrix of only polymorphic markers, please make sure you clean your data before using this function. The apply function can help you identify and separate monomorphic from polymorphic markers.
atcg1234(data, ploidy=2, format="ATCG", maf=0, multi=TRUE, silent=FALSE, by.allele=FALSE, imp=TRUE, ref.alleles=NULL)atcg1234(data, ploidy=2, format="ATCG", maf=0, multi=TRUE, silent=FALSE, by.allele=FALSE, imp=TRUE, ref.alleles=NULL)
data |
a dataframe with markers in columns and individuals in rows. Preferable the rownames are the ID of the plants so you don't lose track of what is what. |
ploidy |
a numeric value indicating the ploidy level of the specie. The default is 2 which means diploid. |
format |
one of the two possible values allowed by the program "ATCG", which means your calls are in base-pair-letter code, i.e. "AT" in a diploid call, "AATT" tetraploid etc (just example). Therefore possible codes can be "A", "T", "C", "G", "-" (deletion), "+" (insertion). Alternatively "AB" format can be used as well. Commonly this depends from the genotyping technologies used, such as GBS or microarrays. In addition, we have enabled also the use of single-letter code used by Cornell, i.e. A=AA, C=CC, T=TT, G=GG, R=AG, Y=CT, S=CG, W=AT, K=GT, M=AC. In the case of GBS code please make sure that you set the N codes to regular NAs handled by R. The "ATCG" format also works for the bi-allelic marker codes from join map such as "lm", "ll","nn", "np","hh","hk","kk" |
maf |
minor allele frequency used to filter the SNP markers, the default is zero which means all markers are returned in numeric format. |
multi |
a TRUE/FALSE statement indicating if the function should get rid of the markers with more than 2 alleles. If FALSE, which indicates that if markers with multiple alleles are found, the alternate and reference alleles will be the first 2 alleles found. This could be risky since some alleles will be masked, i.e. AA AG AT would take only A and G as reference and alternate alleles, converting to numeric format 2 1 1, giving the same effect to AG and AT which could be a wrong assumption. The default is TRUE, removes markers with more than two alleles. |
silent |
a TRUE/FALSE value indicating if a progress bar should be drawn for each step of the conversion. The default is silent=FALSE, which means that we want progress bar to be drawn. |
by.allele |
a TRUE/FALSE value indicating if the program should transform the data in a zero/one matrix of presence/absense per allele. For example, a marker with 3 alleles A,T,C in a diploid organism will yield 6 possible configurations; AA, AT, AC, TT, TC, CC. Therefore, the program would create 3 columns for this marker indicating the presence/absence of each allele for each genotype. |
imp |
a TRUE/FALSE value indicating if the function should impute the missing data using the median for each marker. If FALSE, then the program will not impute. |
ref.alleles |
a matrix with reference alleles to be used for the conversion. The matrix should have as many columns as markers with reference alleles and with 2 rows, being the first row the alternate allele (Alt) and the second row the reference allele (Ref). Rownames should be "Alt" and "Ref" respectively. If not provided the program will decide the reference allele. |
a new dataframe of markers in numeric format with markers in columns and individuals in rows.
Giovanny Covarrubias-Pazaran
genotypes <- rbind( c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"), c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"), c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA) ) rownames(genotypes) <- letters[1:5] colnames(genotypes) <- paste0("m",1:5) ## convert markers to numeric format polyploid potatoes numo <- atcg1234(data=genotypes, ploidy=4) numo ## convert markers back to letters Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles) Xbgenotypes <- rbind( c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"), c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"), c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA) ) rownames(genotypes) <- letters[1:5] colnames(genotypes) <- paste0("m",1:5) ## convert markers to numeric format polyploid potatoes numo <- atcg1234(data=genotypes, ploidy=4) numo ## convert markers back to letters Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles) Xb
This function was designed to help users back transform the numeric marker matrices from the function atcg1234 into letters.
atcg1234BackTransform(marks, refs)atcg1234BackTransform(marks, refs)
marks |
a centered marker matrix coming from atcg1234. |
refs |
a 2 x m matrix for m markers (columns) and 2 rows where the reference and alternate alleles for each marker are indicated. |
a new marker matrix leter coded according to the reference allele matrix.
Giovanny Covarrubias-Pazaran
genotypes <- rbind( c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"), c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"), c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA) ) rownames(genotypes) <- letters[1:5] colnames(genotypes) <- paste0("m",1:5) ## convert markers to numeric format polyploid potatoes numo <- atcg1234(data=genotypes, ploidy=4) numo ## convert markers back to letters Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles) Xbgenotypes <- rbind( c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"), c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"), c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA), c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA) ) rownames(genotypes) <- letters[1:5] colnames(genotypes) <- paste0("m",1:5) ## convert markers to numeric format polyploid potatoes numo <- atcg1234(data=genotypes, ploidy=4) numo ## convert markers back to letters Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles) Xb
bathy.colors(n) generates a sequence of
colors along a linear scale from light grey to pure blue.
bathy.colors(n, alpha = 1)bathy.colors(n, alpha = 1)
n |
The number of colors to return. |
alpha |
Alpha values to be passed to |
A vector of blue scale colors.
{ # Plot a colorbar using bathy.colors image(matrix(seq(100), 100), col=bathy.colors(100)) }{ # Plot a colorbar using bathy.colors image(matrix(seq(100), 100), col=bathy.colors(100)) }
Construct a B-spline basis of degree deg
with ndx-1 equally-spaced internal knots (ndx segments) on range [x1,xr].
Code copied from Eilers & Marx 2010, WIR: Comp Stat 2, 637-653.
bbasis(x, xl, xr, ndx, deg)bbasis(x, xl, xr, ndx, deg)
x |
A vector. Data values for spline. |
xl |
A numeric value. Lower bound for data (lower external knot). |
xr |
A numeric value. Upper bound for data (upper external knot). |
ndx |
A numeric value. Number of divisions for x range (equal to number of segments = number of internal knots + 1) |
deg |
A numeric value. Degree of the polynomial spline. |
Not yet amended to coerce values that should be zero to zero!
A matrix with columns holding the P-spline for each value of x.
Matrix has ndx+deg columns and length(x) rows.
Uses the 2 marker matrices from both sets of inbred or partially inbred parents and creates all possible combinations unless the user specifies which hybrid genotypes to build (custom.hyb argument). It returns the additive and dominance marker matrices (-1,0,1; homo,het,homo in additive and 0,1,0; homo,het,homo for dominance).
build.HMM(M1,M2, custom.hyb=NULL, return.combos.only=FALSE, separator=":",n.batch=1000, verbose=TRUE)build.HMM(M1,M2, custom.hyb=NULL, return.combos.only=FALSE, separator=":",n.batch=1000, verbose=TRUE)
M1 |
Matrix ( |
M2 |
Matrix ( |
custom.hyb |
A data frame with columns 'Var1' 'Var2', 'hybrid' which specifies which hybrids should be built using the M1 and M2 matrices provided. |
return.combos.only |
A TRUE/FALSE statement inicating if the function should skip building the geotype matrix for hybrids and only return the data frame with all possible combinations to be build. In case the user wants to subset the hybrids before building the marker matrix. |
separator |
Any desired character to be used when pasting the male and female columns to assign the name to the hybrids. |
n.batch |
An optional integer value to indicate how many hybrids should be constructed at once. When the number of hybrids and number of markers is big it is better to partition the problem into multiple matrix products. By default we assume that no more than 1000 hybrids should be computed at once to use the memory more efficiently. |
verbose |
A logical value indicating if progress and warning messages should be printed in the console. |
It returns the marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo) and dominance (0,1,0; homo,het,homo). This function is deviced for building marker matrices for hybrids coming from inbreds. If the parents are close to inbred >F5 you can try deleting the heterozygote calls (0's) and imputing those cells with the most common genotype (1 or -1). The expectation is that for mostly inbred individuals this may not change drastically the result but will make the results more interpretable. For non-inbred parents (F1 to F3) the cross of an F1 x F1 has many possibilities and is not the intention of this function to build genotypes for heterzygote x heterozygote crosses.
It returns the marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo) and dominance (0,1,0; homo,het,homo).
marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo)
marker matrix for hybrids coded as dominance (0,1,0; homo,het,homo)
the data frame used to build the hybrid genotypes
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Nishio M and Satoh M. 2014. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. Plos One 9(1), doi:10.1371/journal.pone.0085792
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. 2012. Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS ONE 7(9): e45293. doi:10.1371/journal.pone.0045293
nInds=3 nMarks=5 # set of lines 1 M1 <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M1[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M1) <- letters[1:nInds] # set of lines 2 M2 <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M2[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M2) <- letters[(nInds+1):(nInds*2)] # build additive and dominance matrices for single crosses build.HMM(M1,M2)nInds=3 nMarks=5 # set of lines 1 M1 <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M1[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M1) <- letters[1:nInds] # set of lines 2 M2 <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M2[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M2) <- letters[(nInds+1):(nInds*2)] # build additive and dominance matrices for single crosses build.HMM(M1,M2)
This dataset contains phenotpic data for one trait evaluated in the experimental design known as augmented design. This model allows to obtain BLUPs for genotypes that are unreplicated by dividing the field in blocks and replicating 'check genotypes' in the blocks and unreplicated genotypes randomly within the blocks. The presence of check genotypes (usually cultivars) allows the adjustment of unreplicated genotypes.
data("DT_augment")data("DT_augment")
The format is: chr "DT_augment"
This data was generated by a potato study.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
## AUGMENTED DESIGN EXAMPLE data(DT_augment) DT <- DT_augment head(DT) ############## sommer ################## if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(TSW ~ Check.Gen, random = ~ Block + Genotype:Check, data=DT) summary(mix1)$varcomp } ############## lme4breeding ################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## fit the mixed model and check summary mix <- lmeb(TSW ~ Check.Gen + (1|Block) + (1|Genotype:Check), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############## end #################### AUGMENTED DESIGN EXAMPLE data(DT_augment) DT <- DT_augment head(DT) ############## sommer ################## if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(TSW ~ Check.Gen, random = ~ Block + Genotype:Check, data=DT) summary(mix1)$varcomp } ############## lme4breeding ################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## fit the mixed model and check summary mix <- lmeb(TSW ~ Check.Gen + (1|Block) + (1|Genotype:Check), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############## end ##################
A data frame and genetic markers for a population of 1000 individuals evaluated in 25 environments and 2 replicates per environment giving a total of 50 K records to show how to fit big GxE models with genomic information.
data("DT_big")data("DT_big")
The format is: chr "DT_big"
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_big) DT = DT_big M = apply(M_big,2,as.numeric) rownames(M) <- rownames(M_big) # compute the relationship matrix MMT <- tcrossprod(M) m <- sum(diag(MMT))/nrow(MMT) MMT <- MMT/m MMT <- MMT + diag(1e-05, ncol(MMT), ncol(MMT)) # let's fit the GxE model (diagonal first) DT[,"envf_repf"] = paste(DT[,"envf"],DT[,"repf"],sep="_") DT=DT[with(DT, order(envf_repf)), ] if(requireNamespace("lme4breeding")){ library(lme4breeding) # fit the models mixm2 <- lmeb(value ~ (1|envf_repf) + (0+envf_repf||id), data = DT, relmat = list(id = MMT), control = lmerControl( restart_edge = FALSE ), nIters=1000, verbose=1L, rotation=TRUE ) vc <- VarCorr(mixm2); print(vc,comp=c("Variance")) ran2 = ranef(mixm2, condVar=FALSE) H0 <- ran2$id }data(DT_big) DT = DT_big M = apply(M_big,2,as.numeric) rownames(M) <- rownames(M_big) # compute the relationship matrix MMT <- tcrossprod(M) m <- sum(diag(MMT))/nrow(MMT) MMT <- MMT/m MMT <- MMT + diag(1e-05, ncol(MMT), ncol(MMT)) # let's fit the GxE model (diagonal first) DT[,"envf_repf"] = paste(DT[,"envf"],DT[,"repf"],sep="_") DT=DT[with(DT, order(envf_repf)), ] if(requireNamespace("lme4breeding")){ library(lme4breeding) # fit the models mixm2 <- lmeb(value ~ (1|envf_repf) + (0+envf_repf||id), data = DT, relmat = list(id = MMT), control = lmerControl( restart_edge = FALSE ), nIters=1000, verbose=1L, rotation=TRUE ) vc <- VarCorr(mixm2); print(vc,comp=c("Variance")) ran2 = ranef(mixm2, condVar=FALSE) H0 <- ran2$id }
a data frame with 828 rows and 7 columns, with variables tarsus length (tarsus) and colour (back) measured on 828 individuals (animal). The mother of each is also recorded (dam) together with the foster nest (fosternest) in which the chicks were reared. The date on which the first egg in each nest hatched (hatchdate) is recorded together with the sex (sex) of the individuals.
data("DT_btdata")data("DT_btdata")
The format is: chr "DT_btdata"
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_btdata) DT <- DT_btdata head(DT) ############# sommer ################ if(requireNamespace("sommer")){ library(sommer) mix4 <- mmes(tarsus ~ sex, random = ~ dam + fosternest, rcov=~units, data = DT) summary(mix4)$varcomp # MULTI-TRAIT EXAMPLE traits <- c("tarsus","back","hatchdate") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("animal","sex","dam","fosternest", traits)], idvar = c("animal","sex","dam","fosternest"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,animal)), ] head(DTL) mix3 <- mmes(value ~ trait:sex - 1, henderson=TRUE, random = ~ vsm(usm(trait),ism(dam)) + vsm(usm(trait), ism(fosternest)), rcov= ~ vsm(dsm(trait),ism(units)), data = DTL) summary(mix3)$varcomp #### calculate the genetic correlation cov2cor(mix3$theta[[1]]) cov2cor(mix3$theta[[2]]) } ############# lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) mix4 <- lmeb(tarsus ~ sex + (1|dam) + (1|fosternest), data = DT) vc <- VarCorr(mix4); print(vc,comp=c("Variance")) sigma(mix4)^2 # error variance BLUP <- ranef(mix4, condVar=TRUE) PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### multi-trait model traits <- c("tarsus" ,"back", "hatchdate") for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])} DTL <- reshape(DT[,c("animal", traits)], idvar = "animal", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- merge(DTL, unique(DT[,c("animal","dam","fosternest","sex")]), by="animal", all.x = TRUE) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|dam), # relmat = list(geno=A), # rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) cov2cor(vc$dam) sigma(mix)^2 # error variance } ############# end ################data(DT_btdata) DT <- DT_btdata head(DT) ############# sommer ################ if(requireNamespace("sommer")){ library(sommer) mix4 <- mmes(tarsus ~ sex, random = ~ dam + fosternest, rcov=~units, data = DT) summary(mix4)$varcomp # MULTI-TRAIT EXAMPLE traits <- c("tarsus","back","hatchdate") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("animal","sex","dam","fosternest", traits)], idvar = c("animal","sex","dam","fosternest"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,animal)), ] head(DTL) mix3 <- mmes(value ~ trait:sex - 1, henderson=TRUE, random = ~ vsm(usm(trait),ism(dam)) + vsm(usm(trait), ism(fosternest)), rcov= ~ vsm(dsm(trait),ism(units)), data = DTL) summary(mix3)$varcomp #### calculate the genetic correlation cov2cor(mix3$theta[[1]]) cov2cor(mix3$theta[[2]]) } ############# lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) mix4 <- lmeb(tarsus ~ sex + (1|dam) + (1|fosternest), data = DT) vc <- VarCorr(mix4); print(vc,comp=c("Variance")) sigma(mix4)^2 # error variance BLUP <- ranef(mix4, condVar=TRUE) PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### multi-trait model traits <- c("tarsus" ,"back", "hatchdate") for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])} DTL <- reshape(DT[,c("animal", traits)], idvar = "animal", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- merge(DTL, unique(DT[,c("animal","dam","fosternest","sex")]), by="animal", all.x = TRUE) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|dam), # relmat = list(geno=A), # rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) cov2cor(vc$dam) sigma(mix)^2 # error variance } ############# end ################
This dataset contains phenotpic data for plant height and grain yield for 100 out of 400 possible hybrids originated from 40 inbred lines belonging to 2 heterotic groups, 20 lines in each, 1600 rows exist for the 400 possible hybrids evaluated in 4 locations but only 100 crosses have phenotypic information. The purpose of this data is to show how to predict the other 300 crosses.
The data contains 3 elements. The first is the phenotypic data and the parent information for each cross evaluated in the 4 locations. 1200 rows should have missing data but the 100 crosses performed were chosen to be able to estimate the GCA and SCA effects of everything.
The second element of the data set is the phenotypic data and other relevant information for the 40.
The third element is the genomic relationship matrix for the 40 inbred lines originated from 511 SNP markers and calculated using the A.mat function.
data("DT_cornhybrids")data("DT_cornhybrids")
The format is: chr "DT_cornhybrids"
This data was generated by a corn study.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_cornhybrids) DT <- DT_cornhybrids GT <- GT_cornhybrids A <- GT K1 <- A[levels(DT$GCA1), levels(DT$GCA1)]; dim(K1) K2 <- A[levels(DT$GCA2), levels(DT$GCA2)]; dim(K2) S <- kronecker(K1, K2) ; dim(S) rownames(S) <- colnames(S) <- levels(DT$SCA) ################ sommer ################### if(requireNamespace("sommer")){ library(sommer) ans <- mmes(Yield ~ Location, random = ~ vsm(ism(GCA1),Gu=K1) + vsm(ism(GCA2),Gu=K2), # + vsm(ism(SCA),Gu=S), rcov=~units, data=DT) summary(ans)$varcomp ## mmec uses the inverse of the relationship matrix K1i <- solve(K1 + diag(1e-4,ncol(K1),ncol(K1))) K1i <- as(as(as( K1i, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(K1i, 'inverse')=TRUE K2i <- solve(K2 + diag(1e-4,ncol(K2),ncol(K2))) K2i <- as(as(as( K2i, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(K2i, 'inverse')=TRUE Si <- solve(S + diag(1e-4,ncol(S),ncol(S))) Si <- as(as(as( Si, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Si, 'inverse')=TRUE ans2 <- mmes(Yield ~ Location, random = ~ vsm(ism(GCA1),Gu=K1i) + vsm(ism(GCA2),Gu=K2i), # + vsm(ism(SCA),Gu=Si), henderson=TRUE, rcov=~units, data=DT) summary(ans2)$varcomp } ################ lme4breeding ################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ans <- lmeb(Yield ~ Location + (1| GCA1) + (1|GCA2), relmat = list(GCA1=K1, GCA2=K2), data=DT) vc <- VarCorr(ans); print(vc,comp=c("Variance")) sigma(ans)^2 # error variance BLUP <- ranef(ans, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_cornhybrids) DT <- DT_cornhybrids GT <- GT_cornhybrids A <- GT K1 <- A[levels(DT$GCA1), levels(DT$GCA1)]; dim(K1) K2 <- A[levels(DT$GCA2), levels(DT$GCA2)]; dim(K2) S <- kronecker(K1, K2) ; dim(S) rownames(S) <- colnames(S) <- levels(DT$SCA) ################ sommer ################### if(requireNamespace("sommer")){ library(sommer) ans <- mmes(Yield ~ Location, random = ~ vsm(ism(GCA1),Gu=K1) + vsm(ism(GCA2),Gu=K2), # + vsm(ism(SCA),Gu=S), rcov=~units, data=DT) summary(ans)$varcomp ## mmec uses the inverse of the relationship matrix K1i <- solve(K1 + diag(1e-4,ncol(K1),ncol(K1))) K1i <- as(as(as( K1i, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(K1i, 'inverse')=TRUE K2i <- solve(K2 + diag(1e-4,ncol(K2),ncol(K2))) K2i <- as(as(as( K2i, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(K2i, 'inverse')=TRUE Si <- solve(S + diag(1e-4,ncol(S),ncol(S))) Si <- as(as(as( Si, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Si, 'inverse')=TRUE ans2 <- mmes(Yield ~ Location, random = ~ vsm(ism(GCA1),Gu=K1i) + vsm(ism(GCA2),Gu=K2i), # + vsm(ism(SCA),Gu=Si), henderson=TRUE, rcov=~units, data=DT) summary(ans2)$varcomp } ################ lme4breeding ################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ans <- lmeb(Yield ~ Location + (1| GCA1) + (1|GCA2), relmat = list(GCA1=K1, GCA2=K2), data=DT) vc <- VarCorr(ans); print(vc,comp=c("Variance")) sigma(ans)^2 # error variance BLUP <- ranef(ans, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
A CP population or F1 cross is the designation for a cross between 2 highly heterozygote individuals; i.e. humans, fruit crops, bredding populations in recurrent selection.
This dataset contains phenotpic data for 363 siblings for an F1 cross. These are averages over 2 environments evaluated for 4 traits; color, yield, fruit average weight, and firmness. The columns in the CPgeno file are the markers whereas the rows are the individuals. The CPpheno data frame contains the measurements for the 363 siblings, and as mentioned before are averages over 2 environments.
data("DT_cpdata")data("DT_cpdata")
The format is: chr "DT_cpdata"
This data was simulated for fruit breeding applications.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
data(DT_cpdata) DT <- DT_cpdata GT <- GT_cpdata MP <- MP_cpdata ## create the variance-covariance matrix A <- A.matr(GT) # additive relationship matrix A <- A + diag(1e-4, ncol(A), ncol(A)) ## look at the data and fit the model head(DT) ######## sommer ######### if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(Yield~1, henderson=FALSE, random=~vsm(ism(id),Gu=A) + Rowf + Colf, rcov=~units, data=DT) summary(mix1)$varcomp ## mmec uses the inverse of the relationship matrix Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE mix2 <- mmes(Yield~1, henderson=TRUE, random=~vsm(ism(id),Gu=Ai) + Rowf + Colf, rcov=~units, data=DT) summary(mix2)$varcomp vg <- summary(mix2)$varcomp[1,1] # genetic variance G <- A*vg # genetic variance-covariance Ci <- mix2$Ci # coefficient matrix ind <- as.vector(mix2$partitions$`vsm(ism(id), Gu = Ai)`) ind <- seq(ind[1],ind[2]) Ctt <- Ci[ind,ind] # portion of Ci for genotypes R2 <- (G - Ctt)/G # reliability matrix mean(diag(R2)) # average reliability of the trial ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### traits <- c("color","Yield") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("id", traits)], idvar = c("id"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) # if using mmes=TRUE you need to provide the inverse Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE #### be patient take some time ansm <- mmes( value ~ trait, # henderson=TRUE, random=~ vsm(usm(trait), ism(id), Gu=A), # Ai if henderson rcov=~ vsm(dsm(trait), ism(units)), data=DTL) cov2cor(ansm$theta[[1]]) } ######## lme4breeding ######### if(requireNamespace("lme4breeding")){ library(lme4breeding) mix1 <- lmeb(Yield~ (1|id) + (1|Rowf) + (1|Colf), relmat=list(id=A), data=DT) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance # run one last iteration with imputed data # to make sure you get predictions for every level DT2 <- DT DT2$Yield <- imputev(DT2$Yield) mix2 <- update(mix1, suppressOpt = TRUE, start=getME(mix1, "theta"), data=DT2) predsMix2 <- ranef(mix2) condVAR <- lapply(predsMix2, function(x){attr(x, which="postVar")}) # take sqrt() for SEs # if you don't want the imputed vector to have an effect in # the predictions you can use the getMME function to use # the extended model and get predictions without including the # imputed data (I know is a bit messy) preds <- getMME(object=mix2, # extended model vc=VarCorr(mix1), # variance components recordsToUse = which(!is.na(DT$Yield)) # records to use for MME ) # now you could compare between both types of predictions, the last ones are in # theory the correct ones. plot(preds$bu[2:364,], predsMix2$id[,1]) } ######## evola ######### if(requireNamespace("evola")){ library(evola) DT$occ <- 1 DT$Yield <- imputev(DT$Yield) A <- A[DT$id,DT$id] # get best 20 individuals weighting variance by ~0.5=(30*pi)/180 res<-evolafit(formula=cbind(Yield, occ)~id, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf,20), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf,-Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=A, lambda= (30*pi)/180 , nQtlStart = 20, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 40) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best = bestSol(res$pop)[,"fitness"];best qa = (Q %*% DT$Yield)[best,]; qa qDq = Q[best,] %*% A %*% Q[best,]; qDq sum(Q[best,]) # total # of inds selected evolmonitor(res) plot(DT$Yield, col=as.factor(Q[best,]), pch=(Q[best,]*19)+1) pareto(res) } ############ end ############data(DT_cpdata) DT <- DT_cpdata GT <- GT_cpdata MP <- MP_cpdata ## create the variance-covariance matrix A <- A.matr(GT) # additive relationship matrix A <- A + diag(1e-4, ncol(A), ncol(A)) ## look at the data and fit the model head(DT) ######## sommer ######### if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(Yield~1, henderson=FALSE, random=~vsm(ism(id),Gu=A) + Rowf + Colf, rcov=~units, data=DT) summary(mix1)$varcomp ## mmec uses the inverse of the relationship matrix Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE mix2 <- mmes(Yield~1, henderson=TRUE, random=~vsm(ism(id),Gu=Ai) + Rowf + Colf, rcov=~units, data=DT) summary(mix2)$varcomp vg <- summary(mix2)$varcomp[1,1] # genetic variance G <- A*vg # genetic variance-covariance Ci <- mix2$Ci # coefficient matrix ind <- as.vector(mix2$partitions$`vsm(ism(id), Gu = Ai)`) ind <- seq(ind[1],ind[2]) Ctt <- Ci[ind,ind] # portion of Ci for genotypes R2 <- (G - Ctt)/G # reliability matrix mean(diag(R2)) # average reliability of the trial ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### traits <- c("color","Yield") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("id", traits)], idvar = c("id"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) # if using mmes=TRUE you need to provide the inverse Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE #### be patient take some time ansm <- mmes( value ~ trait, # henderson=TRUE, random=~ vsm(usm(trait), ism(id), Gu=A), # Ai if henderson rcov=~ vsm(dsm(trait), ism(units)), data=DTL) cov2cor(ansm$theta[[1]]) } ######## lme4breeding ######### if(requireNamespace("lme4breeding")){ library(lme4breeding) mix1 <- lmeb(Yield~ (1|id) + (1|Rowf) + (1|Colf), relmat=list(id=A), data=DT) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance # run one last iteration with imputed data # to make sure you get predictions for every level DT2 <- DT DT2$Yield <- imputev(DT2$Yield) mix2 <- update(mix1, suppressOpt = TRUE, start=getME(mix1, "theta"), data=DT2) predsMix2 <- ranef(mix2) condVAR <- lapply(predsMix2, function(x){attr(x, which="postVar")}) # take sqrt() for SEs # if you don't want the imputed vector to have an effect in # the predictions you can use the getMME function to use # the extended model and get predictions without including the # imputed data (I know is a bit messy) preds <- getMME(object=mix2, # extended model vc=VarCorr(mix1), # variance components recordsToUse = which(!is.na(DT$Yield)) # records to use for MME ) # now you could compare between both types of predictions, the last ones are in # theory the correct ones. plot(preds$bu[2:364,], predsMix2$id[,1]) } ######## evola ######### if(requireNamespace("evola")){ library(evola) DT$occ <- 1 DT$Yield <- imputev(DT$Yield) A <- A[DT$id,DT$id] # get best 20 individuals weighting variance by ~0.5=(30*pi)/180 res<-evolafit(formula=cbind(Yield, occ)~id, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf,20), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf,-Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=A, lambda= (30*pi)/180 , nQtlStart = 20, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 40) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best = bestSol(res$pop)[,"fitness"];best qa = (Q %*% DT$Yield)[best,]; qa qDq = Q[best,] %*% A %*% Q[best,]; qDq sum(Q[best,]) # total # of inds selected evolmonitor(res) plot(DT$Yield, col=as.factor(Q[best,]), pch=(Q[best,]*19)+1) pareto(res) } ############ end ############
This is a simulated dataset that aims to show how to pick the most representative locations and how to optimize sparse testing allocation.
data("DT_envs")data("DT_envs")
The format is: chr "DT_envs"
This data was simulated for experimental design applications.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
data(DT_envs) # scale passport data for environments S=scale(W) # build a relationship matrix between environments G=tcrossprod(S)/ncol(S) G[1:4,1:4] # build the dataset for the genetic algorithm DT <- data.frame(cov=rnorm(nrow(G)), occ=1, id=rownames(G)) nLocs=10 # desired number of representative locations if(requireNamespace("evola")){ library(evola) res<-evolafit(formula=cbind(cov, occ)~id, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf, Inf), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf, nLocs), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 20, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=G, lambda= 0 , nQtlStart = nLocs, # selection parameters fitnessf = inbFun, selectTop = FALSE, # fixNumQtlPerInd = TRUE, # propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 50, verbose = TRUE) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best <- bestSol(res$pop, selectTop = FALSE)[,"fitness"] solution <- Q[best,] sum(solution) # total # of inds selected names(solution[which(solution>0)]) evolmonitor(res) pareto(res) svdW <- svd(G, nu = 2, nv = 2) PCW <- G %*% svdW$v rownames(PCW) <- rownames(G) plot(PCW[,1], PCW[,2], col = (solution*2)+2, pch=(solution*14)+4, xlab = "pc1", ylab = "pc2") labs <- rownames(PCW) labs[which(solution==0)]="" text(x=PCW[,1], y=PCW[,2], labels=labs, cex=0.5, pos=3) }data(DT_envs) # scale passport data for environments S=scale(W) # build a relationship matrix between environments G=tcrossprod(S)/ncol(S) G[1:4,1:4] # build the dataset for the genetic algorithm DT <- data.frame(cov=rnorm(nrow(G)), occ=1, id=rownames(G)) nLocs=10 # desired number of representative locations if(requireNamespace("evola")){ library(evola) res<-evolafit(formula=cbind(cov, occ)~id, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf, Inf), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf, nLocs), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 20, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=G, lambda= 0 , nQtlStart = nLocs, # selection parameters fitnessf = inbFun, selectTop = FALSE, # fixNumQtlPerInd = TRUE, # propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 50, verbose = TRUE) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best <- bestSol(res$pop, selectTop = FALSE)[,"fitness"] solution <- Q[best,] sum(solution) # total # of inds selected names(solution[which(solution>0)]) evolmonitor(res) pareto(res) svdW <- svd(G, nu = 2, nv = 2) PCW <- G %*% svdW$v rownames(PCW) <- rownames(G) plot(PCW[,1], PCW[,2], col = (solution*2)+2, pch=(solution*14)+4, xlab = "pc1", ylab = "pc2") labs <- rownames(PCW) labs[which(solution==0)]="" text(x=PCW[,1], y=PCW[,2], labels=labs, cex=0.5, pos=3) }
This dataset contains phenotpic data for 41 potato lines evaluated in 3 environments in an RCBD design. The phenotypic trait is tuber quality and we show how to obtain an estimate of DT_example for the trait.
data("DT_example")data("DT_example")
The format is: chr "DT_example"
This data was generated by a potato study.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_example) DT <- DT_example A <- A_example head(DT) ################## sommer ##################### if(requireNamespace("sommer")){ library(sommer) ## Compound simmetry (CS) model ans1 <- mmes(Yield~Env, random= ~ Name + Env:Name, rcov= ~ units, data=DT) summary(ans1)$varcomp ####===========================================#### #### Univariate heterogeneous variance models #### ####===========================================#### ## Compound simmetry (CS) + Diagonal (DIAG) model DT=DT[with(DT, order(Env)), ] ans2 <- mmes(Yield~Env, random= ~Name + vsm(dsm(Env),ism(Name)), rcov= ~ vsm(dsm(Env),ism(units)), data=DT) summary(ans2) ####===========================================#### #### Multi-trait variance models #### ####===========================================#### # stack traits traits <- c("Yield","Weight") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("Name","Env","Block", traits)], idvar = c("Name","Env","Block"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,Env)), ] head(DTL) ## model ans1 <- mmes(value~ trait, random= ~ vsm(usm(trait), ism(Name)), rcov= ~ vsm(dsm(trait), ism(units)), data=DTL) summary(ans1)$varcomp cov2cor(ans1$theta$`vsm(usm(trait), ism(Name`) } ################## lme4breeding ##################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ## Compound simmetry (CS) model ans1 <- lmeb(Yield~Env + (1|Name) + (1|Env:Name), data=DT) vc <- VarCorr(ans1); print(vc,comp=c("Variance")) BLUP <- ranef(ans1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ## Main (M) + Diagonal (DIAG) model ## with relationship matrix ansCSDG <- lmeb(Yield ~ Env + (Env||Name), relmat = list(Name = A ), data=DT) vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance")) ## Main (M) + Diagonal (DIAG) model ## with diagonal residuals ## with relationship matrix ansCSDG <- lmeb(Yield ~ Env + (Env||Name) + (0+Env||unitsR), relmat = list(Name = A ), data=DT) vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance")) sigma(ansCSDG)^2 # error variance }data(DT_example) DT <- DT_example A <- A_example head(DT) ################## sommer ##################### if(requireNamespace("sommer")){ library(sommer) ## Compound simmetry (CS) model ans1 <- mmes(Yield~Env, random= ~ Name + Env:Name, rcov= ~ units, data=DT) summary(ans1)$varcomp ####===========================================#### #### Univariate heterogeneous variance models #### ####===========================================#### ## Compound simmetry (CS) + Diagonal (DIAG) model DT=DT[with(DT, order(Env)), ] ans2 <- mmes(Yield~Env, random= ~Name + vsm(dsm(Env),ism(Name)), rcov= ~ vsm(dsm(Env),ism(units)), data=DT) summary(ans2) ####===========================================#### #### Multi-trait variance models #### ####===========================================#### # stack traits traits <- c("Yield","Weight") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("Name","Env","Block", traits)], idvar = c("Name","Env","Block"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,Env)), ] head(DTL) ## model ans1 <- mmes(value~ trait, random= ~ vsm(usm(trait), ism(Name)), rcov= ~ vsm(dsm(trait), ism(units)), data=DTL) summary(ans1)$varcomp cov2cor(ans1$theta$`vsm(usm(trait), ism(Name`) } ################## lme4breeding ##################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ## Compound simmetry (CS) model ans1 <- lmeb(Yield~Env + (1|Name) + (1|Env:Name), data=DT) vc <- VarCorr(ans1); print(vc,comp=c("Variance")) BLUP <- ranef(ans1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ## Main (M) + Diagonal (DIAG) model ## with relationship matrix ansCSDG <- lmeb(Yield ~ Env + (Env||Name), relmat = list(Name = A ), data=DT) vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance")) ## Main (M) + Diagonal (DIAG) model ## with diagonal residuals ## with relationship matrix ansCSDG <- lmeb(Yield ~ Env + (Env||Name) + (0+Env||unitsR), relmat = list(Name = A ), data=DT) vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance")) sigma(ansCSDG)^2 # error variance }
The following data is a list containing data frames for different type of experimental designs relevant in plant breeding:
1) Augmented designs (2 examples)
2) Incomplete block designs (1 example)
3) Split plot design (2 examples)
4) Latin square designs (1 example)
5) North Carolina designs I,II and III
How to fit each is shown at the Examples section. This may help you get introduced to experimental designs relevant to plant breeding. Good luck.
Different based on the design.
Datasets and more detail about them can be found in the agricolae package. Here we just show the datasets and how to analyze them using mixed model packages.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_expdesigns) DT <- DT_expdesigns names(DT) data1 <- DT$au1 head(data1) ## response variable: "yield" ## check indicator: "entryc" ('nc' for all unreplicated, but personal.name for checks) ## blocking factor: "block" ## treatments, personal names for replicated and non-replicated: "trt" ## check no check indicator: "new" ############ sommer ############## if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(yield~entryc, random=~block+trt, rcov=~units, data=data1) summary(mix1)$varcomp } ############ lme4breeding ############## if(requireNamespace("lme4breeding")){ library(lme4breeding) mix1 <- lmeb(yield~entryc + (1|block)+(1|trt), data=data1) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance BLUP <- ranef(mix1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_expdesigns) DT <- DT_expdesigns names(DT) data1 <- DT$au1 head(data1) ## response variable: "yield" ## check indicator: "entryc" ('nc' for all unreplicated, but personal.name for checks) ## blocking factor: "block" ## treatments, personal names for replicated and non-replicated: "trt" ## check no check indicator: "new" ############ sommer ############## if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(yield~entryc, random=~block+trt, rcov=~units, data=data1) summary(mix1)$varcomp } ############ lme4breeding ############## if(requireNamespace("lme4breeding")){ library(lme4breeding) mix1 <- lmeb(yield~entryc + (1|block)+(1|trt), data=data1) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance BLUP <- ranef(mix1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
This dataset contains phenotpic data for 36 winter bean hybrids, coming from a full diallel design and evaluated for 9 traits. The column male and female origin columns are included as well.
data("DT_fulldiallel")data("DT_fulldiallel")
The format is: chr "DT_fulldiallel"
This data was generated by a winter bean study and originally included in the agridat package.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_fulldiallel) DT <- DT_fulldiallel head(DT) ############## sommer ################ if(requireNamespace("sommer")){ library(sommer) mix <- mmes(stems~1, random=~female+male, data=DT) summary(mix) } ############## lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) mix <- lmeb(stems~1 + (1|female)+(1|male), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_fulldiallel) DT <- DT_fulldiallel head(DT) ############## sommer ################ if(requireNamespace("sommer")){ library(sommer) mix <- mmes(stems~1, random=~female+male, data=DT) summary(mix) } ############## lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) mix <- lmeb(stems~1 + (1|female)+(1|male), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
This is a dataset that was included in the Journal of animal ecology by Wilson et al. (2010; see references) to help users understand how to use mixed models with animal datasets with pedigree data.
The dataset contains 3 elements:
gryphon; variables indicating the animal, the mother of the animal, sex of the animal, and two quantitative traits named 'BWT' and 'TARSUS'.
pedi; dataset with 2 columns indicating the sire and the dam of the animals contained in the gryphon dataset.
A; additive relationship matrix formed using the 'getA()' function used over the pedi dataframe.
data("DT_gryphon")data("DT_gryphon")
The format is: chr "DT_gryphon"
This data comes from the Journal of Animal Ecology. Please, if using this data cite Wilson et al. publication. If using our mixed model solver please cite Covarrubias' publication.
Wilson AJ, et al. (2010) An ecologist's guide to the animal model. Journal of Animal Ecology 79(1): 13-26.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_gryphon) DT <- DT_gryphon A <- A_gryphon P <- P_gryphon #### look at the data head(DT) ############## sommer ################# if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(BWT~1, random=~vsm(ism(ANIMAL),Gu=A), rcov=~units, data=DT) summary(mix1)$varcomp ## mmes algorithm uses the inverse of the relationship matrix ## if you select henderson=TRUE Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### head(DT) traits <- c("BWT","TARSUS") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("ANIMAL","MOTHER","BYEAR","SEX", traits)], idvar = c("ANIMAL","MOTHER","BYEAR","SEX"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,ANIMAL)), ] head(DTL) # #### fit the multivariate model with no fixed effects (intercept only) mix2 <- mmes(value~trait, # henderson=TRUE, random=~vsm(usm(trait),ism(ANIMAL),Gu=A), rcov=~vsm(dsm(trait),ism(units)), data=DTL) summary(mix2)$varcomp cov2cor(mix2$theta[[1]]) } ############## lme4breeding ################# if(requireNamespace("lme4breeding")){ library(lme4breeding) ## fit the model with no fixed effects (intercept only) mix1 <- lmeb(BWT~ (1|ANIMAL), relmat = list(ANIMAL=A), data=DT) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance BLUP <- ranef(mix1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### multi-trait model traits <- c("BWT","TARSUS") for(iTrait in traits){DT[,iTrait] <- scale(imputev(DT[,iTrait]))} DTL <- reshape(DT[,c("ANIMAL", traits)], idvar = "ANIMAL", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|ANIMAL), relmat = list(ANIMAL=A), # rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) cov2cor(vc$ANIMAL) sigma(mix)^2 # error variance }data(DT_gryphon) DT <- DT_gryphon A <- A_gryphon P <- P_gryphon #### look at the data head(DT) ############## sommer ################# if(requireNamespace("sommer")){ library(sommer) mix1 <- mmes(BWT~1, random=~vsm(ism(ANIMAL),Gu=A), rcov=~units, data=DT) summary(mix1)$varcomp ## mmes algorithm uses the inverse of the relationship matrix ## if you select henderson=TRUE Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### head(DT) traits <- c("BWT","TARSUS") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("ANIMAL","MOTHER","BYEAR","SEX", traits)], idvar = c("ANIMAL","MOTHER","BYEAR","SEX"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,ANIMAL)), ] head(DTL) # #### fit the multivariate model with no fixed effects (intercept only) mix2 <- mmes(value~trait, # henderson=TRUE, random=~vsm(usm(trait),ism(ANIMAL),Gu=A), rcov=~vsm(dsm(trait),ism(units)), data=DTL) summary(mix2)$varcomp cov2cor(mix2$theta[[1]]) } ############## lme4breeding ################# if(requireNamespace("lme4breeding")){ library(lme4breeding) ## fit the model with no fixed effects (intercept only) mix1 <- lmeb(BWT~ (1|ANIMAL), relmat = list(ANIMAL=A), data=DT) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance BLUP <- ranef(mix1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### multi-trait model traits <- c("BWT","TARSUS") for(iTrait in traits){DT[,iTrait] <- scale(imputev(DT[,iTrait]))} DTL <- reshape(DT[,c("ANIMAL", traits)], idvar = "ANIMAL", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|ANIMAL), relmat = list(ANIMAL=A), # rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) cov2cor(vc$ANIMAL) sigma(mix)^2 # error variance }
This dataset contains phenotpic data for 41 potato lines evaluated in 5 locations across 3 years in an RCBD design. The phenotypic trait is tuber quality and we show how to obtain an estimate of DT_h2 for the trait.
data("DT_h2")data("DT_h2")
The format is: chr "DT_h2"
This data was generated by a potato study.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_h2) DT <- DT_h2 head(DT) ############### sommer ################ if(requireNamespace("sommer")){ library(sommer) DT=DT[with(DT, order(Env)), ] ans1 <- mmes(y~Env, henderson=TRUE, random=~vsm(dsm(Env),ism(Name)) + vsm(dsm(Env),ism(Block)), rcov=~vsm(dsm(Env),ism(units)), data=DT) summary(ans1)$varcomp } ############### lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) # model with: # Main+Diagonal for Name # Diagonal for Block # Diagonal for residuals DT=DT[with(DT, order(Env)), ] ans1b <- lmeb(y ~ Env + (Env||Name) + (0+Env||Block) + (0+Env||unitsR), verbose = 1L, data=DT) vc <- VarCorr(ans1b); print(vc,comp=c("Variance")) sigma(ans1b)^2 # error variance BLUP <- ranef(ans1b, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############### end ################data(DT_h2) DT <- DT_h2 head(DT) ############### sommer ################ if(requireNamespace("sommer")){ library(sommer) DT=DT[with(DT, order(Env)), ] ans1 <- mmes(y~Env, henderson=TRUE, random=~vsm(dsm(Env),ism(Name)) + vsm(dsm(Env),ism(Block)), rcov=~vsm(dsm(Env),ism(units)), data=DT) summary(ans1)$varcomp } ############### lme4breeding ################ if(requireNamespace("lme4breeding")){ library(lme4breeding) # model with: # Main+Diagonal for Name # Diagonal for Block # Diagonal for residuals DT=DT[with(DT, order(Env)), ] ans1b <- lmeb(y ~ Env + (Env||Name) + (0+Env||Block) + (0+Env||unitsR), verbose = 1L, data=DT) vc <- VarCorr(ans1b); print(vc,comp=c("Variance")) sigma(ans1b)^2 # error variance BLUP <- ranef(ans1b, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############### end ################
This dataset contains phenotpic data for 21 corn hybrids, with 2 technical repetitions, coming from a half diallel design and evaluated for sugar content. The column geno indicates the hybrid and male and female origin columns are included as well.
data("DT_halfdiallel")data("DT_halfdiallel")
The format is: chr "DT_halfdiallel"
This data was generated by a corn study.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data("DT_halfdiallel") DT <- DT_halfdiallel head(DT) DT$femalef <- as.factor(DT$female) DT$malef <- as.factor(DT$male) DT$genof <- as.factor(DT$geno) ############# sommer ############# if(requireNamespace("sommer")){ library(sommer) A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A # if you want to provide a covariance matrix #### model using overlay modh <- mmes(sugar~1, random=~vsm(ism(overlay(femalef,malef, sparse = FALSE)), Gu=A) + genof, data=DT) summary(modh)$varcomp # if ussing mmes=TRUE provide Gu with inverses and give more iterations # Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) # Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") # attr(Ai, 'inverse')=TRUE } ############# lme4breeding ############# if(requireNamespace("lme4breeding")){ library(lme4breeding) # overlay matrix to be added to the addmat argument Z <- with(DT, overlay(femalef,malef) ) ## model using overlay without relationship matrix modh <- lmeb(sugar ~ (1|genof) + (1|fema), addmat = list(fema=Z), data=DT) vc <- VarCorr(modh); print(vc,comp=c("Variance")) ## model using overlay with relationship matrix ## relationship matrix to be added to the relmat argument A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A modh <- lmeb(sugar ~ (1|genof) + (1|fema), addmat = list(fema=Z), relmat = list(fema=A), data=DT) vc <- VarCorr(modh); print(vc,comp=c("Variance")) sigma(modh)^2 # error variance BLUP <- ranef(modh, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############# end #############data("DT_halfdiallel") DT <- DT_halfdiallel head(DT) DT$femalef <- as.factor(DT$female) DT$malef <- as.factor(DT$male) DT$genof <- as.factor(DT$geno) ############# sommer ############# if(requireNamespace("sommer")){ library(sommer) A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A # if you want to provide a covariance matrix #### model using overlay modh <- mmes(sugar~1, random=~vsm(ism(overlay(femalef,malef, sparse = FALSE)), Gu=A) + genof, data=DT) summary(modh)$varcomp # if ussing mmes=TRUE provide Gu with inverses and give more iterations # Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) # Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") # attr(Ai, 'inverse')=TRUE } ############# lme4breeding ############# if(requireNamespace("lme4breeding")){ library(lme4breeding) # overlay matrix to be added to the addmat argument Z <- with(DT, overlay(femalef,malef) ) ## model using overlay without relationship matrix modh <- lmeb(sugar ~ (1|genof) + (1|fema), addmat = list(fema=Z), data=DT) vc <- VarCorr(modh); print(vc,comp=c("Variance")) ## model using overlay with relationship matrix ## relationship matrix to be added to the relmat argument A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A modh <- lmeb(sugar ~ (1|genof) + (1|fema), addmat = list(fema=Z), relmat = list(fema=A), data=DT) vc <- VarCorr(modh); print(vc,comp=c("Variance")) sigma(modh)^2 # error variance BLUP <- ranef(modh, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ############# end #############
This dataset contains phenotpic data for 98 individuals where they are measured with the purpose of identifying the effect of the neighbour in a focal individual.
data("DT_ige")data("DT_ige")
The format is: chr "DT_ige"
This data was masked from a shared study.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_ige) DT <- DT_ige ######################################## ############### sommer ################# ######################################## if(requireNamespace("sommer")){ library(sommer) # Indirect genetic effects model without covariance between DGE and IGE modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ focal + neighbour, rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) # Indirect genetic effects model with covariance between DGE and IGE using relationship matrices modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ covm( vsm(ism(focal)), vsm(ism(neighbour)) ), rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) # form relationship matrix Ai <- solve(A_ige + diag(1e-5, nrow(A_ige),nrow(A_ige) )) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE # Indirect genetic effects model with covariance between DGE and IGE using relationship matrices modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ covm( vsm(ism(focal), Gu=Ai), vsm(ism(neighbour), Gu=Ai) ), rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) # Indirect genetic effects model without covariance between DGE and IGE modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) ## Add relationship matrices A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) ) modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour), relmat = list(focal=A_ige, neighbour=A_ige), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) ## Indirect genetic effects model with covariance between DGE and IGE using relationship matrices ## Relationship matrix A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) ) ## Define 2 dummy variables to make a fake covariance ## for two different random effects DT$fn <- DT$nn <- 1 ## Create the incidence matrix for the first random effect Zf <- Matrix::sparse.model.matrix( ~ focal-1, data=DT ) colnames(Zf) <- gsub("focal","", colnames(Zf)) ## Create the incidence matrix for the second random effect Zn <- Matrix::sparse.model.matrix( ~ neighbour-1, data=DT ) colnames(Zn) <- gsub("neighbour","", colnames(Zn)) ## Fit the model modIGE <- lmeb(trait ~ block + (0+fn+nn|both), addmat = list(both=list(Zf,Zn)), relmat = list(both=A_ige), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) sigma(modIGE)^2 # error variance blups <- ranef(modIGE) condVAR <- lapply(blups, function(x){attr(x, which="postVar")}) # take sqrt() for SEs pairs(blups$both) cov2cor(vc$both) } ########## end ###########data(DT_ige) DT <- DT_ige ######################################## ############### sommer ################# ######################################## if(requireNamespace("sommer")){ library(sommer) # Indirect genetic effects model without covariance between DGE and IGE modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ focal + neighbour, rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) # Indirect genetic effects model with covariance between DGE and IGE using relationship matrices modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ covm( vsm(ism(focal)), vsm(ism(neighbour)) ), rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) # form relationship matrix Ai <- solve(A_ige + diag(1e-5, nrow(A_ige),nrow(A_ige) )) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE # Indirect genetic effects model with covariance between DGE and IGE using relationship matrices modIGE <- mmes(trait ~ block, dateWarning = FALSE, random = ~ covm( vsm(ism(focal), Gu=Ai), vsm(ism(neighbour), Gu=Ai) ), rcov = ~ units, nIters=100, data = DT) summary(modIGE)$varcomp pmonitor(modIGE) } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) # Indirect genetic effects model without covariance between DGE and IGE modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) ## Add relationship matrices A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) ) modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour), relmat = list(focal=A_ige, neighbour=A_ige), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) ## Indirect genetic effects model with covariance between DGE and IGE using relationship matrices ## Relationship matrix A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) ) ## Define 2 dummy variables to make a fake covariance ## for two different random effects DT$fn <- DT$nn <- 1 ## Create the incidence matrix for the first random effect Zf <- Matrix::sparse.model.matrix( ~ focal-1, data=DT ) colnames(Zf) <- gsub("focal","", colnames(Zf)) ## Create the incidence matrix for the second random effect Zn <- Matrix::sparse.model.matrix( ~ neighbour-1, data=DT ) colnames(Zn) <- gsub("neighbour","", colnames(Zn)) ## Fit the model modIGE <- lmeb(trait ~ block + (0+fn+nn|both), addmat = list(both=list(Zf,Zn)), relmat = list(both=A_ige), data = DT) vc <- VarCorr(modIGE); print(vc,comp=c("Variance")) sigma(modIGE)^2 # error variance blups <- ranef(modIGE) condVAR <- lapply(blups, function(x){attr(x, which="postVar")}) # take sqrt() for SEs pairs(blups$both) cov2cor(vc$both) } ########## end ###########
A data frame with 4 columns; SUBJECT, X, Xf and Y to show how to use the Legendre polynomials in the lmeb function using a numeric variable X and a response variable Y.
data("DT_legendre")data("DT_legendre")
The format is: chr "DT_legendre"
This data was simulated for fruit breeding applications.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_legendre) DT <- DT_legendre head(DT) ######################################## ############### sommer ################# ######################################## if(requireNamespace("sommer")){ library(sommer) library(orthopolynom) mRR2<-mmes(Y~ 1 + Xf , random=~ vsm(usm(leg(X,1)),ism(SUBJECT)) , rcov=~units , data=DT) summary(mRR2)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) library(orthopolynom) Z <- with(DT, smm(leg(X,1)) ) for(i in 1:ncol(Z)){DT[,colnames(Z)[i]] <- Z[,i]} ## diagonal random regression Y ~ Xf + (0+leg0+leg1|| SUBJECT) ## unstructured random regression Y ~ Xf + (0+leg0+leg1| SUBJECT) mRR2b<-lmeb(Y ~ Xf + (0+leg0+leg1| SUBJECT), , data=DT) vc <- VarCorr(mRR2b); print(vc,comp=c("Variance")) sigma(mRR2b)^2 # error variance BLUP <- ranef(mRR2b, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_legendre) DT <- DT_legendre head(DT) ######################################## ############### sommer ################# ######################################## if(requireNamespace("sommer")){ library(sommer) library(orthopolynom) mRR2<-mmes(Y~ 1 + Xf , random=~ vsm(usm(leg(X,1)),ism(SUBJECT)) , rcov=~units , data=DT) summary(mRR2)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) library(orthopolynom) Z <- with(DT, smm(leg(X,1)) ) for(i in 1:ncol(Z)){DT[,colnames(Z)[i]] <- Z[,i]} ## diagonal random regression Y ~ Xf + (0+leg0+leg1|| SUBJECT) ## unstructured random regression Y ~ Xf + (0+leg0+leg1| SUBJECT) mRR2b<-lmeb(Y ~ Xf + (0+leg0+leg1| SUBJECT), , data=DT) vc <- VarCorr(mRR2b); print(vc,comp=c("Variance")) sigma(mRR2b)^2 # error variance BLUP <- ranef(mRR2b, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
This dataset contains phenotpic data for 36 winter bean hybrids, coming from a full diallel design and evaluated for 9 traits. The column male and female origin columns are included as well.
data("DT_mohring")data("DT_mohring")
The format is: chr "DT_mohring"
This data was generated by a winter bean study and originally included in the agridat package.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_mohring) DT <- DT_mohring head(DT) DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2") head(DT2) ## is.cross denotes a hybrid (1) ## is.self denotes an inbred (1) ## cross.type denotes one way (-1, e.g. AxB) and reciprocal (1, e.g., BxA) and no cross (0) ## cross.id denotes the name of the cross (same name for direct & reciprocal) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) # GRIFFING MODEL 2 with reciprocal effects ########################### mod1hb <- mmes(Ftime ~ 1, data=DT2, random = ~ Block # GCA male & female overlayed + vsm(ism(overlay(Par1, Par2))) # SCA effects (includes cross and selfs) + cross.id # SCAR reciprocal effects (remaining variance in crosses; # if zero there's no reciprocal effects) + vsm(dsm(cross.type), ism(cross.id)) ) summary(mod1hb)$varcomp ## VarComp VarCompSE Zratio ## Block.Ftime-Ftime 0.00000 9.32181 0.000000 ## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916 ## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914 ## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057 ## units.Ftime-Ftime 418.47949 74.56442 5.612321 ## # GRIFFING MODEL 2, no reciprocal effects ############################## mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 50, random = ~ # GCA effects for all (hybrids and inbreds) vsm(ism(overlay(Par1, Par2))) # GCA effect (calculated only in hybrids; remaining variance) + vsm(ism(is.cross),ism(overlay(Par1, Par2))) # SCA effect (calculated in hybrids only) + vsm(ism(is.cross), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347 ## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560 ## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335 ## units.Ftime-Ftime 501.6275 74.36075 6.745864 ## # GRIFFING MODEL 3, no reciprocal effects ############################### mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100, random = ~ # GCAC (only for hybrids) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # male GCA (only for inbreds) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797 ## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488 ## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804 ## units.Ftime-Ftime 498.5974 73.92066 6.745035 ## # GRIFFING MODEL 2, with reciprocal effects ############################# # In Mohring: mixed model 3 reduced mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100, random = ~ # GCAC (for hybrids only) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # male GCA (for selfs only) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) # SCAR reciprocal effects (remaning SCA variance) + vsm(ism(cross.type), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833 ## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013 ## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318 ## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113 ## units.Ftime-Ftime 416.82960 74.27202 5.612203 ## # GRIFFING MODEL 3, with RGCA + RSCA #################################### # In Mohring: mixed model 3 mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2,nIters = 100, random = ~ # GCAC (for hybrids only) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # RGCA: exclude selfs (to identify reciprocal GCA effects) + vsm(ism(cross.type),ism(overlay(Par1, Par2))) # male GCA (for selfs only) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) # SCAR: exclude selfs (if zero there's no reciprocal SCA effects) + vsm(ism(cross.type), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857 ## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654 ## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068 ## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114 ## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054 ## units.Ftime-Ftime } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## GRIFFING MODEL 2 with reciprocal effects ########################### ## overlay matrix to be added to the addmat argument Z <- with(DT, overlay(Par1, Par2) ) fema <- (rep(colnames(Z), nrow(Z)))[1:nrow(Z)] mod1h <- lmeb(Ftime ~ 1 + (1|Block) + ## GCA male & female overlayed (1|fema) + ## SCA effects (includes cross and selfs) (cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) BLUP <- ranef(mod1h, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ## VarComp VarCompSE Zratio ## Block.Ftime-Ftime 0.00000 9.32181 0.000000 ## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916 ## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914 ## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057 ## units.Ftime-Ftime 418.47949 74.56442 5.612321 ## ## GRIFFING MODEL 2, no reciprocal effects ############################## mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCA male & female overlayed main and rr ## GCA effect (calculated only in hybrids; remaining variance) (is.cross||fema) + ## SCA effect (calculated in hybrids only) (0+is.cross||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347 ## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560 ## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335 ## units.Ftime-Ftime 501.6275 74.36075 6.745864 ## ## GRIFFING MODEL 3, no reciprocal effects ############################### mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (only for hybrids) (0+is.cross||fema) + ## male GCA (only for inbreds) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797 ## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488 ## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804 ## units.Ftime-Ftime 498.5974 73.92066 6.745035 ## ## GRIFFING MODEL 2, with reciprocal effects ############################# ## In Mohring: mixed model 3 reduced mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (for hybrids only) (0+is.cross||fema) + ## male GCA (for selfs only) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id)+ ## SCAR reciprocal effects (remaning SCA variance) (0+cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833 ## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013 ## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318 ## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113 ## units.Ftime-Ftime 416.82960 74.27202 5.612203 ## ## GRIFFING MODEL 3, with RGCA + RSCA #################################### ## In Mohring: mixed model 3 mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (for hybrids only) (0+is.cross||fema) + ## RGCA: exclude selfs (to identify reciprocal GCA effects) (0+cross.type||fema) + ## male GCA (for selfs only) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id)+ ## SCAR reciprocal effects (remaning SCA variance) (0+cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857 ## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654 ## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068 ## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114 ## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054 ## units.Ftime-Ftime 416.09922 447.2101 0.93043333 }data(DT_mohring) DT <- DT_mohring head(DT) DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2") head(DT2) ## is.cross denotes a hybrid (1) ## is.self denotes an inbred (1) ## cross.type denotes one way (-1, e.g. AxB) and reciprocal (1, e.g., BxA) and no cross (0) ## cross.id denotes the name of the cross (same name for direct & reciprocal) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) # GRIFFING MODEL 2 with reciprocal effects ########################### mod1hb <- mmes(Ftime ~ 1, data=DT2, random = ~ Block # GCA male & female overlayed + vsm(ism(overlay(Par1, Par2))) # SCA effects (includes cross and selfs) + cross.id # SCAR reciprocal effects (remaining variance in crosses; # if zero there's no reciprocal effects) + vsm(dsm(cross.type), ism(cross.id)) ) summary(mod1hb)$varcomp ## VarComp VarCompSE Zratio ## Block.Ftime-Ftime 0.00000 9.32181 0.000000 ## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916 ## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914 ## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057 ## units.Ftime-Ftime 418.47949 74.56442 5.612321 ## # GRIFFING MODEL 2, no reciprocal effects ############################## mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 50, random = ~ # GCA effects for all (hybrids and inbreds) vsm(ism(overlay(Par1, Par2))) # GCA effect (calculated only in hybrids; remaining variance) + vsm(ism(is.cross),ism(overlay(Par1, Par2))) # SCA effect (calculated in hybrids only) + vsm(ism(is.cross), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347 ## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560 ## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335 ## units.Ftime-Ftime 501.6275 74.36075 6.745864 ## # GRIFFING MODEL 3, no reciprocal effects ############################### mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100, random = ~ # GCAC (only for hybrids) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # male GCA (only for inbreds) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797 ## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488 ## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804 ## units.Ftime-Ftime 498.5974 73.92066 6.745035 ## # GRIFFING MODEL 2, with reciprocal effects ############################# # In Mohring: mixed model 3 reduced mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100, random = ~ # GCAC (for hybrids only) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # male GCA (for selfs only) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) # SCAR reciprocal effects (remaning SCA variance) + vsm(ism(cross.type), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833 ## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013 ## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318 ## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113 ## units.Ftime-Ftime 416.82960 74.27202 5.612203 ## # GRIFFING MODEL 3, with RGCA + RSCA #################################### # In Mohring: mixed model 3 mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2,nIters = 100, random = ~ # GCAC (for hybrids only) vsm(ism(is.cross),ism(overlay(Par1, Par2))) # RGCA: exclude selfs (to identify reciprocal GCA effects) + vsm(ism(cross.type),ism(overlay(Par1, Par2))) # male GCA (for selfs only) + vsm(ism(is.self),ism(Par1)) # SCA (for hybrids only) + vsm(ism(is.cross), ism(cross.id)) # SCAR: exclude selfs (if zero there's no reciprocal SCA effects) + vsm(ism(cross.type), ism(cross.id)) ) summary(mod1h)$varcomp ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857 ## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654 ## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068 ## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114 ## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054 ## units.Ftime-Ftime } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## GRIFFING MODEL 2 with reciprocal effects ########################### ## overlay matrix to be added to the addmat argument Z <- with(DT, overlay(Par1, Par2) ) fema <- (rep(colnames(Z), nrow(Z)))[1:nrow(Z)] mod1h <- lmeb(Ftime ~ 1 + (1|Block) + ## GCA male & female overlayed (1|fema) + ## SCA effects (includes cross and selfs) (cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) BLUP <- ranef(mod1h, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ## VarComp VarCompSE Zratio ## Block.Ftime-Ftime 0.00000 9.32181 0.000000 ## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916 ## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914 ## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057 ## units.Ftime-Ftime 418.47949 74.56442 5.612321 ## ## GRIFFING MODEL 2, no reciprocal effects ############################## mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCA male & female overlayed main and rr ## GCA effect (calculated only in hybrids; remaining variance) (is.cross||fema) + ## SCA effect (calculated in hybrids only) (0+is.cross||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347 ## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560 ## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335 ## units.Ftime-Ftime 501.6275 74.36075 6.745864 ## ## GRIFFING MODEL 3, no reciprocal effects ############################### mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (only for hybrids) (0+is.cross||fema) + ## male GCA (only for inbreds) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797 ## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488 ## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804 ## units.Ftime-Ftime 498.5974 73.92066 6.745035 ## ## GRIFFING MODEL 2, with reciprocal effects ############################# ## In Mohring: mixed model 3 reduced mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (for hybrids only) (0+is.cross||fema) + ## male GCA (for selfs only) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id)+ ## SCAR reciprocal effects (remaning SCA variance) (0+cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833 ## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013 ## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318 ## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113 ## units.Ftime-Ftime 416.82960 74.27202 5.612203 ## ## GRIFFING MODEL 3, with RGCA + RSCA #################################### ## In Mohring: mixed model 3 mod1h <- lmeb(Ftime ~ Block + is.cross + ## GCAC (for hybrids only) (0+is.cross||fema) + ## RGCA: exclude selfs (to identify reciprocal GCA effects) (0+cross.type||fema) + ## male GCA (for selfs only) (0+is.self||Par1) + ## SCA (for hybrids only) (0+is.cross||cross.id)+ ## SCAR reciprocal effects (remaning SCA variance) (0+cross.type||cross.id), addmat=list(fema=Z), data=DT2 ) vc <- VarCorr(mod1h); print(vc,comp=c("Variance")) ## VarComp VarCompSE Zratio ## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857 ## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654 ## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068 ## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114 ## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054 ## units.Ftime-Ftime 416.09922 447.2101 0.93043333 }
This dataset contains phenotpic data for 18 traits measured in 187 individuals from a potato diversity panel. In addition contains genotypic data for 221 individuals genotyped with 3522 SNP markers. Please if using this data for your own research make sure you cite Rosyara's (2015) publication (see References).
data("DT_polyploid")data("DT_polyploid")
The format is: chr "DT_polyploid"
This data was extracted from Rosyara (2016).
If using this data for your own research please cite:
Rosyara Umesh R., Walter S. De Jong, David S. Douches, Jeffrey B. Endelman. Software for genome-wide association studies in autopolyploids and its application to potato. The Plant Genome 2015.
Other references:
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_polyploid) DT <- DT_polyploid GT <- GT_polyploid MP <- MP_polyploid ## convert markers to numeric format numo <- atcg1234(data=GT[,1:100], ploidy=4); numo$M[1:5,1:5]; numo$ref.allele[,1:5] ## plants with both genotypes and phenotypes common <- intersect(DT$Name,rownames(numo$M)) ## get the markers and phenotypes for such inds marks <- numo$M[common,]; marks[1:5,1:5] DT2 <- DT[match(common,DT$Name),]; DT2 <- as.data.frame(DT2) DT2[1:5,] ## Additive relationship matrix, specify ploidy A <- A.matr(marks) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) A <- A.mat(marks) D <- D.mat(marks) ###=========================================#### ### run as mixed model ###=========================================#### ans <- mmes(tuber_shape~1, random=~vsm(ism(Name), Gu=A), data=DT2) summary(ans)$varcomp # if using mmes=TRUE provide Gu as inverse Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## run as mixed model A <- A + diag(1e-4,ncol(A),ncol(A)) ans <- lmeb(tuber_shape~ (1|Name), relmat = list(Name=A), data=DT2) vc <- VarCorr(ans); print(vc,comp=c("Variance")) sigma(ans)^2 # error variance BLUP <- ranef(ans, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_polyploid) DT <- DT_polyploid GT <- GT_polyploid MP <- MP_polyploid ## convert markers to numeric format numo <- atcg1234(data=GT[,1:100], ploidy=4); numo$M[1:5,1:5]; numo$ref.allele[,1:5] ## plants with both genotypes and phenotypes common <- intersect(DT$Name,rownames(numo$M)) ## get the markers and phenotypes for such inds marks <- numo$M[common,]; marks[1:5,1:5] DT2 <- DT[match(common,DT$Name),]; DT2 <- as.data.frame(DT2) DT2[1:5,] ## Additive relationship matrix, specify ploidy A <- A.matr(marks) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) A <- A.mat(marks) D <- D.mat(marks) ###=========================================#### ### run as mixed model ###=========================================#### ans <- mmes(tuber_shape~1, random=~vsm(ism(Name), Gu=A), data=DT2) summary(ans)$varcomp # if using mmes=TRUE provide Gu as inverse Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ## run as mixed model A <- A + diag(1e-4,ncol(A),ncol(A)) ans <- lmeb(tuber_shape~ (1|Name), relmat = list(Name=A), data=DT2) vc <- VarCorr(ans); print(vc,comp=c("Variance")) sigma(ans)^2 # error variance BLUP <- ranef(ans, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
Information from a collection of 413 rice lines. The DT_rice data set is from Rice Diversity Org. Program. The lines are genotyped with 36,901 SNP markers and phenotyped for more than 30 traits. This data set was included in the package to play with it. If using it for your research make sure you cite the original publication from Zhao et al.(2011).
data(DT_rice)data(DT_rice)
RicePheno contains the phenotypes RiceGeno contains genotypes letter code RiceGenoN contains the genotypes in numerical code using atcg1234 converter function
Rice Diversity Organization http://www.ricediversity.org/data/index.cfm.
Keyan Zhao, Chih-Wei Tung, Georgia C. Eizenga, Mark H. Wright, M. Liakat Ali, Adam H. Price, Gareth J. Norton, M. Rafiqul Islam, Andy Reynolds, Jason Mezey, Anna M. McClung, Carlos D. Bustamante & Susan R. McCouch (2011). Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Comm 2:467 DOI: 10.1038/ncomms1467, Published Online 13 Sep 2011.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
data(DT_rice) DT <- DT_rice GT <- GT_rice GTn <- GTn_rice head(DT) M <- atcg1234(GT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) A <- A.mat(M$M) mix <- mmes(Protein.content~1, random = ~vsm(ism(geno), Gu=A) + geno, rcov=~units, data=DT) summary(mix)$varcomp # if using henderson=TRUE provide Gu as inverse Ai <- solve(A + diag(1e-6,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ## MULTI-TRAIT MODEL ## reshape in long format the dataset traits <- c("Protein.content","Flag.leaf.length") DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) M <- DTLM <- atcg1234(GT) A <- A.mat(M$M) mix <- mmes(value~trait, random = ~vsm(usm(trait),ism(geno), Gu=A) , rcov=~vsm(dsm(trait), ism(units)), data=DTL) summary(mix)$varcomp cov2cor(mix$theta$`vsm(usm(trait), ism(geno), Gu = A`) } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ### univariate model A <- A.matr(M$M) A <- A + diag(1e-4, ncol(A), ncol(A)) mix <- lmeb(Protein.content ~ (1|geno), relmat = list(geno=A), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance ### multi-trait model traits <- c("Flowering.time.at.Arkansas" ,"Seed.volume", "Protein.content") for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])} DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait, geno)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|geno), relmat = list(geno=A), rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) vc$geno sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_rice) DT <- DT_rice GT <- GT_rice GTn <- GTn_rice head(DT) M <- atcg1234(GT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) A <- A.mat(M$M) mix <- mmes(Protein.content~1, random = ~vsm(ism(geno), Gu=A) + geno, rcov=~units, data=DT) summary(mix)$varcomp # if using henderson=TRUE provide Gu as inverse Ai <- solve(A + diag(1e-6,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ## MULTI-TRAIT MODEL ## reshape in long format the dataset traits <- c("Protein.content","Flag.leaf.length") DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait)), ] head(DTL) M <- DTLM <- atcg1234(GT) A <- A.mat(M$M) mix <- mmes(value~trait, random = ~vsm(usm(trait),ism(geno), Gu=A) , rcov=~vsm(dsm(trait), ism(units)), data=DTL) summary(mix)$varcomp cov2cor(mix$theta$`vsm(usm(trait), ism(geno), Gu = A`) } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ### univariate model A <- A.matr(M$M) A <- A + diag(1e-4, ncol(A), ncol(A)) mix <- lmeb(Protein.content ~ (1|geno), relmat = list(geno=A), data=DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance ### multi-trait model traits <- c("Flowering.time.at.Arkansas" ,"Seed.volume", "Protein.content") for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])} DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait, geno)), ] head(DTL) system.time( mix <- lmeb(value ~ (0+trait|geno), relmat = list(geno=A), rotation = TRUE, data=DTL) ) vc <- VarCorr(mix); print(vc,comp=c("Variance")) vc$geno sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. Data from sleepstudy to see how lme4 models can be translated in sommer.
data("DT_sleepstudy")data("DT_sleepstudy")
The format is: chr "DT_sleepstudy"
These data are from the study described in Belenky et al. (2003), for the sleep deprived group and for the first 10 days of the study, up to the recovery period.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Gregory Belenky et al. (2003) Patterns of performance degradation and restoration during sleep restrictions and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1-12.
data(DT_sleepstudy) DT <- DT_sleepstudy head(DT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject, data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject + vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject + vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ################################## fm1 <- lmeb(Reaction ~ Days + (1 | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) sigma(fm1)^2 # error variance ################################## fm1 <- lmeb(Reaction ~ Days + (Days || Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) ################################## fm1 <- lmeb(Reaction ~ Days + (Days | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) ################################## fm1 <- lmeb(Reaction ~ Days + (0 + Days | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) BLUP <- ranef(fm1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_sleepstudy) DT <- DT_sleepstudy head(DT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject, data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject + vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ Subject + vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp ################################## fm2 <- mmes(Reaction ~ Days, random= ~ vsm(ism(Days), ism(Subject)), data=DT, tolParInv = 1e-6, verbose = FALSE) summary(fm2)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) ################################## fm1 <- lmeb(Reaction ~ Days + (1 | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) sigma(fm1)^2 # error variance ################################## fm1 <- lmeb(Reaction ~ Days + (Days || Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) ################################## fm1 <- lmeb(Reaction ~ Days + (Days | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) ################################## fm1 <- lmeb(Reaction ~ Days + (0 + Days | Subject), data=DT) vc <- VarCorr(fm1); print(vc,comp=c("Variance")) BLUP <- ranef(fm1, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
This dataset contains phenotpic data for 2 traits measured in 1254 single cross hybrids coming from the cross of Flint x Dent heterotic groups. In addition contains the genotipic data (35,478 markers) for each of the 123 Dent lines and 86 Flint lines. The purpose of this data is to demosntrate the prediction of unrealized crosses (9324 unrealized crosses, 1254 evaluated, total 10578 single crosses). We have added the additive relationship matrix (A) but can be easily obtained using the A.matr function on the marker data. Please if using this data for your own research cite Technow et al. (2014) publication (see References).
data("DT_technow")data("DT_technow")
The format is: chr "DT_technow"
This data was extracted from Technow et al. (2014).
If using this data for your own research please cite:
Technow et al. 2014. Genome properties and prospects of genomic predictions of hybrid performance in a Breeding program of maize. Genetics 197:1343-1355.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
data(DT_technow) DT <- DT_technow Md <- apply(Md_technow,2,as.numeric) rownames(Md) <- rownames(Md_technow) Mf <- apply(Mf_technow,2,as.numeric) rownames(Mf) <- rownames(Mf_technow) Md <- (Md*2) - 1 Mf <- (Mf*2) - 1 Ad <- A.matr(Md) Af <- A.matr(Mf) Ad <- Ad + diag(1e-4, ncol(Ad), ncol(Ad)) Af <- Af + diag(1e-4, ncol(Af), ncol(Af)) ################# sommer #################### if(requireNamespace("sommer")){ library(sommer) ans2 <- mmes(GY~1, random=~vsm(ism(dent),Gu=Ad) + vsm(ism(flint),Gu=Af), rcov=~units, data=DT) summary(ans2)$varcomp Adi <- solve(Ad + diag(1e-4,ncol(Ad),ncol(Ad))) Adi <- as(as(as( Adi, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Adi, 'inverse')=TRUE Afi <- solve(Af + diag(1e-4,ncol(Af),ncol(Af))) Afi <- as(as(as( Afi, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Afi, 'inverse')=TRUE ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### head(DT) traits <- c("GY","GM") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("hybrid","dent","flint", traits)], idvar = c("hybrid","dent","flint"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,hybrid)), ] head(DTL) M <- rbind(Md,Mf) A <- A.mat(M) Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ans3 <- mmes(value~trait, henderson=TRUE, random=~vsm(usm(trait),ism(overlay(dent,flint)),Gu=Ai), rcov=~ vsm(dsm(trait), ism(units)), data=DTL) summary(ans3) cov2cor(ans3$theta[[1]]) } ################# lme4breeding #################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ## simple model ans2 <- lmeb(GY ~ (1|dent) + (1|flint), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) BLUP <- ranef(ans2, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### with relationship matrices ans2 <- lmeb(GY ~ (1|dent) + (1|flint), relmat = list(dent=Ad, flint=Af), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) ### overlayed model M <- rbind(Md,Mf) A <- A.matr(M) A <- A + diag(1e-4,ncol(A), ncol(A)) Z <- with(DT, overlay(dent,flint) ) Z = Z[which(!is.na(DT$GY)),] #### model using overlay without relationship matrix ans2 <- lmeb(GY ~ (1|fema), addmat = list(fema=Z), relmat = list(fema=A), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) sigma(ans2)^2 # error variance BLUP <- ranef(ans2, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### rotated model for hybrids H <- kronecker(Ad,Af, make.dimnames = TRUE) H <- H[which(colnames(H)%in% DT$hy),which(colnames(H)%in% DT$hy)] ans3 <- lmeb(GY ~ (1|hy), relmat = list(hy=H), rotation=TRUE, data=DT) vc <- VarCorr(ans3); print(vc,comp=c("Variance")) BLUP <- ranef(ans3, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ################# evola #################### if(requireNamespace("evola")){ library(evola) DT <- DT_technow DT$occ <- 1; DT$occ[1]=0 combos <- build.HMM(Md,Mf, return.combos.only = TRUE) combos <- combos$data.used[which(combos$data.used$hybrid %in%DT$hy),] M <- build.HMM(Md,Mf, custom.hyb = combos) A <- A.matr(M$HMM.add) A <- A[DT$hy,DT$hy] # run the genetic algorithm # we assig a weight to x'Dx of (20*pi)/180=0.34 res<-evolafit(formula = c(GY, occ)~hy, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf,100), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf,-Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=A, lambda= (20*pi)/180 , nQtlStart = 90, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 20) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best = bestSol(res$pop)[,"fitness"] qa = (Q %*% DT$GY)[best,]; qa qAq = Q[best,] %*% A %*% Q[best,]; qAq sum(Q[best,]) # total # of inds selected evolmonitor(res) plot(DT$GY, col=as.factor(Q[best,]), pch=(Q[best,]*19)+1) pareto(res) } ################## end ###################data(DT_technow) DT <- DT_technow Md <- apply(Md_technow,2,as.numeric) rownames(Md) <- rownames(Md_technow) Mf <- apply(Mf_technow,2,as.numeric) rownames(Mf) <- rownames(Mf_technow) Md <- (Md*2) - 1 Mf <- (Mf*2) - 1 Ad <- A.matr(Md) Af <- A.matr(Mf) Ad <- Ad + diag(1e-4, ncol(Ad), ncol(Ad)) Af <- Af + diag(1e-4, ncol(Af), ncol(Af)) ################# sommer #################### if(requireNamespace("sommer")){ library(sommer) ans2 <- mmes(GY~1, random=~vsm(ism(dent),Gu=Ad) + vsm(ism(flint),Gu=Af), rcov=~units, data=DT) summary(ans2)$varcomp Adi <- solve(Ad + diag(1e-4,ncol(Ad),ncol(Ad))) Adi <- as(as(as( Adi, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Adi, 'inverse')=TRUE Afi <- solve(Af + diag(1e-4,ncol(Af),ncol(Af))) Afi <- as(as(as( Afi, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Afi, 'inverse')=TRUE ####====================#### #### multivariate model #### #### 2 traits #### ####====================#### head(DT) traits <- c("GY","GM") DT[,traits] <- apply(DT[,traits],2,scale) DTL <- reshape(DT[,c("hybrid","dent","flint", traits)], idvar = c("hybrid","dent","flint"), varying = traits, v.names = "value", direction = "long", timevar = "trait", times = traits ) DTL <- DTL[with(DTL, order(trait,hybrid)), ] head(DTL) M <- rbind(Md,Mf) A <- A.mat(M) Ai <- solve(A + diag(1e-4,ncol(A),ncol(A))) Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ai, 'inverse')=TRUE ans3 <- mmes(value~trait, henderson=TRUE, random=~vsm(usm(trait),ism(overlay(dent,flint)),Gu=Ai), rcov=~ vsm(dsm(trait), ism(units)), data=DTL) summary(ans3) cov2cor(ans3$theta[[1]]) } ################# lme4breeding #################### if(requireNamespace("lme4breeding")){ library(lme4breeding) ## simple model ans2 <- lmeb(GY ~ (1|dent) + (1|flint), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) BLUP <- ranef(ans2, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### with relationship matrices ans2 <- lmeb(GY ~ (1|dent) + (1|flint), relmat = list(dent=Ad, flint=Af), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) ### overlayed model M <- rbind(Md,Mf) A <- A.matr(M) A <- A + diag(1e-4,ncol(A), ncol(A)) Z <- with(DT, overlay(dent,flint) ) Z = Z[which(!is.na(DT$GY)),] #### model using overlay without relationship matrix ans2 <- lmeb(GY ~ (1|fema), addmat = list(fema=Z), relmat = list(fema=A), data=DT) vc <- VarCorr(ans2); print(vc,comp=c("Variance")) sigma(ans2)^2 # error variance BLUP <- ranef(ans2, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs ### rotated model for hybrids H <- kronecker(Ad,Af, make.dimnames = TRUE) H <- H[which(colnames(H)%in% DT$hy),which(colnames(H)%in% DT$hy)] ans3 <- lmeb(GY ~ (1|hy), relmat = list(hy=H), rotation=TRUE, data=DT) vc <- VarCorr(ans3); print(vc,comp=c("Variance")) BLUP <- ranef(ans3, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs } ################# evola #################### if(requireNamespace("evola")){ library(evola) DT <- DT_technow DT$occ <- 1; DT$occ[1]=0 combos <- build.HMM(Md,Mf, return.combos.only = TRUE) combos <- combos$data.used[which(combos$data.used$hybrid %in%DT$hy),] M <- build.HMM(Md,Mf, custom.hyb = combos) A <- A.matr(M$HMM.add) A <- A[DT$hy,DT$hy] # run the genetic algorithm # we assig a weight to x'Dx of (20*pi)/180=0.34 res<-evolafit(formula = c(GY, occ)~hy, dt= DT, # constraints: if sum is greater than this ignore constraintsUB = c(Inf,100), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf,-Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=A, lambda= (20*pi)/180 , nQtlStart = 90, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 20) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best = bestSol(res$pop)[,"fitness"] qa = (Q %*% DT$GY)[best,]; qa qAq = Q[best,] %*% A %*% Q[best,]; qAq sum(Q[best,]) # total # of inds selected evolmonitor(res) plot(DT$GY, col=as.factor(Q[best,]), pch=(Q[best,]*19)+1) pareto(res) } ################## end ###################
Information from a collection of 599 historical CIMMYT wheat lines. The wheat data set is from CIMMYT's Global Wheat Program. Historically, this program has conducted numerous international trials across a wide variety of wheat-producing environments. The environments represented in these trials were grouped into four basic target sets of environments comprising four main agroclimatic regions previously defined and widely used by CIMMYT's Global Wheat Breeding Program. The phenotypic trait considered here was the average grain yield (GY) of the 599 wheat lines evaluated in each of these four mega-environments.
A pedigree tracing back many generations was available, and the Browse application of the International Crop Information System (ICIS), as described in (McLaren et al. 2000, 2005) was used for deriving the relationship matrix A among the 599 lines; it accounts for selection and inbreeding.
Wheat lines were recently genotyped using 1447 Diversity Array Technology (DArT) generated by
Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). The DArT markers
may take on two values, denoted by their presence or absence. Markers with a minor allele frequency
lower than 0.05 were removed, and missing genotypes were imputed with samples from the marginal
distribution of marker genotypes, that is, , where
is the estimated allele frequency computed from the non-missing genotypes. The number of DArT
MMs after edition was 1279.
data(DT_wheat)data(DT_wheat)
Matrix Y contains the average grain yield, column 1: Grain yield for environment 1 and so on.
International Maize and Wheat Improvement Center (CIMMYT), Mexico.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
McLaren, C. G., L. Ramos, C. Lopez, and W. Eusebio. 2000. “Applications of the geneaology manegment system.” In International Crop Information System. Technical Development Manual, version VI, edited by McLaren, C. G., J.W. White and P.N. Fox. pp. 5.8-5.13. CIMMyT, Mexico: CIMMyT and IRRI.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
data(DT_wheat) DT <- DT_wheat GT <- apply(GT_wheat,2,as.numeric) rownames(GT) <- rownames(GT_wheat) DT <- data.frame(pheno=as.vector(DT), env=as.factor(paste0("e", sort(rep(1:4,nrow(DT))))), id=rep(rownames(DT),4)) K <- tcrossprod(GT-1) m <- sum(diag(K))/nrow(K) K <- K/m K <- K + diag(1e-05, ncol(K), ncol(K)) K[1:4,1:4] ## head(DT) ############## sommer ############### if(requireNamespace("sommer")){ library(sommer) mix0 <- mmes(pheno~1, random = ~vsm(ism(id),Gu=K), rcov=~units, data=DT[which(DT$env == "e1"),]) summary(mix0)$varcomp # if using mmes=TRUE provide Gu as inverse Ki <- solve(K + diag(1e-4,ncol(K),ncol(K))) Ki <- as(as(as( Ki, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ki, 'inverse')=TRUE } ############## lme4breeding ############### if(requireNamespace("lme4breeding")){ library(lme4breeding) #### main effect model system.time( mix0 <- lmeb(pheno ~ (1|id), relmat = list(id=K), data=DT) ) vc <- VarCorr(mix0); print(vc,comp=c("Variance")) sigma(mix0)^2 # error variance BLUP <- ranef(mix0, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs #### unstructured model DT <- DT[with(DT, order(env, id)), ] # sort for rotation system.time( mix1 <- lmeb(pheno ~ (0 + env | id), relmat = list(id=K), rotation = TRUE, data=DT) ) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance } ############## evola ############### if(requireNamespace("evola")){ library(evola) DT <- as.data.frame(DT_wheat) DT$id <- rownames(DT) # IDs DT$occ <- 1; DT$occ[1]=0 # to track occurrences DT$dummy <- 1; DT$dummy[1]=0 # dummy trait # if genomic G <- K # if pedigree A <- A_wheat A[1:4,1:4] ##Perform eigenvalue decomposition for clustering ##And select cluster 5 as target set to predict pcNum=25 svdWheat <- svd(A, nu = pcNum, nv = pcNum) PCWheat <- A %*% svdWheat$v rownames(PCWheat) <- rownames(A) DistWheat <- dist(PCWheat) TreeWheat <- cutree(hclust(DistWheat), k = 5 ) plot(PCWheat[,1], PCWheat[,2], col = TreeWheat, pch = as.character(TreeWheat), xlab = "pc1", ylab = "pc2") vp <- rownames(PCWheat)[TreeWheat == 3]; length(vp) tp <- setdiff(rownames(PCWheat),vp) As <- A[tp,tp] DT2 <- DT[rownames(As),] DT2$cov <- apply(A[tp,vp],1,mean) # we assign a weight to x'Dx of (60*pi)/180=1 res<-evolafit(formula=cbind(cov, occ)~id, dt= DT2, # constraints: if sum is greater than this ignore constraintsUB = c(Inf, 100), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf, -Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=As, lambda= (60*pi)/180 , nQtlStart = 90, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 30, verbose = TRUE) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best <- bestSol(res$pop)[,"fitness"] sum(Q[best,]) # total # of inds selected pareto(res) } ################ end ##################data(DT_wheat) DT <- DT_wheat GT <- apply(GT_wheat,2,as.numeric) rownames(GT) <- rownames(GT_wheat) DT <- data.frame(pheno=as.vector(DT), env=as.factor(paste0("e", sort(rep(1:4,nrow(DT))))), id=rep(rownames(DT),4)) K <- tcrossprod(GT-1) m <- sum(diag(K))/nrow(K) K <- K/m K <- K + diag(1e-05, ncol(K), ncol(K)) K[1:4,1:4] ## head(DT) ############## sommer ############### if(requireNamespace("sommer")){ library(sommer) mix0 <- mmes(pheno~1, random = ~vsm(ism(id),Gu=K), rcov=~units, data=DT[which(DT$env == "e1"),]) summary(mix0)$varcomp # if using mmes=TRUE provide Gu as inverse Ki <- solve(K + diag(1e-4,ncol(K),ncol(K))) Ki <- as(as(as( Ki, "dMatrix"), "generalMatrix"), "CsparseMatrix") attr(Ki, 'inverse')=TRUE } ############## lme4breeding ############### if(requireNamespace("lme4breeding")){ library(lme4breeding) #### main effect model system.time( mix0 <- lmeb(pheno ~ (1|id), relmat = list(id=K), data=DT) ) vc <- VarCorr(mix0); print(vc,comp=c("Variance")) sigma(mix0)^2 # error variance BLUP <- ranef(mix0, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs #### unstructured model DT <- DT[with(DT, order(env, id)), ] # sort for rotation system.time( mix1 <- lmeb(pheno ~ (0 + env | id), relmat = list(id=K), rotation = TRUE, data=DT) ) vc <- VarCorr(mix1); print(vc,comp=c("Variance")) sigma(mix1)^2 # error variance } ############## evola ############### if(requireNamespace("evola")){ library(evola) DT <- as.data.frame(DT_wheat) DT$id <- rownames(DT) # IDs DT$occ <- 1; DT$occ[1]=0 # to track occurrences DT$dummy <- 1; DT$dummy[1]=0 # dummy trait # if genomic G <- K # if pedigree A <- A_wheat A[1:4,1:4] ##Perform eigenvalue decomposition for clustering ##And select cluster 5 as target set to predict pcNum=25 svdWheat <- svd(A, nu = pcNum, nv = pcNum) PCWheat <- A %*% svdWheat$v rownames(PCWheat) <- rownames(A) DistWheat <- dist(PCWheat) TreeWheat <- cutree(hclust(DistWheat), k = 5 ) plot(PCWheat[,1], PCWheat[,2], col = TreeWheat, pch = as.character(TreeWheat), xlab = "pc1", ylab = "pc2") vp <- rownames(PCWheat)[TreeWheat == 3]; length(vp) tp <- setdiff(rownames(PCWheat),vp) As <- A[tp,tp] DT2 <- DT[rownames(As),] DT2$cov <- apply(A[tp,vp],1,mean) # we assign a weight to x'Dx of (60*pi)/180=1 res<-evolafit(formula=cbind(cov, occ)~id, dt= DT2, # constraints: if sum is greater than this ignore constraintsUB = c(Inf, 100), # constraints: if sum is smaller than this ignore constraintsLB= c(-Inf, -Inf), # weight the traits for the selection b = c(1,0), # population parameters nCrosses = 100, nProgeny = 10, recombGens=1, nChr=1, mutRateAllele=0, # coancestry parameters D=As, lambda= (60*pi)/180 , nQtlStart = 90, # selection parameters propSelBetween = 0.5, propSelWithin =0.5, nGenerations = 30, verbose = TRUE) Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2 best <- bestSol(res$pop)[,"fitness"] sum(Q[best,]) # total # of inds selected pareto(res) } ################ end ##################
The yield of oats from a split-plot field trial using three varieties and four levels of manurial treatment. The experiment was laid out in 6 blocks of 3 main plots, each split into 4 sub-plots. The varieties were applied to the main plots and the manurial (nitrogen) treatments to the sub-plots.
blockblock factor with 6 levels
nitronitrogen treatment in hundredweight per acre
Varietygenotype factor, 3 levels
yieldyield in 1/4 lbs per sub-plot, each 1/80 acre.
rowrow location
columncolumn location
Yates, Frank (1935) Complex experiments, Journal of the Royal Statistical Society Suppl. 2, 181–247.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
data(DT_yatesoats) DT <- DT_yatesoats head(DT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) m3 <- mmes(fixed=Y ~ V + N + V:N, random = ~ B + B:MP, rcov=~units, data = DT) summary(m3)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) mix <- lmeb(Y ~ V + N + V:N + (1|B) + (1|B:MP), data = DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }data(DT_yatesoats) DT <- DT_yatesoats head(DT) ############################################## ############### sommer. ################# ############################################## if(requireNamespace("sommer")){ library(sommer) m3 <- mmes(fixed=Y ~ V + N + V:N, random = ~ B + B:MP, rcov=~units, data = DT) summary(m3)$varcomp } ############################################## ############### lme4breeding ################# ############################################## if(requireNamespace("lme4breeding")){ library(lme4breeding) mix <- lmeb(Y ~ V + N + V:N + (1|B) + (1|B:MP), data = DT) vc <- VarCorr(mix); print(vc,comp=c("Variance")) sigma(mix)^2 # error variance BLUP <- ranef(mix, condVar=TRUE) condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs }
Shortcut to get an identity relationship matrix.
I.matr(x)I.matr(x)
x |
Vector of values to identify the unique values and form an iende. |
Nothing but a shortcut to avoid few lines of code.
If return.imputed = FALSE, the identity matrix is returned.
the I matrix
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
## random population of 200 lines with 1000 markers## random population of 200 lines with 1000 markers
This function is a very simple function to impute a numeric or character vector with the mean or median value of the vector.
imputev(x, method="median", by=NULL)imputev(x, method="median", by=NULL)
x |
a numeric or character vector. |
method |
the method to choose between mean or median. |
by |
an alternative vector with values declaring the method to be applied by. |
a numeric or character vector imputed with the method selected.
Giovanny Covarrubias-Pazaran
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
####=========================================#### #### generate your mickey mouse -log10(p-values) ####=========================================#### set.seed(1253) x <- rnorm(100) x[sample(1:100,10)] <- NA imputev(x)####=========================================#### #### generate your mickey mouse -log10(p-values) ####=========================================#### set.seed(1253) x <- rnorm(100) x[sample(1:100,10)] <- NA imputev(x)
jet.colors(n) generates a sequence of
colors from dark blue to cyan to yellow to dark red. It
is similar to the default color schemes in Python's
matplotlib or MATLAB.
jet.colors(n, alpha = 1)jet.colors(n, alpha = 1)
n |
The number of colors to return. |
alpha |
The transparency value of the colors. See
|
A vector of colors along the jet colorramp.
{ # Plot a colorbar with jet.colors image(matrix(seq(100), 100), col=jet.colors(100)) }{ # Plot a colorbar with jet.colors image(matrix(seq(100), 100), col=jet.colors(100)) }
This function calculates the LD decay based on a marker matrix and a map with distances between markers in cM or base pairs.
LD.decay(markers,map,silent=FALSE,unlinked=FALSE,gamma=0.95)LD.decay(markers,map,silent=FALSE,unlinked=FALSE,gamma=0.95)
markers |
a numeric matrix of markers (columns) by individuals (rows) in -1, 0, 1 format. |
map |
a data frame with 3 columns "Locus" (name of markers), "LG" (linkage group or chromosome), and "Position" (in cM or base pairs). |
silent |
a TRUE/FALSE value statement indicating if the program should or should not display the progress bar. silent=TRUE means that will not be displayed. |
unlinked |
a TRUE/FALSE value statement indicating if the program should or should not calculate the alpha(see next argument) percentile of interchromosomal LD. |
gamma |
a percentile value for LD breakage to be used in the calculation of interchromosomal LD extent. |
a list with 3 elements; "by.LG", "all.LG", "LDM". The first element (by.LG) is a list with as many elements as chromosomes where each contains a matrix with 3 columns, the distance, the r2 value, and the p-value associated to the chi-square test for disequilibrium. The second element (all.LG) has a big matrix with distance, r2 values and p-values, for each point from all chromosomes in a single data.frame. The third element (LDM) is the matrix of linkage disequilibrium between pairs of markers.
If unlinked is selected the program should return the gamma percentile interchromosomal LD (r2) for each chromosome and average.
Laido, Giovanni, et al. Linkage disequilibrium and genome-wide association mapping in tetraploid wheat (Triticum turgidum L.). PloS one 9.4 (2014): e95211.
nInds=300 nMarks=500 #random population of nInds lines with nMarks markers M <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M) <- paste0("ind",1:nInds) colnames(M) <- paste0("mark",1:nMarks) mapM = data.frame(Locus=colnames(M),Position=1:nMarks,LG=1) res <- LD.decay(M, mapM) #### subset only markers with significant LD res$all.LG <- res$all.LG[which(res$all.LG$p < .001),] #### plot the LD decay with(res$all.LG, plot(r2~d,col=transp("cadetblue"), xlim=c(0,55), ylim=c(0,1), pch=20,cex=0.5,yaxt="n", xaxt="n",ylab=expression(r^2), xlab="Distance in cM") ) axis(1, at=seq(0,55,5), labels=seq(0,55,5)) axis(2,at=seq(0,1,.1), labels=seq(0,1,.1), las=1) #### if you want to add the loess regression lines #### this could take a long time!!! # mod <- loess(r2 ~ d, data=res$all.LG) # par(new=TRUE) # lilo <- predict(mod,data.frame(d=1:55)) # plot(lilo, bty="n", xaxt="n", yaxt="n", col="green", # type="l", ylim=c(0,1),ylab="",xlab="",lwd=2) # res3 <- LD.decay(markers=CPgeno, map=mapCP, # unlinked = TRUE,gamma = .95) # abline(h=res3$all.LG, col="red")nInds=300 nMarks=500 #random population of nInds lines with nMarks markers M <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } rownames(M) <- paste0("ind",1:nInds) colnames(M) <- paste0("mark",1:nMarks) mapM = data.frame(Locus=colnames(M),Position=1:nMarks,LG=1) res <- LD.decay(M, mapM) #### subset only markers with significant LD res$all.LG <- res$all.LG[which(res$all.LG$p < .001),] #### plot the LD decay with(res$all.LG, plot(r2~d,col=transp("cadetblue"), xlim=c(0,55), ylim=c(0,1), pch=20,cex=0.5,yaxt="n", xaxt="n",ylab=expression(r^2), xlab="Distance in cM") ) axis(1, at=seq(0,55,5), labels=seq(0,55,5)) axis(2,at=seq(0,1,.1), labels=seq(0,1,.1), las=1) #### if you want to add the loess regression lines #### this could take a long time!!! # mod <- loess(r2 ~ d, data=res$all.LG) # par(new=TRUE) # lilo <- predict(mod,data.frame(d=1:55)) # plot(lilo, bty="n", xaxt="n", yaxt="n", col="green", # type="l", ylim=c(0,1),ylab="",xlab="",lwd=2) # res3 <- LD.decay(markers=CPgeno, map=mapCP, # unlinked = TRUE,gamma = .95) # abline(h=res3$all.LG, col="red")
Legendre polynomials of order 'n' are created given a vector 'x' and normalized to lay between values u and v.
leg(x,n=1,u=-1,v=1, intercept=TRUE, intercept1=FALSE)leg(x,n=1,u=-1,v=1, intercept=TRUE, intercept1=FALSE)
x |
numeric vector to be used for the polynomial. |
n |
order of the Legendre polynomials. |
u |
lower bound for the polynomial. |
v |
upper bound for the polynomial. |
intercept |
a TRUE/FALSE value indicating if the intercept should be included. |
intercept1 |
a TRUE/FALSE value indicating if the intercept should have value 1 (is multiplied by sqrt(2)). |
an Legendre polynomial matrix of order n.
Giovanny Covarrubias-Pazaran
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
x <- sort(rep(1:3,100)) library(orthopolynom) leg(x, n=1) leg(x, n=2)x <- sort(rep(1:3,100)) library(orthopolynom) leg(x, n=1) leg(x, n=2)
logspace creates a vector with decreasing logaritmic trend.
logspace(x, p=2)logspace(x, p=2)
x |
sequence of values to pass through the function. |
p |
power to be applied to the values. |
a vector of length n with logarithmic decrease trend.
Giovanny Covarrubias-Pazaran
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
plot(logspace(1:100,p=1)) plot(logspace(1:100,p=2)) plot(logspace(1:100,p=3))plot(logspace(1:100,p=1)) plot(logspace(1:100,p=2)) plot(logspace(1:100,p=3))
This function was designed to create a manhattan plot using a data frame with columns "Chrom" (Chromosome), "Position" and "p.val" (significance for the test).
manhattan(map, col=NULL, fdr.level=0.05, show.fdr=TRUE, PVCN=NULL, ylim=NULL, ...)manhattan(map, col=NULL, fdr.level=0.05, show.fdr=TRUE, PVCN=NULL, ylim=NULL, ...)
map |
the data frame with 3 columns with names; "Chrom" (Chromosome), "Position" and "p.val" (significance for the test). |
col |
colors prefered by the user to be used in the manhattan plot. The default is NULL which will use the red-blue palette. |
fdr.level |
false discovery rate to be drawn in the plot. |
show.fdr |
a TRUE/FALSE value indicating if the FDR value should be shown in the manhattan plot or not. By default is TRUE meaning that will be displayed. |
PVCN |
In case the user wants to provide the name of the column that should be treated as the "p.val" column expected by the program in the 'map' argument. |
ylim |
the y axis limits for the manhattan plot if the user wants to customize it. By default the plot will reflect the minimum and maximum values found. |
... |
additional arguments to be passed to the plot function such as pch, cex, etc. |
If all parameters are correctly indicated the program will return:
a manhattan plot
Giovanny Covarrubias-Pazaran
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
#random population of 200 lines with 1000 markers M <- matrix(rep(0,200*1000),1000,200) for (i in 1:200) { M[,i] <- ifelse(runif(1000)<0.5,-1,1) } colnames(M) <- 1:200 set.seed(1234) pp <- abs(rnorm(500,0,3));pp[23:34] <- abs(rnorm(12,0,20)) geno <- data.frame(Locus=paste("m",1:500, sep="."),Chrom=sort(rep(c(1:5),100)), Position=rep(seq(1,100,1),5), p.val=pp, check.names=FALSE) geno$Locus <- as.character(geno$Locus) ## look at the data, 5LGs, 100 markers in each ## -log(p.val) value for simulated trait head(geno) tail(geno) manhattan(geno)#random population of 200 lines with 1000 markers M <- matrix(rep(0,200*1000),1000,200) for (i in 1:200) { M[,i] <- ifelse(runif(1000)<0.5,-1,1) } colnames(M) <- 1:200 set.seed(1234) pp <- abs(rnorm(500,0,3));pp[23:34] <- abs(rnorm(12,0,20)) geno <- data.frame(Locus=paste("m",1:500, sep="."),Chrom=sort(rep(c(1:5),100)), Position=rep(seq(1,100,1),5), p.val=pp, check.names=FALSE) geno$Locus <- as.character(geno$Locus) ## look at the data, 5LGs, 100 markers in each ## -log(p.val) value for simulated trait head(geno) tail(geno) manhattan(geno)
This function was designed to create a genetic map plot using a data frame indicating the Linkage Group (LG), Position and marker names (Locus).
map.plot(data, trait = NULL, trait.scale = "same", col.chr = NULL, col.trait = NULL, type = "hist", cex = 0.4, lwd = 1, cex.axis = 0.4, cex.trait=0.8, jump = 5)map.plot(data, trait = NULL, trait.scale = "same", col.chr = NULL, col.trait = NULL, type = "hist", cex = 0.4, lwd = 1, cex.axis = 0.4, cex.trait=0.8, jump = 5)
data |
the data frame with 3 columns with names; Locus, LG and Position |
trait |
if something wants to be plotted next the linkage groups the user must indicate the name of the column containing the values to be ploted, i.e. p-values, LOD scores, X2 segregation distortion values, etc. |
trait.scale |
is trait is not NULL, this is a character value indicating if the y axis limits for the trait plotted next to the chromosomes should be the same or different for each linkage group. The default value is "same", which means that the same y axis limit is conserved across linkage groups. For giving an individual y axis limit for each linkage group write "diff". |
col.chr |
a vector with color names for the chromosomes. If NULL they will be drawn in gray-black scale. |
col.trait |
a vector with color names for the dots, lines or histogram for the trait plotted next to the LG's |
type |
a character value indicating if the trait should be plotted as scatterplot 'dot', histogram 'hist', line 'line' next to the chromosomes. |
cex |
the cex value determining the size of the cM position labels in the LGs |
lwd |
the width of the lines in the plot |
cex.axis |
the cex value for sizing the labels of LGs and traits plotted (top labels) |
cex.trait |
the cex value for sizing the dots or lines of the trait plotted |
jump |
a scalar value indicating how often should be drawn a number next to the LG indicating the position. The default is 5 which means every 5 cM a label will be drawn, i.e. 0,5,10,15,... cM. |
If all parameters are correctly indicated the program will return:
a plot with the LGs and the information used to create a plot
Giovanny Covarrubias-Pazaran
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
#random population of 200 lines with 1000 markers M <- matrix(rep(0,200*1000),1000,200) for (i in 1:200) { M[,i] <- ifelse(runif(1000)<0.5,-1,1) } colnames(M) <- 1:200 set.seed(1234) geno <- data.frame(Locus=paste("m",1:500, sep="."),LG=sort(rep(c(1:5),100)), Position=rep(seq(1,100,1),5), X2=rnorm(500,10,4), check.names=FALSE) geno$Locus <- as.character(geno$Locus) ## look at the data, 5LGs, 100 markers in each ## X2 value for segregation distortion simulated head(geno) tail(geno) table(geno$LG) # 5 LGs, 100 marks map.plot(geno, trait="X2", type="line") map.plot(geno, trait="X2", type="hist") map.plot(geno, trait="X2", type="dot") data("DT_cpdata") MP <- MP_cpdata colnames(MP)[3] <- c("LG") head(MP) map.plot(MP, type="line", cex=0.6)#random population of 200 lines with 1000 markers M <- matrix(rep(0,200*1000),1000,200) for (i in 1:200) { M[,i] <- ifelse(runif(1000)<0.5,-1,1) } colnames(M) <- 1:200 set.seed(1234) geno <- data.frame(Locus=paste("m",1:500, sep="."),LG=sort(rep(c(1:5),100)), Position=rep(seq(1,100,1),5), X2=rnorm(500,10,4), check.names=FALSE) geno$Locus <- as.character(geno$Locus) ## look at the data, 5LGs, 100 markers in each ## X2 value for segregation distortion simulated head(geno) tail(geno) table(geno$LG) # 5 LGs, 100 marks map.plot(geno, trait="X2", type="line") map.plot(geno, trait="X2", type="hist") map.plot(geno, trait="X2", type="dot") data("DT_cpdata") MP <- MP_cpdata colnames(MP)[3] <- c("LG") head(MP) map.plot(MP, type="line", cex=0.6)
'neMarker' uses a marker matrix to approximate the effective population size (Ne) by discovering how many individuals are needed to sample all possible alleles in a population.
neMarker(M, neExplore=NULL, maxMarker=1000, nSamples=5)neMarker(M, neExplore=NULL, maxMarker=1000, nSamples=5)
M |
marker matrix coded in a numerical faashion (any allele dosage is fine). |
neExplore |
a vector of numbers with the effective population sizes to be explored. |
maxMarker |
maximum number of markers to use for the analysis. |
nSamples |
number of individuals to sample for the Ne calculation. |
A vector with allele coverage based on different number of individuals
Giovanny Covarrubias-Pazaran
Not based on any theory published yet but in a solid intuition on what is really important for a breeding program when we ask what is the effective population size
nInds=300 nMarks=1000 #random population of nInds lines with nMarks markers M <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } # run the function ne <- neMarker(M, neExplore = seq(2,300,100), nSamples = 5) # plot(ne$Ne)nInds=300 nMarks=1000 #random population of nInds lines with nMarks markers M <- matrix(rep(0,nInds*nMarks),nInds,nMarks) for (i in 1:nInds) { M[i,] <- ifelse(runif(nMarks)<0.5,-1,1) } # run the function ne <- neMarker(M, neExplore = seq(2,300,100), nSamples = 5) # plot(ne$Ne)
'overlay' adds r times the design matrix for model term t to the existing design matrix. Specifically, if the model up to this point has p effects and t has a effects, the a columns of the design matrix for t are multiplied by the scalar r (default value 1.0). This can be used to force a correlation of 1 between two terms as in a diallel analysis.
overlay(..., rlist=NULL, prefix=NULL, sparse=FALSE)overlay(..., rlist=NULL, prefix=NULL, sparse=FALSE)
... |
as many vectors as desired to overlay. |
rlist |
a list of scalar values indicating the times that each incidence matrix overlayed should be multiplied by. By default r=1. |
prefix |
a character name to be added before the column names of the final overlay matrix. This may be useful if you have entries with names starting with numbers which programs such as asreml doesn't like, or for posterior extraction of parameters, that way 'grep'ing is easier. |
sparse |
a TRUE/FALSE statement specifying if the matrices should be built as sparse or regular matrices. |
an incidence matrix with as many columns levels in the vectors provided to build the incidence matrix.
Giovanny Covarrubias-Pazaran
Fikret Isik. 2009. Analysis of Diallel Mating Designs. North Carolina State University, Raleigh, USA.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
femalef <- as.factor(c("a","b","c")) malef <- as.factor(c("d","e","a")) overlay(femalef,malef, sparse = TRUE) overlay(femalef,malef, sparse = FALSE)femalef <- as.factor(c("a","b","c")) malef <- as.factor(c("d","e","a")) overlay(femalef,malef, sparse = TRUE) overlay(femalef,malef, sparse = FALSE)
propMissing quick calculation of the proportion of missing data in a vector.
propMissing(x)propMissing(x)
x |
vector of observations. |
a numeric value with the proportion of missing data.
Giovanny Covarrubias-Pazaran
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommec. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
x <- c(1:10, NA) propMissing(x)x <- c(1:10, NA) propMissing(x)
'redmm' reduces a model matrix by performing a singular value decomposition or Cholesky on an incidence matrix.
redmm(x, M = NULL, Lam=NULL, nPC=50, cholD=FALSE, returnLam=FALSE)redmm(x, M = NULL, Lam=NULL, nPC=50, cholD=FALSE, returnLam=FALSE)
x |
as vector with values to form a model matrix or the complete incidence matrix itself for an effect of interest. |
M |
an optional matrix of features explaining the levels of x. If not provided is assumed that the entire incidence matrix has been provided in x. But if provided, the decomposition occurs in the matrix M. |
Lam |
a matrix of loadings in case is already available to avoid recomputing it. |
nPC |
number of principal components to keep from the matrix of loadings to form the model matrix. |
cholD |
should a Cholesky or a Singular value decomposition should be used. The default is the SVD. |
returnLam |
should the function return the loading matrix in addition to the incidence matrix. Default is FALSE. |
A list with 3 elements:
1) The model matrix to be used in the mixed modeling.
2) The reduced matrix of loadings (nPC columns).
3) The full matrix of loadings.
Giovanny Covarrubias-Pazaran
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
# data set including the covariate to be reduced DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2))) DT # GxE matrix H0 <- matrix(rnorm(30), nrow=5, ncol=6) # 5 envs, 6 features rownames(H0) <- paste("e",1:5,sep="_") colnames(H0) <- paste("x",1:6,sep="_") H0 # get compression Z <- with(DT, redmm(Env, M = H0, nPC = 3)) Z# data set including the covariate to be reduced DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2))) DT # GxE matrix H0 <- matrix(rnorm(30), nrow=5, ncol=6) # 5 envs, 6 features rownames(H0) <- paste("e",1:5,sep="_") colnames(H0) <- paste("x",1:6,sep="_") H0 # get compression Z <- with(DT, redmm(Env, M = H0, nPC = 3)) Z
Simple function to map a vector of values to another.
replaceValues(Source, Search, Replace)replaceValues(Source, Search, Replace)
Source |
A source of values to change. |
Search |
The values to search. |
Replace |
The equivalent values to replace. |
Simple function to map a vector of values another.
new values from a second vector.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
x <- letters[1:5] y <- 1:5 z <- rep(letters[1:5],3) replaceValues(Source=z, Search=x, Replace=y)x <- letters[1:5] y <- 1:5 z <- rep(letters[1:5],3) replaceValues(Source=z, Search=x, Replace=y)
rrm creates a reduced rank factor analytic matrix by selecting the n vectors of the L matrix of the Cholesky decomposition or the U vectors of the SVD decomposition (loadings or latent covariates) to create a new incidence matrix of latent covariates that can be used with the lmeb solver to fit random regressions on the latent covariates.
rrm(x=NULL, H=NULL, nPC=2, returnGamma=FALSE, cholD=TRUE)rrm(x=NULL, H=NULL, nPC=2, returnGamma=FALSE, cholD=TRUE)
x |
vector of the dataset containing the variable to be used to form the incidence matrix. |
H |
two-way table of identifiers (rows; e.g., genotypes) by features (columns; e.g., environments) effects. Row names and column names are required. No missing data is allowed. |
nPC |
number of principal components to keep from the loadings matrix. |
returnGamma |
a TRUE/FALSE argument specifying if the function should return the matrix of loadings used to build the incidence matrix for the model. The default is FALSE so it returns only the incidence matrix. |
cholD |
a TRUE/FALSE argument specifying if the Cholesky decomposition should be calculated or the singular value decomposition should be used instead. |
This implementation of a version of the reduced rank factor analytic models uses the so-called principal component (PC) models (Meyer, 2009) which assumes specific effects (psi) are equal to 0. The model is as follows:
y = Xb + Zu + e
where the variance of u ~ MVN(0, Sigma)
Sigma = (Gamma_t Gamma) + Psi
Extended factor analytic model:
y = Xb + Z(I Gamma)c + Zs + e = Xb + Z*c + Zs + e
where y is the response variable, X and Z are incidence matrices for fixed and random effects respectively, I is a diagonal matrix, Gamma are the factor loadings for c common factor scores, and s are the specific effects, e is the vector of residuals.
Reduced rank model:
y = Xb + Z(I Gamma)c + e = Xb + Z*c + e
which is equal to the one above but assumes specific effects = 0.
The algorithm in rrm is the following:
1) uses a wide-format table of timevar (m columns) by idvar (q rows) named H to form the initial variance-covariance matrix (Sigma) which is calculated as Sigma = H'H of dimensions m x m (column dimensions, e.g., environments x environments).
2) The Sigma matrix is then center and scaled.
3) A Cholesky (L matrix) or SVD decomposition (U D V') is performed in the Sigma matrix.
4) n vectors from L (when Cholesky is used) or U sqrt(D) (when SVD is used) are kept to form Gamma. Gamma = L[,1:nPc] or Gamma = U[,1:nPC]. These are the so-called loadings (L for all loadings, Gamma for the subset of loadings).
4) Gamma is used to form a new incidence matrix as Z* = Z Gamma
5) This matrix is later used for the REML machinery to be used with the usc (unstructured) or smm (diagonal) structures to estimate variance components and factor scores. The resulting BLUPs from the mixed model are the optimized factor scores. Pretty much as a random regression over latent covariates.
This implementation does not update the loadings (latent covariates) during the REML process, only estimates the REML factor scores for fixed loadings. This is different to other software (e.g., asreml) where the loadings are updated during the REML process as well.
BLUPs for genotypes in all locations can be recovered as:
u = Gamma * u_scores
The resulting loadings (Gamma) and factor scores can be thought as an equivalent to the classical factor analysis.
As an additional information, notice that we calculate the factor loadings from BLUPs and the mixed model only calculates the factor scores. This is different to the asreml software where loadings are calculated as variance components through REML. Despite the difference we have run multiple examples and simulations and the BLUPs from both approaches are on average >0.98 correlated so you can be confident that our approach is robust.
a incidence matrix Z* = Z Gamma which is the original incidence matrix for the timevar multiplied by the loadings.
a matrix of loadings or latent covariates.
the covariance matrix used to calculate Gamma.
Giovanny Covarrubias-Pazaran
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Meyer K (2009) Factor analytic models for genotype by environment type problems and structured covariance matrices. Genetics Selection Evolution, 41:21
# data set including the covariate to be reduced DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2))) DT # GxE matrix H0 <- matrix(rnorm(30), nrow=6, ncol=5) # 5 envs, 6 inds colnames(H0) <- paste("e",1:5,sep="_") rownames(H0) <- paste("x",1:6,sep="_") H0 # get latent factors Z <- with(DT, rrm(Env, H = H0, nPC = 3)) Z# data set including the covariate to be reduced DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2))) DT # GxE matrix H0 <- matrix(rnorm(30), nrow=6, ncol=5) # 5 envs, 6 inds colnames(H0) <- paste("e",1:5,sep="_") rownames(H0) <- paste("x",1:6,sep="_") H0 # get latent factors Z <- with(DT, rrm(Env, H = H0, nPC = 3)) Z
image for sparsity.
simage(data, Var1=NULL, Var2=NULL, ...)simage(data, Var1=NULL, Var2=NULL, ...)
data |
model data of class |
Var1 |
variable to set in the x axis |
Var2 |
variable to set in the y axis |
... |
Further arguments to be passed to the image function. |
vector of image
Giovanny Covarrubias
# row x column combinations existing DT <- data.frame(Row=sort(rep(1:10,5)), Col=rep(1:5,10)) DT$Rowf <- as.factor(DT$Row) DT$Colf <- as.factor(DT$Col) DT <- DT[-1,] # remove one intentionally simage(data=DT, Var1 = "Rowf", Var2 = "Colf") # you can also inpect which cross combinations exist# row x column combinations existing DT <- data.frame(Row=sort(rep(1:10,5)), Col=rep(1:5,10)) DT$Rowf <- as.factor(DT$Row) DT$Colf <- as.factor(DT$Col) DT <- DT[-1,] # remove one intentionally simage(data=DT, Var1 = "Rowf", Var2 = "Colf") # you can also inpect which cross combinations exist
image for sparsity.
simage2(X, ...)simage2(X, ...)
X |
object of class |
... |
Further arguments to be passed to the image function. |
vector of image
Giovanny Covarrubias
simage2(matrix(0:8,3,3))simage2(matrix(0:8,3,3))
Makes a simple correlation matrix based on the number of environments and megaenvironments desired.
simGECorMat(nEnv,nMegaEnv,mu=0.7, v=0.2, mu2=0, v2=0.3)simGECorMat(nEnv,nMegaEnv,mu=0.7, v=0.2, mu2=0, v2=0.3)
nEnv |
Number of environments to simulate. Needs to be divisible by the nMegaEnv argument. |
nMegaEnv |
Number of megaenvironments to simulate. |
mu |
Mean value of the genetic correlation within megaenvironments. |
v |
variance in the genetic correlation within megaenvironments. |
mu2 |
Mean value of the genetic correlation between megaenvironments. |
v2 |
variance in the genetic correlation between megaenvironments. |
Simple simulation of a correlation matrix for environments and megaenvironments.
G the correlation matrix
the correlation matrix
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
simGECorMat(9,3)simGECorMat(9,3)
Simple function to map a vector of values to the range of 0 and 1 values to have a better behavior of the algorithm.
stan(x, lb=0, ub=1)stan(x, lb=0, ub=1)
x |
A vector of numeric values. |
lb |
Lower bound value to map the x values. |
ub |
Upper bound value to map the x values. |
Simple function to map a vector of values to the range of 0 and 1 values to have a better behavior of the algorithm.
new values in range 0 to 1
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
x <- rnorm(20, 10, 3);x stan(x)x <- rnorm(20, 10, 3);x stan(x)
tps is a wrapper of tpsmmb function from the TPSbits package to avoid version dependencies but if you're using this function for your research please cite the TPSbits package. This function is internally used by the spl2Dmatrices function to get Tensor-Product P-Spline Mixed Model Bits
(design matrices) for use with sommer and lme4breeding.
tps( columncoordinates, rowcoordinates, nsegments=NULL, minbound=NULL, maxbound=NULL, degree = c(3, 3), penaltyord = c(2, 2), nestorder = c(1, 1), asreml = "grp", eigenvalues = "include", method = "Lee", stub = NULL )tps( columncoordinates, rowcoordinates, nsegments=NULL, minbound=NULL, maxbound=NULL, degree = c(3, 3), penaltyord = c(2, 2), nestorder = c(1, 1), asreml = "grp", eigenvalues = "include", method = "Lee", stub = NULL )
columncoordinates |
A string. Gives the name of |
rowcoordinates |
A string. Gives the name of |
nsegments |
A list of length 2. Number of segments to split column and row ranges into, respectively (= number of internal knots + 1). If only one number is specified, that value is used in both dimensions. If not specified, (number of unique values - 1) is used in each dimension; for a grid layout (equal spacing) this gives a knot at each data value. |
minbound |
A list of length 2. The lower bound to be used for column and row dimensions respectively; default calculated as the minimum value for each dimension. |
maxbound |
A list of length 2. The upper bound to be used for column and row dimensions respectively; default calculated as the maximum value for each dimension. |
degree |
A list of length 2. The degree of polynomial spline to be used for column and row dimensions respectively; default=3. |
penaltyord |
A list of length 2. The order of differencing for column and row dimensions, respectively; default=2. |
nestorder |
A list of length 2. The order of nesting for column and row dimensions, respectively; default=1 (no nesting). A value of 2 generates a spline with half the number of segments in that dimension, etc. The number of segments in each direction must be a multiple of the order of nesting. |
asreml |
A string. Indicates the types of structures to be generated
for use in asreml models; default
|
eigenvalues |
A string. Indicates whether eigenvalues should be
included within the Z design matrices |
method |
A string. Method for forming the penalty; default= |
stub |
A string. Stub to be attached to names in the |
List of length 7, 8 or 9 (according to the asreml and
eigenvalues parameter settings).
data = the input data frame augmented with structures required
to fit tensor product splines in asreml-R. This data frame can be used
to fit the TPS model.
Added columns:
TP.col, TP.row = column and row coordinates
TP.CxR = combined index for use with smooth x smooth term
TP.C.n for n=1:(diff.c) = X parts of column spline for use
in random model (where diff.c is the order of column differencing)
TP.R.n for n=1:(diff.r) = X parts of row spline for use in
random model (where diff.r is the order of row differencing)
TP.CR.n for n=1:((diff.c*diff.r)) = interaction between the
two X parts for use in fixed model. The first variate is
a constant term which should be omitted from the model when the constant
(1) is present. If all elements are
included in the model then the constant term should be omitted,
eg. y ~ -1 + TP.CR.1 + TP.CR.2 + TP.CR.3 + TP.CR.4 + other terms...
when asreml="grp" or "sepgrp", the spline basis
functions are also added into the data frame. Column numbers for each
term are given in the grp list structure.
mbflist = list that can be used in call to asreml (so long as Z
matrix data frames extracted with right names, eg BcZ<stub>.df)
BcZ.df = mbf data frame mapping onto smooth part of column
spline, last column (labelled TP.col) gives column index
BrZ.df = mbf data frame mapping onto smooth part of row spline,
last column (labelled TP.row) gives row index
BcrZ.df = mbf data frame mapping onto smooth x smooth term, last
column (labelled TP.CxR) maps onto col x row combined index
dim = list structure, holding dimension values relating to the
model:
"diff.c" = order of differencing used in column dimension
"nbc" = number of random basis functions in column dimension
"nbcn" = number of nested random basis functions in column dimension
used in smooth x smooth term
"diff.r" = order of differencing used in column dimension
"nbr" = number of random basis functions in column dimension
"nbrn" = number of nested random basis functions in column dimension
used in smooth x smooth term
trace = list of trace values for ZGZ' for the random TPspline
terms, where Z is the design matrix and G
is the known diagonal variance matrix derived from eigenvalues. This can
be used to rescale the spline design matrix (or equivalently variance
components).
grp = list structure, only added for settings
asreml="grp", asreml="sepgrp" or asreml="own".
For asreml="grp", provides column indexes for each of the 5
random components of the 2D splines.
For asreml="sepgrp", provides column indexes for each of the X and
Z component matrices for the 1D splines, plus the composite smooth x
smooth interaction term. For asreml="own", provides column indexes
for the composite random model.
Dimensions of the components can be derived from the values in the
dim item. The Z terms are scaled by the associated
eigenvalues when eigenvalues="include", but not when
eigenvalues="omit".
eigen = list structure, only added for option setting
eigenvalues="omit". Holds the diagonal elements of the inverse
variance matrix for the terms Xc:Zr (called diagr), Zc:Xr
(called diagc) and Zc:Zr (called diagcr).
This function takes a color and returns the same with a certain alpha grade transparency.
transp(col, alpha=0.5)transp(col, alpha=0.5)
col |
Color to be used for transparency |
alpha |
Grade of transparency desired |
No major details.
If arguments are correctly specified the function returns:
A new color with certain grade of transparency
Robert J. Henry. 2013. Molecular Markers in Plants. Wiley-Blackwell. ISBN 978-0-470-95951-0.
Ben Hui Liu. 1998. Statistical Genomics. CRC Press LLC. ISBN 0-8493-3166-8.
transp("red", alpha=0.5)transp("red", alpha=0.5)
Computes a Wald test for 1 or more coefficients, given their variance-covariance matrix.
wald.test(Sigma, b, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE) ## S3 method for class 'wald.test' print(x, digits = 2, ...)wald.test(Sigma, b, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE) ## S3 method for class 'wald.test' print(x, digits = 2, ...)
Sigma |
A var-cov matrix, usually extracted from one of the fitting functions (e.g., |
b |
A vector of coefficients with var-cov matrix |
Terms |
An optional integer vector specifying which coefficients should be jointly tested, using a Wald
|
L |
An optional matrix conformable to |
H0 |
A numeric vector giving the null hypothesis for the test. It must be as long as |
df |
A numeric vector giving the degrees of freedom to be used in an |
verbose |
A logical scalar controlling the amount of output information. The default is |
x |
Object of class “wald.test” |
digits |
Number of decimal places for displaying test results. Default to 2. |
... |
Additional arguments to |
The key assumption is that the coefficients asymptotically follow a (multivariate) normal distribution with mean =
model coefficients and variance = their var-cov matrix.
One (and only one) of Terms or L must be given. When L is given, it must have the same number of
columns as the length of b, and the same number of rows as the number of linear combinations of coefficients.
When df is given, the Wald statistic is divided by m = the number of
linear combinations of coefficients to be tested (i.e., length(Terms) or nrow(L)). Under the null
hypothesis H0, this new statistic follows an distribution.
An object of class wald.test, printed with print.wald.test.
Diggle, P.J., Liang, K.-Y., Zeger, S.L., 1994. Analysis of longitudinal data. Oxford, Clarendon Press, 253 p.
Draper, N.R., Smith, H., 1998. Applied Regression Analysis. New York, John Wiley & Sons, Inc., 706 p.
data("mtcars") DT <- mtcars head(DT) m3 <- lm(mpg ~ as.factor(carb), data = DT) # extrac coefficients and variance of coefficients wald.test(b = m3$coefficients, Sigma = vcov(m3), Terms = 2) # make a test for a specific linear combination LL <- matrix(0,nrow=1, ncol=6) LL[1,2] <- 1 LL[1,3] <- -1 LL wald.test(b = m3$coefficients, Sigma = vcov(m3), L=LL) # sommer has 'Ci' slot for Sigma and 'bu' for coefficients # lme4breeding has condVar() function to extract Sigma for random effects and # vcov() for fixed effect and has fixef() and ranef() to extract fixed and # random coefficients.data("mtcars") DT <- mtcars head(DT) m3 <- lm(mpg ~ as.factor(carb), data = DT) # extrac coefficients and variance of coefficients wald.test(b = m3$coefficients, Sigma = vcov(m3), Terms = 2) # make a test for a specific linear combination LL <- matrix(0,nrow=1, ncol=6) LL[1,2] <- 1 LL[1,3] <- -1 LL wald.test(b = m3$coefficients, Sigma = vcov(m3), L=LL) # sommer has 'Ci' slot for Sigma and 'bu' for coefficients # lme4breeding has condVar() function to extract Sigma for random effects and # vcov() for fixed effect and has fixef() and ranef() to extract fixed and # random coefficients.