Bastos Lab – lec06-crd

Today’s goals

Explore key concepts in CRD:

Homogeneous experimental unit
Treatment randomization
The effects model
The ANOVA table
The linear model assumptions

Motivational example - Treatment design

2-way factorial
N fertilizer rates: 0, 100, 200 kg N/ha
K fertilizer rates: 0, 30, 60 kg K/ha
3 x 3 = 9 treatment combinations
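
As a quick sketch, the 9 treatment combinations can be enumerated by crossing the two factors. The numbering of treatments 1 through 9 below is an assumption for illustration; the source does not say which combination gets which number.

```python
from itertools import product

n_rates = [0, 100, 200]  # kg N/ha, from the treatment design above
k_rates = [0, 30, 60]    # kg K/ha, from the treatment design above

# Full factorial: every N rate is paired with every K rate
treatments = list(product(n_rates, k_rates))

# Treatment IDs are hypothetical labels, assigned in enumeration order
for trt_id, (n_rate, k_rate) in enumerate(treatments, start=1):
    print(f"Trt {trt_id}: {n_rate} kg N/ha + {k_rate} kg K/ha")

print(len(treatments))  # 9
```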

Experimental design

Assume we have homogeneous experimental material (e.g., same soil type and topography)

Thus, we can use a completely randomized design (CRD)

Treatment randomization

Randomization guards against unknown or uncontrollable sources of bias
Avoids systematic patterns
Eliminates selective assignment of experimental units (EUs) to treatments (conscious or unconscious!)
Randomization allows for valid inference on CAUSATION

Treatment randomization - CRD

Randomization of a treatment to an EU is unrestricted.
That means that replicates of the same treatment could, by random chance, fall right next to each other.

In our motivational example:

4 replicates
Total observations: 9 x 4 = 36 EUs
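
A minimal sketch of an unrestricted (CRD) randomization for this example. The seed and the 4 x 9 field shape are assumptions made only so the sketch is reproducible and printable; they are not taken from the study.

```python
import random

n_treatments = 9  # 3 N rates x 3 K rates
n_reps = 4

# One label per EU: each treatment appears n_reps times
labels = [trt for trt in range(1, n_treatments + 1) for _ in range(n_reps)]

rng = random.Random(42)  # arbitrary seed, only for reproducibility
rng.shuffle(labels)      # unrestricted: any treatment can land on any plot

# Hypothetical field of 4 rows x 9 columns (the shape is an assumption)
n_cols = 9
layout = [labels[i:i + n_cols] for i in range(0, len(labels), n_cols)]
for row in layout:
    print(" ".join(f"{trt:2d}" for trt in row))
```

Because nothing restricts the shuffle, replicates of the same treatment may end up adjacent or bunched in one column.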

Homogeneous Experimental Material - CRD ✅

In the plot layout here, each treatment (1 through 9) could be randomly assigned to any experimental unit (plot) in the study area.

Homogeneous Experimental Material - CRD ✅

Treatment 1 and its replicates are highlighted.
Note how, due to the unrestricted randomization, treatment 1 appears twice in the first column and does not appear in the third column. The same happened with other treatments.

Homogeneous Experimental Material - CRD ✅

Because the experimental material is homogeneous (e.g., same soil texture class), this should not be an issue when estimating treatment means and performing comparisons. 👍

The effects model

\[ y_{ijk} = \mu + \alpha_{i} + \beta_{j} + \alpha\beta_{ij} + e_{ijk} \]

\(y_{ijk}\) is the observation on the kth replicate of the ith N rate and jth K rate
\(\mu\) is the overall mean
\(\alpha_{i}\) is the differential effect of the ith N rate
\(\beta_{j}\) is the differential effect of the jth K rate
\(\alpha\beta_{ij}\) is the differential effect of the combination of the ith N rate and jth K rate
\(e_{ijk}\) is the residual corresponding to the kth replicate of N rate i and K rate j.
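
To make the terms concrete, here is a sketch that splits a table of cell means into the overall mean, the two main effects, and the interaction. The cell means are hypothetical (not the study data), and the sum-to-zero parameterization used here is an assumption; other software may use a different constraint.

```python
# Hypothetical cell means (rows: N rates, columns: K rates); NOT the study data
cell_means = [
    [10.0, 12.0, 14.0],
    [13.0, 15.0, 20.0],
    [16.0, 21.0, 23.0],
]
n_levels, k_levels = 3, 3

# Overall mean (mu)
mu = sum(sum(row) for row in cell_means) / (n_levels * k_levels)

# Marginal means for each N rate (rows) and each K rate (columns)
n_means = [sum(row) / k_levels for row in cell_means]
k_means = [sum(cell_means[i][j] for i in range(n_levels)) / n_levels
           for j in range(k_levels)]

# Differential effects under the sum-to-zero constraint (an assumption)
alpha = [m - mu for m in n_means]   # N rate effects
beta = [m - mu for m in k_means]    # K rate effects
alpha_beta = [[cell_means[i][j] - mu - alpha[i] - beta[j]
               for j in range(k_levels)]
              for i in range(n_levels)]  # interaction effects

# Recomposing the terms gives back each cell mean
for i in range(n_levels):
    for j in range(k_levels):
        rebuilt = mu + alpha[i] + beta[j] + alpha_beta[i][j]
        assert abs(rebuilt - cell_means[i][j]) < 1e-9

print(mu, alpha, beta)
```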

The ANOVA table

In the following ANOVA table…

n is number of levels in N rate = 3
k is number of levels in K rate = 3
r is number of replicates = 4
N is total number of observations or EUs = 3 x 3 x 4 = 36

The ANOVA table

Source of variation	df	SS	MS	F ratio
N rate	dfn = n - 1 = 3 - 1 = 2	SSn	MSn = SSn / dfn	MSn / MSe
K rate	dfk = k - 1 = 3 - 1 = 2	SSk	MSk = SSk / dfk	MSk / MSe
N x K	dfnk = (n - 1) x (k - 1) = (3-1) x (3-1) = 4	SSnk	MSnk = SSnk / dfnk	MSnk / MSe
Error	dfe = nk(r - 1) = 3x3(4-1) = 9x3 = 27	SSe	MSe = SSe / dfe
TOTAL	dft = N -1 = 35	SSt

Each component of effects model has a row in the table
The larger the F ratio, the stronger the evidence that an effect is significant.
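
The degrees of freedom in the table can be checked with a few lines. This sketch only restates the formulas above; the variable names are illustrative.

```python
n = 3  # levels of N rate
k = 3  # levels of K rate
r = 4  # replicates

n_obs = n * k * r            # total EUs (N in the table)
df_n = n - 1                 # N rate
df_k = k - 1                 # K rate
df_nk = (n - 1) * (k - 1)    # N x K interaction
df_error = n * k * (r - 1)   # error
df_total = n_obs - 1         # total

# Treatment and error df must add up to the total df
assert df_n + df_k + df_nk + df_error == df_total

print(df_n, df_k, df_nk, df_error, df_total)  # 2 2 4 27 35
```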

The ANOVA table - let’s think about it

What can I do to increase the chances of finding a significant effect? In other words, how can I \(\uparrow\) F ratios?

Have more variability explained by treatment (\(\uparrow\) SSn ~ MSn)
Have less variability left in the error (\(\downarrow\) MSe), through:
Less noise in the error (\(\downarrow\) SSe), or
More error degrees of freedom (\(\uparrow\) dfe)

ANOVA table - motivational example

Source of variation	Sum Sq	Df	F value	Pr(>F)
(Intercept)	836,785,505.5	1	1,917.9470762	<0.001
nrate_kgha	1,490,936.1	2	1.7086437	0.2
krate_kgha	470,561.8	2	0.5392735	0.589
nrate_kgha:krate_kgha	11,108,159.3	4	6.3650904	<0.001
Residuals	11,779,891.6	27

What is significant here (say at \(\alpha = 0.05\))?
But wait: before inference, we need to check the model assumptions (based on residuals)!
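
As a sanity check, the F values in the table can be recomputed from the Sum Sq and Df columns using the formulas from the generic ANOVA table (MS = SS / df, F = MS / MSe). The numbers below are copied from the table; the dictionary layout is just for illustration.

```python
# Sum Sq and Df copied from the ANOVA table above
terms = {
    "nrate_kgha": (1_490_936.1, 2),
    "krate_kgha": (470_561.8, 2),
    "nrate_kgha:krate_kgha": (11_108_159.3, 4),
}
ss_error, df_error = 11_779_891.6, 27

# Mean square error: the denominator of every F ratio
ms_error = ss_error / df_error

f_ratios = {}
for term, (ss, df) in terms.items():
    ms = ss / df                     # mean square of the term
    f_ratios[term] = ms / ms_error   # F ratio = MS term / MS error
    print(f"{term}: MS = {ms:,.1f}, F = {f_ratios[term]:.4f}")
```

The recomputed values agree with the F value column of the table up to rounding.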

The linear model residual(s)

\[ \hat{e}_{ijk} = y_{ijk} - \hat{y}_{ijk} \]

Or, in other words…

\[ \text{residual} = \text{observed} - \text{predicted} \]
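
A small numeric sketch of the formula. The four values are hypothetical replicates of a single N x K combination, not study data. With the interaction term in the model, the predicted value for every replicate of a combination is that combination's mean.

```python
from statistics import mean

# Hypothetical observations for the 4 replicates of ONE treatment combination
observed = [4200.0, 4550.0, 3900.0, 4350.0]

# Full factorial model: the fitted value is the treatment-combination mean
predicted = mean(observed)

# residual = observed - predicted
residuals = [y - predicted for y in observed]

print(predicted)   # 4250.0
print(residuals)   # [-50.0, 300.0, -350.0, 100.0]
```

Within each treatment combination the residuals sum to zero.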

The linear model assumptions are based on the residuals!

\[ e_{ijk} \sim \text{iid } N(0, \sigma^2) \]

\(e_{ijk}\) is the residual corresponding to the kth replicate of N rate i and K rate j.

Random experimental errors (residuals) are assumed to be:

Mutually independent
Normally distributed with mean 0 and common (homogeneous) variance \(\sigma^2\) across treatments
Free of extreme observations (outliers)

Studentized residuals

There are different types of residuals, including the raw residual (the previous formula).
Raw residuals are not useful for assessing all assumptions (e.g., outliers), because they are on the scale of the response and have no universal cutoff
That limitation is addressed if we calculate the studentized residuals

Studentized residuals

Studentized residuals are the raw residuals divided by their standard deviation:

~95% of studentized residuals within ± 2 sd
~99% of studentized residuals within ± 3 sd
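
A minimal sketch of the calculation, under several assumptions: the data are hypothetical (3 treatment combinations x 4 replicates), the residuals are internally studentized (the source does not say which variant is used), and the design is balanced, so every observation has leverage 1/r.

```python
from math import sqrt
from statistics import mean

# Hypothetical responses: 3 treatment combinations x 4 replicates (NOT study data)
data = {
    "trt1": [10.0, 12.0, 11.0, 13.0],
    "trt2": [15.0, 14.0, 18.0, 17.0],
    "trt3": [20.0, 22.0, 19.0, 23.0],
}
r = 4  # replicates per treatment

# Raw residuals: observed minus treatment mean
raw = {trt: [y - mean(ys) for y in ys] for trt, ys in data.items()}

# Mean square error = SSe / dfe, with dfe = (number of treatments) x (r - 1)
ss_error = sum(e ** 2 for es in raw.values() for e in es)
df_error = len(data) * (r - 1)
ms_error = ss_error / df_error

# In a balanced design each observation has leverage 1 / r
leverage = 1 / r

# Internally studentized residual: raw residual / its estimated standard deviation
studentized = {
    trt: [e / sqrt(ms_error * (1 - leverage)) for e in es]
    for trt, es in raw.items()
}

# Flag potential outliers beyond +/- 3
flagged = [(trt, s) for trt, ss in studentized.items() for s in ss if abs(s) > 3]
print(flagged)  # [] for these hypothetical data
```

Because the result is in standard deviation units, the ±2 and ±3 reference lines apply regardless of the scale of the response.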

Valid analysis depends on valid assumptions!

If model assumptions are not met, inference is meaningless
Validity and repeatability of results are QUESTIONABLE
Always check residuals before making inference! If something is wrong with the assumptions, fix it before proceeding with inference!

Checking assumptions - residual independence

No clear pattern (evidenced by geom_smooth) ✅

Checking assumptions - residual homoscedasticity

Data spread is even and centered around zero (evidenced by geom_smooth) ✅

Checking assumptions - residual normality

It’s common for some residuals in the tails to be off, especially with a small N (N = 36). Nothing to worry about here ✅

Checking assumptions - residual outliers

A couple of residuals beyond -3 and 3, but not too many and not too far out. I wouldn’t worry about them ✅

Checking assumptions - summary

We confirmed that residuals were:

✅ Independent
✅ Homoscedastic (homogeneous variance)
✅ Normally distributed
✅ Free of outliers
Therefore, the model can be used for inference!

Summary

CRD is randomized without restrictions
Each term of the effects model has a row in the ANOVA table
Raw vs. Studentized residuals
If model assumptions are not met, inference is meaningless
Model residual assumptions: iidN(0,\(\sigma^2\))
