Explore key concepts in CRD:
A CRD assumes homogeneous experimental material (e.g., same soil type and topography).
Randomization guards against unknown or uncontrollable sources of bias
It avoids systematic patterns
It eliminates selective assignment of experimental units (EUs) to treatments (conscious or unconscious!)
Randomization allows for valid inference on CAUSATION
In a CRD, randomization of a treatment to an EU is unrestricted.
That means that replicates of the same treatment could, by random chance, fall right next to each other.
In our motivating example:
9 treatments (3 N rates x 3 K rates)
4 replicates
Total observations: 9 x 4 = 36 EUs
In the plot layout here, all treatments (1 through 9) were randomly assigned to any experimental unit (plot) in the study area.
Treatment 1 and its replicates are highlighted.
Note how, due to the unrestricted randomization, treatment 1 appears twice in the first column and does not appear in the third column. The same happened with other treatments.
Because the experimental material is homogeneous (e.g., same soil texture class), this should not be an issue when estimating treatment means and performing comparisons. 👍
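The unrestricted randomization described above can be sketched in a few lines. This is only an illustration: the function name `randomize_crd`, the 9-row by 4-column field shape, and the seed are assumptions, not details of the actual study layout.

```python
import random
from collections import Counter


def randomize_crd(n_treatments, n_reps, n_cols, seed=None):
    """Assign treatments to plots with unrestricted (CRD) randomization."""
    rng = random.Random(seed)
    # One label per experimental unit: each treatment repeated n_reps times.
    labels = [trt for trt in range(1, n_treatments + 1) for _ in range(n_reps)]
    # No blocking: any treatment can land on any plot.
    rng.shuffle(labels)
    # Cut the shuffled list into rows of the (hypothetical) field map.
    return [labels[i:i + n_cols] for i in range(0, len(labels), n_cols)]


# 9 treatments x 4 replicates = 36 plots, drawn here as 9 rows x 4 columns.
layout = randomize_crd(n_treatments=9, n_reps=4, n_cols=4, seed=42)
for row in layout:
    print(" ".join(f"{trt:2d}" for trt in row))

# Every treatment still appears exactly 4 times, wherever it landed.
counts = Counter(trt for row in layout for trt in row)
```

Because nothing restricts where a treatment can fall, rerunning with a different seed will often place two replicates of the same treatment side by side or leave a treatment out of a column entirely.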
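The effects model for this two-way factorial in a CRD is:
\[ y_{ijk} = \mu + \alpha_{i} + \beta_{j} + (\alpha\beta)_{ij} + e_{ijk} \]
where \(y_{ijk}\) is the observation on the kth replicate of N rate i and K rate j, \(\mu\) is the overall mean, \(\alpha_{i}\) is the effect of N rate i, \(\beta_{j}\) is the effect of K rate j, \((\alpha\beta)_{ij}\) is their interaction, and \(e_{ijk}\) is the residual.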
In the following ANOVA table…
Source of variation | df | SS | MS | F ratio |
---|---|---|---|---|
N rate | dfn = 3 - 1 = 2 | SSn | MSn = SSn / dfn | MSn / MSe |
K rate | dfk = 3 - 1 = 2 | SSk | MSk = SSk / dfk | MSk / MSe |
N x K | dfnk = 2 x 2 = 4 | SSnk | MSnk = SSnk / dfnk | MSnk / MSe |
Error | dfe = 9 x (4 - 1) = 27 | SSe | MSe = SSe / dfe | |
TOTAL | dft = 36 - 1 = 35 | SSt | | |
Each component of the effects model has a row in the table.
The larger the F ratio, the stronger the evidence that an effect is significant.
Source of variation | Sum Sq | Df | F value | Pr(>F) |
---|---|---|---|---|
(Intercept) | 836,785,505.5 | 1 | 1,917.9470762 | <0.001 |
nrate_kgha | 1,490,936.1 | 2 | 1.7086437 | 0.2 |
krate_kgha | 470,561.8 | 2 | 0.5392735 | 0.589 |
nrate_kgha:krate_kgha | 11,108,159.3 | 4 | 6.3650904 | <0.001 |
Residuals | 11,779,891.6 | 27 | | |
What is significant here (say at \(\alpha = 0.05\))?
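As a quick check on how the F ratios are built, the sketch below recomputes each mean square and F value from the Sum Sq and Df columns of the output table. The variable names are illustrative; the numbers are copied from the table.

```python
# Sum Sq and Df copied from the ANOVA output table.
anova = {
    "nrate_kgha": (1_490_936.1, 2),
    "krate_kgha": (470_561.8, 2),
    "nrate_kgha:krate_kgha": (11_108_159.3, 4),
}
ss_error, df_error = 11_779_891.6, 27

# Mean square error: the denominator of every F ratio.
ms_error = ss_error / df_error

f_ratios = {}
for term, (ss, df) in anova.items():
    ms = ss / df                      # mean square = SS / df
    f_ratios[term] = ms / ms_error    # F ratio = MS / MSe
    print(f"{term:<24} MS = {ms:>12.1f}  F = {f_ratios[term]:.4f}")
```

The recomputed values should agree with the F value column up to the rounding of the Sum Sq entries.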
But wait: before making inference, we need to check the model assumptions (based on residuals)!
\[ \hat{e}_{ijk} = y_{ijk} - \hat{y}_{ijk} \]
Or, in other words…
\[ \text{residual} = \text{observed} - \text{predicted} \]
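For a factorial model that includes the interaction, the least-squares predicted value for an observation is the mean of its N x K treatment cell, so raw residuals can be sketched as below. The yields and cell labels are hypothetical, made up purely for illustration.

```python
from statistics import mean

# Hypothetical yields (kg/ha) for two treatment cells, 4 replicates each.
observed = {
    ("N0", "K0"): [4100.0, 4300.0, 3900.0, 4500.0],
    ("N100", "K30"): [5200.0, 5600.0, 5000.0, 5400.0],
}

residuals = {}
for cell, ys in observed.items():
    # Predicted value shared by every replicate in this cell: the cell mean.
    predicted = mean(ys)
    # residual = observed - predicted
    residuals[cell] = [y - predicted for y in ys]
    print(cell, "predicted =", predicted, "residuals =", residuals[cell])
```

Within each cell the residuals sum to zero, which is a handy sanity check on the calculation.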
\[ e_{ijk} \overset{iid}{\sim} N(0, \sigma^2) \]
where \(e_{ijk}\) is the residual corresponding to the kth replicate of N rate i and K rate j.
Random experimental errors (residuals) are assumed to be:
Independent
Homoscedastic (homogeneous variance)
Normally distributed
There are different types of residuals, including the raw residual (the previous formula).
Raw residuals are not useful for assessing every assumption (e.g., detecting outliers).
That limitation is addressed by calculating studentized residuals.
Studentized residuals are the raw residuals divided by their standard deviation.
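One common way to write this, assuming the internally studentized variant is the one meant here, is
\[ r_{ijk} = \frac{\hat{e}_{ijk}}{\hat{\sigma}\sqrt{1 - h_{ijk}}} \]
where \(\hat{\sigma} = \sqrt{MSe}\) and \(h_{ijk}\) is the leverage of the observation. The sketch below applies it to hypothetical data, relying on the fact that in a balanced design fitted by cell means every observation has leverage 1 / (number of replicates).

```python
from math import sqrt
from statistics import mean

# Hypothetical yields (kg/ha): 2 treatment cells x 4 replicates.
cells = [
    [4100.0, 4300.0, 3900.0, 4500.0],
    [5200.0, 5600.0, 5000.0, 5400.0],
]
n_reps = 4
n_obs = sum(len(ys) for ys in cells)

# Raw residuals: observed minus the cell mean.
raw = [[y - mean(ys) for y in ys] for ys in cells]

# Residual standard deviation: sqrt(SSe / dfe), dfe = observations - cells.
ss_error = sum(e ** 2 for es in raw for e in es)
df_error = n_obs - len(cells)
sigma_hat = sqrt(ss_error / df_error)

# Balanced cell-means fit: every observation has leverage 1 / n_reps.
leverage = 1 / n_reps
studentized = [
    [e / (sigma_hat * sqrt(1 - leverage)) for e in es] for es in raw
]

for es in studentized:
    print([round(r, 3) for r in es])
```

Because studentized residuals are unitless, the same cutoff for flagging unusual points can be used whatever the scale of the response.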
If model assumptions are not met, inference is meaningless.
The validity and repeatability of the results are QUESTIONABLE.
Always check residuals before making inference! If something is wrong with the assumptions, fix it before proceeding with inference!
(Residual diagnostic plots, with geom_smooth trend lines) ✅
We confirmed that residuals were:
✅ Independent
✅ Homoscedastic (homogeneous variance)
✅ Normally distributed
✅ No outliers
Therefore, the model can be used for inference!