Completely Randomized Design (CRD)

Today’s goals

Explore key concepts in CRD:

  1. Homogeneous experimental unit
  2. Treatment randomization
  3. The effects model
  4. The ANOVA table
  5. The linear model assumptions

Motivational example - Treatment design

  • 2-way factorial
  • N fertilizer rates: 0, 100, 200 kg N/ha
  • K fertilizer rates: 0, 30, 60 kg K/ha
  • 3 x 3 = 9 treatment combinations
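
As a minimal sketch (Python standard library only; the variable names are illustrative), the 9 treatment combinations can be enumerated by crossing the two factors:

```python
from itertools import product

# Factor levels from the example (kg/ha)
n_rates = [0, 100, 200]
k_rates = [0, 30, 60]

# Full factorial: every N rate crossed with every K rate
treatments = list(product(n_rates, k_rates))

for trt_id, (n, k) in enumerate(treatments, start=1):
    print(f"Treatment {trt_id}: {n} kg N/ha + {k} kg K/ha")
```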

Experimental design

We assume the experimental material is homogeneous (e.g., same soil type and topography).

  • Thus, we can use a completely randomized design (CRD)

Treatment randomization

  • Randomization guards against unknown or uncontrollable sources of bias

  • Avoid systematic patterns

  • Eliminate selective assignment of experimental units (EUs) to treatments (conscious or unconscious!)

  • Randomization allows for valid inference on CAUSATION

Treatment randomization - CRD

  • Randomization of a treatment to an EU is unrestricted.

  • That means that replicates of the same treatment could, by random chance, fall right next to each other.

In our motivational example:

  • 4 replicates

  • Total observations: 9 x 4 = 36 EUs
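
One way to sketch this unrestricted randomization, assuming plots are simply numbered 1 to 36 and using an arbitrary seed for reproducibility (both are assumptions for illustration, not part of the original study):

```python
import random
from collections import Counter

n_treatments = 9   # 3 N rates x 3 K rates
n_reps = 4         # replicates per treatment

# One label per experimental unit: each treatment appears 4 times
labels = [trt for trt in range(1, n_treatments + 1) for _ in range(n_reps)]

# Unrestricted randomization: shuffle all 36 labels at once.
# The seed is arbitrary and only makes the sketch reproducible.
rng = random.Random(42)
rng.shuffle(labels)

# Assign the shuffled labels to plots 1..36 (hypothetical plot numbering)
layout = {plot: trt for plot, trt in enumerate(labels, start=1)}
counts = Counter(layout.values())
print(layout)
```

Because nothing restricts where a label can land, two replicates of the same treatment may end up in adjacent plots.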

Homogeneous Experimental Material - CRD

In the plot layout here, all treatments (1 through 9) were randomly assigned to any experimental unit (plot) in the study area.

  • Treatment 1 and its replicates are highlighted.

  • Note how, due to the unrestricted randomization, treatment 1 appears twice in the first column and does not appear in the third column. The same happened with other treatments.

Because the experimental material is homogeneous (e.g., same soil texture class), this should not be an issue when estimating treatment means and performing comparisons. 👍

The effects model

\[ y_{ijk} = \mu + \alpha_{i} + \beta_{j} + \alpha\beta_{ij} + e_{ijk} \]

  • \(y_{ijk}\) is the observation on the kth rep. from ith N rate and jth K rate
  • \(\mu\) is the overall mean
  • \(\alpha_{i}\) is the differential effect of ith N rate
  • \(\beta_{j}\) is the differential effect of jth K rate
  • \(\alpha\beta_{ij}\) is the differential effect of the combination of the ith N rate and jth K rate
  • \(e_{ijk}\) is the residual corresponding to the kth replicate of N rate i and K rate j.
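
For a balanced design, and assuming the usual sum-to-zero constraints (an assumption here, since other parameterizations exist), the terms above can be estimated from the overall, marginal, and cell means, where a dot denotes averaging over that index:

\[ \hat{\mu} = \bar{y}_{...}, \quad \hat{\alpha}_{i} = \bar{y}_{i..} - \bar{y}_{...}, \quad \hat{\beta}_{j} = \bar{y}_{.j.} - \bar{y}_{...}, \quad \widehat{\alpha\beta}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...} \]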

The ANOVA table

In the following ANOVA table…

  • n is number of levels in N rate = 3
  • k is number of levels in K rate = 3
  • r is number of replicates = 4
  • N is total number of observations or EUs = 3 x 3 x 4 = 36

| Source of variation | df | SS | MS | F ratio |
|---|---|---|---|---|
| N rate | dfn = n - 1 = 3 - 1 = 2 | SSn | MSn = SSn / dfn | MSn / MSe |
| K rate | dfk = k - 1 = 3 - 1 = 2 | SSk | MSk = SSk / dfk | MSk / MSe |
| N x K | dfnk = (n - 1) x (k - 1) = (3 - 1) x (3 - 1) = 4 | SSnk | MSnk = SSnk / dfnk | MSnk / MSe |
| Error | dfe = nk(r - 1) = 3 x 3 x (4 - 1) = 9 x 3 = 27 | SSe | MSe = SSe / dfe | |
| TOTAL | dft = N - 1 = 35 | SSt | | |

  • Each component of effects model has a row in the table

  • The larger the F ratio, the stronger the evidence that an effect is significant.
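
A minimal sketch of how the rows of this table can be computed for a balanced two-factor CRD, using only the Python standard library. The data are hypothetical (a 2 x 2 factorial with 3 replicates, generic factors A and B), not the N x K example, and the formulas assume equal replication in every cell:

```python
from statistics import mean

# Hypothetical yields: data[(i, j)] holds the replicates for
# level i of factor A and level j of factor B
data = {
    (0, 0): [10.0, 12.0, 11.0],
    (0, 1): [14.0, 15.0, 13.0],
    (1, 0): [20.0, 19.0, 21.0],
    (1, 1): [30.0, 29.0, 31.0],
}
a = len({i for i, _ in data})        # levels of A
b = len({j for _, j in data})        # levels of B
r = len(next(iter(data.values())))   # replicates per cell (balanced)

# Overall, marginal, and cell means
grand = mean(y for ys in data.values() for y in ys)
mean_a = {i: mean(y for (ii, _), ys in data.items() if ii == i for y in ys)
          for i in range(a)}
mean_b = {j: mean(y for (_, jj), ys in data.items() if jj == j for y in ys)
          for j in range(b)}
cell = {key: mean(ys) for key, ys in data.items()}

# Sums of squares for each source of variation
ss = {
    "A": b * r * sum((mean_a[i] - grand) ** 2 for i in range(a)),
    "B": a * r * sum((mean_b[j] - grand) ** 2 for j in range(b)),
    "AxB": r * sum((cell[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
                   for (i, j) in data),
    "Error": sum((y - cell[key]) ** 2 for key, ys in data.items() for y in ys),
}
ss_total = sum((y - grand) ** 2 for ys in data.values() for y in ys)

# Degrees of freedom, mean squares, and F ratios
df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
ms = {src: ss[src] / df[src] for src in df}
f_ratio = {src: ms[src] / ms["Error"] for src in ("A", "B", "AxB")}

for src in df:
    print(src, df[src], round(ss[src], 2), round(ms[src], 2),
          round(f_ratio[src], 2) if src in f_ratio else "")
```

In a balanced design the four sums of squares add up to the total sum of squares, and the degrees of freedom add up to N - 1.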

The ANOVA table - let’s think about it

  • What can I do to increase the chances of finding a significant effect? In other words, how can I \(\uparrow\) F ratios?
  1. Have more variability explained by treatment (\(\uparrow\) SSn, and thus MSn)
  2. Have less variability explained by error (\(\downarrow\) MSe), through
     • Less noise in the error, or
     • More dfe

ANOVA table - motivational example

| Source of variation | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| (Intercept) | 836,785,505.5 | 1 | 1,917.9470762 | <0.001 |
| nrate_kgha | 1,490,936.1 | 2 | 1.7086437 | 0.2 |
| krate_kgha | 470,561.8 | 2 | 0.5392735 | 0.589 |
| nrate_kgha:krate_kgha | 11,108,159.3 | 4 | 6.3650904 | <0.001 |
| Residuals | 11,779,891.6 | 27 | | |
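
As a quick check, the F values in this table can be reproduced from its Sum Sq and Df columns, since each F value is the term's mean square divided by the residual mean square. This is a sketch of the arithmetic only; the p-values would additionally need the F distribution, which is not computed here:

```python
# Sum Sq and Df copied from the ANOVA table of the motivational example
table = {
    "(Intercept)": (836_785_505.5, 1),
    "nrate_kgha": (1_490_936.1, 2),
    "krate_kgha": (470_561.8, 2),
    "nrate_kgha:krate_kgha": (11_108_159.3, 4),
}
ss_e, df_e = 11_779_891.6, 27

# Residual mean square (MSe)
ms_e = ss_e / df_e

# F = (SS / df) / MSe for each term
f_values = {src: (ss / df) / ms_e for src, (ss, df) in table.items()}

for src, f in f_values.items():
    print(f"{src:<24} F = {f:.4f}")
```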

  • What is significant here (say at \(\alpha = 0.05\))?

  • But wait, before inference, need to check model assumptions (based on residuals)!

The linear model residual(s)

\[ \hat{e}_{ijk} = y_{ijk} - \hat{y}_{ijk} \] Or, in other words…

\[ residual = observed - predicted \]
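
In a factorial model that includes the interaction, the predicted value for an observation is the mean of its treatment combination. A minimal sketch with hypothetical data (two treatment combinations, three replicates each):

```python
from statistics import mean

# Hypothetical observations for two treatment combinations
data = {
    ("N0", "K0"): [10.0, 12.0, 11.0],
    ("N100", "K30"): [20.0, 19.0, 24.0],
}

residuals = {}
for trt, ys in data.items():
    predicted = mean(ys)  # fitted value shared by all replicates in this cell
    residuals[trt] = [y - predicted for y in ys]  # observed - predicted

print(residuals)
```

Within each treatment combination the raw residuals sum to zero, because they are deviations from that cell's own mean.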

The linear model assumptions are based on the residuals!

\[ e_{ijk} \sim \text{iid } N(0, \sigma^2) \]

\(e_{ijk}\) is the residual corresponding to the kth replicate of N rate i and K rate j.

Random experimental errors (residuals) are assumed to be:

  • Mutually independent
  • Normally distributed with mean 0 and a common (homogeneous) variance \(\sigma^2\) across treatments
  • No extreme observations (outliers)

Studentized residuals

  • There are different types of residuals, including the raw residual (the previous formula).

  • Raw residuals are on the scale of the response, so they are not useful for assessing every assumption (e.g., there is no general cutoff for flagging outliers)

  • That limitation is addressed by calculating studentized residuals

Studentized residuals are the raw residuals divided by their estimated standard deviation, so they are unitless. If the model assumptions hold:

  • ~95% of studentized residuals fall within ± 2
  • ~99% of studentized residuals fall within ± 3
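
A minimal sketch of this calculation for a balanced CRD, using hypothetical data (3 treatments, 4 replicates). It computes internally studentized residuals and assumes that every observation has leverage 1/r, which holds when each treatment mean is estimated from r replicates; statistical software may report a slightly different (externally studentized) version:

```python
from math import sqrt
from statistics import mean

# Hypothetical observations: 3 treatments x 4 replicates
data = {
    "T1": [50.0, 52.0, 49.0, 51.0],
    "T2": [60.0, 58.0, 61.0, 63.0],
    "T3": [55.0, 54.0, 57.0, 56.0],
}
r = len(next(iter(data.values())))   # replicates per treatment (balanced)
n_trt = len(data)

# Raw residuals: observed minus treatment mean
raw = [(trt, y - mean(ys)) for trt, ys in data.items() for y in ys]

# Error mean square from the raw residuals
df_e = n_trt * (r - 1)
ms_e = sum(e ** 2 for _, e in raw) / df_e

# Standard deviation of a raw residual: sqrt(MSe * (1 - leverage))
leverage = 1 / r
sd_resid = sqrt(ms_e * (1 - leverage))

studentized = [(trt, e / sd_resid) for trt, e in raw]

# Flag potential outliers beyond +/- 3
flagged = [(trt, s) for trt, s in studentized if abs(s) > 3]
print(studentized)
print("Flagged:", flagged)
```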

Valid analysis depends on valid assumptions!

  • If model assumptions are not met, inference is meaningless

  • Validity and repeatability of results are QUESTIONABLE

  • Always check residuals before making inference! If something is wrong with assumptions, need to fix before proceeding with inference!

Checking assumptions - residual independence

  • No clear pattern (evidenced by geom_smooth)

Checking assumptions - residual homoscedasticity

  • Data spread is even and centered around zero (evidenced by geom_smooth)

Checking assumptions - residual normality

  • It’s common for some residuals in the tails to be off, especially with low N (N=36). Nothing to worry about here
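
A normality check of this kind compares the sorted residuals with the quantiles expected under a standard normal distribution (a Q-Q plot). A minimal sketch of the numbers behind such a plot, using hypothetical residuals and the (i - 0.5) / n plotting positions, which are one common convention among several:

```python
from statistics import NormalDist

# Hypothetical studentized residuals
resid = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 1.1, 2.3]
n = len(resid)

# Plotting positions: one probability per ordered residual
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Theoretical standard normal quantiles for those probabilities
theoretical = [NormalDist().inv_cdf(p) for p in probs]

# Pair theoretical quantiles with the sorted residuals
qq_pairs = list(zip(theoretical, sorted(resid)))

for t, s in qq_pairs:
    print(f"theoretical = {t:6.3f}   sample = {s:6.3f}")
```

If the residuals are approximately normal, the pairs fall close to a straight line; departures usually show up first in the tails.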

Checking assumptions - residual outliers

  • A couple of residuals beyond -3 and 3, but not too many and not too far out. I wouldn’t worry about them

Checking assumptions - summary

We confirmed that residuals were:

  • Independent

  • Homoscedastic (homogeneous variance)

  • Normally distributed

  • Free of extreme outliers

  • Therefore, the model can be used for inference!

Summary

  • CRD is randomized without restrictions
  • Each term of the effects model has a row in the ANOVA table
  • Raw vs. Studentized residuals
  • If model assumptions are not met, inference is meaningless
  • Model residual assumptions: iidN(0,\(\sigma^2\))