Repeated measures

Today’s goals

Explore key concepts in repeated measures:

  1. The independence assumption
  2. What is repeated measure analysis
  3. Correlation structure types

Motivational example - study design

  • 2-way factorial in a split-plot
  • Whole-plot: K fertilizer rates: 0, 30, 60 kg K/ha
  • Split-plot: N fertilizer rates: 0, 100, 200 kg N/ha
  • 3 x 3 = 9 treatment combinations
  • RCBD with 4 blocks

New response variable: yield over time

Up until now, our response variable has been yield measured once at the end of the season.

What if we measured yield at 4 different time points?

Research questions of interest:

  • Interaction between Treatment and Time
  • Does the effect of treatment change over time?
  • How do treatments compare at any given time point?

The model

yield_kgha ~ (1|block/krate_kgha) + krate_kgha*nrate_kgha*fdap

  • (1|block/krate_kgha) sets the random effect nested structure of the split-plot RCBD design

  • krate_kgha*nrate_kgha*fdap sets the fixed effect structure of treatment design, with all main effects AND interaction terms

  • fdap is days after planting when plots were harvested, treated as a factor

Linear (mixed) model assumptions

Random effects (between-subject):

\[ (\alpha\rho)_{ik} \sim iidN(0, \sigma^2_{\alpha\rho}) \]

Error/residual (within-subject):

\[ e_{ijk} \sim iidN(0, \sigma^2_{e}) \]

The independence assumption

The observation/residual (\(e_{ijk}\), within-subject) independence assumption is largely based on the fact that treatment levels can be/are randomized to experimental units.

This is expressed as correlations among observations being set to zero in the residual matrix.

The independence assumption

The example below shows the correlation matrix structure for the 4 yield measurements for the first plot in the study:

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1

The diagonal values represent a shared common variance, and the off-diagonal values represent a shared common correlation of zero.

The independence assumption

However, randomization is not always possible, as in this case (cannot randomize time of harvest).

For example, if one of the treatment factors of interest is a measure of time or space, their levels are non-randomizable.

Breaking the independence assumption

Examples of time and space treatment factors:

Time (mostly 1D), measurements taken at different

  • crop stages
  • days after planting
  • days after application (of some input)
  • days after incubation

Space (2D), measurements taken at different

  • soil depths (1D)
  • locations characterized by x and y coordinates (longitude, latitude)
  • used a lot in precision ag and geospatial analysis

Fixing the lack of independence among observations

In these cases, assuming correlation of zero may be wrong, so we need to

  • relax this assumption,
  • allow for the model to calculate these correlations in different ways, and
  • check if any of these ways provide a better fit to the data.

Relaxing the independence assumption

To relax the independence assumption, we test different correlation structures on the residual matrix.

The structures can be classified as:

  • Serial correlation structures (time)
  • Spatial correlation structures (space)

Serial correlation structures

Require time points to be integers (e.g., 0, 5, 10 DAP) and equally spaced.

Types:

  1. Compound symmetry
  2. General
  3. Autoregressive-moving average

Serial - Compound symmetry

  • Simplest

  • Assumes equal correlation among errors of the same group

  • No. of parameters estimated: one (\(\rho\))

  • Too simplistic for time-series, as it is common that correlation decreases as time interval increases (not a single fixed correlation)

  • Correlation matrix structure for the 4 yield measurements for the first plot in the study with a CS correlation value of 0.3:

     [,1] [,2] [,3] [,4]
[1,]  1.0  0.3  0.3  0.3
[2,]  0.3  1.0  0.3  0.3
[3,]  0.3  0.3  1.0  0.3
[4,]  0.3  0.3  0.3  1.0

Serial - General

  • Most complex!
  • Each correlation in data is represented by a different parameter
  • Number of parameters increases quadratically with the maximum number of observations within a group
  • Often lead to overparameterized models
  • Correlation matrix structure for the 4 yield measurements for the first plot in the study with general correlation:
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.2  0.1 -0.1
[2,]  0.2  1.0  0.0  0.2
[3,]  0.1  0.0  1.0  0.0
[4,] -0.1  0.2  0.0  1.0

Serial - Autoregressive-moving average

  • Current observation as a linear function of previous (lag) observations

  • The number of past observations included in the linear model (p) is called the order of the autoregressive model, which is denoted by AR(p)

  • There are p correlation parameters in an AR(p) model, e.g. AR(1) has 1 correlation parameter

Serial - Autoregressive-moving average

  • The AR(1) model is one of the few serial correlation structures that can be generalized to continuous, non-equally-spaced time measurements
  • Correlation matrix structure for the 4 yield measurements for the first plot in the study with a AR1 correlation structure and correlation value of 0.8:
      [,1] [,2] [,3]  [,4]
[1,] 1.000 0.80 0.64 0.512
[2,] 0.800 1.00 0.80 0.640
[3,] 0.640 0.80 1.00 0.800
[4,] 0.512 0.64 0.80 1.000

Serial - Autoregressive-moving average

  • Current observation is a linear function of independent and identically distributed noise terms

  • The number of noise terms included in the linear model (q) is called the order of the moving average model, which is denoted by MA(q)

  • There are q correlation parameters in an MA(q) model

Serial - Autoregressive-moving average

  • Mixed autoregressive–moving average models, called ARMA models, are obtained by combining together an autoregressive model and a moving average model.

  • There are p + q correlation parameters p in an ARMA(p, q) model

Spatial correlation structures

  • Used to model continuous 2D data

  • Although spatial, can also be used for time variables as it is more flexible in relation to non-equally distant intervals

  • Spatial correlation structures are generally represented by their semivariogram, instead of their correlation function

  • The correlation parameter p is generally referred to as the range in the spatial statistics literature

Spatial correlation structures

Some types of variogram models for spatial correlation structures:

  • Exponential
  • Gaussian
  • Linear
  • Rational quadratic
  • Spherical

Spatial correlation structures

Repeated measures in R

In R, we are going to use the package nlme.

  • We’ll make extensive use of a few functions, where lme() is the main one.

  • lme() allows us to fit mixed-effect models with nested random effects (e.g., split-plot in RCBD) AND test for different residual correlation structures

Specifying correlation structures in R

Repeated measures workflow

  1. Run the default model that assumes equal correlation (the default assumption) and use it as a baseline

  2. Run different versions of the first model, where in each version you attempt a different correlation matrix structure.

  3. Make sure that the correlation structure you are using matches your repeated measure variable in terms of accepting (or not) non-equally spaced intervals

Repeated measures workflow

  1. Compare all models based on fit criteria like Aikaike Information Criteria (AIC), Bayesian Information Criteria (BIC), etc., where lower values represent better fit.

  2. Select model with lowest fit criteria metric, check assumptions, use it for inference.

Summary

  • When having time or space as part of the treatment structure, then observations and residuals are not assumed to be independent.

  • We allow the error correlation matrix to estimate different correlation estimates

  • We then select the model with the best fit and that has addressed the potential correlation issue

  • We use the selected model for inference