Linear and non-linear regression

Today’s goals

Explore

  • the mathematical formula,
  • implementation, and
  • interpretation

of linear and non-linear regression models applied to optimum-finding in agriculture.

Regression

Statistical method for fitting a line or curve to data when the relationship between two variables, a quantitative outcome (\(Y\)) and a quantitative predictor (\(X\)), is of interest.
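
For a straight line, ordinary least squares has a closed form: \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The minimal sketch below shows what "fitting a line" computes; the function name `fit_line` and the example values are hypothetical, since the slides do not show their fitting code.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical seeding rates (k seeds/ha) and yields (Mg/ha), for illustration only
x = [40, 60, 80, 100, 120]
y = [10.1, 11.2, 12.0, 13.1, 13.9]
b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
```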

Regression

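The relationship between the outcome \(Y\) and the quantitative predictor \(X\) can take different forms, including:

  • intercept-only (no relationship with \(X\)),
  • linear,
  • quadratic, and
  • linear-plateau.

Let’s explore each of them next.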

Model 1 - Intercept

The intercept-only model contains a single fixed-effect parameter:

\[\Large{Y = \color{blue}{b_0} + \epsilon}\]

  • intercept, \(b_0\) (i.e., the overall mean of the response variable)
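
Why \(b_0\) is the overall mean: least squares with only a column of ones in the design matrix returns the sample mean. A minimal sketch, assuming numpy and using hypothetical yield values rather than the data behind the slides:

```python
import numpy as np

# Hypothetical yields (Mg/ha); not the data behind the slides
y = np.array([11.5, 12.8, 12.0, 13.1, 11.6])

# Design matrix for Y = b0 + error: a single column of ones
X = np.ones((y.size, 1))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0 = coef[0]

# The least-squares intercept equals the sample mean
print(f"least-squares b0 = {b0:.3f}, sample mean = {y.mean():.3f}")
```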

Model 1 - Intercept

# A tibble: 1 × 3
  term        estimate p.value
  <chr>          <dbl>   <dbl>
1 (Intercept)     12.2  0.0002

Estimated mean yield is 12.2 Mg/ha regardless of seeding rate.

Model 2 - Linear

The linear (intercept + slope) model contains two fixed-effect parameters:

\[\Large{Y = \color{blue}{b_0} + \color{purple}{b_1} X + \epsilon}\]

  • intercept, \(b_0\) (value of \(Y\) where the line crosses the y-axis, i.e., when \(X = 0\))
  • slope, \(b_1\) (change in \(Y\) for each one-unit increase in \(X\))
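
To connect the two parameters to their interpretations, the sketch below fits the line to hypothetical, simulated data with numpy least squares (not necessarily the R function behind the slides' output) and checks that \(b_0\) is the prediction at \(X = 0\) and \(b_1\) is the change in prediction per one-unit increase in \(X\). The helper name `predict` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical seeding rates (k seeds/ha) and noisy yields (Mg/ha)
x = np.repeat([40.0, 60.0, 80.0, 100.0, 120.0], 3)
y = 8.0 + 0.05 * x + rng.normal(0.0, 0.5, size=x.size)

# Design matrix: an intercept column of ones plus the predictor
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x_new):
    """Fitted line evaluated at new predictor values."""
    return b0 + b1 * x_new

# b0 is the predicted Y at X = 0; b1 is the change in prediction per unit of X
print(np.isclose(predict(0.0), b0))
print(np.isclose(predict(71.0) - predict(70.0), b1))
```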

Model 2 - Linear

# A tibble: 2 × 3
  term        estimate p.value
  <chr>          <dbl>   <dbl>
1 (Intercept)   8.02    0     
2 sr_ksha       0.0520  0.0001

For each additional 1,000 seeds/ha (one unit of sr_ksha), yield increases by 0.052 Mg/ha.
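
Because sr_ksha is in thousands of seeds/ha, the slope scales to larger increments by multiplication. The sketch below plugs in the rounded estimates printed above; `predicted_yield` is a hypothetical helper name. A straight line implies the same gain at every seeding rate, which is what the curved models that follow relax.

```python
# Point estimates as printed in the linear-model output (rounded)
b0, b1 = 8.02, 0.0520

def predicted_yield(sr_ksha):
    """Predicted yield (Mg/ha) at a seeding rate in thousands of seeds/ha."""
    return b0 + b1 * sr_ksha

# Moving from 60 to 70 k seeds/ha (10,000 more seeds/ha) adds 10 * b1
gain = predicted_yield(70) - predicted_yield(60)
print(f"{gain:.2f} Mg/ha")  # 0.52 Mg/ha
```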

Model 3 - Quadratic

The quadratic model contains three fixed-effect parameters:

\[\Large{Y = \color{blue}{b_0} + \color{purple}{b_1} X + \color{forestgreen}{b_2} X^2 +\epsilon}\]

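  • intercept, \(b_0\) (value of \(Y\) where the curve crosses the y-axis, i.e., when \(X = 0\))
  • linear coefficient, \(b_1\), and quadratic coefficient, \(b_2\)
  • together they set the slope, \(dY/dX = b_1 + 2b_2 X\), which changes with \(X\); the exact change in \(Y\) from \(X\) to \(X + 1\) is \(b_1 + b_2(2X + 1)\)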
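
To make the changing slope concrete, the sketch below evaluates \(dY/dX = b_1 + 2b_2 X\) at a few seeding rates and compares it with the exact one-unit change \(b_1 + b_2(2X + 1)\). It borrows the rounded estimates from the quadratic model output on the next slide; the function names are hypothetical.

```python
# Quadratic estimates from the model output on the next slide (rounded)
b0, b1, b2 = -0.976, 0.309, -0.00161

def y_hat(x):
    """Predicted yield (Mg/ha) at seeding rate x (k seeds/ha)."""
    return b0 + b1 * x + b2 * x**2

def slope(x):
    # Derivative of the quadratic: dY/dX = b1 + 2*b2*X
    return b1 + 2 * b2 * x

for x in (40, 80, 120):
    # Exact change over one unit of X: b1 + b2*(2X + 1)
    one_unit_change = y_hat(x + 1) - y_hat(x)
    print(f"X = {x}: slope = {slope(x):.4f}, Y(X+1) - Y(X) = {one_unit_change:.4f}")
```

With \(b_2 < 0\) the slope is positive at low seeding rates and turns negative at high ones.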

Model 3 - Quadratic

# A tibble: 3 × 3
  term         estimate p.value
  <chr>           <dbl>   <dbl>
1 (Intercept)  -0.976     0.275
2 sr_ksha       0.309     0    
3 I(sr_ksha^2) -0.00161   0    

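Yield increases with seeding rate at low rates (\(b_1 > 0\)) and the curve is concave (\(b_2 < 0\)), so it has a maximum at \(X = -b_1/(2b_2) \approx 0.309/(2 \times 0.00161) \approx 96\) k seeds/ha, i.e., about 96,000 seeds/ha.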
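
The location of the maximum follows from setting the slope to zero, \(b_1 + 2b_2 X = 0\), so \(X^* = -b_1/(2b_2)\); this is a maximum only when \(b_2 < 0\). A minimal sketch using the rounded estimates printed above (full-precision estimates may shift the result slightly):

```python
# Quadratic estimates from the output above (rounded to the printed digits)
b0, b1, b2 = -0.976, 0.309, -0.00161

# Setting dY/dX = b1 + 2*b2*X to zero gives the seeding rate at the maximum
x_opt = -b1 / (2 * b2)
# Predicted yield at that seeding rate
y_max = b0 + b1 * x_opt + b2 * x_opt**2

print(f"optimum ~ {x_opt:.1f} k seeds/ha, max yield ~ {y_max:.2f} Mg/ha")
# optimum ~ 96.0 k seeds/ha, max yield ~ 13.85 Mg/ha
```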

Model 4 - Linear-plateau

The linear-plateau model contains three fixed-effect parameters:

\[ \begin{cases} X < \color{red}{xs},\ Y = \color{blue}{b_0} + \color{purple}{b_1} X + \epsilon \\ X \geq \color{red}{xs},\ Y = \color{blue}{b_0} + \color{purple}{b_1} \color{red}{xs} + \epsilon \end{cases} \]

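  • intercept, \(b_0\) (value of \(Y\) where the line crosses the y-axis, i.e., when \(X = 0\))
  • slope, \(b_1\) (change in \(Y\) for each one-unit increase in \(X\), up to \(xs\))
  • break-point, \(xs\) (value of \(X\) at which the plateau begins, i.e., the lowest \(X\) at which \(Y\) reaches its maximum)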
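
The two branches collapse into a single expression, \(Y = b_0 + b_1 \min(X, xs)\), which is convenient in code and makes the curve continuous at \(xs\). A minimal numpy sketch; the function name and the parameter values in the example call are hypothetical.

```python
import numpy as np

def linear_plateau(x, b0, b1, xs):
    """Linear-plateau mean: b0 + b1*X for X < xs, and b0 + b1*xs for X >= xs."""
    x = np.asarray(x, dtype=float)
    # min(X, xs) covers both branches: below xs it is X, at or above it is xs
    return b0 + b1 * np.minimum(x, xs)

# Hypothetical parameter values, e.g. yield in Mg/ha vs seeding rate in k seeds/ha
print(linear_plateau([40, 60, 80, 100], b0=3.0, b1=0.15, xs=70.0))
```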

Model 4 - Linear-plateau

# A tibble: 3 × 3
  term  estimate p.value
  <chr>    <dbl>   <dbl>
1 a        3.46   0.0075
2 b        0.136  0     
3 xs      73.5    0     

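In this output, a and b correspond to \(b_0\) and \(b_1\). Yield increases linearly with seeding rate (\(b_1 > 0\)) until a threshold (\(xs\) = 73.5 k seeds/ha), after which yield remains constant at about \(3.46 + 0.136 \times 73.5 \approx 13.5\) Mg/ha.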
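
The slides do not show how this model was fit. As a minimal, swapped-in alternative that needs only numpy, one can profile over candidate break-points: for each fixed \(xs\) the model is linear in \(b_0\) and \(b_1\), so ordinary least squares gives them directly, and the \(xs\) with the smallest residual sum of squares is kept. The function name `fit_linear_plateau`, the grid, and the simulated data are hypothetical; a grid search finds \(xs\) only to the grid's resolution and does not produce p-values.

```python
import numpy as np

def fit_linear_plateau(x, y, xs_grid):
    """Grid-profile least squares for Y = b0 + b1*min(X, xs) + error.

    For each candidate xs the model is linear in b0 and b1, so those are
    estimated by ordinary least squares; the xs with the lowest residual
    sum of squares (RSS) is returned together with its b0 and b1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for xs in xs_grid:
        X = np.column_stack([np.ones_like(x), np.minimum(x, xs)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1], xs)
    rss, b0, b1, xs = best
    return b0, b1, xs, rss

# Hypothetical data simulated from a plateau at 75 k seeds/ha, plus noise
rng = np.random.default_rng(42)
sr = np.repeat(np.arange(40.0, 121.0, 10.0), 4)
yield_mg = 3.5 + 0.13 * np.minimum(sr, 75.0) + rng.normal(0.0, 0.4, sr.size)

b0, b1, xs, rss = fit_linear_plateau(sr, yield_mg, np.arange(50.0, 110.5, 0.5))
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, xs = {xs:.1f}, RSS = {rss:.2f}")
```

Keep the grid inside the observed range of \(X\): a candidate \(xs\) at or below the smallest observed \(X\) makes \(\min(X, xs)\) constant, so \(b_1\) is no longer identifiable.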

Summary

We covered the mathematical formula, implementation, and interpretation of four regression models (intercept-only, linear, quadratic, and linear-plateau), spanning linear and non-linear forms.