21  Multiple linear regression I

Author

Karl Gregory

The simple linear regression model provided a way to study the relationship between a response \(Y\) and a single covariate \(x\) when that relationship is linear. Here we extend it to a model that can include more than one covariate. We call it the multiple linear regression model.

Before introducing the model, it will be helpful to spend a moment on vectors. We can think of a vector as a list of numbers, arranged in a column. We will denote vectors with bold-faced lower-case letters, like \(\mathbf{x}\), and we denote the entries of a vector in regular typeface with subscripts. For example, if \(\mathbf{x}\) is a column vector with \(p\) entries, we can write \[ \mathbf{x}= \left(\begin{array}{c} x_1\\ x_2\\ \vdots\\ x_p\end{array}\right). \] If \(\mathbf{x}\) has \(p\) real-valued entries, we write \(\mathbf{x}\in \mathbb{R}^p\). The entries of \(\mathbf{x}\) then serve as the coordinates of a point in \(p\)-dimensional Euclidean space. Transposing a vector turns the column into a row; we denote the transpose of the vector \(\mathbf{x}\) as \(\mathbf{x}^\top\). Thus \[ \mathbf{x}^\top = (x_1,x_2, \dots,x_p). \] And if we transpose a row it becomes a column, so \(\mathbf{x}= (x_1,x_2,\dots,x_p)^\top\).
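To make the column/row distinction concrete, here is a minimal sketch in Python using NumPy. The choice of NumPy, the variable names, and the numerical values are assumptions made for illustration only; they are not part of the text above.

```python
import numpy as np

# A column vector x in R^p with p = 3 (entries chosen arbitrarily).
# Storing it as a 3 x 1 array keeps the "column" shape explicit.
x = np.array([[1.0], [2.0], [3.0]])

p = x.shape[0]      # number of entries in x
x_t = x.T           # transpose: the column becomes a 1 x p row
x_back = x_t.T      # transposing the row gives back the column

print(x.shape)      # (3, 1)
print(x_t.shape)    # (1, 3)
```

Transposing twice returns the original column, which mirrors the identity \(\mathbf{x}= (x_1,x_2,\dots,x_p)^\top\).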

In the multiple linear regression model, we no longer have a single covariate \(x\) but a vector \(\mathbf{x}\) containing the values of several covariates.

Definition 21.1 (Multiple linear regression model) For data pairs \((\mathbf{x}_1,Y_1),\dots,(\mathbf{x}_n,Y_n)\) where \(\mathbf{x}_i = (x_{i1},\dots,x_{ip})^\top\), the multiple linear regression model assumes \[ Y_i = \beta_0 + x_{i1}\beta_1 + \dots + x_{ip}\beta_p + \varepsilon_i \] for \(i=1,\dots,n\), where

  • \(\mathbf{x}_1,\dots,\mathbf{x}_n\) are vectors in \(\mathbb{R}^p\) of covariate or predictor values.
  • \(Y_1,\dots,Y_n\) are the response values.
  • \(\beta_0, \beta_1, \dots, \beta_p\) are the regression coefficients.
  • \(\varepsilon_1,\dots,\varepsilon_n\) are iid \(\text{Normal}(0,\sigma^2)\) error terms.
  • \(\sigma^2\) is the error term variance.
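One way to get a feel for the definition is to simulate data from it. The sketch below is only an illustration under stated assumptions: the sample size, the coefficient values, the value of \(\sigma\), and the uniform distribution used to generate the covariates are all made up here, and the model itself places no distributional assumption on the \(\mathbf{x}_i\). The code stores \(\mathbf{x}_i^\top\) as row \(i\) of an \(n \times p\) array `X` and collects \(\beta_1,\dots,\beta_p\) in an array `beta`, so the sum \(x_{i1}\beta_1 + \dots + x_{ip}\beta_p\) becomes a matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical settings, chosen only for illustration.
n, p = 200, 3
beta0 = 1.0                           # intercept beta_0
beta = np.array([2.0, -1.0, 0.5])     # beta_1, ..., beta_p
sigma = 0.7                           # error standard deviation (sigma^2 is the variance)

# Covariate values: row i of X holds x_i^T = (x_i1, ..., x_ip).
# The uniform draw is an assumption of this sketch, not of the model.
X = rng.uniform(0.0, 1.0, size=(n, p))

# iid Normal(0, sigma^2) error terms.
eps = rng.normal(0.0, sigma, size=n)

# Y_i = beta_0 + x_i1*beta_1 + ... + x_ip*beta_p + eps_i, for all i at once.
Y = beta0 + X @ beta + eps

# Check the first response against the written-out sum in the definition.
i = 0
Y0 = beta0 + sum(X[i, j] * beta[j] for j in range(p)) + eps[i]
print(np.isclose(Y[i], Y0))
```

Writing the sum as `X @ beta` is just a compact way to evaluate the same expression for every \(i\); the term-by-term check on the first observation confirms that the two forms agree.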