Linear regression is a widely used statistical model with a broad range of applications. It is also one of the simplest models with which to demonstrate important aspects of statistical modelling.
1.1 The Linear Model
Definition 1.1 (Linear model): For each observation $i \in \{1, \dots, n\}$, let $Y_i$ be the response variable and $x_i^{(1)}, \dots, x_i^{(p)}$ be the predictors. In the linear model the response variable is a linear function of the predictors up to some error $\varepsilon_i$:
$$Y_i = \sum_{j=1}^{p} \beta_j x_i^{(j)} + \varepsilon_i.$$
Note: Usually we assume that $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. with $\mathbb{E}[\varepsilon_i] = 0$ and $\operatorname{Var}[\varepsilon_i] = \sigma^2$. We call $n$ the sample size and $p$ the number of predictors. The goal is to estimate the parameters $\{\beta_1, \dots, \beta_p\}$, to study their relevance and to estimate the error variance. The parameters $\beta_j$ and $\sigma^2$ are unknown and the errors $\varepsilon_i$ are unobservable, while the response variables $Y_i$ and the predictors $x_i^{(j)}$ are given. We can rewrite the model in vector notation:
$$Y = X\beta + \varepsilon,$$
where $Y \in \mathbb{R}^n$ is the random vector of response variables, $X \in \mathbb{R}^{n \times p}$ is the matrix of predictors, $\beta \in \mathbb{R}^p$ is the vector of unknown parameters and $\varepsilon \in \mathbb{R}^n$ is the random vector of errors. We typically assume that the sample size is larger than the number of predictors, i.e. $n > p$, and that the matrix $X$ has full rank $p$.
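As a quick illustration of this notation, here is a minimal simulation sketch; the choice of $n$, $p$, $\beta$, $\sigma$ and the Gaussian predictors is purely illustrative and not part of the definition.

```python
# Minimal sketch (illustrative): simulate data from the linear model
# Y = X * beta + eps with arbitrarily chosen n, p, beta and sigma.
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                          # sample size and number of predictors
X = rng.normal(size=(n, p))            # predictor matrix; random here, so full rank p almost surely
beta = np.array([1.5, -2.0, 0.5])      # the (in practice unknown) parameter vector
sigma = 0.8                            # error standard deviation

eps = rng.normal(scale=sigma, size=n)  # iid errors with mean 0 and variance sigma^2
Y = X @ beta + eps                     # response vector in R^n

print(Y.shape, X.shape)                # (100,) (100, 3)
```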
Note: To model an intercept, we set the first predictor variable to be a constant, i.e. $x_i^{(1)} = 1$. We then get $Y_i = \beta_1 + \sum_{j=2}^{p} \beta_j x_i^{(j)} + \varepsilon_i$.
Note (On stochastic models): The linear model involves some stochastic components: the error terms $\varepsilon_i$ are random variables and hence the response variables $Y_i$ are random as well. The predictor variables $x_i^{(j)}$ are assumed to be non-random. However, in some applications it is more appropriate to treat the predictor variables as random. The stochastic nature of the error terms $\varepsilon_i$ can be attributed to various sources, e.g. measurement errors or the inability to capture all underlying non-systematic effects.
Example (Regression through the origin): $Y_i = \beta x_i + \varepsilon_i$
Example (Simple linear regression): $Y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$
Example (Transformed predictors): $Y_i = \beta_1 + \beta_2 \log(x_i^{(2)}) + \beta_3 \sin(x_i^{(3)}) + \varepsilon_i$
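To make these examples concrete, the following sketch (with made-up predictor values) builds the corresponding design matrices; the intercept is encoded as a constant first column, as in the note above, and the transformed-predictor model remains linear in $\beta$.

```python
# Illustrative design matrices for the three examples (predictor values made up).
import numpy as np

x = np.array([0.5, 1.2, 2.3, 3.1, 4.0])      # a single raw predictor
x2 = np.array([1.0, 2.0, 4.0, 8.0, 16.0])    # raw predictor entering through log
x3 = np.array([0.1, 0.5, 1.0, 1.5, 2.0])     # raw predictor entering through sin

# Regression through the origin: a single column, no intercept.
X_origin = x.reshape(-1, 1)

# Simple linear regression: a constant first column encodes the intercept beta_1.
X_simple = np.column_stack([np.ones_like(x), x])

# Transformed predictors: the model stays linear in beta even though the
# predictors are transformed before entering the design matrix.
X_transformed = np.column_stack([np.ones_like(x2), np.log(x2), np.sin(x3)])

print(X_origin.shape, X_simple.shape, X_transformed.shape)  # (5, 1) (5, 2) (5, 3)
```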
1.2 Least Squares Method
We assume the linear model $Y = X\beta + \varepsilon$. Our goal is to find a good estimate of $\beta$.
Definition 1.2 (Least squares estimator): The least squares estimator $\hat{\beta}$ is defined as
$$\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \|Y - X\beta\|^2.$$
Assuming that $X$ has rank $p$, the minimizer can be computed explicitly as
$$\hat{\beta} = (X^\top X)^{-1} X^\top Y.$$
Proof: As $\|Y - X\beta\|^2$ is convex, we find the minimizer by setting its gradient to $0$:
$$\nabla \|Y - X\beta\|^2 = -2 X^\top (Y - X\beta) \overset{!}{=} 0.$$
This yields the normal equations
$$X^\top X \hat{\beta} = X^\top Y.$$
Under the assumption that $X$ has rank $p$, the matrix $X^\top X \in \mathbb{R}^{p \times p}$ has full rank and is invertible, thus
$$\hat{\beta} = (X^\top X)^{-1} X^\top Y.$$
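As a numerical sanity check, here is a short sketch (assuming NumPy and the same kind of simulated data as in the earlier sketch) that computes $\hat{\beta}$ by solving the normal equations; `np.linalg.lstsq` is noted as a numerically more robust alternative.

```python
# Sketch: computing the least squares estimator for a full-rank design matrix X.
import numpy as np

def least_squares(X, Y):
    """Solve the normal equations X^T X beta = X^T Y for beta_hat."""
    # Solving the linear system is numerically preferable to forming (X^T X)^{-1}.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Quick check on simulated data (same illustrative setup as in the earlier sketch).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.5])
Y = X @ beta + rng.normal(scale=0.8, size=n)

beta_hat = least_squares(X, Y)
print(beta_hat)                        # roughly [1.5, -2.0, 0.5]

# np.linalg.lstsq solves the same minimization via an SVD-based routine and is
# more robust when X is ill-conditioned; both give the same beta_hat here.
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```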
Definition 1.3 (Residuals): We define the residuals as $R_i = Y_i - x_i^\top \hat{\beta}$. The residuals are estimates of the $\varepsilon_i$'s. Thus it is plausible to take
$$\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^{n} R_i^2$$
as the estimator for the error variance $\sigma^2$. It will be shown later that using the factor $\frac{1}{n-p}$ yields $\mathbb{E}[\hat{\sigma}^2] = \sigma^2$.
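Continuing the least squares sketch above (and reusing its simulated `X`, `Y`, `beta_hat`, `n` and `p`), the residuals and $\hat{\sigma}^2$ can be computed directly; the numerical value is of course specific to that illustrative example.

```python
# Continuation of the least squares sketch above (reuses X, Y, beta_hat, n, p):
# residuals and the variance estimator with the factor 1/(n - p).
R = Y - X @ beta_hat                   # residuals R_i = Y_i - x_i^T beta_hat
sigma2_hat = (R @ R) / (n - p)         # sum of squared residuals divided by n - p
print(sigma2_hat)                      # roughly sigma^2 = 0.8**2 = 0.64
```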