3. Distributional Properties of OLS

In this chapter, unless otherwise noted, we will always assume the WCLM and will specify if additional assumptions are made.

3.1 Moments under WCLM

Proposition 3.1 (Expectations): The expectations are:
  • $\E{\vbetahat} = \vbeta$
  • $\E{\vYhat} = \E{\vY} = \dmat X \vbeta$
  • $\E{\vepsilonhat} = \dvec 0$
Proposition 3.2 (Covariance matrices): The covariance matrices are:
  • $\Cov{\vbetahat}{\vbetahat} = \sigma^2 (\dmattr X \dmat X)^{-1}$
  • $\Cov{\vYhat}{\vYhat} = \sigma^2 \dmat P$
  • $\Cov{\vepsilonhat}{\vepsilonhat} = \sigma^2 \dmat Q$
  • $\Cov{\vbetahat}{\vYhat} = \sigma^2 (\dmattr X \dmat X)^{-1} \dmattr X$
  • $\Cov{\vbetahat}{\vepsilonhat} = \dmat 0$
  • $\Cov{\vYhat}{\vepsilonhat} = \dmat 0$
Note: Recall that in general $\check{\vy}$ and $\vepsiloncheck$ are empirically uncorrelated only if the model includes an intercept. Treated as random vectors, however, $\vYhat$ and $\vepsilonhat$ are uncorrelated without any restriction.
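This distinction can be checked numerically. A minimal sketch using NumPy (all variable names and the simulated data are illustrative): without an intercept, the empirical covariance of fitted values and residuals is generally nonzero, while with an intercept it vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=2.0, size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)  # data with a nonzero intercept

def fit(X, y):
    """OLS via the normal equations; returns fitted values and residuals."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat
    return y_hat, y - y_hat

# Model WITH intercept: empirical covariance of y_hat and eps_hat is zero.
X1 = np.column_stack([np.ones(n), x])
y_hat1, eps1 = fit(X1, y)
cov_with = np.cov(y_hat1, eps1)[0, 1]

# Model WITHOUT intercept: generally nonzero empirical covariance,
# because the residuals no longer sum to zero.
X0 = x.reshape(-1, 1)
y_hat0, eps0 = fit(X0, y)
cov_without = np.cov(y_hat0, eps0)[0, 1]

print(cov_with)     # ~ 0 (up to floating-point error)
print(cov_without)  # generally far from 0
```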

3.2 Estimator for Deviation Variance

We can now define an unbiased estimator for the variance of the deviations.

Definition 3.3 (Unbiased estimator for deviation variance): The estimator $$\sigmahat^2 = \frac{\vepsilonhat^{\top} \vepsilonhat}{n - p} = \frac{1}{n - p} \sum_{i=1}^n \epsilonhat_i^2$$ is an unbiased estimator for the variance $\sigma^2$ of the deviations.
Proof: $$\E{\sigmahat^2} = \frac{1}{n - p} \sum_{i=1}^n \Var{\epsilonhat_i} = \frac{1}{n - p} \trace{\sigma^2 \dmat Q} = \sigma^2,$$ since $\E{\epsilonhat_i} = 0$ and $\trace{\dmat Q} = \trace{\dmat I} - \trace{\dmat P} = n - p$.
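The trace identity driving this proof is easy to verify numerically. A small sketch, assuming NumPy and an arbitrary simulated design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))

# Hat matrix P and residual-maker Q = I - P.
P = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - P

# Q is symmetric and idempotent, and trace(Q) = n - p,
# which is exactly the correction factor in sigma_hat^2.
print(np.trace(Q))            # ≈ n - p = 26
print(np.allclose(Q @ Q, Q))  # True
```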

We note that we can write iid samples of any random variable $Y$ with finite first and second moments via the location model $Y_i = \mu + \epsilon_i$ where $\mu = \E{Y}$ and $\epsilon_i$ is the zero-mean stochastic component. As such, the empirical variance emerges as the estimator $\sigmahat^2$ for the variance of the deviations in the location model.
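Concretely, in the location model the design matrix is $\dmat X = \dvec 1$ with $p = 1$, so $\dmat P = \frac{1}{n} \dmat 1$, the residuals are $\epsilonhat_i = y_i - \mean{y}$, and the general estimator specializes to

$$\sigmahat^2 = \frac{1}{n - 1} \sum_{i=1}^n \epsilonhat_i^2 = \frac{1}{n - 1} \sum_{i=1}^n (y_i - \mean{y})^2 = \var{\vy}.$$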

Recap (Empirical variance): The empirical variance of an iid sample $y_1, \ldots, y_n$ of $Y$ is $\var{\vy} = \frac{1}{n-1} \sum_{i=1}^n (y_i - \mean{y})^2$
Proposition 3.4 (Empirical variance unbiased): The empirical variance is unbiased, i.e. $\E{\var{\vY}} = \Var{Y}$.

For iid samples of two random variables YY and ZZ with finite first and second moments we have a similar result for the covariance.

Recap (Empirical covariance): The empirical covariance of an iid sample $y_1, \ldots, y_n$ of $Y$ and an iid sample $z_1, \ldots, z_n$ of $Z$ is $\cov{\vy, \dvec z} = \frac{1}{n-1} \sum_{i=1}^n (y_i - \mean{y})(z_i - \mean{z})$
Proposition 3.5 (Empirical covariance unbiased): The empirical covariance is unbiased, i.e. $\E{\cov{\vY, \rvec Z}} = \Cov{Y}{Z}$.
Proof: Let $Y_i = \mu_Y + \epsilon_{Y,i}$ and $Z_i = \mu_Z + \epsilon_{Z,i}$ be the location models for the iid samples of $Y$ and $Z$ with $\E{Y} = \mu_Y$ and $\E{Z} = \mu_Z$. For both models we have $\dmat P = \frac{1}{n} \dmat 1$, thus $$\vY = \dmat P \vY + \dmat Q \vY = \frac{1}{n} \dmat 1 \vY + \dmat Q (\mu_{Y} \dvec 1 + \vepsilon_{Y}) = \mean{Y} \dvec 1 + \dmat Q \vepsilon_{Y}$$ and similarly $\rvec Z = \mean{Z} \dvec 1 + \dmat Q \vepsilon_{Z}$. Hence
\begin{align*}
\E{\cov{\vY, \rvec Z}} &= \frac{1}{n-1} \E{(\vY - \mean{Y} \dvec 1)^{\top} (\rvec Z - \mean{Z} \dvec 1)} \\
&= \frac{1}{n-1} \E{(\dmat Q \vepsilon_Y)^{\top} \dmat Q \vepsilon_Z} \\
&= \frac{1}{n-1} \E{\vepsilon_Y^{\top} \dmattr Q \dmat Q \vepsilon_Z} \\
&= \frac{1}{n-1} \E{\trace{\vepsilon_Y^{\top} \dmat Q \vepsilon_Z}} \\
&= \frac{1}{n-1} \trace{\dmat Q \E{\vepsilon_Z \vepsilon_Y^{\top}}} \\
&= \frac{1}{n-1} \Cov{Y}{Z} \trace{\dmat Q} \\
&= \Cov{Y}{Z}
\end{align*}
where we used $\dmattr Q \dmat Q = \dmat Q$, the cyclic property of the trace, $\E{\vepsilon_Z \vepsilon_Y^{\top}} = \Cov{Y}{Z} \dmat I$ for paired iid samples, and $\trace{\dmat Q} = n - 1$.
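The representations used in this proof are deterministic identities and can be checked directly. A sketch, assuming NumPy (the simulated samples are illustrative): in the location model $\dmat Q$ centers a sample, and the empirical covariance equals the quadratic form $(\dmat Q \vy)^{\top} (\dmat Q \dvec z) / (n-1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
y = rng.normal(size=n)
z = rng.normal(size=n)

# Location model: P = (1/n) * ones, so Q*y centers the sample.
Q = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(Q @ y, y - y.mean()))  # True

# Empirical covariance as a quadratic form in Q, matching the
# proof's representation (Qy)^T (Qz) / (n - 1).
cov_quadratic = (Q @ y) @ (Q @ z) / (n - 1)
print(np.isclose(cov_quadratic, np.cov(y, z)[0, 1]))  # True
```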

3.3 Distributions under SCLM

We now assume the SCLM, thus $\vepsilon \sim \lawN(\dvec 0, \sigma^2 \dmat I)$.

Recap: Any affine transformation of a Gaussian random vector is again Gaussian; in particular, sums of jointly Gaussian random variables are Gaussian.
Proposition 3.6 (Distributions): The distributions are:
  • $\vbetahat \sim \lawN(\vbeta, \sigma^2 (\dmattr X \dmat X)^{-1})$
  • $\vYhat \sim \lawN(\dmat X \vbeta, \sigma^2 \dmat P)$
  • $\vepsilonhat \sim \lawN(\dvec 0, \sigma^2 \dmat Q)$
  • $\frac{n - p}{\sigma^2} \sigmahat^2 = \frac{1}{\sigma^2} \sum_{i=1}^n \epsilonhat_i^2 \sim \lawChi{n-p}$
Note:
  • $\vYhat$ and $\vepsilonhat$ are independent, as they are uncorrelated and jointly Gaussian.
  • $\vbetahat$ and $\vepsilonhat$ are independent, as they are uncorrelated and jointly Gaussian.
  • Thus, $\sigmahat^2$ is independent of $\vbetahat$, as $\sigmahat^2$ is a function of $\vepsilonhat$ alone.

3.4 Asymptotic Normality

The SCLM assumptions are strong. Asymptotically, however, normality of $\vbetahat$ holds under the following conditions:

  • The smallest eigenvalue of $\dmattr X \dmat X$ diverges, i.e. $\lambda_{\min} \to \infty$ as $n \to \infty$
  • The maximum diagonal element $\max \set{\dmat P_{[1,1]}, \ldots, \dmat P_{[n,n]}} \to 0$ as $n \to \infty$

Then Lindeberg's CLT applies and $\vbetahat \sima \lawN(\vbeta, \sigma^2 (\dmattr X \dmat X)^{-1})$. However, OLS may not be efficient in the presence of non-Gaussian $\vepsilon$, and the power of tests and the length of confidence intervals can be very wrong.
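This asymptotic behavior can be illustrated by simulation. A sketch, assuming NumPy (design, coefficients, and error law are illustrative): even with heavily skewed, non-Gaussian deviations, the distribution of the slope estimate matches the Gaussian mean and variance from Proposition 3.6 once $n$ is moderately large.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 4000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Skewed, non-Gaussian deviations: centered exponential with variance 1.
slope_hats = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0
    beta_hat = XtX_inv @ X.T @ (X @ beta + eps)
    slope_hats[r] = beta_hat[1]

# The slope estimates are approximately Gaussian with mean beta[1] and
# variance sigma^2 * (X^T X)^{-1}_[2,2], despite the skewed errors.
print(slope_hats.mean())                 # ≈ 2.0
print(slope_hats.var() / XtX_inv[1, 1])  # ≈ sigma^2 = 1.0
```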