3. Distributional Properties of OLS

In this chapter, unless otherwise noted, we will always assume the WCLM and will specify if additional assumptions are made.

3.1 Moments under WCLM

Proposition 3.1 (Expectations): The expectations are:
  • $\E{\vbetahat} = \vbeta$
  • $\E{\vYhat} = \E{\vY} = \dmat X \vbeta$
  • $\E{\vepsilonhat} = \dvec 0$
Proposition 3.2 (Covariance matrices): The covariance matrices are:
  • $\Cov{\vbetahat}{\vbetahat} = \sigma^2 (\dmattr X \dmat X)^{-1}$
  • $\Cov{\vYhat}{\vYhat} = \sigma^2 \dmat P$
  • $\Cov{\vepsilonhat}{\vepsilonhat} = \sigma^2 \dmat Q$
  • $\Cov{\vbetahat}{\vYhat} = \sigma^2 (\dmattr X \dmat X)^{-1} \dmattr X$
  • $\Cov{\vbetahat}{\vepsilonhat} = \dmat 0$
  • $\Cov{\vYhat}{\vepsilonhat} = \dmat 0$
Note: Recall that in general $\check{\vy}$ and $\vepsiloncheck$ are empirically uncorrelated only if the model includes an intercept. Treated as random vectors, however, $\vYhat$ and $\vepsilonhat$ are uncorrelated without any restriction.
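This distinction can be checked numerically. A minimal sketch using NumPy (all variable names and the simulated data are illustrative): without an intercept, the empirical covariance of fitted values and residuals is generally nonzero, while with an intercept it vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=2.0, size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)  # data with a nonzero intercept

def fit(X, y):
    """OLS via the normal equations; returns fitted values and residuals."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat
    return y_hat, y - y_hat

# Model WITH intercept: empirical covariance of y_hat and eps_hat is zero.
X1 = np.column_stack([np.ones(n), x])
y_hat1, eps1 = fit(X1, y)
cov_with = np.cov(y_hat1, eps1)[0, 1]

# Model WITHOUT intercept: generally nonzero empirical covariance,
# because the residuals no longer sum to zero.
X0 = x.reshape(-1, 1)
y_hat0, eps0 = fit(X0, y)
cov_without = np.cov(y_hat0, eps0)[0, 1]

print(cov_with)     # ~ 0 (up to floating-point error)
print(cov_without)  # generally far from 0
```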

3.2 Estimator for Deviation Variance

We can now define an unbiased estimator for the variance of the deviations.

Definition 3.3 (Unbiased estimator for deviation variance): The estimator $$\sigmahat^2 = \frac{\vepsilonhat^{\top} \vepsilonhat}{n - p} = \frac{1}{n - p} \sum_{i=1}^n \epsilonhat_i^2$$ is an unbiased estimator for the variance $\sigma^2$ of the deviations.
Proof: $$\E{\sigmahat^2} = \frac{1}{n - p} \sum_{i=1}^n \Var{\epsilonhat_i} = \frac{1}{n - p} \trace{\sigma^2 \dmat Q} = \sigma^2,$$ since $\E{\epsilonhat_i} = 0$ and $\trace{\dmat Q} = \trace{\dmat I} - \trace{\dmat P} = n - p$.
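The trace identity driving this proof is easy to verify numerically. A small sketch, assuming NumPy and an arbitrary simulated design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))

# Hat matrix P and residual-maker Q = I - P.
P = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - P

# Q is symmetric and idempotent, and trace(Q) = n - p,
# which is exactly the correction factor in sigma_hat^2.
print(np.trace(Q))            # ≈ n - p = 26
print(np.allclose(Q @ Q, Q))  # True
```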

We note that we can write iid samples of any random variable $Y$ with finite first and second moments via the location model $Y_i = \mu + \epsilon_i$ where $\mu = \E{Y}$ and $\epsilon_i$ is the zero-mean stochastic component. As such, the empirical variance emerges as the estimator $\sigmahat^2$ for the variance of the deviations in the location model.
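Concretely, in the location model the design matrix is $\dmat X = \dvec 1$ with $p = 1$, so $\dmat P = \frac{1}{n} \dmat 1$, the residuals are $\epsilonhat_i = y_i - \mean{y}$, and the general estimator specializes to

$$\sigmahat^2 = \frac{1}{n - 1} \sum_{i=1}^n \epsilonhat_i^2 = \frac{1}{n - 1} \sum_{i=1}^n (y_i - \mean{y})^2 = \var{\vy}.$$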

Recap (Empirical variance): The empirical variance of an iid sample $y_1, \ldots, y_n$ of $Y$ is $\var{\vy} = \frac{1}{n-1} \sum_{i=1}^n (y_i - \mean{y})^2$
Proposition 3.4 (Empirical variance unbiased): The empirical variance is unbiased, i.e. $\E{\var{\vY}} = \Var{Y}$.

For iid samples of two random variables YY and ZZ with finite first and second moments we have a similar result for the covariance.

Recap (Empirical covariance): The empirical covariance of an iid sample $y_1, \ldots, y_n$ of $Y$ and an iid sample $z_1, \ldots, z_n$ of $Z$ is $\cov{\vy, \dvec z} = \frac{1}{n-1} \sum_{i=1}^n (y_i - \mean{y})(z_i - \mean{z})$
Proposition 3.5 (Empirical covariance unbiased): The empirical covariance is unbiased, i.e. $\E{\cov{\vY, \rvec Z}} = \Cov{Y}{Z}$.
Proof: Let $Y_i = \mu_Y + \epsilon_{Y,i}$ and $Z_i = \mu_Z + \epsilon_{Z,i}$ be the location models for the iid samples of $Y$ and $Z$ with $\E{Y} = \mu_Y$ and $\E{Z} = \mu_Z$. For both models we have $\dmat P = \frac{1}{n} \dmat 1$, thus $$\vY = \dmat P \vY + \dmat Q \vY = \frac{1}{n} \dmat 1 \vY + \dmat Q (\mu_{Y} \dvec 1 + \vepsilon_{Y}) = \mean{Y} \dvec 1 + \dmat Q \vepsilon_{Y}$$ and similarly $\rvec Z = \mean{Z} \dvec 1 + \dmat Q \vepsilon_{Z}$. Hence
\begin{align*}
\E{\cov{\vY, \rvec Z}} &= \frac{1}{n-1} \E{(\vY - \mean{Y} \dvec 1)^{\top} (\rvec Z - \mean{Z} \dvec 1)} \\
&= \frac{1}{n-1} \E{(\dmat Q \vepsilon_Y)^{\top} \dmat Q \vepsilon_Z} \\
&= \frac{1}{n-1} \E{\vepsilon_Y^{\top} \dmattr Q \dmat Q \vepsilon_Z} \\
&= \frac{1}{n-1} \E{\trace{\vepsilon_Y^{\top} \dmat Q \vepsilon_Z}} \\
&= \frac{1}{n-1} \trace{\dmat Q \E{\vepsilon_Z \vepsilon_Y^{\top}}} \\
&= \frac{1}{n-1} \Cov{Y}{Z} \trace{\dmat Q} \\
&= \Cov{Y}{Z}
\end{align*}
where we used $\dmattr Q \dmat Q = \dmat Q$, the cyclic property of the trace, $\E{\vepsilon_Z \vepsilon_Y^{\top}} = \Cov{Y}{Z} \dmat I$ for paired iid samples, and $\trace{\dmat Q} = n - 1$.
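The representations used in this proof are deterministic identities and can be checked directly. A sketch, assuming NumPy (the simulated samples are illustrative): in the location model $\dmat Q$ centers a sample, and the empirical covariance equals the quadratic form $(\dmat Q \vy)^{\top} (\dmat Q \dvec z) / (n-1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
y = rng.normal(size=n)
z = rng.normal(size=n)

# Location model: P = (1/n) * ones, so Q*y centers the sample.
Q = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(Q @ y, y - y.mean()))  # True

# Empirical covariance as a quadratic form in Q, matching the
# proof's representation (Qy)^T (Qz) / (n - 1).
cov_quadratic = (Q @ y) @ (Q @ z) / (n - 1)
print(np.isclose(cov_quadratic, np.cov(y, z)[0, 1]))  # True
```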

3.3 Distributions under SCLM

We now assume the SCLM, thus $\vepsilon \sim \lawN(\dvec 0, \sigma^2 \dmat I)$.

Recap: Any affine transformation of a Gaussian random vector is again Gaussian; in particular, sums of jointly Gaussian random variables are Gaussian.
Proposition 3.6 (Distributions): The distributions are:
  • $\vbetahat \sim \lawN(\vbeta, \sigma^2 (\dmattr X \dmat X)^{-1})$
  • $\vYhat \sim \lawN(\dmat X \vbeta, \sigma^2 \dmat P)$
  • $\vepsilonhat \sim \lawN(\dvec 0, \sigma^2 \dmat Q)$
  • $\frac{n - p}{\sigma^2} \sigmahat^2 = \frac{1}{\sigma^2} \sum_{i=1}^n \epsilonhat_i^2 \sim \lawChi{n-p}$
Note:
  • $\vYhat$ and $\vepsilonhat$ are independent, as they are uncorrelated and jointly Gaussian.
  • $\vbetahat$ and $\vepsilonhat$ are independent, as they are uncorrelated and jointly Gaussian.
  • Thus, $\sigmahat^2$ is independent of $\vbetahat$, as $\sigmahat^2$ is a function of $\vepsilonhat$ alone.

3.4 Asymptotic Normality

The SCLM assumptions are strong. Asymptotically, however, normality of $\vbetahat$ holds under the following conditions:

  • The smallest eigenvalue of $\dmattr X \dmat X$ diverges, i.e. $\lambda_{\min} \to \infty$ as $n \to \infty$
  • The maximum diagonal element $\max \set{\dmat P_{[1,1]}, \ldots, \dmat P_{[n,n]}} \to 0$ as $n \to \infty$

Then Lindeberg's CLT applies and $\vbetahat \sima \lawN(\vbeta, \sigma^2 (\dmattr X \dmat X)^{-1})$. However, OLS may not be efficient in the presence of non-Gaussian $\vepsilon$, and the power of tests and the length of confidence intervals can be very wrong.
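This asymptotic behavior can be illustrated by simulation. A sketch, assuming NumPy (design, coefficients, and error law are illustrative): even with heavily skewed, non-Gaussian deviations, the distribution of the slope estimate matches the Gaussian mean and variance from Proposition 3.6 once $n$ is moderately large.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 4000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Skewed, non-Gaussian deviations: centered exponential with variance 1.
slope_hats = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0
    beta_hat = XtX_inv @ X.T @ (X @ beta + eps)
    slope_hats[r] = beta_hat[1]

# The slope estimates are approximately Gaussian with mean beta[1] and
# variance sigma^2 * (X^T X)^{-1}_[2,2], despite the skewed errors.
print(slope_hats.mean())                 # ≈ 2.0
print(slope_hats.var() / XtX_inv[1, 1])  # ≈ sigma^2 = 1.0
```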