Definition 6.1 (BLUE): An estimator θ^ of a scalar parameter θ∈Θ is a “best linear unbiased estimator” or BLUE if
θ^ is a linear estimator, i.e. θ^=a⊤Y+b
θ^ is unbiased, i.e. ∀θ∈Θ:E[θ^]=θ
θ^ has minimal variance, i.e. for any other linear unbiased estimator γ^ it holds that Var[θ^]≤Var[γ^]
The BLUE is defined in terms of a scalar estimand since variance comparisons between multivariate estimators are not well-defined.
Theorem 6.2 (Gauss-Markov): Under the WCLM assumptions, for any c∈Rp, the OLS estimator c⊤β^ is BLUE for c⊤β.
If the SCLM is further assumed, an even stronger statement can be made.
Definition 6.3 (UMVUE): An estimator θ^ of a scalar parameter θ∈Θ is a “uniformly minimum variance unbiased estimator” or UMVUE if
θ^ is unbiased, i.e. ∀θ∈Θ:E[θ^]=θ
θ^ has minimal variance, i.e. for any other unbiased estimator γ^ it holds that Var[θ^]≤Var[γ^]
Theorem 6.4 (Lehmann-Scheffé): Under the SCLM assumptions, for any c∈Rp, the OLS estimator c⊤β^ is UMVUE for c⊤β.
6.2 Generalized Assumptions
We now introduce a generalization of the WCLM and SCLM by relaxing the assumption that the deviations are homoscedastic and uncorrelated.
Assumption 6.5 (Weak general linear model): For each observation the response is the linear function Yi=∑j=1pβjxij+εi with zero-mean deviations, i.e. E[εi]=0 for all i∈{1,…,n}.
Note:
The deviations are possibly heteroscedastic and correlated.
For notational convenience, we use WGLM to denote the weak general linear model.
Assumption 6.6 (Strong general linear model): For each observation the response is the linear function Yi=∑j=1pβjxij+εi with zero-mean Gaussian deviations, i.e. ε∼N(0,Σ).
Note: For notational convenience, we use SGLM to denote the strong general linear model.
These are rather general assumptions, so sometimes we additionally assume that some structure is known.
Assumption 6.7 (Known correlation structure): The correlation matrix of the deviations is known up to some multiplicative constant σ2, i.e. Cov[ε,ε]=σ2S where S is positive definite and known.
Alternatively we might assume that the deviations are uncorrelated but heteroscedastic.
Assumption 6.8 (Uncorrelated, heteroscedastic deviations): The deviations are uncorrelated but heteroscedastic, i.e. Cov[ε,ε]=diag(σ12,…,σn2) where σi2 is unknown.
6.3 Generalized Least Squares
In this section we assume the SGLM with known correlation structure Σ=σ2S.
Note: We denote the OLS estimator with β^OLS to distinguish it from others.
We have Y∼N(Xβ,σ2S) and β^OLS∼N(β,σ2(X⊤X)−1X⊤SX(X⊤X)−1). This estimator, while unbiased, is not the most efficient.
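The sandwich form of the OLS covariance follows from β^OLS=(X⊤X)−1X⊤Y being a linear map of Y. A minimal NumPy sketch of this formula, using a synthetic design matrix and an arbitrary AR(1)-style correlation structure purely for illustration:

```python
import numpy as np

# Sketch of Cov[beta_OLS] = sigma^2 (X^T X)^{-1} X^T S X (X^T X)^{-1}
# under Y ~ N(X beta, sigma^2 S).  X, sigma2, and S are synthetic
# illustrations, not taken from the text.
rng = np.random.default_rng(2)
n, p = 30, 2
X = rng.normal(size=(n, p))
sigma2 = 1.5
# AR(1)-style correlation matrix: S[i, j] = 0.3^|i - j|.
S = 0.3 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

A = np.linalg.inv(X.T @ X) @ X.T        # beta_OLS = A @ Y
cov_ols = sigma2 * A @ S @ A.T          # sandwich covariance

# Sanity check: with S = I the sandwich collapses to the familiar
# homoscedastic covariance sigma^2 (X^T X)^{-1}.
cov_iid = sigma2 * A @ np.eye(n) @ A.T
assert np.allclose(cov_iid, sigma2 * np.linalg.inv(X.T @ X))
```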
Recap (Spectral decomposition of pd matrix): Any positive definite matrix A∈Rn×n can be decomposed into A=UΛU⊤ where
U=[u1…un] is an orthogonal eigenbasis of eigenvectors ui of A
Λ=diag(λ1,…,λn) is a diagonal matrix of eigenvalues λi of A
Recap (Square root of pd matrix): Given a positive definite matrix A∈Rn×n, let Λ1/2=diag(√λ1,…,√λn) and Λ−1/2=diag(1/√λ1,…,1/√λn). We define
A1/2=UΛ1/2U⊤
A−1/2=UΛ−1/2U⊤
We note that A1/2 and A−1/2 are unique irrespective of the choice of U and that (A1/2)2=A and (A−1/2)2=A−1.
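The two recaps above translate directly into a few lines of NumPy; the matrix A below is an arbitrary positive definite example for illustration:

```python
import numpy as np

# Sketch: square root of a positive definite matrix via its spectral
# decomposition A = U diag(lambda) U^T.  A is an illustrative example.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

lam, U = np.linalg.eigh(A)  # eigenvalues and orthonormal eigenvectors

A_half = U @ np.diag(np.sqrt(lam)) @ U.T           # A^{1/2}
A_neg_half = U @ np.diag(1 / np.sqrt(lam)) @ U.T   # A^{-1/2}

# Verify (A^{1/2})^2 = A and (A^{-1/2})^2 = A^{-1}.
assert np.allclose(A_half @ A_half, A)
assert np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A))
```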
We transform the linear model via left multiplication of S−1/2, i.e.
S−1/2Y=S−1/2Xβ+S−1/2ε⟹Y~=X~β+ε~
and note that ε~∼N(0,σ2I). Thus we can use the OLS estimator in the tilde model: β^GLS=(X~⊤X~)−1X~⊤Y~.
Definition 6.9 (GLS estimator): The GLS estimator β^GLS is
β^GLS=(X⊤S−1X)−1X⊤S−1Y
Note: Assume known correlation structure S. Under the WGLM the GLS estimator is BLUE. Under the SGLM it is UMVUE. Unless S∝I (in which case GLS and OLS coincide), GLS generally has smaller variance than OLS, i.e. is more efficient.
Proposition 6.10 (Distribution of the GLS estimator): Under known correlation structure S the distribution of the GLS estimator is
β^GLS∼N(β,σ2(X⊤S−1X)−1)
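A short NumPy sketch of the GLS estimator, computed both via the closed form (X⊤S−1X)−1X⊤S−1Y and via OLS in the transformed tilde model; the design matrix, β, and the AR(1)-style S are synthetic illustrations:

```python
import numpy as np

# Sketch: GLS via the closed form and via OLS in the transformed
# model S^{-1/2} Y = S^{-1/2} X beta + S^{-1/2} eps.  All data are
# synthetic, for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

# Known, positive definite correlation structure (AR(1)-style).
S = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Y = X @ beta + np.linalg.cholesky(S) @ rng.normal(size=n)

# Closed-form GLS: solve (X^T S^{-1} X) beta = X^T S^{-1} Y.
S_inv = np.linalg.inv(S)
beta_gls = np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ Y)

# OLS in the tilde model, using the matrix square root S^{-1/2}.
lam, U = np.linalg.eigh(S)
S_neg_half = U @ np.diag(1 / np.sqrt(lam)) @ U.T
Xt, Yt = S_neg_half @ X, S_neg_half @ Y
beta_tilde = np.linalg.lstsq(Xt, Yt, rcond=None)[0]

# Both routes give the same estimator.
assert np.allclose(beta_gls, beta_tilde)
```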
6.4 Weighted Least Squares
A special case of the SGLM with known correlation structure S is S=diag(v1,…,vn). Then least squares estimation amounts to
β^GLS=β^WLS=argminβ∈Rp∑i=1nwi(Yi−xi⊤β)2
This procedure is called weighted least squares or WLS with the weights wi=vi−1∝Var[εi]−1. If Var[εi] is large, we downweight the i-th contribution.
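With diagonal S the GLS normal equations reduce to a simple reweighting, which a few lines of NumPy make explicit; the data and variances vi below are synthetic illustrations:

```python
import numpy as np

# Sketch: weighted least squares with S = diag(v_1, ..., v_n) and
# weights w_i = 1/v_i.  Synthetic data for illustration only.
rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0])
v = rng.uniform(0.5, 3.0, size=n)             # known variances up to sigma^2
Y = X @ beta + np.sqrt(v) * rng.normal(size=n)

w = 1.0 / v                                   # weights w_i = v_i^{-1}
# WLS normal equations: (X^T W X) beta = X^T W Y with W = diag(w).
XtW = X.T * w                                 # same as X.T @ np.diag(w)
beta_wls = np.linalg.solve(XtW @ X, XtW @ Y)

# Identical to GLS with S = diag(v).
S_inv = np.diag(w)
beta_gls = np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ Y)
assert np.allclose(beta_wls, beta_gls)
```

Observations with large Var[εi] receive small weights wi and thus contribute little to the fit, matching the downweighting described above.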