3. Nonparametric Regression

We consider here nonparametric regression with one predictor variable. Practically relevant generalizations to more than one or two predictor variables are not straightforward due to the curse of dimensionality and require the different approaches discussed later.

Assumption 3.1 (iid assumption for random design): We assume the observations are iid samples from the joint distribution of two real random variables $\rvec X$ and $Y$, i.e. $(\rvec X_1, Y_1), \ldots, (\rvec X_n, Y_n) \simiid F_{\rvec X, Y}$, where $F_{\rvec X, Y}$ denotes the unknown joint cdf of $(\rvec X, Y)$.
Note: While the iid assumption generalizes to multivariate $\rvec X$, in nonparametric regression we mainly deal with univariate $X$.

We further assume the regression model for our data, i.e. $Y_i = m(X_i) + \epsilon_i$, with strict exogeneity $\E{\epsilon_i \mid X_i = x_i} = 0$. Hence $m(x) = \E{Y \mid X = x}$. We note that $m : \R^p \to \R$ can be an arbitrary regression function and need not be linear.

Note: There are four paradigms in nonparametric regression:
  • local averaging
  • local modelling
  • global modelling
  • penalized modelling

3.1 The Kernel Regression Estimator

The kernel regression estimator or Nadaraya–Watson estimator is a local averaging nonparametric regression estimator. The idea is to substitute the kernel density estimates $\fhat_{X}(x) = \frac{1}{nh} \sum_{i=1}^n k\of{\frac{x - X_i}{h}}$ and $\fhat_{X,Y}(x,y) = \frac{1}{nh^2} \sum_{i=1}^n k\of{\frac{x - X_i}{h}} k\of{\frac{y - Y_i}{h}}$ for $f_{X}$ and $f_{X,Y}$ in $m(x) = \E{Y \mid X = x} = \int_{\R} y \, \frac{f_{X,Y}(x, y)}{f_{X}(x)} \dd y$.

Definition 3.2 (Kernel regression estimator): The kernel regression estimator is $\mhat(x) = \sum_{i=1}^n w_i(x) Y_i$ where $w_i(x) = \frac{k\of{\frac{x - X_i}{h}}}{\sum_{j=1}^n k\of{\frac{x - X_j}{h}}}$
Proof: Substituting the kernel density estimates into $m(x) = \int_{\R} y \, \frac{\fhat_{X,Y}(x,y)}{\fhat_{X}(x)} \dd y$ and assuming the kernel integrates to one and has mean zero, the inner integral is $\int_{\R} y \, \frac{1}{h} k\of{\frac{y - Y_i}{h}} \dd y = Y_i$ (substitute $u = (y - Y_i)/h$). The numerator thus becomes $\frac{1}{nh} \sum_{i=1}^n k\of{\frac{x - X_i}{h}} Y_i$, and dividing by $\fhat_{X}(x) = \frac{1}{nh} \sum_{j=1}^n k\of{\frac{x - X_j}{h}}$ gives the stated weights $w_i(x)$.

An interesting interpretation of the kernel regression estimator is $\mhat(x) = \argmin_{m_x \in \R} \sum_{i=1}^n k\of{\frac{x - X_i}{h}} (Y_i - m_x)^2$. Thus for every $x$, we are searching for the best local constant $m_x$ such that the localized sum of squares is minimized. Localization is here described by the kernel $k$ and gives a large weight to those observations where $X_i$ is close to the point $x$ of interest.
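
A minimal sketch of Definition 3.2 in Python, assuming a Gaussian kernel, an arbitrarily chosen bandwidth $h$, and purely illustrative toy data:

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel k(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nadaraya_watson(x, X, Y, h):
    """Kernel regression estimate m_hat(x) = sum_i w_i(x) * Y_i."""
    k = gaussian_kernel((x - X) / h)   # kernel evaluated at scaled distances to x
    w = k / k.sum()                    # weights w_i(x), normalized to sum to one
    return np.sum(w * Y)

# Toy data from Y = m(X) + eps with m(x) = sin(x) (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=100)
Y = np.sin(X) + rng.normal(scale=0.3, size=100)

grid = np.linspace(0, 2 * np.pi, 5)
print(np.round([nadaraya_watson(x, X, Y, h=0.5) for x in grid], 2))
```

A smaller bandwidth $h$ makes the estimate follow the data more closely, while a larger one averages over more neighbouring observations.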

It is useful to represent the regression function estimator $\mhat$ evaluated at the observation points $X_1, \ldots, X_n$ as a linear operator applied to $\rvec Y$.

Definition 3.3 (Smoother matrix): The smoother matrix of a nonparametric regression estimator $\mhat$ is the matrix $\rmat S$ such that $\rvec \Yhat = \rmat S \rvec Y$ where $\rvec \Yhat$ is the vector of $\Yhat_i = \mhat(X_i)$.
Proposition 3.4 (Smoother matrix of kernel regression): The smoother matrix of the kernel regression estimator is $\rmat{S}_{[i,j]} = w_j(X_i)$ for $i,j \in \set{1, \ldots, n}$.
Definition 3.5 (Degrees of freedom): Given a smoother matrix $\rmat S$, the degrees of freedom are defined as $\df = \trace{\rmat S}$
Note: The definition of degrees of freedom can be viewed as a general concept for the number of parameters in a model fit with the smoother matrix $\rmat S$.
Proposition 3.6 (Degrees of freedom of kernel regression): The degrees of freedom of the kernel regression estimator are $\df = \trace{\rmat S} = \sum_{i=1}^n w_i(X_i) = k(0) \sum_{i=1}^n \frac{1}{\sum_{j=1}^n k\of{\frac{X_i - X_j}{h}}}$
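
The smoother matrix and its trace follow directly from Proposition 3.4; the sketch below again assumes a Gaussian kernel and illustrative data:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def smoother_matrix(X, h):
    """Smoother matrix S with S[i, j] = w_j(X_i) for kernel regression."""
    K = gaussian_kernel((X[:, None] - X[None, :]) / h)  # K[i, j] = k((X_i - X_j) / h)
    return K / K.sum(axis=1, keepdims=True)             # normalize rows: sum_j S[i, j] = 1

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = np.sin(X) + rng.normal(scale=0.3, size=50)

S = smoother_matrix(X, h=0.8)
Y_hat = S @ Y            # fitted values at the observation points
df = np.trace(S)         # degrees of freedom df = tr(S)
print(f"df = {df:.2f}")  # approaches n as h -> 0 and 1 as h -> infinity
```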

3.2 The Local Polynomial Estimator

The local polynomial estimator is a local modelling nonparametric regression estimator. The idea is to find the local regression parameters of a polynomial fit.

Definition 3.7 (Local polynomial estimator): The local polynomial estimator is $\mhat(x) = \betahat_{1}(x)$ where $\gvec \betahat(x) = [\betahat_1(x), \ldots, \betahat_p(x)]^{\tr}$ are the locally fitted regression parameters of the polynomial of degree $p-1$, i.e. $\gvec \betahat(x) = \argmin_{\gvec \beta \in \R^{p}} \sum_{i=1}^n k\of{\frac{x - X_i}{h}} \pa{Y_i - \sum_{j=1}^p \beta_j (X_i - x)^{j-1}}^2$

We note that $r! \, \betahat_{r+1}(x)$ estimates the $r$-th derivative $\dvn{r}{m(x)}{x}$ for $r \in \set{0, \ldots, p-1}$.
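
A sketch of the locally weighted least-squares problem behind Definition 3.7, again assuming a Gaussian kernel and toy data (note that the code indexes $\gvec \betahat(x)$ from zero):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def local_polynomial(x, X, Y, h, degree=1):
    """Locally weighted polynomial fit of the given degree around x.

    Returns beta_hat(x); beta_hat[0] is m_hat(x) and r! * beta_hat[r]
    estimates the r-th derivative of m at x.
    """
    w = gaussian_kernel((x - X) / h)                     # kernel weights
    D = np.vander(X - x, N=degree + 1, increasing=True)  # columns (X_i - x)^0, ..., (X_i - x)^degree
    W = np.diag(w)
    # Weighted least squares: solve (D^T W D) beta = D^T W Y.
    return np.linalg.solve(D.T @ W @ D, D.T @ W @ Y)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=200)
Y = np.sin(X) + rng.normal(scale=0.3, size=200)

x0 = np.pi / 2
beta = local_polynomial(x0, X, Y, h=0.5, degree=2)
print(f"m_hat(x0)  ~ {beta[0]:.2f}")  # estimates m(x0) = sin(pi/2) = 1
print(f"m_hat'(x0) ~ {beta[1]:.2f}")  # 1! * beta[1] estimates m'(x0) = cos(pi/2) = 0
```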

3.3 The Smoothing Splines Estimator

The smoothing splines estimator is a global and penalized modelling nonparametric regression estimator.

Definition 3.8 (Smoothing splines estimator): Let $\setC_2$ be the set of functions $m$ with continuous second derivatives. The smoothing splines estimator for some regularization parameter $\lambda \geq 0$ is $\mhat_{\lambda}(x) = \argmin_{m \in \setC_2} \sum_{i=1}^n (Y_i - m(X_i))^2 + \lambda \int_{\R} \pa{\dvn{2}{m(x)}{x}}^2 \dd x$

The solution $\mhat_{\lambda}(x)$ is a natural cubic spline with knots at $X_1, \ldots, X_n$.
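
A brief usage sketch, assuming a recent SciPy version whose `scipy.interpolate.make_smoothing_spline` fits this penalized criterion with penalty parameter `lam` (the data are again purely illustrative):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=100))  # abscissae must be strictly increasing
Y = np.sin(X) + rng.normal(scale=0.3, size=100)

# lam plays the role of lambda in Definition 3.8: lam = 0 interpolates the data,
# very large lam pushes the fit towards the least-squares straight line.
spline = make_smoothing_spline(X, Y, lam=0.1)

grid = np.linspace(0, 2 * np.pi, 5)
print(np.round(spline(grid), 2))  # evaluate the fitted natural cubic spline
```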