We consider here nonparametric regression with one predictor variable. Practically relevant generalizations to more than one or two predictor variables are not straightforward due to the curse of dimensionality and require different approaches, which are discussed later.
Assumption 3.1 (iid assumption for random design): We assume the observations are iid samples from the joint distribution of two real random variables $X$ and $Y$, i.e.
$$(X_1, Y_1), \dots, (X_n, Y_n) \overset{\text{iid}}{\sim} F_{X,Y},$$
where $F_{X,Y}$ denotes the unknown joint cdf of $(X, Y)$. Note: While the iid assumption generalizes to multivariate $X$, in nonparametric regression we mainly deal with univariate $X$. We further assume the regression model
$$Y_i = m(X_i) + \varepsilon_i$$
with strict exogeneity $E[\varepsilon_i \mid X_i = x_i] = 0$. Hence $m(x) = E[Y \mid X = x]$. We note that $m : \mathbb{R}^p \to \mathbb{R}$ can be an arbitrary regression function and need not be linear.
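To make the setup concrete, here is a minimal simulation sketch of data following this model; the particular choice of $m$, the noise level, and the design distribution are illustrative assumptions, not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

def m(x):
    # illustrative true regression function (an assumption for this sketch)
    return np.sin(2 * np.pi * x)

X = rng.uniform(0, 1, size=n)     # random design: X_i drawn iid
eps = rng.normal(0, 0.3, size=n)  # noise with E[eps_i | X_i] = 0
Y = m(X) + eps                    # regression model Y_i = m(X_i) + eps_i
```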
Note: There are four paradigms in nonparametric regression:
- local averaging
- local modelling
- global modelling
- penalized modelling
3.1 The Kernel Regression Estimator
The kernel regression estimator or Nadaraya–Watson estimator is a local averaging nonparametric regression estimator. The idea is to plug in the kernel density estimates for $f_{X,Y}$ and $f_X$, i.e.
$$\hat{f}_X(x) = \frac{1}{nh} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right)$$
and
$$\hat{f}_{X,Y}(x, y) = \frac{1}{nh^2} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right) k\!\left(\frac{y - Y_i}{h}\right)$$
into
$$m(x) = E[Y \mid X = x] = \int_{\mathbb{R}} y \, \frac{f_{X,Y}(x, y)}{f_X(x)} \, dy.$$
Definition 3.2 (Kernel regression estimator): The kernel regression estimator is
$$\hat{m}(x) = \sum_{i=1}^n w_i(x) Y_i,$$
where
$$w_i(x) = \frac{k\!\left(\frac{x - X_i}{h}\right)}{\sum_{j=1}^n k\!\left(\frac{x - X_j}{h}\right)}.$$
Proof: TODO
An interesting interpretation of the kernel regression estimator is
$$\hat{m}(x) = \operatorname*{arg\,min}_{m_x \in \mathbb{R}} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right) (Y_i - m_x)^2.$$
Thus, for every $x$, we search for the best local constant $m_x$ such that the localized sum of squares is minimized. Localization is described by the kernel $k$, which gives a large weight to those observations whose $X_i$ is close to the point $x$ of interest.
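As a minimal sketch of Definition 3.2 (the Gaussian kernel, the function names, and the simulated data are illustrative assumptions), the estimator can be computed directly from the weights $w_i(x)$:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nadaraya_watson(x, X, Y, h, kernel=gaussian_kernel):
    """Kernel regression estimate m_hat(x) = sum_i w_i(x) Y_i."""
    x = np.atleast_1d(x)
    # weights[i, j] = k((x_i - X_j) / h), one row per evaluation point
    weights = kernel((x[:, None] - X[None, :]) / h)
    weights /= weights.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
    return weights @ Y

# example usage on simulated data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 80)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 80)
x_grid = np.linspace(0, 1, 200)
m_hat = nadaraya_watson(x_grid, X, Y, h=0.05)
```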
It is useful to represent the regression function estimator $\hat{m}$, evaluated at the observation points $X_1, \dots, X_n$, as a linear operator applied to $Y$.
Definition 3.3 (Smoother matrix): The smoother matrix of a nonparametric regression estimator $\hat{m}$ is the matrix $S$ such that
$$\hat{Y} = S Y,$$
where $\hat{Y}$ is the vector of $\hat{Y}_i = \hat{m}(X_i)$.
Proposition 3.4 (Smoother matrix of kernel regression): The smoother matrix of the kernel regression estimator is
$$S[i, j] = w_j(X_i)$$
for $i, j \in \{1, \dots, n\}$.
Definition 3.5 (Degrees of freedom): Given a smoother matrix $S$, the degrees of freedom are defined as
$$\mathrm{df} = \operatorname{tr}(S).$$
Note: The definition of degrees of freedom can be viewed as a general concept for the number of parameters in a model fit with the smoother matrix $S$.
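A short sketch of how the smoother matrix and its trace can be computed for the kernel regression estimator (the helper name and the Gaussian kernel are illustrative choices, not from the notes):

```python
import numpy as np

def smoother_matrix(X, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """S[i, j] = w_j(X_i) for the kernel regression estimator."""
    K = kernel((X[:, None] - X[None, :]) / h)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 80)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 80)

S = smoother_matrix(X, h=0.05)
Y_hat = S @ Y        # fitted values at the observation points
df = np.trace(S)     # degrees of freedom of the fit
```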
Proposition 3.6 (Degrees of freedom of kernel regression): The degrees of freedom of the kernel regression estimator are
$$\mathrm{df} = \operatorname{tr}(S) = \sum_{i=1}^n w_i(X_i) = \sum_{i=1}^n \frac{k(0)}{\sum_{j=1}^n k\!\left(\frac{X_i - X_j}{h}\right)}.$$
3.2 The Local Polynomial Estimator
The local polynomial estimator is a local modelling nonparametric regression estimator. The idea is to find the local regression parameters of a polynomial fit.
Definition 3.7 (Local Polynomial Estimator): The local polynomial estimator is
$$\hat{m}(x) = \hat{\beta}_1(x),$$
where $\hat{\beta}(x) = [\hat{\beta}_1(x), \dots, \hat{\beta}_p(x)]^\top$ are the locally fitted regression parameters of the polynomial of degree $p - 1$, i.e.
$$\hat{\beta}(x) = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right) \left(Y_i - \sum_{j=1}^p \beta_j (X_i - x)^{j-1}\right)^2.$$
We note that
$$\frac{d^r}{dx^r} \hat{m}(x) = r! \, \hat{\beta}_{r+1}(x)$$
for $r \in \{0, \dots, p-1\}$.
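A minimal sketch of the local polynomial estimator as a weighted least squares fit (the function name, Gaussian kernel, and simulated data are illustrative assumptions):

```python
import numpy as np

def local_polynomial(x, X, Y, h, p=2, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Return beta_hat(x) for a local polynomial of degree p - 1."""
    # design matrix with columns (X_i - x)^(j-1), j = 1, ..., p
    D = (X[:, None] - x) ** np.arange(p)[None, :]
    w = kernel((x - X) / h)
    # weighted least squares: minimize sum_i w_i (Y_i - D_i beta)^2
    W = np.sqrt(w)
    beta_hat, *_ = np.linalg.lstsq(W[:, None] * D, W * Y, rcond=None)
    return beta_hat

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 80)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 80)

beta = local_polynomial(0.5, X, Y, h=0.1, p=2)  # local linear fit (degree 1)
m_hat_05 = beta[0]        # estimate of m(0.5)
m_prime_05 = 1 * beta[1]  # estimate of m'(0.5) = 1! * beta_2
```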
3.3 The Smoothing Splines Estimator
The smoothing splines estimator is a global and penalized modelling nonparametric regression estimator.
Definition 3.8 (Smoothing Splines estimator): Let $C^2$ be the set of functions $m$ with continuous second derivatives. The smoothing splines estimator for some regularization parameter $\lambda \geq 0$ is
$$\hat{m}_\lambda = \operatorname*{arg\,min}_{m \in C^2} \sum_{i=1}^n (Y_i - m(X_i))^2 + \lambda \int_{\mathbb{R}} \left(\frac{d^2 m(x)}{dx^2}\right)^2 dx.$$
The solution $\hat{m}_\lambda$ is a natural cubic spline with knots at $X_1, \dots, X_n$.
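In practice the smoothing spline can be fitted with library routines. The sketch below assumes SciPy ≥ 1.10, whose `make_smoothing_spline` minimizes a penalized least squares criterion of this form; the exact scaling of the penalty parameter `lam` relative to $\lambda$ follows SciPy's convention, and the data are simulated for illustration.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 80))  # make_smoothing_spline expects increasing x
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 80)

spline = make_smoothing_spline(X, Y, lam=1e-4)  # penalized least squares fit
x_grid = np.linspace(0, 1, 200)
m_hat = spline(x_grid)                          # evaluate the fitted spline
```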