Let $X_1,\dots,X_n \overset{\text{iid}}{\sim} F$, where $F$ is a differentiable cdf with pdf $f = F'$. The task is to estimate $f$ by some $\hat f$. As always, there are many garbage estimators. The question is: what makes a good one?
We could define the mean-squared error as
$$\mathrm{MSE}(x) = E\big[(f(x) - \hat f(x))^2\big]$$
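which is a bit weird because we do not care about only a single point $x \in \mathbb{R}$. A better criterion is the mean integrated squared error
$$\mathrm{MISE} = \int \mathrm{MSE}(x)\,dx = \int \Big[\big(E[\hat f(x)] - f(x)\big)^2 + \mathrm{Var}[\hat f(x)]\Big]\,dx$$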
Note that this is still just a surrogate criterion.
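The second equality in the MISE is the usual bias-variance decomposition applied pointwise. A short derivation, writing $\mu(x) = E[\hat f(x)]$ (a symbol introduced here only for brevity):

$$
\begin{aligned}
\mathrm{MSE}(x) &= E\big[(\hat f(x) - \mu(x) + \mu(x) - f(x))^2\big] \\
&= E\big[(\hat f(x) - \mu(x))^2\big] + 2\,(\mu(x) - f(x))\,E\big[\hat f(x) - \mu(x)\big] + (\mu(x) - f(x))^2 \\
&= \mathrm{Var}[\hat f(x)] + \big(E[\hat f(x)] - f(x)\big)^2
\end{aligned}
$$

The cross term vanishes because $\mu(x) - f(x)$ is deterministic and $E[\hat f(x) - \mu(x)] = 0$.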
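Example (Histogram): Choose $x_0 \in \mathbb{R}$ and $h > 0$. Then for all $x \in \mathbb{R}$ we have
$$\hat f_{x_0,h}(x) = \sum_{j \in \mathbb{Z}} \hat g_j \, \mathbf{1}\{x \in I_j\}$$
where for all $j \in \mathbb{Z}$ we have $I_j = (x_0 + jh,\; x_0 + (j+1)h)$ and $\hat g_j = \frac{\#\{i \in \{1,\dots,n\} : X_i \in I_j\}}{nh}$.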
$\hat f_{x_0,h}$ is not even continuous. Is this a problem?
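The histogram estimator can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a reference implementation: the function name `histogram_density` and the default values of `x0` and `h` are hypothetical, and the bins are taken half-open, $[x_0 + jh,\, x_0 + (j+1)h)$, which differs from the open intervals $I_j$ only on a set of probability zero.

```python
import numpy as np

def histogram_density(x, samples, x0=0.0, h=0.5):
    """Evaluate the histogram estimator with origin x0 and bin width h at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Bin index j with x0 + j*h <= value < x0 + (j+1)*h (half-open bins).
    x_bins = np.floor((x - x0) / h).astype(int)
    sample_bins = np.floor((samples - x0) / h).astype(int)
    # For each evaluation point, count the samples falling in the same bin,
    # then scale by 1/(n*h) so that the estimate integrates to 1.
    counts = (sample_bins[None, :] == x_bins[:, None]).sum(axis=1)
    return counts / (n * h)

# Usage on simulated data (standard normal draws, chosen only for illustration).
rng = np.random.default_rng(0)
data = rng.normal(size=200)
estimate = histogram_density(np.linspace(-3.0, 3.0, 7), data, x0=0.0, h=0.5)
```

The estimate is piecewise constant, so it jumps at every bin boundary, and both the origin `x0` and the width `h` change the result.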
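Example (Kernel density estimator): Fix a kernel $k : \mathbb{R} \to \mathbb{R}_{\ge 0}$ such that $\int_{-\infty}^{\infty} k(x)\,dx = 1$, $k$ is bounded, and $k(x) = k(-x)$ for all $x \in \mathbb{R}$. Let $h > 0$ and define
$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - X_i}{h}\right)$$
where the factor $\frac{1}{nh}$ ensures that $\hat f_h$ integrates to 1. A common choice is the Gaussian kernel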
$$k(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
Note that many properties of the kernel carry over to the estimator: for example, if $k$ is non-negative and continuous, then so is $\hat f_h$.
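The kernel density estimator with a Gaussian kernel can be sketched as follows. Again, this is an illustrative sketch under stated assumptions: the names `gaussian_kernel` and `kde` are hypothetical, and the bandwidth `h` is simply passed in by hand rather than selected by any data-driven rule.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel: non-negative, symmetric, bounded, integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h, kernel=gaussian_kernel):
    """Evaluate the kernel density estimator with bandwidth h at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Matrix of scaled differences (x - X_i) / h, one row per evaluation point.
    u = (x[:, None] - samples[None, :]) / h
    # Sum the kernel over the samples and scale by 1/(n*h).
    return kernel(u).sum(axis=1) / (n * h)

# Usage on simulated data (standard normal draws, chosen only for illustration).
rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-4.0, 4.0, 81)
density = kde(grid, data, h=0.4)
```

Because the Gaussian kernel is smooth and non-negative, the resulting estimate is smooth and non-negative as well, in contrast to the histogram.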