Exercise 3
Question 1.
Let $\hat{f}_h$ be a kernel estimator of the unknown density $f$ with a bandwidth $h$. Show that $\int \hat{f}_h(y)dy=1$.
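Before writing the proof, it can help to check the claim numerically. The sketch below is only an illustration under stated assumptions: it uses a Gaussian kernel, simulated standard normal data, a bandwidth of 0.3, and a plain Riemann sum on a wide grid. None of these choices come from the exercise, and the helper names `gaussian_kernel` and `kde` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used here as the kernel K (an assumption).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(y_grid, sample, h):
    # \hat f_h(y) = (1 / (n h)) * sum_i K((y - Y_i) / h), evaluated on y_grid.
    u = (y_grid[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)
sample = rng.normal(size=200)   # simulated data, for illustration only
h = 0.3                         # arbitrary bandwidth

# Grid wide enough that the kernel tails outside it are negligible.
grid = np.linspace(sample.min() - 10 * h, sample.max() + 10 * h, 4001)
dy = grid[1] - grid[0]

# Riemann-sum approximation of the integral of \hat f_h over the real line.
total_mass = float(np.sum(kde(grid, sample, h)) * dy)
print(f"approximate integral of the kernel estimate: {total_mass:.6f}")
```

Changing `h` or the sample should leave the printed value essentially unchanged, which is what the analytic argument has to explain.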
Question 2.
Let $Y_1,\ldots,Y_n \sim f(y)$, where the density $f$ is unknown.
- We want to estimate $\mu=EY$. An obvious (?) estimator is $\bar{Y}$.
Suppose however that we decided to estimate first $f$ by a kernel estimator $\hat{f}_h(y)$ based on a symmetric kernel
and a bandwidth $h$, and then estimate $\mu$ by $\hat{\mu}=\int y \hat{f}_h(y)dy$. What is the resulting $\hat{\mu}$? Is it a consistent estimator for $\mu$?
Does it depend on a specific kernel and on a chosen bandwidth?
- Repeat the previous part for estimating the second moment $\mu_2=EY^2$ by $\hat{\mu}_2=\int y^2 \hat{f}_h(y)dy$.
- We say that a kernel $K(\cdot)$ is an $m$-th order kernel if it satisfies
  - $\int K(u)du=1$
  - $\int u^j K(u)du=0,\;j=1,\ldots,m-1$
  - $\int u^m K(u)du \neq 0$
How will $\hat{\mu}_2$ change if we use a third-order kernel instead? Generalize these results to estimating the $p$-th moment
$\mu_p=EY^p$ by $\hat{\mu}_p=\int y^p \hat{f}_h(y)dy$.
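A small numerical experiment can be used to sanity-check whatever closed forms you derive for $\hat{\mu}$ and $\hat{\mu}_2$. The sketch below assumes a Gaussian kernel (symmetric, second order), simulated exponential data, and Riemann-sum integration on a wide grid. These choices and the helper names `kde` and `kde_moment` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def gaussian_kernel(u):
    # Symmetric second-order kernel (standard normal density); an assumption.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(y_grid, sample, h):
    # \hat f_h(y) = (1 / (n h)) * sum_i K((y - Y_i) / h), evaluated on y_grid.
    u = (y_grid[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def kde_moment(sample, h, p, n_grid=8001):
    # Riemann-sum approximation of \hat\mu_p = \int y^p \hat f_h(y) dy.
    grid = np.linspace(sample.min() - 10 * h, sample.max() + 10 * h, n_grid)
    dy = grid[1] - grid[0]
    return float(np.sum(grid**p * kde(grid, sample, h)) * dy)

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=300)   # simulated data only

sample_m1 = float(sample.mean())          # \bar Y
sample_m2 = float(np.mean(sample**2))     # (1/n) sum Y_i^2

results = {}
for h in (0.1, 0.5, 1.0):
    results[h] = (kde_moment(sample, h, 1), kde_moment(sample, h, 2))
    m1, m2 = results[h]
    print(f"h={h:.1f}  mu_hat={m1:.5f} (Ybar={sample_m1:.5f})  "
          f"mu2_hat={m2:.5f} (mean Y^2={sample_m2:.5f})")
```

Comparing the two columns across bandwidths shows which estimators depend on $h$ and which do not. Replacing `gaussian_kernel` with another kernel lets you probe the dependence on the kernel as well.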
Question 3.
Consider kernel estimation of the unknown (univariate) density $f$ given a random sample $Y_1,\ldots,Y_n \sim f(y)$.
Assume that $f \in C^m,\;m \geq 1$,
and choose a kernel $K(\cdot)$ of order $m$ (see Question 2). Show that
- $IMSE(\hat{f}_h,f) \leq C_1(K)h^{2m}+\frac{C_2(K)}{nh}$
- the optimal choice for a bandwidth is $h_0=O(n^{-\frac{1}{2m+1}})$
- the optimal $IMSE(\hat{f}_{h_0},f)=O(n^{-\frac{2m}{2m+1}})$
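The rate in the last item can be explored by simulation. The following is a rough Monte Carlo sketch, assuming a standard normal true density, a Gaussian kernel (so $m=2$), the bandwidth $h=n^{-1/(2m+1)}$ with the constant set to 1, and 30 replications per sample size. All of these are illustrative assumptions, and the function `monte_carlo_imse` is hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density: used both as the kernel and as the true f here.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(y_grid, sample, h):
    # \hat f_h(y) = (1 / (n h)) * sum_i K((y - Y_i) / h), evaluated on y_grid.
    u = (y_grid[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def monte_carlo_imse(n, h, rng, n_rep=30):
    # Average over replications of the integrated squared error on a grid.
    grid = np.linspace(-6.0, 6.0, 601)
    dy = grid[1] - grid[0]
    f_true = gaussian_kernel(grid)
    ise = []
    for _ in range(n_rep):
        sample = rng.normal(size=n)
        ise.append(np.sum((kde(grid, sample, h) - f_true) ** 2) * dy)
    return float(np.mean(ise))

rng = np.random.default_rng(2)
m = 2                                   # order of the Gaussian kernel
sample_sizes = [100, 400, 1600]
imse = [monte_carlo_imse(n, n ** (-1.0 / (2 * m + 1)), rng) for n in sample_sizes]

# Least-squares slope of log(IMSE) against log(n).
slope = float(np.polyfit(np.log(sample_sizes), np.log(imse), 1)[0])
for n, value in zip(sample_sizes, imse):
    print(f"n={n:5d}  estimated IMSE={value:.5f}")
print(f"fitted log-log slope: {slope:.3f}")
```

The fitted slope can be compared with the exponent $-\frac{2m}{2m+1}$ from the exercise. With so few sample sizes and replications, the agreement is only approximate.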