Exercise 3

Question 1.

Let $\hat{f}_h$ be a kernel estimator of the unknown density $f$ with a bandwidth $h$. Show that $\int \hat{f}_h(y)dy=1$.
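A numerical check cannot replace the proof, but it can confirm the identity on an example. The following Python sketch is illustrative only. It assumes a Gaussian kernel, a simulated $N(0,1)$ sample of size 50 and $h=0.4$, and the helper names `gaussian_kernel`, `kde` and `trapezoid` are hypothetical. None of these choices come from the exercise.

```python
import math
import random
from typing import Callable, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(y: float, sample: Sequence[float], h: float) -> float:
    """Kernel estimator f_h(y) = (1 / (n h)) * sum_i K((y - Y_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((y - yi) / h) for yi in sample) / (n * h)


def trapezoid(f: Callable[[float], float], a: float, b: float, steps: int = 4000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * dx)
    return total * dx


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(50)]
h = 0.4
# Extend the grid 10 bandwidths past the data so the kernel tails are negligible.
lo, hi = min(sample) - 10 * h, max(sample) + 10 * h
area = trapezoid(lambda y: kde(y, sample, h), lo, hi)
print(f"integral of f_h over [{lo:.2f}, {hi:.2f}] = {area:.8f}")
```

You can rerun it with another kernel that integrates to one, another sample, or another bandwidth. The printed value should stay at 1 up to quadrature error.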

Question 2.

Let $Y_1,\ldots,Y_n \sim f(y)$, where the density $f$ is unknown.
  1. We want to estimate $\mu=EY$. An obvious (?) estimator is $\bar{Y}$. Suppose, however, that we decide to first estimate $f$ by a kernel estimator $\hat{f}_h(y)$ based on a symmetric kernel and a bandwidth $h$, and then estimate $\mu$ by $\hat{\mu}=\int y \hat{f}_h(y)dy$. What is the resulting $\hat{\mu}$? Is it a consistent estimator of $\mu$? Does it depend on the specific kernel and on the chosen bandwidth?
  2. Repeat the previous item for estimating the second moment $\mu_2=EY^2$ by $\hat{\mu}_2=\int y^2 \hat{f}_h(y)dy$.
  3. How will $\hat{\mu}_2$ change if we use a third-order kernel instead? Generalize these results to estimating the $p$-th moment $\mu_p=EY^p$ by $\hat{\mu}_p=\int y^p \hat{f}_h(y)dy$.
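To explore items 1 to 3 numerically, you can compute $\int y^p \hat{f}_h(y)dy$ by quadrature and compare it with the sample moments for several bandwidths. The Python sketch below is illustrative only. The simulated $N(2,1.5^2)$ sample, the bandwidths and the helper names are assumptions. `k2` is the standard Gaussian (second-order) kernel. `k4` is the fourth-order kernel $\frac{1}{2}(3-u^2)\phi(u)$, where $\phi$ is the standard normal density. It stands in for a higher-order kernel and is not the third-order kernel asked about in item 3.

```python
import math
import random
from typing import Callable, Sequence


def k2(u: float) -> float:
    """Standard Gaussian kernel: symmetric, second order (int u^2 K(u) du = 1)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def k4(u: float) -> float:
    """Fourth-order Gaussian-based kernel (3 - u^2) / 2 * phi(u): int u^2 K(u) du = 0."""
    return 0.5 * (3.0 - u * u) * k2(u)


def kde_moment(p: int, sample: Sequence[float], h: float,
               kernel: Callable[[float], float], steps: int = 2000) -> float:
    """Approximate int y^p f_h(y) dy by the trapezoidal rule on a wide grid."""
    n = len(sample)
    lo, hi = min(sample) - 12 * h, max(sample) + 12 * h
    dx = (hi - lo) / steps

    def integrand(y: float) -> float:
        f_h = sum(kernel((y - yi) / h) for yi in sample) / (n * h)
        return y ** p * f_h

    total = 0.5 * (integrand(lo) + integrand(hi))
    for k in range(1, steps):
        total += integrand(lo + k * dx)
    return total * dx


random.seed(1)
sample = [random.gauss(2.0, 1.5) for _ in range(40)]
n = len(sample)
mean = sum(sample) / n                    # Y-bar
mean_sq = sum(y * y for y in sample) / n  # (1/n) * sum of Y_i^2

for h in (0.2, 0.5, 1.0):
    m1 = kde_moment(1, sample, h, k2)
    m2_k2 = kde_moment(2, sample, h, k2)
    m2_k4 = kde_moment(2, sample, h, k4)
    print(f"h={h}: mu1={m1:.6f} (Ybar={mean:.6f}); "
          f"mu2 with k2={m2_k2:.6f}, with k4={m2_k4:.6f} "
          f"(mean of Y^2={mean_sq:.6f})")
```

Compare the printed columns across $h$ and across `k2` and `k4`. This shows which of the estimated moments depend on the bandwidth and on the kernel. The same function handles any $p$.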

Question 3.

Consider kernel estimation of the unknown (univariate) density $f$ given a random sample $Y_1,\ldots,Y_n \sim f(y)$. Assume that $f \in H^m,\;m \geq 1$, where $H^m=\{f: \int f^{(m)}(y)^2dy < \infty\}$, and choose a kernel $K(\cdot)$ of order $m$. Show that
  1. $IMSE(\hat{f}_h,f) \leq C_1(K)h^{2m}+\frac{C_2(K)}{nh}$
  2. the optimal choice of bandwidth is $h_0=O(n^{-\frac{1}{2m+1}})$
  3. the corresponding optimal IMSE is $IMSE(\hat{f}_{h_0},f)=O(n^{-\frac{2m}{2m+1}})$
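Items 2 and 3 follow from minimizing the right-hand side of item 1 over $h$. The following Python sketch checks that algebra numerically, but it does not prove the bound itself. It minimizes $C_1h^{2m}+\frac{C_2}{nh}$ by golden-section search for several values of $n$. It then estimates the log-log slopes of the minimizer $h_0$ and of the minimized bound against $n$. Setting the derivative to zero gives $h_0=\left(\frac{C_2}{2mC_1n}\right)^{1/(2m+1)}$, which the code uses as a cross-check. The constants $C_1=C_2=1$, the choice $m=2$ and the helper names are arbitrary illustrative assumptions.

```python
import math
from typing import Callable, List


def imse_bound(h: float, n: int, m: int, c1: float, c2: float) -> float:
    """Right-hand side of item 1: C1 * h^(2m) + C2 / (n h)."""
    return c1 * h ** (2 * m) + c2 / (n * h)


def argmin_golden(f: Callable[[float], float], a: float, b: float,
                  tol: float = 1e-10) -> float:
    """Golden-section search; valid here because the bound is convex in h > 0."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return 0.5 * (a + b)


def h_closed_form(n: int, m: int, c1: float, c2: float) -> float:
    """Root of the derivative: 2m C1 h^(2m-1) = C2 / (n h^2)."""
    return (c2 / (2 * m * c1 * n)) ** (1.0 / (2 * m + 1))


def loglog_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of log y against log x between the first and last points."""
    return (math.log(ys[-1]) - math.log(ys[0])) / (math.log(xs[-1]) - math.log(xs[0]))


m, c1, c2 = 2, 1.0, 1.0  # illustrative values only
ns = [10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6]
hs = [argmin_golden(lambda h: imse_bound(h, n, m, c1, c2), 1e-6, 10.0) for n in ns]
risks = [imse_bound(h, n, m, c1, c2) for h, n in zip(hs, ns)]

for n, h in zip(ns, hs):
    print(f"n={n:>8}: h0={h:.6f} (closed form {h_closed_form(n, m, c1, c2):.6f})")
print("slope of log h0 vs log n:", round(loglog_slope(ns, hs), 6),
      "vs -1/(2m+1) =", -1 / (2 * m + 1))
print("slope of log bound vs log n:", round(loglog_slope(ns, risks), 6),
      "vs -2m/(2m+1) =", -2 * m / (2 * m + 1))
```

Changing `c1` and `c2` shifts $h_0$ and the minimized bound, but it leaves both slopes unchanged. Changing `m` moves the slopes to $-\frac{1}{2m+1}$ and $-\frac{2m}{2m+1}$.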