Exercise 1

Prelude and Fugue in Nonparametric Regression (univariate case)

Theoretical Prelude (for warming up)

Question 1.

  1. Show that the locally constant estimator coincides with the Nadaraya-Watson kernel estimator.
  2. Show that if we add a linear trend to the unknown function, the resulting local linear and cubic spline estimators (with fixed tuning parameters) can be obtained from the original ones simply by adding the same linear trend.
  3. Assume that the unknown regression function $g \in C^m$. Consider a Nadaraya-Watson kernel estimator with a kernel of $m$-th order (the standard conventions are recalled after this list).
    1. Find the corresponding asymptotic Bias(x), Var(x), MSE(x) and IMSE (up to the leading order).
    2. What is the optimal order of the bandwidth?
    3. What is the resulting optimal asymptotic order of the IMSE?
  4. Show that a cubic spline $s(x)$ with $K$ knots $\xi_1,\ldots, \xi_K$ can be written in the form $$ s(x)=\beta_0+\beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^K \theta_j (x-\xi_j)_+^3, $$ where $a_+=a$ if $a \geq 0$ and $a_+=0$ otherwise.
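
For reference (assuming the standard conventions; adjust if your course normalizes differently), the Nadaraya-Watson estimator with kernel $K$ and bandwidth $h$, and the definition of a kernel of $m$-th order, are
$$ \hat{g}_h(x)=\frac{\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right)y_i}{\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right)}, \qquad \int K(u)\,du=1,\;\; \int u^jK(u)\,du=0,\; j=1,\ldots,m-1,\;\; 0<\left|\int u^m K(u)\,du\right|<\infty. $$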

Question 2.

Consider a general (univariate) nonparametric regression model:
$y_i=g(x_i)+\epsilon_i, \;\;\;\; i=1,\ldots,n,$
where the $\epsilon_i$'s are i.i.d. variables with zero mean and (known) variance $\sigma^2$. Let $\hat{\bf g}$ be an arbitrary linear estimator of ${\bf g}=(g(x_1),\ldots,g(x_n))^T$, i.e. $\hat{\bf g}=H{\bf y}$ for some square matrix $H$.
  1. Show that the Average Mean Squared Error (AMSE) is
    $AMSE=\frac{1}{n}\sum_{i=1}^nE(\hat{g}_i-g_i)^2=\frac{1}{n}\left({\bf g}^T(I-H)^T(I-H){\bf g}+\sigma^2 tr(H H^T)\right)$
  2. Show that
    $E(RSS)=\sum_{i=1}^n E(y_i-\hat{g}_i)^2={\bf g}^T(I-H)^T(I-H){\bf g}+\sigma^2 tr\left((I-H)(I-H)^T\right)$
  3. Based on the results above, find an unbiased estimator of the AMSE.
  4. What is the matrix $H$ for the Nadaraya-Watson kernel estimator, local polynomial regression and smoothing spline? (See also the numerical sketch after this list.)
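
For item 4 you should derive $H$ analytically, but it can be handy to check the answer numerically: for any linear smoother, the $j$-th column of $H$ is the smoother applied to the $j$-th standard basis vector. Below is a minimal R sketch of this check, using smooth.spline with a fixed smoothing parameter so that the smoother is indeed linear in ${\bf y}$; the design x, the value of spar and the helper name hat_matrix are illustrative.

```r
## Numerical check of the hat matrix of a linear smoother:
## column j of H is the smoother applied to the j-th standard basis vector e_j.
hat_matrix <- function(x, smoother) {
  n <- length(x)
  H <- matrix(0, n, n)
  for (j in 1:n) {
    e <- numeric(n); e[j] <- 1
    H[, j] <- smoother(x, e)
  }
  H
}

set.seed(1)
x <- sort(runif(50))
## smoothing spline with a FIXED spar (not GCV), so that g-hat = H y is linear in y
ss <- function(x, y) predict(smooth.spline(x, y, spar = 0.7, all.knots = TRUE), x)$y
H <- hat_matrix(x, ss)
sum(diag(H))   # tr(H): the 'equivalent degrees of freedom' of the smoother
```

The same helper works with a kernel or local polynomial smoother (with fixed bandwidth) in place of smooth.spline.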

Fugue in Data

Question 3.

The file dat.reg contains x and y.
  1. Fit kernel estimates to the data using several kernels (depending on the package you use - see the Computational Notes for R users below) and choosing bandwidths according to CV or GCV. Compare the kernel estimates. (A minimal R sketch of this workflow follows this question.)
  2. Repeat the previous item using local linear regression estimates.
  3. Fit a smoothing cubic spline to the data, selecting the smoothing parameter by GCV. Compare the resulting spline estimate with those obtained in the previous items.
  4. Add the linear trend $2x+1$ to the original y and derive again the corresponding estimators from items 1-3. Are you surprised by the results? (You're already 'warmed up'!)
  5. After you've finished items 1-4 I can reveal the secret to you (but don't tell others who still haven't done them!): x and y were simply generated by $y_i=(\sin(2\pi x_i^3))^3+\epsilon_i$, where the $\epsilon_i$ are i.i.d. $N(0,.01)$. Using Monte Carlo, generate 100 other random samples y at the same design points x from the function $g(x)=(\sin(2\pi x^3))^3$, adding random Gaussian noise with variance 0.01.
    1. Estimate the pointwise squared bias, variance and MSE at each $x_i$ for the kernel, local linear and smoothing spline estimators, and plot them as functions of $x_i$. Comment on the results.
    2. Estimate the corresponding global average squared bias, variance and AMSE for the above estimators. Compare their goodness-of-fit.
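
A minimal R sketch of the workflow in items 1-3 (referred to above), assuming dat.reg is a whitespace-separated text file with named columns x and y (check the actual format); the kernel type and other arguments are only illustrative, see the Computational Notes below.

```r
library(np)                                   # npreg / npregbw for kernel and local linear fits

dat <- read.table("dat.reg", header = TRUE)   # assumed format: columns x and y
x <- dat$x; y <- dat$y

## 1. Nadaraya-Watson (locally constant) fit, bandwidth by least-squares CV
bw.nw  <- npregbw(y ~ x, regtype = "lc", bwmethod = "cv.ls", ckertype = "gaussian")
fit.nw <- npreg(bw.nw)

## 2. Local linear fit, bandwidth by least-squares CV
bw.ll  <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.ls", ckertype = "gaussian")
fit.ll <- npreg(bw.ll)

## 3. Smoothing cubic spline, smoothing parameter by GCV (the default)
fit.ss <- smooth.spline(x, y, all.knots = TRUE)

## Overlay the three estimates on the data
plot(x, y, col = "grey")
ord <- order(x)
lines(x[ord], fitted(fit.nw)[ord], col = 2)
lines(x[ord], fitted(fit.ll)[ord], col = 3)
lines(fit.ss, col = 4)
legend("topleft", c("NW kernel", "local linear", "smoothing spline"),
       col = 2:4, lty = 1)
```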

Question 4.

What can be better than a glass of good wine? But do you know that it is the sort of harvested grapes and their quality (together, certainly, with the wine-making process and further proper keeping) that distinguish wines from different areas and vintages? One cannot mix up old deep red wines from Bordeaux, mostly made from Cabernet Sauvignon and Merlot grapes, their `competitors' from Burgundy made from Pinot Noir, young fresh Beaujolais, or a wonderful Chianti made from Sangiovese grapes. Yarden's Cabernet Sauvignon is also quite different from that of, say, the Psagot or Shiloh wineries (can you suggest any test to check this statement? (-:).

The file Vineyard.dat contains data from a vineyard of some Chateau. The vineyard is divided into 52 rows, and the 52 observations in the data set correspond to the yields of the harvests in 1989, 1990 and 1991, measured by the total number of lugs (a lug is a basket that is used to carry the harvested grapes and contains about 30 pounds of grapes). The row numbers are ordered, with increasing row number reflecting movement from northwest to southeast. Rows 31-52 are shorter than rows 1-30 (100 yards long versus 120 yards). Strong winds and animals (birds and raccoons) cause more damage at the outer, exposed parts of the vineyard.

The file contains: row number (first column), number of lugs for the 1989 harvest (second column), number of lugs for the 1990 harvest (third column), and number of lugs for the 1991 harvest (fourth column).
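
Assuming Vineyard.dat is a plain whitespace-separated file with these four columns and no header row (adjust to the actual format), it can be read in and the total lug count formed as follows:

```r
vine <- read.table("Vineyard.dat", header = FALSE,
                   col.names = c("row", "lugs89", "lugs90", "lugs91"))
vine$total <- with(vine, lugs89 + lugs90 + lugs91)   # total lug count over the three harvests
```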

  1. Plot the total lug count (the sum of the yields of the three harvests) as a function of the row number. Comment on the plot using the information about the data above.
  2. Fit kernel, local linear, smoothing spline and LOESS estimators to the total lug count. Compare the resulting estimates (don't forget the analysis of residuals!); a minimal R sketch follows this list.
  3. Two rows in the data appear to be possible outliers (can you guess which ones and what is the possible reason for them to be outliers?). Omit these two rows and refit all four estimates. Do the fitted curves change a lot?
  4. L'Haim!
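
A minimal sketch of item 2, continuing with the data frame vine from the reading sketch above (the LOESS span and other argument values are illustrative); each fit should be followed by a residual analysis, of which only one example is shown:

```r
library(np)

bw.k  <- npregbw(total ~ row, data = vine, regtype = "lc", bwmethod = "cv.ls")
fit.k <- npreg(bw.k)                                            # Nadaraya-Watson kernel estimator
bw.l  <- npregbw(total ~ row, data = vine, regtype = "ll", bwmethod = "cv.ls")
fit.l <- npreg(bw.l)                                            # local linear
fit.s <- smooth.spline(vine$row, vine$total, all.knots = TRUE)  # smoothing spline, GCV
fit.o <- loess(total ~ row, data = vine, span = 0.5)            # LOESS, span chosen visually

plot(vine$row, vine$total, xlab = "row", ylab = "total lug count")
lines(vine$row, fitted(fit.k), col = 2)
lines(vine$row, fitted(fit.l), col = 3)
lines(fit.s, col = 4)
lines(vine$row, fitted(fit.o), col = 5)
legend("topright", c("kernel", "local linear", "smoothing spline", "LOESS"),
       col = 2:5, lty = 1)

## residual analysis, illustrated for the spline fit
res.s <- vine$total - predict(fit.s, vine$row)$y
plot(vine$row, res.s, ylab = "residual"); abline(h = 0, lty = 2)
```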

Computational Notes for R users.

Here is a (partial) list of R functions for kernel estimation, local polynomial regression and spline smoothing. See the corresponding help files for details of their use.

  • npreg from the package np (available on CRAN) computes local polynomial regression estimators and allows one to choose the bandwidth by cross-validation. In fact, it can also be used to fit the Nadaraya-Watson kernel estimator (how?).
  • The R function loess performs a local polynomial regression where the bandwidth is defined implicitly by the percentage of data points within the window. Unfortunately, this percentage must either be provided by the user or taken as the default value. You can choose a `reasonable' bandwidth by visual analysis of several loess estimators with different bandwidths.
  • The R function lowess is similar to loess but performs a robust local polynomial regression.
  • To perform spline smoothing, use the function smooth.spline with all.knots=T. It chooses the smoothing parameter either by GCV (the default) or by CV. The function print(smooth.spline(x,y,...)) provides some useful output for the smoothing spline fit.
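
For orientation, a few illustrative calls to the functions above on toy data (argument values are arbitrary; see the help pages for the full signatures):

```r
set.seed(1)
x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)   # toy data for illustration

fit1 <- loess(y ~ x, span = 0.3, degree = 1)        # span = fraction of points in the local window
fit2 <- lowess(x, y, f = 0.3)                       # robust local regression; returns a list with x and y
fit3 <- smooth.spline(x, y, all.knots = TRUE)       # smoothing parameter chosen by GCV (default)
fit4 <- smooth.spline(x, y, all.knots = TRUE, cv = TRUE)  # ordinary (leave-one-out) CV instead
print(fit3)                                         # lambda, equivalent degrees of freedom, GCV criterion
```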