Exercise 5

Nonlinear Regression

Question 1.

The file Puromycin.dat contains the data on the substrate concentration of Puromycin, x (parts per million, ppm) and the initial rate, or "velocity", y, of the enzymatic reaction (counts/min2) in the presence of Puromycin. The velocity is assumed to depend on the substrate concentration according to the Michaelis-Menten equation: y=θ1 x/(θ2+x).
  1. What is the physical/mathematical meaning of the parameters θ1 and θ2?
  2. Using transformations of x and y transform the original nonlinear model to a linear one. Fit the corresponding linear model. Does it seem to be adequate?
  3. Find the matrix V for the Michaelis-Menten original nonlinear model. Fit the nonlinear model using the results of 1.2 for obtaining initial values for the parameters, and check the adequacy of the nonlinear model.
  4. Test the hypothesis θ1=200 applying F- and t-tests, comment the results.

Generalized Linear Models

Question 2 (theoretical 'warming-up').

  1. Which of the following distributions belong to the natural exponential family: negative binomial NB(r,p) with given r, unform U[0,θ], Student tn, Gamma(α,β) with fixed α, Gamma(α,β) with fixed β, Beta(p,q) with fixed q? For those of them who do belong to the exponential family, find their natural parameters θ's and functions b(θ).
  2. Given a sample y1,...,yn from a distribution that belongs to the natural exponential family with a natural parameter θ, find the sufficient statistic for θ.
  3. Consider the Poisson log-linear model. Give the form of the corresponding iterative weights wi and the "adjusted dependent variable'' zi used in the iteratively re-weighted least squares algorithm.
  4. Show that for the Poisson model, the deviance D and the Pearson statistic X2 are ``close''.

Question 3.

The data in the file GHQ.dat come from a psychiatric study. Each of 120 patients was administered the 12-item General Health Questionaire (GHQ), resulting score between 0 and 12, and was subsequently given a full psychiatric examination by a psychiatrist who did not know the patient's GHQ score. The patient was classified by a psychiatrist as either a "Case", requiring psychiatric treatment or a "Non-Case". The goal of research was to establish whether the GHQ score could indicate the need for psychiatric treatment. More specifically, given patient's GHQ and Sex, what can be said about the probability P that the patient is a psychiatric case?

The file gives the GHQ, the number of Cases C, and Non-Cases NC at each GHQ score classified by the factor Sex (1=Male, 2=Female).

  1. Plot the proportion of patients who need psychiatric treatment as a function of GHQ for both sexes. Can you say something from a visual analysis of the data?
  2. Fit an appropriate model for studying the relations between P and explanatory variables GHQ and Sex. Analyse the results. Does the effect of GHQ on P depend on patient's sex? Does Sex influence on P at all?
  3. Fit the final model. Plot the fitted proportions and compare them with the observed ones. Interpretate the results. Do they support/contradict your preliminary conclusions based on the visual analysis of the data? Could you recommend to use the GHQ as a reasonable indicator for the need of a psychiatric treatment without a full expensive and sophisticated psychiatric examination?
  4. What can you say about the probability of a new female patient with GHQ=2 to be a psychiatric case? (give both the pointwise estimate and the corresponding confidence interval). R functions predict or predict.glm may be helpful (see their help for more detail).

Computational Notes for R users:

  • To fit nonlinear regression use the function nls (see help for details).
  • To fit a generalized linear model you will generally use the function glm:
    >glm(formula, family=...(link=...),...)
    The glm function creates an object of class glm that contains most of information you need. See help(glm.object) for details.
  • For some data the convergence of the iteratively reweighted least squares algorithm is slow and does not occur in (default) 10 iterations. It may happen, for example, in binomial models with a lot of empty cells. R gives you a "Warning". Don't panic! You can increase the number of iterations by the parameter maxit:
    >glm(formula,...,maxit=...)
  • To evaluate the fitted model at some new values of the predictors use the function predict:
    >predict(glm.object, type=...,se=T)
    The output will contain, in particlular, a vector of estimated response (depending on type) and a vector of standard errors for constructing confidence intervals for the mean response.