Exercise 2

Question 1. Multinomial Prelude.

The file Miners.dat gives the number of coalminers classified by radiological examination into one of three categories of pneumoconiosis (N - normal, M - mild pneumoconiosis, S - severe pneumoconiosis). The observations are grouped according to the length of time (Period) that individuals have spent working in the mine.
  1. Plot the proportions of miners in each category against years worked. Find an appropriate statistical model for the multiple response proportions. Comment on the results and the goodness-of-fit.
  2. Repeat the previous part using a log transformation of Period. Compare the two models and choose the "better" one.
  3. Is there a significant difference between the M and S categories? Plot the fitted proportions from the final model and compare them with the original data. Comment on the goodness-of-fit.
  4. Fit the proportional odds model to the data. Does this model seem adequate?
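
As a starting point for the plots in part 1, the following minimal Python sketch computes observed proportions, baseline-category logits and cumulative logits for grouped counts. The rows are hypothetical placeholders, not the contents of Miners.dat, and the function names and the 0.5 continuity correction are assumptions made for illustration.

```python
import math

# Hypothetical placeholder rows: (period in years, N, M, S counts).
# These are NOT the values in Miners.dat; substitute the real file contents.
rows = [
    (5.0, 90, 5, 5),
    (15.0, 60, 25, 15),
    (30.0, 30, 40, 30),
]

def proportions(counts):
    """Observed proportion in each category for one period group."""
    total = sum(counts)
    return [c / total for c in counts]

def baseline_logits(counts, eps=0.5):
    """log(M/N) and log(S/N); eps is an assumed guard against zero cells."""
    base = counts[0] + eps
    return [math.log((c + eps) / base) for c in counts[1:]]

def cumulative_logits(counts, eps=0.5):
    """logit P(Y <= N) and logit P(Y <= M) for the ordering N < M < S."""
    total = sum(counts)
    out = []
    running = 0
    for c in counts[:-1]:
        running += c
        out.append(math.log((running + eps) / (total - running + eps)))
    return out

# Tabulate the quantities one would plot against Period and log(Period).
for period, *counts in rows:
    print(period, round(math.log(period), 3),
          [round(p, 3) for p in proportions(counts)],
          [round(v, 3) for v in baseline_logits(counts)],
          [round(v, 3) for v in cumulative_logits(counts)])
```

Plotting the baseline logits against Period and against log(Period) gives a rough visual check of which scale looks more nearly linear. The cumulative logits play the same role for the proportional odds model in part 4, where roughly parallel lines would be expected.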

Poisson Passions

Question 2 (theoretical 'warm-up').

  1. Let Xi~Pois(λi), i=1,...,n be independent Poisson random variables. What is the distribution of S=X1+...+Xn? Find the joint conditional distribution of X1,..., Xn given S.
  2. For the Poisson distribution, find the transformation Ψ(θ) which "normalizes" the likelihood, i.e. implies l'''(Ψ0) = 0, where Ψ0 is the maximum likelihood estimate of Ψ.
  3. Consider the Poisson model. Give the form of the iterative weights wi and the "adjusted dependent variable" zi used in the iteratively re-weighted least squares algorithm for the linear model (identity link function) for the mean.
  4. Show that for the Poisson model, the deviance D and the Pearson statistic X^2 are "close".
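
Before attempting the proof in part 4, it can help to see the claim numerically. The sketch below evaluates the standard Poisson deviance D = 2 Σ [y log(y/μ) - (y - μ)] and the Pearson statistic X^2 = Σ (y - μ)^2 / μ. The counts and fitted means are made up for illustration and do not come from any file in this exercise.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum(y*log(y/mu) - (y - mu)); the log term is 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def pearson_x2(y, mu):
    """X^2 = sum((y - mu)^2 / mu)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

# Hypothetical observed counts and fitted means (illustration only).
y = [12, 18, 25, 31, 44]
mu = [10.0, 20.0, 25.0, 30.0, 45.0]

print("deviance :", poisson_deviance(y, mu))
print("Pearson  :", pearson_x2(y, mu))
```

Moving the fitted means further from the observed counts shows how the gap between the two statistics grows, which hints at the order of the terms to keep in the expansion.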

Question 3.

The data in the file Ship.dat concern a type of damage caused by waves to the forward section of certain cargo-carrying vessels. For the purpose of setting standards for hull construction, it is important to know the risk of damage associated with the three classifying factors:
  • Ship type A-E
  • Year of operation 1960-74, 1975-79
  • Year of construction 1960-64, 1965-69, 1970-74, 1975-79
The data give the three classifying factors, the aggregate number of months of service and the number of damage incidents (as distinct from the number of ships damaged). Note that a single ship may be damaged more than once and, furthermore, that some ships will have been operating in both periods. No ships constructed after 1975 could have operated before 1974, which explains some of the zero cells.
  1. Fit the appropriate main effects model. Analyse its goodness-of-fit.
  2. Is there a need to include interaction terms in the model? (Use any model selection procedure(s) you find helpful.)
  3. Summarize the results and give conclusions.
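
Because the cells differ widely in months of service, a useful first look is the crude incident rate per unit of exposure at each factor level. The Python sketch below does this on hypothetical placeholder rows, not on the contents of Ship.dat. Treating months of service as the exposure, and hence log(service) as an offset in a Poisson log-linear model, is an assumption made here rather than something the exercise states.

```python
from collections import defaultdict

# Hypothetical placeholder rows, NOT the contents of Ship.dat:
# (ship type, construction period, operation period, months of service, incidents)
rows = [
    ("A", "1960-64", "1960-74", 100, 0),
    ("A", "1965-69", "1975-79", 1000, 4),
    ("B", "1960-64", "1960-74", 40000, 40),
    ("B", "1975-79", "1960-74", 0, 0),  # empty cell: no service was possible
    ("C", "1970-74", "1975-79", 800, 2),
]

def usable(rows):
    """Drop cells with zero service; log(0) cannot be used as an offset."""
    return [r for r in rows if r[3] > 0]

def crude_rates(rows, key_index, per=1000.0):
    """Incidents per `per` months of service, pooled over one factor."""
    months = defaultdict(float)
    incidents = defaultdict(float)
    for r in usable(rows):
        months[r[key_index]] += r[3]
        incidents[r[key_index]] += r[4]
    return {k: per * incidents[k] / months[k] for k in months}

print("by ship type        :", crude_rates(rows, 0))
print("by construction year:", crude_rates(rows, 1))
print("by operation period :", crude_rates(rows, 2))
```

These pooled rates ignore the other two factors, so they are only a descriptive check against which the fitted main effects can be compared.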
Computational Notes for R users:
  • the function multinom from the library nnet fits multinomial models
  • the function polr from the library MASS fits proportional odds models
  • the function step can be helpful for model selection.