Exercise 2
Question 1. Multinomial Prelude.
The file Miners.dat gives the number of coalminers
classified by radiological examination into one of three categories of
pneumoconiosis (N - normal, M - mild pneumoconiosis, S -
severe pneumoconiosis). The observations are grouped according to the period
of time (Period) that individuals have spent working in the mine.
- Plot the proportions of miners in each category against years worked.
Find an appropriate statistical model for the multiple response proportions.
Comment on the results and the goodness of fit.
- Repeat the previous item after applying a log transformation to
Period. Compare the two models and choose the "better" one.
- Is there a significant difference between the M and S categories?
Plot the fitted proportions from the final model
and compare them with the original data. Comment on the goodness of fit.
- Fit the proportional odds model to the data. Does this model seem
adequate?
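Before fitting the proportional odds model, an informal check is to look at empirical cumulative logits. The model assumes that the logits of P(Y <= j) for the two cut-points (N versus M or S, and N or M versus S) differ only by a constant as Period changes, so the gap between them should stay roughly constant across rows. The Python sketch below illustrates the calculation on hypothetical counts (not the contents of Miners.dat). The function names are illustrative, and adding 0.5 to each side of the logit is an assumption made here to keep logits finite when a cell is zero.

```python
import math

def proportions(counts):
    """Return category proportions for one row of counts (N, M, S)."""
    total = sum(counts)
    return [c / total for c in counts]

def empirical_cumulative_logits(counts):
    """Empirical logits of P(Y <= j) for the cut-points after N and after M.

    0.5 is added to numerator and denominator so that zero cells
    do not produce infinite logits (a continuity correction).
    """
    total = sum(counts)
    logits = []
    cumulative = 0
    for c in counts[:-1]:  # the last category has no cut-point
        cumulative += c
        logits.append(math.log((cumulative + 0.5) / (total - cumulative + 0.5)))
    return logits

# Hypothetical rows: period worked (years) -> counts (N, M, S).
toy_rows = {5: (90, 8, 2), 20: (60, 25, 15)}

for period, counts in toy_rows.items():
    logit_1, logit_2 = empirical_cumulative_logits(counts)
    gap = logit_2 - logit_1  # roughly constant under proportional odds
    print(period, [round(p, 3) for p in proportions(counts)], round(gap, 3))
```

A gap that drifts systematically with Period would be a hint, not a formal test, that the proportional odds assumption is questionable.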
Poisson Passions
Question 2 (theoretical warm-up).
- Let x_i ~ Pois(λ_i), i = 1, ..., n, be independent
Poisson random variables.
What is the distribution of S = x_1 + ... + x_n?
Find the joint conditional distribution of x_1, ...,
x_n given S.
- For the Poisson distribution, find the transformation Ψ(θ)
which "normalizes" the likelihood, i.e. for which l'''(Ψ0) = 0,
where Ψ0 is the maximum likelihood estimate of Ψ.
- Consider the Poisson model. Give the form of the iterative weights
w_i and the "adjusted dependent variable" z_i used in the
iteratively re-weighted least squares algorithm for the linear model
(identity link function) for the mean.
- Show that for the Poisson model, the deviance D and the Pearson
statistic X^2 are "close".
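As a numerical companion to the last item, the two statistics can be computed side by side. The sketch below uses the standard definitions D = 2 Σ [y log(y/μ) − (y − μ)] and X^2 = Σ (y − μ)^2 / μ for Poisson counts y with fitted means μ. The counts and fitted means are hypothetical and the function names are illustrative; this only illustrates the claim and is not a substitute for the requested argument.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def pearson_statistic(y, mu):
    """X^2 = sum (y - mu)^2 / mu, using Var(y) = mu for the Poisson."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

# Hypothetical observed counts and fitted means (not taken from any data file).
y_toy = [12, 9, 15, 20]
mu_toy = [10.0, 10.0, 16.0, 19.0]

D = poisson_deviance(y_toy, mu_toy)
X2 = pearson_statistic(y_toy, mu_toy)
print(f"D = {D:.4f}, X^2 = {X2:.4f}")
```

Trying fitted means that are further from the counts, or very small means, shows where the agreement starts to deteriorate.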
Question 3.
The data in the file Ship.dat concern a type of damage
caused by waves to the forward section of certain cargo-carrying vessels. For
the purpose of setting standards for hull construction it is important to
know the risk of damage associated with the three classifying factors:
- Ship type: A-E
- Year of operation: 1960-74, 1975-79
- Year of construction: 1960-64, 1965-69, 1970-74, 1975-79
The data give the three classifying factors, the aggregate number of
months of service and the number of damage incidents (as distinct from the number of
ships damaged).
Note that a single ship may be damaged more
than once and, furthermore, that some ships will have been operating in both
periods. Ships constructed in 1975-79 could not have operated in 1960-74,
which explains some of the zero cells.
- Fit the appropriate main effects model. Analyse its goodness-of-fit.
- Is there a need to include interaction terms in the model? (Use whichever model selection
procedure(s) you find helpful.)
- Summarize the results and give conclusions.
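Because the cells differ widely in aggregate months of service, damage counts are only comparable per unit of exposure. A common way to account for this in a Poisson log-linear model is to include the log of months of service as an offset; that is an assumption about the intended model, not something stated in the exercise. As a first exploratory step, the Python sketch below computes crude incident rates per month of service for one classifying factor. The records are hypothetical (not the contents of Ship.dat), and the record layout and the function name are illustrative.

```python
from collections import defaultdict

def crude_rates(records, key_index=0):
    """Incidents per month of service, aggregated over one classifying factor.

    Each record is (ship_type, construction, operation, months, incidents).
    Cells with zero months of service carry no information about the rate
    and are skipped.
    """
    months = defaultdict(float)
    incidents = defaultdict(int)
    for rec in records:
        service, count = rec[3], rec[4]
        if service == 0:
            continue
        months[rec[key_index]] += service
        incidents[rec[key_index]] += count
    return {level: incidents[level] / months[level] for level in months}

# Hypothetical records; the second one is an impossible cell with no service.
toy_records = [
    ("A", "1960-64", "1960-74", 1000, 3),
    ("A", "1975-79", "1960-74", 0, 0),
    ("A", "1975-79", "1975-79", 500, 3),
    ("B", "1960-64", "1960-74", 20000, 30),
    ("B", "1965-69", "1975-79", 10000, 30),
]

rates = crude_rates(toy_records)
for ship_type, rate in sorted(rates.items()):
    print(ship_type, f"{1000 * rate:.2f} incidents per 1000 months")
```

Passing key_index=1 or key_index=2 gives the same summary by year of construction or year of operation. These crude rates ignore the other two factors, so they are a starting point for the main effects model rather than a replacement for it.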
Computational Notes for R users:
- The function multinom from the package nnet fits multinomial
models.
- The function polr from the package MASS fits proportional
odds models.
- The function step can be helpful for model selection.