Exercise 2
Question 1. Multinomial Prelude.
The file Miners.dat gives the number of coal miners classified by radiological examination into one of three categories of pneumoconiosis (N, normal; M, mild pneumoconiosis; S, severe pneumoconiosis). The observations are grouped according to the period of time (Period) that individuals have spent working in the mine.
- Plot the proportions of miners in each category against years worked. Find an appropriate statistical model for the multiple response proportions. Comment on the results and the goodness-of-fit.
- Repeat the previous item using a log transformation of Period. Compare the two models and choose the "better" one.
- Is there a significant difference between the M and S categories? Plot the fitted proportions from the final model and compare them with the original data. Comment on the goodness-of-fit.
- Fit the proportional odds model to the data. Does this model seem adequate?
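To see how the two model families in this question turn linear predictors into fitted proportions, here is a minimal Python sketch. It assumes N is the baseline category of the multinomial (baseline-category) logit, and writes the proportional odds model as logit P(Y <= j) = theta_j - beta * x for the ordered categories N < M < S, the sign convention documented for MASS::polr. The function names and coefficient values are hypothetical illustrations, not estimates from Miners.dat; x may be Period or log(Period).

```python
import math


def baseline_logit_probs(x, coefs):
    """Fitted (N, M, S) proportions under a baseline-category (multinomial) logit.

    N is the baseline; coefs maps "M" and "S" to hypothetical (intercept, slope)
    pairs for log(p_cat / p_N) = intercept + slope * x.
    """
    etas = {cat: a + b * x for cat, (a, b) in coefs.items()}
    denom = 1.0 + sum(math.exp(e) for e in etas.values())
    probs = {"N": 1.0 / denom}
    for cat, e in etas.items():
        probs[cat] = math.exp(e) / denom
    return probs


def proportional_odds_probs(x, cutpoints, beta):
    """Fitted proportions for ordered categories N < M < S under proportional odds.

    logit P(Y <= j) = cutpoints[j] - beta * x, with one slope beta shared by
    both cumulative logits; cutpoints must be increasing.
    """
    cum = [1.0 / (1.0 + math.exp(-(theta - beta * x))) for theta in cutpoints]
    cum.append(1.0)  # P(Y <= S) = 1
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        prev = c
    return dict(zip(["N", "M", "S"], probs))


# Hypothetical coefficients, chosen only to illustrate the shape of the curves.
for period in (10.0, 40.0):
    print(period,
          baseline_logit_probs(period, {"M": (-4.0, 0.08), "S": (-5.0, 0.10)}),
          proportional_odds_probs(period, [3.0, 4.0], 0.1))
```

In the baseline-category logit each non-baseline category has its own slope, whereas the proportional odds model shares a single slope across the cumulative logits; that shared slope is the restriction whose adequacy the last item asks about.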
Poisson Passions
Question 2 (theoretical 'warming-up').
- Let Xi~Pois(λi), i=1,...,n be independent Poisson random variables. What is the distribution of S=X1+...+Xn? Find the joint conditional distribution of X1,..., Xn given S.
- For the Poisson distribution, find the transformation Ψ(θ) that "normalizes" the likelihood, i.e. for which l'''(Ψ0) = 0, where l is the log-likelihood expressed as a function of Ψ and Ψ0 is the maximum likelihood estimate of Ψ.
- Consider the Poisson model. Give the form of the iterative weights wi and the "adjusted dependent variable" zi used in the iteratively reweighted least squares algorithm for a linear model for the mean (identity link function).
- Show that for the Poisson model the deviance D and the Pearson statistic X² are "close".
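The first item can be sanity-checked numerically. The Python sketch below uses hypothetical rates λ_i and obtains P(S = s) by brute-force summation over all count vectors with total s, so it does not presuppose the answer. It then compares that value with a Poisson probability with mean Σλ_i, and compares the conditional probability of one count vector with a multinomial probability with cell probabilities λ_i / Σλ_j. Agreement at a few points is a check, not a proof. All function names are illustrative.

```python
import itertools
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def joint_pmf(xs, lams):
    # Independence: the joint pmf is the product of the marginal Poisson pmfs.
    return math.prod(poisson_pmf(x, lam) for x, lam in zip(xs, lams))


def prob_sum_brute_force(s, lams):
    # P(S = s) by summing the joint pmf over every count vector with total s.
    return sum(joint_pmf(xs, lams)
               for xs in itertools.product(range(s + 1), repeat=len(lams))
               if sum(xs) == s)


def multinomial_pmf(xs, probs):
    coef = math.factorial(sum(xs))
    for x in xs:
        coef //= math.factorial(x)
    return coef * math.prod(p ** x for x, p in zip(xs, probs))


lams = [0.5, 1.25, 2.25]  # hypothetical rates
s = 5
xs = (2, 1, 2)
total = sum(lams)
print(prob_sum_brute_force(s, lams), poisson_pmf(s, total))
print(joint_pmf(xs, lams) / prob_sum_brute_force(s, lams),
      multinomial_pmf(xs, [lam / total for lam in lams]))
```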
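For the second item, a candidate transformation can be tested numerically. This sketch assumes θ is the Poisson mean λ and considers power transformations Ψ = λ^(1/a), i.e. λ = Ψ^a, for a single observed count x. It approximates l'''(Ψ) at the MLE Ψ0 = x^(1/a) with a central finite difference. The count x = 5 and the function name are arbitrary choices. A value near zero, up to the O(h²) discretization error, indicates a normalizing choice of a.

```python
import math


def third_derivative_at_mle(x, a, h=1e-2):
    """Central-difference estimate of l'''(psi) at the MLE when lambda = psi ** a.

    l(psi) = x * log(psi ** a) - psi ** a is the Poisson log-likelihood for one
    count x (constant -log x! dropped), so psi = lambda ** (1 / a) and the MLE
    is psi0 = x ** (1 / a). Requires x > 0.
    """
    def loglik(psi):
        return a * x * math.log(psi) - psi ** a

    psi0 = x ** (1.0 / a)
    # f'''(p) ~ [f(p+2h) - 2f(p+h) + 2f(p-h) - f(p-2h)] / (2h^3), error O(h^2)
    return (loglik(psi0 + 2 * h) - 2 * loglik(psi0 + h)
            + 2 * loglik(psi0 - h) - loglik(psi0 - 2 * h)) / (2 * h ** 3)


for a in (1, 2, 3):  # lambda itself, its square root, its cube root
    print(a, third_derivative_at_mle(5, a))
```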
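For the third item, recall the general IRLS quantities w_i = 1 / (V(μ_i) g'(μ_i)²) and z_i = η_i + (y_i - μ_i) g'(μ_i). With the identity link g(μ) = μ and the Poisson variance V(μ) = μ, these reduce to w_i = 1/μ_i and z_i = y_i. The Python sketch below runs IRLS for a hypothetical straight-line mean μ_i = b0 + b1 x_i. With one covariate, the weighted least squares step is a 2 × 2 solve. The data in the usage line are made up. Because the identity link does not keep μ_i positive, the sketch raises an error if an iterate produces a non-positive mean.

```python
def irls_poisson_identity(x, y, tol=1e-10, max_iter=500):
    """IRLS for Poisson counts y with mean mu_i = b0 + b1 * x_i (identity link).

    With g(mu) = mu and V(mu) = mu the working weights are w_i = 1 / mu_i and
    the adjusted dependent variable is z_i = eta_i + (y_i - mu_i) * 1 = y_i.
    """
    b0, b1 = sum(y) / len(y), 0.0  # start from a constant mean
    for _ in range(max_iter):
        mu = [b0 + b1 * xi for xi in x]
        if min(mu) <= 0:
            raise ValueError("identity link produced a non-positive mean")
        w = [1.0 / m for m in mu]
        z = y  # eta + (y - mu) collapses to y for the identity link
        # Weighted least squares step: solve (X'WX) b = X'Wz for b = (b0, b1).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            return b0, b1
    raise RuntimeError("IRLS did not converge")


# Made-up data for illustration only.
print(irls_poisson_identity([1.0, 2.0, 3.0, 4.0, 5.0], [2, 3, 6, 7, 9]))
```

At convergence, the fixed point satisfies X'W(y - μ) = 0, which is exactly the Poisson score equation for this model.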
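For the last item, a second-order Taylor expansion of y log(y/μ) about y = μ gives y log(y/μ) - (y - μ) ≈ (y - μ)²/(2μ). Each deviance term 2[y log(y/μ) - (y - μ)] is therefore approximately the Pearson term (y - μ)²/μ when y is close to μ. The Python sketch below evaluates both statistics for hypothetical counts and fitted means, so the two can be compared directly.

```python
import math


def poisson_deviance(y, mu):
    # D = 2 * sum[y log(y / mu) - (y - mu)], with y log(y / mu) = 0 when y = 0
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))


def pearson_x2(y, mu):
    # X^2 = sum (y - mu)^2 / mu
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


# Hypothetical counts and fitted means, not taken from any data set in this exercise.
y = [12, 20, 31]
mu = [10.0, 21.0, 30.0]
print(poisson_deviance(y, mu), pearson_x2(y, mu))
```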
Question 3.
The data in the file Ship.dat concern a type of damage caused by waves to the forward section of certain cargo-carrying vessels. For the purpose of setting standards for hull construction, it is important to know the risk of damage associated with the three classifying factors:
- Ship type A-E
- Year of operation 1960-74, 1975-79
- Year of construction 1960-64, 1965-69, 1970-74, 1975-79
The data give the three classifying factors, the aggregate number of months of service and the number of damage incidents (as distinct from the number of ships damaged). Note that a single ship may be damaged more than once and, furthermore, that some ships will have been operating in both periods. No ship constructed in 1975-79 could have operated in 1960-74, which explains some of the zero cells.
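One common way to account for exposure is a log-linear model with log t as an offset. This assumes that the number of damage incidents in each cell is Poisson with mean proportional to the aggregate months of service t. A sketch of the main-effects version follows, using the hypothetical symbols α_i (ship type), γ_j (year of construction) and δ_k (period of operation):

```latex
Y_{ijk} \sim \mathrm{Poisson}(\mu_{ijk}), \qquad
\log \mu_{ijk} = \log t_{ijk} + \beta_0 + \alpha_i + \gamma_j + \delta_k
```

Here exp(β0 + α_i + γ_j + δ_k) is the damage rate per month of service. For a cell with t = 0, log t is undefined and the cell carries no information about the rate, so such cells are usually left out of the fit.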
- Fit the appropriate main effects model. Analyse its goodness-of-fit.
- Is there a need to include interaction terms in the model? (Use whichever model selection procedure(s) you find helpful.)
- Summarize the results and give conclusions.
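Nested Poisson models, such as the main-effects model and one with an added interaction, can be compared with a likelihood-ratio test. The test refers the drop in residual deviance to a chi-square distribution on the drop in residual degrees of freedom, assuming the dispersion is 1. By default, R's step() compares models by AIC instead. Both criteria can be computed from the residual deviances and degrees of freedom reported by any GLM fit. The pure-Python sketch below implements both, using hypothetical function names. The chi-square tail uses the power series for the regularized incomplete gamma function, which is adequate for moderate statistics.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def deviance_test(dev_reduced, df_reduced, dev_full, df_full):
    """Likelihood-ratio test of a reduced model nested in a full one (dispersion 1).

    dev_* are residual deviances, df_* residual degrees of freedom.
    Returns (statistic, degrees of freedom, p-value).
    """
    stat = dev_reduced - dev_full
    ddf = df_reduced - df_full
    return stat, ddf, chi2_sf(stat, ddf)


def aic_change(dev_reduced, df_reduced, dev_full, df_full):
    # AIC(full) - AIC(reduced): the deviance drop traded against 2 per extra
    # parameter; a negative value favours the full model.
    return (dev_full - dev_reduced) + 2 * (df_reduced - df_full)
```

The inputs come from the two fitted models. No fitted deviances are given here, so any numbers used to try the functions are placeholders.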
Computational Notes for R users:
- the function multinom from the library nnet fits multinomial models
- the function polr from the library MASS fits proportional odds models
- the function step can be helpful for model selection.