Exercise 2
Prelude and Fugue in F-test Major
Theoretical Prelude
Question 1.
Suppose we have a linear model with an intercept, p explanatory variables and i.i.d.
normal errors:
y=β0+β1 x1+...+βp xp+ε
Transform the original response y to y'=y-c, where c is a fixed constant.
- What will happen to the OLS estimates of the β's after the transformation?
- Show that the residual sums of squares (RSS) for y and y'
will be the same.
- Show that the F-statistic for testing H0 : β1=...=βs=0 (for some s ≤ p) will
also be the same in both cases.
- Where did you use the assumption of normality of the errors?
How would you test the hypothesis
H0 (see above) when the distribution of the errors ε
is not normal (though known)?
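Before writing the proofs, it may help to check the claims numerically. The following is only a sanity check on simulated data, not a derivation; the data, the choice p = 3, s = 2, and the constant c0 are all assumptions made for illustration.

```r
# Simulated data (hypothetical; not taken from the exercise files)
set.seed(1)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)
c0 <- 10                         # the fixed constant c
y_shift <- y - c0                # transformed response y' = y - c

fit   <- lm(y ~ x1 + x2 + x3)
fit_s <- lm(y_shift ~ x1 + x2 + x3)

# OLS estimates before and after the shift, side by side
print(cbind(original = coef(fit), shifted = coef(fit_s)))

# Residual sums of squares of the two fits
rss   <- sum(resid(fit)^2)
rss_s <- sum(resid(fit_s)^2)

# F-statistic for H0: beta1 = beta2 = 0 (s = 2), via nested models
f   <- anova(lm(y ~ x3), fit)$F[2]
f_s <- anova(lm(y_shift ~ x3), fit_s)$F[2]

print(c(rss = rss, rss_shifted = rss_s, F = f, F_shifted = f_s))
```

A numerical agreement of this kind only suggests what to prove; the exercise still asks for the algebraic argument.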
Question 2.
Consider a simple linear regression with a single explanatory variable x.
Show that
- If all n observations xi are equidistant from their average, then
hii=2/n.
- If all but one of the observations xi are identical, these will have
hii=1/(n-1), while for the remaining observation hii=1.
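Both statements can be checked on small examples with hatvalues, which returns the diagonal elements hii of the hat matrix of a fitted lm model. The x and y values below are made up for illustration (the leverages depend only on x), so this is a quick check under those assumptions rather than a proof.

```r
# Case 1: every x_i lies at the same distance from the mean (n = 6)
x_eq <- c(-1, -1, -1, 1, 1, 1)
y_eq <- c(0.3, 1.2, -0.5, 2.1, 0.8, 1.7)   # arbitrary responses
h_eq <- hatvalues(lm(y_eq ~ x_eq))
print(rbind(leverage = unname(h_eq), claimed = 2 / length(x_eq)))

# Case 2: all but one x_i are identical (n = 5)
x_one <- c(0, 0, 0, 0, 5)
y_one <- c(1.0, 0.4, 1.3, 0.7, 3.2)        # arbitrary responses
h_one <- hatvalues(lm(y_one ~ x_one))
print(rbind(leverage = unname(h_one),
            claimed  = c(rep(1 / (length(x_one) - 1), 4), 1)))
```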
Fugue in Medical Data
Question 3.
The file Feigl.dat gives the survival times (Time)
in weeks from initial diagnosis
of 33 patients with acute myelogenous leukaemia, with two covariates:
WBC (white blood cell count in thousands) and AG-factor at the time
of diagnosis (1=Pos, 2=Neg).
- Plot Time against WBC for each level of AG. Does the
plot indicate that a linear model will be appropriate? Examine the effect of
log-transforming Time and WBC on this plot.
- Fit a full linear regression model (with interaction) of Time on
WBC and AG. Comment on the results. Test for parallel regressions.
Does this model fit the data?
- Re-fit the model on the log-log scale.
Does the effect of log(WBC) on log(Time) depend on the presence
of the AG-factor? Check the adequacy of the resulting model and try to think
of possible reasons for any problems you find.
Compare this model with the one from the previous item.
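As a rough guide to the mechanics of testing for parallel regressions, the sketch below compares a model with separate slopes to one with a common slope. It uses simulated data with column names borrowed from the question (Time, WBC, AG); the layout of Feigl.dat is not assumed here, so the real data must be read in and coded separately.

```r
# Simulated stand-in for the leukaemia data (hypothetical values)
set.seed(2)
n    <- 33
AG   <- factor(rep(c(1, 2), length.out = n), labels = c("Pos", "Neg"))
WBC  <- exp(runif(n, log(1), log(100)))
Time <- exp(4 - 0.5 * log(WBC) - 1.0 * (AG == "Neg") + rnorm(n, sd = 0.8))
sim  <- data.frame(Time, WBC, AG)

# Scatter plot on the log-log scale, one symbol per AG level
plot(log(Time) ~ log(WBC), data = sim, pch = as.integer(sim$AG))

# Separate slopes (interaction) versus a common slope (parallel lines)
full     <- lm(log(Time) ~ log(WBC) * AG, data = sim)
parallel <- lm(log(Time) ~ log(WBC) + AG, data = sim)

# F-test of the interaction term, i.e. of parallel regressions
print(anova(parallel, full))
```

The same comparison on the original scale only requires replacing log(Time) and log(WBC) by Time and WBC in both formulas.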
Question 4.
The file Charges.dat contains data on the sex,
the attending physician (A, B or C), severity of illness
(1-4), total hospital charges
(Chrg) and age for 49 patients, all of whom had an identical diagnosis,
from Northwestern Memorial Hospital, Chicago.
- Fit the main-effects model for the charges in terms of age and the other variables
(remember first to express them as suitable indicator variables where
necessary).
Is the linear model adequate for these data?
- Find the appropriate transformation of the dependent variable from the
Box-Cox transformation family, re-fit the model and comment on its adequacy.
- Test the hypothesis that the attending physician has no effect
on hospital charges (on the chosen scale).
- Some feminist organizations claim that there is sexual discrimination in
the hospital and that women face higher hospital charges. Does their claim
have any statistical grounds?
- Point out any influential observation(s) that strongly affected your model.
Remove them from the data and re-fit the model. Comment on the results.
Repeat Step 3 and Step 4.
- Repeat Step 2 without the influential observations you found. Did you get the
same scale for the response variable as before? Try to explain this phenomenon.
- Are you completely satisfied with the resulting model(s)? If "yes",
mazal tov!; if "no", suggest how to improve it.
Computational Notes for R users:
- the function lm, used for fitting linear models, returns
an object (called lm.object here) that contains a lot of useful
information you may need for the analysis of your model. See
help(lm.object) for more details
- after fitting a linear model with lm and
storing its output in lm.object, the function
plot(lm.object) gives useful diagnostic plots, such as residuals vs.
predicted values, a Q-Q plot, Cook's distance, etc.
- to find the optimal Box-Cox transformation, you can use the function
boxcox from Ripley's software in the library MASS, which
you should download/attach first
- to define factor variables use the factor or
ordered functions
(see help for details)
- to use only part of the data in fitting models, use the arguments
subset (preferable) or weights of the lm function
(see help(lm) for details)
- the functions update, add1, drop1 may be useful for
modifying models (see help for details)
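The notes above can be sketched as one short session. The data frame dat and its column names (chrg, age, sex, phys, sev) are simulated stand-ins chosen for illustration, not the contents of Charges.dat, and the log scale used below is only an example of a transformed response, not a suggested answer.

```r
library(MASS)   # provides boxcox()

# Simulated stand-in for the hospital charges data (hypothetical values)
set.seed(3)
n <- 49
dat <- data.frame(
  age  = round(runif(n, 20, 80)),
  sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
  phys = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  sev  = sample(1:4, n, replace = TRUE)
)
dat$chrg <- exp(7 + 0.01 * dat$age + 0.3 * dat$sev + rnorm(n, sd = 0.4))

# Main-effects fit; factor() turns severity into indicator variables
fit <- lm(chrg ~ age + sex + phys + factor(sev), data = dat)
print(names(fit))                  # components stored in the fitted object

# Profile log-likelihood over lambda; take the maximiser on the grid
bc <- boxcox(fit, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Change the response scale with update(), then test each term with drop1()
fit_log <- update(fit, log(chrg) ~ .)
print(drop1(fit_log, test = "F"))

# Diagnostic plots: residuals vs. fitted, Q-Q plot, Cook's distance
plot(fit_log, which = c(1, 2, 4))

# Re-fit without the observation with the largest Cook's distance
worst   <- which.max(cooks.distance(fit_log))
fit_sub <- update(fit_log, subset = -worst)
print(nobs(fit_sub))
```

Using update with subset keeps the original data frame intact, which makes it easy to compare fits with and without the removed observation.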