Question 1.
Consider a linear regression model with p explanatory variables:
y=Xβ+ε,
where X is an n by p matrix, E(ε) = 0 and
Var(ε) = σ²I_n (σ² is unknown). We want to test a general linear
hypothesis H_0: Aβ = c, where
A is an m by p matrix (m ≤ p) of rank m and
c is an m-dimensional vector.
- Show that the following null hypotheses are particular cases of this general
case (derive the corresponding matrix A and vector c):
- β_1 = ... = β_k = 0
- β_1 = ... = β_k
- β_1 = 7
- β_1 = 7, β_3 = 2β_2 - 1, β_2 + β_4 = 3β_5
- Find the OLS estimator of β under H_0.
- Using the results of the previous paragraph and, in addition, assuming
that the ε_i are normally distributed, derive
the corresponding test for testing H_0 (try to simplify the
final formula as much as you can).
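If you want a concrete sandbox before deriving anything, here is a small Python sketch (simulated data and an illustrative constraint of my own, not one of the cases above) that encodes a hypothesis of the form Aβ = c and evaluates the textbook F-statistic (Aβ̂ - c)'[A(X'X)⁻¹A']⁻¹(Aβ̂ - c) / (m σ̂²), with σ̂² = RSS/(n - p):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated design with p = 4 explanatory variables (illustrative only)
    n, p = 50, 4
    X = rng.normal(size=(n, p))
    beta_true = np.array([7.0, 1.0, 2.0, -1.0])
    y = X @ beta_true + rng.normal(size=n)

    # Example hypothesis H0: beta_1 = 7 and beta_2 = beta_3 (1-based indexing)
    A = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, -1.0, 0.0]])   # m x p matrix of rank m = 2
    c = np.array([7.0, 0.0])
    m = A.shape[0]

    # Unrestricted OLS fit
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    rss = np.sum((y - X @ beta_hat) ** 2)
    sigma2_hat = rss / (n - p)

    # Classical F-statistic for H0: A beta = c, with F_{m, n-p} reference distribution
    d = A @ beta_hat - c
    F = d @ np.linalg.solve(A @ XtX_inv @ A.T, d) / (m * sigma2_hat)
    p_value = stats.f.sf(F, m, n - p)
    print(F, p_value)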
Question 2.
Consider a simple linear Gaussian regression without an intercept:
y_i = β x_i + ε_i,
where the ε_i are i.i.d. N(0, σ²), i = 1, ..., n.
- Find the MLEs for β and σ².
- Is the MLE for β a linear estimator? Is it unbiased?
Is it UMVUE? What is its MSE?
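A small simulation sketch (Python, arbitrary parameter values) that uses the ordinary no-intercept least-squares slope Σ x_i y_i / Σ x_i² — the question asks you to check whether this coincides with the MLE — so that the bias and MSE you derive can be compared with empirical values:

    import numpy as np

    rng = np.random.default_rng(1)
    beta, sigma, n, reps = 2.0, 1.5, 30, 20_000

    x = rng.uniform(1, 3, size=n)          # fixed design, reused in every replication
    estimates = np.empty(reps)
    for r in range(reps):
        y = beta * x + rng.normal(scale=sigma, size=n)
        estimates[r] = np.sum(x * y) / np.sum(x ** 2)   # no-intercept LS slope

    print("empirical mean  :", estimates.mean())                  # compare with beta
    print("empirical MSE   :", np.mean((estimates - beta) ** 2))
    print("sigma^2/sum(x^2):", sigma ** 2 / np.sum(x ** 2))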
Question 3.
Suppose
y=Xβ+ε,
where the ε_i are i.i.d. with zero mean and common variance σ².
We wish to estimate x_0'β
at some point x_0
based on the ordinary least squares (OLS) estimator β*
of β, i.e. by x_0'β*.
- Find the mean squared error (MSE) of the resulting estimator.
- Show that there does not exist any other linear unbiased estimator of
x_0'β with a smaller variance, i.e. that
this estimator is BLUE.
- Can you claim that this is the best unbiased estimator among all possible
unbiased estimators?
Will the additional assumption of normality of ε's
be helpful?
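Here is a minimal Python sketch (simulated design and an arbitrary point x_0) that estimates the MSE of x_0'β* empirically and prints the candidate closed-form expression σ² x_0'(X'X)⁻¹x_0 for comparison; verifying that expression analytically is the first part of the question:

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, sigma = 40, 3, 1.0
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, -2.0, 0.5])
    x0 = np.array([0.3, 1.0, -0.7])          # the point at which x0' beta is estimated

    XtX_inv = np.linalg.inv(X.T @ X)
    preds = []
    for _ in range(20_000):
        y = X @ beta + rng.normal(scale=sigma, size=n)
        beta_star = XtX_inv @ X.T @ y         # OLS estimator
        preds.append(x0 @ beta_star)

    preds = np.array(preds)
    print("empirical MSE       :", np.mean((preds - x0 @ beta) ** 2))
    print("candidate expression:", sigma ** 2 * x0 @ XtX_inv @ x0)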
Question 4.
Consider the one-way analysis of variance model with the same number of observations
n in each of m groups:
y_{ij} = μ + α_j + ε_{ij}, i = 1, ..., n; j = 1, ..., m,
where the ε_{ij} are i.i.d. normal variables with zero mean and
variance σ².
- Treat the group factor as a fixed effect.
  - Find the MLEs for μ, α_j and σ².
  - Formulate the hypothesis for testing the homogeneity among groups. What is
    the corresponding test?
- Treat the group factor as a random effect.
  - Define an appropriate model and find the MLEs for its parameters.
  - Formulate the hypothesis for testing the homogeneity among groups in terms
    of the random effects model and derive an appropriate test statistic.
- Explain the `conceptual' differences between fixed and random effects
models. Give examples where it is reasonable to consider a group factor as a
fixed and random effect correspondingly.
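For the fixed-effects part, here is a Python sketch (balanced simulated data, arbitrary effect sizes) that computes the classical between/within F-statistic with (m - 1, m(n - 1)) degrees of freedom; you may find it a useful reference point when deriving and checking the homogeneity test:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, m, mu, sigma = 10, 5, 2.0, 1.0
    alpha = np.array([0.0, 0.5, -0.5, 1.0, -1.0])   # group effects (fixed-effect view)

    # y[i, j] for i = 1..n observations within group j = 1..m
    y = mu + alpha + rng.normal(scale=sigma, size=(n, m))

    group_means = y.mean(axis=0)
    grand_mean = y.mean()
    ss_between = n * np.sum((group_means - grand_mean) ** 2)
    ss_within = np.sum((y - group_means) ** 2)

    # Classical one-way ANOVA F-statistic for homogeneity among groups
    F = (ss_between / (m - 1)) / (ss_within / (m * (n - 1)))
    p_value = stats.f.sf(F, m - 1, m * (n - 1))
    print(F, p_value)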
Question 5.
Consider the following linear growth curve model with random intercept and
slope, where n measurements are taken repeatedly on each of
m individuals over time, that is
y_{ij} | β_{0j}, β_{1j} =
β_{0j} + β_{1j} t_i + ε_{ij}, i = 1, ..., n; j = 1, ..., m,
where ε_{ij} ~ N(0, σ²),
β_{0j} ~ N(β*_0, σ_0²),
β_{1j} ~ N(β*_1, σ_1²), and all the
ε_{ij}, β_{0j} and β_{1j} are independent.
Find the joint (marginal) distribution of the data y_{ij},
i = 1, ..., n; j = 1, ..., m.
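A simulation sketch (Python, arbitrary parameter values) that merely generates data from the model as stated; the empirical mean vector and covariance matrix across individuals can be compared with whatever marginal distribution you derive:

    import numpy as np

    rng = np.random.default_rng(4)
    n, m = 5, 2000                          # n time points, m individuals
    t = np.linspace(0.0, 1.0, n)            # measurement times t_1, ..., t_n
    beta0_star, beta1_star = 1.0, 2.0
    sigma, sigma0, sigma1 = 0.5, 1.0, 0.8

    # Draw individual-specific intercepts and slopes, then the observations
    b0 = rng.normal(beta0_star, sigma0, size=m)        # beta_{0j}
    b1 = rng.normal(beta1_star, sigma1, size=m)        # beta_{1j}
    eps = rng.normal(0.0, sigma, size=(m, n))           # eps_{ij}
    y = b0[:, None] + b1[:, None] * t[None, :] + eps    # y[j, i]

    # Empirical mean vector and covariance across individuals (rows);
    # compare with the marginal distribution you derive analytically.
    print(y.mean(axis=0))
    print(np.cov(y, rowvar=False).round(2))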
And for dessert, here is a story about Tom Statman, a statistician from London, that could essentially happen to any of us...
Enjoy the reading, but
each time a request for your help appears,
please stop there for a moment - your consulting is badly needed.
Your short but comprehensive assistance will be highly appreciated.
One day from Mr. Statman's statistical practice
It was one of those days when everything goes wrong right from the morning...
It started with a telephone call at 6.00 am. Tom Statman's client, Dr. Fleming,
had been in the hospital all night waiting impatiently for the test results of
his patients treated according to his new method. The moment he got them from
the laboratory, he called Tom at home, though it was 6.00 am, begging him
to statistically analyse these results and to compare them with those from the
control group as soon as possible.
"Please, wake up! It's urgent! It should be done before 10.00 am,
I'm sending the results to your office by email right now!" -
Dr. Fleming was so excited he could hardly speak.
Sleepy Tom, who could hardly understand what was going on, mumbled: "Yeh... t-test... Office..." swearing in
his mind at this idiot Dr. Fleming,
his stupid results, all damn statistical tests and statistics in general, and
hung up the phone.
But he also understood that he wouldn't get back to sleep now anyway...
Tom Statman
had recently started this job and each client was quite important, especially a client
such as Dr. Fleming, who supplied plenty of work and paid well...
"No way... I'd better go to the office and start analysing his data" - thought
Tom. The day had started the wrong way and it was clear to him
that this was not the end...
He looked out of
the window - it was a dark-grey drizzling cold London winter morning.
Tom put the coffee on the gas and looked through the window. He was still
half asleep and nodding off a little when he suddenly heard
hissing from the gas. It was too late - the coffee had spilled onto the cooker and
covered it with large dirty spots. "Damn it! My coffee! The cleaning lady was
here only yesterday" - groaned Tom. He became angry at this coffee,
at this damn morning, at this stupid life and started dressing...
About ten minutes later he left his house, got into his car and tried to
start it. The engine was silent and didn't react to Tom's desperate
efforts. "Come on! Come on!"... No response... It was too much...
Tom put his head on the wheel and started crying...
"Cab! I'll take a cab and this stupid Dr. Fleming will pay me for it!" - decided
Tom.
He was in his office in the City in about half an hour.
No one was there at such an early hour.
Tom switched on his computer: "New mail has arrived" - it was Dr. Fleming's
data. "OK, let's go!" - said Tom to himself. He had got his M.Sc. in
statistics at Oxford and was a real professional. Moreover, unlike a lot of his
colleagues he still liked his job, he liked to analyse data, to discover
"hidden" connections between variables and to look at the astonished faces of his
clients:
"It's unbelieveble. You're a genius, Mr. Statman!". At those moments he was really
happy and was proud of himself. Once he started importing Dr. Fleming's data into the file
he was already enthusiastic about this project. "Please, no mis-recorded data
this time, please" - prayed Tom. He remembered the incident that happened to his
classmates Alan Weightman and Judy Grouppy in one of their projects during those
good old student days in Oxford...
In a simple linear regression, one observation had been mis-recorded.
To remove it from the fit in a simple way, Alan Weightman suggested using
weighted regression for the whole data set, giving
zero weight to the mis-recorded observation and unit weights to all the others.
Judy Grouppy never agreed with Alan Weightman and
proposed instead to define a new factor variable: Group=1
for the mis-recorded observation, Group=0 for all the others, and then fit
the model y~x+Group to the whole data set. They argued for two hours and finally
agreed to ask Prof. Wiseman to solve their dispute.
Prof. Wiseman listened carefully to both
sides, thought for a moment and, in his typical Jewish manner, replied with several questions of his own:
"Young colleagues,
a) What do you think is the meaning of the coefficient of the extra variable in the model of
Mrs. Grouppy?
b) Could you compare the regression coefficients for both models and
the residual sum of
squares (RSS) for Judy Grouppy's model with the weighted RSS for Alan
Weightman's model?
Answer both of Prof. Wiseman's questions and comment on the results.
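If you would like to check your answers numerically before committing to them, here is a small Python sketch (simulated data with one artificially corrupted point) that fits Alan Weightman's weighted regression and Judy Grouppy's indicator-variable model side by side:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 30
    x = rng.uniform(0, 10, size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
    bad = 7                                  # index of the "mis-recorded" observation
    y[bad] += 25.0                           # corrupt it

    # Alan Weightman: weighted LS with zero weight on the bad point, unit weights elsewhere
    w = np.ones(n)
    w[bad] = 0.0
    Xw = np.column_stack([np.ones(n), x])
    sw = np.sqrt(w)
    coef_alan, *_ = np.linalg.lstsq(Xw * sw[:, None], y * sw, rcond=None)
    rss_alan = np.sum(w * (y - Xw @ coef_alan) ** 2)        # weighted RSS

    # Judy Grouppy: add an indicator variable for the bad point and fit ordinary LS
    group = np.zeros(n)
    group[bad] = 1.0
    Xg = np.column_stack([np.ones(n), x, group])
    coef_judy, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss_judy = np.sum((y - Xg @ coef_judy) ** 2)

    print(coef_alan, rss_alan)
    print(coef_judy, rss_judy)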
The data transfer was over. As usual, Tom started with a
visual analysis: "Well, well... Both samples (the results of the control and treatment groups) seem to
be normal with similar variances... I hope a simple two-sample t-test will solve the problem".
But what had
happened?!
Every time he tried to run the two-sample t-test, which until this morning had
always worked without a hitch, it returned a strange error message:
"t-test function is not found". Tom tried to call it again and again, but
the result was the same. "Something has happened to this stupid computer!" -
Tom was almost crying. Their system administrator wasn't due to arrive
until about 9.30 am. Too late...
It was too much for one day... Tom was in despair. Why did this have to happen
to him? Who could help him? It was already 7:30 am...
Suddenly he remembered rumors about a famous
oracle, Lin No Lin. Tom always laughed at those stories but nevertheless
could not forget the one told him by his French friend Charles d'Linear:
Charles d'Linear was looking for an adequate linear model for his data, which contained
a lot of explanatory variables, some of them perhaps not relevant.
He wasn't a complete `amateur' in statistics and, after a while, he found a
reasonable model, checked its R², R²_adj,
R²_cv,
performed the analysis of residuals, etc. -
everything seemed OK but he still felt uncomfortable with these
`half-heuristic' goodness-of-fit indicators and was keen to check the adequacy of his model
by some `exact' goodness-of-fit statistical test.
Charles d'Linear had heard good recommendations about you and
asked you to help him find such a test. Can you help Charles?
Desperate Charles went to Lin No Lin.
The oracle listened carefully to d'Linear's
problems and promised to try to help him. After Charles had gone,
Lin No Lin entered into
a deep meditation and in a vision the true value of the noise variance
σ² was revealed to him.
Can this miraculous revelation help Charles d'Linear? If `yes', then how?
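One quantity worth playing with once σ² is treated as known is RSS/σ². The Python sketch below (simulated data, an illustrative design of my own) just computes it and prints some χ²_{n-p} quantiles for reference; whether, and under what assumptions, this yields an exact goodness-of-fit test is for you to argue:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, p, sigma2 = 60, 4, 1.0                # pretend the oracle revealed sigma2
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, 0.0, -1.0, 0.5])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)

    # The quantity Charles could look at once sigma^2 is known; deciding what its
    # distribution is under a correctly specified model (and hence whether it
    # provides an exact goodness-of-fit test) is the exercise.
    print("RSS / sigma^2 =", rss / sigma2)
    print("chi^2_{n-p} 5% and 95% quantiles:",
          stats.chi2.ppf([0.05, 0.95], df=n - p))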
Tom already had his hand on the telephone book, about to start searching
for Lin No Lin,
when a sudden idea crossed his mind: "Linear... linear regression...
Yes! Linear regression! I can use the linear regression function to perform
this damn t-test!" - Tom was dancing. He was so ashamed now of his
one-minute weakness and his readiness to call that charlatan Lin No Lin.
It was not the first time that linear regression helped Tom. In
one of his projects...
In one of his projects Tom faced a nonlinear model
y = b_0 + a x_1 x_3 + b x_2 x_3 + a k x_1 + b k x_2 + ε,
with unknown parameters b_0, a, b and k.
He did not have any nonlinear regression function at the time but nevertheless managed
to find OLS estimates for all the parameters by running a series of
linear regressions.
Is it indeed a nonlinear model?
What was Tom's idea? Did he get analytical or only numerical
solutions?
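If you want a sandbox for testing whatever series-of-regressions scheme you come up with, here is a Python sketch that simply generates data from this model (the "true" parameter values are arbitrary, not from the story):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 200
    x1, x2, x3 = rng.normal(size=(3, n))
    b0, a, b, k = 0.5, 1.5, -2.0, 0.7        # arbitrary "true" parameter values

    # Data generated from Tom's model; try to recover b0, a, b and k
    # using nothing but ordinary linear regressions.
    y = (b0 + a * x1 * x3 + b * x2 * x3
         + a * k * x1 + b * k * x2
         + rng.normal(scale=0.3, size=n))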
Tom jumped
up to his computer and ran the linear regression. He saw the results, leaned
back in his chair and started singing his favourite Queen song "We are the
Champions, my friends...". Life was not so awful after all...
- What was Tom's enthusiasm based on (if on anything at all)?
What kind of linear model was
Tom running and how could he (how did he think he could?) use the
results of this linear regression for his original t-test?
- Did he need the standard two-sample t-test assumption of equality of the
variances?
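As a hint of what Tom may have had in mind, here is a Python sketch (simulated control and treatment samples) that runs the ordinary two-sample t-test next to a regression of the pooled responses on a group indicator; whether and why the two agree, and what this says about the equal-variance assumption, is left for your answer:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    control = rng.normal(loc=10.0, scale=2.0, size=25)
    treated = rng.normal(loc=11.5, scale=2.0, size=30)

    # Ordinary two-sample t-test (equal variances assumed)
    print(stats.ttest_ind(control, treated, equal_var=True))

    # Regression of the pooled responses on an intercept and a group indicator
    y = np.concatenate([control, treated])
    g = np.concatenate([np.zeros(control.size), np.ones(treated.size)])
    X = np.column_stack([np.ones(y.size), g])

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (y.size - 2)
    se = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
    print("slope t-statistic:", beta_hat[1] / se)   # compare with the t-test above (up to sign)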
Good Luck!