Generalized Linear Models

(0365.4006)

Lecturer: Prof. Felix Abramovich (felix@math.tau.ac.il)
Lecture Hours: Tuesday 16-19, Kaplun 319


Purpose

Regression analysis plays a central role in statistics, being one of its most powerful and commonly used techniques. Standard linear regression models assume that the response variable is normal (or at least can be transformed to a normal one). However, unfortunately/fortunately (?), this is not always the case. Models with a categorical or count response are typical (although not the only!) examples where the assumption of normality cannot be accepted as reasonable. In this course we study generalized linear models, where the response variables are allowed to be non-normal. We start from the general theory of generalized linear models, extending the corresponding results for standard linear regression, and then consider the most useful particular cases in more detail.

Topics:

  1. Introduction
    • Standard (normal) linear regression model
    • Generalized linear regression model
  2. Theory of Generalized Linear Models
    • Model components
      • exponential family and its properties
      • link functions
    • Maximum likelihood estimation
      • Newton-Raphson method
      • iteratively reweighted least squares
    • Goodness-of-fit
      • analysis of deviance
      • Pearson statistic
      • analysis of residuals
    • Model selection
  3. Particular Models
    • Binary data
    • Binomial data
    • Multinomial data
    • Poisson data
  4. Overdispersion & Quasi-Likelihood Models
  5. Nonparametric GLM
  6. Normal linear models with heterogeneous variance and GLM
  7. Generalized linear mixed effects models

Literature

  • Dobson, A.J. An Introduction to Generalized Linear Models.
  • McCullagh, P. and Nelder, J.A. Generalized Linear Models.
  • Wood, S.N. Generalized Additive Models. An Introduction with R (Chapter 2).
  • Myers, R.H. and Montgomery, D.C. A tutorial on Generalized Linear Models. Journal of Quality Technology, 29, 274-291.
  • Green, P.J. and Silverman, B.W. Nonparametric Regression and Generalized Linear Models (Chapter 5).

Homework Exercises:


Computing:

The course assumes extensive use of a computer. There are no limitations on using various statistical packages and software for this course, although the data examples considered in class will be "R-oriented". Installation instructions and manuals for R can be found on the R Home page. The R-based books may be helpful for this course:

In addition, you can use various R packages that are not loaded by default in the standard R software. For example, you may find Ripley's software, provided with the book "Modern Applied Statistics with S", very useful. To use Ripley's software, start R and give the command: library(MASS)

It is a good idea to add the above command to the function .First. This function is executed automatically every time you start R (provided it is saved in your workspace or defined in your .Rprofile file), so you will not need to give the command in every R session. If you have not created the function .First before, do it by .First<-function(){library(MASS)}, which creates an R function .First that, for now, contains only the library command.

Some specific R notes you may find useful for generalized linear models:

  • To fit a generalized linear model you will generally use the function glm:
    glm(formula, family=...(link=...),...)
    The glm function creates an object of class glm that contains most of the information you need. See the Value section of help(glm) for details.
  • For some data the iteratively reweighted least squares algorithm converges slowly and does not converge within the default number of iterations (25 in R). This may happen, for example, in binomial models with a lot of empty cells. R gives you a "Warning". Don't panic! You can increase the number of iterations with the parameter maxit:
    glm(formula,...,maxit=...)
  • To evaluate the fitted model at some new values of the predictors use the function predict:
    predict(glm.object, newdata=..., type=..., se.fit=TRUE)
    The output will contain, in particular, a vector of estimated responses (depending on type) and a vector of standard errors for constructing confidence intervals for the mean response.
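
The notes above can be put together in a short sketch. It is only an illustration under stated assumptions: the built-in mtcars data set, the model am ~ wt, the value maxit = 50 and the object names fit, new_cars and pred are chosen here for demonstration and are not course examples. Building the interval on the link scale and transforming it back is one common choice, not the only one.

```r
# Illustrative data: the built-in mtcars data set (an assumption of this sketch).
# Binary response am (transmission type) modelled by car weight wt.
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars,
           maxit = 50)   # raise the iteration limit, as described above

summary(fit)             # coefficients, standard errors, deviances
fit$converged            # TRUE if the IRLS algorithm converged
fit$iter                 # number of IRLS iterations actually used

# Hypothetical new predictor values at which to evaluate the fitted model
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Predictions on the link (logit) scale together with their standard errors
pred <- predict(fit, newdata = new_cars, type = "link", se.fit = TRUE)

# Approximate 95% confidence interval for the mean response:
# computed on the link scale, then mapped back with the inverse logit
z     <- qnorm(0.975)
prob  <- plogis(pred$fit)
lower <- plogis(pred$fit - z * pred$se.fit)
upper <- plogis(pred$fit + z * pred$se.fit)

cbind(new_cars, prob, lower, upper)
```

Using type = "response" instead returns the fitted probabilities and their standard errors directly, but an interval built as estimate ± z·se on that scale is not guaranteed to stay within [0, 1].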