Linear Models

(0365.4004)

Lecturer	Prof. Felix Abramovich (felix@tauex.tau.ac.il)
Lecture Hours	Monday 13-16, Schreiber 007

Purpose

Regression analysis plays a central role in statistics, being one of its most powerful and commonly used techniques. It deals with finding appropriate models to represent relationships between a response variable and a set of explanatory variables, based on data collected from a series of experiments. These models are used both to represent existing data and to predict new observations. The basic regression models are linear. Although they are the simplest and (hence) best-studied models, they nevertheless work well in numerous problems. Sometimes a non-linear model can be transformed into a linear one by suitable transformations of variables; in other cases, linearization of complex non-linear models may be used. In this course we'll try to understand how linear models work and when they can be used efficiently. In the era of "Big Data" we will pay special attention to linear regression techniques for high-dimensional data, where model selection becomes essential.

Topics:

Introduction
- regression models
- linear regression models, examples of linear regression models
Least Squares Estimates
- derivation of LSE for regression coefficients
- statistical properties of LSE
- Gauss-Markov theorem
- geometrical interpretation of LSE
- multiple correlation coefficient
Statistical Inference
- maximum likelihood estimators for normal models
- confidence intervals and confidence regions for regression coefficients
- hypothesis testing: t-test, likelihood ratio test (F-test)
Model Criticism
- analysis of residuals
- influential observations, Cook's distance
- the Box-Cox transformation family
Prediction and Forecasting
Model Selection
- criteria for model selection: correlation coefficient, penalized least squares, cross-validation
- model selection and dimensionality reduction in high dimensions: stepwise procedures, lasso, principal component regression, partial least squares
Some Special Topics
- ridge regression
- polynomial regression, orthogonal polynomials
- piecewise-polynomial regression, splines
Generalized Least Squares
- motivation, derivation of generalized LSE for regression coefficients
- some special cases: unequal variances, repeated measurements, hierarchical models
Random and mixed effects models
- ANOVA models with fixed and random effects
- variance component (mixed effects) models
Robust Regression
- robustness and resistance
- M-estimators
- robust regression
Nonlinear Regression
- least squares estimation, the Gauss-Newton method
- statistical inference
Nonparametric Regression (if time permits)

Literature (in alphabetical order)

Faraway, J.J. Linear Models with R.
Rao, C.R. and Toutenburg, H. Linear Models. Least Squares and Alternatives.
Ryan, T.P. Modern Regression Methods.
Seber, G.A. Linear Regression Analysis.
Sen, A. and Srivastava, M. Regression Analysis: Theory, Methods and Applications.
and many more

Example files:

Cats (Cats.dat)
Trees (Trees.dat)
Cars (Cars.dat)
Weight loss (wtloss data from the library MASS)
Sales (Sales.dat)
Drugs (Drugs.dat)
Orthodont (Orthodont.dat)
M-estimators
Phones (phone data from the library MASS)
Rumford (Rumford.dat)
Kernel
Mcycle (mcycle data from the library MASS)

Homework

Homework exercises are an integral part of the course, and the average homework grade is 10% of the final grade.

I strongly encourage you to submit homework using R Markdown. R Markdown produces PDF, HTML, and Word files and slides that may include text, R code, the corresponding code output, mathematical formulas, etc. When you click the Knit button in RStudio, a document is generated that includes both the content (text, formulas, etc.) and the output of any embedded R code (plots, tables, etc.). For more details on using R Markdown see, for example, the R Markdown reference guide or R Markdown: The Definitive Guide. Using RStudio, install the R packages rmarkdown and knitr. Also, in order to include mathematics in your documents, you need to install LaTeX/MiKTeX on your computer.
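
The setup described above can also be done from the R console. The lines below are a minimal sketch, not an official course procedure. They assume that the package in question is rmarkdown (the CRAN package that RStudio's Knit button relies on) and that your homework is saved in a file called exercise1.Rmd, which is a hypothetical name.

```r
# Assumption: "rmarkdown" is the package used by RStudio's Knit button;
# "knitr" executes the R code chunks embedded in the document.
install.packages(c("rmarkdown", "knitr"))

# "exercise1.Rmd" is a hypothetical file name; replace it with your own file.
# Rendering to PDF additionally requires a LaTeX distribution (e.g. MiKTeX).
rmarkdown::render("exercise1.Rmd", output_format = "pdf_document")
```

Calling rmarkdown::render() should give the same result as clicking Knit; changing output_format to "html_document" or "word_document" produces HTML or Word output without needing LaTeX.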

Homework Exercises:

Exercise 1 (7 March)
Exercise 2 (21 March)
Exercise 3 (4 April)
Exercise 4 (9 May)
Exercise 5 (23 May)

Computing:

The course assumes extensive use of a computer. There are no limitations on using various statistical packages and software for this course, although the data examples considered in class will be "R-oriented". Installation instructions and manuals for R can be found on the R Home page. The following R-based books may be helpful for this course:

Aitkin, M., Francis, B., Hinde, J. and Darnell, R. Statistical Modelling in R.
Faraway, J.J. Linear Models with R.
James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning with Applications in R.
Venables, W.N. and Ripley, B.D. Modern Applied Statistics with S.