Cars example

Data for fuel consumption in miles per gallon (mpg) and quarter-mile acceleration time in seconds (qmt) for 32 cars tested by the US Motor Trend magazine in 1974.

9 explanatory variables:

S – shape of engine (0 - V-shaped, 1 - straight)

C – number of cylinders

tt – transmission type (0 - automatic, 1 - manual)

G – number of gears

disp – engine displacement in cubic inches

hp – horsepower

cb – number of carburetor barrels

drat – final drive ratio

wt – weight of the car in pounds

Data on the original scale

Data on the log-scale

Full main effects linear model

## 
## Call:
## lm(formula = mpg ~ S + C + tt + G + disp + hp + cb + drat + wt, 
##     model = T)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8657 -1.5916 -0.3314  1.5457  5.2595 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 27.239321  10.333701   2.636   0.0151 *
## S1           1.137852   2.075795   0.548   0.5891  
## C           -0.332053   0.851167  -0.390   0.7002  
## tt1          1.911268   1.987005   0.962   0.3466  
## G            0.582256   1.464187   0.398   0.6947  
## disp         0.005984   0.016243   0.368   0.7161  
## hp          -0.023901   0.022053  -1.084   0.2902  
## cb          -0.499067   0.776557  -0.643   0.5271  
## drat         0.845847   1.591046   0.532   0.6003  
## wt          -0.002502   0.001579  -1.584   0.1274  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.668 on 22 degrees of freedom
## Multiple R-squared:  0.8609, Adjusted R-squared:  0.8041 
## F-statistic: 15.13 on 9 and 22 DF,  p-value: 1.683e-07

Every individual predictor appears non-significant, yet the overall \(p\)-value is \(1.683 \times 10^{-7}\)! The predictors are strongly correlated with one another.
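These 32 cars ship with R as the built-in `mtcars` data, so the correlation structure behind this apparent paradox can be checked directly (a minimal sketch; the `mtcars` column names, not the labels above, are used):

```r
# The same 32 Motor Trend cars are built into R as `mtcars`
data(mtcars)
# pairwise correlations among displacement, horsepower and weight
round(cor(mtcars[, c("disp", "hp", "wt")]), 2)
# disp and wt correlate around 0.89, disp and hp around 0.79:
# this inflates the standard errors of the individual coefficients
```

With such strong pairwise correlations, each predictor carries little information beyond the others, so individual \(t\)-tests are weak even when the joint \(F\)-test is overwhelming.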

Full log-main effects linear model

## 
## Call:
## lm(formula = mpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + 
##     lwt, model = T)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0309 -1.4183 -0.4539  1.1940  4.2639 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 106.23889   30.17546   3.521  0.00193 **
## S1            0.01556    1.78809   0.009  0.99314   
## C             0.13862    0.73040   0.190  0.85122   
## tt1           0.28994    1.84047   0.158  0.87626   
## G             1.02161    1.28165   0.797  0.43391   
## ldisp        -1.88990    3.14171  -0.602  0.55362   
## lhp          -3.85887    2.78532  -1.385  0.17981   
## cb           -0.53273    0.58263  -0.914  0.37045   
## ldrat        -0.25388    5.21057  -0.049  0.96158   
## lwt          -7.50259    3.98008  -1.885  0.07270 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.309 on 22 degrees of freedom
## Multiple R-squared:  0.8958, Adjusted R-squared:  0.8532 
## F-statistic: 21.01 on 9 and 22 DF,  p-value: 7.987e-09

Full main effects log-log linear model

## 
## Call:
## lm(formula = lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + 
##     lwt, model = T)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.184897 -0.056304 -0.009658  0.066993  0.208061 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.274026   1.492494   4.874 7.16e-05 ***
## S1          -0.028731   0.088440  -0.325   0.7484    
## C           -0.003128   0.036126  -0.087   0.9318    
## tt1         -0.046882   0.091031  -0.515   0.6117    
## G            0.079189   0.063391   1.249   0.2247    
## ldisp       -0.116304   0.155390  -0.748   0.4621    
## lhp         -0.159871   0.137763  -1.160   0.2583    
## cb          -0.030099   0.028817  -1.044   0.3076    
## ldrat       -0.040217   0.257718  -0.156   0.8774    
## lwt         -0.376975   0.196857  -1.915   0.0686 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1142 on 22 degrees of freedom
## Multiple R-squared:  0.8956, Adjusted R-squared:  0.8528 
## F-statistic: 20.96 on 9 and 22 DF,  p-value: 8.173e-09


Model Selection

All possible regressions model selection

\[ n~ \ln \left(\frac{RSS(M)}{n}\right) + \lambda |M| \rightarrow \min_M \]

  • \(\lambda_{AIC}=2\;\; (C_p = AIC + const)\)
  • \(\lambda_{BIC}=\ln n\;\; (\ln 32 = 3.47)\)
  • \(\lambda_{RIC}=2 \ln p\;\; (2 \ln 9=4.39)\)
      library(leaps)   # provides regsubsets()
      all.car<-regsubsets(lmpg ~ S + C + tt + G + ldisp + lhp + cb +
                ldrat + lwt, data=cars,method="exhaustive",nbest=1,nvmax=10)
      all.car.summary <- summary(all.car)
      rss=all.car.summary$rss
      r2=all.car.summary$rsq
      adjr2=all.car.summary$adjr2
      cp=all.car.summary$cp
      bic=all.car.summary$bic
      n=32                                # number of cars
      ric=n*log(rss/n)+2*log(9)*(2:10)    # RIC: lambda=2*log(p), |M|=2..10 (incl. intercept)
      print(round(cbind(all.car.summary$which[,-1],rss,r2,adjr2,cp,bic,ric),2))
##   S C tt G ldisp lhp cb ldrat lwt  rss   r2 adjr2    cp    bic     ric
## 1 0 0  0 0     1   0  0     0   0 0.49 0.82  0.81  9.77 -48.07 -124.76
## 2 0 0  0 0     0   1  0     0   1 0.32 0.88  0.87 -1.32 -58.22 -133.99
## 3 0 0  0 1     0   1  0     0   1 0.31 0.89  0.88 -0.26 -56.00 -130.83
## 4 0 0  0 1     0   1  1     0   1 0.30 0.89  0.88  0.77 -53.86 -127.76
## 5 0 0  0 1     1   1  1     0   1 0.29 0.89  0.87  2.32 -51.03 -124.01
## 6 0 0  1 1     1   1  1     0   1 0.29 0.89  0.87  4.14 -47.84 -119.88
## 7 1 0  1 1     1   1  1     0   1 0.29 0.90  0.86  6.03 -44.52 -115.64
## 8 1 0  1 1     1   1  1     1   1 0.29 0.90  0.86  8.01 -41.09 -111.28
## 9 1 1  1 1     1   1  1     1   1 0.29 0.90  0.85 10.00 -37.64 -106.90

AIC, BIC and RIC all select the same model, \(lmpg \sim lhp + lwt\), since the penalties \(\lambda_{AIC}=2\), \(\lambda_{BIC}=3.47\) and \(\lambda_{RIC}=4.39\) are quite close here; in general, however, the three criteria need not agree.
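The criterion is easy to evaluate directly. A minimal sketch on the built-in mtcars data (wt there is in 1000 lb, which only shifts the intercept; the mapping vs = S, am = tt, gear = G, cyl = C, carb = cb is an assumption of this sketch), comparing RIC for the selected and the full main-effects models:

```r
data(mtcars)
d <- within(mtcars, {
  lmpg <- log(mpg); lhp <- log(hp); lwt <- log(wt)
  ldisp <- log(disp); ldrat <- log(drat)
})
n <- nrow(d)                      # 32 cars
lambda.ric <- 2 * log(9)          # RIC penalty: 2*ln(p) with p = 9 candidate predictors
crit <- function(fit)             # n * ln(RSS(M)/n) + lambda * |M|, |M| incl. intercept
  n * log(sum(resid(fit)^2) / n) + lambda.ric * length(coef(fit))
small <- lm(lmpg ~ lhp + lwt, data = d)
full  <- lm(lmpg ~ vs + cyl + am + gear + ldisp + lhp + carb + ldrat + lwt,
            data = d)
c(small = crit(small), full = crit(full))  # about -134 vs -107: the small model wins
```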

Stepwise model selection (w.r.t. AIC)

Backward elimination

scale<-0
stepb.car <- step(llcar.lm, direction="backward", k=2,trace = F, scale = scale)
print(stepb.car$anova)
##      Step Df     Deviance Resid. Df Resid. Dev       AIC
## 1         NA           NA        22  0.2870540 -130.8423
## 2     - C  1 0.0000978017        23  0.2871518 -132.8314
## 3 - ldrat  1 0.0003045785        24  0.2874563 -134.7974
## 4     - S  1 0.0013817402        25  0.2888381 -136.6440
## 5    - tt  1 0.0024549162        26  0.2912930 -138.3732
## 6 - ldisp  1 0.0058514265        27  0.2971444 -139.7367
## 7    - cb  1 0.0125888363        28  0.3097333 -140.4090
## 8     - G  1 0.0122386222        29  0.3219719 -141.1689
print(coef(stepb.car))  
## (Intercept)         lhp         lwt 
##   8.7187582  -0.2553122  -0.5622804
print(anova(stepb.car,llcar.lm))
## Analysis of Variance Table
## 
## Model 1: lmpg ~ lhp + lwt
## Model 2: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     29 0.32197                           
## 2     22 0.28705  7  0.034918 0.3823 0.9028
  • backward elimination w.r.t. BIC and RIC yields the same model

  • backward elimination w.r.t. \(R^2_{CV}\) and \(CV\) selects the model with \(lhp\), \(lwt\) and an additional \(G\) (non-significant)
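The BIC variant only changes the penalty passed to step(): k = log(n) instead of the AIC default k = 2. A self-contained sketch on the built-in mtcars (with the assumed mapping vs = S, am = tt, gear = G, cyl = C, carb = cb), which ends at the same \(lmpg \sim lhp + lwt\) model:

```r
data(mtcars)
d <- within(mtcars, {
  lmpg <- log(mpg); lhp <- log(hp); lwt <- log(wt)
  ldisp <- log(disp); ldrat <- log(drat)
})
full <- lm(lmpg ~ vs + cyl + am + gear + ldisp + lhp + carb + ldrat + lwt,
           data = d)
# backward elimination with the BIC penalty k = log(n) instead of AIC's k = 2
stepb.bic <- step(full, direction = "backward", k = log(nrow(d)), trace = FALSE)
formula(stepb.bic)   # lmpg ~ lhp + lwt
```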

Forward selection

init.lm <- lm(lmpg ~ 1, model = T)
stepf.car <- step(init.lm, scope = list(upper = update(llcar.lm,  ~ . +
        lhp * lwt)), direction = "forward", k=2, trace = F, scale = scale)
print(stepf.car$anova)
##      Step Df   Deviance Resid. Df Resid. Dev        AIC
## 1         NA         NA        31  2.7487363  -76.54703
## 2 + ldisp -1 2.25596261        30  0.4927736 -129.55012
## 3   + lwt -1 0.09882090        29  0.3939527 -134.71233
## 4   + lhp -1 0.07861123        28  0.3153415 -139.83472
print(coef(stepf.car))  
## (Intercept)       ldisp         lwt         lhp 
##  8.25381778 -0.07788716 -0.47883641 -0.21300339
print(anova(stepb.car,stepf.car,llcar.lm))
## Analysis of Variance Table
## 
## Model 1: lmpg ~ lhp + lwt
## Model 2: lmpg ~ ldisp + lwt + lhp
## Model 3: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     29 0.32197                           
## 2     28 0.31534  1 0.0066304 0.5082 0.4834
## 3     22 0.28705  6 0.0282876 0.3613 0.8954

Stepwise

steps.car <- step(llcar.lm, scope = list(lower = init.lm, upper = ~ .^2),
        direction = "both", k=2, scale = scale, trace = F)
print(steps.car$anova)
##      Step Df     Deviance Resid. Df Resid. Dev       AIC
## 1         NA           NA        22  0.2870540 -130.8423
## 2     - C  1 0.0000978017        23  0.2871518 -132.8314
## 3 - ldrat  1 0.0003045785        24  0.2874563 -134.7974
## 4     - S  1 0.0013817402        25  0.2888381 -136.6440
## 5    - tt  1 0.0024549162        26  0.2912930 -138.3732
## 6 - ldisp  1 0.0058514265        27  0.2971444 -139.7367
## 7    - cb  1 0.0125888363        28  0.3097333 -140.4090
## 8     - G  1 0.0122386222        29  0.3219719 -141.1689

Check for interaction:

## Analysis of Variance Table
## 
## Model 1: lmpg ~ lhp + lwt + lhp:lwt
## Model 2: lmpg ~ lhp + lwt
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     28 0.31052                           
## 2     29 0.32197 -1 -0.011449 1.0324 0.3183
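This interaction check is a partial F-test and can be reproduced directly on the built-in mtcars data (a sketch; the scale of wt only affects the intercept, so the F-statistic is unchanged):

```r
data(mtcars)
lmpg <- log(mtcars$mpg); lhp <- log(mtcars$hp); lwt <- log(mtcars$lwt <- mtcars$wt)
lwt <- log(mtcars$wt)
with.int <- lm(lmpg ~ lhp + lwt + lhp:lwt)   # model with the interaction
no.int   <- lm(lmpg ~ lhp + lwt)             # reduced model
anova(with.int, no.int)   # F about 1.03, p about 0.32: the interaction is not needed
```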

Comparing with the full main effects log-log model:

## Analysis of Variance Table
## 
## Model 1: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
## Model 2: lmpg ~ lhp + lwt
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     22 0.28705                           
## 2     29 0.32197 -7 -0.034918 0.3823 0.9028

Selected model \(lmpg \sim lhp + lwt\)

## 
## Call:
## lm(formula = lmpg ~ lhp + lwt, model = T)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.201439 -0.079566  0.002144  0.078778  0.196144 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.71876    0.53056  16.433 3.12e-16 ***
## lhp         -0.25531    0.05840  -4.372 0.000145 ***
## lwt         -0.56228    0.08741  -6.433 4.89e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1054 on 29 degrees of freedom
## Multiple R-squared:  0.8829, Adjusted R-squared:  0.8748 
## F-statistic: 109.3 on 2 and 29 DF,  p-value: 3.133e-14
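On the original scale the selected log-log model is a power law. Exponentiating the fit (with \(wt\) in pounds, as above) gives approximately

\[ \widehat{mpg} = e^{8.719}\, hp^{-0.255}\, wt^{-0.562}, \]

so, for example, a 10% increase in weight lowers fuel efficiency by about \(1 - 1.1^{-0.562} \approx 5\%\), other things being equal.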

All possible regressions model selection with influential observation No. 29 omitted

##   S C tt G ldisp lhp cb ldrat lwt  rss   r2 adjr2    cp    bic     ric
## 1 0 0  0 0     1   0  0     0   0 0.49 0.82  0.81 10.35 -47.91 -124.65
## 2 0 0  0 0     0   1  0     0   1 0.32 0.88  0.87 -0.70 -57.75 -133.35
## 3 0 0  0 1     0   1  0     0   1 0.31 0.89  0.87  0.21 -55.69 -130.15
## 4 0 0  0 1     0   1  1     0   1 0.29 0.89  0.88  0.43 -54.66 -127.99
## 5 0 0  0 1     1   1  1     0   1 0.28 0.90  0.88  2.24 -51.48 -123.66
## 6 0 0  1 1     1   1  1     0   1 0.28 0.90  0.87  4.14 -48.16 -119.20
## 7 0 1  1 1     1   1  1     0   1 0.28 0.90  0.87  6.07 -44.79 -114.69
## 8 0 1  1 1     1   1  1     1   1 0.28 0.90  0.86  8.00 -41.43 -110.19
## 9 1 1  1 1     1   1  1     1   1 0.28 0.90  0.85 10.00 -37.96 -105.59
## 
## Call:
## lm(formula = lmpg ~ lhp + lwt, subset = (weights > 0), model = T)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.201459 -0.081833  0.002807  0.079750  0.196030 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.72082    0.54689  15.946 1.40e-15 ***
## lhp         -0.25475    0.06400  -3.980 0.000443 ***
## lwt         -0.56287    0.09235  -6.095 1.42e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1072 on 28 degrees of freedom
## Multiple R-squared:  0.8811, Adjusted R-squared:  0.8726 
## F-statistic: 103.8 on 2 and 28 DF,  p-value: 1.125e-13

Lasso

library(glmnet)
par(mfrow=c(1,2),pty="s") 
grid <- seq(0,.5,.01)
# model.matrix() keeps the intercept column and glmnet adds its own,
# hence the duplicated '(Intercept)' row in the coefficients below
lcar.lasso0 <- glmnet(model.matrix(llcar.lm),lmpg,alpha=1,lambda=grid)
plot(lcar.lasso0,xvar=c("lambda"),label=TRUE)
lcar.lasso.cv <- cv.glmnet(model.matrix(llcar.lm),lmpg,alpha=1,nfolds=n)  # nfolds = n = 32: LOO
plot(lcar.lasso.cv)

lambda.cv <- lcar.lasso.cv$lambda.min
cat("lambda.cv =", lambda.cv)
## lambda.cv = 0.00403561
lcar.lasso <- glmnet(model.matrix(llcar.lm)  ,lmpg,alpha=1,lambda=lambda.cv)
print(round(predict(lcar.lasso,s=lambda.cv,type="coefficients"),5))
## 11 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  7.16696
## (Intercept)  .      
## S1           .      
## C           -0.00263
## tt1          .      
## G            0.04739
## ldisp       -0.08362
## lhp         -0.16806
## cb          -0.02274
## ldrat        .      
## lwt         -0.37882
## RSS = 0.2932 CV = 0.01385

Principal Component Regression (PCR)

Warning: there are categorical predictors

par(mfrow=c(1,2),pty="s")
library(pls)
lcar.pcr <- pcr(lmpg~S + C + tt + G + ldisp + lhp + cb + ldrat + lwt,scale=T,validation="LOO")
print(lcar.pcr$projection)
##           Comp 1      Comp 2       Comp 3      Comp 4      Comp 5      Comp 6
## S1     0.3166273 -0.30315646  0.605388711 -0.26992448  0.22975193 -0.35255225
## C     -0.3912619  0.14209302 -0.213791275  0.14626963  0.08183272 -0.36625123
## tt1    0.2863730  0.42793351 -0.407183105 -0.23809944 -0.09496297  0.02359051
## G      0.2529018  0.50344461  0.204970864 -0.35634707  0.27237791  0.42176496
## ldisp -0.4078351  0.01180518  0.001932914 -0.02438190  0.39554097  0.26004983
## lhp   -0.3629158  0.27883800  0.056588078 -0.20693741  0.52703665 -0.34189282
## cb    -0.2004069  0.54295449  0.467777646  0.04204467 -0.53795773 -0.27020756
## ldrat  0.3373532  0.27320098  0.141471649  0.81255710  0.34593748 -0.01232466
## lwt   -0.3855969 -0.06319670  0.371416967  0.12898459 -0.11839857  0.55267000
##            Comp 7      Comp 8       Comp 9
## S1    -0.30222586  0.29227356  0.121017574
## C      0.09073203  0.77572345 -0.093842587
## tt1   -0.69682301  0.11005092  0.071596113
## G      0.42474656  0.26813529 -0.095949687
## ldisp -0.12132562 -0.01426167  0.770669253
## lhp   -0.13098775 -0.42798734 -0.386687082
## cb     0.03436946 -0.12339787  0.256193034
## ldrat -0.10412808 -0.02309961  0.009481688
## lwt   -0.43652454  0.17229152 -0.391194581
print(summary(lcar.pcr))
## Data:    X dimension: 32 9 
##  Y dimension: 32 1
## Fit method: svdpc
## Number of components considered: 9
## 
## VALIDATION: RMSEP
## Cross-validated using 32 leave-one-out segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV          0.3025   0.1235   0.1215   0.1156   0.1180   0.1200   0.1256
## adjCV       0.3025   0.1233   0.1213   0.1154   0.1177   0.1198   0.1262
##        7 comps  8 comps  9 comps
## CV      0.1165   0.1235   0.1323
## adjCV   0.1162   0.1230   0.1317
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       63.02    85.75    91.23    94.11    95.92    97.48    98.88    99.64
## lmpg    84.72    85.60    87.50    87.58    87.62    87.65    89.51    89.56
##       9 comps
## X      100.00
## lmpg    89.56

% variance explained for \(X\): \(\frac{\sum_{j=1}^k \lambda_{(j)}}{\sum_{j=1}^d \lambda_j}\), where \(\lambda_{(1)} \geq \ldots \geq \lambda_{(d)}\) are the (ordered) eigenvalues of \(X^t X\)

% variance explained for \(lmpg\) – the percentage of \(Var(lmpg)\) explained by regression (\(R^2\))

## 3 components:  RSS = 0.34364 CV = 0.01337
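The PCR fit can be reproduced with base R alone: principal components of the standardized design, then OLS on the first \(k\) scores (a sketch on the built-in mtcars, with the assumed mapping vs = S, am = tt, gear = G, cyl = C, carb = cb):

```r
data(mtcars)
d <- within(mtcars, {
  lmpg <- log(mpg); lhp <- log(hp); lwt <- log(wt)
  ldisp <- log(disp); ldrat <- log(drat)
})
X  <- d[, c("vs", "cyl", "am", "gear", "ldisp", "lhp", "carb", "ldrat", "lwt")]
pc <- prcomp(X, scale. = TRUE)      # PCs of the standardized predictors
pcr3 <- lm(d$lmpg ~ pc$x[, 1:3])    # regress lmpg on the first 3 component scores
summary(pcr3)$r.squared             # about 0.875, matching the table above
```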

Partial Least Squares (PLS)

Warning: there are categorical predictors

par(mfrow=c(1,2),pty="s") 
lcar.plsr <- plsr(lmpg~S + C + tt + G + ldisp + lhp + cb +ldrat + lwt,scale=T, validation="LOO")
print(lcar.plsr$projection)
##           Comp 1     Comp 2     Comp 3      Comp 4       Comp 5      Comp 6
## S1     0.3035181 -0.2771319 -0.8026121 -0.08861918 -0.223664641  0.36588940
## C     -0.3856215  0.1173799  0.3676101  0.16924263  0.437639763  0.68402837
## tt1    0.2641717 -0.2235442  0.1010636 -0.74563223 -0.400270009  0.14402826
## G      0.2226716 -0.3000313  0.2490315  0.48377486 -0.031443983  0.42416012
## ldisp -0.4120024 -0.0767850 -0.1422403 -0.20994859  0.028785864  0.23732848
## lhp   -0.3847449 -0.2681456 -0.0936826 -0.18944497 -0.313661459 -0.29480380
## cb    -0.2504997 -0.6102716 -0.2216103  0.21758542 -0.097868442 -0.30809423
## ldrat  0.3043815 -0.4202635 -0.3264218 -0.27472449  0.724045780 -0.01612839
## lwt   -0.4082041 -0.3824312 -0.7302115 -0.43399006  0.003465911  0.16399068
##           Comp 7      Comp 8      Comp 9
## S1     0.1612950  0.27959865 -0.14348083
## C      0.5052341  0.03791378 -0.40918151
## tt1    0.2787113  0.06191893  0.05071435
## G     -0.2335079 -0.24137974  0.08674533
## ldisp -0.1896020  0.65998337  0.96401193
## lhp   -0.6777704 -0.13071084 -0.65520864
## cb     0.4238469  0.27517154  0.22456288
## ldrat -0.2068165  0.06080840 -0.04318831
## lwt    0.1108419 -0.58604931 -0.16550314
print(summary(lcar.plsr))
## Data:    X dimension: 32 9 
##  Y dimension: 32 1
## Fit method: kernelpls
## Number of components considered: 9
## 
## VALIDATION: RMSEP
## Cross-validated using 32 leave-one-out segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV          0.3025   0.1207   0.1188   0.1179   0.1230   0.1256   0.1300
## adjCV       0.3025   0.1206   0.1186   0.1176   0.1225   0.1251   0.1294
##        7 comps  8 comps  9 comps
## CV      0.1320   0.1323   0.1323
## adjCV   0.1314   0.1317   0.1317
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       62.95    81.90    90.74    92.72    95.41    96.51    98.05    99.00
## lmpg    85.65    87.28    88.73    89.54    89.55    89.56    89.56    89.56
##       9 comps
## X      100.00
## lmpg    89.56

% variance explained for \(X\): \(\frac{\sum_{j=1}^k A_{(jj)}}{\sum_{j=1}^d A_{jj}}\), where \(A=W^t X^t X W\) is diagonal

% variance explained for \(lmpg\) – the percentage of \(Var(lmpg)\) explained by regression (\(R^2\))

## 3 components:  RSS = 0.30988 CV = 0.01389

Summary

Model                      Size     RSS      CV
main effects log-log         9    0.2871   0.0171
lwt+lhp (AIC, BIC, RIC)      2    0.3220   0.0125
Lasso                        6    0.2932   0.0139
PCR                          3    0.3436   0.0134
PLS                          3    0.3099   0.0139

Multicollinearity

## Correlation matrix
##            ldisp        lwt        lhp      ldrat
## ldisp  1.0000000  0.8992112  0.8617723 -0.7550233
## lwt    0.8992112  1.0000000  0.7314468 -0.7223649
## lhp    0.8617723  0.7314468  1.0000000 -0.5455394
## ldrat -0.7550233 -0.7223649 -0.5455394  1.0000000
## 
## Squared multiple correlation coefficients (each predictor on the other three):
##     ldisp       lwt       lhp     ldrat 
## 0.9182041 0.8176276 0.7741215 0.6165815
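These multiple correlations translate into variance inflation factors, \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the remaining ones. A base-R sketch for the four log predictors on the built-in mtcars data (the scale of wt does not affect \(R_j^2\)):

```r
data(mtcars)
ld <- data.frame(ldisp = log(mtcars$disp), lwt = log(mtcars$wt),
                 lhp = log(mtcars$hp), ldrat = log(mtcars$drat))
vif <- sapply(names(ld), function(v) {
  # R^2 of predictor v regressed on the other three
  r2 <- summary(lm(reformulate(setdiff(names(ld), v), v), data = ld))$r.squared
  1 / (1 - r2)                    # VIF_j = 1/(1 - R_j^2)
})
round(vif, 1)                     # ldisp's VIF is well above the common cutoff of 5
```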