Data on fuel consumption in miles per gallon (mpg) and quarter-mile acceleration time in seconds (qmt) for 32 cars tested by the US magazine Motor Trend in 1974.
9 explanatory variables:
S – shape of engine (0 - vee, 1 - straight)
C – number of cylinders
tt – transmission type (0 - automatic, 1 - manual)
G – number of gears
disp – engine displacement in cubic inches
hp – horsepower
cb – number of carburetor barrels
drat – final drive ratio
wt – weight of the car in pounds
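These data correspond to R's built-in `mtcars` set. A minimal sketch of how a `cars` data frame matching the variable list above could be assembled — the renaming and the conversion of weight to pounds are assumptions based on the descriptions, not taken from the original analysis:

```r
# Build a `cars` data frame from R's built-in mtcars
# (assumed source; renaming follows the variable list above)
cars <- with(mtcars, data.frame(
  mpg  = mpg,            # miles per gallon
  qmt  = qsec,           # quarter-mile time in seconds
  S    = factor(vs),     # engine shape: 0 = vee, 1 = straight
  C    = cyl,            # number of cylinders
  tt   = factor(am),     # transmission: 0 = automatic, 1 = manual
  G    = gear,           # number of forward gears
  disp = disp,           # displacement in cubic inches
  hp   = hp,             # gross horsepower
  cb   = carb,           # number of carburetor barrels
  drat = drat,           # final drive ratio
  wt   = 1000 * wt       # weight in pounds (mtcars stores 1000s of lbs)
))
str(cars)
```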
##
## Call:
## lm(formula = mpg ~ S + C + tt + G + disp + hp + cb + drat + wt,
## model = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8657 -1.5916 -0.3314 1.5457 5.2595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.239321 10.333701 2.636 0.0151 *
## S1 1.137852 2.075795 0.548 0.5891
## C -0.332053 0.851167 -0.390 0.7002
## tt1 1.911268 1.987005 0.962 0.3466
## G 0.582256 1.464187 0.398 0.6947
## disp 0.005984 0.016243 0.368 0.7161
## hp -0.023901 0.022053 -1.084 0.2902
## cb -0.499067 0.776557 -0.643 0.5271
## drat 0.845847 1.591046 0.532 0.6003
## wt -0.002502 0.001579 -1.584 0.1274
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.668 on 22 degrees of freedom
## Multiple R-squared: 0.8609, Adjusted R-squared: 0.8041
## F-statistic: 15.13 on 9 and 22 DF, p-value: 1.683e-07
It seems as if all predictors are non-significant, yet the overall \(p\)-value is \(1.683 \times 10^{-7}\)! This apparent contradiction is a classical symptom of strong correlations (multicollinearity) among the predictors.
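The degree of multicollinearity can be quantified by variance inflation factors, \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) comes from regressing the \(j\)-th predictor on the others; these are the diagonal entries of the inverse correlation matrix. A minimal sketch, assuming the data are R's `mtcars` on the original (untransformed) scales:

```r
# VIFs from the inverse correlation matrix of the nine predictors:
# diag(solve(R)) gives VIF_j = 1 / (1 - R_j^2) for each column
X   <- mtcars[, c("vs", "cyl", "am", "gear", "disp", "hp", "carb", "drat", "wt")]
vif <- diag(solve(cor(X)))
print(round(sort(vif, decreasing = TRUE), 2))
```

Values well above, say, 5 confirm that several predictors are nearly linear combinations of the others.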
##
## Call:
## lm(formula = mpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat +
## lwt, model = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0309 -1.4183 -0.4539 1.1940 4.2639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 106.23889 30.17546 3.521 0.00193 **
## S1 0.01556 1.78809 0.009 0.99314
## C 0.13862 0.73040 0.190 0.85122
## tt1 0.28994 1.84047 0.158 0.87626
## G 1.02161 1.28165 0.797 0.43391
## ldisp -1.88990 3.14171 -0.602 0.55362
## lhp -3.85887 2.78532 -1.385 0.17981
## cb -0.53273 0.58263 -0.914 0.37045
## ldrat -0.25388 5.21057 -0.049 0.96158
## lwt -7.50259 3.98008 -1.885 0.07270 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.309 on 22 degrees of freedom
## Multiple R-squared: 0.8958, Adjusted R-squared: 0.8532
## F-statistic: 21.01 on 9 and 22 DF, p-value: 7.987e-09
##
## Call:
## lm(formula = lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat +
## lwt, model = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.184897 -0.056304 -0.009658 0.066993 0.208061
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.274026 1.492494 4.874 7.16e-05 ***
## S1 -0.028731 0.088440 -0.325 0.7484
## C -0.003128 0.036126 -0.087 0.9318
## tt1 -0.046882 0.091031 -0.515 0.6117
## G 0.079189 0.063391 1.249 0.2247
## ldisp -0.116304 0.155390 -0.748 0.4621
## lhp -0.159871 0.137763 -1.160 0.2583
## cb -0.030099 0.028817 -1.044 0.3076
## ldrat -0.040217 0.257718 -0.156 0.8774
## lwt -0.376975 0.196857 -1.915 0.0686 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1142 on 22 degrees of freedom
## Multiple R-squared: 0.8956, Adjusted R-squared: 0.8528
## F-statistic: 20.96 on 9 and 22 DF, p-value: 8.173e-09
All the selection criteria considered (AIC, BIC, RIC) are of the penalized form
\[ n \ln \left(\frac{RSS(M)}{n}\right) + \lambda |M| \rightarrow \min_M \]
library(leaps)    # regsubsets() lives here
n <- nrow(cars)   # n = 32
all.car <- regsubsets(lmpg ~ S + C + tt + G + ldisp + lhp + cb +
                      ldrat + lwt, data = cars, method = "exhaustive",
                      nbest = 1, nvmax = 10)
all.car.summary <- summary(all.car)
rss   <- all.car.summary$rss
r2    <- all.car.summary$rsq
adjr2 <- all.car.summary$adjr2
cp    <- all.car.summary$cp
bic   <- all.car.summary$bic
ric   <- n * log(rss / n) + 2 * log(9) * (2:10)   # RIC penalty: lambda = 2 log p, p = 9
print(round(cbind(all.car.summary$which[, -1], rss, r2, adjr2, cp, bic, ric), 2))
## S C tt G ldisp lhp cb ldrat lwt rss r2 adjr2 cp bic ric
## 1 0 0 0 0 1 0 0 0 0 0.49 0.82 0.81 9.77 -48.07 -124.76
## 2 0 0 0 0 0 1 0 0 1 0.32 0.88 0.87 -1.32 -58.22 -133.99
## 3 0 0 0 1 0 1 0 0 1 0.31 0.89 0.88 -0.26 -56.00 -130.83
## 4 0 0 0 1 0 1 1 0 1 0.30 0.89 0.88 0.77 -53.86 -127.76
## 5 0 0 0 1 1 1 1 0 1 0.29 0.89 0.87 2.32 -51.03 -124.01
## 6 0 0 1 1 1 1 1 0 1 0.29 0.89 0.87 4.14 -47.84 -119.88
## 7 1 0 1 1 1 1 1 0 1 0.29 0.90 0.86 6.03 -44.52 -115.64
## 8 1 0 1 1 1 1 1 1 1 0.29 0.90 0.86 8.01 -41.09 -111.28
## 9 1 1 1 1 1 1 1 1 1 0.29 0.90 0.85 10.00 -37.64 -106.90
AIC, BIC and RIC all yield the same model \(lmpg \sim lhp + lwt\), since \(\lambda_{AIC}=2\), \(\lambda_{BIC}=\ln 32 \approx 3.47\) and \(\lambda_{RIC}=2\ln 9 \approx 4.39\) are quite close here; typically, however, this is not the case.
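The agreement can be verified directly from the `rss` column of the table above: compute \(n\ln(RSS(M)/n)+\lambda|M|\) for each model size and each \(\lambda\), and locate the minimum. A sketch using the (rounded) RSS values printed above:

```r
n   <- 32
rss <- c(0.49, 0.32, 0.31, 0.30, 0.29, 0.29, 0.29, 0.29, 0.29)  # from the table
M   <- 2:10                                 # model size, intercept included
lambda <- c(AIC = 2, BIC = log(n), RIC = 2 * log(9))
crit <- sapply(lambda, function(l) n * log(rss / n) + l * M)
apply(crit, 2, which.min)   # all three criteria pick the 2-predictor model
```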
scale <- 0   # scale = 0: step() computes AIC from the residual deviance (default for lm)
stepb.car <- step(llcar.lm, direction = "backward", k = 2, trace = FALSE, scale = scale)
print(stepb.car$anova)
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 22 0.2870540 -130.8423
## 2 - C 1 0.0000978017 23 0.2871518 -132.8314
## 3 - ldrat 1 0.0003045785 24 0.2874563 -134.7974
## 4 - S 1 0.0013817402 25 0.2888381 -136.6440
## 5 - tt 1 0.0024549162 26 0.2912930 -138.3732
## 6 - ldisp 1 0.0058514265 27 0.2971444 -139.7367
## 7 - cb 1 0.0125888363 28 0.3097333 -140.4090
## 8 - G 1 0.0122386222 29 0.3219719 -141.1689
print(coef(stepb.car))
## (Intercept) lhp lwt
## 8.7187582 -0.2553122 -0.5622804
print(anova(stepb.car,llcar.lm))
## Analysis of Variance Table
##
## Model 1: lmpg ~ lhp + lwt
## Model 2: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 0.32197
## 2 22 0.28705 7 0.034918 0.3823 0.9028
Backward elimination w.r.t. BIC and RIC yields the same model.
Backward elimination w.r.t. \(R^2_{CV}\) and \(CV\) selects the model with \(lhp\), \(lwt\) and an additional (non-significant) \(G\).
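The CV score used here is leave-one-out, which for a linear model requires no refitting thanks to the PRESS identity \(e_{(i)} = e_i/(1-h_{ii})\). A sketch for the selected model, assuming the `mtcars` data (the weight unit only shifts the intercept under the log transform, so the residuals are unaffected):

```r
# Leave-one-out CV via the PRESS shortcut: no n refits needed
fit   <- lm(log(mpg) ~ log(hp) + log(wt), data = mtcars)
h     <- hatvalues(fit)                  # leverages h_ii
press <- residuals(fit) / (1 - h)        # leave-one-out residuals
cv    <- mean(press^2)                   # LOO cross-validation score
round(cv, 4)
```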
init.lm <- lm(lmpg ~ 1, model = T)   # intercept-only model
stepf.car <- step(init.lm, scope = list(upper = update(llcar.lm, ~ . +
lhp * lwt)), direction = "forward", k=2, trace = F, scale = scale)
print(stepf.car$anova)
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 31 2.7487363 -76.54703
## 2 + ldisp -1 2.25596261 30 0.4927736 -129.55012
## 3 + lwt -1 0.09882090 29 0.3939527 -134.71233
## 4 + lhp -1 0.07861123 28 0.3153415 -139.83472
print(coef(stepf.car))
## (Intercept) ldisp lwt lhp
## 8.25381778 -0.07788716 -0.47883641 -0.21300339
print(anova(stepb.car,stepf.car,llcar.lm))
## Analysis of Variance Table
##
## Model 1: lmpg ~ lhp + lwt
## Model 2: lmpg ~ ldisp + lwt + lhp
## Model 3: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 0.32197
## 2 28 0.31534 1 0.0066304 0.5082 0.4834
## 3 22 0.28705 6 0.0282876 0.3613 0.8954
steps.car <- step(llcar.lm, scope = list(lower = init.lm, upper = ~ .^2),
                  direction = "both", k = 2, scale = scale, trace = FALSE)
print(steps.car$anova)
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 22 0.2870540 -130.8423
## 2 - C 1 0.0000978017 23 0.2871518 -132.8314
## 3 - ldrat 1 0.0003045785 24 0.2874563 -134.7974
## 4 - S 1 0.0013817402 25 0.2888381 -136.6440
## 5 - tt 1 0.0024549162 26 0.2912930 -138.3732
## 6 - ldisp 1 0.0058514265 27 0.2971444 -139.7367
## 7 - cb 1 0.0125888363 28 0.3097333 -140.4090
## 8 - G 1 0.0122386222 29 0.3219719 -141.1689
Check for interaction:
## Analysis of Variance Table
##
## Model 1: lmpg ~ lhp + lwt + lhp:lwt
## Model 2: lmpg ~ lhp + lwt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 0.31052
## 2 29 0.32197 -1 -0.011449 1.0324 0.3183
Comparing with the full main effects log-log model:
## Analysis of Variance Table
##
## Model 1: lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt
## Model 2: lmpg ~ lhp + lwt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 22 0.28705
## 2 29 0.32197 -7 -0.034918 0.3823 0.9028
##
## Call:
## lm(formula = lmpg ~ lhp + lwt, model = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.201439 -0.079566 0.002144 0.078778 0.196144
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.71876 0.53056 16.433 3.12e-16 ***
## lhp -0.25531 0.05840 -4.372 0.000145 ***
## lwt -0.56228 0.08741 -6.433 4.89e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1054 on 29 degrees of freedom
## Multiple R-squared: 0.8829, Adjusted R-squared: 0.8748
## F-statistic: 109.3 on 2 and 29 DF, p-value: 3.133e-14
## S C tt G ldisp lhp cb ldrat lwt rss r2 adjr2 cp bic ric
## 1 0 0 0 0 1 0 0 0 0 0.49 0.82 0.81 10.35 -47.91 -124.65
## 2 0 0 0 0 0 1 0 0 1 0.32 0.88 0.87 -0.70 -57.75 -133.35
## 3 0 0 0 1 0 1 0 0 1 0.31 0.89 0.87 0.21 -55.69 -130.15
## 4 0 0 0 1 0 1 1 0 1 0.29 0.89 0.88 0.43 -54.66 -127.99
## 5 0 0 0 1 1 1 1 0 1 0.28 0.90 0.88 2.24 -51.48 -123.66
## 6 0 0 1 1 1 1 1 0 1 0.28 0.90 0.87 4.14 -48.16 -119.20
## 7 0 1 1 1 1 1 1 0 1 0.28 0.90 0.87 6.07 -44.79 -114.69
## 8 0 1 1 1 1 1 1 1 1 0.28 0.90 0.86 8.00 -41.43 -110.19
## 9 1 1 1 1 1 1 1 1 1 0.28 0.90 0.85 10.00 -37.96 -105.59
##
## Call:
## lm(formula = lmpg ~ lhp + lwt, subset = (weights > 0), model = T)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.201459 -0.081833 0.002807 0.079750 0.196030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.72082 0.54689 15.946 1.40e-15 ***
## lhp -0.25475 0.06400 -3.980 0.000443 ***
## lwt -0.56287 0.09235 -6.095 1.42e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1072 on 28 degrees of freedom
## Multiple R-squared: 0.8811, Adjusted R-squared: 0.8726
## F-statistic: 103.8 on 2 and 28 DF, p-value: 1.125e-13
library(glmnet)
par(mfrow=c(1,2),pty="s")
grid <- seq(0, .5, .01)
# model.matrix() keeps the intercept column, which is why "(Intercept)"
# appears twice in the coefficient printout below
lcar.lasso0 <- glmnet(model.matrix(llcar.lm), lmpg, alpha = 1, lambda = grid)
plot(lcar.lasso0, xvar = "lambda", label = TRUE)
lcar.lasso.cv <- cv.glmnet(model.matrix(llcar.lm), lmpg, alpha = 1, nfolds = n)  # nfolds = n: LOO
plot(lcar.lasso.cv)
lambda.cv <- lcar.lasso.cv$lambda.min
cat("lambda.cv =", lambda.cv)
## lambda.cv = 0.00403561
lcar.lasso <- glmnet(model.matrix(llcar.lm) ,lmpg,alpha=1,lambda=lambda.cv)
print(round(predict(lcar.lasso,s=lambda.cv,type="coefficients"),5))
## 11 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 7.16696
## (Intercept) .
## S1 .
## C -0.00263
## tt1 .
## G 0.04739
## ldisp -0.08362
## lhp -0.16806
## cb -0.02274
## ldrat .
## lwt -0.37882
## RSS = 0.2932 CV = 0.01385
par(mfrow=c(1,2),pty="s")
library(pls)   # pcr() and plsr()
lcar.pcr <- pcr(lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt,
                scale = TRUE, validation = "LOO")
print(lcar.pcr$projection)
## Comp 1 Comp 2 Comp 3 Comp 4 Comp 5 Comp 6
## S1 0.3166273 -0.30315646 0.605388711 -0.26992448 0.22975193 -0.35255225
## C -0.3912619 0.14209302 -0.213791275 0.14626963 0.08183272 -0.36625123
## tt1 0.2863730 0.42793351 -0.407183105 -0.23809944 -0.09496297 0.02359051
## G 0.2529018 0.50344461 0.204970864 -0.35634707 0.27237791 0.42176496
## ldisp -0.4078351 0.01180518 0.001932914 -0.02438190 0.39554097 0.26004983
## lhp -0.3629158 0.27883800 0.056588078 -0.20693741 0.52703665 -0.34189282
## cb -0.2004069 0.54295449 0.467777646 0.04204467 -0.53795773 -0.27020756
## ldrat 0.3373532 0.27320098 0.141471649 0.81255710 0.34593748 -0.01232466
## lwt -0.3855969 -0.06319670 0.371416967 0.12898459 -0.11839857 0.55267000
## Comp 7 Comp 8 Comp 9
## S1 -0.30222586 0.29227356 0.121017574
## C 0.09073203 0.77572345 -0.093842587
## tt1 -0.69682301 0.11005092 0.071596113
## G 0.42474656 0.26813529 -0.095949687
## ldisp -0.12132562 -0.01426167 0.770669253
## lhp -0.13098775 -0.42798734 -0.386687082
## cb 0.03436946 -0.12339787 0.256193034
## ldrat -0.10412808 -0.02309961 0.009481688
## lwt -0.43652454 0.17229152 -0.391194581
print(summary(lcar.pcr))
## Data: X dimension: 32 9
## Y dimension: 32 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 32 leave-one-out segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 0.3025 0.1235 0.1215 0.1156 0.1180 0.1200 0.1256
## adjCV 0.3025 0.1233 0.1213 0.1154 0.1177 0.1198 0.1262
## 7 comps 8 comps 9 comps
## CV 0.1165 0.1235 0.1323
## adjCV 0.1162 0.1230 0.1317
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 63.02 85.75 91.23 94.11 95.92 97.48 98.88 99.64
## lmpg 84.72 85.60 87.50 87.58 87.62 87.65 89.51 89.56
## 9 comps
## X 100.00
## lmpg 89.56
## NULL
% variance explained for \(X\) – \(\frac{\sum_{j=1}^k \lambda_{(j)}}{\sum_{j=1}^d \lambda_j},\;\;\;\lambda_{(1)} \geq \ldots \geq \lambda_{(d)}\) are the (ordered) eigenvalues of \(X^t X\)
% variance explained for \(lmpg\) – the percentage of \(Var(lmpg)\) explained by regression (\(R^2\))
## 3 components: RSS = 0.34364 CV = 0.01337
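The "% variance explained" row for \(X\) can be reproduced from the eigenvalues of the predictors' correlation matrix, since `scale=T` standardizes the columns. A sketch, assuming `mtcars` with the same log transforms:

```r
# Cumulative % of X-variance explained by the first k principal components
# equals the cumulative sum of eigenvalues of the correlation matrix
X  <- with(mtcars, cbind(S = vs, C = cyl, tt = am, G = gear,
                         ldisp = log(disp), lhp = log(hp),
                         cb = carb, ldrat = log(drat), lwt = log(wt)))
ev <- eigen(cor(X))$values                   # ordered eigenvalues
round(100 * cumsum(ev) / sum(ev), 2)         # cumulative % per component
```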
par(mfrow=c(1,2),pty="s")
lcar.plsr <- plsr(lmpg ~ S + C + tt + G + ldisp + lhp + cb + ldrat + lwt,
                  scale = TRUE, validation = "LOO")
print(lcar.plsr$projection)
## Comp 1 Comp 2 Comp 3 Comp 4 Comp 5 Comp 6
## S1 0.3035181 -0.2771319 -0.8026121 -0.08861918 -0.223664641 0.36588940
## C -0.3856215 0.1173799 0.3676101 0.16924263 0.437639763 0.68402837
## tt1 0.2641717 -0.2235442 0.1010636 -0.74563223 -0.400270009 0.14402826
## G 0.2226716 -0.3000313 0.2490315 0.48377486 -0.031443983 0.42416012
## ldisp -0.4120024 -0.0767850 -0.1422403 -0.20994859 0.028785864 0.23732848
## lhp -0.3847449 -0.2681456 -0.0936826 -0.18944497 -0.313661459 -0.29480380
## cb -0.2504997 -0.6102716 -0.2216103 0.21758542 -0.097868442 -0.30809423
## ldrat 0.3043815 -0.4202635 -0.3264218 -0.27472449 0.724045780 -0.01612839
## lwt -0.4082041 -0.3824312 -0.7302115 -0.43399006 0.003465911 0.16399068
## Comp 7 Comp 8 Comp 9
## S1 0.1612950 0.27959865 -0.14348083
## C 0.5052341 0.03791378 -0.40918151
## tt1 0.2787113 0.06191893 0.05071435
## G -0.2335079 -0.24137974 0.08674533
## ldisp -0.1896020 0.65998337 0.96401193
## lhp -0.6777704 -0.13071084 -0.65520864
## cb 0.4238469 0.27517154 0.22456288
## ldrat -0.2068165 0.06080840 -0.04318831
## lwt 0.1108419 -0.58604931 -0.16550314
print(summary(lcar.plsr))
## Data: X dimension: 32 9
## Y dimension: 32 1
## Fit method: kernelpls
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 32 leave-one-out segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 0.3025 0.1207 0.1188 0.1179 0.1230 0.1256 0.1300
## adjCV 0.3025 0.1206 0.1186 0.1176 0.1225 0.1251 0.1294
## 7 comps 8 comps 9 comps
## CV 0.1320 0.1323 0.1323
## adjCV 0.1314 0.1317 0.1317
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 62.95 81.90 90.74 92.72 95.41 96.51 98.05 99.00
## lmpg 85.65 87.28 88.73 89.54 89.55 89.56 89.56 89.56
## 9 comps
## X 100.00
## lmpg 89.56
## NULL
% variance explained for \(X\) – \(\frac{\sum_{j=1}^k A_{(jj)}}{\sum_{j=1}^d A_{jj}}\), where \(A=W^t X^t X W\) is diagonal
% variance explained for \(lmpg\) – the percentage of \(Var(lmpg)\) explained by regression (\(R^2\))
## 3 components: RSS = 0.30988 CV = 0.01389
Model | Size | RSS | CV |
---|---|---|---|
main effects log-log | 9 | 0.2871 | 0.0171 |
lwt + lhp (AIC, BIC, RIC) | 2 | 0.3220 | 0.0125 |
Lasso | 6 | 0.2932 | 0.0139 |
PCR | 3 | 0.3436 | 0.0134 |
PLS | 3 | 0.3099 | 0.0139 |
## Correlation matrix
## ldisp lwt lhp ldrat
## ldisp 1.0000000 0.8992112 0.8617723 -0.7550233
## lwt 0.8992112 1.0000000 0.7314468 -0.7223649
## lhp 0.8617723 0.7314468 1.0000000 -0.5455394
## ldrat -0.7550233 -0.7223649 -0.5455394 1.0000000
##
## Correlation coefficients:
## ldisp lwt lhp ldrat
## 0.9182041 0.8176276 0.7741215 0.6165815
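The "correlation coefficients" in the last block appear to be multiple correlation coefficients: each log-predictor regressed on the other three, then \(\sqrt{R^2}\). A sketch of that computation, assuming `mtcars` (the interpretation of the printed block is an assumption):

```r
# Multiple correlation of each log-predictor with the remaining three,
# computed as sqrt(R^2) from regressing it on the rest
logs <- with(mtcars, data.frame(ldisp = log(disp), lwt = log(wt),
                                lhp = log(hp), ldrat = log(drat)))
mult.R <- sapply(names(logs), function(v) {
  others <- setdiff(names(logs), v)
  fit <- lm(reformulate(others, response = v), data = logs)
  sqrt(summary(fit)$r.squared)
})
print(round(mult.R, 7))
```

By construction, each multiple correlation is at least as large as the absolute value of any single pairwise correlation with the regressors involved.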