---
title: "Lab of Chapter 8"
output: html_notebook
---
# 8.3.1 Fitting Classification Trees
```{r}
library(tree)
library(ISLR)
High <- as.factor(ifelse(Carseats$Sales > 8, 'Yes', 'No'))  # tree() needs a factor response; R >= 4.0 no longer converts strings to factors automatically
Carseats <- data.frame(Carseats, High)
```
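As a quick sanity check (not part of the book's lab), inspect the class balance of the new variable:
```{r}
table(Carseats$High)
```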
Fit a classification tree:
```{r}
tree.carseats <- tree(High ~ . - Sales, data = Carseats)
summary(tree.carseats)
plot(tree.carseats)
text(tree.carseats, pretty = 0)
tree.carseats
```
Step 1 of Algorithm 8.1: split the observations into training and test sets, grow the tree $T_0$ on the training data with `tree()`, and evaluate it on the test set:
```{r}
set.seed(2)
train <- sample(1: nrow(Carseats), nrow(Carseats) / 2)
Carseats.test <- Carseats[-train, ]
High.test <- High[-train]
tree.carseats <- tree(High ~ . - Sales, data = Carseats, subset = train)
tree.pred <- predict(tree.carseats, Carseats.test, type = 'class')
table(tree.pred, High.test)
```
The test-set accuracy (correct prediction rate):
```{r}
(86 + 57) / (86 + 57 + 30 + 27)
```
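Equivalently, the accuracy can be computed directly from the predictions instead of retyping the counts from the confusion matrix (the same one-liner works for the pruned tree further below):
```{r}
mean(tree.pred == High.test)
```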
Steps 2 and 3 of Algorithm 8.1: use cross-validation with cost-complexity pruning to generate a sequence of subtrees of $T_0$ indexed by $\alpha$:
```{r}
set.seed(3)
cv.carseats <- cv.tree(tree.carseats, FUN = prune.misclass)
names(cv.carseats)
cv.carseats
```
Find the tree size (number of terminal nodes) with the minimum cross-validation error:
```{r}
best.node.no <- cv.carseats$size[which.min(cv.carseats$dev)]
```
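Print the selected size (the exact value depends on the random seed):
```{r}
best.node.no
```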
Plot the error rate as a function of both *size* and *k*:
```{r}
par(mfrow = c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type = 'b')
plot(cv.carseats$k, cv.carseats$dev, type = 'b')
```
Here the y-axis *dev* corresponds to the cross-validation error rate.
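More precisely, with `FUN = prune.misclass` the `dev` component counts cross-validation misclassifications, so dividing by the number of training observations gives the error rate (a small sketch, not part of the original lab):
```{r}
cv.carseats$dev / length(train)
```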
Step 4 of Algorithm 8.1: prune the tree back to the size selected by cross-validation:
```{r}
prune.carseats <- prune.misclass(tree.carseats, best = best.node.no)
plot(prune.carseats)
text(prune.carseats, pretty = 0)
```
Calculate the test-set accuracy of the pruned tree:
```{r}
tree.pred <- predict(prune.carseats, newdata = Carseats.test, type = 'class')
table(tree.pred, High.test)
(94 + 60) / (94 + 60 + 22 + 24)
```
# 8.3.2 Fitting Regression Trees
Fit a regression tree on the *Boston* data set:
```{r}
library(MASS)
set.seed(1)
train <- sample(1: nrow(Boston), nrow(Boston) / 2)
tree.boston <- tree(medv ~ ., data = Boston, subset = train)
summary(tree.boston)
plot(tree.boston)
text(tree.boston, pretty = 0)
```
The tree predicts a median house price of \$46,380 for larger homes ($rm \ge 7.437$, more than 7 rooms per dwelling) in suburbs in which residents have high socioeconomic status ($lstat \lt 9.715$, the percentage of lower-status residents is below 9.715%).
Notice that only 3 predictors have been used in constructing the tree,
while at least 6 predictors are significantly related to *medv* according to a linear regression:
```{r}
summary(lm(medv ~ ., data = Boston, subset = train))
```
Cross-validate the tree and prune it:
```{r}
cv.boston <- cv.tree(tree.boston)
plot(cv.boston$size, cv.boston$dev, type = 'b')
prune.boston <- prune.tree(tree.boston, best = 5)
plot(prune.boston)
text(prune.boston, pretty = 0)
```
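For comparison (not shown in the book's lab), the test MSE of the pruned tree can be computed in the same way as for the unpruned tree below:
```{r}
yhat.prune <- predict(prune.boston, newdata = Boston[-train, ])
mean((yhat.prune - Boston[-train, 'medv']) ^ 2)
```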
Compare predictions with the actual values on the test set and calculate the test MSE:
```{r}
yhat <- predict(tree.boston, newdata = Boston[-train, ])
boston.test <- Boston[-train, 'medv']
plot(yhat, boston.test)
abline(0, 1)
mean((yhat - boston.test) ^ 2)
```
`abline(0, 1)` here means a line with intercept = 0 and slope = 1, which is $y = x$.
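The square root of the test MSE gives the typical size of the prediction error in the units of *medv* (thousands of dollars):
```{r}
sqrt(mean((yhat - boston.test) ^ 2))
```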
# 8.3.3 Bagging and Random Forests
Fit with the bagging method (all predictors, i.e. all columns except the response variable, are considered at each split: `mtry = ncol(Boston) - 1`):
```{r}
library(randomForest)
set.seed(1)
bag.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = ncol(Boston) - 1, importance = TRUE)
bag.boston
```
Test MSE of the bagging method:
```{r}
yhat.bag <- predict(bag.boston, newdata = Boston[-train, ])
plot(yhat.bag, boston.test)
abline(0, 1)
mean((yhat.bag - boston.test) ^ 2)
```
Bagging with a smaller number of trees (`ntree = 25`; the `randomForest` default is 500):
```{r}
bag.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = ncol(Boston) - 1, ntree = 25)
yhat.bag <- predict(bag.boston, newdata = Boston[-train, ])
mean((yhat.bag - boston.test) ^ 2)
```
Fit with random forest (`mtry = 6`):
```{r}
set.seed(1)
rf.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 6, importance = TRUE)
yhat.rf <- predict(rf.boston, newdata = Boston[-train, ])
mean((yhat.rf - boston.test) ^ 2)
```
So the test MSEs of the random forest (11.66) and bagging (13.5) are both much lower than that of the single tree (25.05) in the section above.
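For reference, the three test MSEs computed above can be collected side by side (note that `yhat.bag` now holds the predictions of the last bagged fit, with `ntree = 25`):
```{r}
c(single.tree   = mean((yhat - boston.test) ^ 2),
  bagging       = mean((yhat.bag - boston.test) ^ 2),
  random.forest = mean((yhat.rf - boston.test) ^ 2))
```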
Report the importance of each predictor:
```{r}
importance(rf.boston)
varImpPlot(rf.boston)
```
The 3 most important predictors (*rm*, *lstat* and *dis*) are the same variables used by the single tree in `summary(tree.boston)`. Here `%IncMSE` is the increase in out-of-bag prediction error when a variable is permuted, and `IncNodePurity` is the total decrease in node impurity from splits on that variable.
# 8.3.4 Boosting
Fit with boosted regression trees:
```{r}
library(gbm)
set.seed(1)
boost.boston <- gbm(medv ~ ., data = Boston[train, ], distribution = 'gaussian', n.trees = 5000, interaction.depth = 4)
summary(boost.boston)
```
Partial dependence plots of the 2 most important predictors:
```{r}
par(mfrow = c(1,2))
plot(boost.boston, i = 'rm')
plot(boost.boston, i = 'lstat')
```
Test MSE of the boosted model:
```{r}
yhat.boost <- predict(boost.boston, newdata = Boston[-train, ], n.trees = 5000)
mean((yhat.boost - boston.test) ^ 2)
```
This is similar to the random forest result (11.66) and better than bagging (13.5).
Fit with a different $\lambda$ (see equation (8.10) of algorithm 8.2 on page 322):
```{r}
boost.boston <- gbm(medv ~ ., data = Boston[train, ], distribution = 'gaussian', n.trees = 5000, interaction.depth = 4, shrinkage = 0.2, verbose = FALSE)
yhat.boost <- predict(boost.boston, newdata = Boston[-train, ], n.trees = 5000)
mean((yhat.boost - boston.test) ^ 2)
```
So in this case $\lambda = 0.2$ leads to a slightly lower test MSE than the default $\lambda$ (0.001).
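As a further experiment (these candidate values are illustrative, not from the text), sweep over several shrinkage values and compare the resulting test MSEs:
```{r}
# illustrative sweep over assumed candidate shrinkage values
for (lambda in c(0.001, 0.01, 0.1, 0.2)) {
  fit <- gbm(medv ~ ., data = Boston[train, ], distribution = 'gaussian',
             n.trees = 5000, interaction.depth = 4, shrinkage = lambda, verbose = FALSE)
  pred <- predict(fit, newdata = Boston[-train, ], n.trees = 5000)
  cat('lambda =', lambda, ' test MSE =', mean((pred - boston.test) ^ 2), '\n')
}
```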