subject | - | an identification code; there are several observations for each subject, but because the girls were hospitalized at different ages, the number of observations, and the age at the last observation, vary |
age | - | the subject's age in years at the time of observation; all but the last observation for each subject were collected retrospectively at two-year intervals, starting at age 8 |
exercise | - | the amount of exercise in which the subject engaged, expressed as estimated hours per week |
group | - | a factor indicating whether the subject is "patient" or "control" |
lcavol | - | log(cancer volume) |
lweight | - | log(prostate weight) |
age | - | the patient's age in years |
lbph | - | log(benign prostatic hyperplasia amount) |
svi | - | seminal vesicle invasion |
lcp | - | log(capsular penetration) |
gleason | - | Gleason score |
pgg45 | - | percentage Gleason scores 4 or 5 |
The goal is to predict the log of PSA (lpsa) from these measurements.
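One way to approach this prediction task is ordinary least squares on the eight listed measurements. The source does not say which model is intended, so linear regression is an assumption here. The sketch below uses only the Python standard library, assumes the data have already been loaded into plain lists, and uses hypothetical helper names (`solve`, `fit_linear_model`, `predict`).

```python
# Column order assumed to follow the variable table above.
PREDICTORS = ["lcavol", "lweight", "age", "lbph", "svi", "lcp", "gleason", "pgg45"]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear_model(rows, y):
    """Least-squares fit of y on the predictor rows, with an intercept.

    rows: list of lists of predictor values (one inner list per observation)
    y:    list of responses (here, lpsa)
    Returns [intercept, slope_1, ..., slope_p].
    """
    design = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    """Predicted response for one observation."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))
```

Solving the normal equations directly keeps the sketch short, but it is less numerically stable than a QR-based solver; a numerical library would be a better choice for real analysis.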
Years of smoking | Cigs/day | | | | | |
 | 1-9 | 10-14 | 15-19 | 20-24 | 25-34 | 35+ |
15-19 | 0/3121 | 0/3577 | 0/4317 | 0/5683 | 0/3042 | 0/670 | |
20-24 | 0/2937 | 1/3286 | 0/4214 | 1/6385 | 1/4050 | 0/1166 | |
25-29 | 0/2288 | 1/2546 | 0/3185 | 1/5483 | 4/4290 | 0/1482 | |
30-34 | 0/2015 | 2/2219 | 4/2560 | 6/4687 | 9/4268 | 4/1580 | |
35-39 | 1/1648 | 0/1826 | 0/1893 | 5/3646 | 9/3529 | 6/1336 | |
40-44 | 2/1310 | 1/1886 | 2/1334 | 12/2411 | 11/2424 | 10/924 | |
45-49 | 0/927 | 2/988 | 2/849 | 9/1567 | 10/1409 | 7/556 | |
50-54 | 3/710 | 4/684 | 2/470 | 7/857 | 5/663 | 4/255 | |
55-59 | 0/606 | 3/449 | 5/280 | 7/416 | 3/284 | 1/104 |
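Each cell in the table above has the form `a/b`. The table itself does not say what the two numbers are, so the following minimal sketch assumes `a` is an event count and `b` is the corresponding amount of exposure (for example, person-years). Under that assumption it parses the cells and computes a crude rate per 1,000 units of exposure for each row, pooling across the cigarettes-per-day columns. The names `parse_cell` and `crude_rate` are hypothetical.

```python
# Column labels (cigarettes per day) and rows (years of smoking), copied from the table.
CIGS_PER_DAY = ["1-9", "10-14", "15-19", "20-24", "25-34", "35+"]
TABLE = {
    "15-19": ["0/3121", "0/3577", "0/4317", "0/5683", "0/3042", "0/670"],
    "20-24": ["0/2937", "1/3286", "0/4214", "1/6385", "1/4050", "0/1166"],
    "25-29": ["0/2288", "1/2546", "0/3185", "1/5483", "4/4290", "0/1482"],
    "30-34": ["0/2015", "2/2219", "4/2560", "6/4687", "9/4268", "4/1580"],
    "35-39": ["1/1648", "0/1826", "0/1893", "5/3646", "9/3529", "6/1336"],
    "40-44": ["2/1310", "1/1886", "2/1334", "12/2411", "11/2424", "10/924"],
    "45-49": ["0/927", "2/988", "2/849", "9/1567", "10/1409", "7/556"],
    "50-54": ["3/710", "4/684", "2/470", "7/857", "5/663", "4/255"],
    "55-59": ["0/606", "3/449", "5/280", "7/416", "3/284", "1/104"],
}


def parse_cell(cell):
    """Split an 'a/b' cell into (events, exposure) as integers (assumed meaning)."""
    events, exposure = cell.split("/")
    return int(events), int(exposure)


def crude_rate(cells, per=1000):
    """Pooled events divided by pooled exposure, scaled to `per` units of exposure."""
    pairs = [parse_cell(c) for c in cells]
    total_events = sum(e for e, _ in pairs)
    total_exposure = sum(x for _, x in pairs)
    return per * total_events / total_exposure


# Crude rate for each years-of-smoking row, pooled over cigarettes per day.
for years, cells in TABLE.items():
    print(f"{years}: {crude_rate(cells):.2f} per 1000")
```

Pooling across columns ignores the cigarettes-per-day dimension; a count regression with exposure as an offset would be the usual way to model both factors together.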