- Yoav Benjamini
- Department of Statistics and Operations Research
- 972-3-6408756 (voice)
- 972-3-6409357 (fax)
- ybenja@post.tau.ac.il
- Office Hours: Tuesday 15:00-16:00.
Lecture Hours:
Wednesday 16-19 Schreiber 08
Important Announcement
NONE
Books:
- Intelligent Data Analysis - an Introduction
by Berthold and Hand (HMT2).
- Data Analysis and Regression
by Mosteller and Tukey (the green book).
Computing:
The statistical software we shall use is Splus. Tutorial material
is available in the following documents (which can also be printed).
Other books on S and Splus which we shall use are:
- Statistical Models in S
by Chambers and Hastie.
- S-plus Guide to Statistical and Mathematical Analysis
- The New S Language
by Becker, Chambers and Wilks.
Regarding the SAS software:
Mina thinks the online help that is accessible from within the program is more
helpful for beginners. You can get her help by writing to mina@post.tau.ac.il.
This year's introductory talk by Vered can be viewed, but you may also benefit from Ronen's
introductory talk about SAS, given 4 years ago, which is available as a PowerPoint presentation.
Topics:
- Introduction
- Data we get and data we analyse (Types of Data; Symmetry, Additivity
and Linearity; Data Cleansing) (Splus program for analysing
skewness)
- Visualization of Data (Principles; Fundamental Problems; Parallel
Coordinates; Manet) (PowerPoint presentation).
- Mining of Association Rules.
- The pitfalls of multiplicity and the False Discovery Rate (PowerPoint presentation, including material on
model selection)
- Multiple Linear Regression
- Jackknifing and cross-validation (olar in Hebrew)
- Model Selection in high dimensions
- Logistic Regression
- Nearest Neighbor and Case-Based Reasoning
- The curses and blessings of high dimensionality
- Decision Trees
- Bootstrapping (retsu'at hamagaf in Hebrew) and Bagging
- Neural Net Models
Please keep checking this list as the course progresses. Some of
these links will be activated later.
Requirements:
Homework sets will be assigned and graded.
- Targil 1 Handout
- Targil 2 using data which can also be read from math.tau.ac.il/~ybenja/segal/seg.dat.
Comments on solutions are now available.
- Targil 3 using data which can also be read from math.tau.ac.il/~ybenja/kdd/ex2.txt
- Targil 4: For the country data for year 2000, to be found at http://pwt.econ.upenn.edu/php_site/pwt_index.php,
build a model explaining Per Capita Gross National Product by the other variables.
Summarize your results in 2 pages (including figures).
- Targil 5
A final independent group project of analysing a real data set is to
be handed in by the end of the exam break. The project might consist of participating
in a data mining competition. Submitting in time for the competition
will be specially rewarded in the grade. Participation is in groups.
Last update: January, 2006