
Department of Statistics & Operations Research
Statistics Seminars
2007/2008

Second Term
4 March*    |  Yonina Eldar, Department of Electrical Engineering, Technion  |  Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramer-Rao Bound
11 March    |  Michal Rosen-Zvi, Haifa Research Labs, IBM  |  Selecting anti-HIV therapies based on a variety of genomic and clinical factors
25 March    |  Allan R. Sampson, University of Pittsburgh  |  Simultaneous confidence bands for isotonic functions
8 April     |  Havi Murad, Bar Ilan University & Gertner Institute  |  Estimating and testing interactions in linear regression models when explanatory variables are subject to classical measurement error
29 April    |  Yair Goldberg, Hebrew University  |  Manifold learning: The price of normalization
6 May       |  Micha Mandel, Hebrew University  |  Estimating Hitting Times from Panel Data
24 June     |  Alla Nikolaevsky  |  Nonparametric estimation of a jump point in a function's derivatives in a regression model with stationary Gaussian noise (seminar starts at 10:00!)

First Term

23 October  |  Saharon Rosset, Tel Aviv University  |  Story of IBM Research's Success in KDD/Netflix Cup 2007
6 November  |  Daniel Yekutieli, Tel Aviv University  |  Bayesian adjusted inference for selected parameters
20 November |  David Steinberg, Tel Aviv University  |  Bayesian ideas in experimental design
11 December |  Camil Fuchs, Tel Aviv University  |  A new statistical approach applied to an archeological find – Prospects and Criticism
18 December |  Patrik Guggenberger, UCLA  |  On the size of testing procedures
25 December |  Galit Shmueli, University of Maryland  |  Explanatory vs. Predictive Models in Scientific Research
1 January*  |  Avishai Mandelbaum, Industrial Engineering & Management, Technion  |  QED Queues: Quality- and Efficiency-Driven Call Centers
8 January   |  Noam Slonim, IBM Research  |  FIRE: Finding Informative Regulatory Elements?
15 January  |  Alon Zakai, Hebrew University  |  Consistency, Rate-Optimality and Local Behavior
* Joint Statistics/OR seminar
Seminars are held on Tuesdays at 10:30 am in Schreiber Building, room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli. To join the seminar mailing list, or for any other inquiries, please call (03)-6409612 or email yekutiel@post.tau.ac.il.
Details of previous seminars:
ABSTRACTS
·
Saharon Rosset, Tel
Aviv University
Story of IBM
Research's Success in KDD/Netflix Cup 2007
The KDD Cup is the most prominent yearly data mining/modeling competition. This year it was associated with the $1M Netflix challenge, using the same data but addressing a different modeling challenge. In total, over 70 groups participated in the Cup this year, and the IBM Research teams won one of the tasks and came third on the other. For more details, see http://www.cs.uic.edu/~liub/Netflix-KDD-Cup-2007.html. I will briefly describe the Netflix Challenge and the KDD Cup tasks, and then discuss our approaches to the Cup problems and the keys to our success.
Joint work with Claudia Perlich and
Yan Liu of IBM Research.
·
Daniel
Yekutieli, Tel Aviv University
Bayesian adjusted inference for selected
parameters
Benjamini and Yekutieli suggested viewing FDR-adjusted inference as marginal inference for selected parameters. I will explain this approach, focusing on its weaknesses. I will argue that overlooking the problem of selective inference is an abuse of Bayesian methodology, and I will introduce "valid" Bayesian adjusted inference for selected parameters. I will use the preposterior framework to study the frequentist properties of Bayesian inference for selected parameters, and I will explain the relation between the Bayesian adjusted inference for selected parameters and Storey's pFDR and Efron's local FDR. To make the discussion clearer, I will demonstrate the use of these Bayesian adjustments on microarray data.
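As background for the selection step discussed above, here is a minimal sketch (my own illustration, not the speaker's Bayesian adjustment) of FDR-based selection via the Benjamini-Hochberg procedure; the selective-inference problem is that any subsequent inference refers only to the parameters that survive this step. All data and the FDR level q are made up for the example.

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, q=0.10):
    """Indices of hypotheses selected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m   # compare sorted p-values to BH thresholds
    if not passed.any():
        return np.array([], dtype=int)
    k = np.nonzero(passed)[0].max()                    # largest index meeting its threshold
    return order[: k + 1]

# toy example: 1000 two-sided z-tests, the first 50 with a real signal
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
z[:50] += 3.0
pvals = 2 * norm.sf(np.abs(z))
selected = benjamini_hochberg(pvals, q=0.10)
print(f"{selected.size} parameters selected; any further inference refers only to these.")
```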
·
David Steinberg, Tel Aviv University
Bayesian ideas in experimental design
The design of an experiment always reflects what the
experimental team knows about the process under study. Use of prior
information is essential. So it is perhaps surprising that formal ideas of
Bayesian statistics have not received much emphasis in the design of
experiments. This talk will present some basics on Bayesian experimental
design, following the review article of Chaloner and Verdinelli (Statistical Science, 1995). Then I will
look in more depth at the case of design for nonlinear statistical models,
including GLMs. For those settings, the use of Bayesian methods has been held back by heavy computational burdens. I will discuss a heuristic solution to this problem proposed by Hovav Dror, and then a formal mathematical solution that is part of joint work with Bradley Jones and Chris Gotwalt.
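To make the computational issue concrete, here is a small sketch of the pseudo-Bayesian D-optimality criterion for a one-factor logistic model: average the log-determinant of the Fisher information over draws from a prior on the parameters and compare a few candidate designs. This is my own illustration of the general idea, not the algorithms of Dror or of Jones and Gotwalt; the prior and the candidate designs are invented for the example.

```python
import numpy as np

def logistic_info(design, beta0, beta1):
    """Fisher information matrix for a logistic model p(x) = 1/(1+exp(-(b0+b1*x)))."""
    x = np.asarray(design, dtype=float)
    eta = beta0 + beta1 * x
    w = np.exp(eta) / (1.0 + np.exp(eta)) ** 2      # Bernoulli response variance at each x
    X = np.column_stack([np.ones_like(x), x])
    return (X * w[:, None]).T @ X

def bayesian_d_criterion(design, prior_draws):
    """Expected log-determinant of the information over prior draws (pseudo-Bayesian D-optimality)."""
    vals = [np.linalg.slogdet(logistic_info(design, b0, b1))[1] for b0, b1 in prior_draws]
    return np.mean(vals)

rng = np.random.default_rng(1)
prior = rng.normal(loc=[0.0, 1.0], scale=[1.0, 0.5], size=(500, 2))  # prior on (b0, b1)

candidates = {
    "equally spaced": np.linspace(-2, 2, 8),
    "two extremes":   np.repeat([-2.0, 2.0], 4),
    "center only":    np.zeros(8),
}
for name, d in candidates.items():
    print(f"{name:15s}  expected log|I| = {bayesian_d_criterion(d, prior):.3f}")
```

The degenerate "center only" design scores minus infinity, which is exactly the kind of design the criterion is meant to rule out.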
·
Camil
Fuchs, Tel Aviv University
A new statistical
approach applied to an archeological find – Prospects and Criticism
Statistical methodology was recently called upon to settle a controversy raised by a scientific puzzle of major importance. The puzzle concerns the re-analyzed inscriptions on the ossuaries from an ancient tomb in East Talpiot, unearthed in 1980. A new statistical approach was developed by Feuerverger to handle the intricacies of this complex problem. The conclusions presented claim that the inscriptions indicate that this may be the burial site of the New Testament (NT) family (the family of Jesus of Nazareth).
In terms of the new approach, the defined level of 'surprisingness' for the cluster of names in the tomb is found to be very high; that is, under the specified provisos, there is a very low probability that a random sample of such ossuaries contains a cluster of names more surprising than the cluster found. Undoubtedly, if validated, this would be a discovery with the potential to stir major interest in academic as well as religious circles.
But are the results uncontestable? I don't think so.
·
Patrik Guggenberger, Department of Economics, UCLA
On the size
of testing procedures
We consider inference based on a test statistic that has a
limit distribution that is discontinuous in nuisance parameters. For example, a
t-statistic in an AR(1) model has a limit distribution
that is discontinuous in the
autoregressive parameter. The paper shows that subsampling,
m out of n bootstrap, and standard fixed critical value tests based on such a
test statistic often have asymptotic size – defined as the limit of the
finite-sample size – that is greater than the nominal level of the tests. This
is due to lack of uniformity in the pointwise asymptotics. We determine precisely the asymptotic size of
such tests or confidence intervals under a general set of high-level conditions
that are relatively easy to verify. The high-level conditions are verified in
several examples. In situations where the size of the test or confidence
interval is distorted, several methods are provided that alleviate or fully
cure the problem of size-distortion.
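A toy Monte Carlo illustration (mine, not taken from the paper) of the size problem described above: the null rejection rate of a nominal 5% two-sided t-test for the AR(1) coefficient, using OLS and ±1.96 critical values, drifts well above 5% as the autoregressive parameter approaches one.

```python
import numpy as np

def ar1_ttest_rejection_rate(rho, n=100, reps=5000, seed=0):
    """Monte Carlo null rejection rate of a nominal 5% t-test for H0: coefficient = rho
    in an AR(1) model estimated by OLS, using +/-1.96 critical values."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        e = rng.normal(size=n + 1)
        y = np.empty(n + 1)
        y[0] = e[0]
        for t in range(1, n + 1):
            y[t] = rho * y[t - 1] + e[t]
        x, yy = y[:-1], y[1:]
        rho_hat = (x @ yy) / (x @ x)
        resid = yy - rho_hat * x
        se = np.sqrt(resid @ resid / (len(x) - 1) / (x @ x))
        rejections += abs((rho_hat - rho) / se) > 1.96
    return rejections / reps

for rho in (0.5, 0.95, 0.99):
    print(f"rho = {rho}: rejection rate ≈ {ar1_ttest_rejection_rate(rho):.3f}")
```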
·
Galit
Shmueli, Robert H. Smith School of Business,
University of Maryland, College Park MD, USA
Explanatory vs. Predictive Models in Scientific Research
Explanatory models are designed for testing hypotheses that specify how and
why certain empirical phenomena occur. Predictive models are aimed at
predicting new observations with high accuracy. An age-old debate in philosophy
of science deals with the difference between predictive and explanatory goals.
In mainstream statistical research, however, the distinction between
explanatory and predictive modeling is often overlooked, and there is a
near-exclusive focus on explanatory methodology. This focus has permeated empirical research in many fields, such as information systems, economics and, in general, the social sciences. We investigate the issue from a statistical
modeling perspective. Our premise is that (1) both explanatory and predictive
statistical models are essential for
advancing scientific research; and (2) the different goals lead to key
differences at each step of the modeling process. In this talk we discuss these
two issues and in particular, we analyze each step of
the statistical modeling process (from data collection to model use) and describe the different statistical
components and issues that arise in explanatory modeling vs. predictive
modeling.
Joint work with Otto Koppius, Rotterdam School of
Management, Erasmus
University, The
Netherlands.
·
Avishai
Mandelbaum, Industrial Engineering & Management, Technion
QED Queues: Quality- and Efficiency-Driven Call Centers
Through examples of Service Operations, with a focus on
Telephone Call Centers, I review empirical findings that motivate or are
motivated by (or both) interesting research questions. These findings give rise
to features that are prerequisites for useful service models, for example
customers’ (im)patience, time-varying demand,
heterogeneity of customers and servers, over-dispersion in Poisson arrivals,
generally-distributed (as opposed to exponential) service- and
patience-durations, and more. Empirical analysis also enables validation of
existing models and protocols, either supporting or refuting their relevance
and robustness.
The mathematical framework for my models is asymptotic queueing theory, where limits are taken as the number of
servers increases indefinitely, in a way that maintains a delicate balance
against the offered-load. Asymptotic analysis reveals an operational regime
that achieves, under already moderate scale, remarkably high levels of both
service quality and efficiency. This is the QED Regime, discovered by Erlang and characterized by Halfin
& Whitt. (QED = Quality- and Efficiency-Driven).
My main data-source is a unique repository of call-centers
data, designed and maintained at the Technion’s SEE
Laboratory. (SEE = Service Enterprise
Engineering). The data is unique in that it is transaction-based: it details
the individual operational history of all the calls handled by the
participating call centers. (For example, one source of data is a network of 4
call centers of a U.S. bank, spanning 2.5 years and covering about 1000 agents;
there are 218,047,488 telephone calls overall, of which 41,646,142 were served by agents, while the rest were handled by answering machines.) To
support data analysis, a universal data-structure and a friendly interface have
been developed, under the logo DataMOCCA = Data MOdels for Call Centers Analysis. (I shall have with me DataMOCCA DVD’s for academic distribution.)
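To see the QED regime numerically, here is a minimal sketch (my own, with made-up loads) of square-root staffing in an M/M/n queue: with n = R + β√R servers for offered load R, the Erlang-C probability that a customer waits settles at a value strictly between 0 and 1 as the system grows, rather than tending to 0 or 1.

```python
import math

def erlang_b(n_servers, offered_load):
    """Erlang-B blocking probability, via the standard stable recursion."""
    b = 1.0
    for k in range(1, n_servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

def erlang_c(n_servers, offered_load):
    """Erlang-C probability that an arriving customer must wait (M/M/n queue)."""
    if offered_load >= n_servers:
        return 1.0
    b = erlang_b(n_servers, offered_load)
    rho = offered_load / n_servers
    return b / (1.0 - rho * (1.0 - b))

# Square-root staffing: n = R + beta * sqrt(R) servers for offered load R.
beta = 1.0
for R in (10, 100, 1000, 10000):
    n = math.ceil(R + beta * math.sqrt(R))
    print(f"R = {R:6d}, n = {n:6d}, P(wait) ≈ {erlang_c(n, R):.3f}")
```

For β = 1 the waiting probability stabilizes around 0.2 even as the offered load grows by three orders of magnitude, which is the hallmark of the QED regime.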
·
Noam
Slonim, IBM Research
FIRE: Finding Informative Regulatory Elements?
Gene expression is directly regulated by protein transcription factors that bind at particular DNA or RNA sites in a sequence-specific manner. A comprehensive characterization of these functional non-coding elements, or motifs, remains a formidable challenge, especially for higher eukaryotes. I will present a rigorous computational methodology for ab-initio motif discovery from expression data that utilizes the concept of mutual information and has the following characteristics:
(i) it is directly applicable to _any_ type of expression data;
(ii) it is model-independent;
(iii) it simultaneously finds DNA motifs in upstream regions and RNA motifs in 3'UTRs and highlights their functional relations;
(iv) it scales well to metazoan genomes;
(v) it yields very few false positive predictions, if any;
(vi) it incorporates systematic analysis of the functional coherence of the predicted motifs, their conservation, positional and orientation biases, cooperativity, and co-localization with other motifs;
(vii) it displays predictions via a novel user-friendly graphical interface.
I will present results for a variety of data types,
measured for different organisms, including yeast, worm, fly, human, and the
Plasmodium parasite responsible for malaria. I will further discuss in detail
surprising observations regarding gene expression regulation that were
overlooked by previous studies and arise naturally from our analysis.
Based on joint work with Olivier Elemento and Saeed Tavazoie.
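The scoring idea behind mutual-information-based motif discovery can be sketched in a few lines; this is my own toy version, not the FIRE implementation, and the genes, clusters, and motif occurrences are simulated.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete label sequences."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# toy data: 2000 genes in 5 expression clusters; one motif enriched in cluster 0
rng = np.random.default_rng(2)
clusters = rng.integers(0, 5, size=2000)
motif_present = (rng.random(2000) < np.where(clusters == 0, 0.6, 0.1)).astype(int)
random_motif = rng.integers(0, 2, size=2000)

print("informative motif:", round(mutual_information(motif_present, clusters), 4), "bits")
print("random motif:     ", round(mutual_information(random_motif, clusters), 4), "bits")
```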
·
Alon
Zakai, Hebrew
University
Consistency,
Rate-Optimality and Local Behavior
In the context of nonparametric regression, we will consider the connection
between local behavior -- being influenced mostly by the close-by part of the training
set, when constructing an estimate at a
point x -- and the statistical properties of consistency and rate-optimality.
First, we will see that all consistent estimators -- i.e., those that asymptotically
achieve the lowest possible expected loss for any distribution on (X,Y) --
necessarily exhibit local behavior, even those that appear to be defined in a
non-local manner. Consistency is in fact logically equivalent to the
combination of two properties: (1) a specific form of local behavior and (2)
that the method's mean (over the entire X distribution) correctly estimates the
true mean; thus, local behavior is fundamental to consistency. We will then
consider a stronger form of local behavior, strict locality (of which kernel
estimators are an example), and note that while strictly local estimators can
achieve minimax rates, they cannot achieve the
actual minimax loss for finite sample sizes.
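As a concrete instance of a strictly local estimator of the kind discussed above, here is a minimal Nadaraya-Watson kernel regression sketch (my own example, with simulated data): the estimate at a point x is a weighted average dominated by training points in a small neighborhood of x.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.1):
    """Gaussian-kernel Nadaraya-Watson regression estimate at the query points.
    The estimate at x is a weighted average dominated by training points near x."""
    d = x_query[:, None] - x_train[None, :]              # pairwise differences
    w = np.exp(-0.5 * (d / bandwidth) ** 2)              # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)

grid = np.linspace(0.05, 0.95, 10)
fit = nadaraya_watson(x, y, grid, bandwidth=0.05)
for g, f in zip(grid, fit):
    print(f"x = {g:.2f}:  estimate = {f:+.3f},  true mean = {np.sin(2 * np.pi * g):+.3f}")
```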
·
Yonina
Eldar, Technion
Rethinking Biased
Estimation: Improving Maximum Likelihood and the Cramer-Rao
Bound
One of the goals of statistical estimation theory is the development of
performance bounds when estimating parameters of interest in a given model, as
well as determining estimators that achieve these bounds. When the parameters
to be estimated are deterministic, a popular approach is to restrict attention
to unbiased estimators and develop bounds on the smallest mean-squared error
(MSE) achievable within this class of estimators. Although it is well known that lower MSE can be achieved by allowing for bias, in applications it is typically unclear how to choose an appropriate bias.
In this talk we develop bounds that dominate the conventional unbiased Cramer-Rao bound
(CRB) so that the resulting MSE bound is lower than the CRB for all values of the unknowns. When an
efficient maximum-likelihood (ML)
estimator achieving the CRB exists, we show how to construct an estimator with
lower MSE regardless of the true unknown values, by linearly transforming the
ML estimator. We then specialize the results to linear estimation in linear
models. In particular, we derive a class
of estimators with lower MSE than the
conventional least-squares approach for all values of the unknown parameters,
thus leading to estimation methods that are provably better than least-squares.
The procedures we develop are based on a
saddle-point formulation of the problem which admits the use of convex
optimization tools.
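The following toy simulation (my own, not the saddle-point construction from the talk) illustrates the basic point that a biased, shrunken version of the least-squares estimator can have much smaller MSE in an ill-conditioned linear model; unlike the estimators in the talk, this simple scalar shrinkage is not guaranteed to dominate least squares for every value of the unknown parameter.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 10
H = rng.normal(size=(n, p))
H[:, 0] *= 0.05                              # make the model ill-conditioned
x_true = rng.normal(size=p)
sigma = 1.0

cov_ls = sigma**2 * np.linalg.inv(H.T @ H)   # covariance of the LS estimator

def estimators(y):
    x_ls = np.linalg.solve(H.T @ H, H.T @ y)
    # scalar shrinkage of the LS estimate toward zero (illustrative, not the talk's method)
    c = (x_ls @ x_ls - np.trace(cov_ls)) / (x_ls @ x_ls)
    return x_ls, max(c, 0.0) * x_ls

mse_ls = mse_shrunk = 0.0
reps = 2000
for _ in range(reps):
    y = H @ x_true + sigma * rng.normal(size=n)
    x_ls, x_shrunk = estimators(y)
    mse_ls += np.sum((x_ls - x_true) ** 2)
    mse_shrunk += np.sum((x_shrunk - x_true) ** 2)

print(f"empirical MSE, least squares: {mse_ls / reps:.2f}")
print(f"empirical MSE, shrunken LS:   {mse_shrunk / reps:.2f}")
```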
·
Michal Rosen-Zvi, Haifa Research Labs, IBM
Selecting
anti-HIV therapies based on a variety of genomic and clinical factors
Motivation: Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that
genotypic information might not be sufficient for the design of therapies and
that other clinical and demographical factors may play a role in therapy
failure. This study is designed to assess the improvement in prediction
achieved when such information is taken into account. We use these factors to
generate a prediction engine using a variety of machine-learning methods and to
determine which clinical conditions are most misleading in terms of predicting
the outcome of a therapy. Three different machine-learning techniques were used: a generative-discriminative method, regression with derived evolutionary features, and regression with a mixture of effects. All three methods had similar performance, with an area under the ROC curve (AUC) of 0.77. A set of three similar engines limited to genotypic information only achieved an AUC of 0.75. A straightforward combination of the three engines consistently improves the prediction, with significantly better prediction when the full set of features is employed. The combined engine improves on predictions obtained from an on-line state-of-the-art resistance interpretation system. Moreover, the engines tend to disagree more on the outcome of failed therapies than on successful ones. Careful analysis of the differences
between the engines revealed those mutations and drugs most closely associated
with uncertainty of the therapy outcome.
Joint work with Andre Altmann, Mattia Prosperi, Ehud Aharoni, Hani Neuvirth, Anders Sönnerborg, Eugen Schülter, Daniel Struck, Yardena Peres, Francesca Incardona, Rolf Kaiser, Maurizio Zazzi, and Thomas Lengauer.
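A minimal sketch of the evaluation step described above, with simulated engines and outcomes rather than the study's data: score-level averaging of several prediction engines and AUC computed from ranks. The plain average used here is only a stand-in for the authors' combined engine.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (continuous scores, so no ties)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# toy data: three engines scoring 1000 therapies, each noisily related to the true outcome
rng = np.random.default_rng(5)
outcome = rng.integers(0, 2, size=1000)                 # 1 = therapy success
engines = [outcome + rng.normal(scale=s, size=1000) for s in (1.2, 1.3, 1.4)]

for i, e in enumerate(engines, 1):
    print(f"engine {i} AUC: {auc(e, outcome):.3f}")
combined = np.mean(engines, axis=0)                     # simple score averaging
print(f"combined AUC: {auc(combined, outcome):.3f}")
```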
·
Allan
R. Sampson, University
of Pittsburgh
Simultaneous
confidence bands for isotonic functions
For the observed data $(x_i, Y_{x_i}(j))$, $j = 1,\ldots,n_{x_i}$, $i = 1,\ldots,k$, we suppose the underlying model $Y_x(j) \sim N(\phi(x), \sigma^2)$, $j = 1,\ldots,n_x$, $x \in \Psi$, where $\Psi = \{x_1,\ldots,x_k\}$ is a finite linearly ordered set (for $x_i, x_j \in \Psi$, either $x_i < x_j$ or $x_j < x_i$), $\phi$ is an unknown function nondecreasing in $x$, and $\sigma^2 > 0$. Procedures for simultaneous confidence bands for $\phi(x)$, $x \in \Psi$, are considered for bivariate regression data observed from this model. This model can be viewed in the context of nonparametric regression, where we use the knowledge that $\phi(x)$ is nondecreasing in $x$. Several existing procedures which provide simultaneous bands will be described from a common point of view. We introduce a new “bandwidth” procedure which generalizes the previously introduced procedures of Korn (1982) and Lee (1996). This bandwidth procedure has what can be thought of as a tuning parameter which allows one to take into account prior ideas about how rapidly $\phi(x)$ increases relative to $\sigma^2$. Two examples will be given to illustrate the application of these bands. Extensions of the bandwidth procedure to the case when $\Psi$ is a partially ordered set will also be discussed.
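For readers unfamiliar with isotonic fitting, here is a minimal pool-adjacent-violators sketch (my own, with simulated data); the bands printed at the end are crude pointwise intervals for illustration only, not the simultaneous bandwidth bands of the talk.

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: isotonic (nondecreasing) least-squares fit to y."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); counts.append(1)
        # pool adjacent blocks while the monotonicity constraint is violated
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            wtot = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wtot)
            weights.append(wtot); counts.append(c1 + c2)
    return np.repeat(means, counts)

# toy data: 5 noisy observations of a nondecreasing function at each of 10 ordered points
rng = np.random.default_rng(6)
x = np.arange(10)
group_means = np.array([rng.normal(np.sqrt(xi), 0.5, size=5).mean() for xi in x])
phi_hat = pava(group_means)

# crude pointwise (not simultaneous) bands around the isotonic fit, for illustration only
half_width = 1.96 * 0.5 / np.sqrt(5)
for xi, lo_b, fit, hi_b in zip(x, phi_hat - half_width, phi_hat, phi_hat + half_width):
    print(f"x = {xi}:  [{lo_b:.2f}, {fit:.2f}, {hi_b:.2f}]")
```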
·
Havi
Murad, Bar Ilan University
& Gertner Institute
Estimating and testing interactions in linear regression models when
explanatory variables are subject to classical measurement error
Estimating and testing interactions in a linear regression model when normally
distributed explanatory variables are subject to classical measurement error is
complex, since the interaction term is a product of two variables and involves
errors of more complex structure. Our aim is to develop simple methods, based
on the method of moments (MM) and regression calibration (RC) that yield
consistent estimators of the regression coefficients and their standard errors
when the model includes one or more interactions. In contrast to the available
methods using structural equations models framework, our methods allow errors
that are correlated with each other and can deal with measurements of
relatively low reliabilities.
Using simulations, we show that, under the normality assumptions, the RC method
yields estimators with negligible bias and is superior to MM in both bias and
variance. We also show that the RC method yields the correct Type I error rate for the test of the interaction. However, when the true covariates are not normally distributed, we recommend using MM. We provide an example, using data from the Israeli Glucose Intolerance, Obesity and Hypertension (GOH) study, relating homocysteine to plasma folate and plasma vitamin B12 levels.
Joint work with Laurence Freedman.
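A minimal regression-calibration sketch under strong simplifying assumptions (independent normal covariates, two replicates with classical error, all data simulated); it is my own toy version rather than the authors' estimator, but it shows the basic recipe: replace each error-prone covariate by its best linear predictor given the replicate mean, form the interaction from the calibrated values, and run ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
b0, b1, b2, b12 = 1.0, 0.5, -0.3, 0.8                    # true regression coefficients

x1, x2 = rng.normal(size=n), rng.normal(size=n)          # true covariates
y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + rng.normal(scale=1.0, size=n)

# two error-prone replicate measurements of each covariate (classical measurement error)
su = 0.8
w1 = x1[:, None] + rng.normal(scale=su, size=(n, 2))
w2 = x2[:, None] + rng.normal(scale=su, size=(n, 2))

def calibrate(w):
    """Best linear predictor of the true covariate given the replicate mean."""
    wbar = w.mean(axis=1)
    var_u = np.mean(np.var(w, axis=1, ddof=1))            # within-subject error variance
    var_x = np.var(wbar, ddof=1) - var_u / w.shape[1]      # estimated true-covariate variance
    lam = var_x / (var_x + var_u / w.shape[1])             # attenuation (reliability) factor
    return wbar.mean() + lam * (wbar - wbar.mean())

def ols(y, *cols):
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(y, w1.mean(1), w2.mean(1), w1.mean(1) * w2.mean(1))
x1c, x2c = calibrate(w1), calibrate(w2)
rc = ols(y, x1c, x2c, x1c * x2c)

print("true coefficients:      ", [b0, b1, b2, b12])
print("naive (error-prone):    ", np.round(naive, 2))
print("regression calibration: ", np.round(rc, 2))
```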
·
Yair Goldberg, Hebrew University
Manifold
learning: The price of normalization.
The problem of finding a compact representation for
high-dimensional data is encountered in many areas of science and has motivated
the development of various dimension-reducing algorithms. The Laplacian EigenMap
dimension-reducing algorithm (Belkin & Niyogi, 2003),
widely used for its intuitive
approach and computational simplicity, claims to reveal the underlying
non-linear structure of high-dimensional data. We present a general class of
examples in which the Laplacian EigenMap
fails to generate a reasonable reconstruction of the data given to it. We both
prove our results analytically and show them empirically. This phenomenon is
then explained with an analysis of the limit-case behavior of the Laplacian EigenMap algorithm both
using asymptotics and the continuous Laplacian operator. We also discuss the relevance of these
findings to the algorithms Locally Linear Embedding (Roweis
and Saul, 2000), Local Tangent Space Alignment (Zhang and Zha,
2004), Hessian Eigenmap (Donoho
and Grimes, 2004), and Diffusion Maps (Coifman and Lafon, 2006).
Joint work with Alon Zakai, Dan Kushnir and Ya'acov Ritov.
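For reference, a minimal dense implementation of the Laplacian Eigenmap construction analyzed above (k-nearest-neighbor graph with Gaussian weights, then the generalized eigenproblem Lv = λDv); this is my own bare-bones sketch on simulated data, not the authors' experimental code.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(X, n_components=2, n_neighbors=10, sigma=1.0):
    """Minimal (dense) Laplacian Eigenmap: k-NN graph with Gaussian weights,
    then eigenvectors of the graph Laplacian for the smallest nonzero eigenvalues."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]        # nearest neighbors, skipping the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma**2))
    W = np.maximum(W, W.T)                                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                              # unnormalized graph Laplacian
    vals, vecs = eigh(L, D)                                # generalized eigenproblem L v = lambda D v
    return vecs[:, 1:n_components + 1]                     # drop the constant eigenvector

# toy example: a noisy 2-D spiral embedded in 3-D, reduced back to 2 dimensions
rng = np.random.default_rng(8)
t = np.sort(rng.uniform(0, 3 * np.pi, 300))
X = np.column_stack([t * np.cos(t), t * np.sin(t), rng.normal(scale=0.1, size=300)])
Y = laplacian_eigenmap(X, n_components=2, n_neighbors=8, sigma=2.0)
print("embedding shape:", Y.shape)
```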
·
Micha Mandel, Hebrew University
Estimating
Hitting Times from Panel Data
I consider data on independent Markov processes which are not followed continuously, but are only observed at several time points, at which their states are recorded.
The aim is to estimate time-to-event probabilities which are defined on the
processes' paths. A simple example of such an event is the first visit to a set
of states. However, more interesting events are defined by several time points.
An example is the first time the process stays in state j at least \Delta time
units. Such events are very important in studying diseases such as multiple
sclerosis, where the focus is on sustained progression defined by a worsening
that lasts six months or more. In this
talk, I discuss modeling and estimation of panel data in both the continuous
and the discrete time cases, and present new methods for prediction. I
demonstrate the methodology using data from a phase III clinical trial of relapsing-remitting multiple sclerosis.
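To fix ideas about the events involved, here is a toy Monte Carlo sketch (mine, with an invented transition matrix) of the probability of a sustained visit, i.e. the first time a discrete-time Markov chain stays in a given state for at least Δ consecutive steps, within a fixed horizon. The talk is about estimating such quantities when the chain is only observed at panel times, which this sketch does not address.

```python
import numpy as np

# A toy 3-state discrete-time Markov chain (states 0, 1, 2); the transition matrix is assumed known.
P = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.70, 0.20],
              [0.05, 0.15, 0.80]])

def first_sustained_visit(path, state, delta):
    """First time the chain has stayed in `state` for at least `delta` consecutive steps."""
    run = 0
    for t, s in enumerate(path):
        run = run + 1 if s == state else 0
        if run >= delta:
            return t
    return None

def hitting_time_probability(P, start, state, delta, horizon, reps=5000, seed=0):
    """Monte Carlo estimate of P(sustained visit to `state` of length >= delta by `horizon`)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        s, path = start, []
        for _ in range(horizon):
            s = rng.choice(3, p=P[s])
            path.append(s)
        hits += first_sustained_visit(path, state, delta) is not None
    return hits / reps

print("P(stay in state 2 for >= 6 steps within 24 steps | start in 0):",
      round(hitting_time_probability(P, start=0, state=2, delta=6, horizon=24), 3))
```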
·
Alla Nikolaevsky
Nonparametric estimation of a jump point in a function's derivatives in a regression model with stationary Gaussian noise
We consider the statistical problem of nonparametric estimation of a jump point in the $m$-th derivative of an unknown function when the noise is not white but stationary Gaussian. We assume that the $m$-th derivative of the function satisfies a Hölder condition on one side of the jump point, has a jump at that point, and may have an arbitrary form on the other side, except within a vanishingly small neighbourhood of the jump. Since we impose no regularity assumptions on that side, this is essentially a one-sided change-point detection problem. The goal is to estimate the jump point. The proposed estimator is based on changes in the empirical wavelet coefficients of the data and exploits the relation between a function's local regularity at a given point and the rate of decay of the differences of its wavelet coefficients near this point across increasing resolution levels.
We derive, under mild conditions on the covariance structure of the noise, the minimax rate of one-sided detection of a jump in the $m$-th derivative, and we develop a procedure for jump-point detection in the $m$-th derivative that achieves this rate. The resulting estimator is based on the first exceedance over a level-dependent threshold of the differences of empirical wavelet coefficients at a sufficiently high resolution level. We illustrate the proposed procedure on the example of P-phase detection of a seismic signal.
This seminar is part of the defense of Alla Nikolaevsky's PhD thesis.
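A toy illustration of the wavelet signature exploited above (my own sketch, with a jump in the function itself, i.e. m = 0, and white rather than stationary correlated noise): Haar detail coefficients computed at several resolution levels keep their largest magnitude near the jump location, which is the kind of behavior a threshold-based wavelet change-point detector looks for.

```python
import numpy as np

def haar_details(signal, level):
    """Haar wavelet detail coefficients of `signal` at the given decomposition level."""
    a = np.asarray(signal, dtype=float)
    for _ in range(level):
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail coefficients at the current level
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation passed to the next level
    return d

# noisy signal with a jump in the function (m = 0) at t = 0.6
rng = np.random.default_rng(9)
n = 1024
t = np.arange(n) / n
f = np.sin(2 * np.pi * t) + (t >= 0.6) * 1.5
y = f + rng.normal(scale=0.3, size=n)

for level in (3, 4, 5):
    d = haar_details(y, level)
    k = np.argmax(np.abs(d))                   # largest-magnitude detail coefficient
    loc = (k + 0.5) * 2**level / n             # approximate position it corresponds to
    print(f"level {level}: largest |detail| at t ≈ {loc:.3f}")
```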