Department of Statistics & Operations Research

Statistics Seminars




Second Term

4, March*

Yonina Eldar,  Department of Electrical Engineering, Technion


Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramer-Rao Bound

11, March

Michal Rosen-Zvi, Haifa Research Labs, IBM


Selecting anti-HIV therapies based on a variety of genomic and clinical factors

25, March

Allan R. Sampson, University of Pittsburgh 


Simultaneous confidence bands for isotonic functions

8, April

Havi Murad,  Bar Ilan University & Gertner Institute


Estimating and testing interactions in linear regression models when explanatory variables are subject to classical measurement error

29, April

Yair Goldberg


Manifold learning: The price of normalization

6, May

Micha Mandel


Estimating Hitting Times from Panel Data

24, June

Alla Nikolaevsky


Nonparametric estimation of a jump point in a function's derivatives in a regression model with stationary Gaussian noise



Seminar starts at 10:00!





First Term

23, October

Saharon Rosset, Tel Aviv University


Story of IBM Research's Success in KDD/Netflix Cup 2007

6, November

Daniel Yekutieli, Tel Aviv University


Bayesian adjusted inference  for selected parameters

20, November

David Steinberg, Tel Aviv University


Bayesian ideas in experimental design

11, December

Camil Fuchs, Tel Aviv University


A new statistical approach applied to an archeological find – Prospects and Criticism

18, December

Patrik Guggenberger, UCLA


On the size of testing procedures

25, December

Galit Shmueli, University of Maryland


Explanatory vs. Predictive Models in Scientific Research

1, January*

Avishai Mandelbaum, Industrial Engineering & Management, Technion


QED Queues: Quality- and Efficiency-Driven Call Centers

8, January

Noam Slonim, IBM Research


FIRE: Finding Informative Regulatory Elements?

15, January

Alon Zakai


Consistency, Rate-Optimality and Local Behavior






* Joint Statistics/OR seminar




Seminars are held on Tuesdays, 10:30 am, Schreiber Building, Room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli. To join the seminar mailing list, or for any other inquiries, please call (03)-6409612 or email.


Details of previous seminars:



·             Saharon Rosset, Tel Aviv University

Story of IBM Research's Success in KDD/Netflix Cup 2007

The KDD Cup is the most prominent yearly data mining/modeling competition. This year it was associated with the Netflix $1M challenge – using the same data, but addressing a different modeling challenge. In total, over 70 groups participated in the Cup this year; the IBM Research teams won one of the tasks and came in third on the other.

I will briefly describe the Netflix Challenge and the KDD Cup tasks, and then discuss our approaches to the Cup problems and the keys to our success.

Joint work with Claudia Perlich and Yan Liu of IBM Research.


·              Daniel Yekutieli, Tel Aviv University

Bayesian adjusted inference for selected parameters

Benjamini and Yekutieli suggested viewing FDR adjusted inference as marginal inference for selected parameters. I will explain this approach – focusing on its weaknesses. I will argue that overlooking the problem of selective inference is an abuse of Bayesian methodology; and introduce "valid" Bayesian adjusted inference for selected parameters. I will use the preposterior framework to study the frequentist properties of Bayesian inference for selected parameters. I will explain the relation between the Bayesian adjusted inference for selected parameters and Storey's pFDR and Efron's local FDR.

To make the discussion clearer I will demonstrate use of the Bayesian adjustments on microarray data.
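
A minimal simulation (not from the talk; the setup is hypothetical, with all true parameters equal to zero) illustrates why unadjusted marginal inference fails after selection: a nominal 95% interval for the parameter selected as most extreme covers its true value far less often than 95%.

```python
import random

random.seed(0)
NOMINAL = 1.96          # half-width of a 95% CI for a N(theta, 1) estimate
m, reps = 50, 2000      # m parameters per experiment, all with true theta = 0

naive_cov = sel_cov = 0
for _ in range(reps):
    z = [random.gauss(0.0, 1.0) for _ in range(m)]
    # coverage of the CI for an arbitrary, non-selected parameter
    naive_cov += abs(z[0]) < NOMINAL
    # coverage of the same CI for the parameter selected as most extreme
    sel_cov += abs(max(z, key=abs)) < NOMINAL

naive_cov /= reps
sel_cov /= reps
print(naive_cov, sel_cov)   # marginal coverage is near 0.95; selected coverage collapses
```

The Bayesian adjustments discussed in the talk are one principled way to restore validity after such selection.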


·             David Steinberg, Tel Aviv University

Bayesian ideas in experimental design

The design of an experiment always reflects what the experimental team knows about the process under study.  Use of prior information is essential.  So it is perhaps surprising that formal ideas of Bayesian statistics have not received much emphasis in the design of experiments.  This talk will present some basics on Bayesian experimental design, following the review article of Chaloner and Verdinelli (Statistical Science, 1995).  Then I will look in more depth at the case of design for nonlinear statistical models, including GLM's.  For those settings, the use of Bayesian methods has been held back by heavy computational burdens.  I will discuss a heuristic solution to this problem proposed by Hovav Dror and then a formal mathematical solution that is part of joint work with Bradley Jones and Chris Gotwalt.

·             Camil Fuchs, Tel Aviv University

A new statistical approach applied to an archeological find – Prospects and Criticism

Statistical methodology was recently called upon to settle a controversy raised by a scientific puzzle of major importance. The puzzle concerns the re-analyzed inscriptions on the ossuaries from an ancient tomb in East Talpiot, unearthed in 1980. A new statistical approach was developed by Feuerverger to handle the intricacies of this complex problem. The presented conclusions claim that the inscriptions indicate that this may be the burial site of the New Testament (NT) family (the family of Jesus of Nazareth). In terms of the new approach, the defined level of 'surprisingness' for the cluster of names in the tomb is found to be very high; that is, under the specified provisos, there is a very low probability that a random sample of such ossuaries contains a cluster of names more surprising than the cluster found. If validated, this would undoubtedly be a discovery with the potential to stir major interest in both academic and religious circles.


But are the results incontestable? I don't think so.




·                     Patrik Guggenberger,   Department of Economics, UCLA

On the size of testing procedures


We consider inference based on a test statistic whose limit distribution is discontinuous in nuisance parameters. For example, a t-statistic in an AR(1) model has a limit distribution that is discontinuous in the autoregressive parameter. The paper shows that subsampling, m-out-of-n bootstrap, and standard fixed-critical-value tests based on such a test statistic often have asymptotic size – defined as the limit of the finite-sample size – greater than the nominal level of the tests. This is due to a lack of uniformity in the pointwise asymptotics. We determine precisely the asymptotic size of such tests or confidence intervals under a general set of high-level conditions that are relatively easy to verify. These high-level conditions are verified in several examples. In situations where the size of the test or confidence interval is distorted, several methods are provided that alleviate or fully cure the size distortion.




·             Galit Shmueli, Robert H. Smith School of Business, University of Maryland, College Park MD, USA

Explanatory vs. Predictive Models in Scientific Research

Explanatory models are designed for testing hypotheses that specify how and why certain empirical phenomena occur. Predictive models are aimed at predicting new observations with high accuracy. An age-old debate in philosophy of science deals with the difference between predictive and explanatory goals. In mainstream statistical research, however, the distinction between explanatory and predictive modeling is often overlooked, and there is a near-exclusive focus on explanatory methodology. This focus has permeated into empirical research in many fields such as information systems, economics and, in general, the social sciences. We investigate the issue from a statistical modeling perspective. Our premise is that (1) both explanatory and predictive statistical models are essential for
advancing scientific research; and (2) the different goals lead to key differences at each step of the modeling process. In this talk we discuss these two issues and in particular, we analyze each step of
the statistical modeling process (from data collection to model use)  and describe the different statistical components and issues that arise in explanatory modeling vs. predictive modeling.

Joint work with Otto Koppius, Rotterdam School of Management, Erasmus University, The Netherlands.




·             Avishai Mandelbaum, Industrial Engineering & Management, Technion


QED Queues: Quality- and Efficiency-Driven Call Centers


Through examples of Service Operations, with a focus on Telephone Call Centers, I review empirical findings that motivate or are motivated by (or both) interesting research questions. These findings give rise to features that are prerequisites for useful service models, for example customers’ (im)patience, time-varying demand, heterogeneity of customers and servers, over-dispersion in Poisson arrivals, generally-distributed (as opposed to exponential) service- and patience-durations, and more. Empirical analysis also enables validation of existing models and protocols, either supporting or refuting their relevance and robustness.


The mathematical framework for my models is asymptotic queueing theory, where limits are taken as the number of servers increases indefinitely, in a way that maintains a delicate balance against the offered-load. Asymptotic analysis reveals an operational regime that achieves, under already moderate scale, remarkably high levels of both service quality and efficiency. This is the QED Regime, discovered by Erlang and characterized by Halfin & Whitt. (QED = Quality- and Efficiency-Driven).


My main data-source is a unique repository of call-center data, designed and maintained at the Technion’s SEE Laboratory (SEE = Service Enterprise Engineering). The data is unique in that it is transaction-based: it details the individual operational history of all the calls handled by the participating call centers. (For example, one source of data is a network of 4 call centers of a U.S. bank, spanning 2.5 years and covering about 1000 agents; there are 218,047,488 telephone calls overall, out of which 41,646,142 were served by agents, while the rest were handled by answering machines.) To support data analysis, a universal data-structure and a friendly interface have been developed, under the logo DataMOCCA = Data MOdels for Call Centers Analysis. (I shall have with me DataMOCCA DVDs for academic distribution.)




·        Noam Slonim, IBM Research

FIRE: Finding Informative Regulatory Elements?

Gene expression is directly regulated by protein transcription factors that bind at particular DNA or RNA sites in a sequence-specific manner. A comprehensive characterization of these functional non-coding elements, or motifs, remains a formidable challenge, especially for higher eukaryotes. I will present a rigorous computational methodology for ab-initio motif discovery from expression data that utilizes the concept of mutual information and has the following characteristics:
(i) it is directly applicable to _any_ type of expression data;
(ii) it is model-independent;
(iii) it simultaneously finds DNA motifs in upstream regions and RNA motifs in 3'UTRs and highlights their functional relations;
(iv) it scales well to metazoan genomes;
(v) it yields very few false positive predictions, if any;
(vi) it incorporates systematic analysis of the functional coherence of the predicted motifs, their conservation, positional and orientation biases, cooperativity, and co-localization with other motifs;
(vii) it displays predictions via a novel user-friendly graphical interface.

I will present results for a variety of data types, measured for different organisms, including yeast, worm, fly, human, and the Plasmodium parasite responsible for malaria. I will further discuss in detail surprising observations regarding gene expression regulation that were overlooked by previous studies and arise naturally from our analysis.

Based on joint work with Olivier Elemento and Saeed Tavazoie
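
As a toy illustration of the mutual-information criterion underlying this kind of motif discovery (the data below are hypothetical, not from FIRE), one can score how informative a motif's presence is about an expression pattern:

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Hypothetical toy data: motif presence (0/1) in 8 promoters vs. expression cluster
motif    = [1, 1, 1, 1, 0, 0, 0, 0]
cluster  = ['up'] * 4 + ['down'] * 4          # perfectly predicted by the motif
shuffled = ['up', 'down'] * 4                 # independent of the motif
print(mutual_information(motif, cluster))     # 1.0 bit: maximally informative
print(mutual_information(motif, shuffled))    # 0.0 bits: uninformative
```

Real methods of this kind rank candidate motifs by such scores across the whole genome and assess their significance.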


·             Alon Zakai, Hebrew University

Consistency, Rate-Optimality and Local Behavior

In the context of nonparametric regression, we will consider the connection between local behavior -- being influenced mostly by the close-by part of the training set, when constructing an estimate at a
point x -- and the statistical properties of consistency and rate-optimality. First, we will see that all consistent estimators -- i.e., that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y) -- necessarily exhibit local behavior, even those that appear to be defined in a non-local manner. Consistency is in fact logically equivalent to the combination of two properties: (1) a specific form of local behavior and (2) that the method's mean (over the entire X distribution) correctly estimates the true mean; thus, local behavior is fundamental to consistency. We will then consider a stronger form of local behavior, strict locality (of which kernel estimators are an example), and note that while strictly local estimators can achieve minimax rates, they cannot achieve the
actual minimax loss for finite sample sizes.


·             Yonina Eldar,  Technion

Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramer-Rao Bound

One of the goals of statistical estimation theory is the development of performance bounds when estimating parameters of interest in a given model, as well as determining estimators that achieve these bounds. When the parameters to be estimated are deterministic, a popular approach is to restrict attention to unbiased estimators and develop bounds on the smallest mean-squared error (MSE) achievable within estimators of this class. Although it is well-known that lower MSE can be achieved by allowing for a bias, in applications it is typically unclear how to choose such an appropriate bias.
In this talk we develop bounds that dominate the conventional unbiased  Cramer-Rao bound (CRB) so that the resulting MSE bound is lower than the CRB  for all values of the unknowns. When an efficient maximum-likelihood (ML)
estimator achieving the CRB exists, we show how to construct an estimator with lower MSE regardless of the true unknown values, by linearly transforming the ML estimator. We then specialize the results to linear estimation in linear models. In  particular, we derive a class of estimators with lower MSE than the  conventional least-squares approach for all values of the unknown parameters, thus leading to estimation methods that are provably better than  least-squares.
 The procedures we develop are based on a saddle-point formulation of the problem which admits the use of convex optimization tools.
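
The bias-variance tradeoff behind the talk can be seen in the simplest scalar case: for x ~ N(theta, sigma^2), the shrunken estimator c*x has MSE c^2*sigma^2 + (1-c)^2*theta^2, which for a suitable c < 1 falls below the unbiased MSE sigma^2. The sketch below uses the oracle choice of c, which depends on the unknown theta; constructing estimators that dominate for all theta is precisely the harder problem the talk addresses.

```python
def mse_scaled(c, theta, sigma2):
    """MSE of the estimator c*x for theta, where x ~ N(theta, sigma2):
    variance c^2*sigma2 plus squared bias (1-c)^2*theta^2."""
    return c * c * sigma2 + (1 - c) ** 2 * theta ** 2

theta, sigma2 = 2.0, 1.0
mse_unbiased = mse_scaled(1.0, theta, sigma2)     # c = 1: the unbiased (ML) choice
c_star = theta ** 2 / (theta ** 2 + sigma2)       # oracle shrinkage (depends on theta!)
mse_shrunk = mse_scaled(c_star, theta, sigma2)
print(mse_unbiased, mse_shrunk)                   # 1.0 vs 0.8: shrinkage wins here
```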




·             Michal Rosen-Zvi, Haifa Research Labs, IBM


Selecting anti-HIV therapies based on a variety of genomic and clinical factors

Motivation: Optimizing HIV therapies is crucial, since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographic factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine-learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy. Three different machine-learning techniques were used: a generative-discriminative method, regression with derived evolutionary features, and regression with a mixture of effects. All three methods had similar performance, with an area under the ROC curve (AUC) of 0.77. A set of three similar engines limited to genotypic information only achieved an AUC of 0.75. A straightforward combination of the three engines consistently improves the prediction, with significantly better prediction when the full set of features is employed. The combined engine improves on predictions obtained from an on-line state-of-the-art resistance interpretation system. Moreover, the engines tend to disagree more on the outcome of failed therapies than on successful ones. Careful analysis of the differences between the engines revealed the mutations and drugs most closely associated with uncertainty about the therapy outcome.

Joint work with Andre Altmann, Mattia Prosperi, Ehud Aharoni, Hani Neuvirth, Anders Sönnerborg, Eugen Schülter, Daniel Struck, Yardena Peres, Francesca Incardona, Rolf Kaiser, Maurizio Zazzi, and Thomas Lengauer.
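
For readers unfamiliar with the AUC figures quoted above: the AUC equals the probability that a randomly chosen success is scored higher than a randomly chosen failure (ties count half). A small sketch with hypothetical engine scores:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative, counting ties as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for therapies that succeeded vs. failed
succ = [0.9, 0.8, 0.7, 0.4]
fail = [0.6, 0.5, 0.3, 0.2]
print(auc(succ, fail))   # 14 of 16 pairs ordered correctly -> 0.875
```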



·             Allan R. Sampson, University of Pittsburgh 


Simultaneous confidence bands for isotonic functions


For the observed data (x_i, Y_{x_i}(j)), j = 1,…,n_{x_i}, i = 1,…,k, we suppose the underlying model Y_x(j) ~ N(φ(x), σ²), j = 1,…,n_x, x ∈ Ψ, where Ψ = {x_1,…,x_k} is a finite linearly ordered set (for x_i, x_j ∈ Ψ, either x_i < x_j or x_j < x_i), φ is an unknown function nondecreasing in x, and σ² > 0. Procedures for simultaneous confidence bands for φ(x), x ∈ Ψ, are considered for bivariate regression data observed from this model. The model can be viewed in the context of nonparametric regression, where we use the knowledge that φ(x) is nondecreasing in x. Several existing procedures that provide simultaneous bands will be described from a common point of view. We introduce a new “bandwidth” procedure which generalizes the previously introduced procedures of Korn (1982) and Lee (1996). This bandwidth procedure has what can be thought of as a tuning parameter, which allows one to take into account prior ideas about how rapidly φ(x) increases relative to σ². Two examples will be given to illustrate the application of these bands. Extensions of the bandwidth procedure to the case when Ψ is a partially ordered set will also be discussed.
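
The nondecreasing estimate of φ at the heart of such bands is typically the isotonic (least-squares monotone) fit, computable by the standard pool-adjacent-violators algorithm. The sketch below is the textbook algorithm, not the speaker's bandwidth procedure:

```python
def pava(y, w=None):
    """Pool Adjacent Violators: least-squares nondecreasing fit to y,
    with optional positive weights w."""
    w = w or [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            tot = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / tot, tot, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit

print(pava([1.0, 3.0, 2.0, 4.0]))   # -> [1.0, 2.5, 2.5, 4.0]
```

The violating pair (3, 2) is pooled to its mean 2.5, giving the closest nondecreasing sequence in squared error.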




·                 Havi Murad, Bar Ilan University & Gertner Institute

Estimating and testing interactions in linear regression models when explanatory variables are subject to classical measurement error

Estimating and testing interactions in a linear regression model when normally distributed explanatory variables are subject to classical measurement error is complex, since the interaction term is a product of two variables and involves errors with a more complex structure. Our aim is to develop simple methods, based on the method of moments (MM) and regression calibration (RC), that yield consistent estimators of the regression coefficients and their standard errors when the model includes one or more interactions. In contrast to the available methods using the structural equations modeling framework, our methods allow errors that are correlated with each other and can deal with measurements of relatively low reliability.
Using simulations, we show that, under the normality assumptions, the RC method yields estimators with negligible bias and is superior to MM in both bias and variance. We also show that the RC method yields the correct Type I error rate for the test of the interaction. However, when the true covariates are not normally distributed, we recommend using MM. We provide an example, using data from the Israeli Glucose Intolerance, Obesity and Hypertension (GOH) study, relating homocysteine to plasma folate and plasma vitamin B12 levels.

Joint work with Laurence Freedman.
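
In the simplest one-covariate case with no interaction (a stripped-down setting, not the paper's model), classical measurement error attenuates the naive regression slope by the reliability λ = Var(X)/(Var(X)+Var(U)), and a method-of-moments correction divides it back out. A simulation sketch, assuming λ is known:

```python
import random

random.seed(1)
n, beta = 20000, 2.0
var_x, var_u = 1.0, 0.5            # true-covariate and measurement-error variances
lam = var_x / (var_x + var_u)      # reliability (attenuation factor), here 2/3

x = [random.gauss(0.0, var_x ** 0.5) for _ in range(n)]
w = [xi + random.gauss(0.0, var_u ** 0.5) for xi in x]   # error-prone measurement
y = [beta * xi + random.gauss(0.0, 0.5) for xi in x]

# naive OLS slope of y on the error-prone w
wbar = sum(w) / n
ybar = sum(y) / n
naive = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w, y)) / \
        sum((wi - wbar) ** 2 for wi in w)
corrected = naive / lam            # method-of-moments de-attenuation

print(naive, corrected)            # naive near beta*lam = 1.33..., corrected near 2.0
```

With interactions, the product term carries a more intricate error structure, which is exactly what the MM and RC methods of the talk are built to handle.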



·           Yair Goldberg, Hebrew University

Manifold learning: The price of normalization

The problem of finding a compact representation for high-dimensional data is encountered in many areas of science and has motivated the development of various dimension-reducing algorithms. The Laplacian Eigenmap dimension-reducing algorithm (Belkin & Niyogi, 2003), widely used for its intuitive approach and computational simplicity, claims to reveal the underlying non-linear structure of high-dimensional data. We present a general class of examples in which the Laplacian Eigenmap fails to generate a reasonable reconstruction of the data given to it. We prove these results analytically and demonstrate them empirically. The phenomenon is then explained by an analysis of the limit-case behavior of the Laplacian Eigenmap algorithm, using both asymptotics and the continuous Laplacian operator. We also discuss the relevance of these findings to the algorithms Locally Linear Embedding (Roweis and Saul, 2000), Local Tangent Space Alignment (Zhang and Zha, 2004), Hessian Eigenmap (Donoho and Grimes, 2004), and Diffusion Maps (Coifman and Lafon, 2006).

Joint work with Alon Zakai, Dan Kushnir and Ya'acov Ritov.
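
For orientation, the central object in the Laplacian Eigenmap algorithm is the graph Laplacian L = D - W built from kernel weights; the embedding coordinates are its bottom nontrivial eigenvectors. A minimal sketch of the construction only (eigensolver omitted; toy 1-D data):

```python
from math import exp

def graph_laplacian(points, eps=1.0):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights
    W_ij = exp(-|p_i - p_j|^2 / eps), for toy 1-D points."""
    n = len(points)
    W = [[exp(-(points[i] - points[j]) ** 2 / eps) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    D = [sum(row) for row in W]          # degree of each node
    return [[(D[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

L = graph_laplacian([0.0, 0.1, 0.2, 1.0])
row_sums = [sum(row) for row in L]
print(row_sums)   # each ~0: the constant vector lies in the null space of L
```

The talk's negative results concern what happens to the eigenvector embedding itself on certain data classes, not this construction step.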



·             Micha Mandel, Hebrew University

Estimating Hitting Times from Panel Data

I consider data on independent Markov processes that are not followed continuously but are observed only at several time points, at which their states are recorded. The aim is to estimate time-to-event probabilities that are defined on the processes' paths. A simple example of such an event is the first visit to a set of states. However, more interesting events are defined by several time points. An example is the first time the process stays in state j for at least Δ time units. Such events are very important in studying diseases such as multiple sclerosis, where the focus is on sustained progression, defined as a worsening that lasts six months or more. In this talk, I discuss modeling and estimation of panel data in both the continuous- and discrete-time cases, and present new methods for prediction. I demonstrate the methodology using data from a phase III clinical trial of relapsing-remitting multiple sclerosis.
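
For the simplest event, a first visit to a target state, hitting probabilities in a discrete-time chain can be computed by making the target absorbing and propagating the state distribution. A toy sketch with a known transition matrix (the talk's harder problems, estimation from panel data and duration-based events, are beyond this):

```python
def hitting_prob(P, start, target, t):
    """P(first visit to `target` occurs by step t | X_0 = start) for a
    discrete-time chain with transition matrix P, via an absorbing copy."""
    n = len(P)
    Q = [row[:] for row in P]
    Q[target] = [1.0 if j == target else 0.0 for j in range(n)]  # absorb at target
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(t):
        dist = [sum(dist[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return dist[target]

# toy 3-state chain (rows sum to 1)
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
print(hitting_prob(P, 0, 2, 2))   # only path 0 -> 1 -> 2: 0.5 * 0.5 = 0.25
```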


·             Alla Nikolaevsky, Tel Aviv University

Nonparametric estimation of a jump point in a function's derivatives in a regression model with stationary Gaussian noise

We consider the statistical problem of nonparametric estimation of a jump point in the m-th derivative of an unknown function, where the noise is not white but stationary Gaussian. We assume that the m-th derivative of the function satisfies a Hölder condition up to the jump point and can have an arbitrary form beyond it, within some vanishingly small neighborhood. Since we impose no regularity assumptions beyond the jump point, this is essentially a one-sided change-point detection problem. The goal is to estimate the jump point. The proposed estimator is based on changes in the empirical wavelet coefficients of the data and exploits the relation between a function's local regularity at a point and the rate of decay of the differences of its wavelet coefficients near that point across increasing resolution levels.

We derive the minimax rate of one-sided detection of a jump in the m-th derivative under mild conditions on the covariance structure of the noise, and develop a procedure for jump point detection in the m-th derivative of the function that achieves this rate. The resulting estimator is based on the first exceedance over a level-dependent threshold of the differences of empirical wavelet coefficients at a sufficiently high resolution level. We illustrate the proposed procedure on the example of P-phase detection in a seismic signal.
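
A stripped-down, noiseless illustration of the idea (a jump in the function itself, i.e., m = 0, with plain first differences standing in for fine-scale wavelet coefficients):

```python
def locate_jump(x):
    """Index i maximizing |x[i+1] - x[i]|: a crude, noiseless stand-in for
    locating a jump via fine-scale (Haar-like) wavelet coefficients."""
    diffs = [abs(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    return max(range(len(diffs)), key=diffs.__getitem__)

# smooth linear trend with a jump of size 1 at index 60 (hypothetical signal)
n, jump_at = 100, 60
signal = [0.01 * i + (1.0 if i >= jump_at else 0.0) for i in range(n)]
print(locate_jump(signal))   # -> 59: the step between indices 59 and 60
```

With stationary Gaussian noise and a jump only in a higher derivative, the actual procedure instead thresholds wavelet-coefficient differences across resolution levels, as the abstract describes.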


This seminar is part of the defense of Alla Nikolaevsky's PhD thesis.