# Statistics Seminars

### Second Semester

###### Matan Gavish, HUJI

Optimal thresholding of singular values and eigenvalues

###### Tirza Routtenberg, BGU

Estimation after parameter selection: Estimation methods, performance analysis, and adaptive sampling

###### Saharon Rosset, TAU

Estimating Random-X Prediction Error of Regression Models

###### 17 May

Lucien Birge, Université Pierre et Marie Curie (Paris VI)

ρ-estimation – a robust alternative to maximum likelihood

###### Jeroen de Mast, University of Amsterdam

Estimation of the random error of binary tests using adaptive polynomials

###### 26 June

Asaf Cohen, University of Michigan

A Multiclass Queueing Model in the Moderate-Deviation Heavy-Traffic Regime

###### 5 July

Ofer Harel, University of Connecticut

### First Semester

###### 17 November

Omer Weissbrod, Tel Aviv University and Technion

Modern Applications of Linear Mixed Models for Large Genetic Studies

###### Shahar Mendelson, Technion

The small-ball method and its application in Learning Theory

###### Amichai Painsky, Tel Aviv University

Generalized Independent Component Analysis Over Finite Alphabets

###### 8 December

Armin Schwartzman, NC State University

Confidence Regions for Excursion Sets in Asymptotically Gaussian Random Fields

###### 22 December

Yoram Yekutieli, Hadassah Academic College, Jerusalem

An expert-support system for determining the rarity of shoeprints

###### 5 January

Omer Bobrowski, Duke University

Random Topology and its Applications

Seminars are held on Tuesdays, 10:30 am, Schreiber Building, Room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.

To join the seminar mailing list, or for any other inquiries, please call (03)-6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove numbers unless you are a spammer…)

Seminars from previous years

ABSTRACTS

·         Jelle Goeman, Radboud university medical center

Hommel's method for false discovery proportions

We study the combination of closed testing with local tests based on the Simes inequality that underlies Hochberg's and Hommel's methods for familywise error control. We show this combination can also be used to make simultaneous confidence statements for the false discovery proportion of arbitrary sets of hypotheses. These confidence statements are at least as powerful as those arising trivially from Hommel’s method, and can be much more powerful. Calculation time is linear in set size after an initial preparatory step that takes m log(m) time for m hypotheses. By their simultaneity, the coverage of these confidence statements is guaranteed even if sets of interest are chosen post hoc. With this method, the role of the user and the method can be reversed in multiple testing. Where normally the user chooses the error rate and the multiple testing method the rejected set, now the user can choose the rejected set and the multiple testing method calculates the error rate. This new method has some interesting connections to the Benjamini & Hochberg FDR method, which we hope to explore further.
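As a small illustration of the Simes-type local test these procedures build on (a generic sketch, not Goeman's closed-testing shortcut; the function name is ours):

```python
def simes_test(pvalues, alpha=0.05):
    """Simes global test: reject the intersection null at level alpha
    if p_(i) <= i * alpha / m for some i (p-values sorted ascending)."""
    m = len(pvalues)
    p_sorted = sorted(pvalues)
    return any(p <= (i + 1) * alpha / m for i, p in enumerate(p_sorted))

# Example: individually borderline p-values can jointly reject the global null.
print(simes_test([0.01, 0.02, 0.30, 0.70]))  # -> True
```

In full closed testing there are exponentially many intersection hypotheses to which such a local test is applied; the linear-time calculation after the m log(m) preparatory step described above is what makes the method feasible at scale.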

·         Abraham Wyner, University of Pennsylvania

Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, a random forest is a self-averaging, interpolating algorithm which creates a “spikey-smooth” classifier. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests therefore succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low-complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.
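The interpolation property discussed above can be seen in even a toy boosting run: given enough rounds, the ensemble fits the training labels exactly. The following is a minimal from-scratch sketch of discrete AdaBoost with exhaustive one-dimensional stumps (an illustration of the mechanism, not the authors' experimental setup; all names are ours):

```python
import math

def stump_predict(x, thr, sign):
    # A decision stump on one feature: predicts `sign` if x > thr, else -sign.
    return sign if x > thr else -sign

def train_adaboost(X, y, rounds=100):
    """Discrete AdaBoost with exhaustive 1-D decision stumps (toy sketch)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the stump (threshold, sign) with the lowest weighted error.
        best = None
        for thr in sorted(set(X)):
            for sign in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(xi, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Reweight: misclassified points get exponentially more weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, thr, s) for a, thr, s in ensemble)
    return 1 if score >= 0 else -1

# A non-monotone labeling no single stump can fit; boosting interpolates it.
X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X, y, rounds=100)
print([predict(model, x) for x in X])  # -> [1, 1, -1, -1, 1, 1]
```

Zero training error here is guaranteed by the standard AdaBoost bound, since a positive-margin combination of stumps exists for this data; the talk's argument concerns what happens after this interpolation point.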

·         Omer Weissbrod, Tel Aviv University and Technion

Modern Applications of Linear Mixed Models for Large Genetic Studies

In recent years, linear mixed models (LMMs) have become an indispensable tool in large-scale genetic studies. At their core, LMMs are probabilistic models encoding the assumption that individuals sharing similar genotypes are more likely to share similar phenotypes. Although their origins date back at least to the 1950s, use of LMMs in genetic studies has recently skyrocketed, owing to increasing sample sizes and algorithmic improvements.

I will describe a model that improves on standard LMMs by assuming a complex parametric form for the covariance matrix, using multiple-kernel methods from the machine learning literature. The resulting model can automatically identify genetic interactions, and enables jointly inferring all kernel parameters efficiently. In an analysis of eight human genetic diseases and over a hundred mouse phenotypes, our method outperforms competing methods, attaining state-of-the-art phenotype prediction results.
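As background, a generic multi-kernel variance-components formulation (standard in the LMM literature; the talk's exact parametric covariance may differ) models the phenotype vector as

```latex
y \sim \mathcal{N}\!\Big(X\beta,\ \sum_{k} \sigma_k^2 K_k + \sigma_e^2 I\Big),
```

where each $K_k$ is a kernel (e.g. genetic-relatedness) matrix and the variance components $\sigma_k^2$ are inferred jointly with the other parameters.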

If time allows, I will additionally describe modern methods for LMM hypothesis testing under binary phenotypes, with a special emphasis on proper handling of case-control ascertainment.

This work was done with my PhD supervisors, Saharon Rosset and Dan Geiger.

·         Shahar Mendelson, Technion

The small-ball method and its application in Learning Theory

The small-ball method leads to nontrivial lower bounds on the infimum of certain empirical processes, most notably, on $\inf_{f \in F} \frac{1}{N}\sum_{i=1}^N f^2(X_i)$, where $F$ is a class of functions on a probability space $(\Omega,\mu)$ and $X_1,...,X_N$ are independent, distributed according to $\mu$. Such estimates play an essential role in many problems, for example, Sparse Recovery; the smallest singular value of random matrices with iid rows; (random) Gelfand widths of convex bodies, etc. The main feature of the small-ball method is that it requires rather minimal assumptions on the underlying class $F$. Thus, the lower bound may be established even in situations in which two-sided concentration is simply false.
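For readers unfamiliar with the term: the small-ball condition at the heart of the method is typically stated as follows (a standard formulation from the literature, not quoted from the talk):

```latex
\exists\, \kappa > 0,\ \delta \in (0,1): \quad
P\big(|f(X)| \ge \kappa \|f\|_{L_2(\mu)}\big) \ge \delta
\qquad \text{for every } f \in F.
```

This is a one-sided, weak assumption, which is why the resulting lower bounds survive in settings where two-sided concentration fails.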

I will present the highlights of the method and then focus on one application: obtaining a sharp (minimax) bound on the error rate of Empirical Risk Minimization performed in a convex class. Using the small-ball method one may obtain a rate that scales correctly with the "richness" of the class (this somewhat vague term will be clarified) and with the noise level of the problem; in particular, when the noise level tends to zero, the rate tends to the "noise-free" (realizable) rate.

·         Amichai Painsky, Tel Aviv University

Generalized Independent Component Analysis Over Finite Alphabets

Independent component analysis (ICA) is a statistical method for transforming an observable multidimensional random vector into components that are as statistically independent as possible from each other. Usually the ICA framework assumes a model according to which the observations are generated (such as a linear transformation with additive noise). ICA over finite fields is a special case of ICA in which both the observations and the independent components are over a finite alphabet.

In this work we consider a generalization of this framework in which an observation vector is decomposed into its independent components (as much as possible) with no prior assumption on the way it was generated. This generalization is also known as Barlow's minimal redundancy representation problem and is considered an open problem. We prove several theorems and show that this NP-hard problem can be accurately solved with a branch-and-bound search tree algorithm, or tightly approximated with a series of linear problems. Moreover, we show there exists a simple transformation (namely, order permutation) which provides a greedy yet very effective approximation of the optimal solution. We further show that while not every random vector can be efficiently decomposed into independent components, the vast majority of vectors do decompose very well (that is, with a small constant cost) as the dimension increases. Our contribution provides the first efficient set of solutions to Barlow's problem.

The minimal redundancy representation has many applications, mainly in the fields of neural networks and deep learning. In this work we show that this formulation further applies to large alphabet source coding.

Joint work with Prof. Saharon Rosset and Prof. Meir Feder from the EE department.

·         Armin Schwartzman, NC State University

Confidence Regions for Excursion Sets in Asymptotically Gaussian Random Fields, with an Application to Climate

The goal of this work is to give confidence regions for the excursion set of a spatial function above a given threshold, from repeated noisy observations on a fine grid of fixed locations. Given an asymptotically Gaussian estimator of the target function, a pair of data-dependent nested excursion sets are constructed that are sub- and super-sets of the true excursion set, respectively, with a desired confidence. Asymptotic coverage probabilities are determined via a multiplier bootstrap method, requiring neither Gaussianity of the original data nor stationarity or smoothness of the limiting Gaussian field. The method is used to determine geographical regions where the mean summer and winter temperatures are expected to increase by more than 2 degrees Celsius by the mid-21st century.

·         Yoram Yekutieli, Department of Computer Science, Hadassah Academic College, Jerusalem

An expert-support system for determining the rarity of shoeprints

Shoeprints are among the most important types of evidence in criminal trials, but their scientific validity has recently come under attack. Current practice relies on searching for unique marks both in prints found at the crime scene and in test impressions of the suspects' shoes examined in the laboratory. These unique marks are typically small defects that form in a shoe through use. The number of matching defects found in the crime-scene print and the laboratory test impression, together with their size and shape, serve as a measure of the correspondence between the prints. To strengthen the scientific validity of shoeprint evidence, we developed a system for quantitative assessment of the rarity of defects in test impressions. We collected a large database of about 13,000 distinct defects across roughly four hundred shoes, and characterized the rarity of the defects. The characterization included estimating the distributions of three defect attributes: location on the shoe, orientation, and shape. For this purpose we developed methods for estimating these attributes, including the measurement error of each attribute. In the talk I will present the system, the collection methods and distribution estimates, open problems, and directions for future research.

In collaboration with:

Yaron Shor, Sarena Wiesner and Tsadok Tsach, Toolmarks and Materials Laboratory, Division of Identification and Forensic Science, Israel Police.

Prof. Micha Mandel and doctoral student Naomi Kaplan-Damary, Department of Statistics, The Hebrew University.

·         Omer Bobrowski, Duke University

Random Topology and its Applications

The study of random topology focuses on describing high-level qualitative properties of random spaces. This field has developed rapidly in the past decade, driven both by deep theoretical questions and by state-of-the-art data analysis applications. Topological Data Analysis (TDA) broadly refers to the use of concepts from mathematical topology to analyze data and networks. In the past decade a variety of powerful topological tools have been introduced and proven useful for applications in various fields (e.g. shape analysis, signal processing, neuroscience, and genomic research). The theory developed in random topology aims to provide a solid foundation for the statistical analysis of these methods, which to date is at a very preliminary stage.

This talk will be divided into three parts. First, I will provide an introduction to and motivation for random topology and TDA. Next, we will discuss the theory of random geometric complexes. In particular, we will consider their Betti numbers (counting “cycles” in different dimensions), and phase transitions related to these. Finally, we will present a few examples of combining the theory of random complexes with statistical problems related to TDA. In particular, we are interested in analyzing the “topological noise” that appears in such problems, for the purposes of filtering and hypothesis testing.

·         Sivan Sabato, Ben-Gurion University

Active linear regression

We propose a new active learning algorithm for parametric linear regression with random design. We provide finite sample convergence guarantees for general distributions in the misspecified model. This is the first active learner for this setting that provably can improve over passive learning.
Following the stratification technique advocated in Monte-Carlo function integration, our active learner approaches the optimal risk using piecewise constant approximations. Experiments demonstrate the algorithm's convergence to the oracle rates.
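The stratification idea borrowed from Monte Carlo function integration can be illustrated in its generic form (a toy sketch of stratified sampling on [0, 1], not the authors' active-learning algorithm; all names are ours):

```python
import random

def crude_mc(f, n, rng):
    # Plain Monte Carlo estimate of the mean of f on [0, 1].
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_mc(f, strata, per_stratum, rng):
    """Stratified Monte Carlo on [0, 1]: sample uniformly within each
    equal-width stratum and average the per-stratum means. Only the
    within-stratum variability contributes to the estimator's variance."""
    total = 0.0
    for k in range(strata):
        lo = k / strata
        s = sum(f(lo + rng.random() / strata) for _ in range(per_stratum))
        total += s / per_stratum
    return total / strata

rng = random.Random(0)
f = lambda t: t * t               # true mean on [0, 1] is 1/3
print(abs(stratified_mc(f, 10, 100, rng) - 1/3) < 0.01)
```

In the active-learning setting described above, the analogous idea is to query within regions (strata) of the covariate space so that the estimator approaches the oracle risk via piecewise constant approximations.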

Joint work with Remi Munos, INRIA Lille.

·         Yanyuan Ma, University of South Carolina

Robust mixed-effects model for clustered failure time data: application to Huntington's disease event measures

An important goal in clinical and statistical research is estimating the distribution for clustered failure times, which have a natural intra-class dependency and are subject to censoring. We propose to handle these inherent challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Rather, using a logit transformation, we relate the distribution for clustered failure times to covariates and a random, subject-specific effect, such that the covariates are modeled with unknown functional forms and the random effect is distribution-free and potentially correlated with the covariates. Over a range of time points, the model is shown to be reminiscent of an additive logistic mixed effect model. Such a structure allows us to handle censoring via pseudo-value regression and to develop semiparametric techniques that completely factor out the unknown random effect. We show both theoretically and empirically that the resulting estimator is consistent for any choice of random effect distribution and for any dependency structure between the random effect and covariates. Lastly, we illustrate the method's utility in an application to the Cooperative Huntington's Observational Research Trial data, where our method provides new insights into differences between motor and cognitive impairment event times in subjects at risk for Huntington's disease.

·         Matan Gavish, HUJI

Optimal thresholding of singular values and eigenvalues

It is common practice in multivariate and matrix-valued data analysis to reduce dimensionality by performing a Singular Value Decomposition or Principal Component Analysis, and keeping only $r$ singular values or principal components, the rest being presumably associated with noise. However, the literature does not propose a disciplined criterion to determine $r$; most practitioners still look for the elbow in the "scree plot", a 50-year-old heuristic performed by eye. I'll review a line of work which develops a systematic approach to eigenvalue and singular value thresholding. This approach assumes that the signal is low-rank and that the noise is rotationally invariant. Recent results derive optimal thresholds in the presence of quite general noise distributions.
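In the simplest published setting (square n × n matrices with known noise level σ, from Gavish and Donoho's 2014 result that the optimal hard threshold is (4/√3)·√n·σ), the recipe replacing the scree-plot elbow reduces to a one-liner; a minimal sketch with our own function names:

```python
import math

def optimal_hard_threshold(n, sigma):
    """Optimal singular-value hard threshold for an n x n matrix with
    i.i.d. noise of known standard deviation sigma (Gavish & Donoho, 2014):
    keep singular values above (4 / sqrt(3)) * sqrt(n) * sigma."""
    return (4.0 / math.sqrt(3.0)) * math.sqrt(n) * sigma

def shrink(singular_values, n, sigma):
    """Hard-threshold a list of singular values; the rank r is chosen
    automatically rather than by eyeballing a scree plot."""
    tau = optimal_hard_threshold(n, sigma)
    return [s if s > tau else 0.0 for s in singular_values]

# Example: n = 100, sigma = 1 gives a threshold of about 23.09.
print(round(optimal_hard_threshold(100, 1.0), 2))  # -> 23.09
```

When σ is unknown, or the matrix is non-square, or the noise is non-Gaussian, the line of work reviewed in the talk supplies the corresponding corrections.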

Joint work with David Donoho, Iain Johnstone and Edgar Dobriban (Stanford).

·         Tirza Routtenberg, BGU

Estimation after parameter selection: Estimation methods, performance analysis, and adaptive sampling

In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate, and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, Ψ, is considered. In this talk, I present a general non-Bayesian estimation theory for estimation after parameter selection, which includes estimation methods, performance analysis, and adaptive sampling strategies. First, I use the post-selection mean-square error (PSMSE) criterion as a performance measure instead of the commonly used mean-square error (MSE). The corresponding Cramér-Rao-type bound on the PSMSE of any Ψ-unbiased estimator is derived, where the Ψ-unbiasedness is in the Lehmann unbiasedness sense. The post-selection maximum-likelihood (PSML) estimator is presented and its Ψ-efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. Finally, I discuss the concept of adaptive sampling in a two-stage scheme of selection and estimation.

·         Lotem Kaplan, TAU

Empirical Approach for Sequential Bayesian Inference with Application to Active Learning

Active learning is a sequential learning framework with two main components: (i) a classification model, and (ii) a querying strategy for choosing the next data instance to observe. The querying strategy can have important consequences for the performance of the classifier.

Active learning is closely related to sequential experimental design in the statistical literature. We approach this problem by applying sequential Bayesian inference to logistic regression. In this work we follow and extend the empirical representation of posterior distributions by Dror and Steinberg (2008) and show its application to binary data and active learning.

This seminar describes Lotem Kaplan's dissertation work under the supervision of Prof. David Steinberg and is presented as part of the PhD requirements of Tel Aviv University.

·         Saharon Rosset, TAU

Estimating Random-X Prediction Error of Regression Models

The areas of model selection and model evaluation for predictive modeling have received extensive treatment in the statistics literature, leading to both theoretical results and practical methods based on covariance penalties and other approaches. However, the vast majority of this work is based on the "Fixed-X" assumption, where covariate values are assumed to be non-random and known. By contrast, in most modern predictive modeling applications, it is more reasonable to take the "Random-X" view, where future prediction points are random and new. In this work we concentrate on examining the applicability of the covariance-penalty approaches to this problem. We propose a decomposition of the Random-X prediction error that clarifies the additional error due to Random-X, which is present in both the variance and bias components of the error. This decomposition is very general, but we focus on its application to the most fundamental case of least squares regression. We show how to quantify the excess variance using simple random-matrix results, leading to a covariance penalty approach generalizing previous methods, like Tukey's MSPE and GCV. To account for excess bias, we propose to take only the bias component of the ordinary cross-validation (OCV) estimate, resulting in a hybrid penalty term. This hybrid penalty approach is shown to be empirically superior to OCV in regression model evaluation simulations. This empirical observation is supported by a common representation of the two methods, which shows that the OCV has one more random component.
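As a reference point for the OCV estimate discussed above, here is a minimal sketch for simple least squares using the standard leave-one-out shortcut e_(i) = e_i / (1 - h_i), where h_i is the leverage (a generic illustration, not the authors' hybrid penalty; names are ours):

```python
def ocv_simple_ols(x, y):
    """Ordinary (leave-one-out) cross-validation for simple least squares.
    Uses the closed-form shortcut: the LOO residual equals the ordinary
    residual divided by (1 - leverage), so no model refitting is needed."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # OCV = mean of squared leave-one-out residuals e_i / (1 - h_i).
    return sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev)) / n

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(round(ocv_simple_ols(x, y), 3))  # -> 0.033
```

Note that OCV evaluated this way still targets prediction at the observed design points; the Random-X view described in the abstract asks about error at new, random covariates, which is what motivates the proposed decomposition.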

This talk is based on joint work with Ryan Tibshirani, and describes work in progress.

·         Lucien Birge, Université Pierre et Marie Curie (Paris VI)

ρ-estimation – a robust alternative to maximum likelihood

It is well known that maximum likelihood, although widely used in statistics, suffers from various defects. From a theoretical point of view, strong assumptions are needed to ensure that it performs in a satisfactory way, and from a practical one, it is definitely not robust, i.e. quite sensitive to model misspecification. These problems are mainly connected to the use of the log function, which is not bounded, to define the log-likelihood ratios. Replacing it by a specific bounded function leads to an alternative method with better properties, including robustness (in the sense of Peter Huber). Moreover, in nice parametric models, when the true distribution does belong to the model, the new estimator is asymptotically equivalent to the maximum likelihood estimator.

Interested readers may look at http://arxiv.org/abs/1403.6057.1. This is joint work with Yannick Baraud and Mathieu Sart. The talk is a Sackler Distinguished Lecture.

·         Jeroen de Mast, University of Amsterdam

Estimation of the random error of binary tests using adaptive polynomials

We propose an efficient and robust method for estimating the random error of binary tests and measurements in the situation that a gold standard or reference standard is not available. Recent studies have shown that in this situation, the widely used false acceptance probability (FAP) and false rejection probability (FRP), or their complements specificity and sensitivity, are difficult to estimate. We show by mathematical analysis that this problem is inherent to the unavailability of a gold standard. Instead, the proposed method determines the random components of the error probabilities: the inconsistent acceptance and rejection probabilities IAP and IRP. The estimation method is based on an extension of the beta distribution, and adapts the model complexity to the data. For estimation efficiency, the approach works from a sample taken from the stream of initially rejected parts and also incorporates a historical rejection rate (“baseline data”). A simulation study demonstrates the method’s robustness and is the basis for sample size recommendations. The usefulness of the method is shown in a real-life application, where it outperforms existing methods.

·         Regev Schweiger, TAU

Fast and accurate construction of confidence intervals for heritability

Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models (LMMs) has gained popularity, because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. Existing methods for the construction of confidence intervals and standard errors for REML rely on asymptotic properties. However, these assumptions are often violated due to the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. In our work, we show that the construction of confidence intervals by current methods is inaccurate, especially when the true heritability is relatively low or relatively high. We propose a computationally efficient method for the estimation of the distribution of the heritability estimator, and for the construction of accurate confidence intervals.
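For context, the quantity being estimated is typically the narrow-sense heritability, defined under the LMM with genetic variance $\sigma_g^2$ and environmental variance $\sigma_e^2$ as (a standard definition, not specific to this work):

```latex
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} \in [0, 1].
```

The bounded parameter space mentioned above is exactly this $[0,1]$ constraint, which undermines the usual asymptotic arguments when the true value is near either endpoint.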

Joint work with Eran Halperin, Saharon Rosset, Shachar Kaufman and Eleazar Eskin.

·         Asaf Cohen, University of Michigan

A Multiclass Queueing Model in the Moderate-Deviation Heavy-Traffic Regime

Risk-sensitive control problems with vanishing noise deal with a risk-averse decision maker who observes a small-noise controlled diffusion process. The model was presented by Jacobson in [4] and has been broadly studied since; see, e.g., [3]. By Girsanov change-of-measure arguments one usually deduces that the limit of such problems is governed by a differential game, whose solution plays a role in finding an asymptotically optimal control in the original problem. Traditionally, queueing systems under heavy traffic are approximated by diffusion processes. I will present a critically loaded system with small overload probability. Thus, the evolution of the system is approximated by a small-noise diffusion. The controller is risk averse and uses a discounted version of the risk-sensitive cost. Change-of-measure arguments do not apply here. We developed new tools to show that the model is governed by a differential (deterministic) game. We explicitly solved the game and showed that its solution eventually leads to an asymptotically optimal stationary feedback policy in the queueing problem.

This is joint work with Rami Atar. The talk is based on [1] and [2], with a focus on the new techniques that were recently developed in [1] to prove the convergence.

[1] R. Atar and A. Cohen. An asymptotically optimal control for a multiclass queueing model in the moderate-deviation heavy-traffic regime. Submitted to Annals of Applied Probability, 2016.

[2] R. Atar and A. Cohen. A differential game for a multiclass queueing model in the moderate-deviation heavy-traffic regime. Mathematics of Operations Research, to appear, 2016.

[3] W. H. Fleming. Risk sensitive stochastic control and differential games. Communications in Information and Systems, 6(3):161-177, 2006.

[4] D. H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Trans. Automatic Control, AC-18(2):124-131, 1973.

·         Ofer Harel, University of Connecticut

Clustering Incomplete Data via Normal Mixture Models

Model-based clustering using Normal mixture models provides a framework to describe how data groups together using Normal distributions. However, the existing methods for such analyses require complete data. One way to handle incomplete data is multiple imputation, a simulation-based approach which bypasses many of the disadvantages present in other methods for handling incomplete data. However, it is difficult to apply multiple imputation and cluster analysis in a straightforward manner.  In this presentation, we develop MICA-N: Multiply Imputed Cluster Analysis for Normally distributed data. MICA-N adds clustering methods to particular steps in multiple imputation in order to create a way to cluster incomplete normal data. We illustrate how MICA-N outperforms existing methodology with a simulation, and then demonstrate the utility of MICA-N on yeast gene expression data. This is joint work with Chantal Larose and Dipak Dey.