1 March 
Sivan Sabato, Ben-Gurion University 

Active linear regression 
8 March 
Yanyuan Ma, University of South Carolina 

Robust mixed-effects model for clustered failure time data: application to Huntington's disease event measures 
15 March 
Matan Gavish, HUJI 

Optimal thresholding of singular values and eigenvalues 
29 March 
Tirza Routtenberg, BGU 

Estimation after parameter selection: Estimation methods, performance analysis, and adaptive sampling 
5 April 
Lotem Kaplan, TAU 

Empirical Approach for Sequential Bayesian Inference with Application to Active Learning 
12 April 
Saharon Rosset, TAU 

Estimating Random-X Prediction Error of Regression Models 
17 May 
Lucien Birge, Université Pierre et Marie Curie (Paris VI) 

ρ-estimation – a robust alternative to maximum likelihood 
31 May 
Jeroen de Mast, University of Amsterdam 

Estimation of the random error of binary tests using adaptive polynomials 
7 June 
Regev Schweiger, TAU 

Fast and accurate construction of confidence intervals for heritability 
26 June 
Asaf Cohen, University of Michigan 

A Multiclass Queueing Model in the Moderate-Deviation Heavy-Traffic Regime 
5 July 
Ofer Harel, University of Connecticut 

Clustering Incomplete Data via Normal Mixture Models 
20 October 
Jelle Goeman, Radboud University Medical Center 

Hommel's method for false discovery proportions 
10 November 
Abraham Wyner, University of Pennsylvania 

Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers 
17 November 
Omer Weissbrod, Tel Aviv University and Technion 

Modern Applications of Linear Mixed Models for Large Genetic Studies 
24 November 
Shahar Mendelson, Technion 

The small-ball method and its application in Learning Theory 
1 December 
Amichai Painsky, Tel Aviv University 

Generalized Independent Component Analysis Over Finite Alphabets 
8 December 
Armin Schwartzman, NC State University 

Confidence Regions for Excursion Sets in Asymptotically Gaussian Random Fields 
22 December 
Yoram Yekutieli, Hadassa Academic College 

An expert-support system for assessing the rarity of shoe prints 
5 January 
Omer Bobrowski, Duke University 

Random Topology and its Applications 
Seminars are held on Tuesdays, 10.30 am, Schreiber Building, room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries, please call (03)6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove the numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
· Jelle Goeman, Radboud University Medical Center
Hommel's method for false discovery proportions
We study the combination of closed testing with local tests based on the Simes inequality that underlies Hochberg's and Hommel's methods for familywise error control. We show this combination can also be used to make simultaneous confidence statements for the false discovery proportion of arbitrary sets of hypotheses. These confidence statements are at least as powerful as those arising trivially from Hommel’s method, and can be much more powerful. Calculation time is linear in set size after an initial preparatory step that takes m log(m) time for m hypotheses. By their simultaneity, the coverage of these confidence statements is guaranteed even if sets of interest are chosen post hoc. With this method, the role of the user and the method can be reversed in multiple testing. Where normally the user chooses the error rate and the multiple testing method the rejected set, now the user can choose the rejected set and the multiple testing method calculates the error rate. This new method has some interesting connections to the Benjamini & Hochberg FDR method, which we hope to explore further.
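To make the construction concrete, here is a small brute-force sketch in Python of closed testing with Simes local tests, using only the standard definitions; it is exponential in the number of hypotheses and is for illustration only, not the linear-time shortcut developed in the talk. All function names are illustrative.

```python
from itertools import combinations
import numpy as np

def simes_rejects(p, alpha):
    """Simes local test of the intersection null of the p-values in p."""
    p = np.sort(np.asarray(p))
    k = len(p)
    return bool(np.any(p <= alpha * np.arange(1, k + 1) / k))

def false_discovery_bound(pvals, selected, alpha=0.05):
    """(1 - alpha)-confidence upper bound on the number of false
    hypotheses among `selected`, via closed testing with Simes local
    tests: the size of the largest subset of `selected` whose
    intersection hypothesis survives the closed procedure."""
    m = len(pvals)

    def closed_rejects(I):
        # H_I is rejected by closed testing iff Simes rejects H_J
        # for every superset J of I within all m hypotheses.
        outside = [j for j in range(m) if j not in I]
        for r in range(len(outside) + 1):
            for extra in combinations(outside, r):
                if not simes_rejects([pvals[j] for j in I + extra], alpha):
                    return False
        return True

    for size in range(len(selected), 0, -1):
        for I in combinations(selected, size):
            if not closed_rejects(I):
                return size        # at most `size` false discoveries
    return 0

# Example: bound the FDP of a post-hoc chosen set (three smallest p-values).
pvals = [0.0001, 0.002, 0.01, 0.4, 0.6, 0.8]
print(false_discovery_bound(pvals, (0, 1, 2)) / 3)   # FDP upper bound
```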
· Abraham Wyner, University of Pennsylvania
Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers
There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for essentially similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates a "spikey-smooth" classifier. We show that AdaBoost has the same property. We conjecture that both AdaBoost and random forests therefore succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low-complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.
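The recommendation in the last sentence is easy to try with off-the-shelf tools. The sketch below uses standard scikit-learn estimators (not the authors' experimental setup) to contrast boosted stumps with boosted large trees and a random forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# base learner passed positionally for compatibility across sklearn versions
for depth in (1, 8):   # stumps vs. large trees
    ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=depth),
                             n_estimators=300, random_state=0)
    print(f"AdaBoost, depth {depth}:", ada.fit(Xtr, ytr).score(Xte, yte))

rf = RandomForestClassifier(n_estimators=300, random_state=0)
print("Random forest:", rf.fit(Xtr, ytr).score(Xte, yte))
```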
· Omer Weissbrod, Tel Aviv University and Technion
Modern Applications of Linear Mixed Models for Large Genetic Studies
In recent years, linear mixed models (LMMs) have become an indispensable tool in large-scale genetic studies. At their core, LMMs are probabilistic models encoding the assumption that individuals sharing similar genotypes are more likely to share similar phenotypes. Although their origins date back at least to the 1950s, use of LMMs in genetic studies has recently skyrocketed, owing to increasing sample sizes and algorithmic improvements.
I will describe a model that improves on standard LMMs by assuming a complex parametric form for the covariance matrix, using multiple-kernel methods from the machine learning literature. The resulting model can automatically identify genetic interactions, and enables jointly inferring all kernel parameters efficiently. In an analysis of eight human genetic diseases and over a hundred mouse phenotypes, our method outperforms competing methods and attains state-of-the-art phenotype prediction results.
If time allows, I will additionally describe modern methods for LMM hypothesis testing under binary phenotypes, with a special emphasis on proper handling of case-control ascertainment.
This work was done with my PhD supervisors, Saharon Rosset and Dan Geiger.
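As a minimal sketch of the variance-component machinery underlying such models, the following fits $y \sim N(0, \sum_k \sigma^2_k K_k + \sigma^2_e I)$ by direct maximum likelihood over a list of kernel matrices. It assumes a centered phenotype with no fixed effects and uses dense linear algebra; it is not the efficient joint-inference algorithm from the talk.

```python
import numpy as np
from scipy.optimize import minimize

def fit_variance_components(y, kernels):
    """Maximum-likelihood fit of y ~ N(0, sum_k s2_k * K_k + s2_e * I)
    over log-variances, for a centered phenotype y with no fixed
    effects. Dense and naive: O(n^3) per likelihood evaluation."""
    n = len(y)
    Ks = list(kernels) + [np.eye(n)]          # last component = noise

    def nll(log_s2):
        V = sum(np.exp(t) * K for t, K in zip(log_s2, Ks))
        _, logdet = np.linalg.slogdet(V)
        return 0.5 * (logdet + y @ np.linalg.solve(V, y))

    res = minimize(nll, np.zeros(len(Ks)), method="L-BFGS-B")
    return np.exp(res.x)                      # fitted variance components
```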
· Shahar Mendelson, Technion
The small-ball method and its application in Learning Theory
The small-ball method leads to nontrivial lower bounds on the infimum of certain empirical processes, most notably on $\inf_{f \in F} \frac{1}{N}\sum_{i=1}^N f^2(X_i)$, where $F$ is a class of functions on a probability space $(\Omega,\mu)$ and $X_1,\ldots,X_N$ are independent, distributed according to $\mu$. Such estimates play an essential role in many problems, for example, Sparse Recovery; the smallest singular value of random matrices with i.i.d. rows; (random) Gelfand widths of convex bodies, etc. The main feature of the small-ball method is that it requires rather minimal assumptions on the underlying class $F$. Thus, the lower bound may be established even in situations in which two-sided concentration is simply false.
I will present the highlights of the method and then focus on one application: obtaining a sharp (minimax) bound on the error rate of Empirical Risk Minimization performed in a convex class. Using the small-ball method one may obtain a rate that scales correctly with the "richness" of the class (the somewhat vague term will be clarified) and with the noise level of the problem; in particular, when the noise level tends to zero, the rate tends to the "noise-free" (realizable) rate.
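For a single fixed function $f$, the mechanism behind the lower bound fits in two lines (a standard reduction; the substance of the method is making the final step uniform over $F$):

```latex
% Small-ball assumption: P(|f(X)| \ge \kappa \|f\|_{L_2}) \ge \delta.
% Truncating f^2 at level \kappa^2 \|f\|_{L_2}^2 gives
\frac{1}{N}\sum_{i=1}^{N} f^{2}(X_i)
  \;\ge\; \kappa^{2}\|f\|_{L_2}^{2}\cdot
  \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\left\{|f(X_i)|\ge \kappa\|f\|_{L_2}\right\},
% and by Hoeffding's inequality the frequency on the right is at least
% \delta/2 with probability at least 1-\exp(-N\delta^{2}/2), so that
\frac{1}{N}\sum_{i=1}^{N} f^{2}(X_i)\;\ge\;\tfrac{1}{2}\kappa^{2}\delta\,\|f\|_{L_2}^{2}
\quad\text{with high probability.}
```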
· Amichai Painsky, Tel Aviv University
Generalized Independent Component Analysis Over Finite Alphabets
Independent component analysis (ICA) is a statistical method for transforming an observable multidimensional random vector into components that are as statistically independent as possible from each other. Usually the ICA framework assumes a model according to which the observations are generated (such as a linear transformation with additive noise). ICA over finite fields is a special case of ICA in which both the observations and the independent components are over a finite alphabet.
In this work we consider a generalization of this framework in which an observation vector is decomposed to its independent components (as much as possible) with no prior assumption on the way it was generated. This generalization is also known as Barlow's minimal redundancy representation problem and is considered an open problem. We propose several theorems and show that this NP-hard problem can be accurately solved with a branch-and-bound search tree algorithm, or tightly approximated with a series of linear problems. Moreover, we show there exists a simple transformation (namely, order permutation) which provides a greedy yet very effective approximation of the optimal solution. We further show that while not every random vector can be efficiently decomposed into independent components, the vast majority of vectors do decompose very well (that is, with a small constant cost) as the dimension increases. Our contribution provides the first efficient set of solutions to Barlow's problem.
The minimal redundancy representation has many applications, mainly in the fields of neural networks and deep learning. In this work we show that this formulation further applies to large-alphabet source coding.
Joint work with Prof. Saharon Rosset and Prof. Meir Feder from the EE department.
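Since any invertible transformation of a finite alphabet is a permutation of its symbols, and the joint entropy is permutation-invariant, Barlow's problem amounts to minimizing the sum of marginal entropies over alphabet permutations. The sketch below makes this concrete by exhaustive search for binary vectors of tiny dimension (illustration only; the branch-and-bound and order-permutation methods from the talk are what scale):

```python
from itertools import permutations
import numpy as np

def marginal_entropy_sum(p, d):
    """Sum of the d marginal bit entropies of a pmf p over {0,1}^d,
    stored as a flat array of length 2**d (index = binary encoding)."""
    total = 0.0
    for i in range(d):
        bit = (np.arange(2 ** d) >> i) & 1
        q = p[bit == 1].sum()
        for prob in (q, 1.0 - q):
            if prob > 0:
                total -= prob * np.log2(prob)
    return total

def best_invertible_map(p, d):
    """Exhaustive search over alphabet permutations for the
    representation minimizing the sum of marginal entropies
    (equivalently, the total correlation, since H(Y) is invariant).
    Feasible only for tiny d."""
    best_cost, best_perm = np.inf, None
    for perm in permutations(range(2 ** d)):
        cost = marginal_entropy_sum(p[list(perm)], d)
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost
```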
· Armin Schwartzman, NC State University
Confidence Regions for Excursion Sets in Asymptotically Gaussian Random Fields, with an Application to Climate
The goal of this work is to give confidence regions for the excursion set of a spatial function above a given threshold, from repeated noisy observations on a fine grid of fixed locations. Given an asymptotically Gaussian estimator of the target function, a pair of data-dependent nested excursion sets are constructed that are sub- and supersets of the true excursion set, respectively, with a desired confidence. Asymptotic coverage probabilities are determined via a multiplier bootstrap method, requiring neither Gaussianity of the original data nor stationarity or smoothness of the limiting Gaussian field. The method is used to determine geographical regions where the mean summer and winter temperatures are expected to increase by the mid-21st century by more than 2 degrees Celsius.
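A rough sketch of this kind of construction, for repeated observations on a one-dimensional grid: estimate the boundary of the excursion set, calibrate a sup-norm quantile there by a multiplier bootstrap, and inflate/deflate the plug-in set accordingly. This is a generic caricature under simplifying assumptions (i.i.d. replicates, crude boundary estimate), not the paper's exact procedure:

```python
import numpy as np

def cope_sets(Y, c, alpha=0.05, B=1000, seed=0):
    """Inner/outer confidence sets for the excursion set {mu >= c}
    from Y: an (n, G) array of n noisy replicates on G grid points.
    The sup of the standardized process is calibrated on a crude
    estimate of the boundary by a Gaussian-multiplier bootstrap."""
    rng = np.random.default_rng(seed)
    n, G = Y.shape
    mu = Y.mean(axis=0)
    sd = Y.std(axis=0, ddof=1)
    resid = (Y - mu) / sd                          # standardized residuals
    bdry = np.abs(mu - c) <= 2 * sd / np.sqrt(n)   # rough boundary estimate
    if not bdry.any():
        bdry = np.ones(G, bool)
    sups = np.empty(B)
    for b in range(B):
        g = rng.standard_normal(n)                 # multipliers
        sups[b] = np.abs(g @ resid[:, bdry]).max() / np.sqrt(n)
    q = np.quantile(sups, 1 - alpha)
    inner = mu >= c + q * sd / np.sqrt(n)   # inside true set w.p. >= 1-alpha
    outer = mu >= c - q * sd / np.sqrt(n)   # contains true set
    return inner, outer
```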
· Yoram Yekutieli, Department of Computer Science, Hadassah Academic College, Jerusalem
An expert-support system for assessing the rarity of shoe prints
Shoe prints are among the most important kinds of evidence in criminal trials, but their scientific validity has recently come under attack. Existing practice is based on searching for unique marks both in prints found at the crime scene and in test impressions of the suspects' shoes examined in the laboratory. These unique marks are usually small defects that form in a shoe during its use. The number of matching defects found in the crime-scene print and in the laboratory test impression, together with their size and shape, serves as a measure of the agreement between the prints. To strengthen the scientific validity of the shoe-print discipline, we developed a system for quantitative assessment of the rarity of defects in test impressions. We collected a large database of about 13,000 distinct defects across roughly four hundred shoes and characterized the rarity of the defects. The characterization included estimating the distributions of three features of each defect: its location on the shoe, its orientation, and its shape. For this purpose we developed methods for estimating these features, including the measurement error of each one. In the talk I will present the system, the collection and distribution-estimation methods, the open problems, and directions for further research.
In collaboration with:
Yaron Shor, Sarena Wiesner and Tsadok Tsach, Marks and Materials Laboratory, Division of Identification and Forensic Science, Israel Police.
Prof. Micha Mandel and PhD student Naomi Kaplan-Damary, Department of Statistics, The Hebrew University.
· Omer Bobrowski, Duke University
Random Topology and its Applications
The study of random topology focuses on describing high-level qualitative properties of random spaces. This field has developed rapidly in the past decade, driven both by deep theoretical questions and by state-of-the-art data analysis applications. Topological Data Analysis (TDA) broadly refers to the use of concepts from mathematical topology to analyze data and networks. In the past decade a variety of powerful topological tools has been introduced, and these have proven useful for applications in various fields (e.g. shape analysis, signal processing, neuroscience, and genomic research). The theory developed in random topology aims to provide a solid foundation for the statistical analysis of these methods, which to date is at a very preliminary stage.
This talk will be divided into three parts. First, I will provide an introduction and motivation to random topology and TDA. Next, we will discuss the theory of random geometric complexes. In particular, we will consider their Betti numbers (counting "cycles" in different dimensions), and phase transitions related to these. Finally, we will present a few examples of combining the theory of random complexes with statistical problems related to TDA. In particular, we are interested in analyzing the "topological noise" that appears in such problems, for the purposes of filtering and hypothesis testing.
· Sivan Sabato, Ben-Gurion University
Active linear regression
We propose a new active learning algorithm for parametric linear regression with random design. We provide finite-sample convergence guarantees for general distributions in the misspecified model. This is the first active learner for this setting that provably can improve over passive learning.
Following the stratification technique advocated in Monte Carlo function integration, our active learner approaches the optimal risk using piecewise constant approximations. Experiments demonstrate the algorithm's convergence to the oracle rates.
Joint work with Remi Munos, INRIA Lille.
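The stratification idea borrowed from Monte Carlo integration is easy to demonstrate in isolation: splitting the domain into equal-probability strata and allocating samples evenly reduces the variance of the estimate. A minimal sketch with a toy integrand and uniform design (not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(10 * x) ** 2          # toy integrand on [0, 1]
N, K, reps = 200, 20, 2000                 # budget, strata, replications

plain, strat = [], []
for _ in range(reps):
    plain.append(f(rng.random(N)).mean())  # plain Monte Carlo
    # N/K uniform draws inside each of K equal-width strata
    u = (np.arange(K)[:, None] + rng.random((K, N // K))) / K
    strat.append(f(u).mean())

print("plain MC sd:     ", np.std(plain))
print("stratified MC sd:", np.std(strat))
```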
· Yanyuan Ma, University of South Carolina
Robust mixed-effects model for clustered failure time data: application to Huntington's disease event measures
An important goal in clinical and statistical research is estimating the distribution for clustered failure times, which have a natural intra-class dependency and are subject to censoring. We propose to handle these inherent challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Rather, using a logit transformation, we relate the distribution for clustered failure times to covariates and a random, subject-specific effect such that the covariates are modeled with unknown functional forms, and the random effect is distribution-free and potentially correlated with the covariates. Over a range of time points, the model is shown to be reminiscent of an additive logistic mixed-effects model. Such a structure allows us to handle censoring via pseudo-value regression and to develop semiparametric techniques that completely factor out the unknown random effect. We show both theoretically and empirically that the resulting estimator is consistent for any choice of random effect distribution and for any dependency structure between the random effect and covariates. Lastly, we illustrate the method's utility in an application to the Cooperative Huntington's Observational Research Trial data, where our method provides new insights into differences between motor and cognitive impairment event times in subjects at risk for Huntington's disease.
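The pseudo-value device used to handle censoring can be illustrated generically: jackknife pseudo-observations built from the Kaplan-Meier estimator replace the censored outcomes and can then be fed into a regression. A minimal sketch (naive tie handling, no clustering adjustment; the talk's semiparametric machinery goes well beyond this):

```python
import numpy as np

def km_surv(time, event, t):
    """Kaplan-Meier estimate of S(t); naive tie handling."""
    order = np.argsort(time)
    s, at_risk = 1.0, len(time)
    for ti, ei in zip(time[order], event[order]):
        if ti > t:
            break
        if ei:
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return s

def pseudo_values(time, event, t):
    """Jackknife pseudo-observations for S(t):
    n * S_hat(t) - (n - 1) * S_hat^{(-i)}(t), one per subject.
    These can then serve as responses in a regression model."""
    time, event = np.asarray(time), np.asarray(event, bool)
    n = len(time)
    s_full = km_surv(time, event, t)
    idx = np.arange(n)
    return np.array([n * s_full - (n - 1) * km_surv(time[idx != i], event[idx != i], t)
                     for i in range(n)])
```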
· Matan Gavish, HUJI
Optimal thresholding of singular values and eigenvalues
It is common practice in multivariate and matrix-valued data analysis to reduce dimensionality by performing a Singular Value Decomposition or Principal Component Analysis, and keeping only $r$ singular values or principal components, the rest being presumably associated with noise. However, the literature does not propose a disciplined criterion to determine $r$; most practitioners still look for the "elbow in the Scree Plot", a 50-year-old heuristic performed by eye. I'll review a line of work which develops a systematic approach to eigenvalue and singular value thresholding. This approach assumes that the signal is low-rank and that the noise is rotationally invariant. Recent results derive optimal thresholds in the presence of quite general noise distributions.
Joint work with David Donoho, Iain Johnstone and Edgar Dobriban (Stanford).
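For square matrices with known noise level, one headline result of this line of work is that the asymptotically optimal hard threshold for singular values is $(4/\sqrt{3})\sqrt{n}\,\sigma \approx 2.309\sqrt{n}\,\sigma$. A minimal sketch under those assumptions (known σ, square case; non-square aspect ratios and unknown noise use different calibrations):

```python
import numpy as np

def svt_denoise(Y, sigma):
    """Hard singular-value thresholding for a square n x n matrix
    Y = X + sigma * Z, Z with i.i.d. unit-variance entries; singular
    values below (4/sqrt(3)) * sqrt(n) * sigma are set to zero."""
    n = Y.shape[0]
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    tau = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma
    return (U * np.where(s > tau, s, 0.0)) @ Vt
```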
· Tirza Routtenberg, BGU
Estimation after parameter selection: Estimation methods, performance analysis, and adaptive sampling
In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate, and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, Ψ, is considered. In this talk, I present a general non-Bayesian estimation theory for estimation after parameter selection, including estimation methods, performance analysis, and adaptive sampling strategies. First, I use the post-selection mean-square error (PSMSE) criterion as a performance measure instead of the commonly used mean-square error (MSE). The corresponding Cramér-Rao-type bound on the PSMSE of any Ψ-unbiased estimator is derived, where the Ψ-unbiasedness is in the Lehmann sense. The post-selection maximum-likelihood (PSML) estimator is presented and its Ψ-efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. Finally, I discuss the concept of adaptive sampling in a two-stage sampling scheme of selection and estimation.
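The selection bias that motivates the PSMSE criterion is easy to exhibit by simulation: select the population with the largest sample mean, then estimate its mean naively by that same sample mean. A minimal sketch of the phenomenon (not of the PSML correction itself):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, reps = 5, 10, 20000
theta = np.linspace(0.0, 0.4, K)      # true means, close together
err = 0.0
for _ in range(reps):
    xbar = theta + rng.standard_normal(K) / np.sqrt(n)  # sample means
    k = int(np.argmax(xbar))          # data-based selection rule Psi
    err += xbar[k] - theta[k]         # naive estimator minus true value
print("bias of the naive post-selection estimator:", err / reps)
```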
· Lotem Kaplan, TAU
Empirical Approach for Sequential Bayesian Inference with Application to Active Learning
Active learning is a sequential learning framework with two main components: (i) a classification model, and (ii) a querying strategy for choosing the next data instance to observe. The querying strategy can have important consequences for the performance of the classifier.
Active learning is closely related to sequential experimental design in the statistical literature. We approach this problem by applying sequential Bayesian inference to logistic regression. In this work we follow and extend the empirical representation of posterior distributions by Dror and Steinberg (2008) and show its application to binary data and active learning.
This seminar describes Lotem Kaplan's dissertation work under the supervision of Prof. David Steinberg and is presented as part of the PhD requirements of Tel Aviv University.
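Generically, an empirical (sample-based) representation of a posterior can be updated sequentially by importance weighting: draw particles from the prior, then reweight by the likelihood of each incoming observation. The sketch below does this for logistic regression under a standard normal prior; it is a textbook sequential-importance-sampling caricature, not Dror and Steinberg's construction:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class EmpiricalPosterior:
    """Posterior over logistic-regression coefficients represented by
    weighted samples ('particles') from a N(0, I) prior, updated one
    observation at a time by importance weighting."""

    def __init__(self, dim, n_particles=5000, seed=0):
        rng = np.random.default_rng(seed)
        self.beta = rng.standard_normal((n_particles, dim))
        self.w = np.full(n_particles, 1.0 / n_particles)

    def update(self, x, y):
        p = sigmoid(self.beta @ x)            # P(y=1 | x, beta) per particle
        self.w *= p if y == 1 else 1.0 - p    # multiply in the likelihood
        self.w /= self.w.sum()                # weights degenerate over time;
                                              # practical schemes resample

    def mean(self):
        return self.w @ self.beta             # posterior mean estimate
```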
· Saharon Rosset, TAU
Estimating Random-X Prediction Error of Regression Models
The areas of model selection and model evaluation for predictive modeling have received extensive treatment in the statistics literature, leading to both theoretical results and practical methods based on covariance penalties and other approaches. However, the vast majority of this work is based on the "Fixed-X assumption", where covariate values are assumed to be nonrandom and known. By contrast, in most modern predictive modeling applications, it is more reasonable to take the "Random-X" view, where future prediction points are random and new. In this work we concentrate on examining the applicability of the covariance-penalty approaches to this problem. We propose a decomposition of the Random-X prediction error that clarifies the additional error due to Random-X, which is present in both the variance and bias components of the error. This decomposition is very general, but we focus on its application to the most fundamental case of least squares regression. We show how to quantify the excess variance using simple random-matrix results, leading to a covariance-penalty approach generalizing previous methods, like Tukey's MSPE and GCV. To account for excess bias, we propose to take only the bias component of the ordinary cross-validation (OCV) estimate, resulting in a hybrid penalty term. This hybrid penalty approach is shown to be empirically superior to OCV in regression model evaluation simulations. This empirical observation is supported by a common representation of the two methods, which shows that OCV has one more random component.
This talk is based on joint work with Ryan Tibshirani, and describes work in progress.
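The gap between the two viewpoints shows up already in ordinary least squares. The simulation sketch below (a toy setup, not the paper's experiments) compares Fixed-X prediction error, where the design is reused with fresh noise and the expectation is σ²(1 + p/n), against Random-X error on fresh covariates, which is strictly larger:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, reps = 50, 10, 1.0, 2000
beta = np.ones(p)
fixed_err, random_err = [], []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    # Fixed-X: same design points, fresh noise
    y_fix = X @ beta + sigma * rng.standard_normal(n)
    fixed_err.append(np.mean((y_fix - X @ bhat) ** 2))
    # Random-X: fresh covariates as well
    Xn = rng.standard_normal((n, p))
    y_rnd = Xn @ beta + sigma * rng.standard_normal(n)
    random_err.append(np.mean((y_rnd - Xn @ bhat) ** 2))
print("Fixed-X error: ", np.mean(fixed_err))   # approx sigma^2 * (1 + p/n)
print("Random-X error:", np.mean(random_err))  # strictly larger
```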
· Lucien Birge, Université Pierre et Marie Curie (Paris VI)
ρ-estimation – a robust alternative to maximum likelihood
It is well-known that maximum likelihood, although widely used in statistics, suffers from various defects. From a theoretical point of view, strong assumptions are needed to ensure that it performs in a satisfactory way, and from a practical one, it is definitely not robust, i.e. quite sensitive to model misspecifications. These problems are mainly connected to the use of the log function, which is not bounded, to define the log-likelihood ratios. Replacing it by a specific bounded function leads to an alternative method with better properties, including robustness (in the sense of Peter Huber). Moreover, in nice parametric models and when the true distribution does belong to the model, the new estimator is asymptotically equivalent to the maximum likelihood estimator.
Interested readers may look at http://arxiv.org/abs/1403.6057. This is joint work with Yannick Baraud and Mathieu Sart. The talk is a Sackler Distinguished Lecture.
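The source of the non-robustness, and the effect of bounding, can each be stated in one line (a generic observation, not the paper's construction):

```latex
% One observation at which the candidate density vanishes ruins the fit:
f(X_j)=0 \;\Longrightarrow\; \sum_{i=1}^{n}\log f(X_i) = -\infty,
% whereas for a bounded \psi applied to likelihood ratios, altering a
% single observation moves the criterion by at most
\bigl|\psi(t)-\psi(t')\bigr| \;\le\; 2\,\|\psi\|_{\infty}
\qquad\text{for all } t,t' \ge 0,
% so every data point has bounded influence on the fit.
```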
· Jeroen de Mast, University of Amsterdam
Estimation of the random error of binary tests using adaptive polynomials
We propose an efficient and robust method for estimating the random error of binary tests and measurements in situations where a gold standard or reference standard is not available. Recent studies have shown that in this situation, the widely used false acceptance probability (FAP) and false rejection probability (FRP), or their complements specificity and sensitivity, are difficult to estimate. We show by mathematical analysis that this problem is inherent to the unavailability of a gold standard. Instead, the proposed method determines the random components of the error probabilities: the inconsistent acceptance and rejection probabilities, IAP and IRP. The estimation method is based on an extension of the beta distribution, and adapts the model complexity to the data. For estimation efficiency, the approach works from a sample taken from the stream of initially rejected parts, and also incorporates a historical rejection rate ("baseline data"). A simulation study demonstrates the method's robustness and is the basis for sample size recommendations. The usefulness of the method is shown in a real-life application, where it outperforms existing methods.
· Regev Schweiger, TAU
Fast and accurate construction of confidence intervals for heritability
Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models (LMMs) has gained popularity, because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. Existing methods for the construction of confidence intervals and standard errors for REML rely on asymptotic properties. However, these assumptions are often violated due to the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. In our work, we show that the construction of confidence intervals by current methods is inaccurate, especially when the true heritability is relatively low or relatively high. We propose a computationally efficient method for the estimation of the distribution of the heritability estimator, and for the construction of accurate confidence intervals.
Joint work with Eran Halperin, Saharon Rosset, Shachar Kaufman and Eleazar Eskin.
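One generic alternative to asymptotic intervals is to estimate the distribution of the heritability estimator directly by a parametric bootstrap. The sketch below does this for the simplest possible LMM, y ~ N(0, h²K + (1−h²)I) with a normalized kernel K, using grid maximum likelihood (no fixed effects, ML rather than REML); it conveys the flavor, not the efficient method from the talk:

```python
import numpy as np

def h2_ml(y, lam, U, grid=np.linspace(0.0, 0.99, 100)):
    """Grid maximum likelihood for h2 in y ~ N(0, h2*K + (1-h2)*I),
    where K = U diag(lam) U' is a normalized PSD kernel."""
    yt = U.T @ y
    nll = [np.log(h * lam + 1 - h).sum() + (yt ** 2 / (h * lam + 1 - h)).sum()
           for h in grid]
    return grid[int(np.argmin(nll))]

def h2_ci(y, K, B=500, alpha=0.05, seed=0):
    """Parametric-bootstrap CI: re-simulate under the point estimate,
    re-estimate, and take quantiles of the bootstrap estimates."""
    rng = np.random.default_rng(seed)
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)             # guard against round-off
    h2 = h2_ml(y, lam, U)
    n = len(y)
    boot = [h2_ml(U @ (np.sqrt(h2 * lam + 1 - h2) * rng.standard_normal(n)), lam, U)
            for _ in range(B)]
    return h2, np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```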
· Asaf Cohen, University of Michigan
A Multiclass Queueing Model in the Moderate-Deviation Heavy-Traffic Regime
Risk-sensitive control problems with vanishing noise deal with a risk-averse decision maker who observes a small-noise controlled diffusion process. The model was presented by Jacobson in [4] and was broadly studied afterwards; see, e.g., [3]. By Girsanov change-of-measure arguments one usually deduces that the limit of such problems is governed by a differential game, whose solution plays a role in finding an asymptotically optimal control in the original problem. Traditionally, queueing systems under heavy traffic are approximated by diffusion processes. I will present a critically loaded system with small overload probability; thus, the evolution of the system is approximated by a small-noise diffusion. The controller is risk-averse and uses a discounted version of the risk-sensitive cost. Change-of-measure arguments do not apply here. We developed new tools to show that the model is governed by a differential (deterministic) game. We explicitly solved the game and showed that its solution eventually leads to an asymptotically optimal stationary feedback policy in the queueing problem.
This is joint work with Rami Atar. The talk is based on [1] and [2], with a focus on the new techniques that were recently developed in [1] to prove the convergence.
[1] R. Atar and A. Cohen. An asymptotically optimal control for a multiclass queueing model in the moderate-deviation heavy-traffic regime. Submitted to Annals of Applied Probability, 2016.
[2] R. Atar and A. Cohen. A differential game for a multiclass queueing model in the moderate-deviation heavy-traffic regime. Mathematics of Operations Research, to appear, 2016.
[3] W. H. Fleming. Risk sensitive stochastic control and differential games. Communications in Information and Systems, 6(3):161–177, 2006.
[4] D. H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic Control, AC-18(2):124–131, 1973.
· Ofer Harel, University of Connecticut
Clustering Incomplete Data via Normal Mixture Models
Model-based clustering using Normal mixture models provides a framework to describe how data groups together using Normal distributions. However, the existing methods for such analyses require complete data. One way to handle incomplete data is multiple imputation, a simulation-based approach which bypasses many of the disadvantages present in other methods for handling incomplete data. However, it is difficult to apply multiple imputation and cluster analysis in a straightforward manner. In this presentation, we develop MICAN: Multiply Imputed Cluster Analysis for Normally distributed data. MICAN adds clustering methods to particular steps in multiple imputation in order to create a way to cluster incomplete normal data. We illustrate how MICAN outperforms existing methodology with a simulation, and then demonstrate the utility of MICAN on yeast gene expression data. This is joint work with Chantal Larose and Dipak Dey.
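A naive version of the imputation-then-clustering pipeline can be assembled from standard scikit-learn pieces, as sketched below. This is only a caricature of the idea: MICAN integrates clustering into the imputation steps and pools results more carefully, whereas here component labels may switch across imputations and the naive averaging of responsibilities ignores that.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.mixture import GaussianMixture

def cluster_incomplete(X, n_clusters, n_imputations=20):
    """Multiply impute X (NaNs = missing), fit a Normal mixture to each
    completed dataset, and average the responsibilities. A serious
    pooler must first align mixture components across imputations;
    this naive average is illustration only."""
    posteriors = np.zeros((X.shape[0], n_clusters))
    for b in range(n_imputations):
        Xb = IterativeImputer(sample_posterior=True, random_state=b).fit_transform(X)
        gm = GaussianMixture(n_components=n_clusters, random_state=0).fit(Xb)
        posteriors += gm.predict_proba(Xb)
    return posteriors.argmax(axis=1)
```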