16 March 
Tirza Routtenberg, BGU 
Performance Bounds for Estimation After Model Selection 
Zoom recording 

6 April 
Nir Keret, TAU 
Optimal Cox Regression Subsampling Procedure with Rare Events 
Zoom recording 

13 April 
Malgorzata Bogdan, Wroclaw U. of Science and Technology 
Ghost Quantitative Trait Loci and hotspots: What might happen if the signal is not sparse? 
Zoom recording 

18 May 
Gil Kur, MIT 
On the Minimal Error of Empirical Risk Minimization 
Zoom recording 

25 May
Vladimir Vovk, Royal Holloway, University of London
Zoom recording

1 June
Assaf Rabinowicz, TAU
Zoom recording

8 June
David Steinberg, TAU
Zoom recording

20 October 
Felix Abramovich, TAU 
High-dimensional classification by sparse logistic regression

27 October 
Taeho Kim, Haifa University 
Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information 

12 November 
Somabha Mukherjee, UPenn 
Statistical Inference on Dependent Combinatorial Data: The Ising Model 

17 November 
Amit Moscovich, Princeton
Nonparametric estimation of high-dimensional shape spaces with applications to structural biology

1 December 
Ruth Heller, TAU 
Optimal control of false discovery criteria in the two-group model

8 December (6:30pm) 
Alon Kipnis 
Two-sample problem for large, sparse, high-dimensional distributions under rare/weak perturbations

15 December 
Yves Rozenholc, Paris Descartes 
Differential Analysis in Transcriptomic: the Strength of Randomly Picking so-called Reference Genes

29 December 
Dan Vilenchik, Ben Gurion U. 
Computational-statistical tradeoffs in the problem of finding sparse Principal Components in high-dimensional data


5 January 
Rui Castro 

Seminars are held on Tuesdays, 10.30 am, Schreiber Building, 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries, please call (03)6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
· Felix Abramovich, TAU
High-dimensional classification by sparse logistic regression
In this talk we consider high-dimensional classification. We first discuss high-dimensional binary classification by sparse logistic regression, propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size, and derive non-asymptotic bounds for the resulting misclassification excess risk. Implementation of any complexity penalty-based criterion, however, requires a combinatorial search over all possible models. To find a model selection procedure that is computationally feasible for high-dimensional data, we consider logistic Lasso and Slope classifiers and show that they also achieve the optimal order. We further extend the proposed approach to multiclass classification by sparse multinomial logistic regression and discuss various possible types of sparsity in the multiclass setup.
This is joint work with Vadim Grinshtein and Tomer Levy.
· Taeho Kim, Haifa University
Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information
Consider a statistical decision problem where multiple sets of parameters are of interest. To simultaneously infer about these parameters, a multiple interval estimator (MIE) can be constructed. In this study, an MIE with better performance than existing MIEs, in particular, relative to a z-based MIE, is developed using a thresholding approach. The determination of the thresholds in this MIE is informed by assigning prior distributions to each of the sets of parameters. The performance of the MIE is evaluated using two measures: (i) a global coverage rate and (ii) a global expected content, which are both averages with respect to the prior distribution. The proposed MIE procedure, which is developed with respect to these two performance measures, is called a Bayes MIE with thresholding (BMIE_Thres).
A multivariate normal model with the conjugate prior is utilized to develop the BMIE_Thres for the mean vector. The behavior of BMIE_Thres is then analytically investigated in terms of the performance measures. It is shown that the performance of the BMIE_Thres approaches that of the z-based MIE as the thresholds become large.
In this presentation, in-season baseball batting average data and leukemia gene expression data are used to demonstrate the procedure for the known and unknown standard deviation settings, respectively. In addition, simulation studies are presented to compare the BMIE_Thres with the classical and Bayes MIEs.
· Somabha Mukherjee, Department of Statistics, The Wharton School, University of Pennsylvania
Statistical Inference on Dependent Combinatorial Data: The Ising Model
Dependent data arise in all avenues of science, technology and society, such as Facebook friendship networks, epidemic networks, election data and peer group effects. Analysis of dependent combinatorial data is crucial for understanding the behavior of edge and higher-order motif estimates in very large and inaccessible networks, deriving asymptotics of graph-based tests for equality of distributions, in the study of coincidences, and many more seemingly diverse areas in statistics and probability. In this talk, I am going to focus on the Ising model, which is a useful framework introduced by statistical physicists, and later used by statisticians, for modeling dependent binary data. In its original form, the Ising model can capture only pairwise interactions, which are seldom observed in the real world. For example, in a peer group, the decision of an individual is affected not just by pairwise communications, but by interactions with larger community tuples. It is also known in physics that atoms on a crystal surface interact not just in pairs, but in triplets and higher-order tuples. These higher-order interactions can be captured by the so-called tensor Ising models, where the Hamiltonian (sufficient statistic) is a multilinear form of degree p. I will show how to estimate the natural parameters of this model, why maximum-likelihood estimation fails in more general Ising models, and will briefly talk about the asymptotics of the parameter estimates in this model. The asymptotics are highly nonstandard, characterized by the presence of a critical curve in the interior of the parameter space on which the estimates have a limiting mixture distribution, and a surprising superefficiency phenomenon at the boundary point(s) of this critical curve.
I will also consider a more realistic version of the Ising model, which is a generalization of the vanilla logistic regression, and talk briefly about estimating the natural parameters of this model under sparsity assumptions on the parameters. Towards the end, I will talk briefly about some other places where dependent combinatorial data arise, including graph-based nonparametric tests for equality of
· Amit Moscovich, Princeton University.
Nonparametric estimation of high-dimensional shape spaces with applications to structural biology
Over the last twenty years, there have been major advances in nonlinear dimensionality reduction, or manifold learning, and nonparametric regression of high-dimensional datasets with low intrinsic dimensionality. A key idea in this field is the use of data-dependent Fourier-like basis vectors given by the eigenvectors of a graph Laplacian. These eigenvectors provide a natural basis for representing and estimating smooth signals. Their use for estimation over arbitrary domains generalizes the classical notion of regression using orthogonal function series. In this talk, I will discuss the application of such methods for mapping spaces of volumetric shapes with continuous motion. Three lines of research will be presented:
(i) High-dimensional nonparametric estimation of distributions of volumetric signals from noisy linear measurements.
(ii) Leveraging the Wasserstein optimal transport metric for manifold learning and clustering.
(iii) Nonlinear independent component analysis for analyzing independent motions.
A key motivation for this work comes from structural biology, where breakthrough advances in cryo-electron microscopy have led to thousands of atomic-resolution reconstructions of various proteins in their native states. However, the success of this field has been mostly limited to the estimation of rigid structures, while many important macromolecules contain several parts that can move in a continuous fashion, thus forming a manifold of conformations which cannot be estimated using existing tools. The methods described in this talk present progress towards the solution of this grand challenge, namely the extension of point-estimation methods which output a single 3D conformation to estimators of entire manifolds of conformations.
· Ruth Heller, TAU
Optimal control of false discovery criteria in the two-group model
The highly influential two-group model in testing a large number of statistical hypotheses assumes that the test statistics are drawn independently from a mixture of a high probability null distribution and a low probability alternative. Optimal control of the marginal false discovery rate (mFDR), in the sense that it provides maximal power (expected true discoveries) subject to mFDR control, is known to be achieved by thresholding the local false discovery rate (locFDR), the probability of the hypothesis being null given the set of test statistics, with a fixed threshold. We address the challenge of controlling optimally the popular false discovery rate (FDR) or positive FDR (pFDR) in the general two-group model, which also allows for dependence between the test statistics. These criteria are less conservative than the mFDR criterion, so they make more rejections in expectation.
We derive their optimal multiple testing (OMT) policies, which turn out to be thresholding the locFDR with a threshold that is a function of the entire set of statistics. We develop an efficient algorithm for finding these policies, and use it for problems with thousands of hypotheses. We illustrate these procedures on gene expression studies.
Joint work with Saharon Rosset
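As a rough illustration of the fixed-threshold locFDR rule mentioned in the abstract, the sketch below assumes independent statistics and a Gaussian two-group model. The parameters `pi0`, `mu` and the threshold `t` are hypothetical values chosen for illustration; none of them come from the talk, and this is not the OMT algorithm developed by the authors.

```python
import math

def normal_pdf(z, mean=0.0):
    """Density of N(mean, 1) evaluated at z."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def loc_fdr(z, pi0=0.9, mu=3.0):
    """P(null | z) under an assumed mixture pi0*N(0,1) + (1-pi0)*N(mu,1)."""
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, mean=mu)
    return null_part / (null_part + alt_part)

def reject_fixed_threshold(zs, t=0.2, pi0=0.9, mu=3.0):
    """Reject hypothesis i when its locFDR is at most the fixed threshold t."""
    return [loc_fdr(z, pi0, mu) <= t for z in zs]

# Hypothetical test statistics; only the largest one is rejected here.
decisions = reject_fixed_threshold([0.0, 1.0, 2.5, 5.0])
```

With dependence between statistics, or with a data-dependent threshold as in the OMT policies described above, the rule would no longer reduce to this per-statistic form.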
· Alon Kipnis
Two-sample problem for large, sparse, high-dimensional distributions under rare/weak perturbations
Consider two samples, each obtained by independent draws from two possibly different distributions over the same finite and large set of features. We would like to test whether or not the two distributions are identical. We propose a method to perform a two-sample test of this form by taking feature-by-feature p-values based on a binomial allocation model and combining the p-values using Higher Criticism. Performance on real-world data (e.g. authorship attribution challenges) shows this to be an effective unsupervised, untrained discriminator even under violations of the binomial allocation model.
We analyze the method in a "rare/weak departures" setting where, if two distributions are actually different, they differ only in relatively few features and only by relatively subtle amounts. We perform a phase diagram analysis in which the phase space quantifies how rare and how weak such departures are. Although our proposal does not require any formal specification of an alternative hypothesis, nor does it require any specification of a baseline or null hypothesis, in the limit where counts are high, the method delivers the optimal phase diagram in the rare/weak setting: it is asymptotically fully powerful inside the region of phase space where a formally specified test would have been fully powerful. In the limit where counts are low, we derive the phase diagram as well, although the optimality of the resulting diagram remains an open question.
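To make the recipe concrete, here is a minimal sketch of feature-by-feature binomial p-values combined with Higher Criticism. The exact two-sided binomial test, the simplified null proportion `n1 / (n1 + n2)`, the `gamma` cutoff, the particular form of the HC statistic, and all function names are assumptions made for illustration; they may differ from the procedure presented in the talk.

```python
import math

def binom_two_sided_p(x, n, p):
    """Exact two-sided binomial p-value: mass of outcomes no more likely than x."""
    pmf = [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[x] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def higher_criticism(pvalues, gamma=0.5):
    """One common form of HC: maximum standardized gap between i/n and the
    i-th smallest p-value, taken over the smallest gamma fraction."""
    p = sorted(pvalues)
    n = len(p)
    best = -math.inf  # returned unchanged if no p-value lies strictly in (0, 1)
    for i, pi in enumerate(p, start=1):
        if i > gamma * n:
            break
        if 0.0 < pi < 1.0:
            z = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
            best = max(best, z)
    return best

def hc_two_sample(counts1, counts2, gamma=0.5):
    """HC score for two count vectors over the same features (assumed model)."""
    n1, n2 = sum(counts1), sum(counts2)
    p_null = n1 / (n1 + n2)  # assumed share of sample 1 under the null
    pvals = [binom_two_sided_p(x, x + y, p_null)
             for x, y in zip(counts1, counts2) if x + y > 0]
    return higher_criticism(pvals, gamma)

# Hypothetical counts: the first two features differ strongly between samples.
score = hc_two_sample([30, 5, 10], [5, 30, 10])
```

A large score suggests that the two samples differ; in practice the score would be compared against a calibrated threshold, which this sketch does not attempt to provide.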
· Yves Rozenholc, Paris Descartes
Differential Analysis in Transcriptomic: the Strength of Randomly Picking so-called Reference Genes.
Transcriptomic analyses are not directly quantitative: they provide only relative measurements of expression levels, up to an unknown individual scaling factor. Assuming that some housekeeping genes are known, one can use their observed relative expression levels to obtain a normalization (Vandesompele et al. 2002). However, in exploratory differential analysis, reference genes cannot always be known in advance. Apart from the crude normalization by the total count (Marioni et al. 2008), several methods have been proposed to circumvent this issue: upper quantile (Bullard et al. 2010), trimmed mean of M values (TMM) (Mark D. Robinson and Oshlack 2010) and inter-individual median count ratio across genes (Anders and Huber 2010), which can be found in the Bioconductor packages DESeq2 (Love, Huber, and Anders 2014) and edgeR (Mark D. Robinson, McCarthy, and Smyth 2010). More recently, Li et al. (2012) proposed using log-linear fits to detect DE genes; however, this also relies on a scaling factor estimation, achieved by starting from the total count and iteratively selecting a subset of genes associated with small values of a Poisson goodness-of-fit statistic. All these methods are based on the belief that reference genes can be identified because their expression levels are expected to be more stable in the overall population. However, first, the unknown scaling factors may have a strong deleterious effect on this belief; second, one can build counterexamples to such approaches by considering reference genes that show more variability than non-housekeeping ones.
In brief, current procedures for differential analysis in such high-throughput transcriptomic experiments are built on a preliminary step, which consists in finding some non-differential expressions to estimate the scaling factors. The data are then reused for testing. Not only is it unsatisfactory to lack a good recipe for this first step, it is also improper and, statistically, worse to perform a differential analysis that must first run a non-differential analysis on the same data.
Our intensive iterative random procedure for detection can be summarized as follows. At each step of the iteration, a random subset of genes is selected and treated as a set of reference genes, which is used to obtain a normalization. After this normalization, the non-selected genes are tested for differential behavior. Across the iterations, the detections for each gene are pooled. After the iterations, the pooled detections are compared to the rates of potential wrong detections due to mistakenly picking random genes from the unknown set of DE genes. Our method controls the FWER for any test procedure whose level and power are controlled when the scaling factors are known. It is adaptive to the unknown number of genes that would be detectable, given the observations, if the scaling factors were known, assuming only that the number of DE genes is less than half of the total number of genes.
Moreover, since our procedure behaves as if reference genes were available, we propose and study a unified testing procedure for differential analysis, adapted to our random detector, for the two classical models: Poisson and negative binomial. This test derives from a procedure in which the scaling factors would be known, and in this sense satisfies the requirements of our random procedure in terms of type I and II errors. Assuming that the expression levels are high enough, we study its properties. The test statistic is shown to be approximately standard Gaussian, and we derive non-asymptotic control of this approximation, so that the level of the test is well controlled for finite sample sizes.
· Dan Vilenchik, Ben Gurion U.
Computational-statistical tradeoffs in the problem of finding sparse Principal Components in high-dimensional data
The problem of consistently estimating the covariance matrix of a p-dimensional random variable X is well understood when the ratio p/n goes to zero “sufficiently fast” (n being the number of samples). In many applied scenarios one is trying to solve an easier task, estimating the leading eigenvector(s) of the covariance matrix (known as its leading Principal Component(s)). However, in a high-dimensional setting, where the ratio p/n is a constant or even grows to infinity with n, both tasks become much trickier. In some cases, efficiency and statistical consistency need to be traded off. One popular approach to settling this tradeoff is to use various efficient estimators that guarantee consistency only in a certain regime of parameters. In this talk we consider a different approach. We suggest a hierarchy of estimators, where each level in the hierarchy is an estimator that spends more computational resources than its predecessors. We provide a rigorous analysis of our approach in the spiked-covariance model, where we explicate the required level in the hierarchy to guarantee a statistically consistent solution, as a function of the SNR. We also provide simulation results that demonstrate the usefulness of our approach.
Paper appeared in COLT 2020.
· Tirza Routtenberg, BGU
Performance Bounds for Estimation After Model Selection
In many practical parameter estimation problems, such as coefficient estimation of polynomial regression and direction-of-arrival (DOA) estimation, the exact model is unknown and a model selection stage is performed prior to estimation. This data-based model selection stage affects the subsequent estimation, e.g. by introducing a selection bias. Thus, new methodologies are needed for both frequentist and Bayesian estimation. In this study, the problem of estimating unknown parameters after a data-based model selection stage is considered. In the considered setup, the selection of a model is equivalent to the recovery of the deterministic support of the unknown parameter vector. We assume that the data-based model selection criterion is given and analyze the consequent Bayesian and frequentist estimation properties for this specific criterion. For Bayesian parameter estimation after model selection, we develop the selective Bayesian Cramér-Rao bound (CRB) on the mean-squared-error (MSE) of coherent estimators that force unselected parameters to zero. Similarly, for the frequentist (non-Bayesian) estimation of deterministic unknown parameters, we derive the corresponding frequentist CRB on the MSE of any coherent estimator, which is also Lehmann-unbiased. To this end, the relevant Lehmann-unbiasedness is defined, with respect to the model selection rule. We analyze the properties of the proposed selective CRBs including the order relation with the oracle CRBs that assume knowledge of the model. The selective CRBs are evaluated in simulations and are shown as an informative lower bound on the performance of practical coherent estimators. As time permits, I will discuss similar ideas that can be applied to estimation in Good-Turing models.
· Nir Keret, TAU
Optimal Cox Regression Subsampling Procedure with Rare Events
Massive-sized survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use cases. A popular way of coping with massive datasets is downsampling them to a more manageable size, such that the computational resources can be afforded by the researcher. Cox proportional hazards regression has remained one of the most popular statistical models for the analysis of survival data to date. This work addresses the settings of right-censored and possibly left-truncated data with rare events, such that the observed failure times constitute only a small portion of the overall sample. We propose Cox regression subsampling-based estimators that approximate their full-data partial-likelihood-based counterparts, by assigning optimal sampling probabilities to censored observations, and including all observed failures in the analysis. Asymptotic properties of the proposed estimators are established under suitable regularity conditions, and simulation studies are carried out to evaluate the finite sample performance of the estimators. We further apply our procedure to UK Biobank colorectal cancer genetic and environmental risk factors.
· Malgorzata Bogdan, Wroclaw U. of Science and Technology
Ghost Quantitative Trait Loci and hotspots: What might happen if the signal is not sparse?
Ghost quantitative trait loci (QTL) are the false discoveries in QTL mapping that arise due to the “accumulation” of the polygenic effects, uniformly distributed over the genome. The locations of ghost QTL depend on a specific sample correlation structure determined by the genotypes at all loci and have a tendency to replicate when the same genotypes are used to study multiple QTL, e.g. using recombinant inbred lines or studying the expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and propose a solution based on the extended mixed effect model, where the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in the stepwise selection strategy, which allows for a simultaneous detection of several QTL. We report the results of extensive simulation studies which illustrate that our approach can eliminate ghost QTL/false hotspots, while preserving a high power of true QTL detection. This is joint work with Jonas Wallin (Lund University), Piotr Szulc (University of Wroclaw), Rebecca Doerge (CMU) and David Siegmund (Stanford).
· Gil Kur, MIT
On the Minimal Error of Empirical Risk Minimization
In recent years, highly expressive machine learning models, i.e. models that can express rich classes of functions, have become more and more commonly used due to their success in both regression and classification tasks; such models include deep neural nets, kernel machines and more. From the point of view of classical statistical theory (the minimax theory), rich models tend to have a higher minimax rate, i.e. any estimator must have a high risk (a “worst case scenario” error). Therefore, it seems that for modern models the classical theory may be too conservative and strict.
In this talk, we consider the most popular procedure for regression tasks, Empirical Risk Minimization with squared loss (ERM), and analyze its minimal squared error in both the random and the fixed design settings, under the assumption of a convex family of functions; that is, the minimal squared error that ERM attains when estimating any function in our class, in both settings. In the fixed design setting, we show that the error is governed by the global complexity of the entire class. In contrast, in random design, the ERM may only adapt to simpler models if the local neighborhoods around the regression function are nearly as complex as the class itself, a somewhat counterintuitive conclusion. We provide sharp lower bounds for the performance of ERM for both Donsker and non-Donsker classes. This talk is based on joint work with Alexander Rakhlin.