14 March 
Jonathan Rosenblatt, BGU 


21 March 
David Horn, TAU 

The Weight-Shape Decomposition of Scale-Space Distributions: a Framework for Clustering Algorithms 
28 March 
Hilary Finucane, MIT 


4 April 
Yakir Reshef, MIT 

Estimating functional correlation from genome-wide association study summary statistics 
16 May 
Amit Moscovich Eiger, TAU 

Minimax-optimal semi-supervised regression on unknown manifolds 
23 May 
Marianna Pensky, University of Central Florida 


6 June 
Zhaohui Qin, Emory University 

Utilizing Big Data to solve the small data inference problem 
27 June 
Ofer Harel, UConn 


3 July 
Nathan Srebro, TTI 

1 November 
Daniel Yekutieli, TAU 


8 November 
Dan Garber, Toyota Technological Institute at Chicago 


16 November 
Geoff Vining, Virginia Tech – Schreiber 008 

A Cautionary Note on Bayesian Approaches within Quality Improvement 
29 November 
Dovi Poznanski, TAU 

From Kaplun to Schreiber in 24 slides: a non-formal astrostatistics talk 
13 December 
Daniel Nevo, Harvard 


20 December 
Tamar Sofer, University of Washington 


3 January 
Roee Guttman, Brown University 


10 January 
Yaakov Malinovsky, UMBC 


17 January 
Assaf Weinstein, Stanford 


24 January 
Aya Cohen, Technion 






Seminars are held on Tuesdays at 10:30 am in the Schreiber Building, room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries, please call (03) 6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove the numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
From post-hoc analysis to post-selection inference
I will give an introductory talk explaining the connection between the work of Tukey and Scheffé on post-hoc analysis, Benjamini and Hochberg's work on the FDR, the work of Efron and colleagues on the Bayesian FDR, my work with Benjamini on selective inference, the work of Berk et al. on post-selection inference, and recent work on frequentist and Bayesian post-selection inferences based on the conditional likelihood.
· Dan Garber, Toyota Technological Institute at Chicago
Faster Projection-free Machine Learning and Optimization
Projected gradient descent (PGD) and its close variants are often considered the methods of choice for solving a large variety of machine learning optimization problems, including empirical risk minimization, statistical learning, and online convex optimization. This is not surprising, since PGD is often optimal in a very appealing information-theoretic sense. However, for many problems PGD is infeasible both in theory and in practice, since each step requires computing an orthogonal projection onto the feasible set. In many important cases, such as when the feasible set is a non-trivial polytope or a convex surrogate for a low-rank structure, computing the projection is computationally inefficient in high-dimensional settings. An alternative is the conditional gradient (CG) method, aka the Frank-Wolfe algorithm, which replaces the expensive projection step with a linear optimization step over the feasible set. Indeed, in many problems of interest the linear optimization step admits much more efficient algorithms than the projection step, which is the reason for the substantial renewed interest in this method in the past decade. On the downside, the convergence rates of the CG method often fall behind those of PGD and its variants.
In this talk I will survey an ongoing effort to design CG variants that, on the one hand, enjoy the cheap iteration complexity of the original method and, on the other hand, converge provably faster and are applicable to a wider variety of machine learning settings. In particular, I will focus on the cases in which the feasible set is either a polytope or a convex surrogate for low-rank matrices. Results will be demonstrated on applications including LASSO, video co-localization, optical character recognition, matrix completion, and multiclass classification.
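To make the projection-free idea concrete, here is a minimal sketch of the basic Frank-Wolfe method (not the speaker's faster variants) on an l1-ball-constrained least-squares problem; the matrix `A`, the planted sparse solution, and the radius `tau` are made up for illustration.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, tau, iters=200):
    """Conditional gradient (Frank-Wolfe) over the l1-ball of radius tau.
    Each step solves a *linear* problem over the feasible set -- here just
    an argmax over coordinates -- instead of an orthogonal projection."""
    x = x0.copy()
    for t in range(iters):
        g = grad(x)
        i = np.argmax(np.abs(g))           # linear minimization oracle:
        s = np.zeros_like(x)               # the best vertex of the l1-ball
        s[i] = -tau * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)            # standard step-size schedule
        x = (1 - gamma) * x + gamma * s    # convex combination stays feasible
    return x

# Toy LASSO-style problem: min ||Ax - b||^2 subject to ||x||_1 <= tau.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = A @ np.array([1.0, -1.0] + [0.0] * 8)   # planted sparse solution
x_hat = frank_wolfe_l1(lambda x: 2 * A.T @ (A @ x - b), np.zeros(10), tau=2.0)
```

Note that the iterate never needs to be projected: it is always a convex combination of l1-ball vertices, so feasibility is automatic.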
· Geoff Vining, Virginia Tech
A Cautionary Note on Bayesian Approaches within Quality Improvement
Bayesian approaches are increasingly popular within the statistics community. However, they currently do not seem to find wide application within the industrial statistics/quality improvement community. This paper examines some of the basic reasons why. It begins by reviewing Box's perspective on the scientific method and discovery. It then examines Deming's concepts of analytic versus enumerative studies. Together, these concepts provide a framework for evaluating where Bayesian approaches make good sense, where they make little sense, and where they fall somewhere in between. This paper uses examples based on statistical sampling plans and the design and analysis of experiments to illustrate its basic points.
Evaluation of Within Group Agreement
Complex, multilevel theories are common in behavioral sciences research, where notions of collective phenomena such as group affect, team efficacy, and organizational climate are studied. A major challenge for researchers working in these areas is that higher-level phenomena often cannot be assessed directly; rather, inferences must be made from data collected at lower levels of analysis. In many cases, these phenomena are understood conceptually to arise from lower levels, often from the individuals within these collectives. The methodological implication is that measurement should take place at the lower level (e.g., the individual level) and the data should then be aggregated to the level of interest (e.g., the group or organizational level). It is accepted that within-group agreement is a prerequisite for aggregating individual ratings to the group level. Agreement reflects the degree to which the members of the group share a similar view, so that the aggregated value can be used to reflect their view.
When justifying aggregation, agreement indices such as rWG(J) or AD are used together with the intraclass correlation (ICC) to demonstrate agreement and consistency among lower-level units. Despite the progress on evaluating agreement based on rWG(J) or AD, there are still many practical questions about how to infer from the calculated agreement indices whether the agreement is large enough to justify aggregation.
In the seminar I shall introduce the rWG(J) and AD indices and explain their properties and how they are used (and misused). I shall describe and discuss the RGR method (Bliese & Halverson, 1996), which compares the estimated agreement indices and ICC obtained for actual team data to those of "pseudo teams" formed by randomly combining individual responses into "teams". I shall also point out open questions that still remain concerning how to use the observed values of these indices to infer about agreement, and briefly describe recent new developments.
Joint work with Etti Doveh.
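For readers unfamiliar with these indices, here is a minimal sketch of the single-item rWG (James, Demaree & Wolf, 1984), the building block of the multi-item rWG(J); the toy ratings and scale are made up for illustration.

```python
import numpy as np

def rwg(ratings, n_options):
    """Single-item within-group agreement index r_WG: one minus the ratio of
    the observed rating variance to the variance expected under a uniform
    (no-agreement) null on an A-point scale, which is (A^2 - 1) / 12."""
    s2 = np.var(ratings, ddof=1)                 # observed variance of group ratings
    sigma2_eu = (n_options ** 2 - 1) / 12.0      # uniform-null ("expected") variance
    return 1.0 - s2 / sigma2_eu

# Perfect agreement yields 1; scattered ratings yield values near zero or below.
perfect = rwg([4, 4, 4, 4], n_options=5)
scattered = rwg([1, 2, 4, 5], n_options=5)
```

Values below zero are possible when ratings are more dispersed than the uniform null, which is one of the interpretation difficulties the talk touches on.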
· Dovi Poznanski, Astronomy, TAU
From Kaplun to Schreiber in 24 slides: a non-formal astrostatistics talk
Prompted by a random lunch discussion, I will discuss with this esteemed crowd a few of the projects my team has worked on, or is currently advancing, hoping to give you a glimpse of what we do in the field of "big-data astronomy" and spark some discussion on the methodology. I will discuss how we stack spectra of extragalactic objects in order to recover the tiny imprint of the gas in our own galaxy, a project to study supermassive black holes via instrumental systematic noise, and our discoveries using an anomaly detection algorithm that we developed.
Causal mediation analysis for generalized linear models
In epidemiological, social science, and other scientific studies, mediation analysis is often carried out to assess whether the effect of a treatment or an exposure on an outcome of interest is mediated by another covariate. This task concerns the underlying causal mechanism. In this talk, I will first present the counterfactual framework for causal inference and provide background on causal mediation analysis while introducing the causal parameters of interest. A common method for mediation analysis, termed "the difference method", compares estimates from models with and without the suspected mediator, and results in estimates that can have a causal interpretation under certain assumptions. I will formulate the problem for generalized linear models and consider the issue of having the same link function for the conditional and marginal models. Causal mediation effects will then be estimated by utilizing a data duplication algorithm together with a generalized estimating equations approach that also provides straightforward variance estimation.
This is joint work with Xiaomei Liao and Donna Spiegelman.
· Tamar Sofer, University of Washington
Novel approaches for analysis of complex genetic data sets
The Hispanic Community Health Study is a large genetic health study of Hispanic/Latino individuals. Study participants were sampled via a two-stage design, leading to a complicated correlation structure, where people may be both genetically and environmentally correlated. I will present two analysis approaches for studies with such complicated structure: a method for estimating the proportion of outcome variance due to genetic effects (heritability), and particularly confidence intervals for heritability, and a meta-analysis method for combining association studies conducted on multiple study strata, when individuals are correlated between strata.
· Roee Guttman, Brown University
Beyond Difference-in-Differences – A Bayesian Procedure to Estimate the Effects of Nursing Home Bed-Hold Policies
Nursing home bed-hold policies provide continuity of care for Medicaid beneficiaries by paying nursing homes to reserve beds so that residents can return to their facility of occupancy following an acute hospitalization. Two outcomes that are useful in assessing the effects of these policies on the quality of care are the nursing home's rates of acute hospitalization and mortality. Evaluation of policy implications in the absence of randomized experiments has been an important research question in health services research, quantitative sociology, and economics. Difference-in-Differences (DID) methods have frequently been used to account for changes over time unrelated to the policy. Using DID, the change experienced by the group subjected to the policy is adjusted by the change experienced by the group not subjected to the policy. The underlying assumption is that the time trend in the control group is an adequate proxy for the time trend that would have occurred in the treatment group in the absence of the policy. DID may suffer from weaknesses when more than two time points are considered, when the outcomes are not normally distributed or are not scalar, and when the treatment effect is heterogeneous. We propose a new Bayesian procedure that relies on multiply imputing the potential outcomes using past outcomes to overcome these weaknesses. We provide an efficient algorithm to approximate the full Bayesian procedure, and we apply it to estimate the impact of nursing home bed-hold policies.
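For intuition, the classical two-group, two-period DID contrast that the proposed Bayesian procedure generalizes can be sketched as follows; the hospitalization rates below are made-up numbers, not results from the study.

```python
# Two-group, two-period difference-in-differences: the treated group's change
# is adjusted by the control group's change, which proxies for the common
# time trend. All rates here are illustrative, made-up values.
treated_pre, treated_post = 0.20, 0.17   # group subject to the bed-hold policy
control_pre, control_post = 0.21, 0.20   # comparison group without the policy

did = (treated_post - treated_pre) - (control_post - control_pre)  # about -0.02
```

The weaknesses listed in the abstract (multiple time points, non-normal or non-scalar outcomes, heterogeneous effects) are exactly where this simple contrast stops being adequate.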
· Yaakov Malinovsky, University of Maryland, Baltimore County
Nested Group Testing Procedures
Group testing has its origin in the identification of syphilis in the US army during World War II. It is a useful method that has broad applications in medicine, engineering, and even in airport security control. Consider a finite population of N units, where unit i has probability p of being defective. A group test is a simultaneous test on an arbitrary group of units with two possible outcomes: all units are good, or at least one of the units is defective. The group testing problem is to construct a procedure that classifies all units in a given population with as small as possible an expected number of tests. In this talk I review previously known results in the group testing literature and present new results characterizing the optimality of commonly used nested group testing procedures. If time allows, the generalized group testing problem (where unit i has probability p_i of being defective) will be discussed as well. This is joint work with Paul Albert, NCI.
References:
Malinovsky, Y. and Albert, P. S. (2016). Revisiting nested group testing procedures: new results, comparisons, and robustness. Available at https://arxiv.org/pdf/1608.06330v2.pdf.
Malinovsky, Y. (2016). Sterrett procedure for the generalized group testing problem. Available at https://arxiv.org/pdf/1609.04478v2.pdf.
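As background, the expected number of tests per unit under Dorfman's classical two-stage procedure (the simplest ancestor of the nested procedures discussed in the talk) has a closed form; a quick sketch, with an illustrative defect probability:

```python
# Dorfman's two-stage procedure: test a pool of k units; if the pool tests
# positive, test each of its k units individually.
def dorfman_tests_per_unit(p, k):
    # 1/k for the pooled test, plus one test per unit when the pool is positive
    return 1.0 / k + 1.0 - (1.0 - p) ** k

p = 0.01                                  # per-unit defect probability (made up)
best_k = min(range(2, 50), key=lambda k: dorfman_tests_per_unit(p, k))
# The optimal pool size is close to the classical 1/sqrt(p) rule of thumb,
# and the expected cost is far below one test per unit.
```

Nested procedures such as Sterrett's improve on this by retesting a positive pool adaptively rather than unit by unit, which is where the optimality results in the talk come in.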
· Assaf Weinstein, Stanford
Empirical Bayes Estimation of a Heteroscedastic Normal Mean
I will revisit a classical problem: X_i ~ N(theta_i, V_i) independently, with V_i known, i = 1, ..., n, and the goal is to estimate the (non-random) means theta_i under the sum of squared errors. When the variances are all equal, linear empirical Bayes estimators, which model the true means as i.i.d. random variables, lead to (essentially) the James-Stein estimator and have strong frequentist justifications. In the heteroscedastic case, such empirical Bayes estimators are less adequate if the V_i and theta_i are dependent in their empirical distribution. We suggest a new empirical Bayes procedure that groups together observations with similar variances and applies a spherically symmetric estimator to each group separately. Our estimator is exactly minimax and at the same time asymptotically achieves the risk of a stronger oracle than the usual one. The motivation for the new estimator comes from extending a compound decision theory argument from equal variances to unequal variances.
This is joint work with Larry Brown, Zhuang Ma and CunHui Zhang.
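For context, a sketch of the equal-variance (positive-part) James-Stein estimator that the talk generalizes to unequal variances; the sparse mean vector and the data are simulated for illustration.

```python
import numpy as np

def james_stein(x, v):
    """Positive-part James-Stein estimator for X_i ~ N(theta_i, v) with equal
    known variance v: shrink all observations toward zero by a common,
    data-driven factor."""
    shrink = max(0.0, 1.0 - (len(x) - 2) * v / np.sum(x ** 2))
    return shrink * x

# Sparse means: shrinkage greatly reduces the total squared error.
rng = np.random.default_rng(1)
theta = np.zeros(50)
theta[:5] = 3.0                            # a few non-zero means
x = theta + rng.standard_normal(50)        # X_i ~ N(theta_i, 1)
err_mle = np.sum((x - theta) ** 2)         # loss of the raw observations
err_js = np.sum((james_stein(x, 1.0) - theta) ** 2)
```

With unequal V_i, applying one common shrinkage factor is exactly what breaks down when the variances and means are dependent, motivating the talk's grouping-by-variance construction.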
A hypothesis testing view of searchlight pattern analysis
Searchlight Multi-Voxel Pattern Analysis (MVPA) has been tremendously popular in the neuroimaging community since its introduction about 10 years ago. The idea of fitting a local/scan/searchlight classifier can also be found in the genetics literature. In this talk I will outline a typical MVPA analysis pipeline and cast it as a statistical multivariate hypothesis test, so that it may be compared to the mass-univariate approach (i.e., multiple univariate testing). Seen as a multivariate testing problem, I will discuss the implied hypotheses, potential power gains, and computational shortcuts.
Some of the ideas in this talk have been published in [1-3]. Some are still work in progress and are yet to be published.
[1] Gilron, Roee, Jonathan Rosenblatt, and Roy Mukamel. “Addressing the ‘problem’ of Temporal Correlations in MVPA Analysis.” In Proceeding of the The 6th International Workshop on Pattern Recognition in Neuroimaging, 2016.
[2] Gilron, Roee, et al. "What's in a pattern? Examining the type of signal multivariate analysis uncovers at the group level." NeuroImage 146 (2017): 113-120.
[3] Rosenblatt, Jonathan, Roee Gilron, and Roy Mukamel. “BetterThanChance Classification for Signal Detection.” arXiv:1608.08873 [Stat], August 31, 2016. http://arxiv.org/abs/1608.08873.
The Weight-Shape Decomposition of Scale-Space Distributions: a Framework for Clustering Algorithms
We propose an analysis scheme which addresses the scale-space distribution, based on Gaussian kernels applied to data points in feature space. By adding an entropy-like variable we prove that the scale-space probability distribution can be written as a product of a weight function and a shape distribution. This weight-shape decomposition allows for the construction of three different clustering schemes. Clustering based on the shape distribution coincides with the Quantum Clustering method.
The clustering methodologies are based on the flow of replica points in feature space. We demonstrate and compare them on natural datasets. Our scheme provides an analytic demonstration of pure point and line attractors of replica dynamics. The appearance of the latter will be demonstrated in a big-data analysis.
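As a generic sketch of the starting object (the scale-space density built from Gaussian kernels at the data points, not the paper's weight-shape decomposition itself), with made-up points and scale:

```python
import numpy as np

def scale_space_density(x, data, sigma):
    """Scale-space probability distribution at point x: a normalized sum of
    Gaussian kernels centered at the data points. The scale sigma sets the
    resolution at which cluster structure is examined."""
    d2 = np.sum((data - x) ** 2, axis=1)   # squared distances to all data points
    k = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian kernel contributions
    dim = data.shape[1]
    return k.sum() / (len(data) * (2 * np.pi * sigma ** 2) ** (dim / 2))

data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])  # a tight pair plus an outlier
dense = scale_space_density(np.array([0.05, 0.0]), data, sigma=0.5)
sparse = scale_space_density(np.array([2.5, 2.5]), data, sigma=0.5)
```

Replica-point clustering schemes of the kind described in the abstract move copies of the data points along functionals of such a density, so maxima (and, in the paper's analysis, line attractors) determine the clusters.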
Heritability enrichment of specifically expressed genes identifies disease-relevant cell types and tissues
For many diseases and traits, genome-wide association studies (GWAS) have identified a large number of associated regions of the genome, but moving from an associated region of the genome to a better understanding of the relevant biological processes often requires in vitro experiments done in the right cell types or tissues. The relevant cell types and tissues are often unknown, and identifying them is a key step in learning biology from GWAS. In this talk, I will describe our recent work on identifying disease-relevant cell types and tissues by joint analysis of GWAS data with gene expression data.
I will first describe stratified LD score regression, a method that uses GWAS summary statistics to fit a random effects model. The parameters of this model provide information about the disease such as whether regions of the genome active in a given tissue (e.g., liver) tend to be more associated with disease than regions of the genome active in a second tissue (e.g., brain), adjusting for several confounders and modeling the fact that there are causal variants that are not included in the GWAS. I will then describe our application of this method to gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with GWAS summary statistics for 48 diseases and traits with an average sample size of 86,850. In this analysis, we identified many enrichments, including an enrichment of inhibitory neurons over excitatory neurons for bipolar disorder, and enrichments in the cortex for schizophrenia and in the striatum for migraine. Our results demonstrate that our approach is a powerful way to leverage gene expression data for interpreting GWAS signal.
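As a simplified illustration, the single-annotation ancestor of this method reduces to a linear regression of chi-squared association statistics on LD scores. The sketch below simulates from the implied mean relationship with made-up parameters and a crude noise model; the stratified version in the talk adds one slope per annotation, regression weights, and confounding adjustments.

```python
import numpy as np

# Under the basic LD score regression model, E[chi2_j] = N * h2 * l_j / M + 1,
# so the slope of chi2 on the LD score l_j recovers the heritability h2.
rng = np.random.default_rng(2)
M, N, h2 = 5000, 50_000, 0.4              # variants, GWAS sample size, heritability
ld_scores = rng.uniform(1, 200, size=M)   # l_j: per-variant LD scores (made up)
chi2 = N * h2 * ld_scores / M + 1 + rng.standard_normal(M)  # simplified noise

X = np.column_stack([np.ones(M), ld_scores])
intercept, slope = np.linalg.lstsq(X, chi2, rcond=None)[0]
h2_hat = slope * M / N                    # heritability recovered from the slope
```

The intercept near 1 is the feature that absorbs confounding such as population stratification, which is one reason the approach is robust when applied to real summary statistics.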
Estimating functional correlation from genome-wide association study summary statistics
Genome-wide association studies (GWAS) have grown tremendously in recent years, and new large-scale genomics data sets have provided a lens through which to interpret these data to learn disease biology. This is often done by combining GWAS data with genomic annotations containing unsigned information about whether a genetic variant is relevant or not to a biological process such as transcription factor binding. However, there are also many genomic annotations that yield signed information about whether a variant promotes or hinders a biological process. We introduce a method for estimating whether variants with concordant signs according to a genomic annotation also have concordant directions of effect on a trait of interest. Our approach is model-based, requires only GWAS summary statistics, accounts for correlations among genetic variants and the presence of unmeasured variants, and has the advantage of robustness to some plausible types of confounding. We present preliminary findings obtained by applying our method using signed annotations constructed using a sequence-based predictor of transcription factor binding.
Minimax-optimal semi-supervised regression on unknown manifolds
In recent years, many semi-supervised regression and classification methods have been proposed. These methods have demonstrated empirical success on some data sets, whereas on others the unlabeled data did not appear to help.
To analyze semi-supervised learning theoretically, it is often assumed that the data points lie on a low-dimensional manifold. Under this assumption, [1] and [2] have shown that classical nonparametric regression methods, using only the labeled data, can achieve optimal rates of convergence. This implies that asymptotically, as the number of labeled points tends to infinity, unlabeled data do not help. However, typical semi-supervised scenarios involve few labeled points and plenty of unlabeled ones.
In this work [3], we clarify the potential benefits of unlabeled data under the manifold assumption, given a fixed number of labeled points. Specifically, we prove that for a Lipschitz function on a manifold, a simple semi-supervised regression method based on geodesic k-nearest neighbors achieves the finite-sample minimax bound on the mean squared error, provided that sufficiently many unlabeled points are available. Furthermore, we show that this approach is computationally efficient, requiring only O(kN log N) operations to estimate the regression function for all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. In both cases, the proposed method is more accurate and much faster than the popular Laplacian eigenvector regressor [4].
The talk should be accessible to anyone with a general background in statistics and machine learning. Specifically, no knowledge of manifold geometry or minimax theory is assumed.
[1] Bickel, P. J. and Li, B. "Local polynomial regression on unknown manifolds." Tomography, Networks and Beyond (2007).
[2] Lafferty, J. and Wasserman, L. "Statistical analysis of semi-supervised regression." NIPS (2007).
[3] Moscovich, A., Jaffe, A. and Nadler, B. "Minimax-optimal semi-supervised regression on unknown manifolds." AISTATS (2017). http://proceedings.mlr.press/v54/moscovich17a.html
[4] Belkin, M. and Niyogi, P. "Semi-supervised learning on Riemannian manifolds." Machine Learning (2004).
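A minimal sketch of the geodesic nearest-neighbor idea from [3], heavily simplified for illustration: it uses a dense Floyd-Warshall shortest-path computation rather than the O(kN log N) implementation in the paper, and the arc data, graph degree, and labeled points are all made up.

```python
import numpy as np

def geodesic_knn_regress(X, y_labeled, labeled_idx, k_graph=3, k_reg=1):
    """Semi-supervised regression in the spirit of geodesic k-NN: build a
    k-NN graph over ALL points (labeled and unlabeled), approximate geodesic
    distances by graph shortest paths, then predict each point's value from
    its geodesically nearest labeled neighbor(s)."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):                      # connect each point to its k_graph nearest
        for j in np.argsort(d[i])[1:k_graph + 1]:
            W[i, j] = W[j, i] = d[i, j]
    for m in range(n):                      # Floyd-Warshall shortest paths
        W = np.minimum(W, W[:, m:m + 1] + W[m:m + 1, :])
    preds = np.empty(n)
    for i in range(n):                      # average the k_reg nearest labeled values
        nearest = np.argsort(W[i, labeled_idx])[:k_reg]
        preds[i] = y_labeled[nearest].mean()
    return preds

# Points along an arc (a 1-D manifold in 2-D); only three of them are labeled.
t = np.linspace(0, 1, 20)
X = np.column_stack([np.cos(3 * t), np.sin(3 * t)])
labeled_idx = np.array([0, 10, 19])
preds = geodesic_knn_regress(X, t[labeled_idx], labeled_idx)
```

The unlabeled points earn their keep in the graph: they make the shortest-path distances follow the arc rather than cut across the ambient space, which is the mechanism behind the finite-sample gains discussed in the talk.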
· Marianna Pensky, University of Central Florida
Classification with many classes: challenges and pluses.
We consider high-dimensional multi-class classification of normal vectors where, unlike in standard settings, the number of classes may also be large. We derive non-asymptotic conditions on the effects of significant features, and lower and upper bounds on the distances between classes, required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup where the number of classes grows with the dimension of the feature space and the sample sizes. To the best of our knowledge, our paper is the first to study this important model. In particular, we present an interesting and, at first glance, somewhat counterintuitive phenomenon: the precision of classification can improve as the number of classes grows.
· Zhaohui Qin, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
Utilizing Big Data to solve the small data inference problem – Alternatives to hierarchical models with applications to genomics data
Modern high-throughput biotechnologies such as microarrays and next-generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment only a limited amount of data is observed for each individual feature, the classical "large p, small n" problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool for analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable overcorrection for some features. In this work, we discuss possible causes of the overcorrection problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, large amounts of historical data are available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model.
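A toy sketch of the normal-normal empirical Bayes shrinkage underlying such hierarchical models, illustrating the overcorrection the talk addresses: a genuinely extreme feature is pulled hard toward the bulk. All numbers are simulated for illustration.

```python
import numpy as np

# Normal-normal empirical Bayes: each feature's estimate is pulled toward the
# grand mean, with weight set by the estimated between-feature variance.
rng = np.random.default_rng(3)
s2 = 1.0                                  # known sampling variance per feature
x = rng.standard_normal(200)              # observed effects; most true effects ~ 0
x[0] = 8.0                                # one genuinely large effect
mu = x.mean()
tau2 = max(x.var() - s2, 1e-8)            # method-of-moments between-feature variance
b = s2 / (s2 + tau2)                      # shrinkage weight toward the grand mean
posterior_mean = b * mu + (1 - b) * x     # the extreme feature is shrunk heavily
```

Because the between-feature variance is estimated from the mostly-null bulk, the single large effect ends up far below its observed value, which is the overcorrection that historical data can help diagnose and repair.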
Imputation of Race and Ethnicity in Health Insurance Claims
The State of Connecticut is currently populating an All Payers Claims Database (APCD) which will hold all healthcare claims data for residents of Connecticut. The APCD will be a valuable resource for the study of healthcare delivery, costs and outcomes. It is also a potential resource for the study of health disparities in Connecticut. However, since very few healthcare claims records include the race and ethnicity of the beneficiary (approximately 3%), their use for the study of health disparities is very limited. The imputation of race and ethnicity in these claims data would greatly increase the value of the data held in the APCD and may lead to better healthcare outcomes for CT residents. Currently no model exists to impute race and ethnicity in CT healthcare claims. This project aims to use previously existing CT birth records data held by the Department of Public Health (DPH) to produce an imputation model that can be used to impute race and ethnicity in CT healthcare claims, thereby greatly increasing the utility of the data in the CT APCD. In addition, the model created for this project can be then extended for use in other states, increasing the general utility of healthcare claims. (This is joint work with Robert Aseltine and Yishu Xue).
Supervised Learning without Discrimination
As machine learning is increasingly being used in areas protected by anti-discrimination law, or in other domains which are socially and morally sensitive, the problem of algorithmically measuring and avoiding prohibited discrimination in machine learning is pressing. What does it mean for a predictor not to discriminate with respect to a protected group (e.g., according to race, gender, etc.)? We propose a notion of non-discrimination that can be measured statistically and used algorithmically, and that avoids many of the pitfalls of previous definitions. We further study what types of discrimination and non-discrimination can be identified with oblivious tests, which treat the predictor as an opaque black box, and what different oblivious tests tell us about possible discrimination.
Joint work with Suriya Gunasekar, Moritz Hardt, Mesrob Ohannessian, Eric Price and Blake Woodworth.
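A sketch of one such oblivious test, here assuming an equalized-odds style criterion that compares group-conditional true- and false-positive rates; the predictor is treated purely as a black box, and the toy labels, predictions, and groups are made up.

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Oblivious discrimination test: uses only (prediction, outcome, group),
    never the predictor's internals. Reports the between-group gaps in
    true-positive and false-positive rates; an equalized-odds style criterion
    asks for both gaps to be zero. Assumes exactly two groups."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        tpr = y_pred[m & (y_true == 1)].mean()   # P(Yhat=1 | Y=1, group=g)
        fpr = y_pred[m & (y_true == 0)].mean()   # P(Yhat=1 | Y=0, group=g)
        rates[g] = (tpr, fpr)
    (t0, f0), (t1, f1) = rates.values()
    return abs(t0 - t1), abs(f0 - f1)

y = np.array([1, 1, 0, 0, 1, 1, 0, 0])       # true outcomes
yhat = np.array([1, 0, 0, 0, 1, 1, 1, 0])    # black-box predictions
grp = np.array([0, 0, 0, 0, 1, 1, 1, 1])     # protected group membership
tpr_gap, fpr_gap = equalized_odds_gaps(y, yhat, grp)
```

Part of the talk's point is what such tests can and cannot identify: because they only see the joint distribution of (prediction, outcome, group), different underlying mechanisms can produce identical gap values.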