Date | Speaker | Title
6 March | Jerome Tubiana, Ecole Normale Supérieure | Compositional Representations in Restricted Boltzmann Machines: theory and applications
13 March | Felix Abramovich, TAU | Model selection in high-dimensional GLM with applications to sparse logistic regression and classification
20 March | Naomi Kaplan, HUJI | Statistical Methods for Evaluating Forensic Evidence: The Case of Shoe Prints
27 March | Omer Weissbrod, Harvard School of Public Health | Modeling High-Dimensional Data with Case-Control Sampling and Dependency Structures
1 May | Saharon Rosset, TAU | Optimal Procedures for Multiple Testing Problems
8 May | Richard Olshen, Stanford | V(D)J Diversity and Statistical Inference
15 May | Malka Gorfine, TAU | Nonparametric Adjustment for Measurement Error in Time to Event Data: Application to Risk Prediction Models
22 May | Stefan Steiner, University of Waterloo | Estimating Risk-Adjusted Process Performance with a Bias/Variance Trade-off
30 May* | Donald Rubin, Harvard | Dealing with Nonignorable Missing Data
5 June | Aryeh Kontorovich, BGU | The wondrous one-coin model
26 June | Serge Guzy, University of Minnesota and POP_PHARM | Statistical aspects of Pharmacometrics: A case study
24 October | Haim Avron, TAU | Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
14 November | Roi Weiss, Weizmann Institute | Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
21 November | Douglas Wiens, University of Alberta | Robustness of Design: A Survey
5 December | Regev Schweiger, TAU | Detecting heritable phenotypes without a model: Fast permutation testing for heritability
12 December | Barak Sober, TAU | Approximation of Functions Over Manifolds by Moving Least Squares
26 December | Avi Dgani, Geocartography Knowledge Group | Geocartography: Geo-Analytical Models and Innovative Research Tools Create a Multidisciplinary World of Applications
2 January | Amichai Painsky, HUJI and MIT | Universal Loss and Gaussian Learning Bounds
9 January | Phil Reiss, Haifa University | Statistical Issues in the Study of Neurodevelopmental Trajectories
16 January | Ari Pakman, Columbia University | Sampling with Velocities
|
Seminars are held on Tuesdays, 10:30 am, Schreiber Building, Room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list or for any other inquiries, please call (03)-6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove the numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
· Haim Avron, TAU
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. The talk is based on a recent paper in which we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation, and show how spectral matrix approximation bounds imply statistical guarantees for kernel ridge regression.
Qualitatively, our results are twofold: on one hand, we show that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions. At the same time, we show that the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance. We study this optimal sampling distribution for the Gaussian kernel, achieving a nearly complete characterization for the case of low-dimensional bounded datasets. Based on this characterization, we propose an efficient sampling scheme with guarantees superior to random Fourier features in this regime.
This is joint work with Michael Kapralov (EPFL), Cameron Musco (MIT), Christopher Musco (MIT), Ameya Velingker (EPFL), and Amir Zandieh (EPFL).
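As a concrete, hedged illustration of the plain random Fourier features baseline discussed above (not the leverage-based sampling proposed in the paper), the sketch below approximates Gaussian-kernel ridge regression with D random features on toy one-dimensional data; the bandwidth sigma, the ridge parameter lam and the feature count D are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data
n, d, D = 200, 1, 100                             # samples, input dim, number of random features
X = rng.uniform(-3, 3, size=(n, d))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(n)

# Random Fourier features for the Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
sigma, lam = 1.0, 1e-2
W = rng.standard_normal((d, D)) / sigma           # random frequencies ~ N(0, 1/sigma^2)
b = rng.uniform(0, 2 * np.pi, size=D)             # random phases
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)          # n x D feature map; E[Z Z^T] approximates the kernel matrix

# Ridge regression in the feature space: solve (Z^T Z + lam I) w = Z^T y
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

# Predict at new points using the same random features
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
Z_test = np.sqrt(2.0 / D) * np.cos(X_test @ W + b)
y_hat = Z_test @ w
print(np.round(y_hat[:5], 3))

The point of the paper is precisely how large D must be for such an approximation to behave like exact kernel ridge regression, and how sampling frequencies non-uniformly can do better.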
· Roi Weiss, Weizmann Institute
Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
This talk deals with Nearest-Neighbor (NN) learning algorithms in metric spaces. This seemingly naive learning paradigm remains competitive against more sophisticated methods and, in its celebrated k-NN version, has been placed on a solid theoretical foundation. Although the classic 1-NN is well known to be inconsistent in general, in recent years a series of works has presented variations on the theme of margin-regularized 1-NN algorithms, as an alternative to the Bayes-consistent k-NN. These algorithms enjoy a number of statistical and computational advantages over the traditional k-NN. Salient among these are explicit data-dependent generalization bounds and considerable runtime and memory savings.
In this talk we examine a recently proposed compression-based 1-NN algorithm, which enjoys additional advantage in the form of tighter generalization bounds and increased efficiency in time and space. We show that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension — the first compression-based multi-class 1-NN algorithm proven to be both computationally efficient and Bayes-consistent. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising, since it is known that k-NN is not Bayes-consistent in this setting, thus raising several interesting open problems.
Joint work with Aryeh Kontorovich and Sivan Sabato.
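For intuition about sample compression, here is a hedged sketch of a classic greedy (Hart-style) condensing step followed by 1-NN prediction on the retained prototypes. This is not the margin-regularized, Bayes-consistent algorithm analyzed in the talk, just a simple illustration of predicting from a compressed subsample; the toy data are arbitrary.

import numpy as np

def condense(X, y):
    """Greedy (Hart-style) condensing: keep only points that the 1-NN rule
    built on the prototypes collected so far gets wrong."""
    keep = [0]
    for i in range(1, len(X)):
        d = np.linalg.norm(X[keep] - X[i], axis=1)
        if y[keep][int(np.argmin(d))] != y[i]:     # current prototypes misclassify x_i
            keep.append(i)
    return X[keep], y[keep]

def predict_1nn(P, yP, X_new):
    d = np.linalg.norm(P[None, :, :] - X_new[:, None, :], axis=2)
    return yP[np.argmin(d, axis=1)]

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # linearly separable toy problem
P, yP = condense(X, y)
print(len(P), "prototypes kept out of", len(X))
print((predict_1nn(P, yP, X) == y).mean(), "training accuracy of the compressed 1-NN")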
· Barak Sober, TAU
Approximation of Functions Over Manifolds by Moving Least Squares
We approximate a function defined over a $d$-dimensional manifold $M \subset \mathbb{R}^n$ utilizing only noisy function values at noisy locations on the manifold. To produce the approximation we do not require any knowledge regarding the manifold other than its dimension $d$. The approximation scheme is based upon the Manifold Moving Least-Squares (MMLS) and is therefore resistant to noise in the domain $M$ as well. Furthermore, the approximant is shown to be smooth and of approximation order $O(h^{m+1})$ for non-noisy data, where $h$ is the mesh size with respect to $M$, and $m$ is the degree of the local polynomial approximation. In addition, the proposed algorithm is linear in time with respect to the ambient space dimension $n$, making it useful for cases where $d \ll n$. This assumption, that the high-dimensional data is situated on (or near) a significantly lower dimensional manifold, is prevalent in many high-dimensional problems. We put our algorithm to numerical tests against state-of-the-art algorithms for regression over manifolds and show its potential.
This talk is based on joint work with Yariv Aizenbud and David Levin.
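As a one-dimensional, hedged illustration of the moving least-squares idea underlying MMLS (not the manifold algorithm itself), the snippet below fits a locally weighted polynomial of degree m around each evaluation point; the Gaussian weight scale h, the degree m and the test function are arbitrary choices.

import numpy as np

def mls_fit(x0, x, y, h=0.5, m=2):
    """Local polynomial of degree m fitted at x0 with Gaussian weights of scale h."""
    w = np.exp(-((x - x0) / h) ** 2)                  # locality weights
    A = np.vander(x - x0, m + 1, increasing=True)     # columns: 1, (x-x0), (x-x0)^2, ...
    coef, *_ = np.linalg.lstsq(A.T @ (A * w[:, None]), A.T @ (w * y), rcond=None)
    return coef[0]                                    # value of the local fit at x0

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)     # noisy samples of a smooth function
grid = np.linspace(0.5, 2 * np.pi - 0.5, 8)
approx = np.array([mls_fit(t, x, y, h=0.4, m=2) for t in grid])
print(np.round(approx - np.sin(grid), 3))             # approximation error on the grid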
· Douglas Wiens, University of Alberta
Robustness of Design: A Survey
When an experiment is conducted for purposes which include fitting a particular model to the data, then the 'optimal' experimental design is highly dependent upon the model assumptions - linearity of the response function, independence and homoscedasticity of the errors, etc. When these assumptions are violated the design can be far from optimal, and so a more robust approach is called for. We should seek a design which behaves reasonably well over a large class of plausible models.
I will review the progress which has been made on such problems, in a variety of experimental and modelling scenarios - prediction, extrapolation, discrimination, survey sampling, dose-response, machine learning, etc.
· Regev Schweiger, TAU
Detecting heritable phenotypes without a model: Fast permutation testing for heritability
Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models has gained popularity because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. When evaluating the heritability of a phenotype, it is important to accurately measure the statistical significance of the estimated heritability under the null hypothesis of a zero true heritability. One major problem with the parametric approach is that it relies strongly on the parametric model at hand. In contrast, permutation testing is a popular nonparametric alternative: it requires neither a parametric form for the distribution of the statistic nor asymptotic assumptions. Indeed, we show that permutation p-values for the heritability of methylation profiles of CpG sites from a cohort of 1,799 samples are significantly larger than those calculated under asymptotic assumptions. In particular, sites that are significantly heritable according to the model are often deemed non-significant under permutation, suggesting that the model-based test produces false positives and demonstrating the need for feasible permutation testing for heritability. Permutation testing, however, is often computationally prohibitive. Here, we propose an efficient method to perform permutation testing for heritability, achieving a speedup of up to several orders of magnitude and yielding a test that is both highly efficient and does not suffer from model misspecification.
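To make the permutation-testing burden concrete, here is a naive (and deliberately slow) permutation test of a simple Mantel-type statistic relating phenotypic similarity to a genetic relatedness matrix. The statistic, sample sizes and number of permutations are invented for illustration; this is not the fast method proposed in the talk, whose whole point is to avoid recomputing the statistic from scratch for every permutation.

import numpy as np

def similarity_stat(y, K):
    """Correlation between phenotypic similarity y_i*y_j and relatedness K_ij (off-diagonal)."""
    ys = (y - y.mean()) / y.std()
    S = np.outer(ys, ys)
    iu = np.triu_indices_from(K, k=1)
    return np.corrcoef(S[iu], K[iu])[0, 1]

rng = np.random.default_rng(3)
n = 300
G = rng.standard_normal((n, 500))                      # toy genotype matrix
K = G @ G.T / G.shape[1]                               # toy genetic relatedness matrix
y = 0.5 * G[:, :50].sum(axis=1) / np.sqrt(50) + rng.standard_normal(n)  # partly heritable trait

obs = similarity_stat(y, K)
B = 1000                                               # naive permutation: recompute B times
perm = np.array([similarity_stat(rng.permutation(y), K) for _ in range(B)])
p_value = (1 + np.sum(perm >= obs)) / (B + 1)
print(round(obs, 4), p_value)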
· Avi Dgani, President of the Geocartography Knowledge Group and Professor Emeritus, Department of Geography, Tel Aviv University
Geocartography: Geo-Analytical Models and Innovative Research Tools Create a Multidisciplinary World of Applications
"Geocartography" is a new scientific field within geography, and a term coined by Prof. Dgani in 1968, when he was among the world's pioneers in developing computerized Geographic Information Systems (GIS) and Location Intelligence (LI). It is a theoretical foundation that has given rise to geo-analytical models, analytical mapping, population segmentation, and unique information systems and research and planning tools applied in many domains: urban planning, transportation planning, morbidity and medicine, economics and commerce, economic feasibility studies, optimal-location analyses, social and community research, and also political polling and forecasting.
· Amichai Painsky, HUJI and MIT
Universal Loss and Gaussian Learning Bounds
In this talk I address two basic predictive modeling problems: choosing a universal loss function, and how to approach non-linear learning problems with linear means. A loss function measures the discrepancy between the true values and the estimated fits for a given instance of data. Different loss functions correspond to a variety of merits, and the choice of a "correct" loss can sometimes be questionable. Here, I show that for binary classification problems, the Bernoulli log-likelihood loss (log-loss) is universal with respect to practical alternatives. In other words, I show that by minimizing the log-loss we minimize an upper bound on any smooth, convex and unbiased binary loss function. This property justifies the broad use of log-loss in regression, in decision trees, as an InfoMax criterion (cross-entropy minimization) and in many other applications.
I then address a Gaussian representation problem which utilizes the log-loss. In this problem we look for an embedding of arbitrary data which maximizes its "Gaussian part" while preserving the original dependence between the variables and the target. This embedding provides an efficient (and practical) representation, as it allows us to exploit the favorable properties of a Gaussian distribution. I introduce different methods and show that the optimal Gaussian embedding is governed by the non-linear canonical correlations of the data. This result provides a fundamental limit on our ability to Gaussianize arbitrary data sets and solve complex problems by linear means.
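As a tiny numerical illustration (not a proof of the universality result above), the snippet below evaluates the Bernoulli log-loss and the squared (Brier) loss on a grid of predicted probabilities for a positive label; on this grid the log-loss dominates the squared loss pointwise, one simple instance of the "upper bound" behaviour described in the abstract. The grid and the comparison loss are illustrative choices only.

import numpy as np

p = np.linspace(0.01, 0.99, 9)        # predicted probability of the positive class
y = 1                                  # true label

log_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # Bernoulli log-likelihood loss
sq_loss = (y - p) ** 2                                    # squared (Brier) loss

print(np.round(log_loss, 3))
print(np.round(sq_loss, 3))
print(bool(np.all(log_loss >= sq_loss)))                  # log-loss upper-bounds the squared loss here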
· Phil Reiss, Haifa University
Statistical Issues in the Study of Neurodevelopmental Trajectories
This talk will examine two statistical issues arising in the study of developmental trajectories in the brain, a key aim of current research in psychiatry. First, we discuss the relative efficiency of cross-sectional and longitudinal designs when the “trajectory” of interest is the mean of some quantity, such as thickness of the cerebral cortex, as a function of age. A classical variance inflation factor is generalized from the estimation of a scalar to the present setting of function estimation, and is further extended to penalized smoothing. Second, we consider the use of functional principal component analysis for estimation of individual trajectories. Specifically, we show how smoothing of the covariance surface can have surprising effects on the results, in particular when this is done by the currently popular approach of tensor product smoothing. The ideas will be illustrated with data from a large longitudinal study of cortical development.
· Ari Pakman, Columbia University
Sampling with Velocities
Bayesian modeling relies on efficient techniques to perform posterior inference over complex probability distributions. Among Monte Carlo methods, two particularly efficient approaches enlarge the sampling space with velocity vectors: Hamiltonian Monte Carlo (HMC) and the Bouncy Particle Sampler (BPS). For HMC, I will first present two non-trivial distributions where the Hamiltonian equations of motion can be integrated exactly: truncated multivariate Gaussians and augmented binary distributions. I will then present an application of these techniques to a statistical neuroscience problem. For large datasets, stochastic versions of Metropolis-Hastings samplers do not preserve the distribution. I will present a stochastic version of the BPS, which makes it possible to evaluate mini-batches of the data at each iteration while introducing minimal bias in the sampled distribution.
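A minimal sketch of how auxiliary velocities enter a sampler: standard HMC with a leapfrog integrator on a toy Gaussian target. This is neither the exact HMC for truncated Gaussians and binary distributions nor the stochastic BPS described in the talk; the step size, path length and target are arbitrary choices.

import numpy as np

def hmc(logp, grad, x0, n_samples=2000, eps=0.1, L=20, seed=0):
    """Vanilla HMC: augment the state with a velocity, simulate Hamiltonian
    dynamics with the leapfrog integrator, and Metropolis-correct the result."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    out = []
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)                # resample the velocity
        xn, vn = x.copy(), v.copy()
        vn += 0.5 * eps * grad(xn)                      # leapfrog steps
        for _ in range(L - 1):
            xn += eps * vn
            vn += eps * grad(xn)
        xn += eps * vn
        vn += 0.5 * eps * grad(xn)
        log_acc = (logp(xn) - 0.5 * vn @ vn) - (logp(x) - 0.5 * v @ v)
        if np.log(rng.uniform()) < log_acc:             # accept/reject the proposal
            x = xn
        out.append(x.copy())
    return np.array(out)

# Standard bivariate Gaussian target: log p(x) = -||x||^2 / 2 (up to a constant)
draws = hmc(lambda z: -0.5 * z @ z, lambda z: -z, x0=np.zeros(2))
print(np.round(draws.mean(axis=0), 2), np.round(draws.var(axis=0), 2))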
· Jerome Tubiana, Ecole Normale Supérieure
Compositional Representations in Restricted Boltzmann Machines: theory and applications
Restricted Boltzmann Machines (RBM) form a simple yet powerful family of probability distributions for modeling high-dimensional, complex data sets. Besides learning a generative model, they also extract features, producing a graded and distributed representation of data. However, not all variants of RBM perform equally well, and few theoretical arguments exist for these empirical observations. By analyzing an ensemble of RBMs with random weights using statistical physics tools, we characterize the structural conditions (statistics of weights, choice of non-linearity…) allowing the emergence of such efficient representations.
Lastly, we present a new application of RBMs: the analysis of protein sequences alignments. We show that RBMs extract high-order patterns of coevolution that arise from the structural and functional constraints of the protein family. These patterns can be recombined to generate artificial protein sequences with prescribed chemical properties.
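For readers unfamiliar with the model, here is a minimal binary-binary RBM trained with one step of contrastive divergence (CD-1) on synthetic two-template data. It only illustrates the generic RBM machinery, not the particular variants, non-linearities or protein-sequence application analyzed in the talk; all sizes and the learning rate are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, lr = 12, 6, 0.05
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

# Toy data: two visible "templates" plus bit-flip noise
templates = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]])
data = templates[rng.integers(0, 2, 500)] ^ (rng.random((500, n_vis)) < 0.05)

for epoch in range(200):                                        # CD-1 training
    v0 = data[rng.permutation(len(data))[:64]].astype(float)    # mini-batch
    ph0 = sigmoid(v0 @ W + b_h)                                  # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)             # sampled hidden units
    v1 = sigmoid(h0 @ W.T + b_v)                                 # reconstruction
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

print(np.round(sigmoid(np.eye(n_vis)[:2] @ W + b_h), 2))         # hidden responses to single-pixel inputs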
· Felix Abramovich, TAU
Model selection in high-dimensional GLM with applications to sparse logistic regression and classification
In the first part of the talk we consider model selection in high-dimensional generalized linear models (GLM) by penalized maximum likelihood with a complexity penalty on the model size, extending the known results for Gaussian linear regression. We derive (non-asymptotic) upper bounds for the Kullback-Leibler risk of the resulting estimator and the corresponding minimax lower bounds. We also discuss computational aspects and present several related model selection procedures (Lasso, Slope) that are computationally feasible for high-dimensional data. In the second part we apply the obtained results for model/feature selection to high-dimensional classification by sparse logistic regression. We derive the misclassification excess risk of the resulting plug-in classifier and discuss its optimality among the class of sparse linear classifiers.
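The results above are theoretical; as a small practical companion, here is a hedged sketch of one of the computationally feasible procedures mentioned, a Lasso-type L1-penalized logistic regression, fitted with scikit-learn on synthetic data where only a few of many features are relevant. The data-generating choices and the penalty level C are arbitrary.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, p, s = 200, 1000, 5                         # n << p, only s features are relevant
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta))).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)   # C = 1/lambda
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print("features selected:", selected[:10], "out of", p)
print("training accuracy:", round(clf.score(X, y), 3))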
· Naomi Kaplan, HUJI
Statistical Methods for Evaluating Forensic Evidence: The Case of Shoe Prints
Our overall research goal is to utilize statistical methods in order to verify a match between crime-scene shoeprints and a suspect's shoe. The degree of rarity of a given shoeprint is defined as the probability that a random shoeprint has a pattern of faults (or "accidental marks") similar to the shoeprint in question, meaning that the accidentals appear in exactly the same locations and have exactly the same orientations and shapes. This presentation will examine a more specific issue: the methods used to estimate the probability of accidentals appearing at a certain location on the shoe sole. Questions related to the complexity of the case under discussion will be raised. A second issue that will be discussed involves the basic assumption that accidentals and their characteristics (location, shape and orientation) are independent of each other, and that for this reason the rarity of a shoe can be determined in a simple manner. However, if, as will be demonstrated, these marks and characteristics are not independent, the current form of assessment may need to be reconsidered.
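To make the independence assumption concrete, here is a toy calculation of a print's rarity when accidental marks are treated as independent across hypothetical sole regions; the regions, probabilities and observed pattern are invented for illustration, and the point of the talk is precisely that such a simple product may be inappropriate when marks are dependent.

import numpy as np

# Hypothetical per-region probabilities of observing an accidental mark,
# estimated (say) from a reference database of sole images.
p_region = np.array([0.02, 0.05, 0.01, 0.08, 0.03])    # 5 regions of the sole
observed = np.array([1, 0, 1, 1, 0])                    # marks seen in the questioned print

# Under independence, the rarity of this configuration is a simple product.
rarity_independent = np.prod(np.where(observed == 1, p_region, 1 - p_region))
print(f"rarity under independence: {rarity_independent:.2e}")
# If marks cluster (e.g. in high-wear regions), this product can badly
# overstate the rarity, which is the concern raised in the talk.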
· Omer Weissbrod, Harvard School of Public Health
Modeling High-Dimensional Data with Case-Control Sampling and Dependency Structures
Modern data sets in various domains often include units that were sampled non-randomly from the population and have a complex latent correlation structure. Here we investigate a common form of this setting, where every unit is associated with a latent variable, all latent variables are correlated, and the probability of sampling a unit depends on its response. Such settings often arise in case-control studies, where the sampled units are correlated due to spatial proximity, family relations, or other sources of relatedness. Maximum likelihood estimation in such settings is challenging from both a computational and statistical perspective, necessitating approximation techniques that take the sampling scheme into account.
We propose a family of approximate likelihood approaches by combining state of the art methods from statistics and machine learning, including composite likelihood, expectation propagation and generalized estimating equations. We demonstrate the efficacy of our proposed approaches via extensive simulations. We utilize them to investigate the genetic architecture of several complex disorders collected in case-control genetic association studies, where hundreds of thousands of genetic variants are measured for every individual, and the underlying disease liabilities of individuals are correlated due to genetic similarity. Our work is the first to provide a tractable likelihood-based solution for case-control data with complex dependency structures.
This is joint work with Shachar Kaufman, David Golan and Saharon Rosset.
A preprint is available at: https://arxiv.org/abs/1801.03901
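A small, purely illustrative simulation of the sampling scheme described above: correlated liabilities drawn from a kinship-like covariance, disease status defined by thresholding at a population prevalence, and case-control over-sampling that distorts the sample prevalence. It is not the composite-likelihood or expectation-propagation machinery from the paper; the heritability, prevalence and block-relatedness structure are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
n, h2, prevalence = 2000, 0.4, 0.05

# Correlated genetic liabilities via a simple block-relatedness structure
K = np.kron(np.eye(n // 5), np.full((5, 5), 0.5)) + 0.5 * np.eye(n)   # kinship-like matrix
g = np.linalg.cholesky(h2 * K) @ rng.standard_normal(n)               # genetic component
liability = g + np.sqrt(1 - h2) * rng.standard_normal(n)
disease = liability > np.quantile(liability, 1 - prevalence)          # threshold model

# Case-control sampling: keep all cases and an equal number of random controls
cases = np.flatnonzero(disease)
controls = rng.choice(np.flatnonzero(~disease), size=len(cases), replace=False)
sample = np.concatenate([cases, controls])
print("population prevalence:", disease.mean(), "| sample prevalence:", disease[sample].mean())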
· Saharon Rosset, TAU
Optimal Procedures for Multiple Testing Problems
Multiple testing problems are a staple of modern statistical analysis. The fundamental objective of multiple testing procedures is to reject as many false null hypotheses as possible (that is, maximize some notion of power), subject to controlling an overall measure of false discovery, like the family-wise error rate (FWER) or the false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy which guarantees strong control of the selected measure. In that sense, our approach is a generalization of the optimal Neyman-Pearson test for a single hypothesis. We show that for exchangeable hypotheses, for both FWER and FDR and relevant notions of power, these problems can be formulated as infinite linear programs and can in principle be solved for any number of hypotheses. We apply our results to derive explicit optimal tests for FWER or FDR control for three independent normal means. We find that the power gain over natural competitors is substantial in all settings examined. We also characterize maximin rules for complex alternatives, and demonstrate that such rules can be found in practice, leading to improved practical procedures compared to existing alternatives.
Joint work with Ruth Heller, Amichai Painsky and Ehud Aharoni.
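For reference, one of the "natural competitors" for FDR control, the Benjamini-Hochberg step-up procedure, can be written in a few lines; the optimal procedures in the talk are obtained very differently, as solutions of infinite-dimensional linear programs. The simulated p-values below are arbitrary.

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure at level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * (np.arange(1, m + 1) / m)   # compare sorted p-values to the BH line
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])                    # largest index under the line
    return order[:k + 1]

rng = np.random.default_rng(7)
p_null = rng.uniform(size=27)
p_alt = rng.uniform(size=3) * 1e-3                      # three very small p-values
pvals = np.concatenate([p_null, p_alt])
print("rejected hypotheses:", sorted(benjamini_hochberg(pvals, q=0.05)))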
· Richard Olshen, Professor Emeritus of Biomedical Data Science, Stanford University
V(D)J Diversity and Statistical Inference
This talk will include an introduction to the topic of V(D)J rearrangements of particular subsets of T cells and B cells of the adaptive human immune system, in particular of IgG heavy chains. There are many statistical problems that arise in understanding these cells. This presentation will be my attempt to provide some mathematical and computational details that arise in trying to understand the data.
I have received considerable assistance from Lu Tian and also from Yi Liu; Andrew Fire and Scott Boyd have given valuable advice, as has Jorg Goronzy.
· Malka Gorfine, TAU
Nonparametric Adjustment for Measurement Error in Time to Event Data: Application to Risk Prediction Models
Mismeasured time-to-event data used as a predictor in risk prediction models will lead to inaccurate predictions. This arises in the context of self-reported family history, a time-to-event predictor often measured with error, used in Mendelian risk prediction models. Using validation data, we propose a method to adjust for this type of error. We estimate the measurement error process using a nonparametric smoothed Kaplan-Meier estimator, and use Monte Carlo integration to implement the adjustment. We apply our method to simulated data in the context of both Mendelian and multivariate survival prediction models. Simulations are evaluated using measures of mean squared error of prediction (MSEP), area under the receiver operating characteristic curve (ROC-AUC), and the ratio of observed to expected number of events. These results show that our method mitigates the effects of measurement error mainly by improving calibration and total accuracy. We illustrate our method in the context of Mendelian risk prediction models focusing on misreporting of breast cancer, fitting the measurement error model on data from the University of California at Irvine, and applying our method to counselees from the Cancer Genetics Network. We show that our method improves overall calibration, especially in the low-risk deciles.
Joint work with Danielle Braun and Giovanni Parmigiani, Department of Biostatistics, Harvard School of Public Health.
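A from-scratch sketch of the Kaplan-Meier estimator, the nonparametric building block that the proposed adjustment smooths and then integrates over via Monte Carlo; it is not the measurement-error correction itself, and the simulated exponential event and censoring times are arbitrary.

import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate; event = 1 for observed, 0 for censored."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    surv, t_out, s_out = 1.0, [], []
    n_at_risk = len(time)
    for t in np.unique(time):
        at_t = time == t
        d = event[at_t].sum()                     # events at time t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            t_out.append(t)
            s_out.append(surv)
        n_at_risk -= at_t.sum()                   # drop events and censorings from the risk set
    return np.array(t_out), np.array(s_out)

rng = np.random.default_rng(8)
true_t = rng.exponential(10, 200)
cens_t = rng.exponential(15, 200)
time = np.minimum(true_t, cens_t)
event = (true_t <= cens_t).astype(int)
t, s = kaplan_meier(time, event)
print(np.round(t[:5], 2), np.round(s[:5], 3))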
· Stefan Steiner, University of Waterloo
Estimating Risk-Adjusted Process Performance with a Bias/Variance Trade-off
In the analysis of performance data over time, common objectives are to compare estimates of the current mean value of a process parameter with a target, over levels of the covariates, across multiple streams, and over time. When samples are taken over time, we can make the desired estimates using only the present-time data or an augmented dataset that includes historical data. However, when the characteristic is drifting over time and sample sizes are small, the decision to include historical data trades precision for bias in the present-time estimates. We propose an approach that regulates the bias-variance trade-off using Weighted Estimating Equations, where the estimating equations are based on a suitable Generalized Linear Model adjusting for the levels of the covariates. A customer loyalty survey for a smartphone vendor will be presented, and the resulting present-time estimates of the Net Promoter Score will be compared across the various approaches using example and simulated data.
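A toy illustration of the precision-versus-bias decision discussed above: estimating the current mean of a drifting process by down-weighting historical periods with a simple exponential decay. The decay schedule stands in for, but is not, the weighted estimating equations in the talk; the drift and sample sizes are invented.

import numpy as np

rng = np.random.default_rng(9)
periods, n_per = 12, 25
drift = np.linspace(0, 1.0, periods)                          # slowly drifting true mean
data = [rng.normal(mu, 1.0, n_per) for mu in drift]

def current_estimate(data, decay):
    """Weighted mean of the period means, weighting period j by decay**(age of period j)."""
    w = np.array([decay ** (len(data) - 1 - j) for j in range(len(data))])
    return np.sum(w * np.array([d.mean() for d in data])) / np.sum(w)

for decay in (0.0, 0.5, 0.9, 1.0):      # 0 = current period only, 1 = pool all periods equally
    est = current_estimate(data, decay)
    print(f"decay={decay:.1f}  estimate={est:.3f}  true current mean={drift[-1]:.3f}")

Small decay gives a noisy but nearly unbiased estimate; decay near 1 is precise but dragged toward the historical mean, which is exactly the trade-off in the title.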
· Donald Rubin, Harvard
Dealing with Nonignorable Missing Data
This talk has two connected parts. The first presents a new class of models for analyzing non-ignorable missing data, apparently first suggested by John Tukey at a recondite conference at ETS in Princeton, NJ. The second part presents "enhanced tipping point displays," introduced recently to visually reveal the sensitivity of statistical conclusions to alterations of the assumptions about the reasons for missing data, in the context of a real submission to the US FDA. Both topics rely on modern computing power, but in very different ways: the former on numerical computational speed, the latter on the extreme flexibility of visual displays.
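A toy tipping-point calculation in the spirit of, but much simpler than, the enhanced displays mentioned above: missing treatment-arm outcomes are imputed under increasingly unfavourable shifts delta until the estimated treatment effect changes sign. The data, the missingness pattern and the shift grid are all hypothetical.

import numpy as np

rng = np.random.default_rng(11)
n = 100
treat = rng.normal(1.0, 2.0, n)          # observed outcomes, treatment arm
ctrl = rng.normal(0.3, 2.0, n)           # observed outcomes, control arm
n_missing = 30                           # treatment-arm subjects with missing outcomes

print(" delta   estimated effect")
for delta in np.arange(0.0, -6.1, -1.0):
    # Impute missing treatment outcomes as (treatment mean + delta); delta < 0 penalizes dropouts
    imputed = np.concatenate([treat, np.full(n_missing, treat.mean() + delta)])
    effect = imputed.mean() - ctrl.mean()
    flag = "  <- conclusion flips" if effect <= 0 else ""
    print(f"{delta:6.1f}   {effect:8.3f}{flag}")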
· Aryeh Kontorovich, BGU
The wondrous one-coin model
We explore the surprising depth of the Naive Bayes classifier, which is equivalent to the so-called "single-coin" model in crowdsourcing. The journey begins with the powerful but not widely known Kearns-Saul inequality, which is tailor-made for proving concentration for highly heterogeneous sums. We use Kearns-Saul to prove tight finite-sample upper bounds on the risk of the Naive Bayes classifier (the tight lower bounds follow from Pinsker's inequality). We mention a few challenging open problems in this context. The Naive Bayes story continues in what might be called the "multiple-coin" framework of agnostic PAC minimax lower bounds. While the exact constant in the upper bound remains out of reach, we were able to compute it for the lower bound via a refined analysis of the Bayes-optimal risk. Finally, time permitting, we revisit Kearns-Saul in the problem of estimating the missing mass of a discrete distribution. We obtain the tightest (and, arguably, simplest) concentration bounds for the missing mass and discuss some applications.
Based on joint work with D. Berend and I. Pinelis.
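A small simulation of the "one-coin" crowdsourcing model behind the Naive Bayes connection: each annotator answers correctly with her own probability p_i, and the Naive Bayes rule is a weighted majority vote with log-odds weights. This only illustrates the model; it does not reproduce the Kearns-Saul-based risk bounds, and the annotator accuracies and sizes are arbitrary.

import numpy as np

rng = np.random.default_rng(10)
m, n = 9, 2000                                    # annotators, items
p = rng.uniform(0.55, 0.9, size=m)                # each annotator's "coin": P(correct)
truth = rng.integers(0, 2, size=n)

# Each annotator reports the truth with probability p_i and flips it otherwise
correct = rng.random((m, n)) < p[:, None]
votes = np.where(correct, truth, 1 - truth)       # shape (m, n), values in {0, 1}

majority = (votes.mean(axis=0) > 0.5).astype(int)
weights = np.log(p / (1 - p))                     # Naive Bayes / one-coin log-odds weights
weighted = (weights @ (2 * votes - 1) > 0).astype(int)

print("plain majority accuracy :", (majority == truth).mean())
print("weighted vote accuracy  :", (weighted == truth).mean())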
· Serge Guzy, Adjunct Professor of Pharmacometrics at the University of Minnesota, President and CEO of POP_PHARM
Statistical aspects of Pharmacometrics: A case study
Pharmacokinetic/pharmacodynamic (PK/PD) data in drug development are generated first in the preclinical stage and then in Phase I clinical development (human pharmacology), Phase II clinical development (therapeutic exploratory) and Phase III clinical development (therapeutic confirmatory). The data are often very sparse (few observations per patient), longitudinal (samples collected at different times for each individual) and modeled with complex nonlinear models. This requires the use of the population approach (population PK/PD), a main branch of Pharmacometrics. The goal of this seminar is to present the statistical aspects of Pharmacometrics using a PK/PD case study as a template. The case study focuses on the development of a continuous-time, population-based Markov model of the drug-associated dynamics of migraine. The model describes the probability of migraine incidence as a function of exposure and time (the PD response). The PD model has a time-dependent placebo effect, a drug effect that increases the probability of shifting from the "migraine" to the "no migraine" state with increasing exposure, and a base flow rate enabling shifts between the "migraine" and "no migraine" states in both directions.
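A minimal sketch, under invented rates, of a two-state continuous-time Markov model of the kind described: a generator matrix over the "no migraine"/"migraine" states, a hypothetical drug effect that increases the recovery rate with exposure, and transition probabilities over a day obtained from the matrix exponential. It is not the population PK/PD model from the case study.

import numpy as np
from scipy.linalg import expm

def generator(exposure, rate_on=0.30, base_off=0.10, drug_effect=0.05):
    """Generator matrix Q for states (0 = no migraine, 1 = migraine); a hypothetical
    drug effect increases the migraine -> no-migraine rate with exposure."""
    rate_off = base_off + drug_effect * exposure
    return np.array([[-rate_on, rate_on],
                     [rate_off, -rate_off]])

for exposure in (0.0, 5.0, 20.0):
    Q = generator(exposure)
    P_day = expm(Q * 1.0)                               # transition probabilities over one day
    p_stationary = Q[1, 0] / (Q[0, 1] + Q[1, 0])        # long-run probability of "no migraine"
    print(f"exposure={exposure:5.1f}  P(migraine -> no migraine within a day)={P_day[1, 0]:.3f}  "
          f"long-run P(no migraine)={p_stationary:.3f}")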