6 March 
Jerome Tubiana, Ecole Normale Supérieure 

Compositional Representations in Restricted Boltzmann Machines: theory and applications 
13 March 
Felix Abramovich, TAU 


20 March 
Naomi Kaplan, HUJI 


27 March 
Omer Weissbrod, Harvard School of Public Health 


1 May 
Saharon Rosset, TAU 


8 May 
Richard Olshen, Stanford 


15 May 
Malka Gorfine, TAU 


22 May 
Stefan Steiner, Waterloo University 


29 May 
Donald Rubin, Harvard 


5 June 
Aryeh Kontorovich, BGU 














24 October 
Haim Avron, TAU 

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees 
14 November 
Roi Weiss, Weizmann Institute 

Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions 
21 November 
Douglas Wiens, University of Alberta 


5 December 
Regev Schweiger, TAU 

Detecting heritable phenotypes without a model: Fast permutation testing for heritability 
12 December 
Barak Sober, TAU 

Approximation of Functions Over Manifolds by Moving Least Squares 
26 December 
Avi Dgani, Geocartography Knowledge Group 

Geocartography: Geoanalytical Models and Innovative Research Tools Creating a Multidisciplinary World of Applications 
2 January 
Amichai Painsky, HUJI and MIT 


9 January 
Phil Reiss, Haifa University 

Statistical Issues in the Study of Neurodevelopmental Trajectories 
16 January 
Ari Pakman, Columbia University 

Sampling with Velocities 








Seminars are held on Tuesdays, 10:30 am, Schreiber Building, Room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries, please call (03) 6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove the numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
· Haim Avron, TAU
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. The talk is based on a recent paper in which we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation, and show how spectral matrix approximation bounds imply statistical guarantees for kernel ridge regression.
Qualitatively, our results are twofold: on one hand, we show that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions. At the same time, we show that the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance. We study this optimal sampling distribution for the Gaussian kernel, achieving a nearly complete characterization for the case of low-dimensional bounded datasets. Based on this characterization, we propose an efficient sampling scheme with guarantees superior to random Fourier features in this regime.
This is joint work with Michael Kapralov (EPFL), Cameron Musco (MIT), Christopher Musco (MIT), Ameya Velingker (EPFL), and Amir Zandieh (EPFL).
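To make the basic construction concrete, here is a minimal sketch (not the paper's refined scheme) of classical random Fourier features for the Gaussian kernel: frequencies are drawn from the kernel's spectral density, and inner products of the feature maps approximate kernel evaluations. The sizes, bandwidth, and seed below are illustrative choices.

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """Map X (n x d) to D random Fourier features whose inner products
    approximate the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, plus random phases
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Z = random_fourier_features(X, D=4000, gamma=0.5, rng=rng)

# Exact Gaussian kernel matrix vs. its random-feature approximation
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
K_approx = Z @ Z.T  # entrywise error shrinks as D grows
```

Ridge regression on `Z` then costs time linear in the number of samples for fixed `D`, instead of the cubic cost of exact kernel ridge regression.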
· Roi Weiss, Weizmann Institute
Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
This talk deals with Nearest-Neighbor (NN) learning algorithms in metric spaces. This seemingly naive learning paradigm remains competitive against more sophisticated methods and, in its celebrated k-NN version, has been placed on a solid theoretical foundation. Although the classic 1-NN is well known to be inconsistent in general, in recent years a series of works has presented variations on the theme of margin-regularized 1-NN algorithms, as an alternative to the Bayes-consistent k-NN. These algorithms enjoy a number of statistical and computational advantages over the traditional k-NN. Salient among these are explicit data-dependent generalization bounds and considerable runtime and memory savings.
In this talk we examine a recently proposed compression-based 1-NN algorithm, which enjoys additional advantages in the form of tighter generalization bounds and increased efficiency in time and space. We show that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension, the first compression-based multiclass 1-NN algorithm proven to be both computationally efficient and Bayes-consistent. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite-dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising, since it is known that k-NN is not Bayes-consistent in this setting, thus raising several interesting open problems.
Joint work with Aryeh Kontorovich and Sivan Sabato.
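To illustrate the idea of sample compression for 1-NN (using Hart's classical condensed nearest-neighbor rule as a simple stand-in, not the margin-based algorithm analyzed in the talk): keep only those training points that the current prototype set misclassifies, so that 1-NN over the retained subset still labels every training point correctly.

```python
import numpy as np

def condense(X, y):
    """Greedy condensed 1-NN (Hart, 1968): repeatedly add any training
    point misclassified by 1-NN over the current prototype set."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep[int(np.argmin(d))]] != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)

rng = np.random.default_rng(1)
# Two well-separated Gaussian classes; sizes and seed are illustrative
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
keep = condense(X, y)
print(len(keep), "of", len(X), "points kept")
```

The compressed set is typically far smaller than the sample, which is the source of the runtime and memory savings mentioned above; generalization bounds in terms of the compressed size are the statistical counterpart.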
Approximation of Functions Over Manifolds by Moving Least Squares
We approximate a function defined over a $d$-dimensional manifold $M \subset R^n$ utilizing only noisy function values at noisy locations on the manifold. To produce the approximation we do not require any knowledge regarding the manifold other than its dimension $d$. The approximation scheme is based upon the Manifold Moving Least-Squares (MMLS) and is therefore resistant to noise in the domain $M$ as well. Furthermore, the approximant is shown to be smooth and of approximation order $O(h^{m+1})$ for non-noisy data, where $h$ is the mesh size w.r.t. $M$, and $m$ is the degree of the local polynomial approximation. In addition, the proposed algorithm is linear in time with respect to the ambient space dimension $n$, making it useful for cases where $d \ll n$. This assumption, that the high-dimensional data is situated on (or near) a significantly lower-dimensional manifold, is prevalent in many high-dimensional problems. We put our algorithm to numerical tests against state-of-the-art algorithms for regression over manifolds and show its potential.
This talk is based upon joint work with Yariv Aizenbud & David Levin.
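The local building block of moving least squares can be sketched in one dimension (a toy version only; the talk's MMLS operates on a $d$-dimensional manifold embedded in $R^n$): at each evaluation point, fit a degree-$m$ polynomial by least squares with weights that decay with distance from that point. Bandwidth, degree, and the test function below are illustrative.

```python
import numpy as np

def mls_value(x0, X, F, h, m):
    """Moving least-squares value at x0: a degree-m polynomial fitted to
    (X, F) with Gaussian weights of bandwidth h centered at x0."""
    sw = np.sqrt(np.exp(-((X - x0) / h) ** 2))   # sqrt of the MLS weights
    V = np.vander(X - x0, m + 1)                 # local polynomial basis
    coef, *_ = np.linalg.lstsq(V * sw[:, None], F * sw, rcond=None)
    return coef[-1]                              # constant term = fit at x0

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0.0, 2 * np.pi, 200))
F = np.sin(X) + rng.normal(0.0, 0.05, 200)       # noisy samples of sin
approx = np.array([mls_value(x0, X, F, h=0.4, m=2) for x0 in X])
```

Because each fit is local and weighted, the approximant is smooth in $x_0$ and averages out the noise in the function values, mirroring (in 1-D) the behavior described in the abstract.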
· Douglas Wiens, University of Alberta
Robustness of Design: A Survey
When an experiment is conducted for purposes which include fitting a particular model to the data, then the 'optimal' experimental design is highly dependent upon the model assumptions: linearity of the response function, independence and homoscedasticity of the errors, etc. When these assumptions are violated the design can be far from optimal, and so a more robust approach is called for. We should seek a design which behaves reasonably well over a large class of plausible models.
I will review the progress which has been made on such problems, in a variety of experimental and modelling scenarios: prediction, extrapolation, discrimination, survey sampling, dose-response, machine learning, etc.
· Regev Schweiger, TAU
Detecting heritable phenotypes without a model: Fast permutation testing for heritability
Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models has gained popularity because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. When evaluating the heritability of a phenotype, it is important to accurately measure the statistical significance of obtaining an estimated heritability value under the null hypothesis of a zero true heritability value. One major problem with the parametric approach is that it strongly relies on the parametric model at hand. In contrast, permutation testing is a popular nonparametric alternative, whose advantages are that it does not require the assumption of a parametric form of the distribution of the statistic, and that it does not rely on asymptotic assumptions. Indeed, we show that permutation p-values for the heritability of methylation profiles of CpG sites from a cohort of 1,799 samples are significantly larger than those calculated using asymptotic assumptions. In particular, sites that are significantly heritable according to the parametric model are often non-significant under permutation testing, indicating false positives and demonstrating the need for feasible permutation testing for heritability. Permutation testing, however, is often computationally prohibitive. Here, we propose an efficient method to perform permutation testing for heritability, achieving a speedup of up to several orders of magnitude, resulting in a method which is both highly efficient and does not suffer from model misspecification.
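The permutation-testing principle the abstract builds on can be sketched generically (with a toy correlation statistic standing in for the heritability estimator; the data and statistic below are hypothetical): shuffle the phenotype, recompute the statistic, and count permuted values at least as extreme as the observed one.

```python
import numpy as np

def permutation_pvalue(stat, x, y, n_perm, rng):
    """Permutation p-value: the fraction of label permutations whose
    statistic is at least as extreme as the observed one (with the +1
    correction so the p-value is never exactly zero)."""
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        count += stat(x, rng.permutation(y)) >= observed
    return (1 + count) / (1 + n_perm)

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)  # a genuinely associated toy phenotype
stat = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
p = permutation_pvalue(stat, x, y, n_perm=999, rng=rng)
```

Each permutation requires re-evaluating the statistic from scratch, which is exactly why naive permutation testing for heritability (where each evaluation is an expensive mixed-model fit) is prohibitive, and why the speedup described above matters.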
· Avi Dgani, President of the Geocartography Knowledge Group and Professor Emeritus, Department of Geography, Tel Aviv University
Geocartography: Geoanalytical Models and Innovative Research Tools Creating a Multidisciplinary World of Applications
"Geocartography" is a new scientific field within geography, and a term coined by Prof. Dgani in 1968, when he was among the world's pioneers in developing computerized Geographic Information Systems (GIS) and Location Intelligence (LI). It is a theoretical foundation that has given rise to geoanalytical models, analytical mapping, population segmentation, information systems, and unique research and planning tools applied in many fields: urban planning, transportation planning, morbidity and medicine, economics and commerce, economic feasibility studies, optimal-location analyses, social and community research, and also political polling and forecasting...
· Amichai Painsky, HUJI and MIT
Universal Loss and Gaussian Learning Bounds
In this talk I address two basic predictive modeling problems: choosing a universal loss function, and how to approach non-linear learning problems with linear means.
A loss function measures the discrepancy between the true values and the estimated fits, for a given instance of data. Different loss functions correspond to a variety of merits, and the choice of a "correct" loss could sometimes be questionable. Here, I show that for binary classification problems, the Bernoulli log-likelihood loss (log-loss) is universal with respect to practical alternatives. In other words, I show that by minimizing the log-loss we minimize an upper bound to any smooth, convex and unbiased binary loss function. This property justifies the broad use of log-loss in regression, in decision trees, as an InfoMax criterion (cross-entropy minimization) and in many other applications.
I then address a Gaussian representation problem which utilizes the log-loss. In this problem we look for an embedding of arbitrary data which maximizes its "Gaussian part" while preserving the original dependence between the variables and the target. This embedding provides an efficient (and practical) representation as it allows us to consider the favorable properties of a Gaussian distribution. I introduce different methods and show that the optimal Gaussian embedding is governed by the non-linear canonical correlations of the data. This result provides a fundamental limit for our ability to Gaussianize arbitrary datasets and solve complex problems by linear means.
· Phil Reiss, Haifa University
Statistical Issues in the Study of Neurodevelopmental Trajectories
This talk will examine two statistical issues arising in the study of developmental trajectories in the brain, a key aim of current research in psychiatry. First, we discuss the relative efficiency of cross-sectional and longitudinal designs when the "trajectory" of interest is the mean of some quantity, such as thickness of the cerebral cortex, as a function of age. A classical variance inflation factor is generalized from the estimation of a scalar to the present setting of function estimation, and is further extended to penalized smoothing. Second, we consider the use of functional principal component analysis for estimation of individual trajectories. Specifically, we show how smoothing of the covariance surface can have surprising effects on the results, in particular when this is done by the currently popular approach of tensor product smoothing. The ideas will be illustrated with data from a large longitudinal study of cortical development.
· Ari Pakman, Columbia University
Sampling with Velocities
Bayesian modeling relies on efficient techniques to perform posterior inference over complex probability distributions. Among Monte Carlo methods, two particularly efficient approaches enlarge the sampling space with velocity vectors: Hamiltonian Monte Carlo (HMC) and the Bouncy Particle Sampler (BPS). For HMC, I will first present two non-trivial distributions where the Hamiltonian equations of motion can be integrated exactly: truncated multivariate Gaussians and augmented binary distributions. I will then present an application of these techniques to a statistical neuroscience problem. For large datasets, stochastic versions of Metropolis-Hastings samplers do not preserve the distribution. I will present a stochastic version of the BPS, which evaluates mini-batches of the data at each iteration while introducing minimal bias in the sampled distribution.
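The simplest case where the Hamiltonian dynamics can be integrated exactly is an unconstrained standard Gaussian target (a toy example, not the truncated or binary cases from the talk): with potential $U(x) = \|x\|^2/2$ and kinetic energy $\|v\|^2/2$, the equations of motion are a rotation in $(x, v)$ space, so no numerical integrator is needed and every proposal is accepted.

```python
import numpy as np

def exact_hmc_gaussian(n_samples, dim, t, rng):
    """HMC for a standard Gaussian target: the flow (x, v) -> (x cos t +
    v sin t, -x sin t + v cos t) solves Hamilton's equations exactly,
    so the chain has no discretization error and no rejections."""
    x = np.zeros(dim)
    samples = np.empty((n_samples, dim))
    for i in range(n_samples):
        v = rng.normal(size=dim)  # refresh the velocity each iteration
        x, v = x * np.cos(t) + v * np.sin(t), -x * np.sin(t) + v * np.cos(t)
        samples[i] = x
    return samples

rng = np.random.default_rng(3)
S = exact_hmc_gaussian(5000, 2, t=1.0, rng=rng)  # travel time t is illustrative
```

The sample mean and standard deviation of `S` approach 0 and 1, as they should for the standard Gaussian; the exact-integration constructions in the talk extend this kind of closed-form flow to less trivial targets.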
· Jerome Tubiana, Ecole Normale Supérieure
Compositional Representations in Restricted Boltzmann Machines: theory and applications
Restricted Boltzmann Machines (RBM) form a family of probability distributions that are simple yet powerful for modeling high-dimensional, complex data sets. Besides learning a generative model, they also extract features, producing a graded and distributed representation of the data. However, not all variants of RBM perform equally well, and few theoretical arguments exist for these empirical observations. By analyzing an ensemble of RBMs with random weights using statistical physics tools, we characterize the structural conditions (statistics of weights, choice of non-linearity…) allowing the emergence of such efficient representations.
Lastly, we present a new application of RBMs: the analysis of protein sequence alignments. We show that RBMs extract high-order patterns of coevolution that arise from the structural and functional constraints of the protein family. These patterns can be recombined to generate artificial protein sequences with prescribed chemical properties.
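For readers unfamiliar with RBMs, here is a minimal binary RBM trained with one step of contrastive divergence (CD-1). This is a generic textbook sketch on random data, with biases omitted and sizes, learning rate, and seed chosen arbitrarily; it is not the ensemble analysis or the protein model of the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))  # visible-to-hidden weights

data = rng.integers(0, 2, size=(32, n_vis)).astype(float)
for _ in range(200):
    h0 = sigmoid(data @ W)                             # hidden activations
    h_samp = (rng.random(h0.shape) < h0).astype(float)  # sample hidden units
    v1 = sigmoid(h_samp @ W.T)                         # reconstruct visibles
    h1 = sigmoid(v1 @ W)
    W += lr * (data.T @ h0 - v1.T @ h1) / len(data)    # CD-1 weight update
```

After training, the rows of `W` play the role of the extracted features: each hidden unit responds to a pattern of visible units, which is the "representation" whose structure the talk analyzes.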
· Richard Olshen, Professor of Biomedical Data Science, Emeritus, Stanford University