Department of Statistics &
Operations Research
Statistics Seminars
2006/2007
Note: the program is not final and is subject to change
Summer term
Second term
6, March

Tom Trigano, Hebrew University


Statistical signal processing and
spectrometry: some processing methods for higher counting rates

27, March

Dean Foster, University of Pennsylvania


On the Intrinsic Dimensionality of Multi-View Regression

1, May

Nitzan Rosenfeld, Rosetta Genomics


MicroRNA discovery and application for diagnostics

8, May

Svetlana Bunimovich, Tel Aviv University


Immunotherapy
treatment of Bladder Cancer: A mathematical model

29, May

Marianna Pensky, University of Central Florida


Bayesian Approach to Estimation and Testing in
Time Course Microarray Experiments

19, June

Ruth Heller, Tel Aviv University


Screening for Partial Conjunction Hypotheses

31,
October

Meir Smorodinsky, Tel Aviv University


Probabilistic modeling of the risks in medical procedures

7,
November

Felix Abramovich, Tel Aviv University


Prelude and Fugue in Bayesian Testimation

28,
November

Isaac Meilijson and Alon Kaufman, Tel Aviv University


Does the neuron's gene expression carry information on its
synaptic connectivity?

26,
December

Adi Ben-Israel


Probabilistic Distance Clustering

2, January*

David M. Steinberg, Tel Aviv University


Orthogonal
Latin Hypercube Designs

9,
January

Sigal Levy, Tel Aviv University


The
analysis of time dependent computer experiments

16,
January

Tal Pupko, Tel Aviv University


Probabilistic evolutionary models and their applications.

* Notice: special time / date / venue
Seminars are held on Tuesdays, 10:30 am, Schreiber Building, room 309 (see the TAU map). Refreshments are served before the talk.
The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries,
please call (03)6409612 or email yekutiel@post.tau.ac.il
Details of previous seminars:
ABSTRACTS
·
Meir Smorodinsky, Tel Aviv University
Probabilistic modeling of the risks in medical procedures
Medical procedures are customarily evaluated by weighing, across the patient population, their cost and the risk involved in performing them against the benefit they may yield. The individual patient, however, is interested in his own chances of recovery. To this end, I will present probabilistic models with continuous-time noise for the individual risk estimates of the possible interventions, given the patient's medical record.
·
Felix
Abramovich, Tel Aviv University
Prelude
and Fugue in Bayesian Testimation
In the Prelude (a joint composition with Claudia Angelini, CNR Napoli) we
present a Bayesian multiple testing procedure. A hierarchical prior model
is based on imposing a prior distribution on the number of hypotheses arising
from alternatives (false nulls). We then apply the maximum a posteriori (MAP)
rule to find the most likely configuration of nulls and alternatives. We
discuss the relations between the proposed MAP procedure and its several existing
frequentist counterparts.
In the Fugue (a joint composition with Vadim Grinshtein, the Open University
and Marianna Pensky, University
of Central Florida) we
apply the developed MAP procedure to the
normal means problem for recovering a high dimensional vector observed in white
noise. The resulting Bayesian
testimator leads to a general thresholding rule which accommodates many of the
known thresholding and model selection procedures as its particular
cases. We discuss the optimality of the MAP testimator and specify the class of
priors for which it is adaptively minimax for a wide range of sparse
sequences.
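As an illustration of the kind of thresholding rule the abstract alludes to, here is a minimal numerical sketch of hard thresholding of a sparse normal-means vector at the classical universal threshold. This is one well-known particular case of such rules, not the MAP testimator itself:

```python
import numpy as np

def hard_threshold(y, lam):
    """Keep coordinates whose magnitude exceeds the threshold; zero the rest."""
    return np.where(np.abs(y) > lam, y, 0.0)

rng = np.random.default_rng(0)
n = 1000
theta = np.zeros(n)
theta[:20] = 5.0                       # sparse mean vector: 20 strong signals
y = theta + rng.standard_normal(n)     # observed in white noise

lam = np.sqrt(2.0 * np.log(n))         # the classical universal threshold
est = hard_threshold(y, lam)           # recovers (most of) the sparse support
```

The estimate keeps essentially the 20 true signals and kills almost all pure-noise coordinates.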
·
Isaac Meilijson and Alon Kaufman, Tel Aviv University
Does
the neuron's gene expression carry information on its synaptic connectivity?
The speakers will discuss a large-scale investigation of
the nematode Caenorhabditis elegans – joint work with Gideon Dror and Eytan
Ruppin.
·
Adi Ben-Israel
Probabilistic Distance
Clustering
Clustering is the process of partitioning a data set into clusters, i.e. subsets of data points that are similar in some sense. Probabilistic clustering expresses cluster membership by probabilities p(x ∈ C) that a point x belongs to a cluster C. Distance clustering takes "similar" to mean close with respect to a given distance function (Euclidean, Mahalanobis, etc.).
I present a new approach and method for probabilistic clustering of data. Given
clusters, their sizes (unless these are unknown and must be estimated),
centers, and the distances of data points from these centers, the probability
of cluster membership at any point is assumed to be inversely proportional to
its distance from the center of that cluster, and directly proportional to the
cluster size. The method is based on the above assumption, and on the joint
distance function, a weighted harmonic mean of distance from all cluster
centers, that evolves during the iterations and captures the data in its low
contours. The method is simple, fast, and works well.
In addition to clustering, I present applications to the location of several
capacitated facilities and to the demixing of mixtures of distributions, where the
proposed method is a viable alternative to the EM method for estimating the
relevant parameters.
Joint work with Cem Iyigun, Rutgers
University.
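The membership rule stated above can be sketched numerically. The snippet below is a minimal illustration assuming Euclidean distances and the stated proportionality (probability inversely proportional to the distance from a center, directly proportional to cluster size); it is not the authors' implementation:

```python
import numpy as np

def membership_probs(x, centers, sizes):
    """Membership probability of point x in each cluster: proportional to the
    cluster size and inversely proportional to the distance from its center."""
    d = np.linalg.norm(centers - x, axis=1)
    if np.any(d == 0):                 # x coincides with a center
        p = (d == 0).astype(float)
    else:
        p = sizes / d
    return p / p.sum()

def joint_distance(x, centers, sizes):
    """Joint distance function: a size-weighted harmonic-type mean of the
    distances to all centers; it is small wherever some cluster is close."""
    d = np.linalg.norm(centers - x, axis=1)
    return 1.0 / np.sum(sizes / d)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
sizes = np.array([1.0, 1.0])
x = np.array([1.0, 0.0])              # distance 1 from one center, 3 from the other
p = membership_probs(x, centers, sizes)   # -> [0.75, 0.25]
```

With equal cluster sizes, the point three times closer to one center gets three times its membership probability, as the proportionality rule dictates.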
· David M. Steinberg, Tel Aviv University
Orthogonal
Latin Hypercube Designs
Latin
Hypercube (LHC) designs are one of the most popular choices for experiments run
on computer simulators. As first
proposed by McKay, Beckman and Conover in 1979, LHC designs guarantee that
input factor settings are uniformly spread for each single factor, but rely on
“random mating” to achieve good spread in high dimensions. In experiments with many factors, some pairs
of factors typically have moderately high correlations, and a number of schemes
have been proposed to reduce the correlations.
In this talk we show how to generate completely orthogonal LHC designs
by rotating two-level factorial designs into LHC designs. For example, with 256 runs, our method
produces an LHC design with 248 orthogonal factors. The best result known prior to our work
achieved only 14 orthogonal factors. As
a side problem, we show how to arrange the columns of a saturated two-level
fractional factorial with 2^m runs so that every consecutive set of m columns
forms a full two-level factorial. We will also
present some results on orthogonal designs that are “nearly” LHC designs.
This is
joint work with Dennis Lin.
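For background, an ordinary (non-orthogonal) LHC design of the kind the talk starts from can be generated as follows. This is the standard random construction with "random mating", not the rotation method of the talk:

```python
import numpy as np

def latin_hypercube(n_runs, n_factors, rng):
    """Random LHC design on [0, 1): each factor gets exactly one point in each
    of n_runs equal-width strata; the strata are randomly permuted per factor
    ("random mating") and each point is jittered inside its stratum."""
    design = np.empty((n_runs, n_factors))
    for j in range(n_factors):
        strata = rng.permutation(n_runs)
        design[:, j] = (strata + rng.random(n_runs)) / n_runs
    return design

rng = np.random.default_rng(1)
X = latin_hypercube(16, 4, rng)

# one-dimensional uniformity: exactly one point per stratum for every factor
bins = np.floor(X * 16).astype(int)
# pairwise correlations, however, are typically nonzero under random mating;
# the talk's construction makes all of them exactly zero
corr = np.corrcoef(X, rowvar=False)
```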
·
Sigal Levy, Tel Aviv University
The analysis of time dependent computer experiments
Computer
experiments are a convenient substitute to real life experiments, when such an
experiment is too complex, expensive or time consuming. This work is concerned with experiments in
which the output from each computer run is a dense time trace that is a
function of some low-dimensional set of explanatory variables. We suggest
two-stage methods for modelling and predicting such data, separating the model
for time from the model for the explanatory variables. The time dependence is
modelled by fitting known basis functions such as splines, as well as
data-derived, shape-based basis functions. Such basis functions are generated by
clustering the data into similarly shaped functions and taking the mean
function of each cluster as a basis function. Several methods were tested for
modelling the relation to the explanatory variables. Bayesian and other models
that used additional information about the data set were considered as a means
of improving the fit obtained by the two-stage methods.
Two data sets were analysed using these methods: a
simulation of response to chemotherapy, which yields the amount of cancer cells
in a patient’s body in response to different chemotherapy treatment protocols,
and a circadian rhythm simulation, showing the mRNA production and degradation
throughout a sleep-wake cycle. Results show that the shape-based estimation
methods were efficient in both cases, and that Kriging usually predicted the
relation to the explanatory variables most accurately.
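The shape-based basis construction described above (cluster the traces, take each cluster's mean as a basis function, then fit coefficients) can be sketched as follows. The clustering routine and the toy data are illustrative stand-ins, not the ones used in the work:

```python
import numpy as np

def shape_basis(curves, k, n_iter=25, seed=0):
    """Shape-based basis functions: cluster the curves with a small k-means
    (L2 distance between whole traces) and return each cluster's mean curve."""
    rng = np.random.default_rng(seed)
    centers = curves[rng.choice(len(curves), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((curves[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([curves[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers

t = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(2)
rising = t + 0.05 * rng.standard_normal((10, 50))      # 10 noisy ramp traces
waves = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal((10, 50))
curves = np.vstack([rising, waves])

basis = shape_basis(curves, k=2)       # two mean-shape basis curves

# first stage: least-squares coefficients of a new trace on the basis
new = t + 0.05 * rng.standard_normal(50)
coef, *_ = np.linalg.lstsq(basis.T, new, rcond=None)
```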
·
Tal Pupko, Tel Aviv University
Probabilistic
evolutionary models and their applications.
In my talk
I will first give all needed biological background and will provide the
motivation for using evolutionary models. For example, I will discuss the
evolutionary relationships between humans and Neanderthals. I will then explain
what probabilistic evolutionary models are and why they are needed in the
context of phylogenetic tree reconstruction. I will then give the
statistical/mathematical background of these continuous-time Markov models.
I will discuss the maximum likelihood approach for learning
with these models. I will also explain the computational
challenges and will give a few applications.
·
Tom
Trigano, Hebrew University
Statistical signal processing
and spectrometry: some processing
methods for higher counting rates
The main objective of spectrometry is to characterize the radioactive
elements of an unknown source by studying the energy of the emitted photons. When a photon interacts with a
detector, its photonic energy is converted into an electrical pulse, whose
integral energy is measured.
Since the detector has a finite resolution, close arrival times of photons,
which can be modeled as a homogeneous Poisson process, cause pile-ups of
individual pulses. This phenomenon distorts energy spectra by introducing,
amongst other perturbations, multiple fake spikes.
Since the shape of photonic
impulses depends on many physical parameters, we consider this problem in a
nonparametric framework. By introducing an adapted model based on two marked
point processes, we establish a nonlinear relation between the probability
measure associated with the observations and the probability density function
we wish to estimate. This provides a framework in which the problem can be
considered as one of nonlinear density deconvolution and nonparametric
density estimation from indirect measurements.
Using these considerations, we
propose an estimator obtained by direct inversion. We show that this estimator
is consistent and almost achieves the usual rate of convergence obtained in
classical nonparametric density estimation in the L2 sense. We show in both
simulated and real examples that the distortions caused by the pile-up
phenomenon are well corrected by the algorithms derived from our estimators.
·
Dean Foster, University of Pennsylvania
On the Intrinsic Dimensionality
of Multi-View Regression
In the multi-view regression problem, we have a regression problem where the
input variable can be partitioned into two different views, and it is assumed
that either view of the example would be sufficient for learning; this is
essentially the co-training assumption for the regression problem. For
example, the task might be to identify a person,
and the two views might be a video stream of the person and an audio stream of
the person.
We show how Canonical Correlation Analysis (CCA; related to PCA for two random
variables) implies a ridge regression algorithm, where we can characterize the
intrinsic dimensionality of this regression problem by the correlation of the
two views. An interesting aspect of our analysis is that the norm used by the
ridge regression algorithm is derived from the CCA; no norm or Hilbert space
is assumed a priori (unlike in kernel methods).
(with Sham Kakade)
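A minimal sketch of the CCA computation underlying the talk is given below: canonical correlations via the SVD of the whitened cross-covariance matrix. The ridge regression with a CCA-derived norm is developed in the paper and is not reproduced here; the toy two-view data are illustrative:

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations of two views via the SVD of the whitened
    cross-covariance matrix (CCA as a two-view analogue of PCA)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # small ridge for stability
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    s = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy), compute_uv=False)
    return np.clip(s, 0.0, 1.0)    # correlations, sorted in decreasing order

rng = np.random.default_rng(3)
z = rng.standard_normal((500, 1))                     # shared latent signal
X = np.hstack([z, rng.standard_normal((500, 2))])     # view 1: signal + noise
Y = np.hstack([z, rng.standard_normal((500, 2))])     # view 2: signal + noise
rho = canonical_correlations(X, Y)
# the shared direction yields one canonical correlation near 1, the rest near 0
```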
·
Nitzan Rosenfeld, Rosetta Genomics
MicroRNA discovery and application for diagnostics
I will describe the computational methodology we used for
identification of microRNAs in the human genome. I will introduce some of the
challenges and pitfalls of cancer diagnostics, and open the topic for
discussion. I will conclude by presenting our algorithmic approach to tissue
classification.
·
Svetlana
Bunimovich, Tel Aviv University
Immunotherapy
treatment of Bladder Cancer: A mathematical model
I present a
modeling study of bladder cancer growth and its treatment via immunotherapy
with Bacillus Calmette-Guérin (BCG), an attenuated strain of Mycobacterium
bovis (M. bovis). BCG immunotherapy is a clinically established
procedure for the treatment of superficial bladder cancer. However, the mode of
action has not yet been fully elucidated, despite extensive biological
research. The mathematical model presented here attempts to gain insights into
the different dynamical outcomes arising from tumorimmune interactions in the
bladder. I studied two types of treatment: continuous and pulsed BCG therapy. Attention is given to estimating parameters and validating the model
using published data taken from in vitro, mouse and human studies. A mathematical analysis of the differential
equations identifies multiple equilibrium points, their stability properties,
and bifurcation points. Intriguing
regimes of bistability are identified in which treatment has the potential to
result in a tumor-free equilibrium or a full-blown tumor, depending only on
initial conditions. In the case of continuous therapy, the model makes clear
that the intensity of immunotherapy must be kept within limited bounds. While small
treatment levels may fail to clear the tumor, a treatment that is too large can
lead to an overstimulated immune system having dangerous side effects for the
patient. The model predicts (i) regimes in which immunotherapy cannot help;
(ii) the optimal BCG dosage, since intense therapy can incur damage and side
effects via the immune system; and (iii) quantitative relationships between
the BCG dosage, the cancer’s initial condition and the tumour growth rate that
can be calculated prior to treatment.
Impulsive
differential equations are used for studying periodic BCG instillations (pulsed BCG therapy). The mathematical
analysis defines the critical threshold values of the BCG instillation dose and
rate of pulsing for tumor elimination. The final goal in this work is to
determine the applicable treatment regime that prevents the immune system side
effects (caused by BCG) and enhances tumor destruction.
·
Marianna Pensky, University of Central Florida
Bayesian Approach to Estimation and Testing in Time
Course Microarray Experiments
The objective of the paper is to develop a truly functional, fully Bayesian
method for identifying differentially expressed genes in a time-course
microarray experiment. Each gene expression
profile is modeled as an expansion over some orthonormal basis with
coefficients and the number of basis functions estimated from the data. The
proposed procedure deals successfully with various technical difficulties that
arise in microarray time-course experiments, such as the small number of
available observations, non-uniform sampling intervals, the presence of
missing or replicated data, and temporal dependence between the observations
for each gene. The procedure accounts for various types of errors, thus
offering a good compromise between nonparametric and normality-based
techniques. The method accounts for multiplicity, selects differentially
expressed genes and ranks them. In addition, all evaluations are
performed using analytic expressions; hence the entire procedure requires very
little computational effort. The quality of the procedure is studied by
simulations. Finally, the procedure is applied to a case study of human breast
cancer cells stimulated with estrogen, leading to the discovery of some new
significant genes that were not flagged earlier because of the high
variability in the raw data.
Joint work with Claudia Angelini,
Daniela De Canditiis and Margherita Mutarelli
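The per-gene modelling step described above (an expansion over an orthonormal basis, with the number of basis functions estimated from the data) can be roughly sketched as follows. The basis, the selection criterion (BIC here) and the toy data are illustrative assumptions, not the authors' fully Bayesian procedure:

```python
import numpy as np

def fit_profile(t, y, max_deg=6):
    """Fit one expression profile as an expansion over an orthonormal
    polynomial basis, choosing the number of basis functions by BIC."""
    n = len(t)
    u = 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0   # rescale to [-1, 1]
    V = np.polynomial.legendre.legvander(u, max_deg)
    Q, _ = np.linalg.qr(V)                 # columns orthonormal on the sample
    best_bic, best_k, best_coef = np.inf, 0, None
    for k in range(1, max_deg + 2):
        coef = Q[:, :k].T @ y              # orthonormality -> plain projection
        rss = float(np.sum((y - Q[:, :k] @ coef) ** 2))
        bic = n * np.log(rss / n + 1e-12) + k * np.log(n)
        if bic < best_bic:
            best_bic, best_k, best_coef = bic, k, coef
    return best_k, best_coef

rng = np.random.default_rng(4)
t = np.sort(rng.random(15))                # few, non-uniform sampling times
y = 1.0 + 2.0 * t - 3.0 * t**2 + 0.05 * rng.standard_normal(15)
k, coef = fit_profile(t, y)                # quadratic -> about 3 basis terms
```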
·
Ruth Heller, Tel Aviv University
Screening for Partial Conjunction Hypotheses
We consider the problem of testing the partial conjunction null hypothesis
that fewer than u out of n null hypotheses are false. It offers an in-between
approach between testing the global null that all n null hypotheses are true
and the conjunction null that not all of the n
alternative hypotheses are true. We address the problem of testing many partial
conjunction hypotheses simultaneously, a problem that arises when combining
maps of p-values. Each map contains a large number of locations, and the n
p-values per location come from different yet related hypotheses. We suggest
powerful test statistics for testing the partial conjunction null hypothesis
that are valid under dependence between the p-values as well as under
independence. We suggest controlling the FDR for testing the partial conjunction
hypotheses, and we prove that the BH FDR-controlling procedure remains valid under
various dependency structures. We apply our screening method to important
examples from
microarray meta-analysis and fMRI group analysis, and discuss its usefulness
for inference on spatial signals.
This seminar is part of the defense of Ruth Heller's PhD thesis.
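The partial conjunction screen can be illustrated with a Simes-style combination of the n − u + 1 largest component p-values per location, followed by BH across locations. This is a sketch under an independence assumption; the exact test statistics and their validity conditions under dependence are in the paper:

```python
import numpy as np

def partial_conjunction_simes(pvals, u):
    """Simes-style p-value for the partial conjunction null that fewer than
    u of the n component null hypotheses are false: apply Simes' test to the
    n - u + 1 largest component p-values."""
    p = np.sort(pvals)[u - 1:]             # p_(u) <= ... <= p_(n)
    m = len(p)
    return float(min(1.0, np.min(m * p / np.arange(1, m + 1))))

def bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean rejections at level q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

# toy screen: 3 locations, n = 4 studies each, u = 2 (signal replicated in
# at least two studies)
P = np.array([[1e-5, 0.30, 0.40, 0.80],    # one strong study only
              [1e-4, 3e-4, 5e-4, 0.01],    # effect replicated across studies
              [0.20, 0.50, 0.70, 0.90]])   # no effect
pc = np.array([partial_conjunction_simes(row, u=2) for row in P])
rejected = bh(pc, q=0.05)                  # only the replicated location
```

Note that the first location, with a single very small p-value, is not flagged: the partial conjunction statistic ignores the u − 1 smallest p-values, so one strong study cannot drive a replication claim on its own.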
·
Katherine S. Pollard,
UC Davis Genome Center & Department of Statistics
Detecting Lineage-Specific Evolution
Genomic regions that vary in their patterns of sequence
conservation across a phylogeny are interesting candidates for the study of
evolutionary shifts in
function. We have developed two comparative genomic methods for detecting
lineage-specific evolution on a genome-wide scale. The first approach, called
DLESS, is based on a phylogenetic hidden Markov model (phyloHMM), which
does not require the lineage of interest or the element boundaries to be
determined a priori. Applying DLESS to the ENCODE regions of the human genome,
we detected differences in patterns of loss and gain of conserved elements
between coding and non-coding regions and between vertebrate clades. DLESS has
very little power, however, to identify changes in substitution rate on a
single lineage. To address this question, we developed a second method that
begins with a set of ancestrally conserved elements and applies a likelihood
ratio test to screen these for the subset whose substitution rate is
significantly higher in a lineage of interest. With this approach we identified
202 Human Accelerated Regions (HARs), which are highly conserved among mammals
but show a significant increase in the rate of substitutions in the human
genome since divergence from the chimp-human ancestor. Bioinformatic characteristics of the HARs suggest that many
are involved in the regulation of gene expression. The most dramatically
accelerated region, HAR1, is part of a novel RNA gene (HAR1F) that is
expressed during human cortical development.