Department of Statistics & Operations Research

Statistics Seminars

2003/2004

Note: the program is not final and is subject to change


First Term

4 November: Haim Ricas, Tashtit Scientific Consultants
"Modern statistical software tools. I. Modern applied statistics with S-Plus"

18 November: Haim Ricas, Tashtit Scientific Consultants
"Modern statistical software tools. II. Statistical software for teaching statistics"

2 December: Camil Fuchs, Tel Aviv University
"Lifetime morbid risks in family studies: classical methods and a new model"

9 December: A Special Day of Seminars on Robust Design
Ron Kenett, KPA Ltd.
"Robust design and rapid development from computer simulations"
Hila Ginsburg, Tel Aviv University
"Designing experiments in robust design problems"

30 December: Yechezkel (Eli) Kling, Tel Aviv University
"Aspects of multiplicity in statistical process control"

20 January: Stephen Fienberg, Carnegie Mellon University
"Characterizing probability distributions associated with multi-dimensional contingency tables"

Second Term

2 March: Ulrich Stadtmuller, University of Ulm
"Generalized linear models with functional data"

16 March: Laurence Freedman, Bar Ilan University
"A new method for dealing with measurement error in explanatory variables of regression models"

23 March: Dan Geiger, Technion
"A new software for genetic linkage analysis"

30 March: Vyacheslav Abramov, Tel Aviv University
"Asymptotic methods for communication networks"

20 April: Daniel Yekutieli, Tel Aviv University
"FDR confidence intervals"

4 May: Albert Vexler, Central Bureau of Statistics
"Guarenteed maximum likelihood splitting tests of a linear regression model"

18 May: Yoav Dvir, Tel Aviv University
"Local likelihood methods for nonparametric density estimation"

1 June: Inna Stainvas, Orbotech Ltd.
"A generative probabilistic oriented wavelet model for texture segmentation"

8 June: Yuval Nov, Stanford University
"Modeling and analysis of protein design under resource constraints"


Seminars are held on Tuesdays, 10.30 am,
Schreiber Building, Room 309 (see the TAU map). Coffee is served before the seminar.
The seminar organiser is Felix Abramovich. To join the seminar mailing list, to get updated information about current and forthcoming seminars, or for other inquiries, call (03)-6405389 or email felix@math.tau.ac.il.

ABSTRACTS

Haim Ricas
"Modern statistical software tools. I. Modern applied statistics with S-Plus"

In this talk we present the new S-Plus modules designed for various problems in modern applied statistics.

Haim Ricas
"Modern statistical software tools. II. Statistical software for teaching statistics"

In this talk we demonstrate how new statistical software can be helpful in teaching statistics. We present MathStatica - a new Mathematica module that allows both symbolic and numerical solutions to various problems in probability and statistics. In particular, we illustrate how MathStatica can assist in teaching probability, estimation, statistical inference and asymptotic theory.

We also introduce TableCurve - new software for curve fitting and equation discovery, which can be an efficient tool for exploratory parametric and non-parametric data fitting.

Camil Fuchs
"Lifetime morbid risks in family studies: classical methods and a new model"

Family studies routinely assess the lifetime morbid risks of various diseases among the relatives of probands affected either by the studied disease or by other related diseases.

In the psychiatric literature, the lifetime morbid risks in family studies are usually determined either by methods originally designed for analyzing life tables, such as the Kaplan-Meier product-limit estimator and the Cox proportional hazards model, or by simpler estimators like the Weinberg abridged method and the Stromgren method, which can be considered an elaboration of the Weinberg abridged method. In other cases, the lifetime morbid risks are assessed by the unadjusted proportion of affected individuals in the sample (known as the lifetime prevalence).

We shall show that the use of the Kaplan-Meier product-limit estimator for the estimation of lifetime morbid risk may yield unreliable estimates. Furthermore, while the simplicity of the Stromgren and Weinberg abridged methods is appealing, we suggest that under a proper model those methods can be replaced by an equally simple statistic, which is shown to be more accurate both on average and in the great majority of specific cases. The increased accuracy is achieved particularly when the investigators have some prior indication of the distribution of the ages at onset for those affected by the disorder.
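
For readers unfamiliar with the product-limit estimator discussed above, here is a minimal Python sketch; the ages at onset and censoring indicators are hypothetical, not data from the talk.

def kaplan_meier(times, events):
    """Return (event times, Kaplan-Meier survival estimates)."""
    pairs = sorted(zip(times, events))   # sort observations by time
    n = len(pairs)
    s, i = 1.0, 0
    out_t, out_s = [], []
    while i < n:
        t = pairs[i][0]
        at_risk = n - i                  # subjects with time >= t
        d = 0                            # events observed at time t
        while i < n and pairs[i][0] == t:
            d += pairs[i][1]
            i += 1
        if d > 0:
            s *= 1.0 - d / at_risk
            out_t.append(t)
            out_s.append(s)
    return out_t, out_s

ages  = [25, 30, 30, 42, 50, 55, 61]    # hypothetical ages
onset = [1,  1,  0,  1,  0,  1,  0]     # 1 = onset observed, 0 = censored
for t, s in zip(*kaplan_meier(ages, onset)):
    print(f"S({t}) = {s:.3f}")          # morbid risk up to t is 1 - S(t)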

Ron Kenett
"Robust design and rapid development from computer simulations"

Modern companies are under increasing pressure to reduce development time and to provide robust products. The TITOSIM project, funded by the European Community and headed by Fiat Research, developed statistical methodology and tools for using computer simulations to achieve these product development goals. Computer experiments are often conducted in order to optimize product performance while respecting constraints that may be imposed. Several methods for achieving robust design in this context are described and compared with the aid of a simple example problem. The methods presented span classical as well as modern approaches and introduce the idea of a "stochastic response" to aid the search for robust solutions. Emphasis is placed on the efficiency of each method with respect to computational cost and on the ability to formulate objectives that encapsulate the notion of robustness.

This is joint work with Ron Bates and David Steinberg.

Hila Ginsburg
"Designing experiments in robust design problems"

In recent decades, the Robust Design method originally suggested by Taguchi has been widely applied in various engineering areas. Usually, a designer aiming for a robust design of a system with unknown analytical form follows a two-step procedure. First, a response function for the unknown system is fitted using experimental arrays based on known design-of-experiments (DOE) criteria. Second, once the response function has been established, a Robust-Design criterion is formulated and solved to obtain an optimal robust configuration.
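
As a point of reference, here is a minimal Python sketch of the classical two-step procedure described above, with one hypothetical control factor x, one noise factor z, and an assumed interaction response surface; it illustrates the baseline, not the method proposed in the talk.

import numpy as np

rng = np.random.default_rng(0)

# Step 1: fit y ~ b0 + b1*x + b2*z + b3*x*z from a hypothetical crossed array.
x = np.repeat([-1.0, 0.0, 1.0], 3)
z = np.tile([-1.0, 0.0, 1.0], 3)
y = 5 + 2*x + 1.5*z - 2.5*x*z + rng.normal(0, 0.1, x.size)   # simulated runs
X = np.column_stack([np.ones_like(x), x, z, x*z])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: with z random (mean 0, variance 1), Var(y|x) = (b2 + b3*x)^2 + noise,
# so the robust setting makes the z-slope vanish: x* = -b2/b3.
x_star = -b[2] / b[3]
print(f"fitted coefficients {np.round(b, 2)}, robust setting x* = {x_star:.2f}")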

In this work, we aim to combine both steps in a unified yet sequential DOE protocol. In particular, we suggest a methodology for designing experiments that minimize the variance of the optimal robust configuration. In other words, the variance of the optimal solution for the robust system is minimized already at the DOE stage. This new DOE optimality criterion prioritizes the various response coefficients and enables the designer to indicate which coefficients should be estimated more accurately than others in order to obtain a reliable robust solution.

The suggested method provides more information on the optimal robust solution by generating a (multidimensional) distribution of it. Numerical examples will be presented for illustration purposes.

Yechezkel (Eli) Kling
"Aspects of multiplicity in statistical process control"

The area of Statistical Process Control (SPC) is explored in search of practical uses for the FDR. First, the "Multiplicity Problem" is reviewed in the context of SPC. A few SPC situations that give rise to multiplicity are discussed, and the concept of the p-value in this context is examined. Several possibilities for incorporating p-values into SPC graphical displays are examined. The suggested p-valued SPC chart is a simple, consistent, and intuitive display. This type of presentation enables the use of multiplicity-protection methods without inhibiting lay users. The appropriateness of the FWE and the FDR in SPC situations is reviewed. Furthermore, a new family measure based on discovery costs is constructed: the False Discovery Cost Ratio (FDCR). A p-value-based FDCR-controlling procedure is obtained by applying the Benjamini-Hochberg (1997) weighted linear step-up procedure to the set of p-values corresponding to the individual hypotheses and to the intersection hypothesis, weighting them by the appropriate discovery costs. Three possible statistics for testing the intersection hypothesis are examined: Simes' statistic, Fisher's statistic, and Hotelling's T². It is shown that when Simes' statistic is used for this purpose, the proposed procedure controls the FDCR.
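
The step-up idea is compact enough to sketch. The following Python fragment implements one common form of a weighted linear step-up procedure; the p-values and cost weights are purely hypothetical, and the exact weighting used in the talk may differ.

def weighted_step_up(pvalues, weights, q):
    """Indices rejected by a weighted linear step-up at level q."""
    m = len(pvalues)
    total = sum(weights)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cum_w, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        cum_w += weights[i]              # cumulative weight of ordered p-values
        if pvalues[i] <= q * cum_w / total:
            k = rank                     # largest rank whose p-value passes
    return order[:k]

p = [0.001, 0.012, 0.031, 0.040, 0.300]  # hypothetical chart p-values
w = [1.0, 1.0, 2.0, 1.0, 1.0]            # hypothetical discovery costs
print(weighted_step_up(p, w, q=0.05))    # -> [0, 1, 2, 3]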

Stephen Fienberg
"Characterizing probability distributions associated with multi-dimensional contingency tables"

We review alternative ways to characterize probability distributions associated with two-way tables of counts (contingency tables) using marginals, conditionals, and odds ratios, and their generalizations to higher dimensions. Partial specification of such distributions arises in a number of statistical contexts and usually involves the use of log-linear models or the dropping of components from complete specifications. We link both of these approaches to recent developments in algebraic geometry and discuss the insights and new tools that such linkages bring to statistical methodology. Practical statistical problems arising in disclosure limitation have provided ongoing motivation for these developments as well as an outlet for application.
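
As a concrete illustration of the building blocks mentioned above, this Python fragment computes marginals, conditionals, and the odds ratio for a hypothetical 2x2 table; for a 2x2 table, the two marginals together with the odds ratio determine the joint distribution.

import numpy as np

counts = np.array([[30.0, 10.0],
                   [20.0, 40.0]])        # hypothetical 2x2 table of counts
p = counts / counts.sum()                # joint distribution

row_marginal = p.sum(axis=1)             # P(X = x)
col_marginal = p.sum(axis=0)             # P(Y = y)
cond_y_given_x = p / p.sum(axis=1, keepdims=True)   # P(Y = y | X = x)
odds_ratio = (p[0, 0] * p[1, 1]) / (p[0, 1] * p[1, 0])

print("row marginal:", row_marginal)
print("P(Y|X):", cond_y_given_x)
print("odds ratio:", odds_ratio)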

Ulrich Stadtmuller
"Generalized linear models with functional data"

In this talk, a generalized linear regression model will be proposed for the situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function. If, in addition, a variance function is specified, this leads to a functional estimating equation which corresponds to maximizing a functional quasi-likelihood. This general approach includes the special cases of the functional linear model, as well as functional Poisson regression and functional binomial regression. The latter leads to procedures for classification and discrimination of stochastic processes and functional data. We also consider the situation where the link and variance functions are unknown and are estimated nonparametrically from the data. As an application, the classification of medflies with regard to their remaining longevity, depending on their fertility, will be discussed.
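
The linear predictor at the heart of this model is easy to sketch numerically. The following Python fragment approximates η = ∫ X(t)β(t) dt on a grid and applies a logit link; the predictor trajectory and parameter function are hypothetical.

import numpy as np

t = np.linspace(0.0, 1.0, 101)        # common observation grid
X = np.sin(2 * np.pi * t)             # hypothetical predictor trajectory
beta = 1.0 - t                        # hypothetical smooth parameter function

eta = np.trapz(X * beta, t)           # linear predictor: integral of X(t)*beta(t)
mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link (binomial case)
print(f"linear predictor {eta:.3f}, mean response {mu:.3f}")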

Laurence Freedman
"A new method for dealing with measurement error in explanatory variables of regression models"

I will introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is a variance-preserving shrinkage estimate of the true value conditional on the outcome variable. The adjusted values have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. Unlike regression calibration, moment reconstruction can deal with differential measurement error. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, the resulting estimates of the regression coefficients are consistent. In simulations of logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean square error to the standard regression calibration approach. I will give an example of the use of moment reconstruction in linear discriminant analysis, and in a non-standard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.
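
For contrast with the new method, here is a minimal Python sketch of classical regression calibration, the baseline approach described above; all data are simulated, the normality assumptions are noted in the comments, and this is not moment reconstruction itself.

import numpy as np

rng = np.random.default_rng(1)
n, sigma_x2, sigma_u2 = 2000, 1.0, 0.5

x = rng.normal(0.0, np.sqrt(sigma_x2), n)      # true covariate (unobserved)
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)  # measured with error
y = 2.0 * x + rng.normal(0.0, 1.0, n)          # outcome, true slope 2.0

# Under joint normality, E[X | W] = mu + lam * (W - mu), where
# lam = sigma_x^2 / (sigma_x^2 + sigma_u^2) is the reliability ratio.
lam = sigma_x2 / (sigma_x2 + sigma_u2)
x_adj = w.mean() + lam * (w - w.mean())

naive = np.polyfit(w, y, 1)[0]                 # attenuated slope (~ 2*lam)
calibrated = np.polyfit(x_adj, y, 1)[0]        # roughly de-attenuated
print(f"naive slope {naive:.2f}, calibrated slope {calibrated:.2f}, true 2.0")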

Dan Geiger
"A new software for genetic linkage analysis"

Genetic linkage analysis is a useful statistical tool for mapping disease genes and for associating the functionality of genes with their location on the chromosome. I will describe a program for linkage analysis, called SUPERLINK, and demonstrate its performance. I will focus on the relevant combinatorial problems that need to be solved in order to optimize the running time and space requirements of these types of programs, and on some new capabilities of this software. The talk is intended for an audience with no background in genetics. This is joint work with Ma'ayan Fishelson.

Vyacheslav Abramov
"Asymptotic methods for communication networks"

This talk concerns the study of non-Markovian queueing systems and networks, with applications to communication networks. Its main contribution is to derive, for non-Markovian systems, results that have so far been obtained only for Markovian queueing systems. We study large closed client/server communication networks and losses in single-server queueing systems, with an application to communication networks of loss queues. We apply stochastic calculus and the theory of martingales to the case when one of the client stations is a bottleneck in the limit, where the total number of tasks in the server increases to infinity. The main results of this study are (i) an explicit expression for the interrelation between the limiting non-stationary distributions in non-bottleneck client stations, so that when one distribution is found in a simulation the others can be computed; and (ii) the derivation of diffusion and fluid approximations for the non-Markovian queue length in the bottleneck client station. For the loss networks considered, we find an asymptotic expression for the loss probability and other performance measures as buffer capacity increases to infinity. We also find the changes in the loss probability when redundant packets are added to the messages. The application of martingale methods to the study of the asymptotic behavior of non-Markovian queueing systems appears to be new.

Daniel Yekutieli
"FDR confidence intervals"

Confidence intervals are often constructed only for parameters selected after viewing the data. The problem with this practice is that the selected intervals fail to provide the assumed coverage probability. To overcome this problem I will introduce the FCR - a measure of the intervals' coverage following selection - and a general procedure offering FCR control under any selection rule. I will discuss the new procedure and its relation to the Benjamini-Hochberg procedure, and present theoretical results for independent and positively dependent test statistics.
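
A rough Python sketch of FCR-style interval adjustment, under my illustrative assumption that after selecting R of m parameters each selected interval is built at the more stringent marginal level 1 - Rq/m rather than 1 - q; the data and the selection rule are hypothetical, and the talk's general procedure may differ in detail.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, q = 100, 0.05
theta = np.concatenate([np.zeros(90), np.full(10, 3.0)])   # true means
z = theta + rng.normal(size=m)                             # one observation each

selected = np.flatnonzero(np.abs(z) > 2.0)   # a hypothetical selection rule
R = selected.size
half = norm.ppf(1.0 - R * q / (2 * m))       # adjusted interval half-width
print(f"selected {R} parameters; per-interval level {1 - R * q / m:.3f}")
for i in selected[:3]:
    print(f"theta_{i}: [{z[i] - half:.2f}, {z[i] + half:.2f}]")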

Albert Vexler
"Guaranteed maximum likelihood splitting tests of a linear regression model"

We propose and examine a class of generalized maximum likelihood asymptotic power one tests for the detection of various types of changes in a linear regression model. In economic and epidemiologic studies, such segmented regression models often occur as threshold models, where it is assumed that the exposure has no influence on the response up to a possible unknown threshold. An important task in such studies is testing for the existence of this threshold and estimating it. Guaranteed non-asymptotic upper bounds for the significance levels of these tests are presented. We demonstrate how the proposed tests were applied to an actual problem with real data.

An application: according to one theoretical hypothesis, the revenue of establishments in the Israeli economy that engage in research and development and have a small number of employed persons does not depend on the technological intensity of the establishment's activities (e.g. Griliches and Regev (1999)). This observation might be associated with the fact that the criteria for categorizing a firm as "small" are not clearly defined. However, if we do not reject the aforementioned hypothesis, it is possible to estimate the threshold number of employed persons that defines a firm as "small". To investigate this question, we applied the approach presented in this talk.
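
To make the threshold idea concrete, here is a minimal Python sketch of a segmented-regression scan on simulated data: the response is flat up to an unknown threshold and linear in the exposure beyond it, and the residual sum of squares is profiled over candidate thresholds. This illustrates the model class, not the guaranteed tests of the talk.

import numpy as np

rng = np.random.default_rng(3)
n, tau_true = 200, 0.6
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * np.maximum(x - tau_true, 0.0) + rng.normal(0, 0.2, n)

def rss_at(tau):
    """Residual sum of squares of the two-segment fit with knot at tau."""
    X = np.column_stack([np.ones(n), np.maximum(x - tau, 0.0)])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

taus = np.linspace(0.1, 0.9, 81)
tau_hat = taus[np.argmin([rss_at(t) for t in taus])]
print(f"estimated threshold {tau_hat:.2f} (true {tau_true})")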

Yoav Dvir
"Local likelihood methods for nonparametric density estimation"

Local likelihood density estimation methods have the advantage that they provide high-order bias reduction for multivariate density estimation. We review different local likelihood methods and analyze their asymptotic behavior. We then propose two new estimators that build on current local likelihood methods. This work is done under the supervision of Prof. David Steinberg.
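
As background, kernel density estimation is the local-constant member of the family reviewed in the talk; higher-order local likelihood replaces the constant with a local log-polynomial to reduce bias. A minimal Python sketch on simulated data:

import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, 500)

def kde(x, sample, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    u = (x - sample) / h
    return np.exp(-0.5 * u**2).sum() / (len(sample) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-3, 3, 7)
print([round(kde(g, data, h=0.4), 3) for g in grid])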

Inna Stainvas
"A generative probabilistic oriented wavelet model for texture segmentation"

This talk addresses image segmentation via a generative model approach. A Bayesian network (BN) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a hidden Markov model (HMM), but with non-stationary transition conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for the wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album. The comparison is based on cross-validation and uses ensembles of probabilistic models rather than single models. In addition, the robustness of the models to additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) segmentation of still images with varying illumination across the scene.
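
The observable layer of such a model, oriented Gabor wavelet coefficients, can be sketched as follows in Python; the filter parameters are hypothetical and random noise stands in for a Brodatz texture patch.

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma, size=15):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xx * np.cos(theta) + yy * np.sin(theta)   # rotated coordinate
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

rng = np.random.default_rng(5)
image = rng.random((64, 64))                       # stand-in texture patch

# One frequency, four orientations; per-orientation coefficient statistics
# could then feed a Gaussian observation model of the kind described above.
for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
    c = fftconvolve(image, gabor_kernel(0.2, theta, 3.0), mode="same")
    print(f"theta={theta:.2f}: mean={c.mean():.3f}, var={c.var():.3f}")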

Yuval Nov
"Modeling and analysis of protein design under resource constraints"

The potency, or fitness, of a protein-based drug can be enhanced by changing the sequence of its underlying protein. We present a novel stochastic model for the sequence-fitness relation, and estimate its four parameters from industrial data. Using this model, we formulate, analyze, and solve two variants of the protein design problem. In the single-period design problem, the designer must decide, under capacity constraints, which set of sequences to screen in order to maximize the expected fitness of the best sequence in the set. In the more general two-period design problem, the designer can afford two screening rounds and must allocate resources optimally across the two periods to maximize the same objective function. Analytical and simulation results allow us to identify promising design strategies for various parameter regimes.

Joint work with Prof. Larry Wein.
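
The single-period objective, the expected fitness of the best sequence among n screened, is easy to explore by simulation. The following toy Monte Carlo in Python uses a lognormal fitness distribution as a hypothetical stand-in for the four-parameter model of the talk.

import numpy as np

rng = np.random.default_rng(6)

def expected_best(n, reps=5000):
    """Estimate E[max of n i.i.d. fitness draws] by simulation."""
    draws = rng.lognormal(mean=0.0, sigma=1.0, size=(reps, n))
    return draws.max(axis=1).mean()

for n in (10, 100, 1000):
    print(f"capacity {n:4d}: expected best fitness {expected_best(n):.2f}")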