

5 March
James A. Evans, University of Chicago

19 March
Nalini Ravishanker, University of Connecticut

4 June
Giles Hooker, Cornell

11 June
Judith Somekh, Haifa University

23 October
Adam Kapelner, City University of New York
Harmonizing Fully Optimal Designs with Classic Randomization in Fixed Trial Experiments

6 November
Daniel Nevo, TAU
LAGO: The adaptive Learn-As-you-GO design for multistage intervention studies

27 November
Liran Katzir, Final Ltd.
Social network size estimation via sampling

25 December
Bella Vakulenko-Lagun, Harvard
Some methods to recover from selection bias in survival data

1 January
Meir Feder, TAU
Universal Learning for Individual Data

8 January
Adi Berliner Senderey, Clalit
Effective implementation of evidence-based medicine in healthcare

Seminars are held on Tuesdays at 10:30 am in the Schreiber Building, room 309 (see the TAU map). The seminar organizer is Daniel Yekutieli.
To join the seminar mailing list, or for any other inquiries, please call (03)6409612 or email 12345yekutiel@post.tau.ac.il54321 (remove the numbers unless you are a spammer…)
Seminars from previous years
ABSTRACTS
· Daniel Nevo, TAU
LAGO: The adaptive Learn-As-you-GO design for multistage intervention studies
In large-scale public-health intervention studies, the intervention is a package consisting of multiple components. The intervention package is chosen in a small pilot study and then implemented in a large-scale setup. However, for various reasons I will discuss, this approach can lead to implementation failure.
In this talk, I will present a new design, called the Learn-As-you-GO (LAGO) adaptive design. In the LAGO design, the intervention package is adapted in stages during the study based on past outcomes. Typically, an effective intervention package is sought while minimizing cost. The main complication when analyzing data from a LAGO study is that interventions in later stages depend upon the outcomes in previous stages. Under the setup of logistic regression, I will present asymptotic theory for LAGO studies and tools that can be used by researchers in practice. The LAGO design will be illustrated via application to the BetterBirth Study, which aimed to improve maternal and neonatal outcomes in India.
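As a toy illustration of the adapt-in-stages idea only (not the BetterBirth design or the talk's actual estimator), the sketch below treats the intervention package as a single continuous "dose", simulates a binary outcome from a logistic model, and after each stage refits the model on all accumulated data to pick the cheapest dose whose predicted success probability meets a target. All numbers and the dose-selection rule are invented for illustration.

```python
import math
import random

def fit_logistic(xs, ys, iters=2000, lr=0.5):
    # Gradient-ascent ML fit of P(y = 1 | x) = sigmoid(a + b*x).
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p
            gb += (y - p) * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def next_dose(a, b, target=0.8, grid=None):
    # Cheapest dose on the grid whose predicted success prob meets the target
    # (cost is assumed to grow with dose).
    grid = grid or [i / 10 for i in range(1, 31)]
    for x in grid:
        if 1.0 / (1.0 + math.exp(-(a + b * x))) >= target:
            return x
    return grid[-1]

random.seed(0)
true_a, true_b = -2.0, 1.5          # hidden truth, used only to simulate outcomes
dose, xs, ys = 1.0, [], []          # stage-1 dose comes from a "pilot"
for stage in range(3):
    for _ in range(200):            # enroll a batch at the current dose
        p = 1.0 / (1.0 + math.exp(-(true_a + true_b * dose)))
        xs.append(dose)
        ys.append(1 if random.random() < p else 0)
    a, b = fit_logistic(xs, ys)     # learn as you go: refit on all past stages
    dose = next_dose(a, b)          # adapt the package for the next stage
    print(f"after stage {stage + 1}: recommended dose = {dose}")
```

Note how the stage-1 data alone cannot identify the slope (only one dose level was used), which is one concrete way a single pilot can mislead; later stages add dose levels and stabilize the fit.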
· Adam Kapelner, City University of New York
Harmonizing Fully Optimal Designs with Classic Randomization in Fixed Trial Experiments
There is a movement in the design of experiments away from the classic randomization put forward by Fisher, Cochran and others to one based on optimization. In fixed-sample trials comparing two groups, measurements of subjects are known in advance, and subjects can be divided optimally into two groups based on a criterion of homogeneity or "imbalance" between the two groups. These designs are far from random. This talk seeks to understand the benefits and costs relative to classic randomization in the context of different performance criteria, such as Efron's worst-case analysis. Under the criterion that we motivate, randomization beats optimization; however, the optimal design is shown to lie between these two extremes. Much-needed further work will provide a procedure for finding this optimal design in different scenarios in practice. Until then, it is best to randomize.
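A minimal way to see the tension between the two extremes is to compare one purely random split with the most balanced of many random splits, a crude stand-in for a fully optimized design. The imbalance criterion below (absolute difference in covariate means) and all numbers are illustrative choices, not the criterion analyzed in the talk.

```python
import random
from statistics import mean

def imbalance(x, assign):
    # Absolute difference in covariate means between treatment and control.
    t = [xi for xi, a in zip(x, assign) if a]
    c = [xi for xi, a in zip(x, assign) if not a]
    return abs(mean(t) - mean(c))

def random_allocation(n):
    # Classic randomization: a uniformly random half/half split.
    assign = [True] * (n // 2) + [False] * (n // 2)
    random.shuffle(assign)
    return assign

def best_of(x, draws):
    # Crude "optimized" design: keep the most balanced of many random splits.
    return min((random_allocation(len(x)) for _ in range(draws)),
               key=lambda a: imbalance(x, a))

random.seed(1)
x = [random.gauss(0, 1) for _ in range(40)]   # covariate known in advance
rand_split = random_allocation(len(x))
opt_split = best_of(x, draws=2000)
print(round(imbalance(x, rand_split), 3), round(imbalance(x, opt_split), 3))
```

The optimized split is far more homogeneous but is nearly deterministic given the covariates, which is exactly what exposes it to the worst-case (adversarial) analyses the talk discusses.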
· Liran Katzir, financial algorithms researcher at Final Ltd.
Social network size estimation via sampling
This presentation addresses the problem of estimating the number of users in online social networks. While such networks occasionally publish user numbers, there are good reasons to validate their reports. The proposed algorithm can also estimate the cardinality of network sub-populations. Since this information is seldom voluntarily divulged, algorithms must limit themselves to the social networks' public APIs. No other external information can be assumed. Additionally, due to obvious traffic and privacy concerns, the number of API requests must be severely limited. Thus, the main focus is on minimizing the number of API requests needed to achieve good estimates. Our approach is to view a social network as an undirected graph and use the public interface to produce a random walk. By counting the number of collisions, an estimate is produced using a non-uniform-sample version of the birthday paradox. The algorithms are validated on several publicly available social network datasets.
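The collision idea is easy to sketch in the uniform-sampling special case (the talk's estimator corrects for the non-uniform samples a random walk produces; this toy version does not): with k uniform draws with replacement from n users, the expected number of colliding pairs is k(k-1)/(2n), which can be inverted for n.

```python
import random
from itertools import combinations

def estimate_size(sample):
    # Birthday-paradox estimator for uniform samples with replacement:
    # E[#colliding pairs] = k*(k-1) / (2n), so n_hat = k*(k-1) / (2 * #collisions).
    k = len(sample)
    collisions = sum(1 for a, b in combinations(sample, 2) if a == b)
    if collisions == 0:
        raise ValueError("no collisions observed; draw a larger sample")
    return k * (k - 1) / (2 * collisions)

random.seed(7)
n = 100_000                                    # hidden network size
sample = [random.randrange(n) for _ in range(2000)]
print(round(estimate_size(sample)))            # roughly 100,000
```

With k = 2000 and n = 100,000 we expect about 20 collisions, so a few thousand API-sized samples already give a usable order-of-magnitude estimate.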
· Bella Vakulenko-Lagun, Harvard
Some methods to recover from selection bias in survival data
We consider several study designs resulting in truncated survival data. First, we look at a study with delayed entry, where the left-truncation time and the lifetime of interest are dependent. The critical assumption in using standard methods for truncated data is the assumption of quasi-independence or factorization. If this condition does not hold, the standard methods cannot be used. We address one specific scenario that can result in dependence between truncation and event times: covariate-induced dependent truncation. While in regression models for time-to-event data this type of dependence does not present any problem, in nonparametric estimation of the lifetime distribution P(X), ignoring the dependence might cause bias. We propose two methods that are able to account for this dependence and allow consistent estimation of P(X).
Our estimators for dependently truncated data will be inefficient if we use them when there is no dependence between truncation and event times; therefore it is important to test for independence. It is commonly held that we can test for quasi-independence, that is, "independence in the observable region". We derived two other conditions, called factorization conditions, which are indistinguishable from quasi-independence given the data at hand. This means that in the standard analysis of truncated data, when we assume quasi-independence, we ultimately make an untestable assumption in order to estimate the distribution of the target lifetime. This non-identifiability problem has not been recognized before.
Finally, we consider retrospectively ascertained time-to-event data resulting in right truncation, and discuss estimation of regression coefficients in the Cox model. We suggest an approach that incorporates external information in order to solve the problem of non-positivity that often arises with right-truncated data.
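For context, the standard nonparametric estimator that dependent truncation invalidates is the product-limit (Lynden-Bell) estimator for left-truncated data, which is valid precisely under the quasi-independence assumption discussed above. A minimal sketch, assuming no censoring and risk sets that never empty (toy data, not from the talk):

```python
def product_limit(entries, times):
    # Product-limit (Lynden-Bell) estimator for left-truncated survival data
    # under quasi-independence: subject i is at risk at time t only if
    # entry_i <= t <= time_i, i.e. only if it had already entered the study.
    # Returns (t, S(t)) at each distinct event time; no censoring for simplicity.
    surv, s = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for e, x in zip(entries, times) if e <= t <= x)
        deaths = sum(1 for ti in times if ti == t)
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv

# Toy data: (entry time, event time); only pairs with entry <= event are observed.
entries = [0.0, 0.5, 1.0, 0.2, 0.8]
times   = [2.0, 1.5, 3.0, 2.5, 4.0]
for t, s in product_limit(entries, times):
    print(t, round(s, 3))
```

The delayed-entry adjustment lives entirely in the risk-set definition; when entry and event times are dependent, that risk set is systematically distorted, which is the bias the talk's methods correct.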
· Meir Feder, TAU
Universal Learning for Individual Data
Universal learning is considered from an information-theoretic point of view, following the universal prediction approach originated by Solomonoff, Kolmogorov, Rissanen, Cover, Ziv and others and developed in the 90's by Feder and Merhav. Interestingly, the extension to learning is not straightforward. In previous works we considered online learning and supervised learning in a stochastic setting. Yet the most challenging case is batch learning, where prediction is done on a test sample once the entire training data is observed, in the individual setting where the features and labels, of both the training and test data, are specific individual quantities.
Our results provide schemes that, for any individual data, compete with a "genie" (or reference) that knows the true test label. We suggest design criteria and develop the corresponding universal learning schemes, where the main proposed scheme is termed Predictive Normalized Maximum Likelihood (pNML). We demonstrate that pNML learning and its variations provide robust, "stable" learning solutions that outperform the current leading approach based on Empirical Risk Minimization (ERM). Furthermore, the pNML construction provides a pointwise indication of learnability: this measures the uncertainty in learning the specific test challenge with the given training examples, letting the learner know when it does not know.
Joint work with Yaniv Fogel and Koby Bibas.
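The pNML construction can be sketched for a toy Bernoulli model class: refit the maximum-likelihood parameter with the test label appended under each candidate label, evaluate each candidate's probability under its own refitted model, and normalize; the normalizer (whose log is the pointwise regret) is the "does the learner know" indicator. A minimal sketch under these toy-model assumptions, not the paper's experiments:

```python
def pnml(train_labels, candidates=(0, 1)):
    # pNML for a Bernoulli model class. For each hypothesized test label y,
    # refit the ML parameter on train + (y), score y under that refit, then
    # normalize across candidates. The normalizer is >= 1; its log is the
    # pointwise regret, large when the test point is hard to learn.
    scores = {}
    n = len(train_labels)
    for y in candidates:
        theta = (sum(train_labels) + y) / (n + 1)   # ML estimate incl. test label
        scores[y] = theta if y == 1 else 1.0 - theta
    norm = sum(scores.values())
    return {y: p / norm for y, p in scores.items()}, norm

probs, norm = pnml([1, 1, 0, 1])
print(probs, norm)   # pNML probabilities and the regret normalizer
```

With three 1's and one 0 in training, the hypothesis-dependent refits give scores 0.8 (for y = 1) and 0.4 (for y = 0), so pNML assigns 2/3 versus 1/3 and the normalizer 1.2 reflects the residual uncertainty.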
· Adi Berliner Senderey, Clalit
Effective implementation of evidence-based medicine in healthcare
Two projects illustrating the use of data for determining effective treatment policies are presented.
1. Machine Learning in Healthcare – Shifting the Focus to Fairness – by Noam Barda
This project deals with an algorithm for improving fairness in predictive models. The method is meant to address concerns regarding the potential unfairness of prediction models towards groups which are under-represented in the training dataset and thus might receive uncalibrated scores. The algorithm was implemented on widely used risk models, including the ACC/AHA 2013 model for cardiovascular events and the FRAX model for osteoporotic fractures, and tested on a large real-world sample. Based on joint work with Noa Dagan, Guy Rothblum, Gal Yona, Ran Balicer and Eitan Bachmat.
2. Rates of Ischemic Stroke, Death and Bleeding in Men and Women with Non-Valvular Atrial Fibrillation – by Adi Berliner Senderey
Data regarding the thromboembolic risk and differences in outcomes in men and women with non-valvular atrial fibrillation (NVAF) are inconsistent. The aim of the present study is to evaluate differences in treatment strategies and the risk of ischemic stroke, death, and bleeding between men and women in a large, population-based cohort of individuals with NVAF. Based on joint work with Yoav Arnson, Moshe Hoshen, Adi Berliner Senderey, Orna Reges, Ran Balicer, Morton Leibowitz, Meytal Avgil Tsadok and Moti Haim.