Book:
|
Statistical Theory: A Concise Introduction (Second Edition)
Abramovich, F. and Ritov, Y. (2013, 2023).
Chapman & Hall/CRC
Summary: Designed for a one-semester advanced undergraduate or graduate course, Statistical
Theory: A Concise Introduction clearly explains the underlying ideas and principles of major
statistical concepts, including parameter estimation, confidence intervals, hypothesis testing,
asymptotic analysis, Bayesian inference, linear models, nonparametric estimation, and elements of decision theory. It introduces these
topics on a clear intuitive level using illustrative examples in addition to the formal
definitions, theorems, and proofs.
Based on the authors’ lecture notes, the book is self-contained and maintains a proper balance between clarity and rigor of exposition.
In a few cases, the authors present a "sketched" version of a proof, explaining its main ideas rather than giving detailed technical mathematical
and probabilistic arguments.
|
Preprints:
|
Abramovich, F. (2023).
Statistical learning by sparse deep neural networks
arXiv:2311.08845
Abstract: We consider a deep neural network estimator based on empirical risk minimization
with $l_1$-regularization. We derive a general bound for its excess risk in regression and
classification (including multiclass), and prove that it is adaptively nearly-minimax
(up to log-factors) simultaneously across the entire range of various function classes.
|
Abramovich, F. (2022).
Classification by sparse additive models
arXiv:2212.01792
Abstract: We consider (nonparametric) sparse additive models (SpAM) for classification.
The design of a SpAM classifier is based on minimizing the logistic loss with a sparse group
Lasso/Slope-type penalties on the coefficients of univariate components' expansions in orthonormal series
(e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness.
We show that it is nearly-minimax (up to log-factors) within the entire range of analytic, Sobolev and Besov classes,
and illustrate its performance on a real-data example.
|
Published Research Papers:
|
Levy, T. and Abramovich, F. (2023).
Generalization error bounds for multiclass sparse linear classifiers
Journal of Machine Learning Research
24 (151), pp. 1-35.
Abstract: We consider high-dimensional multiclass classification by sparse multinomial logistic regression.
Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of
sparsity associated with different structural assumptions on the regression coefficients matrix.
We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties
capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity,
and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds
(in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers.
The developed approach is general and can be adapted to other types of sparsity as well.
|
Abramovich, F., Grinshtein, V. and Levy, T. (2021).
Multiclass classification by sparse multinomial logistic regression
IEEE Transactions on Information Theory
67, pp. 4637-4646.
Abstract: In this paper we consider high-dimensional multi-class classification by sparse multinomial logistic regression.
We propose first a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the
nonasymptotic bounds for misclassification excess risk of the resulting classifier. We establish also their tightness by deriving the corresponding
minimax lower bounds. In particular, we show that there is a phase transition between small and large number of classes. The bounds can be reduced
under the additional low noise condition. Finding a penalized maximum likelihood solution with a complexity penalty requires, however, a
combinatorial search over all possible models. To design a feature selection procedure computationally feasible for high-dimensional data, we
propose multinomial logistic group Lasso and Slope classifiers and show that they also achieve the minimax order.
|
Abramovich, F. and Pensky, M. (2019).
Classification with many classes: challenges and pluses
Journal of Multivariate Analysis
174, 104536.
Abstract: The objective of the paper is to study the accuracy of multi-class classification
in a high-dimensional setting, where the number of classes is also large ("large $L$, large $p$, small $n$" model).
While this problem arises in many practical applications and many techniques have been recently developed for its solution,
to the best of our knowledge nobody has provided a rigorous theoretical analysis of this important setup. The purpose of the present paper is to
fill in this gap.
We consider one of the most common settings, classification of high-dimensional normal vectors where,
unlike standard assumptions, the number of classes could be large.
We derive non-asymptotic conditions on effects of significant
features, and the lower and upper bounds for distances between classes required for successful feature selection
and classification with a given accuracy. Furthermore, we study an asymptotic setup
where the number of classes is growing with the dimension of feature space while
the number of samples per class is possibly limited.
We discover an interesting and, at first glance, somewhat counter-intuitive phenomenon that a large number of classes
may be a "blessing" rather than a "curse" since, in certain settings,
the precision of classification can improve as the number of classes grows. This is due to more accurate feature selection
since even weaker significant features, which are not sufficiently strong to be manifested in a coarse classification,
can nevertheless have a strong impact when the number of classes is large.
We supplement our theoretical investigation by a simulation study and a real data example where we again observe the above phenomenon.
|
Hochman, A., Saaroni, H., Abramovich, F. and Alpert, P. (2019).
Artificial detection of lower frequency periodicity in climatic studies by wavelet analysis demonstrated on synthetic time series
Journal of Applied Meteorology and Climatology
58, pp. 2077-2086.
Abstract: The Continuous Wavelet Transform (CWT) is a frequently used tool to study periodicity in climate and other time series.
Periodicity plays a significant role in climate reconstruction and prediction. In numerous studies, the use of CWT revealed Dominant Periodicity (DP)
in climatic time series. Several studies suggested that these "natural oscillations" would even reverse global warming. It is shown here that the results
of wavelet analysis for detecting DPs can be misinterpreted in the presence of local singularities that are manifested in lower frequencies.
This may lead to false DP detection. In CWT analysis of synthetic and real-data climatic time series with local singularities,
CWT indicates a low-frequency DP even if there is no true periodicity in the time series. It is argued that this is an inherent general property of CWT.
Hence, applying CWT to climatic time series should be re-evaluated and more careful analysis of the entire wavelet power spectrum is required,
focusing on high frequencies as well. Thus, a cone-like shape in the wavelet power spectrum most likely indicates the presence of a local singularity
in the time series rather than a DP, even if the local singularity has an observational or a physical basis. It is shown that analyzing the
derivatives of the time series may be helpful in interpreting the wavelet power spectrum. Nevertheless, this is only a partial remedy that does not
completely neutralize the effects caused by the presence of local singularities.
|
Abramovich, F. and Grinshtein, V. (2019).
High-dimensional classification by sparse logistic regression
IEEE Transactions on Information Theory
65, pp. 3068-3079.
Abstract: We consider high-dimensional binary classification by sparse logistic regression.
We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on
the model size and derive
the non-asymptotic bounds for the resulting misclassification excess risk.
The bounds can be reduced under the additional low-noise condition. The proposed
complexity penalty is remarkably related to the VC-dimension of a set of sparse linear classifiers.
Implementation of any complexity penalty-based criterion, however, requires a combinatorial search over all possible
models. To find a model selection procedure computationally feasible
for high-dimensional data, we extend the Slope estimator for logistic regression
and show that under an additional weighted restricted eigenvalue condition it is
rate-optimal in the minimax sense.
|
Abramovich, F., De Canditiis, D. and Pensky, M. (2018).
Solution of linear ill-posed problems by model selection and aggregation
Electronic Journal of Statistics
12, pp. 1822-1841.
Abstract: We consider a general statistical linear inverse problem,
where the solution is represented via a known (possibly overcomplete) dictionary
that allows its sparse representation. We propose two different approaches.
A model selection estimator selects a single model by
minimizing the penalized empirical risk over all possible models.
By contrast with direct problems, the penalty depends on the model
itself rather than on its size only as for complexity penalties.
A Q-aggregate estimator averages over the entire collection of
estimators with properly chosen weights. Under mild
conditions on the dictionary, we establish oracle inequalities both
with high probability and in expectation for the two estimators.
Moreover, for the latter estimator these inequalities are sharp.
The proposed procedures are implemented numerically and
their performance is assessed by a simulation study.
|
Abramovich, F. and Grinshtein, V. (2016).
Model selection and minimax estimation in generalized linear models
IEEE Transactions on Information Theory
62, pp. 3721-3730.
Abstract: We consider model selection in generalized linear models (GLM) for
high-dimensional data and propose a wide class of model selection criteria based on penalized
maximum likelihood with a complexity penalty on the model size. We derive a general nonasymptotic
upper bound for the expected Kullback-Leibler divergence between the true distribution of the data
and that generated by a selected model, and establish the corresponding minimax lower bounds for
sparse GLM. For the properly chosen (nonlinear) penalty, the resulting penalized maximum likelihood
estimator is shown to be asymptotically minimax and adaptive to the unknown sparsity. We discuss
also possible extensions of the proposed approach to model selection in GLM under additional
structural constraints and aggregation. |
Abramovich, F. and Grinshtein, V. (2013).
Estimation of a sparse group of sparse vectors
Biometrika
100, pp. 355-370.
Abstract: We consider estimating a sparse group of sparse normal mean vectors, based on
penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and
the numbers of their significant components, which can be performed by a fast algorithm. The
resulting estimators are developed within a Bayesian framework and can be viewed as maximum a
posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense
settings. A simulation study demonstrates the efficiency of the proposed approach, which
successfully competes with the sparse group lasso estimator. |
Abramovich, F., Pensky, M. and Rozenholc, Y. (2013).
Laplace deconvolution with noisy observations
Electronic Journal of Statistics
7, pp. 1094-1128.
Abstract: In the present paper we consider the Laplace deconvolution problem for discrete noisy
data observed on an interval whose length $T_n$ may increase with the sample size. Although this
problem arises in a variety of applications, to the best of our knowledge, it has been given very
little attention by the statistical community. Our objective is to fill the gap and provide
a statistical analysis of the Laplace deconvolution problem with noisy discrete data. The main
contribution of the paper is an explicit construction of an asymptotically rate-optimal (in the
minimax sense) Laplace deconvolution estimator which is adaptive to the regularity of the unknown
function. We show that the original Laplace deconvolution problem can be reduced to nonparametric
estimation of a regression function and its derivatives on the interval of growing length $T_n$.
Whereas the forms of the estimators remain standard, the choices of the parameters and the minimax
convergence rates, which are expressed in terms of $T^2_n/n$ in this case, are affected by the
asymptotic growth of the length of the interval. We derive an adaptive kernel estimator of the
function of interest, and establish its asymptotic minimaxity over a range of Sobolev classes. We
illustrate the theory by examples of construction of explicit expressions of Laplace deconvolution
estimators. A simulation study shows that, in addition to providing asymptotic optimality as the
number of observations tends to infinity, the proposed estimator demonstrates good performance in
finite sample examples. |
Abramovich, F. and Grinshtein, V. (2013).
Model selection in regression under structural constraints
Electronic Journal of Statistics
7, pp. 480-498.
Abstract: The paper considers model selection in regression under the additional structural
constraints on admissible models where the number of potential predictors might be even larger than
the available sample size. We develop a Bayesian formalism which is used as a natural tool for
generating a wide class of model selection criteria based on penalized least squares estimation
with various complexity penalties associated with a prior on a model size. The resulting criteria
are adaptive to structural constraints. We establish the upper bound for the quadratic risk of the
resulting MAP estimator and the corresponding lower bound for the minimax risk over a set of
admissible models of a given size. We then specify the class of priors (and, therefore, the class
of complexity penalties) where for the “nearly-orthogonal” design the MAP estimator is
asymptotically at least nearly-minimax (up to a log-factor) simultaneously over an entire range of
sparse and dense setups. Moreover, when the numbers of admissible models are “small” (e.g.,
ordered variable selection) or, on the opposite, for the case of complete variable selection, the
proposed estimator achieves the exact minimax rates. |
Abramovich, F. and Grinshtein, V. (2011).
Model selection in Gaussian regression for high-dimensional data
In
Inverse Problems and High-Dimensional Estimation (Eds. Alquier, P., Gautier, E. and
Stoltz, G.), Lecture Notes in Statistics
203, Springer Berlin Heidelberg, pp. 159-170.
Abstract: We consider model selection in Gaussian regression, where the number of predictors
might be even larger than the number of observations. The proposed procedure is based on penalized
least squares criteria with a complexity penalty on a model size. We discuss asymptotic properties of
the resulting estimators corresponding to linear and so-called $2k \ln(p/k)$-type nonlinear
penalties for nearly-orthogonal and multicollinear designs. We show that any linear penalty cannot
be simultaneously adapted to both sparse and dense setups, while $2k \ln(p/k)$-type penalties
achieve the wide adaptivity range. We also present a Bayesian perspective on the procedure that
provides additional insight and can be used as a tool for obtaining a wide class of penalized
estimators associated with various complexity penalties. |
Abramovich, F. and Grinshtein, V. (2010).
MAP model selection in Gaussian regression
Electronic Journal of Statistics
4, pp. 932-949.
Abstract: We consider a Bayesian approach to model selection in Gaussian linear regression,
where the number of predictors might be much larger than the number of observations. From a
frequentist view, the proposed procedure results in the penalized least squares estimation with a
complexity penalty associated with a prior on the model size. We investigate the optimality
properties of the resulting model selector. We establish the oracle inequality and specify
conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and
dense settings for “nearly-orthogonal” and “multicollinear” designs. |
Abramovich, F., Grinshtein, V., Petsa, A. and Sapatinas, T. (2010).
On Bayesian testimation and its application to wavelet thresholding
Biometrika
97, pp. 181-198.
Abstract: We consider the problem of estimating the unknown response function in the
Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori
testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional
Gaussian mean vector. The existing results for its upper error bounds over various sparse
$l_p$-balls are extended to more general cases. We show that, for a properly chosen prior on the
number of nonzero entries of the mean vector, the corresponding adaptive estimator is
asymptotically minimax in a wide range of sparse and dense $l_p$-balls. The proposed procedure is then
applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the
unknown response function in the Gaussian white noise model. These estimators are then proven to
be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These
results are also extended to the estimation of derivatives of the response function. Simulated
examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator
in finite sample situations, and to compare it with several existing counterparts. |
Abramovich, F., De Feis, I. and Sapatinas, T. (2009).
Optimal testing for additivity in multiple nonparametric regression
Annals of the Institute of Statistical Mathematics
61, pp. 691-714.
Abstract: We consider the problem of testing for additivity in the standard multiple
nonparametric regression model. We derive optimal (in the minimax sense) nonadaptive and adaptive
hypothesis testing procedures for additivity against the composite nonparametric alternative that
the response function involves interactions of second or higher orders separated away from zero in
$L^2([0, 1]^d)$-norm and also possesses some smoothness properties. In order to shed some light on
the theoretical results obtained, we carry out a wide simulation study to examine the finite sample
performance of the proposed hypothesis testing procedures and compare them with a series of other
tests for additivity available in the literature. |
Abramovich, F., Antoniadis, A. and Pensky, M. (2007).
Estimation of piecewise-smooth functions by amalgamated bridge regression splines
Sankhyā: The Indian Journal of Statistics
69, pp. 1-27.
Abstract: We consider nonparametric estimation of a one-dimensional piecewise-smooth function
observed with white Gaussian noise on an interval. We propose a two-step estimation procedure,
where one first detects jump points by a wavelet-based procedure and then estimates the function on
each smooth segment separately by bridge regression splines. We prove the asymptotic optimality (in
the minimax sense) of the resulting amalgamated bridge regression spline estimator and demonstrate
its efficiency on simulated and real data examples. |
Abramovich, F., Grinshtein, V. and Pensky, M. (2007).
On optimality of Bayesian testimation in the normal means problem
The Annals of Statistics
35, pp. 2261-2286.
Abstract: We consider a problem of recovering a high-dimensional vector $\mu$ observed in
white noise, where the unknown vector $\mu$ is assumed to be sparse. The objective of the paper is
to develop a Bayesian formalism which gives rise to a family of $l_0$-type penalties. The penalties
are associated with various choices of the prior distributions $\pi_n(\cdot)$ on the number of
nonzero entries of $\mu$ and, hence, are easy to interpret. The resulting Bayesian estimators lead
to a general thresholding rule which accommodates many of the known thresholding and model
selection procedures as particular cases corresponding to specific choices of $\pi_n(\cdot)$.
Furthermore, they achieve optimality in a rather general setting under very mild conditions on the
prior. We also specify the class of priors $\pi_n(\cdot)$ for which the resulting estimator is
adaptively optimal (in the minimax sense) for a wide range of sparse sequences and consider several
examples of such priors. |
Abramovich, F., Angelini, C. and De Canditiis, D. (2006).
Pointwise optimality of Bayesian wavelet estimators
Annals of the Institute of Statistical Mathematics
59, pp. 425-434.
Abstract: We consider pointwise mean squared errors of several known Bayesian wavelet
estimators, namely, posterior mean, posterior median and Bayes Factor, where the prior imposed on
wavelet coefficients is a mixture of an atom of probability at zero and a Gaussian density. We show
that for the properly chosen hyperparameters of the prior, all the three estimators are (up to a
log-factor) asymptotically minimax within any prescribed Besov ball $B^s_{p,q}(M)$. We discuss the
Bayesian paradox and compare the results for the pointwise squared risk with those for the global
mean squared error. |
Abramovich, F. and Angelini, C. (2006).
Bayesian maximum a posteriori multiple testing procedure
Sankhyā: The Indian Journal of Statistics
68, pp. 436-460.
Abstract: We consider a Bayesian approach to multiple hypothesis testing. A hierarchical
prior model is based on imposing a prior distribution $\pi(k)$ on the number of hypotheses arising
from alternatives (false nulls). We then apply the maximum a posteriori (MAP) rule to find the most
likely configuration of null and alternative hypotheses. The resulting MAP procedure and its
closely related step-up and step-down versions compare ordered Bayes factors of individual
hypotheses with a sequence of critical values depending on the prior. We discuss the relations
between the proposed MAP procedure and the existing frequentist and Bayesian counterparts. A more
detailed analysis is given for the normal data, where we show, in particular, that by choosing a
specific $\pi(k)$, the MAP procedure can mimic several known familywise error (FWE) and false
discovery rate (FDR) controlling procedures. The performance of MAP procedures is illustrated on a
simulated example. |
Abramovich, F. and Angelini, C. (2006).
Testing in mixed-effects FANOVA models
Journal of Statistical Planning and Inference
136, pp. 4326-4348.
Abstract: We consider the testing problem in the mixed-effects functional analysis of
variance models. We develop asymptotically optimal (minimax) testing procedures for testing the
significance of functional global trend and the functional fixed effects based on the empirical
wavelet coefficients of the data. Wavelet decompositions allow one to characterize various types of
assumed smoothness conditions on the response function under the nonparametric alternatives. The
distribution of the functional random-effects component is defined in the wavelet domain and
captures the sparseness of wavelet representation for a wide variety of functions. The simulation
study presented in the paper demonstrates the finite sample properties of the proposed testing
procedures. We also apply them to real data from physiological experiments. |
Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2006).
Adapting to unknown sparsity by controlling the false discovery rate
The Annals of Statistics
34, pp. 584-653.
Abstract: We attempt to recover an $n$-dimensional vector observed in white noise, where $n$
is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider
three different ways of defining sparsity of a vector: using the fraction of nonzero terms;
imposing power-law decay bounds on the ordered entries; and controlling the $l_p$ norm for $p$
small. We obtain a procedure which is asymptotically minimax for $l_r$ loss, simultaneously
throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding
scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent
innovation in simultaneous testing, ensuring that at most a certain expected fraction of the
rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control
parameter $q_n$ also plays a determining role in asymptotic minimaxity. If $q=\lim q_n \in [0,1/2]$
and also $q_n>\gamma/\log(n)$, we get sharp asymptotic minimaxity, simultaneously, over a wide
range of sparse parameter spaces and loss functions. On the other hand, $q=\lim q_n \in (1/2,1]$
forces the risk to exceed the minimax risk by a factor growing with $q$. To our knowledge, this
relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work
provides a new perspective on a class of model selection rules which has been introduced recently
by several authors. These new rules impose complexity penalization of the form 2 log(potential model
size/actual model size). We exhibit a close connection with FDR-controlling procedures under
stringent control of the false discovery rate. |
Abramovich, F. and Benjamini, Y. (2005).
False Discovery Rate
In
Encyclopedia of Statistical Sciences (Eds. Kotz, S., Read, C. B., Balakrishnan, N.,
Vidakovic, B. and Johnson, N. L.),
4, John Wiley & Sons, Inc., pp. 2240-2243.
Abstract: For the multiple hypotheses testing problem consider the proportion of falsely
rejected hypotheses (false discoveries) among the total number of rejections. The expected value of
this proportion, called the False Discovery Rate (FDR), is a useful criterion to control as an
alternative to the traditional familywise error rate (FWE) that suffers from low power properties
when the number of tested hypotheses is large. In a way, controlling FDR is adaptively in between
ignoring multiplicity altogether and a conservative control of FWE. Several FDR controlling
procedures are presented and others are reviewed. Various extensions and applications of the FDR
are discussed. |
Abramovich, F. and Heller, R. (2005).
Local functional hypothesis testing
Mathematical Methods of Statistics
14, pp. 253.
Abstract: We consider a standard “signal+white noise” model on the unit interval and
want to test whether the signal is present on a subinterval $\Omega_\Delta \subseteq [0,1]$ of
length $\Delta$. The composite alternative is that the unknown signal $f$ is separated away from zero
in terms of its average power $\gamma(f)=\left \| f \right \|^2_\Delta/\Delta$ on $\Omega_\Delta$
and also possesses some regularity properties. We evaluate the asymptotically optimal (minimax)
rates for testing the presence of a signal on $\Omega_\Delta$, where both the noise level and the
interval length tend to zero. We derive corresponding rate-optimal tests for local signal
detection. |
Abramovich, F., Antoniadis, A., Sapatinas, T. and Vidakovic, B. (2004).
Optimal testing in a fixed-effects functional analysis of variance model
International Journal of Wavelets, Multiresolution and Information Processing
2, pp. 323-349.
Abstract: We consider the testing problem in a fixed-effects functional analysis of variance
model. We test the null hypotheses that the functional main effects and the functional interactions
are zeros against the composite nonparametric alternative hypotheses that they are separated away
from zero in $L^2$-norm and also possess some smoothness properties. We adapt the optimal (minimax)
hypothesis testing procedures for testing a zero signal in a Gaussian "signal plus noise" model to
derive optimal (minimax) non-adaptive and adaptive hypothesis testing procedures for the functional
main effects and the functional interactions. The corresponding tests are based on the empirical
wavelet coefficients of the data. Wavelet decompositions allow one to characterize different types
of smoothness conditions assumed on the response function by means of its wavelet coefficients for
a wide range of function classes. In order to shed some light on the theoretical results obtained,
we carry out a simulation study to examine the finite sample performance of the proposed functional
hypothesis testing procedures. As an illustration, we also apply these tests to a real-life data
example arising from physiology. Concluding remarks and hints for possible extensions of the
proposed methodology are also given. |
Abramovich, F., Amato, U. and Angelini, C. (2004).
On Optimality of Bayesian Wavelet Estimators
Scandinavian Journal of Statistics
31, pp. 217-234.
Abstract: We investigate the asymptotic optimality of several Bayesian wavelet estimators,
namely, posterior mean, posterior median and Bayes Factor, where the prior imposed on wavelet
coefficients is a mixture of a mass function at zero and a Gaussian density. We show that in terms
of the mean squared error, for the properly chosen hyperparameters of the prior, all the three
resulting Bayesian wavelet estimators achieve optimal minimax rates within any prescribed Besov
space $B^s_{p,q}$ for $p \ge 2$. For $1 \le p \le 2$, the Bayes Factor is still optimal for
$(2s+2)/(2s+1) \le p \le 2$ and always outperforms the posterior mean and the posterior median that
can achieve only the best possible rates for linear estimators in this case. |
Abramovich, F., Besbeas, P. and Sapatinas, T. (2002).
Empirical Bayes approach to block wavelet function estimation
Computational Statistics & Data Analysis
39, pp. 435-451.
Abstract: Wavelet methods have demonstrated considerable success in function estimation
through term-by-term thresholding of the empirical wavelet coefficients. However, it has been shown
that grouping the empirical wavelet coefficients into blocks and making simultaneous threshold
decisions about all the coefficients in each block has a number of advantages over term-by-term
wavelet thresholding, including asymptotic optimality and better mean squared error performance in
finite sample situations. An empirical Bayes approach to incorporating information on neighbouring
empirical wavelet coefficients into function estimation that results in block wavelet shrinkage and
block wavelet thresholding estimators is considered. Simulated examples are used to illustrate the
performance of the resulting estimators, and to compare these estimators with several existing
non-Bayesian block wavelet thresholding estimators. It is observed that the proposed empirical
Bayes block wavelet shrinkage and block wavelet thresholding estimators outperform the non-Bayesian
block wavelet thresholding estimators in finite sample situations. An application to a data set
that was collected in an anaesthesiological study is also presented. |
Abramovich, F., Sapatinas, T. and Silverman, B.W. (2000).
Stochastic expansions in an overcomplete wavelet dictionary
Probability Theory and Related Fields
117, pp. 133-144.
Abstract |
Link to paper
|
Link to journal |
Abstract: We consider random functions defined in terms of members of an overcomplete
wavelet dictionary. The function is modelled as a sum of wavelet components at arbitrary positions
and scales where the locations of the wavelet components and the magnitudes of their coefficients
are chosen with respect to a marked Poisson process model. The relationships between the parameters
of the model and the parameters of those Besov spaces within which realizations will fall are
investigated. The models allow functions with specified regularity properties to be generated. They
can potentially be used as priors in a Bayesian approach to curve estimation, extending current
standard wavelet methods to be free from the dyadic positions and scales of the basis
functions. |
Abramovich, F. and Grinshtein, V. (1999).
Derivation of equivalent kernel for general spline smoothing: a systematic
approach
Bernoulli
5, pp. 359-379.
Abstract |
Link to paper |
Link to journal |
Abstract: We first consider nonparametric estimation by spline smoothing with a variable
smoothing parameter and an arbitrary design density function and show that the corresponding
equivalent kernel can be approximated by the Green function of a certain linear differential
operator. Furthermore, we propose to use the standard (in applied mathematics and engineering)
method for asymptotic solution of linear differential equations, known as the
Wentzel-Kramers-Brillouin method, for systematic derivation of an asymptotically equivalent kernel
in this general case. The corresponding results for polynomial splines are a special case of the
general solution. Then, we show how these ideas can be directly extended to the very general
L-spline smoothing. |
Abramovich, F. and Sapatinas, T. (1999).
Bayesian approach to wavelet decomposition and shrinkage
In
Bayesian inference in wavelet-based models (Eds. Müller, P. and Vidakovic, B.), Lecture
Notes in Statistics
141, Springer New York, pp. 33-50.
Abstract |
Link to
paper |
Link to journal |
Abstract: We consider a Bayesian approach to wavelet decomposition. We show how prior
knowledge about a function's regularity can be incorporated into a prior model for its wavelet
coefficients by establishing a relationship between the hyperparameters of the proposed model and
the parameters of those Besov spaces within which realizations from the prior will fall. Such a
relation may be seen as giving insight into the meaning of the Besov space parameters themselves.
Furthermore, we consider Bayesian wavelet-based function estimation that gives rise to different
types of wavelet shrinkage in non-parametric regression. Finally, we discuss an extension of the
proposed Bayesian model by considering random functions generated by an overcomplete wavelet
dictionary. |
Abramovich, F. and Silverman, B.W. (1998).
Wavelet decomposition approaches to statistical inverse problems
Biometrika
85, pp. 115-129.
Abstract |
Link to paper |
Link to journal |
Abstract: A wide variety of scientific settings involve indirect noisy measurements where
one faces a linear inverse problem in the presence of noise. Primary interest is in some function
$f(t)$ but data are accessible only about some linear transform corrupted by noise. The usual
linear methods for such inverse problems do not perform satisfactorily when $f(t)$ is spatially
inhomogeneous. One existing nonlinear alternative is the wavelet-vaguelette decomposition method,
based on the expansion of the unknown $f(t)$ in wavelet series. In the vaguelette-wavelet
decomposition method proposed here, the observed data are expanded directly in wavelet series. The
performances of various methods are compared through exact risk calculations, in the context of the
estimation of the derivative of a function observed subject to noise. A result is proved
demonstrating that, with a suitable universal threshold somewhat larger than that used for standard
denoising problems, both the wavelet-based approaches have an ideal spatial adaptivity
property. |
Abramovich, F. and Bayvel, P. (1997).
Some statistical remarks on the derivation of BER in amplified optical communication
systems
IEEE Transactions on Communications
45, pp. 1032-1034.
Abstract |
Link to paper |
Link to journal |
Abstract: We consider the signal detection problem in amplified optical transmission systems
as a statistical hypothesis testing procedure, and we show that the detected signal has a
well-known chi-squared distribution. In particular, this approach considerably simplifies the
derivation of bit-error rate (BER). Finally, we discuss the accuracy of the Gaussian approximations
to the exact distributions of the signal. |
Abramovich, F. and Steinberg, D. (1996).
Improved inference in nonparametric regression using $L_k$-smoothing splines
Journal of Statistical Planning and Inference
49, pp. 327-341.
Abstract |
Link to
paper |
Link to journal |
Abstract: Smoothing splines are one of the most popular approaches to nonparametric
regression. Wahba (J. Roy. Statist. Soc. Ser. B 40 (1978) 364–372; 45 (1983) 133–150) showed
that smoothing splines are also Bayes estimates and used the corresponding prior model to derive
interval estimates for the regression function. Although the interval estimates work well on a
global basis, they can have poor local properties. The source of this problem is the use of a
global smoothing parameter. We introduce the notion of $L_k$-smoothing splines. These splines allow
for a variable smoothing parameter and can substantially improve local inference. |
Abramovich, F. and Benjamini, Y. (1996).
Adaptive thresholding of wavelet coefficients
Computational Statistics & Data Analysis
22, pp. 351-361.
Abstract |
Link to
paper |
Link to journal |
Abstract: Wavelet techniques have become an attractive and efficient tool in function
estimation. Given noisy data, its discrete wavelet transform is an estimator of the wavelet
coefficients. It has been shown by Donoho and Johnstone (Biometrika 81 (1994) 425–455) that
thresholding the estimated coefficients and then reconstructing an estimated function reduces the
expected risk close to the possible minimum. They offered a global threshold $\lambda \sim \sigma
\sqrt{2\log{n}}$ for $j > j_0$, while the coefficients of the first coarse $j_0$ levels are always
included. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators.
Then, we use the connection between thresholding and hypotheses testing to construct a thresholding
procedure based on the false discovery rate (FDR) approach to multiple testing of Benjamini and
Hochberg (J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300). The suggested procedure controls the
expected proportion of incorrectly included coefficients among those chosen for the wavelet
reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of
the estimated function and to the noise level. Finally, comparing the proposed FDR based procedure
with the fixed global threshold by evaluating the relative mean-square-error across the various
test-functions and noise levels, we find the FDR-estimator to enjoy robustness of
MSE-efficiency. |
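The two thresholding rules contrasted in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a known noise level `sigma`, independent Gaussian noise on each coefficient, and two-sided normal p-values fed into the Benjamini-Hochberg step-up rule. The function names and the FDR level `q` are hypothetical choices made for this example.

```python
import math


def universal_threshold(sigma, n):
    """Global threshold sigma * sqrt(2 log n) quoted in the abstract."""
    return sigma * math.sqrt(2.0 * math.log(n))


def two_sided_p_value(d, sigma):
    """P(|Z| >= |d| / sigma) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(abs(d) / (sigma * math.sqrt(2.0)))


def fdr_threshold(coeffs, sigma, q=0.05):
    """Benjamini-Hochberg step-up rule applied to wavelet coefficients.

    Returns the smallest |d| that is kept, or math.inf if nothing is kept.
    """
    m = len(coeffs)
    # Largest |d| first, i.e. smallest p-value first.
    ranked = sorted(coeffs, key=abs, reverse=True)
    cutoff = math.inf
    for k, d in enumerate(ranked, start=1):
        if two_sided_p_value(d, sigma) <= q * k / m:
            cutoff = abs(d)  # step-up: the largest k satisfying the bound wins
    return cutoff


def hard_threshold(coeffs, lam):
    """Keep coefficients with |d| >= lam and set the rest to zero."""
    return [d if abs(d) >= lam else 0.0 for d in coeffs]


if __name__ == "__main__":
    coeffs = [8.0, -6.0, 0.3, -0.2, 0.1, 0.5, -0.4, 0.05]
    print(hard_threshold(coeffs, universal_threshold(1.0, len(coeffs))))
    print(hard_threshold(coeffs, fdr_threshold(coeffs, 1.0, q=0.05)))
```

Unlike the universal threshold, which depends only on `sigma` and `n`, the FDR cutoff depends on the observed coefficients themselves, which is the sense in which the procedure adapts to the complexity of the estimated function.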
Abramovich, F. and Benjamini, Y. (1995).
Thresholding of wavelet coefficients as multiple hypotheses testing procedure
In
Wavelets and Statistics (Eds. Antoniadis, A. and Oppenheim, G.), Lecture Notes in
Statistics
103, Springer-Verlag, pp. 5-14.
Abstract |
Link to
paper |
Link to journal |
Abstract: Given a noisy signal, its finite discrete wavelet transform is an estimator of the
signal's wavelet expansion coefficients. An appropriate thresholding of the coefficients for further
reconstruction of the de-noised signal plays a key role in the wavelet decomposition/reconstruction
procedure. Donoho and Johnstone proposed a global threshold $\lambda = \sigma \sqrt{2\log{n}}$ and
showed that such a threshold asymptotically reduces the expected risk of the corresponding wavelet
estimator close to the possible minimum. To apply their threshold to finite samples, they suggested
always keeping the coefficients of the first coarse $j_0$ levels. We demonstrate that the choice of
$j_0$ may strongly affect the corresponding estimators. Then, we consider the thresholding of wavelet
coefficients as a multiple hypotheses testing problem and use the False Discovery Rate (FDR)
approach to multiple testing of Benjamini and Hochberg. The suggested procedure controls the
expected proportion of incorrectly kept coefficients among those chosen for the wavelet
reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of
the estimated function. Finally, comparing the proposed FDR-threshold with the fixed global
threshold of Donoho and Johnstone by evaluating the relative Mean-Square-Error across the various
test-functions and noise levels, we find the FDR-estimator to enjoy robustness of MSE-efficiency. |
Abramovich, F. (1993).
The asymptotic mean squared error of L-smoothing splines
Statistics & Probability Letters
18, pp. 179-182.
Abstract |
Link to
paper |
Link
to journal |
Abstract: We establish the asymptotic equivalence between L-spline smoothing and kernel
estimation. The equivalent kernel is used to derive the asymptotic mean squared error of the
L-smoothing spline estimator. The paper extends the corresponding results for polynomial spline
smoothing. |
Abramovich, F. (1988).
Some remarks on the robustness of LL-type fuzzy linear algebraic systems
BUSEFAL
36, pp. 98-105.
Link to
paper |
Link
to journal |
Abramovich, F., Wagenknecht, M. and Khurgin, Y.I. (1988).
Solution of LR-type fuzzy systems of linear algebraic equations
BUSEFAL
35, pp. 86-99.
Abstract |
Link to
paper |
Link
to journal |
Abstract: Systems of linear algebraic equations with LR-type fuzzy coefficients are
considered in this article. The notion of a solution of an LR-type fuzzy system is discussed. It is
shown that in the general case an exact solution may not exist, so it is proposed to find an
approximate one (quasi-solution). It turns out that this problem can be reduced to an ordinary
(non-fuzzy) non-linear optimization problem. A numerical example of the application of this method
is provided. |