Publications:


Book:

Statistical Theory: A Concise Introduction (Second Edition)
Abramovich, F. and Ritov, Y. (2013, 2023).
Chapman & Hall/CRC
Summary: Designed for a one-semester advanced undergraduate or graduate course, Statistical Theory: A Concise Introduction clearly explains the underlying ideas and principles of major statistical concepts, including parameter estimation, confidence intervals, hypothesis testing, asymptotic analysis, Bayesian inference, linear models, nonparametric estimation, and elements of decision theory. It introduces these topics on a clear intuitive level, using illustrative examples in addition to the formal definitions, theorems, and proofs. Based on the authors’ lecture notes, the book is self-contained and maintains a proper balance between clarity and rigor of exposition. In a few cases, the authors present a "sketched" version of a proof, explaining its main ideas rather than giving detailed technical mathematical and probabilistic arguments.

Preprints:

Abramovich, F. (2023). Statistical learning by sparse deep neural networks
arXiv:2311.08845
Abstract: We consider a deep neural network estimator based on empirical risk minimization with $l_1$-regularization. We derive a general bound for its excess risk in regression and classification (including multiclass), and prove that it is adaptively nearly-minimax (up to log-factors) simultaneously across the entire range of various function classes.
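As a rough illustration of the penalized empirical risk minimization behind this estimator, here is a minimal PyTorch sketch of $l_1$-regularized training; the architecture, the penalty level lam and the synthetic data are illustrative assumptions rather than the construction analyzed in the paper.
```python
# Hedged sketch: l1-regularized empirical risk minimization for a small
# feed-forward regression network. The architecture, penalty level lam and
# the synthetic data are illustrative assumptions, not the paper's estimator.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(200, 10)                                  # synthetic design
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(200, 1)       # synthetic response

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
lam = 1e-3                                                # l1 penalty level

for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)              # empirical risk
    loss = loss + lam * sum(p.abs().sum() for p in net.parameters())
    loss.backward()
    opt.step()
```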
Abramovich, F. (2022). Classification by sparse additive models
arXiv:2212.01792
Abstract: We consider (nonparametric) sparse additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with sparse group Lasso/Slope-type penalties on the coefficients of the univariate components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness. We show that it is nearly-minimax (up to log-factors) within the entire range of analytic, Sobolev and Besov classes, and illustrate its performance on a real-data example.
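The construction can be sketched in code: expand each covariate in an orthonormal series and minimize the logistic loss with a group penalty over each covariate's coefficient block. Below is a minimal proximal-gradient version with a plain group-lasso penalty (the Slope-type variant is not shown); the truncation level K, penalty lam, step size and data are illustrative assumptions.
```python
# Hedged sketch: sparse additive logistic classification via proximal gradient
# descent with a group-lasso penalty, one group per covariate's cosine block.
# K (truncation level), lam, step and the data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 300, 10, 8                        # samples, covariates, terms per covariate
X = rng.uniform(0, 1, (n, d))
y = (np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, n) > 0).astype(float)

def cosine_features(x, K):
    # first K terms of the orthonormal cosine system on [0, 1]
    return np.column_stack([np.sqrt(2) * np.cos(np.pi * k * x) for k in range(1, K + 1)])

Phi = np.column_stack([cosine_features(X[:, j], K) for j in range(d)])  # n x (d*K)
beta, lam, step = np.zeros(d * K), 0.3, 0.1

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-Phi @ beta))   # fitted class-1 probabilities
    beta -= step * Phi.T @ (p - y) / n      # gradient step on the logistic loss
    for j in range(d):                      # prox step: groupwise soft-thresholding
        g = slice(j * K, (j + 1) * K)
        nrm = np.linalg.norm(beta[g])
        beta[g] *= 0.0 if nrm == 0 else max(0.0, 1.0 - step * lam / nrm)

print("covariates with nonzero blocks:",
      [j for j in range(d) if np.linalg.norm(beta[j * K:(j + 1) * K]) > 0])
```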

Published Research Papers:

Levy, T. and Abramovich, F. (2023). Generalization error bounds for multiclass sparse linear classifiers
Journal of Machine Learning Research 24 (151), pp. 1-35.
Abstract: We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
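For the global (entrywise) sparsity case, the flavor of such a classifier can be conveyed with scikit-learn's $l_1$-penalized multinomial logistic regression; the dataset and the inverse penalty level C below are placeholders, and the double row-wise and low-rank penalties studied in the paper would require dedicated solvers.
```python
# Hedged sketch: multinomial logistic regression with an entrywise l1 penalty
# (the "global sparsity" case); C is the inverse penalty level, chosen ad hoc.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)
print("nonzero coefficients:", (clf.coef_ != 0).sum(), "of", clf.coef_.size)
```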
Abramovich, F., Grinshtein, V. and Levy, T. (2021). Multiclass classification by sparse multinomial logistic regression
IEEE Transactions on Information Theory 67, pp. 4637-4646.
Abstract: In this paper we consider high-dimensional multiclass classification by sparse multinomial logistic regression. We first propose a feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive nonasymptotic bounds for the misclassification excess risk of the resulting classifier. We also establish their tightness by deriving the corresponding minimax lower bounds. In particular, we show that there is a phase transition between a small and a large number of classes. The bounds can be reduced under an additional low-noise condition. Finding a penalized maximum likelihood solution with a complexity penalty, however, requires a combinatorial search over all possible models. To design a feature selection procedure computationally feasible for high-dimensional data, we propose multinomial logistic group Lasso and Slope classifiers and show that they also achieve the minimax order.
Abramovich, F. and Pensky, M. (2019). Classification with many classes: challenges and pluses
Journal of Multivariate Analysis 174, 104536.
Abstract: The objective of the paper is to study the accuracy of multiclass classification in a high-dimensional setting, where the number of classes is also large (the “large $L$, large $p$, small $n$” model). While this problem arises in many practical applications and many techniques have been developed recently for its solution, to the best of our knowledge no rigorous theoretical analysis of this important setup has been provided. The purpose of the present paper is to fill this gap.

We consider one of the most common settings, classification of high-dimensional normal vectors where, unlike standard assumptions, the number of classes could be large. We derive non-asymptotic conditions on the effects of significant features, and the lower and upper bounds on the distances between classes required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup where the number of classes grows with the dimension of the feature space while the number of samples per class is possibly limited. We discover an interesting and, at first glance, somewhat counter-intuitive phenomenon: a large number of classes may be a “blessing” rather than a “curse”, since in certain settings the precision of classification can improve as the number of classes grows. This is due to more accurate feature selection: even weaker significant features, which are not sufficiently strong to be manifested in a coarse classification, can nevertheless have a strong impact when the number of classes is large. We supplement our theoretical investigation by a simulation study and a real-data example where we again observe the above phenomenon.

Hochman, A., Saaroni, H., Abramovich, F. and Alpert, P. (2019). Artificial detection of lower frequency periodicity in climatic studies by wavelet analysis demonstrated on synthetic time series
Journal of Applied Meteorology and Climatology 58, pp. 2077-2086.
Abstract: The Continuous Wavelet Transform (CWT) is a frequently used tool to study periodicity in climate and other time series. Periodicity plays a significant role in climate reconstruction and prediction. In numerous studies, the use of CWT revealed Dominant Periodicity (DP) in climatic time series. Several studies suggested that these "natural oscillations" would even reverse global warming. It is shown here that the results of wavelet analysis for detecting DPs can be misinterpreted in the presence of local singularities that are manifested at lower frequencies, which may lead to false DP detection. In CWT analysis of synthetic and real-data climatic time series with local singularities, the CWT indicates a low-frequency DP even if there is no true periodicity in the time series. It is argued that this is an inherent general property of the CWT. Hence, applying the CWT to climatic time series should be re-evaluated, and a more careful analysis of the entire wavelet power spectrum is required, focusing on high frequencies as well. Thus, a cone-like shape in the wavelet power spectrum most likely indicates the presence of a local singularity in the time series rather than a DP, even if the local singularity has an observational or a physical basis. It is shown that analyzing the derivatives of the time series may be helpful in interpreting the wavelet power spectrum. Nevertheless, this is only a partial remedy that does not completely neutralize the effects caused by the presence of local singularities.
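The phenomenon is easy to reproduce. The sketch below applies a Morlet CWT to a synthetic white-noise series with a single level shift and no true periodicity; the mean wavelet power nevertheless peaks at large scales (low frequencies). The wavelet choice and scale grid are illustrative assumptions.
```python
# Hedged sketch: a CWT of a non-periodic series with one level shift shows
# large low-frequency power (a cone-like pattern), mimicking a spurious DP.
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = rng.normal(size=512)
x[256:] += 3.0                              # local singularity: a level shift

scales = np.arange(1, 128)
coef, freqs = pywt.cwt(x, scales, "morl")   # Morlet continuous wavelet transform
power = coef ** 2
print("scale with maximal mean power:", scales[power.mean(axis=1).argmax()])
```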
Abramovich, F. and Grinshtein, V. (2019). High-dimensional classification by sparse logistic regression
IEEE Transactions on Information Theory 65, pp. 3068-3079.
Abstract: We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic bounds for the resulting misclassification excess risk. The bounds can be reduced under the additional low-noise condition. The proposed complexity penalty is remarkably related to the VC-dimension of a set of sparse linear classifiers. Implementation of any complexity penalty-based criterion, however, requires a combinatorial search over all possible models. To find a model selection procedure computationally feasible for high-dimensional data, we extend the Slope estimator for logistic regression and show that under an additional weighted restricted eigenvalue condition it is rate-optimal in the minimax sense.
Abramovich, F., De Canditiis, D. and Pensky, M. (2018). Solution of linear ill-posed problems by model selection and aggregation
Electronic Journal of Statistics 12, pp. 1822-1841.
Abstract: We consider a general statistical linear inverse problem, where the solution is represented via a known (possibly overcomplete) dictionary that allows its sparse representation. We propose two different approaches. A model selection estimator selects a single model by minimizing the penalized empirical risk over all possible models. In contrast to direct problems, the penalty depends on the model itself rather than only on its size, as is the case for complexity penalties. A Q-aggregate estimator averages over the entire collection of estimators with properly chosen weights. Under mild conditions on the dictionary, we establish oracle inequalities both with high probability and in expectation for the two estimators. Moreover, for the latter estimator these inequalities are sharp. The proposed procedures are implemented numerically and their performance is assessed by a simulation study.
Abramovich, F. and Grinshtein, V. (2016). Model selection and minimax estimation in generalized linear models
IEEE Transactions on Information Theory 62, pp. 3721-3730.
Abstract: We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a general nonasymptotic upper bound for the expected Kullback-Leibler divergence between the true distribution of the data and that generated by a selected model, and establish the corresponding minimax lower bounds for sparse GLM. For a properly chosen (nonlinear) penalty, the resulting penalized maximum likelihood estimator is shown to be asymptotically minimax and adaptive to the unknown sparsity. We also discuss possible extensions of the proposed approach to model selection in GLM under additional structural constraints and aggregation.
Abramovich, F. and Lahav, T. (2015). Sparse additive regression on a regular lattice
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77, pp. 443-459.
Abstract: We consider estimation in a sparse additive regression model with the design points on a regular lattice. We establish the minimax convergence rates over Sobolev classes and propose a Fourier-based rate-optimal estimator which is adaptive to the unknown sparsity and smoothness of the response function. The estimator is derived within a Bayesian formalism but can be naturally viewed as a penalized maximum likelihood estimator with complexity penalties on the number of non-zero univariate additive components of the response and on the numbers of the non-zero coefficients of their Fourier expansions. We compare it with several existing counterparts and perform a short simulation study to demonstrate its performance.
Abramovich, F. and Grinshtein, V. (2013). Estimation of a sparse group of sparse vectors
Biometrika 100, pp. 355-370.
Abstract: We consider estimating a sparse group of sparse normal mean vectors, based on penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and the numbers of their significant components, which can be performed by a fast algorithm. The resulting estimators are developed within a Bayesian framework and can be viewed as maximum a posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense settings. A simulation study demonstrates the efficiency of the proposed approach, which successfully competes with the sparse group lasso estimator.
Abramovich, F., Pensky, M. and Rozenholc, Y. (2013). Laplace deconvolution with noisy observations
Electronic Journal of Statistics 7, pp. 1094-1128.
Abstract: In the present paper we consider the Laplace deconvolution problem for discrete noisy data observed on an interval whose length $T_n$ may increase with the sample size. Although this problem arises in a variety of applications, to the best of our knowledge it has been given very little attention by the statistical community. Our objective is to fill this gap and provide a statistical analysis of the Laplace deconvolution problem with noisy discrete data. The main contribution of the paper is an explicit construction of an asymptotically rate-optimal (in the minimax sense) Laplace deconvolution estimator which is adaptive to the regularity of the unknown function. We show that the original Laplace deconvolution problem can be reduced to nonparametric estimation of a regression function and its derivatives on an interval of growing length $T_n$. Whereas the forms of the estimators remain standard, the choices of the parameters and the minimax convergence rates, which are expressed in terms of $T^2_n/n$ in this case, are affected by the asymptotic growth of the length of the interval. We derive an adaptive kernel estimator of the function of interest, and establish its asymptotic minimaxity over a range of Sobolev classes. We illustrate the theory by examples of construction of explicit expressions of Laplace deconvolution estimators. A simulation study shows that, in addition to providing asymptotic optimality as the number of observations tends to infinity, the proposed estimator demonstrates good performance in finite sample examples.
Abramovich, F. and Grinshtein, V. (2013). Model selection in regression under structural constraints
Electronic Journal of Statistics 7, pp. 480-498.
Abstract: The paper considers model selection in regression under additional structural constraints on admissible models, where the number of potential predictors might be even larger than the available sample size. We develop a Bayesian formalism which is used as a natural tool for generating a wide class of model selection criteria based on penalized least squares estimation with various complexity penalties associated with a prior on the model size. The resulting criteria are adaptive to structural constraints. We establish the upper bound for the quadratic risk of the resulting MAP estimator and the corresponding lower bound for the minimax risk over a set of admissible models of a given size. We then specify the class of priors (and, therefore, the class of complexity penalties) for which, for a “nearly-orthogonal” design, the MAP estimator is asymptotically at least nearly-minimax (up to a log-factor) simultaneously over an entire range of sparse and dense setups. Moreover, when the numbers of admissible models are “small” (e.g., ordered variable selection) or, conversely, for the case of complete variable selection, the proposed estimator achieves the exact minimax rates.
Abramovich, F. and Grinshtein, V. (2011). Model selection in Gaussian regression for high-dimensional data
In Inverse Problems and High-Dimensional Estimation (Eds. Alquier, P., Gautier, E. and Stoltz, G.), Lecture Notes in Statistics 203, Springer Berlin Heidelberg, pp. 159-170.
Abstract: We consider model selection in Gaussian regression, where the number of predictors might be even larger than the number of observations. The proposed procedure is based on penalized least squares criteria with a complexity penalty on the model size. We discuss asymptotic properties of the resulting estimators corresponding to linear and so-called $2k \ln(p/k)$-type nonlinear penalties for nearly-orthogonal and multicollinear designs. We show that no linear penalty can be simultaneously adaptive to both sparse and dense setups, while $2k \ln(p/k)$-type penalties achieve the wide adaptivity range. We also present a Bayesian perspective on the procedure that provides additional insight and can be used as a tool for obtaining a wide class of penalized estimators associated with various complexity penalties.
Abramovich, F. and Grinshtein, V. (2010). MAP model selection in Gaussian regression
Electronic Journal of Statistics 4, pp. 932-949.
Abstract: We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting model selector. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for “nearly-orthogonal” and “multicollinear” designs.
Abramovich, F., Grinshtein, V., Petsa, A. and Sapatinas, T. (2010). On Bayesian testimation and its application to wavelet thresholding
Biometrika 97, pp. 181-198.
Abstract: We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse $l_p$-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense $l_p$-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts.
Abramovich, F., De Feis, I. and Sapatinas, T. (2009). Optimal testing for additivity in multiple nonparametric regression
Annals of the Institute of Statistical Mathematics 61, pp. 691-714.
Abstract: We consider the problem of testing for additivity in the standard multiple nonparametric regression model. We derive optimal (in the minimax sense) nonadaptive and adaptive hypothesis testing procedures for additivity against the composite nonparametric alternative that the response function involves interactions of second or higher orders separated away from zero in $L^2([0, 1]^d)$-norm and also possesses some smoothness properties. In order to shed some light on the theoretical results obtained, we carry out a wide simulation study to examine the finite sample performance of the proposed hypothesis testing procedures and compare them with a series of other tests for additivity available in the literature.
Abramovich, F., Antoniadis, A. and Pensky, M. (2007). Estimation of piecewise-smooth functions by amalgamated bridge regression splines
Sankhyā: The Indian Journal of Statistics 69, pp. 1-27.
Abstract: We consider nonparametric estimation of a one-dimensional piecewise-smooth function observed with white Gaussian noise on an interval. We propose a two-step estimation procedure, where one first detects jump points by a wavelet-based procedure and then estimates the function on each smooth segment separately by bridge regression splines. We prove the asymptotic optimality (in the minimax sense) of the resulting amalgamated bridge regression spline estimator and demonstrate its efficiency on simulated and real data examples.
Abramovich, F., Grinshtein, V. and Pensky, M. (2007). On optimality of Bayesian testimation in the normal means problem
The Annals of Statistics 35, pp. 2261-2286.
Abstract: We consider a problem of recovering a high-dimensional vector $\mu$ observed in white noise, where the unknown vector $\mu$ is assumed to be sparse. The objective of the paper is to develop a Bayesian formalism which gives rise to a family of $l_0$-type penalties. The penalties are associated with various choices of the prior distributions $\pi_n(\cdot)$ on the number of nonzero entries of $\mu$ and, hence, are easy to interpret. The resulting Bayesian estimators lead to a general thresholding rule which accommodates many of the known thresholding and model selection procedures as particular cases corresponding to specific choices of $\pi_n(\cdot)$. Furthermore, they achieve optimality in a rather general setting under very mild conditions on the prior. We also specify the class of priors $\pi_n(\cdot)$ for which the resulting estimator is adaptively optimal (in the minimax sense) for a wide range of sparse sequences and consider several examples of such priors.
Abramovich, F., Angelini, C. and De Canditiis, D. (2006). Pointwise optimality of Bayesian wavelet estimators
Annals of the Institute of Statistical Mathematics 59, pp. 425-434.
Abstract: We consider pointwise mean squared errors of several known Bayesian wavelet estimators, namely, the posterior mean, the posterior median and the Bayes Factor, where the prior imposed on wavelet coefficients is a mixture of a point mass at zero and a Gaussian density. We show that for properly chosen hyperparameters of the prior, all three estimators are (up to a log-factor) asymptotically minimax within any prescribed Besov ball $B^s_{p,q}(M)$. We discuss the Bayesian paradox and compare the results for the pointwise squared risk with those for the global mean squared error.
Abramovich, F. and Angelini, C. (2006). Bayesian maximum a posteriori multiple testing procedure
Sankhyā: The Indian Journal of Statistics 68, pp. 436-460.
Abstract: We consider a Bayesian approach to multiple hypothesis testing. A hierarchical prior model is based on imposing a prior distribution $\pi(k)$ on the number of hypotheses arising from alternatives (false nulls). We then apply the maximum a posteriori (MAP) rule to find the most likely configuration of null and alternative hypotheses. The resulting MAP procedure and its closely related step-up and step-down versions compare ordered Bayes factors of individual hypotheses with a sequence of critical values depending on the prior. We discuss the relations between the proposed MAP procedure and the existing frequentist and Bayesian counterparts. A more detailed analysis is given for the normal data, where we show, in particular, that by choosing a specific $\pi(k)$, the MAP procedure can mimic several known familywise error (FWE) and false discovery rate (FDR) controlling procedures. The performance of MAP procedures is illustrated on a simulated example.
Abramovich, F. and Angelini, C. (2006). Testing in mixed-effects FANOVA models
Journal of Statistical Planning and Inference 136, pp. 4326-4348.
Abstract: We consider the testing problem in mixed-effects functional analysis of variance models. We develop asymptotically optimal (minimax) testing procedures for testing the significance of the functional global trend and the functional fixed effects based on the empirical wavelet coefficients of the data. Wavelet decompositions allow one to characterize various types of assumed smoothness conditions on the response function under the nonparametric alternatives. The distribution of the functional random-effects component is defined in the wavelet domain and captures the sparseness of wavelet representation for a wide variety of functions. The simulation study presented in the paper demonstrates the finite sample properties of the proposed testing procedures. We also apply them to real data from physiological experiments.
Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2006). Adapting to unknown sparsity by controlling the false discovery rate
The Annals of Statistics 34, pp. 584-653.
Abstract: We attempt to recover an $n$-dimensional vector observed in white noise, where $n$ is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the $l_p$ norm for $p$ small. We obtain a procedure which is asymptotically minimax for $l_r$ loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter $q_n$ also plays a determining role in asymptotic minimaxity. If $q=\lim q_n \in [0,1/2]$ and also $q_n>\gamma/\log(n)$, we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, $q=\lim q_n \in (1/2,1]$ forces the risk to exceed the minimax risk by a factor growing with $q$. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form $2\log(\text{potential model size}/\text{actual model size})$. We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.
Abramovich, F. and Benjamini, Y. (2005). False Discovery Rate
In Encyclopedia of Statistical Sciences (Eds. Kotz, S., Read, C. B., Balakrishnan, N., Vidakovic, B. and Johnson, N. L.), 4, John Wiley & Sons, Inc., pp. 2240-2243.
Abstract: For the multiple hypothesis testing problem, consider the proportion of falsely rejected hypotheses (false discoveries) among the total number of rejections. The expected value of this proportion, called the False Discovery Rate (FDR), is a useful criterion to control as an alternative to the traditional familywise error rate (FWE), which suffers from low power when the number of tested hypotheses is large. In a way, controlling the FDR is adaptively in between ignoring multiplicity altogether and the conservative control of the FWE. Several FDR controlling procedures are presented and others are reviewed. Various extensions and applications of the FDR are discussed.
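For reference, the Benjamini-Hochberg step-up procedure, the basic FDR-controlling procedure at level q, takes only a few lines; the p-values and the level in this sketch are placeholders.
```python
# Hedged sketch: the Benjamini-Hochberg step-up procedure at FDR level q.
import numpy as np

def bh_reject(pvals, q=0.05):
    p = np.sort(np.asarray(pvals))
    m = len(p)
    below = np.nonzero(p <= q * np.arange(1, m + 1) / m)[0]
    if below.size == 0:
        return np.zeros(m, dtype=bool)      # no rejections
    thresh = p[below.max()]                 # largest p(k) with p(k) <= q*k/m
    return np.asarray(pvals) <= thresh

print(bh_reject([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))
```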
Abramovich, F. and Heller, R. (2005). Local functional hypothesis testing
Mathematical Methods of Statistics 14, pp. 253.
Abstract: We consider a standard “signal + white noise” model on the unit interval and want to test whether the signal is present on a subinterval $\Omega_\Delta \subseteq [0,1]$ of length $\Delta$. The composite alternative is that the unknown signal $f$ is separated away from zero in terms of its average power $\gamma(f)=\|f\|^2_\Delta/\Delta$ on $\Omega_\Delta$ and also possesses some regularity properties. We evaluate the asymptotically optimal (minimax) rates for testing the presence of a signal on $\Omega_\Delta$, where both the noise level and the interval length tend to zero. We derive the corresponding rate-optimal tests for local signal detection.
Abramovich, F., Antoniadis, A., Sapatinas, T. and Vidakovic, B. (2004). Optimal testing in a fixed-effects functional analysis of variance model
International Journal of Wavelets, Multiresolution and Information Processing 2, pp. 323-349.
Abstract: We consider the testing problem in a fixed-effects functional analysis of variance model. We test the null hypotheses that the functional main effects and the functional interactions are zero against the composite nonparametric alternative hypotheses that they are separated away from zero in $L^2$-norm and also possess some smoothness properties. We adapt the optimal (minimax) hypothesis testing procedures for testing a zero signal in a Gaussian "signal plus noise" model to derive optimal (minimax) non-adaptive and adaptive hypothesis testing procedures for the functional main effects and the functional interactions. The corresponding tests are based on the empirical wavelet coefficients of the data. Wavelet decompositions allow one to characterize different types of smoothness conditions assumed on the response function by means of its wavelet coefficients for a wide range of function classes. In order to shed some light on the theoretical results obtained, we carry out a simulation study to examine the finite sample performance of the proposed functional hypothesis testing procedures. As an illustration, we also apply these tests to a real-life data example arising from physiology. Concluding remarks and hints for possible extensions of the proposed methodology are also given.
Abramovich, F., Amato, U. and Angelini, C. (2004). On Optimality of Bayesian Wavelet Estimators
Scandinavian Journal of Statistics 31, pp. 217-234.
Abstract: We investigate the asymptotic optimality of several Bayesian wavelet estimators, namely, the posterior mean, the posterior median and the Bayes Factor, where the prior imposed on wavelet coefficients is a mixture of a point mass at zero and a Gaussian density. We show that in terms of the mean squared error, for properly chosen hyperparameters of the prior, all three resulting Bayesian wavelet estimators achieve optimal minimax rates within any prescribed Besov space $B^s_{p,q}$ for $p \ge 2$. For $1 \le p \le 2$, the Bayes Factor is still optimal for $(2s+2)/(2s+1) \le p \le 2$ and always outperforms the posterior mean and the posterior median, which can achieve only the best possible rates for linear estimators in this case.
Abramovich, F. (2004). Discussion on the meeting on 'Statistical approaches to inverse problems'
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66, pp. 627-652.
Abramovich, F., Besbeas, P. and Sapatinas, T. (2002). Empirical Bayes approach to block wavelet function estimation
Computational Statistics & Data Analysis 39, pp. 435-451.
Abstract: Wavelet methods have demonstrated considerable success in function estimation through term-by-term thresholding of the empirical wavelet coefficients. However, it has been shown that grouping the empirical wavelet coefficients into blocks and making simultaneous threshold decisions about all the coefficients in each block has a number of advantages over term-by-term wavelet thresholding, including asymptotic optimality and better mean squared error performance in finite sample situations. An empirical Bayes approach to incorporating information on neighbouring empirical wavelet coefficients into function estimation that results in block wavelet shrinkage and block wavelet thresholding estimators is considered. Simulated examples are used to illustrate the performance of the resulting estimators, and to compare these estimators with several existing non-Bayesian block wavelet thresholding estimators. It is observed that the proposed empirical Bayes block wavelet shrinkage and block wavelet thresholding estimators outperform the non-Bayesian block wavelet thresholding estimators in finite sample situations. An application to a data set that was collected in an anaesthesiological study is also presented.
Abramovich, F., Sapatinas, T. and Silverman, B.W. (2000). Stochastic expansions in an overcomplete wavelet dictionary
Probability Theory and Related Fields 117, pp. 133-144.
Abstract: We consider random functions defined in terms of members of an overcomplete wavelet dictionary. The function is modelled as a sum of wavelet components at arbitrary positions and scales where the locations of the wavelet components and the magnitudes of their coefficients are chosen with respect to a marked Poisson process model. The relationships between the parameters of the model and the parameters of those Besov spaces within which realizations will fall are investigated. The models allow functions with specified regularity properties to be generated. They can potentially be used as priors in a Bayesian approach to curve estimation, extending current standard wavelet methods to be free from the dyadic positions and scales of the basis functions.
Abramovich, F., Bailey, T. C. and Sapatinas, T. (2000). Wavelet analysis and its statistical applications
Journal of the Royal Statistical Society: Series D (The Statistician) 49, pp. 1-29.
Abstract: In recent years there has been a considerable development in the use of wavelet methods in statistics. As a result, we are now at the stage where it is reasonable to consider such methods to be another standard tool of the applied statistician rather than a research novelty. With that in mind, this paper gives a relatively accessible introduction to standard wavelet analysis and provides a review of some common uses of wavelet methods in statistical applications. It is primarily orientated towards the general statistical audience who may be involved in analysing data where the use of wavelets might be effective, rather than to researchers who are already familiar with the field. Given that objective, we do not emphasize mathematical generality or rigour in our exposition of wavelets and we restrict our discussion to the more frequently employed wavelet methods in statistics. We provide extensive references where the ideas and concepts discussed can be followed up in greater detail and generality if required. The paper first establishes some necessary basic mathematical background and terminology relating to wavelets. It then reviews the more well-established applications of wavelets in statistics including their use in nonparametric regression, density estimation, inverse problems, changepoint problems and in some specialized aspects of time series analysis. Possible extensions to the uses of wavelets in statistics are then considered. The paper concludes with a brief reference to readily available software packages for wavelet analysis.
Abramovich, F. and Grinshtein, V. (1999). Derivation of equivalent kernel for general spline smoothing: a systematic approach
Bernoulli 5, pp. 359-379.
Abstract: We first consider spline smoothing nonparametric estimation with a variable smoothing parameter and an arbitrary design density function, and show that the corresponding equivalent kernel can be approximated by the Green's function of a certain linear differential operator. Furthermore, we propose to use the standard (in applied mathematics and engineering) method for the asymptotic solution of linear differential equations, known as the Wentzel-Kramers-Brillouin method, for the systematic derivation of an asymptotically equivalent kernel in this general case. The corresponding results for polynomial splines are a special case of the general solution. Then, we show how these ideas can be directly extended to the very general L-spline smoothing.
Abramovich, F. and Sapatinas, T. (1999). Bayesian approach to wavelet decomposition and shrinkage
In Bayesian inference in wavelet-based models (Eds. Muller, P. and Vidakovic, B.), Lecture Notes in Statistics 141, Springer New York, pp. 33-50.
Abstract: We consider a Bayesian approach to wavelet decomposition. We show how prior knowledge about a function's regularity can be incorporated into a prior model for its wavelet coefficients by establishing a relationship between the hyperparameters of the proposed model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relation may be seen as giving insight into the meaning of the Besov space parameters themselves. Furthermore, we consider Bayesian wavelet-based function estimation that gives rise to different types of wavelet shrinkage in non-parametric regression. Finally, we discuss an extension of the proposed Bayesian model by considering random functions generated by an overcomplete wavelet dictionary.
Abramovich, F., Sapatinas, T. and Silverman, B.W. (1998). Wavelet thresholding via a Bayesian approach
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, pp. 725-749.
Abstract: We discuss a Bayesian formalism which gives rise to a type of wavelet threshold estimation in nonparametric regression. A prior distribution is imposed on the wavelet coefficients of the unknown response function, designed to capture the sparseness of wavelet expansion that is common to most applications. For the prior specified, the posterior median yields a thresholding procedure. Our prior model for the underlying function can be adjusted to give functions falling in any specific Besov space. We establish a relationship between the hyperparameters of the prior model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relationship gives insight into the meaning of the Besov space parameters. Moreover, the relationship established makes it possible in principle to incorporate prior knowledge about the function's regularity properties into the prior model for its wavelet coefficients. However, prior knowledge about a function's regularity properties might be difficult to elicit; with this in mind, we propose a standard choice of prior hyperparameters that works well in our examples. Several simulated examples are used to illustrate our method, and comparisons are made with other thresholding methods. We also present an application to a data set that was collected in an anaesthesiological study.
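For a single wavelet coefficient observed as d ~ N(theta, sigma^2), the posterior median under a mixture prior pi N(0, tau^2) + (1 - pi) delta_0 has a closed form and acts as a thresholding rule. The sketch below computes it directly from the two-component posterior; the hyperparameter values are illustrative assumptions, not the paper's recommended choices.
```python
# Hedged sketch: posterior median of theta given d ~ N(theta, sigma^2) under
# the prior  pi * N(0, tau^2) + (1 - pi) * delta_0; it acts as a threshold.
import numpy as np
from scipy.stats import norm

def posterior_median(d, sigma=1.0, tau=2.0, pi=0.5):
    b = tau**2 / (sigma**2 + tau**2)        # shrinkage factor
    # posterior probability that theta != 0 (ratio of marginal densities)
    num = pi * norm.pdf(d, 0, np.sqrt(sigma**2 + tau**2))
    w = num / (num + (1 - pi) * norm.pdf(d, 0, sigma))
    if w <= 0.5:
        return 0.0                          # the atom at zero carries the median
    m, s = b * d, np.sqrt(b) * sigma        # nonzero posterior component
    med = m + s * norm.ppf(1 - 1 / (2 * w))
    # keep zero if the posterior cdf at 0 still exceeds 1/2
    return med if np.sign(med) == np.sign(d) else 0.0

for d in [0.5, 1.5, 3.0, 6.0]:
    print(d, round(posterior_median(d), 3))  # small inputs are set to zero
```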
Abramovich, F. and Silverman, B.W. (1998). Wavelet decomposition approaches to statistical inverse problems
Biometrika 85, pp. 115-129.
Abstract: A wide variety of scientific settings involve indirect noisy measurements where one faces a linear inverse problem in the presence of noise. Primary interest is in some function $f(t)$, but data are accessible only on some linear transform of it, corrupted by noise. The usual linear methods for such inverse problems do not perform satisfactorily when $f(t)$ is spatially inhomogeneous. One existing nonlinear alternative is the wavelet-vaguelette decomposition method, based on the expansion of the unknown $f(t)$ in wavelet series. In the vaguelette-wavelet decomposition method proposed here, the observed data are expanded directly in wavelet series. The performances of various methods are compared through exact risk calculations, in the context of the estimation of the derivative of a function observed subject to noise. A result is proved demonstrating that, with a suitable universal threshold somewhat larger than that used for standard denoising problems, both wavelet-based approaches have an ideal spatial adaptivity property.
Abramovich, F. and Bayvel, P. (1997). Some statistical remarks on the derivation of BER in amplified optical communication systems
IEEE Transactions on Communications 45, pp. 1032-1034.
Abstract: We consider the signal detection problem in amplified optical transmission systems as a statistical hypothesis testing procedure, and we show that the detected signal has a well-known chi-squared distribution. In particular, this approach considerably simplifies the derivation of bit-error rate (BER). Finally, we discuss the accuracy of the Gaussian approximations to the exact distributions of the signal.
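In a stylized version of this setup, the decision statistic for a "0" bit is central chi-squared and for a "1" bit noncentral chi-squared with the same number of degrees of freedom, and the BER follows from the two tails. The sketch below evaluates this over a grid of decision thresholds; the degrees of freedom, noncentrality and thresholds are placeholder values, not the paper's system parameters.
```python
# Hedged sketch: BER for threshold detection when the decision statistic is
# chi-squared: a stylized model with 2M degrees of freedom, central for "0"
# bits and noncentral (noncentrality nc) for "1" bits; parameters are
# placeholders, not the paper's exact optical-system model.
import numpy as np
from scipy.stats import chi2, ncx2

M, nc = 2, 40.0                             # modes and noncentrality (illustrative)
df = 2 * M
ts = np.linspace(1, 60, 600)                # candidate decision thresholds
ber = 0.5 * (chi2.sf(ts, df) + ncx2.cdf(ts, df, nc))
print("min BER %.2e at threshold %.1f" % (ber.min(), ts[ber.argmin()]))
```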
Abramovich, F. and Steinberg, D. (1996). Improved inference in nonparametric regression using L_k-smoothing splines
Journal of Statistical Planning and Inference 49, pp. 327-341.
Abstract: Smoothing splines are one of the most popular approaches to nonparametric regression. Wahba (J. Roy. Statist. Soc. Ser. B 40 (1978) 364–372; 45 (1983) 133–150) showed that smoothing splines are also Bayes estimates and used the corresponding prior model to derive interval estimates for the regression function. Although the interval estimates work well on a global basis, they can have poor local properties. The source of this problem is the use of a global smoothing parameter. We introduce the notion of $L_k$-smoothing splines. These splines allow for a variable smoothing parameter and can substantially improve local inference.
Abramovich, F. and Benjamini, Y. (1996). Adaptive thresholding of wavelet coefficients
Computational Statistics & Data Analysis 22, pp. 351-361.
Abstract: Wavelet techniques have become an attractive and efficient tool in function estimation. Given noisy data, its discrete wavelet transform is an estimator of the wavelet coefficients. It has been shown by Donoho and Johnstone (Biometrika 81 (1994) 425–455) that thresholding the estimated coefficients and then reconstructing an estimated function reduces the expected risk close to the possible minimum. They offered a global threshold $\lambda \sim \sigma \sqrt{2\log{n}}$ for $j > j_0$, while the coefficients of the first coarse $j_0$ levels are always included. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators. Then, we use the connection between thresholding and hypothesis testing to construct a thresholding procedure based on the false discovery rate (FDR) approach to multiple testing of Benjamini and Hochberg (J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300). The suggested procedure controls the expected proportion of incorrectly included coefficients among those chosen for the wavelet reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of the estimated function and to the noise level. Finally, comparing the proposed FDR-based procedure with the fixed global threshold by evaluating the relative mean squared error across the various test functions and noise levels, we find the FDR estimator to enjoy robust MSE-efficiency.
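A rough sketch of this type of FDR-based thresholding: compute two-sided p-values for the empirical detail coefficients, run a Benjamini-Hochberg step at level q, and keep only the coefficients above the implied threshold. The wavelet, the level q and the MAD noise estimate below are standard but illustrative choices.
```python
# Hedged sketch: FDR-based wavelet thresholding. Empirical detail coefficients
# are tested against N(0, sigma^2); those surviving a Benjamini-Hochberg step
# at level q are kept. The wavelet, q and the MAD noise estimate are standard
# but illustrative choices.
import numpy as np
import pywt
from scipy.stats import norm

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
x = np.where(t > 0.5, 2.0, 0.0) + rng.normal(0, 0.4, t.size)   # noisy step

coeffs = pywt.wavedec(x, "db4", level=5)           # [cA5, cD5, ..., cD1]
detail = np.concatenate(coeffs[1:])
sigma = np.median(np.abs(coeffs[-1])) / 0.6745     # MAD estimate, finest level
pvals = np.sort(2 * norm.sf(np.abs(detail) / sigma))

q = 0.05
m = pvals.size
below = np.nonzero(pvals <= q * np.arange(1, m + 1) / m)[0]
lam = sigma * norm.isf(pvals[below.max()] / 2) if below.size else np.inf
coeffs[1:] = [np.where(np.abs(c) >= lam, c, 0.0) for c in coeffs[1:]]
xhat = pywt.waverec(coeffs, "db4")                 # de-noised reconstruction
```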
Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure
In Wavelets and Statistics (Eds. Antoniadis, A. and Oppenheim, G.), Lecture Notes in Statistics 103, Springer--Verlag, pp. 5-14.
Abstract: Given a noisy signal, its finite discrete wavelet transform is an estimator of the signal's wavelet expansion coefficients. An appropriate thresholding of the coefficients for further reconstruction of the de-noised signal plays a key role in the wavelet decomposition/reconstruction procedure. [DJ1] proposed a global threshold $\lambda = \sigma \sqrt{2\log{n}}$ and showed that such a threshold asymptotically reduces the expected risk of the corresponding wavelet estimator close to the possible minimum. To apply their threshold to finite samples they suggested always keeping the coefficients of the first coarse $j_0$ levels. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators. Then, we consider the thresholding of wavelet coefficients as a multiple hypotheses testing problem and use the False Discovery Rate (FDR) approach to multiple testing of [BH1]. The suggested procedure controls the expected proportion of incorrectly kept coefficients among those chosen for the wavelet reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of the estimated function. Finally, comparing the proposed FDR threshold with the fixed global threshold of Donoho and Johnstone by evaluating the relative mean squared error across the various test functions and noise levels, we find the FDR estimator to enjoy robust MSE-efficiency.
Abramovich, F. (1993). The asymptotic mean squared error of L-smoothing splines
Statistics & Probability Letters 18, pp. 179-182.
Abstract: We establish the asymptotic equivalence between L-spline smoothing and kernel estimation. The equivalent kernel is used to derive the asymptotic mean squared error of the L-smoothing spline estimator. The paper extends the corresponding results for polynomial spline smoothing.
Abramovich, F. (1988). Some remarks on the robustness of LL-type fuzzy linear algebraic systems
BUSEFAL 36, pp. 98-105.
Abramovich, F., Wagenknecht, M. and Khurgin, Y.I. (1988). Solution of LR-type fuzzy systems of linear algebraic equations
BUSEFAL 35, pp. 86-99.
Abstract: Systems of linear algebraic equations with LR-type fuzzy coefficients are considered in this article. The notion of a solution of an LR-type fuzzy system is discussed. It is shown that in the general case an exact solution may not exist, so it is proposed to find an approximate one (a quasi-solution) instead. It appears that this problem may be reduced to an ordinary (non-fuzzy) nonlinear optimization problem. A numerical example illustrating the application of this method is provided.