Book: 
Abramovich, F. and Ritov, Y. (2013).
Statistical Theory: A Concise Introduction
Chapman & Hall/CRC
Summary: Designed for a one-semester advanced undergraduate or graduate course, Statistical Theory: A Concise Introduction clearly explains the underlying ideas and principles of major statistical concepts, including parameter estimation, confidence intervals, hypothesis testing, asymptotic analysis, Bayesian inference, and elements of decision theory. It introduces these topics on a clear intuitive level using illustrative examples in addition to the formal definitions, theorems, and proofs. Based on the authors' lecture notes, this student-oriented, self-contained book maintains a proper balance between the clarity and rigor of exposition. In a few cases, the authors present a "sketched" version of a proof, explaining its main ideas rather than giving detailed technical mathematical and probabilistic arguments. Chapters and sections marked by asterisks contain more advanced topics and may be omitted. A special chapter on linear models shows how the main theoretical concepts can be applied to the well-known and frequently used statistical tool of linear regression. The book requires no heavy calculus, and simple questions throughout the text help students check their understanding of the material. Each chapter also includes a set of exercises that range in level of difficulty.
Preprints: 
Hochman, A., Saaroni, H., Abramovich, F. and Alpert, P.
Can Wavelet Analysis detect low frequency periodicities in climatic time series?

Abstract: Wavelet Analysis (WA) is a powerful tool frequently used to study periodicity in climate Time Series (TS). Periodicity plays a significant role in climate reconstruction and in the prediction of future climate, both regional and global. In many studies the use of WA revealed a Dominant Low Frequency Periodicity (DLFP) in TS. Such periodicities have been recognized as significant components in predicting climate change and in recommending adaptation to it. These periodicities are analyzed here using a three-step approach: A) reviewing several climate papers which involved WA; B) performing WA on true climatic TS; and C) performing WA on 40 random synthetic TS of varying lengths. It is shown that, when using Morlet 6, which is the commonly used "mother" wavelet in climatic research, padding the TS with zeros, and using white noise as the background for significance checking, every application (100%) of WA, on either true climate TS or on synthetic random TS, demonstrated a DLFP among other periodicities. It is asserted that these claimed periodicities are a methodological artifact caused by misinterpretation of WA. In fact, wavelets were not originally designed to detect periodicities but to capture local abrupt changes of a signal.
Abramovich, F. and Pensky, M.
Classification with many classes: challenges and pluses

Abstract: The objective of the paper is to study the accuracy of multiclass classification
in a high-dimensional setting, where the number of classes is also large ("large $L$, large $p$, small $n$" model).
While this problem arises in many practical applications and many techniques have been recently developed for its solution,
to the best of our knowledge nobody has provided a rigorous theoretical analysis of this important setup. The purpose of the present paper is to
fill this gap.
We consider one of the most common settings, classification of high-dimensional normal vectors where, unlike standard assumptions, the number of classes could be large. We derive nonasymptotic conditions on effects of significant features, and the lower and upper bounds for distances between classes required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup where the number of classes is growing with the dimension of the feature space while the number of samples per class is possibly limited. We discover an interesting and, at first glance, somewhat counterintuitive phenomenon that a large number of classes may be a "blessing" rather than a "curse" since, in certain settings, the precision of classification can improve as the number of classes grows. This is due to more accurate feature selection since even weaker significant features, which are not sufficiently strong to be manifested in a coarse classification, can nevertheless have a strong impact when the number of classes is large. We supplement our theoretical investigation by a simulation study and a real data example where we again observe the above phenomenon.
Published Research Papers: 
Abramovich, F. and Grinshtein, V.
High-dimensional classification by sparse logistic regression
IEEE Transactions on Information Theory, to appear
Abstract: We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the nonasymptotic bounds for the resulting misclassification excess risk. The bounds can be reduced under the additional low-noise condition. The proposed complexity penalty is remarkably related to the VC-dimension of a set of sparse linear classifiers. Implementation of any complexity penalty-based criterion, however, requires a combinatorial search over all possible models. To find a model selection procedure computationally feasible for high-dimensional data, we extend the Slope estimator for logistic regression and show that under an additional weighted restricted eigenvalue condition it is rate-optimal in the minimax sense.
Abramovich, F., De Canditiis, D. and Pensky, M. (2018).
Solution of linear ill-posed problems by model selection and aggregation
Electronic Journal of Statistics 12, pp. 1822-1841.
Abstract: We consider a general statistical linear inverse problem, where the solution is represented via a known (possibly overcomplete) dictionary that allows its sparse representation. We propose two different approaches. A model selection estimator selects a single model by minimizing the penalized empirical risk over all possible models. By contrast with direct problems, the penalty depends on the model itself rather than on its size only as for complexity penalties. A Q-aggregate estimator averages over the entire collection of estimators with properly chosen weights. Under mild conditions on the dictionary, we establish oracle inequalities both with high probability and in expectation for the two estimators. Moreover, for the latter estimator these inequalities are sharp. The proposed procedures are implemented numerically and their performance is assessed by a simulation study.
Abramovich, F. and Grinshtein, V. (2016).
Model selection and minimax estimation in generalized linear models
IEEE Transactions on Information Theory 62, pp. 3721-3730.
Abstract: We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a general nonasymptotic upper bound for the expected Kullback-Leibler divergence between the true distribution of the data and that generated by a selected model, and establish the corresponding minimax lower bounds for sparse GLM. For the properly chosen (nonlinear) penalty, the resulting penalized maximum likelihood estimator is shown to be asymptotically minimax and adaptive to the unknown sparsity. We also discuss possible extensions of the proposed approach to model selection in GLM under additional structural constraints and aggregation.
Abramovich, F. and Lahav, T. (2015).
Sparse additive regression on a regular lattice
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77, pp. 443-459.
Abstract: We consider estimation in a sparse additive regression model with the design points on a regular lattice. We establish the minimax convergence rates over Sobolev classes and propose a Fourier-based rate optimal estimator which is adaptive to the unknown sparsity and smoothness of the response function. The estimator is derived within a Bayesian formalism but can be naturally viewed as a penalized maximum likelihood estimator with the complexity penalties on the number of nonzero univariate additive components of the response and on the numbers of the nonzero coefficients of their Fourier expansions. We compare it with several existing counterparts and perform a short simulation study to demonstrate its performance.
Abramovich, F. and Grinshtein, V. (2013).
Estimation of a sparse group of sparse vectors
Biometrika 100, pp. 355-370.
Abstract: We consider estimating a sparse group of sparse normal mean vectors, based on penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and the numbers of their significant components, which can be performed by a fast algorithm. The resulting estimators are developed within a Bayesian framework and can be viewed as maximum a posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense settings. A simulation study demonstrates the efficiency of the proposed approach, which successfully competes with the sparse group lasso estimator. 
Abramovich, F., Pensky, M. and Rozenholc, Y. (2013).
Laplace deconvolution with noisy observations
Electronic Journal of Statistics 7, pp. 1094-1128.
Abstract: In the present paper we consider the Laplace deconvolution problem for discrete noisy data observed on an interval whose length $T_n$ may increase with the sample size. Although this problem arises in a variety of applications, to the best of our knowledge, it has been given very little attention by the statistical community. Our objective is to fill the gap and provide statistical analysis of the Laplace deconvolution problem with noisy discrete data. The main contribution of the paper is an explicit construction of an asymptotically rate-optimal (in the minimax sense) Laplace deconvolution estimator which is adaptive to the regularity of the unknown function. We show that the original Laplace deconvolution problem can be reduced to nonparametric estimation of a regression function and its derivatives on the interval of growing length $T_n$. Whereas the forms of the estimators remain standard, the choices of the parameters and the minimax convergence rates, which are expressed in terms of $T^2_n/n$ in this case, are affected by the asymptotic growth of the length of the interval. We derive an adaptive kernel estimator of the function of interest, and establish its asymptotic minimaxity over a range of Sobolev classes. We illustrate the theory by examples of construction of explicit expressions of Laplace deconvolution estimators. A simulation study shows that, in addition to providing asymptotic optimality as the number of observations tends to infinity, the proposed estimator demonstrates good performance in finite sample examples.
Abramovich, F. and Grinshtein, V. (2013).
Model selection in regression under structural constraints
Electronic Journal of Statistics 7, pp. 480-498.
Abstract: The paper considers model selection in regression under the additional structural constraints on admissible models where the number of potential predictors might be even larger than the available sample size. We develop a Bayesian formalism which is used as a natural tool for generating a wide class of model selection criteria based on penalized least squares estimation with various complexity penalties associated with a prior on a model size. The resulting criteria are adaptive to structural constraints. We establish the upper bound for the quadratic risk of the resulting MAP estimator and the corresponding lower bound for the minimax risk over a set of admissible models of a given size. We then specify the class of priors (and, therefore, the class of complexity penalties) where for the “nearly-orthogonal” design the MAP estimator is asymptotically at least nearly-minimax (up to a log-factor) simultaneously over an entire range of sparse and dense setups. Moreover, when the numbers of admissible models are “small” (e.g., ordered variable selection) or, at the opposite extreme, for the case of complete variable selection, the proposed estimator achieves the exact minimax rates.
Abramovich, F. and Grinshtein, V. (2011).
Model selection in Gaussian regression for high-dimensional data
In Inverse Problems and High-Dimensional Estimation (Eds. Alquier, P., Gautier, E. and Stoltz, G.), Lecture Notes in Statistics 203, Springer Berlin Heidelberg, pp. 159-170.
Abstract: We consider model selection in Gaussian regression, where the number of predictors might be even larger than the number of observations. The proposed procedure is based on penalized least squares criteria with a complexity penalty on a model size. We discuss asymptotic properties of the resulting estimators corresponding to linear and so-called $2k \ln(p/k)$-type nonlinear penalties for nearly-orthogonal and multicollinear designs. We show that any linear penalty cannot be simultaneously adapted to both sparse and dense setups, while $2k \ln(p/k)$-type penalties achieve the wide adaptivity range. We also present a Bayesian perspective on the procedure that provides an additional insight and can be used as a tool for obtaining a wide class of penalized estimators associated with various complexity penalties.
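Illustration (not taken from the paper): the following is a minimal sketch of penalized least squares with a $2k \ln(p/k)$-type complexity penalty. It assumes the simplest special case of an orthonormal design, where the data reduce to a vector of $p$ noisy coefficients and the best model of size $k$ keeps the $k$ largest coefficients in absolute value. The specific penalty $2\sigma^2 k(\ln(p/k)+1)$ is one hypothetical member of the family, chosen here so that the penalty does not vanish at $k=p$. The function names are illustrative only.

```python
import math

def select_model_size(y, sigma=1.0):
    """Return the k minimizing RSS(k) + penalty(k) for an orthonormal design.

    RSS(k) is the sum of squares of the p - k smallest |y_i|; the penalty
    2 * sigma^2 * k * (ln(p/k) + 1) is an assumed 2k ln(p/k)-type choice.
    """
    p = len(y)
    squares = sorted((v * v for v in y), reverse=True)
    total = sum(squares)
    best_k, best_cost = 0, total  # k = 0: empty model, zero penalty
    kept = 0.0
    for k in range(1, p + 1):
        kept += squares[k - 1]
        rss = total - kept
        penalty = 2.0 * sigma ** 2 * k * (math.log(p / k) + 1.0)
        cost = rss + penalty
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

def penalized_estimate(y, sigma=1.0):
    """Keep the selected number of largest |y_i| and set the rest to zero."""
    k = select_model_size(y, sigma)
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(y)]

if __name__ == "__main__":
    y = [10.0, -8.0, 0.1, -0.2, 0.3, 0.05, 0.0, -0.1]
    print(select_model_size(y), penalized_estimate(y))
```

In this special case the criterion acts as a data-driven hard thresholding rule. For a general design the search over models is combinatorial, and this sketch does not address that.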
Abramovich, F. and Grinshtein, V. (2010).
MAP model selection in Gaussian regression
Electronic Journal of Statistics 4, pp. 932-949.
Abstract: We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting model selector. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for “nearly-orthogonal” and “multicollinear” designs.
Abramovich, F., Grinshtein, V., Petsa, A. and Sapatinas, T. (2010).
On Bayesian testimation and its application to wavelet thresholding
Biometrika 97, pp. 181-198.
Abstract: We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse $l_p$-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense $l_p$-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and levelwise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed levelwise wavelet estimator in finite sample situations, and to compare it with several existing counterparts.
Abramovich, F., De Feis, I. and Sapatinas, T. (2009).
Optimal testing for additivity in multiple nonparametric regression
Annals of the Institute of Statistical Mathematics 61, pp. 691-714.
Abstract: We consider the problem of testing for additivity in the standard multiple nonparametric regression model. We derive optimal (in the minimax sense) nonadaptive and adaptive hypothesis testing procedures for additivity against the composite nonparametric alternative that the response function involves interactions of second or higher orders separated away from zero in $L^2([0, 1]^d)$-norm and also possesses some smoothness properties. In order to shed some light on the theoretical results obtained, we carry out a wide simulation study to examine the finite sample performance of the proposed hypothesis testing procedures and compare them with a series of other tests for additivity available in the literature.
Abramovich, F., Antoniadis, A. and Pensky, M. (2007).
Estimation of piecewise-smooth functions by amalgamated bridge regression splines
Sankhyā: The Indian Journal of Statistics 69, pp. 1-27.
Abstract: We consider nonparametric estimation of a one-dimensional piecewise-smooth function observed with white Gaussian noise on an interval. We propose a two-step estimation procedure, where one first detects jump points by a wavelet-based procedure and then estimates the function on each smooth segment separately by bridge regression splines. We prove the asymptotic optimality (in the minimax sense) of the resulting amalgamated bridge regression spline estimator and demonstrate its efficiency on simulated and real data examples.
Abramovich, F., Grinshtein, V. and Pensky, M. (2007).
On optimality of Bayesian testimation in the normal means problem
The Annals of Statistics 35, pp. 2261-2286.
Abstract: We consider a problem of recovering a high-dimensional vector $\mu$ observed in white noise, where the unknown vector $\mu$ is assumed to be sparse. The objective of the paper is to develop a Bayesian formalism which gives rise to a family of $l_0$-type penalties. The penalties are associated with various choices of the prior distributions $\pi_n(\cdot)$ on the number of nonzero entries of $\mu$ and, hence, are easy to interpret. The resulting Bayesian estimators lead to a general thresholding rule which accommodates many of the known thresholding and model selection procedures as particular cases corresponding to specific choices of $\pi_n(\cdot)$. Furthermore, they achieve optimality in a rather general setting under very mild conditions on the prior. We also specify the class of priors $\pi_n(\cdot)$ for which the resulting estimator is adaptively optimal (in the minimax sense) for a wide range of sparse sequences and consider several examples of such priors.
Abramovich, F., Angelini, C. and De Canditiis, D. (2006).
Pointwise optimality of Bayesian wavelet estimators
Annals of the Institute of Statistical Mathematics 59, pp. 425-434.
Abstract: We consider pointwise mean squared errors of several known Bayesian wavelet estimators, namely, posterior mean, posterior median and Bayes Factor, where the prior imposed on wavelet coefficients is a mixture of an atom of probability zero and a Gaussian density. We show that for the properly chosen hyperparameters of the prior, all the three estimators are (up to a log-factor) asymptotically minimax within any prescribed Besov ball $B^s_{p,q}(M)$. We discuss the Bayesian paradox and compare the results for the pointwise squared risk with those for the global mean squared error.
Abramovich, F. and Angelini, C. (2006).
Bayesian maximum a posteriori multiple testing procedure
Sankhyā: The Indian Journal of Statistics 68, pp. 436-460.
Abstract: We consider a Bayesian approach to multiple hypothesis testing. A hierarchical prior model is based on imposing a prior distribution $\pi(k)$ on the number of hypotheses arising from alternatives (false nulls). We then apply the maximum a posteriori (MAP) rule to find the most likely configuration of null and alternative hypotheses. The resulting MAP procedure and its closely related step-up and step-down versions compare ordered Bayes factors of individual hypotheses with a sequence of critical values depending on the prior. We discuss the relations between the proposed MAP procedure and the existing frequentist and Bayesian counterparts. A more detailed analysis is given for the normal data, where we show, in particular, that by choosing a specific $\pi(k)$, the MAP procedure can mimic several known familywise error (FWE) and false discovery rate (FDR) controlling procedures. The performance of MAP procedures is illustrated on a simulated example.
Abramovich, F. and Angelini, C. (2006).
Testing in mixed-effects FANOVA models
Journal of Statistical Planning and Inference 136, pp. 4326-4348.
Abstract: We consider the testing problem in the mixed-effects functional analysis of variance models. We develop asymptotically optimal (minimax) testing procedures for testing the significance of functional global trend and the functional fixed effects based on the empirical wavelet coefficients of the data. Wavelet decompositions allow one to characterize various types of assumed smoothness conditions on the response function under the nonparametric alternatives. The distribution of the functional random-effects component is defined in the wavelet domain and captures the sparseness of wavelet representation for a wide variety of functions. The simulation study presented in the paper demonstrates the finite sample properties of the proposed testing procedures. We also applied them to the real data from the physiological experiments.
Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2006).
Adapting to unknown sparsity by controlling the false discovery rate
The Annals of Statistics 34, pp. 584-653.
Abstract: We attempt to recover an $n$-dimensional vector observed in white noise, where $n$ is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the $l_p$ norm for $p$ small. We obtain a procedure which is asymptotically minimax for $l_r$ loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter $q_n$ also plays a determining role in asymptotic minimaxity. If $q=\lim q_n \in [0,1/2]$ and also $q_n>\gamma/\log(n)$, we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, $q=\lim q_n \in (1/2,1]$ forces the risk to exceed the minimax risk by a factor growing with $q$. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2 log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.
Abramovich, F. and Benjamini, Y. (2005).
False Discovery Rate
In Encyclopedia of Statistical Sciences (Eds. Kotz, S., Read, C. B., Balakrishnan, N., Vidakovic, B. and Johnson, N. L.), 4, John Wiley & Sons, Inc., pp. 2240-2243.
Abstract: For the multiple hypotheses testing problem consider the proportion of falsely rejected hypotheses (false discoveries) among the total number of rejections. The expected value of this proportion, called the False Discovery Rate (FDR), is a useful criterion to control as an alternative to the traditional familywise error rate (FWE) that suffers from low power properties when the number of tested hypotheses is large. In a way, controlling FDR is adaptively in between ignoring multiplicity altogether and a conservative control of FWE. Several FDR controlling procedures are presented and others are reviewed. Various extensions and applications of the FDR are discussed.
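Illustration (not taken from the entry): one widely known FDR-controlling procedure is the Benjamini-Hochberg step-up rule, which rejects the hypotheses with the $k$ smallest p-values, where $k$ is the largest rank such that $p_{(k)} \le kq/m$. The sketch below is a minimal version under the usual assumption of independent p-values. The function name and the example p-values are hypothetical, and it is an assumption that this rule is among those the entry presents.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted in
    increasing order) with p_(k) <= k * q / m, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

if __name__ == "__main__":
    # Made-up p-values for illustration only.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its own critical value, as long as a larger-ranked p-value passes.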
Abramovich, F. and Heller, R. (2005).
Local functional hypothesis testing
Mathematical Methods of Statistics 14, pp. 253. 
Abstract: We consider a standard “signal+white noise” model on the unit interval and want to test whether the signal is present on a subinterval $\Omega_\Delta \subseteq [0,1]$ of length $\Delta$. The composite alternative is that the unknown signal $f$ is separated away from zero in terms of its average power $\gamma(f)=\|f\|^2_\Delta/\Delta$ on $\Omega_\Delta$ and also possesses some regularity properties. We evaluate the asymptotically optimal (minimax) rates for testing the presence of a signal on $\Omega_\Delta$, where both the noise level and the interval length tend to zero. We derive corresponding rate-optimal tests for local signal detection.
Abramovich, F., Antoniadis, A., Sapatinas, T. and Vidakovic, B. (2004).
Optimal testing in a fixed-effects functional analysis of variance model
International Journal of Wavelets, Multiresolution and Information Processing 2, pp. 323-349.
Abstract: We consider the testing problem in a fixed-effects functional analysis of variance model. We test the null hypotheses that the functional main effects and the functional interactions are zeros against the composite nonparametric alternative hypotheses that they are separated away from zero in $L^2$-norm and also possess some smoothness properties. We adapt the optimal (minimax) hypothesis testing procedures for testing a zero signal in a Gaussian "signal plus noise" model to derive optimal (minimax) nonadaptive and adaptive hypothesis testing procedures for the functional main effects and the functional interactions. The corresponding tests are based on the empirical wavelet coefficients of the data. Wavelet decompositions allow one to characterize different types of smoothness conditions assumed on the response function by means of its wavelet coefficients for a wide range of function classes. In order to shed some light on the theoretical results obtained, we carry out a simulation study to examine the finite sample performance of the proposed functional hypothesis testing procedures. As an illustration, we also apply these tests to a real-life data example arising from physiology. Concluding remarks and hints for possible extensions of the proposed methodology are also given.
Abramovich, F., Amato, U. and Angelini, C. (2004).
On Optimality of Bayesian Wavelet Estimators
Scandinavian Journal of Statistics 31, pp. 217-234.
Abstract: We investigate the asymptotic optimality of several Bayesian wavelet estimators, namely, posterior mean, posterior median and Bayes Factor, where the prior imposed on wavelet coefficients is a mixture of a mass function at zero and a Gaussian density. We show that in terms of the mean squared error, for the properly chosen hyperparameters of the prior, all the three resulting Bayesian wavelet estimators achieve optimal minimax rates within any prescribed Besov space $B^s_{p,q}$ for $p \ge 2$. For $1 \le p \le 2$, the Bayes Factor is still optimal for $(2s+2)/(2s+1) \le p \le 2$ and always outperforms the posterior mean and the posterior median that can achieve only the best possible rates for linear estimators in this case. 
Abramovich, F. (2004).
Discussion on the meeting on 'Statistical approaches to inverse problems'
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66, pp. 627-652.
Abramovich, F., Besbeas, P. and Sapatinas, T. (2002).
Empirical Bayes approach to block wavelet function estimation
Computational Statistics & Data Analysis 39, pp. 435-451.
Abstract: Wavelet methods have demonstrated considerable success in function estimation through term-by-term thresholding of the empirical wavelet coefficients. However, it has been shown that grouping the empirical wavelet coefficients into blocks and making simultaneous threshold decisions about all the coefficients in each block has a number of advantages over term-by-term wavelet thresholding, including asymptotic optimality and better mean squared error performance in finite sample situations. An empirical Bayes approach to incorporating information on neighbouring empirical wavelet coefficients into function estimation that results in block wavelet shrinkage and block wavelet thresholding estimators is considered. Simulated examples are used to illustrate the performance of the resulting estimators, and to compare these estimators with several existing non-Bayesian block wavelet thresholding estimators. It is observed that the proposed empirical Bayes block wavelet shrinkage and block wavelet thresholding estimators outperform the non-Bayesian block wavelet thresholding estimators in finite sample situations. An application to a data set that was collected in an anaesthesiological study is also presented.
Abramovich, F., Sapatinas, T. and Silverman, B.W. (2000).
Stochastic expansions in an overcomplete wavelet dictionary
Probability Theory and Related Fields 117, pp. 133-144.
Abstract: We consider random functions defined in terms of members of an overcomplete wavelet dictionary. The function is modelled as a sum of wavelet components at arbitrary positions and scales where the locations of the wavelet components and the magnitudes of their coefficients are chosen with respect to a marked Poisson process model. The relationships between the parameters of the model and the parameters of those Besov spaces within which realizations will fall are investigated. The models allow functions with specified regularity properties to be generated. They can potentially be used as priors in a Bayesian approach to curve estimation, extending current standard wavelet methods to be free from the dyadic positions and scales of the basis functions. 
Abramovich, F., Bailey, T. C. and Sapatinas, T. (2000).
Wavelet analysis and its statistical applications
Journal of the Royal Statistical Society: Series D (The Statistician) 49, pp. 1-29.
Abstract: In recent years there has been a considerable development in the use of wavelet methods in statistics. As a result, we are now at the stage where it is reasonable to consider such methods to be another standard tool of the applied statistician rather than a research novelty. With that in mind, this paper gives a relatively accessible introduction to standard wavelet analysis and provides a review of some common uses of wavelet methods in statistical applications. It is primarily orientated towards the general statistical audience who may be involved in analysing data where the use of wavelets might be effective, rather than to researchers who are already familiar with the field. Given that objective, we do not emphasize mathematical generality or rigour in our exposition of wavelets and we restrict our discussion to the more frequently employed wavelet methods in statistics. We provide extensive references where the ideas and concepts discussed can be followed up in greater detail and generality if required. The paper first establishes some necessary basic mathematical background and terminology relating to wavelets. It then reviews the more well-established applications of wavelets in statistics including their use in nonparametric regression, density estimation, inverse problems, change-point problems and in some specialized aspects of time series analysis. Possible extensions to the uses of wavelets in statistics are then considered. The paper concludes with a brief reference to readily available software packages for wavelet analysis.
Abramovich, F. and Grinshtein, V. (1999).
Derivation of equivalent kernel for general spline smoothing: a systematic approach
Bernoulli 5, pp. 359-379.
Abstract: We consider first the spline smoothing nonparametric estimation with variable smoothing parameter and arbitrary design density function and show that the corresponding equivalent kernel can be approximated by the Green function of a certain linear differential operator. Furthermore, we propose to use the standard (in applied mathematics and engineering) method for asymptotic solution of linear differential equations, known as the Wentzel-Kramers-Brillouin method, for systematic derivation of an asymptotically equivalent kernel in this general case. The corresponding results for polynomial splines are a special case of the general solution. Then, we show how these ideas can be directly extended to the very general L-spline smoothing.
Abramovich, F. and Sapatinas, T. (1999).
Bayesian Approach to Wavelet Decomposition and Shrinkage
In Bayesian inference in wavelet-based models (Eds. Muller, P. and Vidakovic, B.), Lecture Notes in Statistics 141, Springer New York, pp. 33–50. 
Abstract: We consider a Bayesian approach to wavelet decomposition. We show how prior knowledge about a function's regularity can be incorporated into a prior model for its wavelet coefficients by establishing a relationship between the hyperparameters of the proposed model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relation may be seen as giving insight into the meaning of the Besov space parameters themselves. Furthermore, we consider Bayesian wavelet-based function estimation that gives rise to different types of wavelet shrinkage in nonparametric regression. Finally, we discuss an extension of the proposed Bayesian model by considering random functions generated by an overcomplete wavelet dictionary. 
Abramovich, F., Sapatinas, T. and Silverman, B.W. (1998).
Wavelet thresholding via a Bayesian approach
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, pp. 725–749. 
Abstract: We discuss a Bayesian formalism which gives rise to a type of wavelet threshold estimation in nonparametric regression. A prior distribution is imposed on the wavelet coefficients of the unknown response function, designed to capture the sparseness of wavelet expansion that is common to most applications. For the prior specified, the posterior median yields a thresholding procedure. Our prior model for the underlying function can be adjusted to give functions falling in any specific Besov space. We establish a relationship between the hyperparameters of the prior model and the parameters of those Besov spaces within which realizations from the prior will fall. Such a relationship gives insight into the meaning of the Besov space parameters. Moreover, the relationship established makes it possible in principle to incorporate prior knowledge about the function's regularity properties into the prior model for its wavelet coefficients. However, prior knowledge about a function's regularity properties might be difficult to elicit; with this in mind, we propose a standard choice of prior hyperparameters that works well in our examples. Several simulated examples are used to illustrate our method, and comparisons are made with other thresholding methods. We also present an application to a data set that was collected in an anaesthesiological study. 
Abramovich, F. and Silverman, B.W. (1998).
Wavelet decomposition approaches to statistical inverse problems
Biometrika 85, pp. 115–129. 
Abstract: A wide variety of scientific settings involve indirect noisy measurements where one faces a linear inverse problem in the presence of noise. Primary interest is in some function $f(t)$ but data are accessible only about some linear transform corrupted by noise. The usual linear methods for such inverse problems do not perform satisfactorily when $f(t)$ is spatially inhomogeneous. One existing nonlinear alternative is the wavelet-vaguelette decomposition method, based on the expansion of the unknown $f(t)$ in wavelet series. In the vaguelette-wavelet decomposition method proposed here, the observed data are expanded directly in wavelet series. The performances of various methods are compared through exact risk calculations, in the context of the estimation of the derivative of a function observed subject to noise. A result is proved demonstrating that, with a suitable universal threshold somewhat larger than that used for standard denoising problems, both the wavelet-based approaches have an ideal spatial adaptivity property. 
Abramovich, F. and Bayvel, P. (1997).
Some statistical remarks on the derivation of BER in amplified optical communication systems
IEEE Transactions on Communications 45, pp. 1032–1034. 
Abstract: We consider the signal detection problem in amplified optical transmission systems as a statistical hypothesis testing procedure, and we show that the detected signal has a well-known chi-squared distribution. In particular, this approach considerably simplifies the derivation of bit-error rate (BER). Finally, we discuss the accuracy of the Gaussian approximations to the exact distributions of the signal. 
Abramovich, F. and Steinberg, D. (1996).
Improved inference in nonparametric regression using $L_k$-smoothing splines
Journal of Statistical Planning and Inference 49, pp. 327–341. 
Abstract: Smoothing splines are one of the most popular approaches to nonparametric regression. Wahba (J. Roy. Statist. Soc. Ser. B 40 (1978) 364–372; 45 (1983) 133–150) showed that smoothing splines are also Bayes estimates and used the corresponding prior model to derive interval estimates for the regression function. Although the interval estimates work well on a global basis, they can have poor local properties. The source of this problem is the use of a global smoothing parameter. We introduce the notion of $L_k$-smoothing splines. These splines allow for a variable smoothing parameter and can substantially improve local inference. 
Abramovich, F. and Benjamini, Y. (1996).
Adaptive thresholding of wavelet coefficients
Computational Statistics & Data Analysis 22, pp. 351–361. 
Abstract: Wavelet techniques have become an attractive and efficient tool in function estimation. Given noisy data, its discrete wavelet transform is an estimator of the wavelet coefficients. It has been shown by Donoho and Johnstone (Biometrika 81 (1994) 425–455) that thresholding the estimated coefficients and then reconstructing an estimated function reduces the expected risk close to the possible minimum. They offered a global threshold $\lambda \sim \sigma \sqrt{2\log{n}}$ for $j > j_0$, while the coefficients of the first coarse $j_0$ levels are always included. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators. Then, we use the connection between thresholding and hypotheses testing to construct a thresholding procedure based on the false discovery rate (FDR) approach to multiple testing of Benjamini and Hochberg (J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300). The suggested procedure controls the expected proportion of incorrectly included coefficients among those chosen for the wavelet reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of the estimated function and to the noise level. Finally, comparing the proposed FDR-based procedure with the fixed global threshold by evaluating the relative mean-square-error across the various test-functions and noise levels, we find the FDR-estimator to enjoy robustness of MSE-efficiency. 
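Illustration (not taken from the paper): the contrast between the global threshold and an FDR-based threshold can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the noise level `sigma` is taken as known, the coefficients are passed as one flat list rather than level by level, the default `q = 0.05` is an arbitrary choice, and the function names are hypothetical. The step-up rule used is the standard Benjamini-Hochberg procedure applied to two-sided normal p-values; the paper's exact procedure may differ in detail.

```python
import math


def universal_threshold(sigma, n):
    """Global threshold sigma * sqrt(2 log n) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))


def fdr_threshold(coeffs, sigma, q=0.05):
    """Benjamini-Hochberg step-up threshold (assumes N(0, sigma^2) noise).

    Returns the smallest absolute coefficient value that is kept,
    or math.inf if no coefficient passes the test.
    """
    # Sorting |d| in decreasing order gives p-values in increasing order.
    abs_sorted = sorted((abs(c) for c in coeffs), reverse=True)
    m = len(abs_sorted)
    k = 0
    for i, a in enumerate(abs_sorted, start=1):
        # Two-sided p-value for H0 "true coefficient is zero":
        # P(|Z| > a / sigma) = erfc(a / (sigma * sqrt(2))).
        p = math.erfc(a / (sigma * math.sqrt(2.0)))
        if p <= i * q / m:
            k = i  # remember the largest index passing the BH bound
    return abs_sorted[k - 1] if k > 0 else math.inf


def hard_threshold(coeffs, lam):
    """Keep coefficients with |d| >= lam and set the rest to zero."""
    return [c if abs(c) >= lam else 0.0 for c in coeffs]
```

Calling `hard_threshold(coeffs, fdr_threshold(coeffs, sigma))` gives a data-dependent cut-off, whereas `universal_threshold(sigma, len(coeffs))` depends only on the noise level and the number of coefficients.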
Abramovich, F. and Benjamini, Y. (1995).
Thresholding of wavelet coefficients as multiple hypotheses testing procedure
In Wavelets and Statistics (Eds. Antoniadis, A. and Oppenheim, G.), Lecture Notes in Statistics 103, Springer-Verlag, pp. 5–14. 
Abstract: Given a noisy signal, its finite discrete wavelet transform is an estimator of the signal's wavelet expansion coefficients. An appropriate thresholding of coefficients for further reconstruction of the denoised signal plays a key role in the wavelet decomposition/reconstruction procedure. [DJ1] proposed a global threshold $\lambda = \sigma \sqrt{2\log{n}}$ and showed that such a threshold asymptotically reduces the expected risk of the corresponding wavelet estimator close to the possible minimum. To apply their threshold for finite samples they suggested always keeping the coefficients of the first coarse $j_0$ levels. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators. Then, we consider the thresholding of wavelet coefficients as a multiple hypotheses testing problem and use the False Discovery Rate (FDR) approach to multiple testing of [BH1]. The suggested procedure controls the expected proportion of incorrectly kept coefficients among those chosen for the wavelet reconstruction. The resulting procedure is inherently adaptive, and responds to the complexity of the estimated function. Finally, comparing the proposed FDR-threshold with the fixed global threshold of Donoho and Johnstone by evaluating the relative Mean-Square-Error across the various test-functions and noise levels, we find the FDR-estimator to enjoy robustness of MSE-efficiency. 
Abramovich, F. (1993).
The asymptotic mean squared error of L-smoothing splines
Statistics & Probability Letters 18, pp. 179–182. 
Abstract: We establish the asymptotic equivalence between L-spline smoothing and kernel estimation. The equivalent kernel is used to derive the asymptotic mean squared error of the L-smoothing spline estimator. The paper extends the corresponding results for polynomial spline smoothing. 
Abramovich, F. (1988).
Some remarks on the robustness of LL-type fuzzy linear algebraic systems
BUSEFAL 36, pp. 98–105. 
Abramovich, F., Wagenknecht, M. and Khurgin, Y.I. (1988).
Solution of LR-type fuzzy systems of linear algebraic equations
BUSEFAL 35, pp. 86–99. 
Abstract: Systems of linear algebraic equations with LR-type fuzzy coefficients are considered in this article. The notion of a solution of an LR-type fuzzy system is discussed. It is shown that in the general case an exact solution may not exist, so we propose finding an approximate one (a quasi-solution). It turns out that this problem can be reduced to an ordinary (non-fuzzy) nonlinear optimization problem. A numerical example applying the method is provided. 