Total publications in this section: 9
Article
Le Gouic T., Paris Q. Electronic journal of statistics. 2018. Vol. 12. No. 2. P. 4239-4263.

In this paper, we define and study a new notion of stability for the k-means clustering scheme, building upon the field of quantization of a probability measure. We connect this definition of stability to a geometric feature of the underlying distribution of the data, called the absolute margin condition, inspired by recent works on the subject.
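The quantization view of k-means that the abstract builds on can be made concrete with plain Lloyd iterations and the empirical quantization risk. The sketch below is illustrative only: the two-blob data, the deterministic initialization, and the Gaussian mixture are assumptions for the example, and the paper's stability notion and absolute margin condition are not reproduced here.

```python
import numpy as np

def lloyd_kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations: k-means viewed as quantization of the empirical measure."""
    centers = centers.copy()
    for _ in range(n_iter):
        # assign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # move each center to the mean of its cluster
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def quantization_risk(X, centers):
    """Empirical quantization risk: average squared distance to the nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(1).mean()

rng = np.random.default_rng(1)
# two well-separated Gaussian blobs: a distribution with a wide margin between clusters
X = np.vstack([rng.normal(-3.0, 0.5, (100, 2)), rng.normal(3.0, 0.5, (100, 2))])
centers = lloyd_kmeans(X, centers=X[[0, 100]])
risk = quantization_risk(X, centers)
```

With a wide margin, the optimal centers sit at the blob means and the risk is close to the within-blob variance.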

Added: Nov 9, 2018
Article
Silin I., Spokoiny V. Electronic journal of statistics. 2018. Vol. 12. No. 1. P. 1948-1987.

Let X_1,…,X_n be an i.i.d. sample in R^p with zero mean and covariance matrix \Sigma^{*}. The classical PCA approach recovers the projector P_J^{*} onto the principal eigenspace of \Sigma^{*} by its empirical counterpart \hat P_J. A recent paper [24] investigated the asymptotic distribution of the Frobenius distance between the projectors, \|\hat P_J - P_J^{*}\|_2, while [27] offered a bootstrap procedure to measure uncertainty in recovering the subspace P_J^{*} even in a finite-sample setup. The present paper considers this problem from a Bayesian perspective and suggests using the credible sets of the pseudo-posterior distribution on the space of covariance matrices, induced by the conjugate Inverse Wishart prior, as sharp confidence sets. This yields a numerically efficient procedure. Moreover, we theoretically justify this method and derive finite-sample bounds on the corresponding coverage probability. Contrary to [24, 27], the obtained results are valid for non-Gaussian data: the main assumption we impose is the concentration of the sample covariance \hat \Sigma in a vicinity of \Sigma^{*}. Numerical simulations illustrate the good performance of the proposed procedure even on non-Gaussian data in a rather challenging regime.
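The quantity at the center of the abstract, the Frobenius distance between the empirical and population eigenspace projectors, is easy to compute directly. The sketch below only illustrates that distance on a synthetic spiked covariance (the diagonal \Sigma, dimensions, and sample size are assumptions for the example); it does not implement the paper's Bayesian credible-set procedure.

```python
import numpy as np

def eigenspace_projector(S, J):
    """Orthogonal projector onto the span of the top-J eigenvectors of S."""
    w, V = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    U = V[:, -J:]              # top-J eigenvectors
    return U @ U.T

rng = np.random.default_rng(0)
p, n, J = 5, 2000, 2
# population covariance with a dominant 2-dimensional principal subspace
Sigma = np.diag([10.0, 8.0, 1.0, 1.0, 1.0])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n        # sample covariance (zero-mean model)

P_true = eigenspace_projector(Sigma, J)
P_hat = eigenspace_projector(Sigma_hat, J)
dist = np.linalg.norm(P_hat - P_true, "fro")   # ||hat P_J - P_J^*||_2 in the abstract's notation
```

With a clear eigengap between the second and third eigenvalues, the empirical projector concentrates around the true one and the distance is small.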

Added: Jul 23, 2018
Article
Belomestny D., Panov V. Electronic journal of statistics. 2013. Vol. 7. P. 2970-3003.

In this paper we consider a class of time-changed Lévy processes that can be represented in the form \(Y_{s}=X_{T(s)}\), where \(X\) is a Lévy process and \(T\) is a non-negative and non-decreasing stochastic process independent of \(X\). The aim of this work is to infer the Blumenthal-Getoor index of the process \(X\) from low-frequency observations of the time-changed Lévy process \(Y\). We propose a consistent estimator for this index, derive the minimax rates of convergence and show that these rates cannot be improved in general. The performance of the estimator is illustrated by numerical examples.
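For a symmetric stable Lévy process, the index governs the tail behaviour of the characteristic function, \(|\varphi(u)| = e^{-c|u|^{\alpha}}\), which suggests the classical log-log regression heuristic sketched below. This is not the paper's low-frequency estimator for time-changed processes; it is only a minimal illustration of index estimation from increments, assuming Brownian increments (index 2) and a hand-picked frequency grid.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Brownian increments: a Levy process with Blumenthal-Getoor index alpha = 2
dX = rng.normal(0.0, 1.0, n)

u = np.array([0.5, 0.75, 1.0, 1.5])
# empirical characteristic function of the increments at frequencies u
phi = np.array([np.exp(1j * ui * dX).mean() for ui in u])
# for stable laws: log(-log|phi(u)|) = log c + alpha * log|u|
y = np.log(-np.log(np.abs(phi)))
slope, _ = np.polyfit(np.log(u), y, 1)   # slope estimates the index alpha
```

For Gaussian increments the fitted slope should be close to 2.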

Added: Sep 23, 2013
Article
Lee E., Mammen E. Electronic journal of statistics. 2016. Vol. 10. No. 1. P. 855-894.

Varying coefficient models are useful generalizations of parametric linear models. They allow for parameters that depend on a covariate or that develop in time. They have a wide range of applications in time series analysis and regression. In time series analysis they have turned out to be a powerful approach to inferring behavioral and structural changes over time. In this paper, we are concerned with high-dimensional varying coefficient models, including the time-varying coefficient model. Most studies of high-dimensional nonparametric models treat penalization of series estimators. On the other hand, kernel smoothing is a well-established, well-understood and successful approach in nonparametric estimation, in particular in the time-varying coefficient model. But not much has been done for kernel smoothing in high-dimensional models. In this paper we close this gap and develop a penalized kernel smoothing approach for sparse high-dimensional models. The proposed estimators make use of a novel penalization scheme working with kernel smoothing. We establish a general and systematic theoretical analysis in high dimensions. This complements recent alternative approaches that are based on basis approximations and that allow more direct arguments to carry over insights from high-dimensional linear models. Furthermore, we develop theory not only for regression with independent observations but also for locally stationary time series in high-dimensional sparse varying coefficient models. The development of theory for locally stationary processes in a high-dimensional setting creates technical challenges. We also address issues of numerical implementation and of data-adaptive selection of tuning parameters for penalization. The finite-sample performance of the proposed methods is studied by simulations and is illustrated by an empirical analysis of NASDAQ composite index data.
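The basic building block, kernel smoothing in a time-varying coefficient model, can be sketched with a kernel-weighted local fit at a target time point. Note the hedges: a simple ridge penalty stands in for the paper's novel sparsity penalization, and the Gaussian kernel, bandwidth, and data-generating coefficients below are assumptions chosen for the example.

```python
import numpy as np

def kernel_ridge_vc(t0, t, X, y, h=0.1, lam=1e-3):
    """Local (kernel-weighted) ridge estimate of beta(t0) in y_i = x_i' beta(t_i) + eps_i.
    A plain ridge penalty stands in for the paper's sparsity penalization scheme."""
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)   # Gaussian kernel weights around t0
    Xw = X * w[:, None]
    A = X.T @ Xw + lam * np.eye(X.shape[1])  # kernel-weighted Gram matrix + penalty
    return np.linalg.solve(A, Xw.T @ y)

rng = np.random.default_rng(0)
n, p = 2000, 3
t = np.sort(rng.uniform(0.0, 1.0, n))
X = rng.normal(size=(n, p))
# true coefficients: one smoothly varying, one constant, one inactive (sparse)
beta = lambda s: np.array([np.sin(2 * np.pi * s), 1.0, 0.0])
B = np.stack([beta(s) for s in t])
y = np.einsum("ij,ij->i", X, B) + 0.1 * rng.normal(size=n)

b_half = kernel_ridge_vc(0.5, t, X, y)   # beta(0.5) = (0, 1, 0)
```

The local fit at t0 = 0.5 should recover the constant coefficient and leave the inactive one near zero; a sparsity penalty would set the latter exactly to zero.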

Added: Jun 3, 2016
Article
Moulines E., Brosse N., Durmus A. Electronic journal of statistics. 2018. Vol. 12. No. 1. P. 851-889.

We derive explicit bounds for the computation of normalizing constants Z for log-concave densities \pi = e^{-U}/Z with respect to the Lebesgue measure on R^d. Our approach relies on a Gaussian annealing combined with recent and precise bounds on the Unadjusted Langevin Algorithm [15]. Polynomial bounds in the dimension d are obtained with an exponent that depends on the assumptions made on U. The algorithm also provides a theoretically grounded choice of the annealing sequence of variances. A numerical experiment supports our findings. Results of independent interest on the mean squared error of the empirical average of locally Lipschitz functions are established.
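The Unadjusted Langevin Algorithm that the bounds build on is a one-line recursion: a gradient step on U plus Gaussian noise. The sketch below shows only that building block on a standard Gaussian target (step size and chain length are assumptions for the example); the paper's Gaussian annealing over a sequence of variances to estimate Z is not reproduced.

```python
import numpy as np

def ula(grad_U, x0, gamma=0.05, n_steps=20000, seed=0):
    """Unadjusted Langevin Algorithm:
    x_{k+1} = x_k - gamma * grad_U(x_k) + sqrt(2 * gamma) * xi_k, xi_k ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    traj = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * rng.normal(size=x.shape)
        traj[k] = x
    return traj

# standard Gaussian target: U(x) = ||x||^2 / 2, so pi = e^{-U}/Z with Z = (2*pi)^{d/2}
traj = ula(lambda x: x, x0=np.zeros(2))
burn = traj[5000:]   # discard burn-in before estimating moments
```

After burn-in, the chain's samples have mean near 0 and variance near 1, up to the O(gamma) discretization bias the paper's bounds quantify.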

Added: Dec 12, 2018
Article
Bellec P., Dalalyan A., Grappin E. et al. Electronic journal of statistics. 2018. Vol. 12. No. 2. P. 3443-3472.

In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, the simple setting of bounded response variable and bounded (high-dimensional) covariates is considered. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to the approximate sparsity, and the variance. They also demonstrate that the presence of a large number of unlabeled features may have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.
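The baseline object here is the lasso itself, which can be sketched via proximal gradient descent (ISTA). The example below fits a plain lasso on fully labeled synthetic data; the sparse design, penalty level, and noise scale are assumptions for the example, and the paper's transductive adaptations that exploit unlabeled covariates are not reproduced.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Lasso via proximal gradient (ISTA): min_b ||y - X b||^2 / (2n) + lam * ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        g = X.T @ (X @ b - y) / n       # gradient of the least-squares term
        z = b - g / L
        # soft-thresholding: proximal operator of the l1 penalty
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return b

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
b_true = np.zeros(p)
b_true[:3] = [2.0, -1.5, 1.0]           # approximately sparse truth
y = X @ b_true + 0.1 * rng.normal(size=n)
b_hat = ista_lasso(X, y, lam=0.05)
```

The l1 penalty recovers the three active coefficients up to a small shrinkage bias and sets most inactive ones exactly to zero.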

Added: Nov 9, 2018
Article
Krymova E. A., Chernousova E., Golubev Y. Electronic journal of statistics. 2013. Vol. 7. P. 2395-2419.
Added: Aug 22, 2016
Article
Belomestny D., Panov V. Electronic journal of statistics. 2015. Vol. 9. No. 2. P. 1974-2006.

In this paper, we consider the problem of statistical inference for generalized Ornstein-Uhlenbeck processes of the type

\[
X_{t} = e^{-\xi_{t}} \left( X_{0} + \int_{0}^{t} e^{\xi_{u-}} \, du \right),
\]

where \(\xi_s\) is a Lévy process. Our primary goal is to estimate the characteristics of the Lévy process \(\xi\) from the low-frequency observations of the process \(X\). We present a novel approach to estimating the Lévy triplet of \(\xi\), which is based on the Mellin transform technique. It is shown that the resulting estimates attain optimal minimax convergence rates. The suggested algorithms are illustrated by numerical simulations.
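The displayed formula can be simulated directly on a time grid, which is how low-frequency observations of \(X\) would be generated in a numerical study. The sketch below discretizes the integral by a Riemann sum, taking \(\xi\) to be a Brownian motion with drift as a simple Lévy example (the drift, horizon, and grid size are assumptions); the Mellin-transform estimation procedure itself is not sketched.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 5.0, 5000
dt = T / n
# a simple Levy process for xi: Brownian motion with positive drift 0.5
xi = np.concatenate([[0.0], np.cumsum(0.5 * dt + np.sqrt(dt) * rng.normal(size=n))])

X0 = 1.0
# Riemann approximation of int_0^t e^{xi_u} du on the grid (left endpoints)
integral = np.concatenate([[0.0], np.cumsum(np.exp(xi[:-1]) * dt)])
# the generalized Ornstein-Uhlenbeck process from the display
X = np.exp(-xi) * (X0 + integral)
```

Subsampling X on a coarse subgrid of the trajectory would mimic the low-frequency observation scheme the paper works with.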

Added: Sep 1, 2015