A mixed data frame (MDF) is a table collecting categorical, numerical, and count observations. MDFs are widely used in statistics, with applications ranging from abundance data in ecology to recommender systems. In many cases, an MDF simultaneously exhibits main effects, such as row, column, or group effects, and interactions, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not incorporate both main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method that recovers the main effects and the interactions simultaneously. We show that our method is near optimal under conditions that are met in our targeted applications. We also propose an optimization algorithm that provably converges to an optimal solution. Numerical experiments reveal that our method, mimi, performs well when the main effects are sparse and the interaction matrix has low rank. We also show that mimi compares favorably to existing methods, in particular when the main effects are large compared with the interactions, and when the proportion of missing entries is large. The method is available as an R package on the Comprehensive R Archive Network. Supplementary materials for this article are available online.
The paper describes a recent study that investigates which data imputation algorithm is most efficient for each of several methods of data analysis: regression modeling, factor analysis, descriptive statistics, and correlation analysis. Because there are no established recommendations for choosing a data imputation algorithm, the choice is ambiguous in each situation.
The authors argue that the data imputation algorithm should be selected according to the analysis method applied to the data after imputation. In other words, the efficiency of a given data imputation algorithm is expected to differ from one data analysis method to another. A statistical experiment was used to evaluate the efficiency of several data imputation algorithms for each method of data analysis.
The core idea of the statistical experiment was to compare the results of applying each method to the reference (etalon) data set, which has no missing values, with the results obtained on a large number of artificial subsamples generated from the original data set, in which missing values were filled in by the data imputation algorithms being compared.
Subsamples were generated via the bootstrap procedure, which made it possible to carry out a statistical evaluation and to build confidence intervals for each parameter before and after the data imputation.
Through this experiment the authors evaluated the efficiency of the following data imputation algorithms for the methods of data analysis mentioned above: imputation with measures of central tendency, the EM algorithm, imputation via a regression model, and the Hot Deck algorithm.
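The experimental design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, only two of the four algorithms are shown (mean and regression imputation; EM and Hot Deck are omitted), correlation analysis stands in for the analysis method, values are removed completely at random from one column, and percentile bootstrap intervals are assumed.

```python
import numpy as np

def mean_impute(x):
    """Replace NaNs in every column by that column's observed mean."""
    filled = x.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def regression_impute(x):
    """Replace NaNs in column 1 by a least-squares prediction from column 0."""
    filled = x.copy()
    missing = np.isnan(filled[:, 1])
    slope, intercept = np.polyfit(filled[~missing, 0], filled[~missing, 1], deg=1)
    filled[missing, 1] = slope * filled[missing, 0] + intercept
    return filled

def correlation(x):
    """Analysis method under study (assumed here): Pearson correlation."""
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

def bootstrap_interval(data, impute, missing_rate, n_boot, rng, alpha=0.05):
    """Percentile bootstrap interval for the statistic computed after imputation."""
    n = data.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample rows with replacement; integer-array indexing returns a copy.
        sample = data[rng.integers(0, n, size=n)]
        # Delete values in column 1 completely at random (an assumption).
        holes = rng.random(n) < missing_rate
        sample[holes, 1] = np.nan
        stats[b] = correlation(impute(sample))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic reference data set without missing values (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
reference = np.column_stack([x0, 0.8 * x0 + 0.6 * rng.normal(size=200)])

print("reference correlation:", round(correlation(reference), 3))
for name, impute in [("mean", mean_impute), ("regression", regression_impute)]:
    lower, upper = bootstrap_interval(reference, impute, 0.3, 500, rng)
    print(f"{name} imputation: [{lower:.3f}, {upper:.3f}]")
```

In this sketch, an interval that excludes the reference value would suggest that the imputation algorithm biases that particular analysis method. Repeating the loop with a different statistic (for example, a regression coefficient) corresponds to evaluating the same algorithm for another method.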
We consider certain spaces of functions on the circle, which naturally appear in harmonic analysis, and superposition operators on these spaces. We study the following question: which functions have the property that each of their superpositions with a homeomorphism of the circle belongs to a given space? We also study the multidimensional case.
We consider the spaces of functions on the m-dimensional torus whose Fourier transform is p-summable. We obtain estimates for the norms of the exponential functions deformed by a C^1-smooth phase. The results extend to the multidimensional case the one-dimensional results obtained earlier by the author in “Quantitative estimates in the Beurling–Helson theorem”, Sbornik: Mathematics, 201:12 (2010), 1811–1836.
We consider the spaces of functions on the circle whose Fourier transform is p-summable. We obtain estimates for the norms of exponential functions deformed by a C^1-smooth phase.
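For orientation, the objects named above can be written out as follows. The notation is an assumption made here and does not come from the abstracts: $A_p(\mathbb{T})$ for the space, $\widehat{f}(k)$ for the Fourier coefficients, and $e^{i\lambda\varphi}$ as the presumed form of an exponential function deformed by a phase $\varphi$.

```latex
% Assumed notation: A_p(\mathbb{T}) is the space of functions on the circle
% whose sequence of Fourier coefficients is p-summable.
\[
  \widehat{f}(k) = \frac{1}{2\pi}\int_{0}^{2\pi} f(t)\, e^{-ikt}\, dt,
  \quad k \in \mathbb{Z},
  \qquad
  \|f\|_{A_p(\mathbb{T})}
    = \Bigl(\sum_{k \in \mathbb{Z}} |\widehat{f}(k)|^{p}\Bigr)^{1/p}.
\]
% Presumed quantity being estimated (an assumption about the setup):
\[
  \|e^{i\lambda\varphi}\|_{A_p(\mathbb{T})},
  \qquad \lambda \in \mathbb{R},
  \quad \varphi \in C^{1}(\mathbb{T}) \ \text{real-valued}.
\]
```

On the m-dimensional torus the same definition would apply with the sum taken over $k \in \mathbb{Z}^m$.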