Study of the Robustness of Modifications of the Principal Components Method to Anomalous Observations
The paper considers the problem of reducing the dimension of multidimensional
correlated indicators. One of the approaches
to solving this problem is based on the method of principal
components, which makes it possible to compactly
describe a vector with correlated coordinates (components)
using a vector of principal components with
uncorrelated coordinates and of much smaller dimension,
while retaining most of the information about the correlation
structure of the original vector. Several modifications of the principal components
method, differing in how the correlation matrix of the observation
vector is estimated, were compared on simulated and
real data. The objective of the work is to demonstrate the advantages
of robust modifications of the principal components
method when the data contain anomalous values. To compare the considered modifications on the
simulated data, a metric was introduced that measures the
difference between the estimated and true eigenvalues
of the correlation matrix of the original data. The behavior of this metric
as a function of the probability distribution of the observations
was studied by computer simulation. The
distributions used were multivariate distributions with non-diagonal
correlation matrices that simulate a contaminated
sample. Next, a sample of 13 correlated
socioeconomic indicators for 85 countries was considered,
in which 46 anomalous values were identified. The
considered modifications of the principal components
method all selected the same optimal number of principal
components, namely three. However, the compression quality
for the real data, defined as the share
of the total variance of the original indicators explained by the
first three principal components, turned out to be significantly
higher for the robust modifications of the
principal components method. The results obtained
on the real data are in good agreement with the conclusions
of the computer simulation.
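
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the robust estimator is taken to be the Spearman rank correlation, the eigenvalue metric is taken to be the Euclidean distance between sorted eigenvalues, and the contaminated sample is a normal mixture in which a fraction eps of observations has independent coordinates with inflated variance. The abstract does not specify any of these details.

```python
import numpy as np

def pearson_corr(x):
    # Classical (non-robust) sample correlation matrix; columns are variables.
    return np.corrcoef(x, rowvar=False)

def spearman_corr(x):
    # Assumed robust alternative: Pearson correlation of column-wise ranks.
    # Ties are ignored, which is acceptable for continuous simulated data.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def eigenvalue_error(corr_est, corr_true):
    # Hypothetical metric: Euclidean distance between the eigenvalues
    # (sorted in descending order) of the estimated and true matrices.
    ev_est = np.sort(np.linalg.eigvalsh(corr_est))[::-1]
    ev_true = np.sort(np.linalg.eigvalsh(corr_true))[::-1]
    return float(np.linalg.norm(ev_est - ev_true))

def explained_share(corr, k):
    # Share of total variance described by the first k principal components.
    ev = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return float(ev[:k].sum() / ev.sum())

def contaminated_sample(rng, corr_true, n, eps, scale):
    # Each row comes from N(0, corr_true) with probability 1 - eps and from
    # N(0, scale^2 * I) with probability eps (the assumed contamination).
    p = corr_true.shape[0]
    clean = rng.multivariate_normal(np.zeros(p), corr_true, size=n)
    noise = rng.multivariate_normal(np.zeros(p), scale**2 * np.eye(p), size=n)
    mask = rng.random(n) < eps
    return np.where(mask[:, None], noise, clean)

# Illustrative setup: 5 equicorrelated indicators, 10% contamination.
p, rho = 5, 0.8
corr_true = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
rng = np.random.default_rng(0)
sample = contaminated_sample(rng, corr_true, n=2000, eps=0.1, scale=10.0)

err_classical = eigenvalue_error(pearson_corr(sample), corr_true)
err_robust = eigenvalue_error(spearman_corr(sample), corr_true)
share_classical = explained_share(pearson_corr(sample), k=1)
share_robust = explained_share(spearman_corr(sample), k=1)
```

Under these assumptions, the rank-based estimate should stay closer to the true eigenvalues than the classical one, because the inflated-variance observations dominate the Pearson estimate but have bounded influence on ranks. Other robust estimators could be substituted for `spearman_corr` without changing the rest of the sketch.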