Textual Analysis of Pricing in the Moscow Residential Real Estate Market
In this paper, we apply textual analysis to the hedonic pricing model of the Moscow residential real estate market. We collect data on 60 thousand sale ads posted in July 2019 on the CIAN website (one of the largest residential real estate websites in Russia). A dedicated parser was written in Python to gather the data. The text-analysis algorithm developed by the authors selects the words (unigrams) and phrases (bigrams) that are the most significant predictors of price. The advantage of this approach is that the selection of explanatory variables for the econometric model is based on the revealed preferences of market participants: the algorithm identifies the tokens that apartment owners, who are interested in a successful sale, choose to mention. Thus, we identify important subjective pricing factors in the Moscow real estate market. We show that text analysis can significantly improve the predictive power of the pricing model. In particular, including unigrams reduces the standard error of estimation by 15%. The improvement comes from the inclusion of pricing factors that are difficult to quantify. For example, "water purification", "concierge guard", "club house", "video surveillance system" and similar bigrams reflect safety, location type and other local public goods that are difficult to measure.
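The token-selection idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the selection criterion, so the sketch assumes a simple stand-in, ranking each unigram and bigram by the gap in mean log price between ads that contain it and ads that do not. The function names `tokens` and `rank_tokens`, the `min_count` threshold, and the toy ads and prices are all hypothetical.

```python
import math
import re
from collections import defaultdict


def tokens(text):
    """Return the set of unigrams and bigrams found in one ad description."""
    words = re.findall(r"[a-zа-яё]+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(bigrams)


def rank_tokens(ads, min_count=2):
    """Rank tokens by the gap in mean log price with vs. without the token.

    Assumption: this mean-gap score is a stand-in for the paper's
    (unspecified) significance criterion.
    """
    log_prices = [math.log(price) for _, price in ads]
    total = sum(log_prices)
    n = len(ads)
    sums = defaultdict(float)   # sum of log prices of ads containing the token
    counts = defaultdict(int)   # number of ads containing the token
    for (text, _), log_price in zip(ads, log_prices):
        for tok in tokens(text):
            sums[tok] += log_price
            counts[tok] += 1
    scores = {}
    for tok, c in counts.items():
        # Skip rare tokens and tokens present in every ad (no comparison group).
        if c < min_count or c == n:
            continue
        mean_with = sums[tok] / c
        mean_without = (total - sums[tok]) / (n - c)
        scores[tok] = mean_with - mean_without
    # Largest absolute gap first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: (ad text, asking price in rubles).
ads = [
    ("club house with concierge guard", 30_000_000),
    ("club house video surveillance system", 28_000_000),
    ("cozy flat near metro", 9_000_000),
    ("flat near metro needs repair", 8_000_000),
]
ranking = rank_tokens(ads)
```

In a full hedonic model, the selected tokens would enter as indicator variables alongside the usual structural characteristics (area, floor, district); the sketch covers only the selection step.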