Data Unification for the Museum State Catalogue of the Russian Federation
In the era of big data, interest in applying it to the humanities is growing, for example in the digital humanities.
In Russia, the Russian Museum State Catalogue collects information about objects held in Russian museums.
The catalogue is still being compiled, but it already contains information about more than 16 million objects. Many
data fields are written in natural language, which makes data analysis almost impossible. Natural language
processing tools (for example, named-entity recognition) help to process the data and make it possible to analyse it.
In this work we describe the processing of the date of creation, the place of creation, the authors, and the
techniques used. As an example of research on the processed data, we describe different categories of objects from
the perspective of place of origin and time of creation. The results are consistent with well-known facts from the
history of art.