Chapter

Languages of Russia: Using Social Networks to Collect Texts

pp. 179–185.
Krylova I., Orekhov B., Stepanova E., Zaydelman L.

In this paper we outline a method for finding texts in minor languages of Russia in social networks, using VKontakte as an example. We identify language-specific markers: special tokens that contain letter combinations unique to a given language and frequent in texts in that language. We use Yandex.XML to generate lists of web pages that contain texts in these languages. We then download data from the web pages in the https://vk.com domain through the VKontakte API.
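The marker-selection step and the restriction to the vk.com domain can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a "letter combination" is a character bigram, that uniqueness is checked against sample texts in other languages, and that markers are ranked by raw token frequency. The functions `find_markers` and `vk_urls` are hypothetical names, and the Yandex.XML and VKontakte API calls themselves are not reproduced.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: a token is a maximal run of letters (Cyrillic or Latin), lowercased.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def char_bigrams(token: str) -> set[str]:
    # Assumption: "letter combination" = two adjacent letters within a token.
    return {token[i:i + 2] for i in range(len(token) - 1)}


def find_markers(target: str, others: list[str], top_n: int = 10) -> list[str]:
    """Hypothetical helper: frequent tokens of `target` that contain at least
    one bigram never seen in any of the `others` texts."""
    other_bigrams: set[str] = set()
    for text in others:
        for tok in tokenize(text):
            other_bigrams |= char_bigrams(tok)
    counts = Counter(tokenize(target))
    # Walk tokens from most to least frequent, keeping those with a unique bigram.
    markers = [tok for tok, _ in counts.most_common()
               if char_bigrams(tok) - other_bigrams]
    return markers[:top_n]


def vk_urls(urls: list[str]) -> list[str]:
    """Hypothetical helper: keep only search-result URLs on vk.com or its subdomains."""
    keep = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == "vk.com" or host.endswith(".vk.com"):
            keep.append(url)
    return keep


if __name__ == "__main__":
    print(find_markers("xa xa qz qz qz ba", ["ba ab xb"]))  # ['qz', 'xa']
    print(vk_urls(["https://vk.com/wall1_2", "https://example.com/vk.com"]))
```

The toy strings are placeholders; in practice the marker lists would be used as search queries, and only the resulting vk.com pages would be passed on for download.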

In the book

Languages of Russia: Using Social Networks to Collect Texts
Edited by: P. Braslavski, P. Markov, Y. Volkovich et al. Vol. 573. Switzerland: Springer International Publishing, 2016.