Semantic Analysis of the Imperial Topic: Case of St. Petersburg
In a world of rapidly developing Science and Technology (S&T), with growing volumes of S&T-related data and increasingly interdisciplinary and collaborative research, technology mining (TM) helps to acquire intelligence about emerging trends and future S&T developments. The task is becoming crucial not only for high-tech startups and large organizations, but also for venture capitalists and other companies that make decisions about S&T investments. Governments and public research institutions are also among the main stakeholders and potential users of TM, using it to set R&D priorities, plans and programs according to the current and future state of S&T development. Term clusters built by TM and bibliometric tools, based on the co-occurrence of authors’ keywords or of terms extracted from the titles and abstracts of scientific documents, combine quite different types of objects: research fields, major problems and challenges, methods, inventions, products, technologies and so on. Specific expertise in the field may allow a researcher to identify the key objects of a study. However, the objects themselves and their frequency dynamics over a time period do not, on their own, fully indicate S&T developments and emerging trends in the area. To improve the identification of emerging S&T trends and developments, the paper focuses on dynamic term clustering and suggests a systemic approach that combines TM, bibliometrics, NLP and semantic analysis within a unified analytical framework. The proposed approach uses existing clustering methods and tools together with the analysis of term linguistic dependencies in order to study how objects and their semantic meanings change over time.
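The abstract does not specify the clustering backend, so the following is only a minimal sketch of dynamic term clustering built on keyword co-occurrence: records are grouped into yearly slices, a co-occurrence matrix is built per slice, and terms are clustered by their co-occurrence profiles. The sample records, the choice of agglomerative clustering with cosine distance, and the fixed number of clusters are all illustrative assumptions, not the method of the paper.

```python
# Sketch: per-year term clustering from keyword co-occurrence (assumed setup).
from collections import defaultdict
from itertools import combinations

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical bibliographic records: year + author keywords.
records = [
    {"year": 2018, "keywords": ["text mining", "nlp", "clustering"]},
    {"year": 2018, "keywords": ["text mining", "bibliometrics"]},
    {"year": 2019, "keywords": ["nlp", "semantic analysis", "clustering"]},
    {"year": 2019, "keywords": ["bibliometrics", "semantic analysis"]},
]

def cluster_terms(slice_records, n_clusters=2):
    """Cluster the terms of one time slice by their co-occurrence profiles."""
    terms = sorted({t for r in slice_records for t in r["keywords"]})
    index = {t: i for i, t in enumerate(terms)}
    cooc = np.zeros((len(terms), len(terms)))
    for r in slice_records:
        for a, b in combinations(sorted(set(r["keywords"])), 2):
            cooc[index[a], index[b]] += 1
            cooc[index[b], index[a]] += 1
    labels = AgglomerativeClustering(
        n_clusters=n_clusters, metric="cosine", linkage="average"
    ).fit_predict(cooc + np.eye(len(terms)))  # self-links avoid all-zero rows
    clusters = defaultdict(list)
    for term, label in zip(terms, labels):
        clusters[label].append(term)
    return dict(clusters)

by_year = defaultdict(list)
for r in records:
    by_year[r["year"]].append(r)

for year in sorted(by_year):
    print(year, cluster_terms(by_year[year]))
```

Comparing the resulting cluster memberships across consecutive time slices is one way to observe how the objects behind the terms change over time.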
The paper proposes a method for the semantic search of experts based on a set of texts they have written. A query format that allows a set of required competences to be specified is described. Algorithms for constructing and comparing semantic representations of natural-language text fragments are developed. Based on the proposed model, a prototype of the search system ExpSearch-1 (Experts Search, version 1) has been developed and tested.
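The abstract does not detail how ExpSearch-1 represents or compares texts, so the sketch below only illustrates the general idea of ranking experts against a competence query: each expert is reduced to a single document, and plain TF-IDF with cosine similarity stands in for the semantic representation. All names and sample texts are hypothetical.

```python
# Sketch: ranking experts by text similarity to a competence query (assumed stand-in).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical experts, each represented by the concatenation of their texts.
experts = {
    "expert_a": "graph algorithms static analysis program dependence graphs",
    "expert_b": "topic modelling natural language processing word embeddings",
    "expert_c": "bibliometrics citation analysis research evaluation",
}

def rank_experts(query: str, corpus: dict) -> list:
    """Return experts sorted by cosine similarity between the query and their texts."""
    names = list(corpus)
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(corpus[n] for n in names)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

print(rank_experts("natural language processing embeddings", experts))
```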
The article examines the strategy of “mixing” methods, which has become especially widespread in the Western research tradition. Methods of text analysis are reviewed, and the difference between formalized and non-formalized approaches is demonstrated using the study of the image of contemporary Russia in texts of the American newspaper The New York Times as an example, with particular attention to text-processing algorithms. It is shown that, for studying a phenomenon such as the image of a country, combining formalized and non-formalized approaches to text analysis is a necessary and natural research practice.
Numerous cultural events take place around the world every year. Visitors leave a digital footprint after attending such events, which is a valuable data source for analysing tourist behavior and for cultural studies. This research maps the festival themes associated with the annual cultural event “Museum Night” on VKontakte (VK), the social networking site (SNS) most popular in Russia. All posts containing the official event hashtag in Russian (#ночьмузеев) were collected from VK; more than 38k posts spanning 2012 to 2019 are used in the analysis. The results show the dynamics of the event’s web activity and how it has changed over recent years.
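The abstract does not state how the posts were collected, so the following is only a rough sketch of gathering hashtag posts through the public VK API. It assumes a valid service access token; the method name (newsfeed.search) and the next_from pagination field follow the public VK API documentation, but parameters and limits should be checked against the current API version before use.

```python
# Sketch: paging through VK newsfeed.search results for a hashtag (assumed setup).
import requests

API_URL = "https://api.vk.com/method/newsfeed.search"
TOKEN = "YOUR_VK_SERVICE_TOKEN"  # placeholder, not a real token
API_VERSION = "5.131"

def collect_posts(query: str, max_pages: int = 10) -> list:
    """Collect posts matching the query, following next_from pagination."""
    posts, start_from = [], None
    for _ in range(max_pages):
        params = {"q": query, "count": 200, "access_token": TOKEN, "v": API_VERSION}
        if start_from:
            params["start_from"] = start_from
        response = requests.get(API_URL, params=params).json()["response"]
        posts.extend(response.get("items", []))
        start_from = response.get("next_from")
        if not start_from:
            break
    return posts

posts = collect_posts("#ночьмузеев")
print(len(posts), "posts collected")
```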
A method for detecting code clones is described in detail. The method is based on the semantic analysis of programs and on new algorithms that make it scalable without affecting its accuracy. The proposed method involves two phases. In the first phase, the program dependence graph (PDG) is constructed while the program is compiled; LLVM is used as the compilation infrastructure. In the second phase, similar subgraphs of maximum size that represent code clones are detected. Before the search for similar subgraphs starts, the PDG is divided into subgraphs that are considered potential clones of each other. To keep the search for similar subgraphs scalable, a composition of algorithms is used. The first algorithm checks, in linear time, whether a pair of graphs cannot have similar subgraphs of the desired size. If this check cannot rule the pair out, another (approximate) algorithm is executed to find similar subgraphs of maximum size. After similar subgraphs have been found, the program code is additionally checked for the position of the code lines corresponding to the detected clone candidates. Tests showed that the developed tool is more accurate than similar tools, such as MOSS, CCFinder, and CloneDR. Results obtained for the projects Linux-2.6, Mozilla Firefox, LLVM/Clang, and OpenSSL are presented.
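The abstract does not spell out the linear-time filter, so the toy check below is only an assumed stand-in that illustrates the idea: if two dependence graphs cannot share enough nodes of matching operation types, they cannot contain similar subgraphs of the desired size and the pair can be pruned before the expensive subgraph search.

```python
# Sketch: a linear-time necessary condition for a pair of PDGs to share a clone
# of at least min_size nodes (illustrative only, not the tool's actual filter).
from collections import Counter

def may_share_clone(node_types_a: list, node_types_b: list, min_size: int) -> bool:
    """True if both graphs have at least min_size nodes with matching types."""
    overlap = Counter(node_types_a) & Counter(node_types_b)  # multiset intersection
    return sum(overlap.values()) >= min_size

# Node "types" stand for PDG vertex kinds (e.g. LLVM instruction opcodes).
pdg_a = ["load", "add", "store", "br", "load", "mul"]
pdg_b = ["load", "add", "store", "icmp", "br"]
print(may_share_clone(pdg_a, pdg_b, min_size=4))  # True: the pair must be searched
print(may_share_clone(pdg_a, pdg_b, min_size=6))  # False: the pair is pruned early
```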