Formalizing the content of films using textual information
Many semantic text analysis problems employ string-to-text relevance measures, and the research paper annotation problem is no exception. In general, research papers are annotated according to a system of topics organized as a taxonomy, that is, a hierarchy of topics (or concepts). For example, papers published in journals of the Association for Computing Machinery (ACM), the most influential organization in the Computer Science world, are annotated according to the ACM Computing Classification System (ACM CCS) taxonomy. String-to-text relevance measures can be used to automate research paper annotation, since taxonomy topics are strings and research papers, or any of their constituents, are texts. A relevance measure maps a string–text pair to a real number. The meaning of the mapping depends on the relevance model under consideration, but under any model, the higher the relevance value, the stronger the association between the string and the text. This paper explores the use of phrase-to-text relevance measures to annotate research papers in Computer Science with key phrases taken from the ACM CCS. Three phrase-to-text relevance measures are experimentally compared in this setting: (a) the cosine relevance score between conventional vector space representations of the texts coded with tf-idf weighting; (b) BM25, a popular characteristic of the probability of “elite” term generation; and (c) CPAMF, a characteristic of the symbol conditional probability averaged over matching fragments in suffix trees representing texts and phrases, introduced by the authors. Our experiment is conducted over a set of texts published in ACM journals and manually annotated by their authors using topics from the ACM CCS. Applying any of the relevance measures to an article yields a list of taxonomy topics sorted in descending order of their relevance values.
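Measure (a) can be illustrated with a minimal sketch. It assumes whitespace tokenization and a smoothed idf, neither of which is specified in the abstract; the function names and the sample text and topics are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed idf, so terms found in every document keep weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_topics(text, topics):
    """Sort taxonomy topics by descending cosine relevance to the text."""
    docs = [tokenize(text)] + [tokenize(t) for t in topics]
    vecs = tfidf_vectors(docs)
    scored = [(cosine(vecs[0], v), t) for v, t in zip(vecs[1:], topics)]
    return [t for _, t in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical article fragment and topic strings, for illustration only.
text = "we study suffix trees for string matching in information retrieval systems"
topics = ["computer graphics", "information retrieval", "operating systems"]
ranking = rank_topics(text, topics)
```

Here `ranking` is the relevance-sorted topic list for one article, which is the kind of output that is later compared with the manually assigned topics.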
The results are evaluated by comparing these sorted lists with the lists of topics assigned to the articles manually. The higher a manually assigned topic is placed in a relevance-based sorted list of topics, the more accurate the sorted list is. The accuracy of the computational annotations is scored using three different scoring functions: (a) MAP, (b) nDCG, and (c) Intersection at k, where (a) and (b) are taken from the literature and (c) is introduced by the authors. CPAMF appears to outperform both the cosine measure and BM25 by a wide margin on all three scoring functions.
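The two scoring functions taken from the literature can be sketched as follows, assuming binary relevance (a topic is either manually assigned or not) and the standard textbook definitions of MAP and nDCG. The exact variants used in the paper are not stated in the abstract, and Intersection at k is omitted because its definition is not given here. The function names and sample data are hypothetical.

```python
import math


def average_precision(ranked, relevant):
    """Average precision of one ranked topic list against the manual topics."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for position, topic in enumerate(ranked, start=1):
        if topic in relevant:
            hits += 1
            total += hits / position  # precision at this hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, gold):
    """MAP: average precision averaged over all articles."""
    scores = [average_precision(r, g) for r, g in zip(rankings, gold)]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg(ranked, relevant):
    """Normalized discounted cumulative gain with binary gains."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, topic in enumerate(ranked, start=1)
        if topic in relevant
    )
    # Ideal ranking: all manually assigned topics placed at the top.
    ideal_hits = min(len(relevant), len(ranked))
    ideal = sum(1.0 / math.log2(p + 1) for p in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical ranked list and manual annotation for a single article.
ranked = ["a", "b", "c", "d"]
manual = {"a", "c"}
ap = average_precision(ranked, manual)
gain = ndcg(ranked, manual)
```

Both scores reach 1.0 only when every manually assigned topic sits at the top of the sorted list, which matches the accuracy criterion described above.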
Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.
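To make the notion of a concept lattice concrete, the sketch below enumerates the formal concepts of a tiny paper-by-term context. It is a naive illustration of the standard FCA definitions, not the CORDIET or Lucene pipeline described above; the function name, papers and thesaurus terms are hypothetical.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a context.

    `context` maps each object (a paper) to its set of attributes (terms).
    Every concept intent is an intersection of object intents, so the
    intents are built by closing the full attribute set under intersection.
    """
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o, a in context.items() if intent <= a)
        concepts.append((extent, intent))
    # Largest extents first, i.e. from the top of the lattice downwards.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


# Hypothetical context: which thesaurus terms occur in which paper.
context = {
    "paper1": {"FCA", "IR"},
    "paper2": {"FCA", "text mining"},
    "paper3": {"FCA", "IR", "text mining"},
}
concepts = formal_concepts(context)
```

Each resulting pair groups the papers that share exactly one closed set of terms; ordering these pairs by inclusion of their extents gives the concept lattice.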
Institutions affect investment decisions, including investments in human capital. Hence institutions are relevant for the allocation of talent. Good market-supporting institutions attract talent to productive value-creating activities, whereas poor ones raise the appeal of rent-seeking. We propose a theoretical model that predicts that more talented individuals are particularly sensitive in their career choices to the quality of institutions, and test these predictions on a sample of around 95 countries of the world. We find a strong positive association between the quality of institutions and graduation of college and university students in science, and an even stronger negative correlation with graduation in law. Our findings are robust to various specifications of empirical models, including smaller samples of former colonies and transition countries. The quality of human capital makes the distinction between educational choices under strong and weak institutions particularly sharp. We show that the allocation of talent is an important link between institutions and growth.
Tourism in St. Petersburg, a major cultural centre, has developed in terms of tourist flow, and both tourism types and tourist products have become more diverse. These improvements give grounds for a fairly optimistic outlook for the tourist industry in St. Petersburg. At the same time, a number of factors tend to endanger the sustainable development of tourism in the city. Among these factors we can single out the pronounced seasonal character of tourism, the short duration of most tourist visits, and the rather conservative, academic image of St. Petersburg culture, which compromises the city’s appeal as a destination for certain tourist segments. Apart from that, the critical limitation on the development of cultural tourism in general, and of creative tourism in particular, is the low involvement of the population in cultural and tourist events held in the city. The current situation calls for a more flexible and innovative approach to industry development. All in all, this makes it relevant to look for new approaches to creative tourism development in St. Petersburg as an important tool for the sustainable development of the industry.
The article considers the existing and potential competitive advantages of St. Petersburg as a tourist destination on the basis of creative tourism development.