It is intuitively clear that the search for scientific publications often has many characteristics of a research search. The purpose of this paper is to formalize this intuitive understanding, to explore which research tasks of scientists can be attributed to research search, what approaches exist for solving the research search problem in general, and how these approaches are implemented in search engines specialized for scientists. We reviewed existing work on the information-seeking behavior of scientists and on a special kind of search called exploratory search. Several types of search are typical for scientists, and we show that most of them are exploratory. Exploratory search differs from ordinary information retrieval and demands special support from search systems. We analyzed seventeen current academic search systems (from Google Scholar, Scopus, and Web of Science to ResearchGate) with respect to their support for exploratory search. We found that most of them do not go far beyond simple information retrieval and that there is room for further improvement, especially in support for collaborative search.
This work is devoted to methods for detecting the age of social network users. Social networks allow users to fill in profiles that may contain an age. Since profiles are not completely filled in, the task of detecting unknown attributes arises. Both explicit and predicted values are used in recommender and marketing systems. Moreover, predicted values can be used to determine the demographic profiles of online communities and to infer the target audience of Internet marketing campaigns. In this paper, a method for predicting missing age values is proposed. The method uses the following information available from the social network: the explicitly specified ages of users and the social graph. The graph contains nodes representing users and communities. A community is a special page that unites users by interests. Friendship relations between users and subscriptions of users to communities are represented as edges of the social graph. The method is based on label propagation in the friendship and subscription graphs. Users' ages are represented by labels that are propagated through the graph. The scheme of the algorithm is as follows: initialize user labels according to explicitly filled profiles; build a vector model that contains distributions of the neighbours' ages grouped by user age; compute the weights of users and communities and propagate labels to communities; rebuild the vector model taking the computed weights into account; propagate labels to users who have not filled in the age field of their profiles. The paper describes the algorithm and presents experimental results showing that friendship relations are more useful for age prediction in a social network than communities.
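The core idea of propagating age labels over the friendship graph can be sketched as follows. This is a minimal illustration only, assuming a single unweighted averaging pass; the paper's full method additionally weights users and communities and works with age distributions. The function name and data layout are hypothetical.

```python
# A minimal sketch of one label-propagation step in the friendship graph.
# Known ages stay fixed; missing ages are estimated from friends' ages.

def propagate_ages(friends, known_ages):
    """Predict missing ages as the average of friends' known ages.

    friends    -- dict: user id -> list of friend ids
    known_ages -- dict: user id -> age (only users who filled the profile)
    """
    predicted = {}
    for user, user_friends in friends.items():
        if user in known_ages:
            continue  # keep explicitly filled profile values
        ages = [known_ages[f] for f in user_friends if f in known_ages]
        if ages:
            predicted[user] = sum(ages) / len(ages)
    return predicted
```

In the full algorithm this step would be repeated, with edge weights and community nodes taken into account, rather than applied once.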
Nowadays, the OpenStack platform is a leading solution in the cloud computing field. Keystone, the OpenStack Identity Service, is one of its major components. This service is responsible for the authentication and authorization of the users and services of the system. Keystone is a high-load service, since all interactions between services pass through it. This leads to the following problem: increasing the number of cloud service users results in significant load growth. In this paper, we demonstrate the problem of Keystone performance degradation under constant load. To find the source of the problem, we tested Keystone with different backends (PostgreSQL, MariaDB) and frontends (Apache2, nginx), keeping the database on different storage hardware (HDD, SSD, and tmpfs in RAM). Using different configuration sets is necessary because it helps narrow down the possible causes of the degradation. The tests were conducted with Rally. In all test cases we observed unexpectedly rapid degradation under relatively light load. We also implemented a mock service that performs the simplest Keystone tasks; this service turned out to be much faster than Keystone. The problem with Keystone may be related either to its internal logic implementation or to incorrect interaction with other components; this is the subject of further research.
The paper is devoted to methods for constructing the socio-demographic profiles of Internet users. Gender, age, political and religious views, region, and relationship status are examples of demographic attributes. This work is a survey of methods that detect demographic attributes from a user's profile and messages. Most of the surveyed works are devoted to gender detection; age, political views, and region also interest researchers. The most popular data sources for demographic attribute extraction are social networks such as Facebook, Twitter, and YouTube. Most of the solutions are based on supervised machine learning. Machine learning makes it possible to find how the target values (demographic attributes) depend on the input data and to use these dependencies to predict the value of the target attribute for new data. The following problem-solving steps are surveyed in the paper: feature extraction, feature selection, model training, and evaluation. Researchers use different kinds of data to predict demographic attributes. The most popular data source is text. Word sequences (n-grams), parts of speech, emoticons, and features specific to particular resources (e.g., @-mentions and #-hashtags on Twitter) are extracted and used as input for machine learning algorithms. Social graphs are also used as source data: communities of users that are automatically extracted from the social graph are used as features for attribute prediction. Text data produces a large number of features, so feature selection algorithms are needed to reduce the feature space. The paper surveys feature selection, classification, and regression algorithms, as well as evaluation metrics.
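The pipeline sketched in the survey (n-gram feature extraction followed by a trained classifier) can be illustrated with a deliberately simplified example. This is not any surveyed system: the toy "classifier" below just scores labels by feature-count overlap, standing in for the supervised learners the survey covers, and all names and training samples are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(text, n):
    # split into lowercase words and join consecutive runs of length n
    words = text.lower().split()
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

def extract_features(text):
    # word unigrams and bigrams, a common feature set in the surveyed works
    return ngrams(text, 1) + ngrams(text, 2)

def train(samples):
    """samples: list of (text, label). Returns per-label feature counts."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(extract_features(text))
    return counts

def predict(counts, text):
    # score each label by how often it has seen the text's features
    feats = extract_features(text)
    scores = {label: sum(c[f] for f in feats) for label, c in counts.items()}
    return max(scores, key=scores.get)
```

A real system would replace the counting step with feature selection and a proper classifier (e.g., logistic regression or SVM), but the extract/train/predict structure is the same.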
Finite state transducers extend finite state automata to model functions on strings or lists. They may also be used as simple models of sequential reactive programs. Such programs operate in permanent interaction with the environment, receiving data (requests) from it. On receiving a piece of data, the program performs a sequence of actions, and when certain control points are reached it outputs the current result of the computation as a response. It is significant that different sequences of actions may yield the same result. Therefore, the basic actions of a program may be viewed as generating elements of some appropriate semigroup, and the result of a computation may be regarded as the composition of the actions performed by the program. This paper offers an alternative technique for the analysis of finite state transducers over semigroups. To check the equivalence of two initial transducers, we associate with them a labeled transition system (LTS). Each path in this LTS represents all possible runs of the initial transducers on the same input word. Every node of the LTS keeps track of the states the initial transducers reach after reading some input word and of the deficiency of the output words computed so far. If both transducers reach their final states and the deficiency of their outputs is nonzero, this indicates that the initial transducers produce different images for the same word and, hence, are not equivalent. The nodes of the LTS that capture this effect are called rejecting nodes. Thus, equivalence checking of the initial transducers is reduced to checking the reachability of rejecting nodes in the LTS. We show that one needs to analyze only a bounded fragment of the LTS to certify the (un)reachability of rejecting nodes. The size of this fragment is polynomial in the size of the initial transducers if both transducers are deterministic, and single-exponential if they are k-bounded.
The same approach is applicable to checking the k-valuedness of transducers over semigroups.
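For the special case of deterministic transducers over the free monoid of strings, the LTS-based check can be sketched as follows. This is an illustrative simplification, not the paper's general semigroup construction: nodes pair the two transducers' states with the reduced output deficiency, and `max_lag` is a crude stand-in for the polynomial bound on the explored fragment. All names and the transducer encoding are hypothetical.

```python
from collections import deque

def equivalent(t1, t2, alphabet, max_lag=16):
    """Check equivalence of two deterministic string transducers.

    A transducer is (initial_state, finals, delta) with
    delta[state][symbol] = (next_state, output_string).
    Explored LTS nodes are (q1, q2, u, v), where (u, v) is the
    deficiency: the suffix of one output not yet matched by the other
    (after reduction, at most one of u, v is nonempty).
    """
    init1, finals1, d1 = t1
    init2, finals2, d2 = t2
    start = (init1, init2, '', '')
    seen = {start}
    queue = deque([start])
    while queue:
        q1, q2, u, v = queue.popleft()
        if (q1 in finals1) != (q2 in finals2):
            return False  # the accepted domains differ
        if q1 in finals1 and q2 in finals2 and u != v:
            return False  # rejecting node: nonzero deficiency at finals
        for a in alphabet:
            if (a in d1.get(q1, {})) != (a in d2.get(q2, {})):
                return False  # one transducer reads a, the other cannot
            if a not in d1.get(q1, {}):
                continue
            p1, w1 = d1[q1][a]
            p2, w2 = d2[q2][a]
            s1, s2 = u + w1, v + w2
            # strip the common prefix; keep only the deficiency
            k = 0
            while k < len(s1) and k < len(s2) and s1[k] == s2[k]:
                k += 1
            s1, s2 = s1[k:], s2[k:]
            if s1 and s2:
                return False  # outputs diverge on a common input
            if max(len(s1), len(s2)) > max_lag:
                return False  # deficiency grows without bound
            node = (p1, p2, s1, s2)
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return True
```

Reaching a node where both states are final and the deficiency is nonzero corresponds exactly to the rejecting nodes described above.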
The presented overview is concerned with lexical query optimization and covers papers published over the last four decades. The paper consists of five sections. The first section, the Introduction, classifies query optimization techniques into semantic optimizations and lexical optimizations. Semantic optimizations usually rely on data integrity rules that are stored within the metadata part of a database, and on data statistics; this kind of optimization is discussed in many textbooks and papers. Lexical optimizations (more often called rewriting) use only the text of a query and no other information about the database and its structure. Lexical optimizations are further classified into query transformations, query amelioration, and query reduction. The second section discusses query transformation techniques such as predicate pushdown and the transformation of nested queries into queries with joins. Query amelioration is the topic of the third section, with a focus on magic set optimizations. The fourth section covers query reduction optimizations; it briefly describes traditional approaches (such as tableau-based ones) and considers in more detail three new algorithms proposed by the authors. The fifth section concludes the paper.
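Predicate pushdown, one of the query transformations discussed in the second section, can be illustrated on a toy relational-algebra tree. The node encoding, function name, and column-ownership test below are illustrative assumptions, not taken from any optimizer described in the overview.

```python
# A minimal sketch of predicate pushdown: a Select over a Join is moved
# below the Join when the predicate references columns of only one operand.

def push_down(node, columns_of):
    """Rewrite the query tree, pushing Select nodes through Join nodes.

    node: ('select', pred_cols, child) | ('join', left, right) | ('table', name)
    columns_of: dict mapping table name -> set of its columns
    """
    def cols(n):
        if n[0] == 'table':
            return columns_of[n[1]]
        if n[0] == 'join':
            return cols(n[1]) | cols(n[2])
        return cols(n[2])  # select: same columns as its child

    if node[0] == 'select':
        _, pred_cols, child = node
        child = push_down(child, columns_of)
        if child[0] == 'join':
            left, right = child[1], child[2]
            if set(pred_cols) <= cols(left):
                return ('join', push_down(('select', pred_cols, left), columns_of), right)
            if set(pred_cols) <= cols(right):
                return ('join', left, push_down(('select', pred_cols, right), columns_of))
        return ('select', pred_cols, child)
    if node[0] == 'join':
        return ('join', push_down(node[1], columns_of), push_down(node[2], columns_of))
    return node
```

The benefit is that the selection filters rows before the join, shrinking the join's input; real optimizers apply the same rule on full predicate expressions rather than bare column lists.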