The work is devoted to methods of social network users’ age detection. Social networks allow users to fill their profiles that may contain an age. Profiles are not fully filled, so the task of unknown attributes detection arises. Explicit and predicted values are used in recommender and marketing systems. Moreover, the predicted values can be used for determining online communities’ demographic profiles and for inferring the target audience of marketing campaigns in the Internet. In this paper a method for predicting unfilled age values is proposed. The method uses the following information available from the social network: explicit users’ ages and social graph. The graph contains nodes representing users and communities. Community is the special page in the Internet that unites users on interests. Friendship relations between users and subscriptions of users on communities represented as edges of the social graph. The method is based on the label propagation in the friendship and subscription graphs. Ages of the users are representd by labels that are propagated in the graph. The scheme of the algorithm is following: initialize user labels according to explicit profiles; build vector model that contains distributions of the neighbours’ ages grouped by user age; compute weights of users and communities, propagate labels to communities; build vector model considering calculated weights; propagate labels to users that have not filled their age in the profile. The paper describes the algorithm and contains experimantal results showing that friendship relations are more useful for age prediction in the social network than communities.
Nowadays OpenStack platform is a leading solution in cloud computing field. Keystone, the OpenStack Identity Service, is one of its major components. This service is responsible for authentication and authorization of users and services of the system. Keystone is a high-load service since all interactions between services happen through it. This leads us to the following problem: increasing the number of cloud service users results to significant load growth. In this paper we demonstrate the problem of Keystone performance degradation during constant load. In order to find source of the problem we have tested Keystone with different backends (PostgreSQL, MariaDB), frontends (Apache2, ngnix) and keeping the database on different hardware (HDD, SSD and tmpfs on RAM). Using different configuration sets is necessary because it helps us to narrow the amount of possible reasons causing degradation. Tests were conducted with Rally. As a result, in all test cases we have seen inadequate quick degradation under relatively light load. We have also implemented a mock service which represents the simplest Keystone tasks. Our service turned out to be much faster than Keystone. The problem with Keystone might be related to either its internal logic implementation or incorrect interaction with other components; it is the subject of further research.
he paper is devoted to methods for construction of socio-demographic profile of Internet users. Gender, age, political and religion views, region, relationship status are examples of demographic attributes. This work is a survey of methods that detect demographic attributes from user’s profile and messages. The most of surveyed works are devoted to gender detection. Age, political views and region are also interested researches. The most popular data sources for demographic attributes extraction are social networks, such as Facebook, Twitter, Youtube. The most of solutions are based on supervised machine learning. Machine learning allows to find target values (demographic attributes) dependencies from input data and use them to predict the value of the target attribute for the new data. The following problem solving steps are surveyed in the paper: feature extraction, feature selection, model training, evaluation. Researches use different kind of data to predict demographic attributes. The most popular data source is text. Words sequences (n-grams), parts of speech, emoticons, features specific to particular resources (eg, @ mentions and # Hashtags on Twitter) are extracted and used as input for machine learning algorithms. Social graphs are also used as source data. Communities of users that are automatically extracted from social graph are user as features for attributes prediction. Text data produces a lot of features. Feature selection algorithms are needed to reduce feature space. The paper surveys feature selection, classification and regression algorithms, evaluation metrics.
Finite state transducers extend the finite state automata to model functions on strings or lists. They may be used also as simple models of sequential reactive programs. These programs operate in the interaction with the environment permanently receiving data (requests) from it. At receiving a piece of data such program performs a sequence of actions. When certain control points are achieved a program outputs the current results of computation as a response. It is significant that different sequences of actions may yield the same result. Therefore, the basic actions of a program may be viewed as generating elements of some appropriate semigroup, and the result of computation may be regarded as the composition of actions performed by the program. This paper offers an alternative technique for the analysis of finite state transducers over semigroups. To check the equivalence of two initial transducers we associate with them a Labeled Transition Systems (LTS). Each path in this LTS represents all possible runs of the initial transducers on the same input word. Every node of the LTS keeps track of the states of the initial transducers achieved at reading some input word and the deficiency of the output words computed so far. If both transducers reach their final states and the deficiency of their outputs is nonzero then this indicates that the initial transducers produce different images for the same word, and, hence, they are not equivalent. The nodes of the LTS that capture this effect are called rejecting nodes. Thus, the equivalence checking of the initial transducers is reduced to checking the reachability of rejecting nodes in the LTS. We show that one needs to analyze only a bounded fragment of the LTS to certify (un)reachability of rejecting nodes. The size of this fragment is polynomial of the size of the initial transducers if both transducers are deterministic, and single-exponential if they are k-bounded. The same approach is applicable for checking k-valuedness of transducers over semigroups.
The presented overview is concerned with lexical query optimization and covers papers published in the last four decades. The paper consists of five sections. The first section – Introduction – classifies query optimization techniques into semantic optimizations and lexical optimizations. Semantic optimizations usually relies on data integrity rules that are stores within metadata part of databases, and on data statistics. This kind of optimizations is discussed in many textbooks and papers. Lexical optimizations (more often called rewriting) use only a text of query and no other information about database and its structure. Lexical optimizations are further classified into query transformations, query amelioration, and query reduction. The second section of the paper discusses techniques of query transformation such as predicate pushdown, transformation of nested query into query with joins, etc. Query amelioration is a topic of the third section with a focus on magic set optimizations. The forth section covers query reduction optimizations. The section briefly describes traditional approaches (such as tableau-based) and considers in more details three new algorithms proposed by authors. The fifth section concludes the paper.
In 2005, I wrote an article in which I discussed the most important features of the standards ODMG 3.0 (ODMG object model) and the SQL:2003 (SQL data model) and convincingly (as it seemed to me) argued that the similarity between the object model and object extensions to SQL is purely external, that close in mind syntactic constructions hide deep differences at the model level. Since then, it took many years for which I understood many things that were wrongly or insufficient correctly understood by me then, and gradually came to the conclusions that: 1. differences that seemed to me deep, are not such, and indeed are not differences at the model level; 2. the object extensions of SQL provide no less (and rather large) capabilities than the ODMG object model; 3. reasonably (from the standpoint of the database community) used DBMSes based on the ODMG data model, one will create databases and tools to manipulate them similar to those prescribed by the SQL data model.
The paper discusses an optimization approach for external calls in positionindependent code that is based on loading the callee address immediately at the call site from the Global Offset Table (GOT), avoiding the use of the Procedure Linkage Table (PLT). Normally the Linux toolchain creates the PLT both in the main executable (which comprises position-dependent code and has to rely on the PLT mechanism to make external calls) and in shared libraries, where the PLT serves to implement lazy binding of dynamic symbols, but is not required otherwise. However, calls via the PLT have some overhead due to an extra jump instruction and poorer instruction cache locality. On some architectures, binary interface of PLT calls constrains compiler optimization at the call site. It is possible to avoid the overhead of PLT calls by loading the callee address from the GOT at the call site and performing an indirect call, although it prevents lazy symbol resolution and may cause increase in code size. We implement this code generation variant in GCC compiler for x86 and ARM architectures. On ARM, loading the callee address from the GOT at call site normally needs a complex sequence with three load instructions. To improve that, we propose new relocation types that allow to build a PC-relative address of a given GOT slot with a pair of movt, movw instructions, and implement these relocation types in GCC and Binutils (assembler and linker) for both ARM and Thumb-2 modes. Our evaluation results show that proposed optimization yields performance improvements on both x86 (up to 12% improvement with Clang/LLVM built with multiple shared libraries, on big translation units) and ARM (up to 7% improvement with SQLite, average over several tests), even though code size on ARM also grows by 13-15%.
The paper describes the workflow for optimizing programs for performance targeting the fixed hardware architecture with static compilation using GCC and LLVM compilers as examples. The workflow has gradually grown within ISP RAS Compiler Technology Team when working on GCC and LLVM compiler optimization. We start with preparing a benchmark using the given application as a source, and then proceed in parallel with manual analysis of generated assembly code and automatic compiler tuning tool. In general, a compiler optimization improvement produced by the manual analysis gives 1-5% speedup, while the automatic tuning results may give up to 10-20% speedup. However, the manual analysis results are usually valid for the broad class of applications and are contributed to the open source compilers, while the automatic tuning results make sense only for the given application. We present some of the optimizations performed, e.g. improved NEON and Thumb-2 support for GCC, vectorization improvements for LLVM, register allocation improvements for LLVM, and the corresponding evaluation results. We also describe TACT, a tool for automatic compiler tuning for the given application mentioned above, and its example use cases both for an application developer and a compiler engineer. We give the sample of TACT optimization results.