Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect
In this paper we consider an association problem with constraints for two dynamically growing tables. We consider a base full association algorithm and propose a partial association algorithm that improves on the efficiency of the base algorithm. We implement and evaluate both algorithms in Apache Spark for a particular case on a cluster with the Angara interconnect.
The Semantic Evaluation (SemEval) series of workshops focuses on the evaluation and comparison of systems that can analyse diverse semantic phenomena in text with the aim of extending the current state of the art in semantic analysis and creating high quality annotated datasets in a range of increasingly challenging problems in natural language semantics. SemEval provides an exciting forum for researchers to propose challenging research problems in semantics and to build systems/techniques to address such research problems. SemEval-2016 is the tenth workshop in the series of International Workshops on Semantic Evaluation Exercises. The first three workshops, SensEval-1 (1998), SensEval-2 (2001), and SensEval-3 (2004), focused on word sense disambiguation, each time growing in the number of languages offered, in the number of tasks, and also in the number of participating teams. In 2007, the workshop was renamed to SemEval, and the subsequent SemEval workshops evolved to include semantic analysis tasks beyond word sense disambiguation. In 2012, SemEval turned into a yearly event. It currently runs every year, but on a two-year cycle, i.e., the tasks for SemEval-2016 were proposed in 2015. SemEval-2016 was co-located with the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’2016) in San Diego, California. 
It included the following 14 shared tasks organized in five tracks:
• Text Similarity and Question Answering Track
– Task 1: Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation
– Task 2: Interpretable Semantic Textual Similarity
– Task 3: Community Question Answering
• Sentiment Analysis Track
– Task 4: Sentiment Analysis in Twitter
– Task 5: Aspect-Based Sentiment Analysis
– Task 6: Detecting Stance in Tweets
– Task 7: Determining Sentiment Intensity of English and Arabic Phrases
• Semantic Parsing Track
– Task 8: Meaning Representation Parsing
– Task 9: Chinese Semantic Dependency Parsing
• Semantic Analysis Track
– Task 10: Detecting Minimal Semantic Units and their Meanings
– Task 11: Complex Word Identification
– Task 12: Clinical TempEval
• Semantic Taxonomy Track
– Task 13: TExEval-2 – Taxonomy Extraction
– Task 14: Semantic Taxonomy Enrichment
This volume contains both Task Description papers that describe each of the above tasks and System Description papers that describe the systems that participated in the above tasks. A total of 14 task description papers and 198 system description papers are included in this volume.
We are grateful to all task organisers as well as the large number of participants whose enthusiastic participation has made SemEval once again a successful event. We are thankful to the task organisers who also served as area chairs, and to task organisers and participants who reviewed paper submissions. These proceedings have greatly benefited from their detailed and thoughtful feedback. We also thank the NAACL 2016 conference organizers for their support. Finally, we most gratefully acknowledge the support of our sponsor, the ACL Special Interest Group on the Lexicon (SIGLEX).
The SemEval-2016 organizers, Steven Bethard, Daniel Cer, Marine Carpuat, David Jurgens, Preslav Nakov and Torsten Zesch
In many areas, such as social science, politics or market research, people need to track sentiment and its changes over time. For sentiment analysis in these fields it is more important to correctly estimate the proportion of each sentiment expressed in a set of documents (the quantification task) than to accurately estimate the sentiment of a particular document (classification). Our study aimed to analyze the effectiveness of two iterative quantification techniques and to compare it with that of baseline methods. All the techniques are evaluated using a set of synthesized data and the SemEval-2016 Task 4 dataset. We made the quantification methods from this paper available as an open source Python library. The results of the comparison and possible limitations of the quantification techniques are discussed.
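The difference between classification and quantification can be illustrated with a minimal sketch. It assumes binary labels (1 for positive sentiment, 0 otherwise) and shows two baselines that are common in the quantification literature, classify-and-count and adjusted count. Whether these are the exact baselines evaluated in the study is an assumption; the function names, the sample predictions, and the classifier rates (`tpr`, `fpr`) are hypothetical and chosen only for illustration.

```python
def classify_and_count(predictions):
    """Estimate the positive share as the fraction of documents predicted positive."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the classify-and-count estimate using the classifier's
    true positive rate (tpr) and false positive rate (fpr), which are
    assumed to have been measured on held-out labeled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined here, so fall back to the raw estimate.
        return cc
    adjusted = (cc - fpr) / (tpr - fpr)
    # A proportion must lie in [0, 1].
    return min(1.0, max(0.0, adjusted))


# Hypothetical classifier output for ten documents: 3 predicted positive.
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_count(predictions, tpr=0.8, fpr=0.1)
```

Under these assumed rates the adjusted estimate differs from the raw count, which is the point of treating quantification as a task separate from per-document classification.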
During the last decade, the concept of the Knowledge Triangle (KT), in the form of change processes that foster greater interaction between education, research and innovation activities, has left the academic community and diffused to the higher education and research policy arena. As a result, numerous policy measures have been developed and implemented with the aim of strengthening interaction between the different sides of the knowledge triangle. Similarly, structured and systematic efforts have been made to describe and understand the important role of universities in the innovation landscape. Universities fulfil numerous missions, but they also face the challenge of meeting the diverging expectations of different stakeholders. Furthermore, this challenge is complicated by the fact that universities and their surrounding environments are not static but co-develop continuously. The book presents a number of case studies showing how universities react to these changing conditions. It shows examples of aligning universities to the Knowledge Triangle.
Apache Spark is one of the most popular Big Data frameworks. Performance evaluation of Big Data frameworks is a topic of interest due to the increasing number and importance of data analytics applications within the context of HPC and Big Data convergence. In this paper we present an early performance evaluation of a typical supervised graph anomaly detection problem implemented using the GraphX and MLlib libraries in Apache Spark on a cluster.
LoRaWAN is a relatively new protocol designed to provide cheap and reliable wireless connectivity in various Internet of Things scenarios. Being a Low Power Wide Area Network technology operating in the ISM band, it rapidly gained popularity in both the industrial and academic communities. A literature review shows that, in spite of numerous studies of its PHY layer, the MAC layer has received little attention, even though it has multiple issues that limit its performance. However, as LoRaWAN is designed to support networks of thousands of devices, it is crucial not only to consider the performance of this technology in point-to-point scenarios, but also to evaluate its applicability in highly populated networks.
Because of the lack of data on cash flows, it is impossible to use traditional measures of return such as IRR and TVPI for evaluating the performance of private equity funds in emerging markets.
In this study, we propose an approach based on the use of adjusted rates of return for PE funds, which can be implemented without data on the cash flows and net assets of the funds. The proposed indicators can be calculated on the basis of publicly available data on the portfolio transactions of a fund.
The study presents a methodology based on the performance of private equity portfolio transactions, as well as an analysis of empirical data on a sample of 1957 deals in BRIC countries from 2000 to 2012.
The results of the empirical analysis largely support a number of fundamental characteristics of PE funds previously identified for developed capital markets, such as:
1. Private equity deals in developing countries are riskier assets than traditional instruments.
2. The return on the majority of transactions is below the return of the stock market; however, the most successful transactions are significantly ahead of the market.
3. The β coefficient of buyout funds is less than one, indicating low exposure to systematic risk.
Some characteristics were confirmed only in part:
1. The investments of venture capital funds have a β coefficient greater than one for the markets of Brazil and India, and less than one for Russia and China.
2. Return on investment is higher for buyout funds than for venture capital funds in Russia and China. In India and Brazil the result is the opposite.
The rest of the characteristics are fundamentally different from those identified in developed capital markets:
1. The period of ownership for private equity fund investments in developing countries is shorter than in developed countries, averaging 3.3 years.
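The β coefficient referred to in the findings above is conventionally defined as the covariance of an asset's returns with the market's returns divided by the variance of the market's returns. The following is a minimal sketch of that textbook definition only; how the study itself estimates β from deal-level data is not stated here, and the function name and the return series are hypothetical.

```python
def beta(asset_returns, market_returns):
    """Textbook beta: sample covariance of asset and market returns
    divided by the sample variance of market returns."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length >= 2")
    mean_asset = sum(asset_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum(
        (a - mean_asset) * (m - mean_market)
        for a, m in zip(asset_returns, market_returns)
    ) / (n - 1)
    market_variance = sum((m - mean_market) ** 2 for m in market_returns) / (n - 1)
    return covariance / market_variance


# Hypothetical periodic returns, for illustration only.
market = [0.02, -0.01, 0.03, 0.00]
# An asset that moves half as much as the market, plus a constant premium.
asset = [0.5 * m + 0.01 for m in market]

estimated_beta = beta(asset, market)
```

A value below one, as in this constructed series, means the asset's returns move less than proportionally with the market; a value above one means they move more than proportionally.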
The article demonstrates that crimes that come to the attention of the criminal police have varying worth in the eyes of Russian policemen and, consequently, attract unequal efforts. The worth of crimes is closely related to the criteria for evaluation of police performance. The data, derived from 12 in-depth interviews with Russian police officers, nine in-depth interviews with senior students of Moscow University of Russian Interior Ministry who are undergoing practice within police departments, and online discussions within the police community, show that policemen in Russia make their practical decisions while balancing between multiple orders of worth. The theoretical framework for data interpretation combines theories of valuation with the institutional logics approach. Operationalized as a set of cultural rules and expectations defining legitimate grounds for assessing and determining what rational behavior in a given organizational context really is, the concept of institutional logics stresses the interrelations between value-oriented and material dimensions of social action and also allows one to highlight the hierarchy and constant competition between various orders of worth in an organization. Four institutional logics (state, clan, quasi-market, and professional) are empirically identified. Each of them brings its own order of worth to the police organizational environment. Crimes in the eyes of the police always have a price, expressed in either “checkmarks,” points of recognition by the boss or colleagues, or money. The data suggest that, despite the hierarchy between the orders of (crimes’) worth within the police system as a whole, in each case institutional logics and the criteria of worth related to them compete with each other. Depending on the characteristics of the criminal case and the situation in the police department at a given moment, the competition between various orders of worth is resolved by policemen in different ways.
The results of the study shed light on the functioning of police discretion and help to accentuate the dysfunctional side of police reform in Russia.