Book
Actual Problems of Systems and Software Engineering APSSE 2019 (Invited Papers)
The volume consists of invited papers of the Sixth International Conference “Actual Problems of System and Software Engineering” (APSSE-2019). The Conference was held at the National Research University “Higher School of Economics” from November 12 to November 14, 2019 in Moscow, Russia.
The conference is a traditional meeting of specialists in the field of system and software engineering as well as Big Data based information and analytical systems. Traditionally, the conference takes place once every two years. Attendees come from leading universities in Moscow, St. Petersburg, Tomsk, Penza, Magnitogorsk, Omsk and other cities, as well as from their customers in the IT, oil and gas, aviation, public, banking, medical and other industries. This sixth conference brought together more than 200 specialists from Russia, Italy, Germany, the UK and other countries.
The conference was devoted to the analysis of the status, contemporary trends, research issues and practical results obtained by national and foreign scientists and experts in the area of system and software engineering, as well as in the development of information and analytical systems using Big Data technologies.
The target audience of the conference comprises experts, students, postgraduates and IEEE members working in the ordering, design, development, implementation, operation and maintenance of information and analytical systems for various applications and their software, as well as in custom software development.
Plenary papers were delivered by leading domestic and foreign specialists and were aimed at developing views on the most important and fundamental aspects of information technology development.
Initially, more than 130 papers were submitted. All submitted articles were reviewed by the members of the Program Committee as well as by independent reviewers.
The conference featured many very interesting invited papers: for example, a series of papers from the scientific school of Professor Andrey Kostogryzov, “Mathematical models and methods of system engineering for preventive risks control in real time”, among others; the paper by Professor Sergey Kuznetsov, “Towards a Native Architecture of in-NVM DBMS”; the paper by Professor, Dr. Sci. Med. Asot Mkrtumyan and Professor, Dr. Sci. Tech. Alexandr Shmid, “Remote noninvasive detection of carbohydrate metabolism disorders by first-lead ECG screening in CardioQVARK project”; and a series of papers from the school of Professor Valery Vasenin concerning the ISTINA project.
We are grateful to the authors of the invited papers, who submitted their papers to this volume, as well as to the members of the Organizing Committee, the Program Committee and the reviewers who took part in reviewing submissions to our conference. Special thanks to the organizations that provided support to the Conference: National Research University Higher School of Economics, IEEE, IEEE Computer Society, IEEE Region 8, EC-leasing Co., Ivannikov Institute for System Programming of the Russian Academy of Sciences, and the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.
Co-chairs of APSSE-2019 Program Committee Boris A. Pozin, Alexander K. Petrenko
The present research is devoted to a comparative analysis of the quality of classification for several methods of descriptive and predictive analytics in the case when most (or all) of the independent variables are measured on qualitative scales with a large number of levels. In this case, some classification methods, or their popular implementations, call for the conversion of qualitative variables into systems of dummy variables. If the qualitative scales have a large number of levels that are present in almost equal proportions in the training set, i.e. it makes no sense to merge levels, the above-mentioned requirement leads to a dramatic rise in the problem dimension. As a result, the researcher is faced with the curse of dimensionality: as the problem dimension rises, the sample size must rise as well to preserve the accuracy of factor impact estimation. At the same time, it is not always possible to arrange an appropriate growth of the training set volume; in some cases it is limited by specific properties of the system of interest. In such a situation it becomes extremely important to evaluate the sensitivity of prediction/classification methods to the curse of dimensionality. The authors of this research focus on four classification methods that have long occupied the top positions in lists of popular business analysis methods:
• two methods of classification tree building, CART and C4.5;
• logistic regression;
• classification based on a random forest.
The first three are descriptive methods that yield interpretable (human-readable) models; the fourth belongs to predictive analytics. This selection is not accidental. Descriptive analytics problems are extremely important for planning, when it is necessary to answer the question "What will happen if …?"; in particular, one needs a description of the target group to organize marketing communication. At the same time, it is quite conceivable that using interpretable (human-readable) models entails a loss of prediction quality compared with methods of predictive analytics. The domain of the current research is the activity of microfinance institutions (MFIs). The traditional problem here is the assessment of a potential client. The main challenge arising in solving this problem is the constraints on the volume, composition and type of data available for predicting default or assessing the default probability. Thus, it is necessary to evaluate the capabilities of classification methods that were designed to work with large amounts of data (i.e. a large training set and many variables, from which the most important should be selected). In the real practice of a microfinance organization, most of the recorded factors are measured on qualitative scales with a large number of levels, which is the origin of the above-mentioned problems. The empirical part of the research is grounded on the data of a real microfinance organization. Some hypotheses about the reasons for default were tested as a by-product of this research.
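A minimal sketch (not the authors' code) of the setting described above: one-hot encoding of qualitative features with many, almost equally frequent levels inflates the problem dimension, and the compared classifiers can then be evaluated on the resulting data. The dataset, column names and parameters are hypothetical, scikit-learn's tree is CART-style (C4.5 is not available there), and synthetic data stands in for real MFI records.

```python
# Sketch: dimension blow-up from dummy variables and a rough comparison of the classifiers.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier          # CART-style tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000                                                  # hypothetical sample size
# Three qualitative factors with 50/30/20 almost equally frequent levels each.
X = pd.DataFrame({
    "region":     rng.integers(0, 50, n).astype(str),
    "occupation": rng.integers(0, 30, n).astype(str),
    "loan_type":  rng.integers(0, 20, n).astype(str),
})
y = rng.integers(0, 2, n)                                 # hypothetical default flag

encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), X.columns.tolist())])
print("dummy-variable dimension:", encoder.fit_transform(X).shape[1])  # ~100 columns

models = {
    "CART (sklearn tree)": DecisionTreeClassifier(max_depth=5),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest":       RandomForestClassifier(n_estimators=200),
}
for name, clf in models.items():
    pipe = Pipeline([("enc", encoder), ("clf", clf)])
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```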
The article includes an analysis of electrocardiograms (ECG) of people diagnosed with tuberculosis. The data obtained are compared with the forms of the ECG Fourier spectrum of healthy people and of people diagnosed with ischemia. A database of 93,000 ECGs was used in this work to obtain the data. The hypothesis is put forward that the presence of tuberculosis can be detected in the form of the ECG by clustering the forms of the Fourier spectrum of the ECG.
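A toy illustration (not the study's pipeline) of clustering records by the shape of their Fourier spectra, the operation the hypothesis relies on; the signals, sampling rate and cluster count below are synthetic placeholders.

```python
# Sketch: cluster signals by the shape of their Fourier magnitude spectra.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

fs = 250                      # hypothetical sampling rate, Hz
t = np.arange(0, 10, 1 / fs)  # 10-second records
rng = np.random.default_rng(1)

def fake_ecg(base_hz):
    """Toy periodic signal standing in for a real ECG record."""
    return np.sin(2 * np.pi * base_hz * t) + 0.3 * rng.standard_normal(t.size)

records = [fake_ecg(1.0) for _ in range(50)] + [fake_ecg(1.5) for _ in range(50)]

# Feature vector = low-frequency part of the magnitude spectrum, L2-normalized
# so that clustering compares spectrum shape rather than amplitude.
spectra = np.array([np.abs(np.fft.rfft(x))[:200] for x in records])
spectra = normalize(spectra)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spectra)
print(np.bincount(labels))    # sizes of the two spectrum-shape clusters
```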
The article is dedicated to the implementation and evaluation of a model for simulating emergency situations on a main gas pipeline. Data produced by the model were compared with data obtained from real emergency situations on a main gas pipeline. The result of this work is a model built using MATLAB Simulink that can be used to generate datasets useful for training artificial neural networks.
The paper presents a formal security model of the Linux distributions provided by Bazealt SPO, which integrates multi-level security (MLS) and mandatory integrity control (MIC) implemented on the basis of the SELinux framework. The model also includes an information flow analysis framework described as an extended Take-Grant model. The main novelty of the model is the integration of MLS and MIC that can be implemented in SELinux. The model is specified in a hierarchical manner in the Event-B language; its security properties are represented as invariants and formally proved.
Many experts in the field of data management believe that the emergence of non-volatile byte-addressable main memory (NVM) available for practical use will lead to the development of a new type of ultra-high-speed database management systems (DBMS) with single-level data storage (native in-NVM DBMS). However, the number of researchers actively engaged in research on architectures of native in-NVM DBMSs has not increased in recent years. The most active researchers are PhD students who are not afraid of the risks that, of course, exist in this new area. The second section of the article discusses the state of the art in NVM hardware. The analysis shows that NVM in the DIMM form factor has already become a reality, and that in the near future we can expect the appearance on the market of NVM-DIMMs with the speed of conventional DRAM and endurance close to that of hard drives. The third section is devoted to a review of related work, among which the works of young researchers are the most advanced. In the fourth section, we state and justify that the work performed so far in the field of in-NVM DBMSs has not led to the emergence of a native architecture; this is hampered by the set of limiting factors we analyze. In this regard, in the fifth section, we present a sketch of the native architecture of an in-NVM DBMS, the choice of which is influenced only by the goals of simplicity and efficiency. The conclusion summarizes the article and argues the need for additional research into many aspects of the native architecture of an in-NVM DBMS.
Two versions of a mathematical model that detects different glycemia cases using heart rate variability (HRV) values, taking into account the patient's age, have been developed and evaluated. HRV and glucose data were obtained from 128 patients with type 2 diabetes mellitus (T2DM). Based on the evaluation results, the fundamental possibility of developing a non-invasive glycemia monitoring system based on one of the model variants has been confirmed. To increase the accuracy of the model, it is necessary to conduct a similar study involving patients without diagnosed type 2 diabetes.
Pattern structures, an extension of FCA to data with complex descriptions, offer an alternative to conceptual scaling (binarization) by giving a direct way to knowledge discovery in complex data such as logical formulas, graphs, strings, tuples of numerical intervals, etc. Whereas the approach to classification with pattern structures based on generating classifiers beforehand can lead to double-exponential complexity, the combination of lazy evaluation with projection approximations of the initial data, randomization and parallelization reduces the algorithmic complexity to a low-degree polynomial, and is thus feasible for big data.
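A small sketch (not the paper's implementation) of the lazy-evaluation idea for interval pattern structures: a test object's description is intersected with each training description (componentwise interval convex hull), and a class receives support only if every training object falling under the resulting pattern belongs to that class. The data, labels and the support-weighted voting rule below are illustrative assumptions.

```python
# Sketch: lazy classification with interval pattern structures.
import numpy as np

def meet(a, b):
    """Similarity of two interval-vector descriptions: componentwise convex hull."""
    return np.stack([np.minimum(a[:, 0], b[:, 0]), np.maximum(a[:, 1], b[:, 1])], axis=1)

def subsumed(pattern, desc):
    """True if desc falls under pattern (each interval of pattern contains desc's)."""
    return bool(np.all((pattern[:, 0] <= desc[:, 0]) & (desc[:, 1] <= pattern[:, 1])))

def lazy_classify(test, train, labels):
    """Vote for a class whenever meet(test, h) has a class-pure extent; weight by extent size."""
    votes = {}
    for h, cls in zip(train, labels):
        pattern = meet(test, h)
        extent = [c for d, c in zip(train, labels) if subsumed(pattern, d)]
        if set(extent) == {cls}:            # no counter-examples: a hypothesis for cls
            votes[cls] = votes.get(cls, 0) + len(extent)
    return max(votes, key=votes.get) if votes else None

as_intervals = lambda row: np.stack([row, row], axis=1)   # point -> degenerate intervals
X = np.array([[1.0, 1.2], [1.1, 1.3], [3.0, 3.3], [3.2, 3.1]])  # hypothetical data
train = [as_intervals(r) for r in X]
labels = ["low", "low", "high", "high"]
print(lazy_classify(as_intervals(np.array([1.2, 1.35])), train, labels))  # -> "low"
```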
The proceedings of the 11th International Conference on Service-Oriented Computing (ICSOC 2013), held in Berlin, Germany, December 2–5, 2013, contain high-quality research papers that represent the latest results, ideas, and positions in the field of service-oriented computing. Since the first meeting more than ten years ago, ICSOC has grown to become the premier international forum for academics, industry researchers, and practitioners to share, report, and discuss their ground-breaking work. ICSOC 2013 continued this tradition, in particular focusing on emerging trends at the intersection of service-oriented computing, cloud computing, and big data.
The full texts of the papers of the Third International Conference on Data Analytics are presented.
The practical relevance of process mining is increasing as more and more event data become available. Process mining techniques aim to discover, monitor and improve real processes by extracting knowledge from event logs. The two most prominent process mining tasks are: (i) process discovery: learning a process model from example behavior recorded in an event log, and (ii) conformance checking: diagnosing and quantifying discrepancies between observed behavior and modeled behavior. The increasing volume of event data provides both opportunities and challenges for process mining. Existing process mining techniques have problems dealing with large event logs referring to many different activities. Therefore, we propose a generic approach to decompose process mining problems. The decomposition approach is generic and can be combined with different existing process discovery and conformance checking techniques. It is possible to split computationally challenging process mining problems into many smaller problems that can be analyzed easily and whose results can be combined into solutions for the original problems.
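A toy sketch (not the paper's algorithm) of the generic decomposition idea described above: partition the activity set into clusters, project the event log onto each cluster, and analyze the much smaller sublogs independently. The log, traces and partition are hypothetical.

```python
# Sketch: decompose an event log into per-cluster sublogs by activity projection.
from collections import Counter

log = [
    ["register", "check_credit", "check_stock", "ship", "bill"],
    ["register", "check_stock", "check_credit", "bill", "ship"],
    ["register", "check_credit", "reject"],
]

# Hypothetical partition of the activity set into two smaller problems.
partition = [
    {"register", "check_credit", "reject"},
    {"check_stock", "ship", "bill"},
]

def project(trace, activities):
    """Keep only events whose activity belongs to the given cluster."""
    return tuple(a for a in trace if a in activities)

for i, cluster in enumerate(partition):
    sublog = Counter(project(t, cluster) for t in log)
    print(f"sublog {i}: {dict(sublog)}")
    # Each sublog can now be fed to any discovery or conformance-checking technique,
    # and the per-cluster diagnostics combined into a result for the whole log.
```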
In 2015-2016, the Department of Communication, Media and Design of the National Research University “Higher School of Economics”, in collaboration with the non-profit organization ROCIT, conducted research aimed at constructing the Index of Digital Literacy in Russian Regions. This research was a priority effort and remains unmatched at the moment.
Companies are increasingly paying close attention to the IP portfolio, which is a key competitive advantage, so patents and patent applications, as well as the analysis and identification of future trends, become one of the important and strategic components of a business strategy. We argue that the problems of identifying and predicting trends or entities, as well as searching for technical features, can be solved with the help of easily accessible Big Data technologies, machine learning and predictive analytics, thereby offering an effective plan for development and progress. The purpose of this study is twofold: the first goal is the identification of technological trends, the second is the identification of application areas that are most promising in terms of technology development and investment. The research was based on methods of clustering, processing of large text files and search queries in patent databases. The suggested approach is considered on the basis of experimental data in the field of moving connected UAVs and passive acoustic ecology control.
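An illustrative sketch (not the study's pipeline) of clustering patent texts to surface candidate technology trends: TF-IDF features and k-means, with the abstracts and the number of clusters as placeholders.

```python
# Sketch: group patent abstracts into thematic clusters and print the top terms of each.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "tethered unmanned aerial vehicle with power line to ground station",
    "swarm of connected uavs for persistent surveillance",
    "passive acoustic sensor array for environmental noise monitoring",
    "underwater passive acoustic monitoring of marine mammals",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Top terms per cluster hint at the technological theme of each group.
terms = np.array(vectorizer.get_feature_names_out())
for c in range(km.n_clusters):
    top = terms[km.cluster_centers_[c].argsort()[::-1][:5]]
    print(f"cluster {c}: {', '.join(top)}")
```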
The article is dedicated to the analysis of the Big Data perspective in jurisprudence. It is proved that Big Data has to be used as an explanatory and predictive tool. The author describes issues concerning the application of Big Data in legal research. The problems are technical (data access, technical imperfections, data verification) and informative (interpretation of data and correlations). It is concluded that there is a need to enhance Big Data investigations taking into account the above-mentioned limits.