Abstract. We present a method for improving the quality metrics of text classification. The improvement is achieved by adding semantico-syntactic features to the text classifier. These features are calculated from a semantico-syntactic representation of the text. In our research, we used the Stanford CoreNLP parser and its “Universal++Dependencies” representation of the parse tree. This allowed us to handle some dependencies between words without additional preprocessing of the parse tree and to obtain a more complete set of semantico-syntactic features. Compared with statistical features, such as TF–IDF (Term Frequency – Inverse Document Frequency) for words or n-grams, our features make it possible to build a more “meaningful” numerical model of texts. Moreover, semantico-syntactic features can either be used to train a separate classifier or be added to statistical features and used in training together. We performed an experiment on English texts from arXiv.org. We took the titles and abstracts of 4500 papers from three lexically close subject areas with no overlap in subjects and used them to train and evaluate two classifiers. The first classifier was trained on statistical features only. The second was trained on both statistical and semantico-syntactic features. Both used the support vector machine method and were tuned separately for maximum accuracy using cross-validation. In the experiment, the second classifier made 12.15 % fewer classification errors than the classifier trained on statistical features alone.
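The idea of training on statistical and semantico-syntactic features together can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy documents, the dependency strings, and the helper names `tf_idf` and `combine` are assumptions made for illustration, and a real system would obtain the dependencies from a parser rather than by hand.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors

def combine(statistical, syntactic):
    """Merge two sparse feature dicts, prefixing keys so they cannot collide."""
    merged = {f"tfidf:{k}": v for k, v in statistical.items()}
    merged.update({f"dep:{k}": float(v) for k, v in syntactic.items()})
    return merged

# Hypothetical toy input: tokens plus hand-written dependency counts.
docs = [["graph", "kernel", "method"], ["graph", "neural", "network"]]
deps = [
    Counter(["amod(method, kernel)"]),
    Counter(["amod(network, neural)"]),
]
features = [combine(s, d) for s, d in zip(tf_idf(docs), deps)]
```

Each entry of `features` is a single sparse vector that a classifier such as an SVM could consume after conversion to a matrix. The key prefixes keep the two feature families separable, so either one can still be used on its own.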
The article addresses the implementation stage of a web application for uploading, storing, and processing academic texts in English. The architecture and functions of the application are described, and implementation methods for the software solution are suggested that take the specifics of the subject domain into account. The choice of software tools is described.
Currently, the core of banking activity involves making optimal management decisions under considerable variability and uncertainty. Such decisions, as a rule, are based on processing very large data sets (Big Data) in real time. Typical examples are the optimal allocation of loan applications among underwriters, maximizing risk-return in the management of loan portfolios (for example, RAROC), and optimizing non-operating costs (Cost to Income) by redistributing resources among processes.
There is an innovative approach to solving such problems based on simulation modeling (SM), evolutionary computation and machine learning. This approach has already been successfully applied in leading international and Russian companies. The simulation models are integrated with corporate enterprise information systems, BPM (Business Process Management) and ERP systems, and data warehouses (DWH), and are thus used in the actual business processes of the organization as the core of an intelligent decision support system.
This work is devoted to developing a simulation model for the optimal allocation of credit applications in the interregional underwriting center of a commercial bank. The main feature of the model is that it takes into account many factors affecting the time underwriters need to process credit applications; the underwriters are responsible for estimating the probability of default for credit applications. Such factors include the current utilization of underwriters, their availability for new tasks at the current time, etc. The developed simulation model has been implemented in the largest Russian bank and is used as a part of its BPM (Business Process Management) system.
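How utilization and availability can drive an allocation decision is shown in the following minimal sketch. It is a simple greedy heuristic, not the simulation model described above. The `Underwriter` fields, the `allocate` function, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Underwriter:
    name: str
    available: bool        # can accept new tasks at the current time
    queued_minutes: float  # work already assigned (current utilization)

def allocate(applications, underwriters):
    """Greedily give each (app_id, expected_minutes) to the least-loaded
    underwriter who is currently available."""
    assignment = {}
    for app_id, minutes in applications:
        candidates = [u for u in underwriters if u.available]
        if not candidates:
            raise ValueError("no underwriter is available")
        chosen = min(candidates, key=lambda u: u.queued_minutes)
        chosen.queued_minutes += minutes  # the new task raises utilization
        assignment[app_id] = chosen.name
    return assignment

# Hypothetical staff and incoming applications.
staff = [
    Underwriter("A", True, 30.0),
    Underwriter("B", True, 10.0),
    Underwriter("C", False, 0.0),
]
plan = allocate([("app-1", 15.0), ("app-2", 20.0), ("app-3", 5.0)], staff)
```

A simulation model would evaluate many such allocation rules against randomized processing times. The greedy rule here only shows where the two factors named in the text enter the decision.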
The article examines decision support methods used in software design in accordance with the state standards GOST 19.102–77 and GOST R ISO/IEC 9126–93. The main stages of software product design are analysed: first, development of the technical specifications; second, project development; third, software testing.
Choosing a basic project is the main part of the research work at the first stage. The article suggests criteria for the quality and effectiveness of a software product and offers several group decision support methods for making a reasoned choice and ranking against these criteria: the method for ranking alternatives, the minimum distance method, and the expert assessments clustering method. An example is given of selecting among five alternative projects using the method for ranking alternatives.
Choosing a programming language is a significant part of the research work at the second stage. This is a multi-objective task in which the number of criteria is considerably greater than the number of alternatives. The article proposes applying group expert procedures or individual decision support methods that allow the use of numerical and linguistic criteria: the method for ranking alternatives, the direct rank appointment method, the method of pairwise comparisons, and the minimum distance method. An example is given of selecting a programming language with the method of pairwise comparisons.
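A selection by pairwise comparisons can be sketched as follows. The sketch assumes a reciprocal comparison matrix and derives priorities from the normalized geometric means of its rows, which is one common variant and not necessarily the one used in the article. The alternative names and the matrix values are hypothetical.

```python
import math

def priorities(matrix):
    """Normalize the geometric means of the rows of a reciprocal
    pairwise comparison matrix so that the weights sum to 1."""
    means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(means)
    return [m / total for m in means]

# Hypothetical alternatives; comparisons[i][j] states how strongly
# alternative i is preferred to alternative j.
languages = ["Language A", "Language B", "Language C"]
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = priorities(comparisons)
best = languages[weights.index(max(weights))]
```

With several criteria, one such matrix would be filled in per criterion and the resulting priority vectors combined using the criteria weights.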
Choosing testware is the third stage of the research. A review of existing test classes is provided. We consider two examples: selecting testware using the analytic hierarchy process, and ranking test classes with the minimum distance method.
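One common reading of the minimum distance method is to pick the group ranking whose total distance to the individual expert rankings is smallest. The sketch below follows that reading with the Kendall distance and a brute-force search. Both choices, as well as the test class names and the expert rankings, are assumptions made for illustration.

```python
from itertools import combinations, permutations

def kendall_distance(a, b):
    """Count the pairs of items that rankings a and b order differently."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def consensus(expert_rankings):
    """Try every ordering and keep the one closest to all experts in total.
    Brute force is only practical for a handful of alternatives."""
    items = sorted(expert_rankings[0])
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_distance(cand, r) for r in expert_rankings),
    )

# Hypothetical rankings of three test classes by three experts.
experts = [
    ("unit", "integration", "system"),
    ("unit", "system", "integration"),
    ("integration", "unit", "system"),
]
group = consensus(experts)
```

Here `group` is the ordering that disagrees with the three experts on the fewest pairs in total.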
The article proposes a set of best practices for decision support and gives examples of applying the methods under consideration during software design.
The article investigates how to evaluate and choose enterprise IT infrastructure Monitoring Systems to satisfy customers' requests. The authors describe the functions and features of modern Monitoring Systems. The problem of ranking alternative Systems according to the customer’s requirements and the characteristics of the enterprise IT infrastructure is examined. The evaluation criteria for Monitoring Systems and methods of constructing a decision support system (DSS) for choosing the most satisfactory Monitoring System are investigated. Choosing a Monitoring System is a complex multi-criteria procedure. The authors propose to replace this procedure with a clear and understandable decision-making algorithm built from a valid sequence of procedures that are simple for a person. To formalize the decision-making task, a set of alternatives (the set of Monitoring Systems), an array of criteria for selecting the most rational alternative, and limitations on the parameter values of the Monitoring Systems are specified. A technique for evaluating the mutual importance of the criteria is suggested, using one of two decision-making methods: the analytic hierarchy process or the rank method. The possibility of reducing the number of criteria by converting some of them into limitations is considered. A hierarchy of criteria is proposed to simplify the calculation of criteria weights: functional criteria; reliability criteria; criteria for evaluating the user interfaces; criteria for the degree of integration with customer support; criteria for deployment, maintenance and support; and cost criteria. A method is selected for matching linguistic descriptions of Monitoring System parameters to criteria scales when building decision support systems. The proposed method consists of the following main steps: formulating the decision-making task, calculating the relative importance of the Monitoring System evaluation criteria, and calculating the relative value of the Systems.
An example of applying the proposed method is given. Involving experts is proposed as an alternative approach to selecting and ranking Monitoring Systems. In this case, the procedure relies on group decision making, in particular the minimum distance method. The advantage of the proposed technique is that the choice of a Monitoring System can be automated. Applying these methods will make decisions more objective by taking into account many factors, including the specifics of the enterprise IT infrastructure. Software based on the proposed technique will allow integrator and consulting companies to reduce the time needed to reach the most rational decisions.
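The last two steps of such a procedure, applying limitations and then computing a relative value for each system, can be sketched as follows. The sketch assumes that criteria weights are already known and that scores are normalized to [0, 1] with higher values being better. The function `rank_systems`, the system names, and every number are hypothetical.

```python
def rank_systems(systems, weights, limits):
    """Drop systems that violate a limitation, then order the rest
    by their weighted sum of criterion scores."""
    admissible = {
        name: scores
        for name, scores in systems.items()
        if all(scores[c] >= minimum for c, minimum in limits.items())
    }
    totals = {
        name: sum(weights[c] * scores[c] for c in weights)
        for name, scores in admissible.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical criteria weights (they sum to 1) and normalized scores.
weights = {"functional": 0.5, "reliability": 0.3, "cost": 0.2}
systems = {
    "System A": {"functional": 0.9, "reliability": 0.4, "cost": 0.6},
    "System B": {"functional": 0.7, "reliability": 0.9, "cost": 0.5},
    "System C": {"functional": 0.8, "reliability": 0.9, "cost": 0.1},
}
limits = {"cost": 0.3}  # a criterion turned into a limitation
ranking = rank_systems(systems, weights, limits)
```

In this toy input System C is excluded by the cost limitation even though it scores well on the other criteria, which shows how a limitation differs from a weighted criterion.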
The article describes the benefits of QFD methods in agile software development for improving product quality. The main aspects considered are requirements, testing, and priority control. The author discusses the means and effects of adopting QFD practices.
An approach to detecting hidden information (stegocontainers) in the audio data of MP3 files based on neural network modeling is considered. A multilayer perceptron is used as the neural network model. The structural components of an MP3 file are analyzed: fields containing related information (song title, album, information about the author, lyrics, etc.), frames, and fragmented sets of encoded audio data. The useful data are identified. A procedure is proposed for representing the audio data of any MP3 file as a uniform, relatively small set of features. The dimension of the feature set (data set) can be selected from the range [100-520], in accordance with the minimum and maximum frame size, which depends on the compression quality of an individual audio file encoded in MP3 format. Modern software packages for encrypting and decrypting stegocontainers in MP3 files are investigated. Using selected software implementations, a database of examples (data sets) is formed from pre-processed MP3 files both with and without a stegocontainer. The structure of the neural network for steganalysis of MP3 files is determined experimentally, and the network is trained and tested. The test results indicate that the neural network system is highly effective.
Reservoir Computing is attracting the attention of neural network designers because its machine learning algorithms are simple while the models generalize well. The approaches are numerous. Reservoir Computing can be applied to different architectures, including recurrent neural networks with irregular connections, called Echo State Networks (ESN). However, successful examples of predicting chaotic sequences do not by themselves provide a successful method for classifying multi-attribute objects.
In this paper, binary ESN classifiers are studied. We show that the reason for low classification precision is class imbalance. A method to solve the problem is then proposed: a randomizing algorithm for balancing the training data set combined with a method of data temporalization. The resulting error matrices show good results. The proposed method is illustrated on a synthetic data set. The features of the ESN classifier are demonstrated for rare event detection, such as detecting fraud from transaction attributes.
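Randomized balancing of a training set can be sketched with plain random oversampling of the minority class. This is one standard technique and may differ from the authors' algorithm. The function `oversample`, the fixed seed, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((x, y) for x in items + extra)
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced

# Hypothetical imbalanced data: five ordinary events, one rare event.
data = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
labels = [0, 0, 0, 0, 0, 1]
balanced = oversample(data, labels)
counts = Counter(y for _, y in balanced)
```

Oversampling should be applied to the training split only, so that evaluation still reflects the original class proportions.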
A method for depersonalizing personal information based on data hashing is considered. Approaches to implementing this method when data are stored in a relational database are proposed. Technical solutions are found that protect the data from reversal of the depersonalization and provide high-speed data processing.
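Hash-based depersonalization can be sketched with a keyed hash from the Python standard library. Using HMAC-SHA256 with a secret key is an assumption made for illustration, not necessarily the scheme from the article. The function name `pseudonym`, the normalization rule, and the sample values are hypothetical.

```python
import hashlib
import hmac

def pseudonym(value: str, secret_key: bytes) -> str:
    """Replace a personal value with a keyed hash. Without the key,
    an attacker cannot rebuild the mapping by hashing guessed values."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would be stored outside the database.
key = b"hypothetical-secret-kept-outside-the-database"
token_a = pseudonym("Ivan Petrov", key)
token_b = pseudonym(" ivan petrov ", key)
```

Because equal inputs give equal tokens, the token can be stored in an indexed column and used for joins and lookups in a relational database without exposing the original value.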
There is a large number of foreign ERP (Enterprise Resource Planning) systems on the Russian market. Vendors have to localize such information systems before entering the Russian market. The aim of this article is to assess the quality of the localization solutions offered by foreign ERP system suppliers with regard to the requirements of Russian tax accounting. The first part presents provisions of the Russian Tax Code concerning the specifics of tax accounting for VAT (Value-Added Tax), as well as document flow requirements for VAT. The solutions of leading companies in the Russian information systems market, Microsoft, SAP SE and Oracle, were chosen as the objects of the research. VAT tax accounting is assessed as implemented in three information systems popular in Russia: Microsoft Dynamics NAV, SAP ERP and Oracle E-Business Suite. The article attempts to evaluate the quality of the standard localization packages of these information systems. The disadvantages of the Russian localization of each system are described. In the author's opinion, the main reason for the localization shortcomings of SAP ERP and Oracle E-Business Suite is delayed software updates in response to frequent changes in Russian legislation. Microsoft Dynamics NAV is updated regularly, though there are many small gaps in its standard localization package. The final part concludes that, for full operation, the standard localization package should be further developed at the implementation stage of each of the examined information systems.
The article discusses the application of statistical modeling to predicting the durability of electronic equipment. Durability indicators are among the important characteristics of the onboard electronic equipment of spacecraft with long active lifetimes. The purpose of the study is to improve the quality of design work by improving the method of calculating the life of electronic equipment, taking into account the probabilistic characteristics of the life of its electronic components. The object of the study is the typical procedure for calculating the life of electronic equipment at the design stage from data on the durability characteristics of its electronic components. The subject of the study is the methods, models and algorithms applicable to analyzing the design level of durability of electronic equipment. To study the effectiveness of statistical modeling, software was developed and the reliability of its operation was tested. Computational experiments were carried out with this software. Comparing the results of statistical modeling with calculations by the typical methodology made it possible to identify a number of significant limitations of the deterministic method. The standard methodology assumes that the variation coefficient of life is the same constant for all electronic components. However, an analysis of the data sheets of electronic components showed that the variation coefficient can differ between electronic components of different types. The calculations confirmed the need to take the values of the variation coefficient of life into account when predicting the durability of electronic equipment. In addition, the standard methodology assumes that the life of electronic equipment containing several electronic components with the same life values equals the life of electronic equipment containing only one such electronic component.
Computational experiments have shown that the life of electronic equipment that contains electronic components with the same life values will be less than the life of electronic equipment that contains only one such electronic component. Based on the studies, the scope of the standard methodology was determined and the effectiveness of statistical modeling was proved. However, the results obtained by the statistical modeling method should be corrected by the results of the tests and the controlled operation of electronic equipment.
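The effect reported above can be reproduced with a minimal Monte Carlo sketch. It rests on assumptions that are not taken from the article: component lives are normally distributed, the equipment fails when its first component fails, and the mean life, variation coefficient, trial count, and seed are arbitrary illustrative values.

```python
import random
import statistics

def simulate_equipment_life(mean_lives, variation, trials=20000, seed=1):
    """Estimate the mean life of equipment that fails as soon as any
    of its components fails (a series structure)."""
    rng = random.Random(seed)
    lives = []
    for _ in range(trials):
        # Draw one life per component; the standard deviation is the
        # variation coefficient times the mean. Negative draws are clipped.
        sample = [max(0.0, rng.gauss(m, variation * m)) for m in mean_lives]
        lives.append(min(sample))
    return statistics.mean(lives)

# Hypothetical values: mean component life in hours, variation coefficient 0.2.
single = simulate_equipment_life([100000.0], variation=0.2)
ten = simulate_equipment_life([100000.0] * 10, variation=0.2)
```

Under these assumptions `ten` comes out clearly below `single`, because the minimum of several random lives is on average shorter than any one of them. A deterministic calculation that uses only the mean values cannot capture this effect.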
Due to the overall growth of automation and informatization in all spheres of human activity, systems based on local area networks are becoming more common. Such systems are used in fields that place increased demands on reliability, such as nuclear energy, aviation, chemical manufacturing, etc. To ensure continuity of service, spare part sets (SPT) are formed during the development phase. The cost of an SPT can reach half the cost of the entire system. The developer must therefore take SPT costs into account when selecting the type of computer network organization. The paper presents a comparative analysis of the financial costs of forming sets of spare parts under different types of service conditions.
This study (research grant No 14-05-0038) was supported by The National Research University–Higher School of Economics’ Academic Fund Program in 2014.