Process mining is a relatively new field of computer science, which deals with process discovery and analysis based on event logs. In this paper we consider the problem of discovering a high-level business process model from a low-level event log, i.e. automatic synthesis of process models based on the information stored in event logs of information systems. Events in a high-level model are abstract events, which can be refined to low-level subprocesses, whose behavior is recorded in event logs. Models synthesis is intensively studied in the frame of process mining research, but only models and event logs of the same granularity are mainly considered in the literature. Here we present an algorithm for discovering high-level acyclic process models from event logs and some specified partition of low-level events into subsets associated with abstract events in a high-level model.
Process mining is a relatively new field of computer science, which deals with process discovery and analysis based on event logs. In this paper we consider the problem of models and event logs conformance checking. Conformance checking is intensively studied in the frame of process mining research, but only models and event logs of the same granularity were considered in the literature. Here we present and justify the method of checking conformance between a high-level model (e.g. built by an expert) and a low-level log (generated by a system).
A parabolic partial differential equation 𝑢′𝑡(𝑡, 𝑥) = 𝐿𝑢(𝑡, 𝑥) is considered, where 𝐿 is a linear second-order differential operator with time-independent coefficients, which may depend on 𝑥. We assume that the spatial coordinate 𝑥 belongs to a finite- or infinite-dimensional real separable Hilbert space 𝐻. Assuming the existence of a strongly continuous resolving semigroup for this equation, we construct a representation of this semigroup by a Feynman formula, i.e. we write it in the form of the limit of a multiple integral over 𝐻 as the multiplicity of the integral tends to infinity. This representation gives a unique solution to the Cauchy problem in the uniform closure of the set of smooth cylindrical functions on 𝐻. Moreover, this solution depends continuously on the initial condition. In the case where the coefficient of the first-derivative term in 𝐿 vanishes we prove that the strongly continuous resolving semigroup exists (this implies the existence of the unique solution to the Cauchy problem in the class mentioned above) and that the solution to the Cauchy problem depends continuously on the coefficients of the equation.
Information systems (IS) produce numerous traces and logs at runtime. In context of SOA-based (service-oriented architecture) IS, these logs contain details about sequences of process and service calls. Modern application monitoring and error tracking tools provide only rather straightforward log search and filtering functionality. However, ``clever'' analysis of the logs is highly useful, since it can provide valuable insights into the system architecture, interaction of business domains and services. Here we took runs event logs (trace data) of a big booking system and discovered architectural guidelines violations and common anti-patterns. We applied mature process mining techniques for discovery and analysis of these logs. Process mining aims to discover, analyze, and improve processes on the basis of IS behavior recorded as event logs. In several specific examples, we show successful applications of process mining to system runtime analysis and motivate further research in this area.
Process-Aware Information Systems (PAIS) is a special class of the IS intended for the support the tasks of initialization, end-to-end management and completion of business processes. During the operation such systems accumulate a large number of data that are recorded in the form of the event logs. Event logs are a valuable source of knowledge about the actual behavior of a system. For example, there can be found information about the discrepancy between the real and the prescribed behavior of the system; to identify bottlenecks and performance issues; to detect anti-patterns of building a business system. These problems are studied by the discipline called “Process Mining”.
The practical application of the process mining methods and practices is carried out using the specialized software for data analysts. The subject area of the process analysis involves the work of an analyst with a large number of graphical models. Such work will be more efficient with a convenient graphical modeling tool. The paper discusses the principles of building a graphical tool “VTMine for Visio” for the process modeling, based on the widespread application for business intelligence Microsoft Visio. There are presented features of the architecture design of the software extension for application in the process mining domain and integration with the existing libraries and tools for working with data. The application of the developed tool for solving various types of tasks for modeling and analysis of processes is demonstrated on a set of experimental schemes.
The article considers the problem of estimating autoregressive model parameters of elementary speech units such as phonemes. It is suggested an iterative algorithm based on the Newton numerical minimization technique to search an autoregressive model of phonemes specified its multiple samples. For this purpose the analytical expressions of the gradient and the Hessian of Kullback–Leibler information divergence between autoregressive models were computed. Experimental studies on a set of English phonemes showed that the developed algorithm requires less computational effort for large amounts of data, and iterations count depends little on the amount of input data as opposed to reference phoneme selection algorithm based on the criterion of a minimum sum of information divergence. Moreover, the proposed algorithm allows finding models of phonemes, which provide a higher probability of correct recognition.
In the article, we consider verification of programs with mutual recursion in the data driven functional parallel language Pifagor. In this language the program could be represented as a data flow graph, that has no control connections, and has only data relations. Under these conditions it is possible to simplify the process of formal verification, since there is no need to analyse resource conflicts, which are present in the systems with ordinary architectures. The proof of programs correctness is based on the elimination of mutual recursions by program transformation. The universal method of mutual recursion of an arbitrary number of functions elimination consists in constructing the universal recursive function that simulates all the functions in the mutual recursion. A natural number is assigned to each function in mutual recursion. The universal recursive function takes as its argument the number of a function to be simulated and the arguments of this function. In some cases of the indirect recursion it is possible to use a simpler method of program transformation, namely, the merging of the functions code into a single function. To remove mutual recursion of an arbitrary number of functions, it is suggested to construct a graph of all connected functions and transform this graph by removing functions that are not connected with the target function, then by merging functions with indirect recursion and finally by constructing the universal recursive function. It is proved that in the Pifagor language such transformations of functions as code merging and universal recursive function construction do not change the correctness of the initial program. An example of partial correctness proof is given for the program that parses a simple arithmetic expression. We construct the graph of all connected functions and demonstrate two methods of proofs: by means of code merging and by means of the universal recursive function.
Modern methods and libraries for high quality pseudorandom number generation and for generation of parallel random number streams for Monte Carlo simulations are considered. The probability equidistribution property and the parameters when the property holds at dimensions up to logarithm of mesh size are considered for Multiple Recursive Generators.
Process mining is a new direction in the field of modeling and analysis of processes, where using information from event logs, describing the history of the system behavior, plays an important role. Methods and approaches used in the process mining are often based on various heuristics, and experiments with large event logs are crucial for the study and comparison of the developed methods and algorithms. Such experiments are very time consuming, so automation of experiments is an important task in the field of process mining. This paper presents the language DPMine developed specifically to describe and carry out experiments on the discovery and analysis of process models. The basic concepts of the DPMine language, as well as principles and mechanisms of its extension are described. Ways of integration of the DPMine language as dynamically loaded components into the VTMine modeling tool are considered. An illustrating example of an experiment to build a fuzzy model of the process discovered from the log data stored in a normalized database is given.
Mathematical models of distributed computing based on the calculus of mobile processes ($\pi$-calculus) are widely used for checking the information security properties of cryptographic protocols. Since $\pi$calculus is Turing-complete, this problem is undecidable in general case. Therefore, the research is carried out only for some special classes of $\pi$-calculus processes with restricted computational capabilities, for example, for non-recursive processes, in which all runs have a bounded length, for processes with a bounded number of parallel components, etc. However, even in these cases, the proposed checking procedures are time consuming. We assume that this is due to the very nature of the $\pi$ -calculus processes. The goal of this paper is to show that even for the weakest model of passive adversary and for relatively simple protocols that use only the basic $\pi$-calculus operations, the task of checking the information security properties of these protocols is co-NP-complete.
A statically typed version of the data driven functional parallel computing model is proposed. It enables a representation of dynamically changing parallelism by means of asynchronous serial data flows. We consider the features of the syntax and semantics of the statically typed data driven functional parallel programming language Smile that supports asynchronous sequential flows. Our main idea is to apply the Hoar concept of communicating sequential processes to the computation control on the data readiness. It is assumed that on the data readiness a control signal is emitted to inform the processes about the occurrence of certain events. The special feature of our approach is that the model is extended with the special asynchronous containers that can generate events on their partial filling. These containers are a stream and a swarm, each of which has its own specifics. A stream is used to process data which have identical type. The data comes sequentially and asynchronously at arbitrary time moments. The number of the incoming data elements is initially unknown, so the processing completes on the signal of the end of the stream. A swarm is used to contain independent data of the same type and may be used for the massive parallel operations performing. Unlike a stream, the swarm’s size is fixed and known in advance. General principles of the operations with the asynchronous sequential flows with an arbitrary order of data arrival are described. The use of the streams and the swarms in various situations is considered. We propose the language constructions which allow us to operate the swarms and streams and describe the specifics of their application. We provide the sample functions to illustrate the use of the different approaches to description of the parallelism: recursive processing of the asynchronous flows, processing of the flows in an arbitrary or predefined order of operations, direct access and access by the reference to the elements of the streams and swarms, pipelining of calculations. We give a preliminary parallelism assessment which depends on the ratio of the rates of data arrival and their processing. The proposed methods can be used in the development of the future languages and tool-kits of architecture-independent parallel programming.
Complex information systems are often implemented by using more than one programming language. Sometimes this variety takes a form of one host and one or few string-embedded languages. Textual representation of clauses in a string-embedded language is built at run time by a host program and then analyzed, compiled or interpreted by a dedicated runtime component (database, web browser etc.) Most general-purpose programming languages may play the role of the host; one of the most evident examples of the string-embedded language is the dynamic SQL which was specified in ISO SQL standard and is supported by the majority of DBMS. Standard IDE functionality such as code completion or syntax highlighting can really helps the developers who use this technique. There are several tools providing this functionality, but they all process only one concrete string-embedded language and cannot be easily extended for supporting another language. We present a platform which allows to easily create tools for string-embedded language processing.
During the life-cycle of an Information System (IS) its actual behaviour may not correspond to the original system model. However, to the IS support it is very important to have the latest model that reflects the current system behaviour. To correct the model, the information from the event log of the system may be used. In this paper, we consider the problem of process model adjustment (correction) using the information from an event log. The input data for this task are the initial process model (a Petri net) and the event log. The result of correction should be a new process model, better reflecting the real IS behavior than the initial model. The new model could be also built from scratch, for example, with the help of one of the known algorithms for automatic synthesis of the process model from an event log. However, this may lead to crucial changes in the structure of the original model, and it will be difficult to compare the new model with the initial one, hindering its understanding and analysis. It is important to keep the initial structure of the model as much as possible. In this paper, we propose a method for process model correction based on the principle of "divide and conquer". The initial model is decomposed in several fragments. For each fragment its conformance to the event log is checked. Fragments which do not match the log are replaced by newly synthesized ones. The new model is then assembled from the fragments via transition fusion. The experiments demonstrate that our correction algorithm gives good results when it is used for correcting local discrepancies. The paper presents the description of the algorithm, the formal justication for its correctness, as well as the results of experimental testing by some artificial examples.
One of the most simple models of computation which is suitable for representation of reactive systems behaviour is a ﬁnite state transducer which operates over an input alphabet of control signals and an output alphabet of basic actions. The behaviour of such a reactive system displays itself in the correspondence between ﬂows of control signals and compositions of basic actions performed by the system. We believe that the behaviour of this kind requires more suitable and expressive means for formal speciﬁcations than the conventional LTL. In this paper, we deﬁne some new (as far as we know) extension LP-LTL of Linear Temporal Logic speciﬁcally intended for describing the properties of transducers computations. In this extension the temporal operators are parameterized by sets of words (languages) which represent distinguished ﬂows of control signals that impact on a reactive system. Basic predicates in our variant of the temporal logic are also languages in the alphabet of basic actions of a transducer; they represent the expected response of the transducer to the speciﬁed environmental inﬂuences. In our earlier papers, we considered a model checking problem for LP-LTL and LP-CTL and showed that this problem has eﬀective solutions. The aim of this paper is to estimate the expressive power of LP-LTL by comparing it with some well known logics widely used in the computer science for speciﬁcation of reactive systems behaviour. We discovered that a restricted variant LP-1-LTL of our logic is more expressive than LTL and another restricted variant LP-n-LTL has the same expressive power as monadic second order logic S1S.