### Book

## 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA 2020)

Positive-Unlabeled (PU) learning is an analogue of supervised binary classification for the case when only the positive sample is clean, while the negative sample is contaminated with latent instances of the positive class and hence can be considered an unlabeled mixture. The objectives are to classify the unlabeled sample and to train an unbiased positive-negative classifier, which generally requires first identifying the mixing proportions of positives and negatives. Recently, the unbiased risk estimation framework has achieved state-of-the-art performance in PU learning. This approach, however, has two major bottlenecks. First, the mixing proportions are assumed to be identified, i.e., known in the domain or estimated with additional methods. Second, the approach relies on the classifier being a neural network. In this paper, we propose DEDPUL, a method that solves PU learning without the aforementioned issues. The mechanism behind DEDPUL is to apply a computationally cheap post-processing procedure to the predictions of any classifier trained to distinguish positive and unlabeled data. Instead of assuming the proportions to be identified, DEDPUL estimates them while classifying the unlabeled sample. Experiments show that DEDPUL outperforms the current state of the art in both proportion estimation and PU classification and is flexible in the choice of classifier.
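
The role of the mixing proportion can be illustrated with a small sketch. This is not DEDPUL itself: it only uses the mixture identity f_u = alpha * f_p + (1 - alpha) * f_n, from which alpha cannot exceed the smallest value of f_u / f_p, and the posterior probability of being positive is alpha * f_p / f_u. The densities here are known analytically (a toy assumption; in practice they would have to be estimated), and all names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def alpha_upper_bound(f_pos, f_unl):
    """Largest alpha compatible with f_u = alpha*f_p + (1-alpha)*f_n and f_n >= 0."""
    return min(u / p for p, u in zip(f_pos, f_unl) if p > 0)

def positive_posterior(f_pos, f_unl, alpha):
    """P(positive | x) for unlabeled points: alpha * f_p(x) / f_u(x), clipped to [0, 1]."""
    return [min(1.0, max(0.0, alpha * p / u)) for p, u in zip(f_pos, f_unl)]

# Toy setup (hypothetical values): positives ~ N(0, 1), negatives ~ N(3, 1).
TRUE_ALPHA = 0.3
grid = [-5.0 + 0.1 * i for i in range(131)]  # x from -5 to 8
f_pos = [normal_pdf(x, 0.0) for x in grid]
f_unl = [TRUE_ALPHA * normal_pdf(x, 0.0) + (1 - TRUE_ALPHA) * normal_pdf(x, 3.0)
         for x in grid]

alpha_hat = alpha_upper_bound(f_pos, f_unl)
posteriors = positive_posterior(f_pos, f_unl, alpha_hat)
print(alpha_hat)
```

In this toy case the bound is tight because the negative density is negligible at the left end of the grid; with heavily overlapping classes the bound would overestimate the true proportion.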

In this work we derive an inversion formula for the Laplace transform of a density observed on a curve in the complex domain, which generalizes the well-known Post–Widder formula. We establish convergence of our inversion method and derive the corresponding convergence rates for the case of a Laplace transform of a smooth density. As an application we consider the problem of statistical inference for variance-mean mixture models. We construct a nonparametric estimator for the mixing density based on the generalized Post–Widder formula, derive bounds for its root mean square error and give a brief numerical example.
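
For reference, the classical Post–Widder formula recovers a function from derivatives of its Laplace transform on the positive real axis. The notation below ($f$, $F$, $k$) is chosen here for illustration and need not match the paper's; the generalized formula for transforms observed on a complex curve is not reproduced.

```latex
F(s) = \int_0^\infty e^{-st} f(t)\,dt, \qquad
f(t) = \lim_{k \to \infty} \frac{(-1)^k}{k!}
       \left(\frac{k}{t}\right)^{k+1} F^{(k)}\!\left(\frac{k}{t}\right),
\quad t > 0,
```

where the limit holds at every continuity point $t$ of $f$.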

This paper generalizes recent proposals of density forecasting models and develops theory for this class of models. In density forecasting, the density of observations is estimated in regions where the density is not observed. Identification of the density in such regions is guaranteed by structural assumptions on the density that allow exact extrapolation. In this paper, the structural assumption is that the density is a product of one-dimensional functions. The theory is quite general with respect to the shape of the region where the density is observed. Such models naturally arise when the time point of an observation can be written as the sum of two terms (e.g., onset and incubation period of a disease). The developed theory also allows for a multiplicative factor of seasonal effects. Seasonal effects are present in many actuarial, biostatistical, econometric and statistical studies. Smoothing estimators based on backfitting are proposed, and full asymptotic theory is derived for them. A practical example from the insurance business is given, producing a within-year budget of reported insurance claims. A small-sample study supports the theoretical results.
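
One way to write the product assumption is sketched below. The symbols $f_1$, $f_2$, $\mathcal{S}$ and $\mathcal{I}$ are assumed here for illustration and are not taken from the paper; the seasonal factor is omitted.

```latex
f(x, y) = f_1(x)\, f_2(y), \qquad (x, y) \in \mathcal{S},
\qquad \text{with data observed only on } \mathcal{I} \subset \mathcal{S}.
```

In the disease example, $x$ would be the onset time and $y$ the incubation period, so the observation occurs at time $x + y$. Once $f_1$ and $f_2$ are identified from $\mathcal{I}$, their product extends the density to the unobserved region $\mathcal{S} \setminus \mathcal{I}$.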

In this paper, in-sample forecasting is defined as forecasting a structured density to sets where it is unobserved. The structured density consists of one-dimensional in-sample components that identify the density on such sets. We focus on the multiplicative density structure, which has recently been seen as the underlying structure of non-life insurance forecasts. In non-life insurance the in-sample area is defined as one triangle and the forecasting area as the triangle that, added to the first triangle, produces a square. Recent approaches estimate two one-dimensional components by projecting an unstructured two-dimensional density estimator onto the space of multiplicatively separable functions. We show that time reversal reduces the problem to two one-dimensional problems, where the one-dimensional data are left-truncated and a one-dimensional survival density estimator is needed. This paper then uses the local linear density smoother with weighted cross-validated and do-validated bandwidth selectors. Full asymptotic theory is provided, with and without time reversal. Finite sample studies and an application to non-life insurance are included.
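
The triangle geometry can be made concrete with a discretized sketch. It assumes the two one-dimensional components have already been estimated and binned on a common grid (the paper's local linear smoother and bandwidth selection are not reproduced); the weights and the function name are hypothetical.

```python
def triangle_masses(f1, f2):
    """Split the mass of a discretized multiplicative density f1[i] * f2[j]
    into the observed triangle (i + j < m) and the forecasting triangle
    (i + j >= m) of an m x m square."""
    m = len(f1)
    if len(f2) != m:
        raise ValueError("f1 and f2 must have the same number of bins")
    observed = 0.0
    forecast = 0.0
    for i in range(m):
        for j in range(m):
            mass = f1[i] * f2[j]
            if i + j < m:
                observed += mass
            else:
                forecast += mass
    return observed, forecast

# Hypothetical component weights on a 4 x 4 grid.
f1 = [0.25, 0.25, 0.25, 0.25]
f2 = [0.4, 0.3, 0.2, 0.1]
observed, forecast = triangle_masses(f1, f2)

# If 1000 claims were seen in the observed triangle, the multiplicative
# structure implies this many are still to come in the forecasting triangle.
expected_future = 1000 * forecast / observed
print(observed, forecast, expected_future)
```

The forecast uses only the two one-dimensional components, which is what makes extrapolation beyond the observed triangle possible.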

A model for organizing cargo transportation between two node stations connected by a railway line with a certain number of intermediate stations is considered. Cargo moves in one direction only. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw material for a manufacturing industry located in another region, where the other node station is situated. Freight traffic is organized by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, and the rule for distributing cargo at the final node station. The process of cargo transportation follows a prescribed control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. The model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This creates the need for a "correct" extension of solutions of the system of differential equations to a class of quasi-solutions, whose distinctive feature is discontinuities (jumps) at a countable number of points. Using the fourth-order Runge–Kutta method, we constructed these quasi-solutions numerically and determined their rate of growth. Technically, the main difficulty lay in obtaining quasi-solutions that satisfy the nonlocal linear restrictions. Furthermore, we investigated the dependence of the quasi-solutions and, in particular, of the sizes of their jumps on a number of model parameters characterizing the control rule, the cargo transportation technologies and the intensity of cargo arrival at the node station.
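
The numerical scheme named above can be sketched generically. The block below implements the classical fourth-order Runge–Kutta step for a system of ordinary differential equations; the two-station right-hand side is a hypothetical toy, not the paper's model, and the nonlocal restrictions and jumps of the quasi-solutions are not handled.

```python
import math

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y),
    where y is a list of floats and f returns a list of the same length."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f, t0, y0, h, steps):
    """Advance y' = f(t, y) from t0 by `steps` steps of size h."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

def toy_rhs(t, y):
    """Hypothetical toy: cargo leaves station 0 at unit rate and arrives at station 1."""
    return [-y[0], y[0]]

final = integrate(toy_rhs, 0.0, [1.0, 0.0], 0.01, 100)
error = abs(final[0] - math.exp(-1.0))  # exact solution is y0(t) = exp(-t)
print(final, error)
```

For a model with jumps, one plausible approach would be to integrate between jump points with such a stepper and apply the jump conditions separately; how the paper does this is not stated in the abstract.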

Event logs collected by modern information and technical systems usually contain enough data for automated discovery of process models. A variety of algorithms have been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc. Nevertheless, quick analysis of ad hoc selected parts of a log still lacks a full-fledged implementation. This paper describes a ROLAP-based method of multidimensional event log storage for process mining. The result of the log analysis is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by an ad hoc selection of criteria and a threshold on occurrence probability.
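
The idea of discovering a graph for an ad hoc sublog can be sketched in memory. This is a simplification, not the paper's ROLAP implementation: it weights single transitions between activities by relative frequency rather than ranking whole event sequences, and the log layout, attribute names and function name are hypothetical.

```python
from collections import Counter, defaultdict

def directly_follows_graph(events, min_probability=0.0):
    """Build a directed graph of activity transitions from an event log.

    events: iterable of (case_id, timestamp, activity) tuples.
    Returns {(source, target): probability}, where probability is the share
    of transitions leaving `source` that lead to `target`.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))

    edge_counts = Counter()
    out_counts = Counter()
    for steps in traces.values():
        steps.sort()  # order the events of each case by timestamp
        activities = [activity for _, activity in steps]
        for source, target in zip(activities, activities[1:]):
            edge_counts[(source, target)] += 1
            out_counts[source] += 1

    graph = {}
    for (source, target), count in edge_counts.items():
        probability = count / out_counts[source]
        if probability >= min_probability:
            graph[(source, target)] = probability
    return graph

# Hypothetical log rows: (case_id, timestamp, activity, channel).
log = [
    ("c1", 1, "register", "web"), ("c1", 2, "check", "web"), ("c1", 3, "approve", "web"),
    ("c2", 1, "register", "web"), ("c2", 2, "check", "web"), ("c2", 3, "reject", "web"),
    ("c3", 1, "register", "phone"), ("c3", 2, "approve", "phone"),
    ("c4", 1, "register", "web"), ("c4", 2, "check", "web"), ("c4", 3, "approve", "web"),
]

# Ad hoc sublog: only cases from the "web" channel, edges with probability >= 0.5.
web_sublog = [(case, ts, act) for case, ts, act, channel in log if channel == "web"]
web_graph = directly_follows_graph(web_sublog, min_probability=0.5)
full_graph = directly_follows_graph([(case, ts, act) for case, ts, act, _ in log])
print(web_graph)
```

In a relational setting, the sublog selection would correspond to a filter over dimension attributes before the transitions are aggregated.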

The geographic information system (GIS) is based on the first and only Russian Imperial Census of 1897 and the First All-Union Census of the Soviet Union of 1926. The GIS features vector data (shapefiles) of all provinces of the two states. For the 1897 census, there is information about linguistic, religious, and social estate groups. The part based on the 1926 census features nationality. Both shapefiles include information on gender and on rural and urban population. The GIS allows researchers to produce any maps needed for individual studies of the period that require administrative boundaries and demographic information.

Existing approaches suggest that IT strategy should be a reflection of business strategy. In practice, however, organisations often do not follow their business strategy even if it is formally declared. Under these conditions, IT strategy can be viewed not as a plan, but as an organisation's shared view of the role of information systems. This approach generally reflects only a top-down perspective on IT strategy. It can therefore be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes, formed as a result of previous experience) to implement a bottom-up approach. Two components that can help to establish an effective reaction to new IT initiatives are proposed here: a model of IT-related decision making, and an efficiency metric for estimating the maturity of business processes and the corresponding IT. The use of the proposed tools is demonstrated in practical cases.