The paper presents a new geometrically motivated method for non-linear regression based on manifold learning techniques. The regression problem is to construct a predictive function which estimates an unknown smooth mapping f from q-dimensional inputs to m-dimensional outputs, given a training data set consisting of ‘input-output’ pairs. The unknown mapping f determines a q-dimensional manifold M(f), consisting of all the ‘input-output’ vectors, which is embedded in (q+m)-dimensional space and covered by a single chart; the training data set determines a sample from this manifold. Modern manifold learning methods allow constructing an estimator M* from the manifold-valued sample which accurately approximates the manifold. The proposed method, called Manifold Learning Regression (MLR), finds the predictive function fMLR that ensures the equality M(fMLR) = M*. MLR simultaneously estimates the m×q Jacobian matrix of the mapping f.

In this paper, we analyze a new approach for demand prediction in retail. One of the significant gaps in demand prediction by machine learning methods is the failure to account for sales data censorship. Econometric approaches to modeling censored demand are used to obtain consistent and unbiased parameter estimates. These approaches can also be transferred to different classes of machine learning models to reduce the prediction error of sales volume. In this study we build two ensemble models to predict demand, with and without accounting for demand censorship, aggregating predictions of machine learning methods such as linear regression, ridge regression, LASSO and random forest. Having estimated the predictive properties of both models, we show that the model accounting for the censored nature of demand has the better predictive power.
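The effect of ignoring censorship can be illustrated with a toy simulation (a hedged sketch with hypothetical numbers, not the models from the study): when demand exceeds the stock level, only the stock level is observed as sales, so the naive sample mean underestimates true demand.

```python
import random

random.seed(0)

# Hypothetical toy example: true demand is normal with mean 12,
# but observed sales are right-censored at the stock level of 10 units.
stock = 10.0
demand = [random.gauss(12.0, 3.0) for _ in range(10000)]
sales = [min(d, stock) for d in demand]  # censored observations

mean_demand = sum(demand) / len(demand)  # close to the true mean of 12
mean_sales = sum(sales) / len(sales)     # systematically below mean_demand
```

Econometric tools such as censored-regression (Tobit-type) models correct for exactly this downward bias.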

The disadvantages of the traditional method of calculating a spare parts, tools and accessories (SPTA) package are considered. A multi-criteria methodology for calculating an SPTA package has been developed for the simultaneous use of several criteria, such as weight, size or cost. It is based on the convolution method, which reduces the multi-criteria task to a scalar one. To check the effectiveness of the developed method, the SPTA package for a mobile communications vehicle was calculated using both the current one-parameter method and the developed multi-parameter method. The weight and volume of the required SPTA package were used as constraints. Based on the results obtained, an analysis of the developed method was carried out.
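The convolution idea can be sketched as a weighted sum of normalised criteria (a minimal illustration with hypothetical weights and values, not the developed methodology itself):

```python
# Hypothetical example: reduce several normalised criteria (lower is better)
# to a single scalar score via a linear convolution with chosen weights.
def convolution_score(criteria, weights):
    """criteria, weights: dicts keyed by criterion name; weights sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * value for name, value in criteria.items())

weights = {"weight": 0.5, "size": 0.3, "cost": 0.2}  # hypothetical priorities
variant_a = {"weight": 0.2, "size": 0.5, "cost": 0.3}
variant_b = {"weight": 0.4, "size": 0.2, "cost": 0.1}
best = min((variant_a, variant_b), key=lambda c: convolution_score(c, weights))
```

With these weights, variant_b scores 0.28 against 0.31 for variant_a, so the scalarised comparison prefers it even though it is worse on the weight criterion alone.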

Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training. In this paper we try to overcome this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging semantic image segmentation problem for which a wide variety of annotations can be used. For instance, the popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects, or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.

We propose a prototype of a near-duplicate detection system for web-shop owners. It is typical for such online businesses to buy descriptions of their goods from so-called copywriters. A copywriter can cheat from time to time and provide the owner with almost identical descriptions for different items. In this paper we demonstrate how FCA can be used for fast clustering and revealing such duplicates in a real online perfume shop’s dataset.
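The core idea of comparing descriptions by their shared attributes can be sketched as follows (a simplified pairwise set-intersection illustration, not the FCA-based clustering used in the paper; the item names and threshold are hypothetical):

```python
def word_set(text):
    """Represent a description by its set of lower-cased words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Overlap of two word sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)

def near_duplicates(descriptions, threshold=0.7):
    """Return pairs of item names whose descriptions share most of their words."""
    items = sorted(descriptions.items())
    pairs = []
    for i, (name_i, text_i) in enumerate(items):
        for name_j, text_j in items[i + 1:]:
            if jaccard(word_set(text_i), word_set(text_j)) >= threshold:
                pairs.append((name_i, name_j))
    return pairs
```

Two descriptions differing in a single word are flagged, while unrelated descriptions are not.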

A hybrid approach to automated identification and monitoring of technology trends is presented. The hybrid approach combines ontology-based information extraction (OBIE) with statistical methods for processing the OBIE results. The key point of the approach is the so-called ‘black box’ principle: trends are identified on the basis of heuristics stemming from an elaborate ontology of a technology trend.

Conventional image recognition methods usually involve dividing the keypoint neighborhood (for local features) or the whole object (for global features) into a grid of blocks, computing the gradient magnitude and orientation at each image sample point, and uniting the orientation histograms of all blocks into a single descriptor. The query image is recognized by matching its descriptors with the descriptors of reference images. The matching is usually done by summing the distances between descriptors of corresponding blocks. Unfortunately, such an approach does not lead to a correct distance between the sets of block histograms if the popular squared Euclidean distance is used as the dissimilarity measure. To calculate a correct dissimilarity, we propose to sum the square roots (or, more generally, an appropriate nonlinear transformation) of the distances between block histograms. This approach is experimentally examined in a face recognition problem with the FERET and AT&T datasets. The results support the statement that the proposed approach provides higher accuracy (up to 5.5%) than state-of-the-art methods, not only for the Euclidean distance but also for other popular similarity measures (L1, Kullback-Leibler, Jensen-Shannon, chi-squared and the homogeneity-testing probabilistic neural network).
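The proposed aggregation can be sketched as follows (a minimal illustration with toy two-block descriptors; the default nonlinear transformation is the square root, and the identity transform reproduces the conventional sum of squared per-block distances):

```python
import math

def descriptor_distance(desc_a, desc_b, transform=math.sqrt):
    """Sum a nonlinear transform of per-block squared Euclidean distances.

    desc_a, desc_b: lists of per-block histograms (lists of floats).
    transform=math.sqrt gives the proposed dissimilarity; the identity
    transform gives the conventional sum of squared distances.
    """
    total = 0.0
    for hist_a, hist_b in zip(desc_a, desc_b):
        d2 = sum((a - b) ** 2 for a, b in zip(hist_a, hist_b))
        total += transform(d2)
    return total
```

For two descriptors that differ only in one block, the square-root variant returns the Euclidean distance of that block rather than its square, so no single block dominates the aggregate.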

Several notions of links between contexts – intensionally related concepts, shared intents, and bonds, as well as interrelations thereof – are considered. The algorithmic complexity of the problems related to the respective closure operators is studied. An expression of bonds in terms of shared intents is given.

Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string S can be computed in linear time and space by constructing a suffix tree or a suffix array of S. Here we consider the case when S is a substring of another string T of length n. We propose two linear-space data structures for T which allow computing the minimal suffix of S in O(log^(1+ε) n) time (for any fixed ε > 0) and the maximal suffix of S in O(log n) time. Both data structures take O(n) time to construct.
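As a point of reference, the minimal and maximal suffixes of a substring can be computed naively in quadratic time (a baseline sketch only; the paper's contribution is replacing this with linear-space structures answering queries in polylogarithmic time):

```python
def minimal_suffix(t, i, j):
    """Lexicographically minimal suffix of the substring t[i:j] (naive, O(|S|^2))."""
    s = t[i:j]
    return min(s[k:] for k in range(len(s)))

def maximal_suffix(t, i, j):
    """Lexicographically maximal suffix of the substring t[i:j] (naive, O(|S|^2))."""
    s = t[i:j]
    return max(s[k:] for k in range(len(s)))
```

For example, among the suffixes of "banana" the minimal one is "a" and the maximal one is "nana".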

In this paper we propose a woven block code construction based on two convolutional codes. We also propose a soft-input decoder that allows this construction to achieve better error-correction performance than turbo codes with a conventional decoder. Computer simulation has shown a 0.1 dB energy gain relative to the LTE turbo code. Asymptotically, the proposed code has a distance greater than the product of the free distances of the component codes.

Gaussian graphical model selection is a statistical problem that identifies the Gaussian graphical model from observations. Existing Gaussian graphical model selection methods focus on the error rate for incorrect edge inclusion. However, when comparing statistical procedures, it is also important to take into account the error rate for incorrect edge exclusion. To handle this issue we consider the graphical model selection problem in the framework of multiple decision theory. We show that the statistical procedure based on simultaneous inference with UMPU individual tests is optimal in the class of unbiased procedures.

The probabilistic neural network (PNN) is a well-known instance-based learning algorithm which is widely used in various pattern classification and regression tasks when only a rather small number of instances per class is available. A known disadvantage of this network is its high computational complexity at classification time. A common way to overcome this drawback is to apply reduction techniques that select the most typical instances. Such an approach biases the estimates of the class probability distributions and, in turn, decreases the classification accuracy. In this paper we examine another possible solution: replacing the Gaussian window and the Parzen kernel with the orthogonal-series Fejér kernel, and using the naïve assumption about independence of features. It is shown that our approach achieves much better runtime complexity in comparison with either the original PNN or its modification with preliminary clustering of the training set.
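For reference, the baseline Gaussian-kernel PNN can be sketched as follows (a minimal illustration of Parzen-window classification; the paper's Fejér-kernel modification is not shown here, and the data and bandwidth are hypothetical):

```python
import math

def pnn_classify(x, train, sigma=0.5):
    """Classify x by the largest average Gaussian-kernel vote per class.

    train: dict mapping class label -> list of training vectors.
    Runtime is proportional to the total number of stored instances,
    which is the classification-time complexity issue discussed above.
    """
    scores = {}
    for label, instances in train.items():
        total = 0.0
        for z in instances:
            d2 = sum((a - b) ** 2 for a, b in zip(x, z))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        # Parzen density estimate for this class (up to a common constant).
        scores[label] = total / len(instances)
    return max(scores, key=scores.get)
```

Every query touches every stored instance, which is exactly why reduction techniques (or cheaper kernels) are attractive.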

We initiate a comprehensive study of the complexity of computing Boolean functions by polynomial threshold functions (PTFs) on general Boolean domains. A typical example of a general Boolean domain is {1,2}^n. We are mainly interested in the length (the number of monomials) of PTFs, with their degree and weight being of secondary interest.

First we motivate the study of PTFs over the {1,2}^n domain by showing their close relation to depth-two threshold circuits. In particular we show that PTFs of polynomial length and polynomial degree compute exactly the functions computed by polynomial-size THR ∘ MAJ circuits. We note that known lower bounds for THR ∘ MAJ circuits extend to the likely strictly stronger model of PTFs. We also show that a “max-plus” version of PTFs is related to AC^0 ∘ THR circuits.

We exploit this connection to gain a better understanding of threshold circuits. In particular, we show that (super-logarithmic) lower bounds for 3-player randomized communication protocols with unbounded error would yield (super-polynomial) size lower bounds for THR ∘ THR circuits.

Finally, having thus motivated the model, we initiate structural studies of PTFs. These include relationships between weight and degree of PTFs, and a degree lower bound for PTFs of constant length.

We describe pretty-printing combinators with choice which provide an optimal document layout in polynomial time. Bottom-up tree rewriting and dynamic programming (BURS) are used to calculate the set of possible layouts for a given output width. We also present the results of an evaluation of the suggested approach and discuss its application to the implementation of pretty-printers.
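The idea of combinators with choice can be sketched by brute-force enumeration (a toy illustration only; the combinator names are hypothetical, and the BURS-based dynamic programming of the paper avoids this exponential enumeration):

```python
# A document is a list of candidate layouts (strings); `choice` unions the
# candidates, and `best` picks a fitting layout with the fewest line breaks.
def text(s):
    return [s]

def choice(a, b):
    return a + b  # union of the candidate layouts

def beside(a, b):
    return [x + y for x in a for y in b]  # horizontal composition

def above(a, b):
    return [x + "\n" + y for x in a for y in b]  # vertical composition

def best(doc, width):
    fitting = [l for l in doc if max(len(line) for line in l.split("\n")) <= width]
    candidates = fitting or doc  # fall back if nothing fits the width
    return min(candidates, key=lambda l: l.count("\n"))
```

For a document offering both a horizontal and a vertical layout, a wide output width keeps everything on one line, while a narrow width forces the line break.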

The telecommunication industry is in a transformation stage and is characterised by heavy capital intensity and a high-risk environment. Ensuring that capital expenditure projects in the industry meet their promises, i.e. achieve the goals set in strategic plans, is a complex task which cannot be solved by conventional project management techniques. To solve this task in an environment of high uncertainty, we develop a project controlling system that identifies the project’s key goals, “tracks” the progress towards these goals using risk-oriented control procedures, and suggests remediation measures to reduce deviations from the project’s goals. We developed a reference model of project controlling in the telecommunication industry and described examples of the controls used in the system, together with where and when they should be implemented. We argue that the implementation of project controlling can reduce deviations from the goals set in the project’s plan by around 50%.

We address the question of synchronisation in parallel discrete event simulation (PDES) algorithms. We study synchronisation in the conservative PDES model, adding long-range connections between processing elements. We investigate how the fraction of random long-range connections in the synchronisation scheme influences the simulation time profile of PDES. We find that a small fraction of random distant connections enhances synchronisation: the width of the local virtual time profile remains constant as the number of processing elements increases. At the same time, the conservative PDES algorithm on small-world networks remains free from deadlocks. We compare our results with case-study simulations.