The probabilistic neural network (PNN) is a well-known instance-based learning algorithm, widely used in various pattern classification and regression tasks when only a rather small number of instances per class is available. A known disadvantage of this network is the high computational complexity of classification. The common way to overcome this drawback is to apply reduction techniques that select the most typical instances. Such an approach shifts the estimates of the class probability distributions and, in turn, decreases classification accuracy. In this paper we examine another possible solution: replacing the Gaussian Parzen window with the orthogonal-series Fejér kernel and using the naïve assumption of feature independence. It is shown that our approach achieves much better runtime complexity than either the original PNN or its modification with preliminary clustering of the training set.
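The runtime gain comes from the orthogonal-series form of the estimate: instead of summing a Parzen kernel over all n training instances at query time, Cesàro-weighted Fourier coefficients can be precomputed once per class and feature, after which each density evaluation costs O(m) regardless of n. A minimal sketch of this idea for a single feature scaled into [-π, π] (the function names and the truncation parameter m are illustrative, not the paper's notation):

```python
import math

def fejer_coeffs(samples, m):
    """Cesaro-weighted Fourier coefficients of the empirical density on
    [-pi, pi]. Precomputed once in O(n*m); queries then cost O(m),
    independent of the training-set size n."""
    n = len(samples)
    coeffs = {}
    for k in range(-m, m + 1):
        w = 1.0 - abs(k) / (m + 1.0)                 # Fejer (Cesaro) weight
        re = w * sum(math.cos(k * x) for x in samples) / n
        im = -w * sum(math.sin(k * x) for x in samples) / n
        coeffs[k] = (re, im)
    return coeffs

def density(coeffs, x):
    """Evaluate the orthogonal-series (Fejer-kernel) density estimate at x."""
    val = 0.0
    for k, (re, im) in coeffs.items():
        # Re(c_k * e^{ikx}); the sum over symmetric k is real-valued
        val += re * math.cos(k * x) - im * math.sin(k * x)
    return val / (2.0 * math.pi)
```

Under the naïve independence assumption, a query would then be classified by multiplying such per-feature densities across features for each class and taking the class with the largest product times its prior.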

We initiate a comprehensive study of the complexity of computing Boolean functions by polynomial threshold functions (PTFs) on general Boolean domains. A typical example of a general Boolean domain is {1,2}^n. We are mainly interested in the length (the number of monomials) of PTFs, with their degree and weight being of secondary interest.

First we motivate the study of PTFs over the {1,2}^n domain by showing their close relation to depth-two threshold circuits. In particular, we show that PTFs of polynomial length and polynomial degree compute exactly the functions computed by polynomial-size THR ∘ MAJ circuits. We note that known lower bounds for THR ∘ MAJ circuits extend to the likely strictly stronger model of PTFs. We also show that a “max-plus” version of PTFs is related to AC⁰ ∘ THR circuits.

We exploit this connection to gain a better understanding of threshold circuits. In particular, we show that (super-logarithmic) lower bounds for 3-player randomized communication protocols with unbounded error would yield (super-polynomial) size lower bounds for THR ∘ THR circuits.

Finally, having thus motivated the model, we initiate structural studies of PTFs. These include relationships between weight and degree of PTFs, and a degree lower bound for PTFs of constant length.

We describe pretty-printing combinators with choice which provide optimal document layout in polynomial time. Bottom-up tree rewriting and dynamic programming (BURS) are used to calculate a set of possible layouts for a given output width. We also present the results of an evaluation of the suggested approach and discuss its application to the implementation of pretty-printers.

The telecommunications industry is in a stage of transformation and is characterised by heavy capital intensity and a high-risk environment. Ensuring that capital expenditure projects in the industry meet their promises, i.e. achieve the goals set in strategic plans, is a complex task which cannot be solved by conventional project management techniques. To solve this task in an environment of high uncertainty, we develop a project controlling system that identifies the key project goals, “tracks” the progress of achieving these goals by using risk-oriented control procedures, and suggests remediation measures to reduce deviations from the project goals. We developed a reference model of project controlling in the telecommunications industry and described examples of controls used in the system, as well as where and when they should be implemented. We argue that the implementation of project controlling can reduce deviations from the goals set by the project plan by around 50%.

We address the question of synchronisation in parallel discrete event simulation (PDES) algorithms. We study synchronisation in the conservative PDES model when long-range connections are added between processing elements. We investigate how the fraction of random long-range connections in the synchronisation scheme influences the simulation time profile of PDES. We find that a small fraction of random distant connections enhances synchronisation: the width of the local virtual time profile remains constant as the number of processing elements increases. At the same time, the conservative PDES algorithm on small-world networks remains free from deadlocks. We compare our results with case-study simulations.
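For illustration, in the standard conservative model each processing element advances its local virtual time only when it does not run ahead of any of its neighbors, and the long-range links simply enter the neighbor lists. A toy sketch under these assumptions (the topology generator, the exponential time increment, and all names are illustrative, not the paper's model specification):

```python
import random

def small_world_neighbors(n, p, rng=random.Random(0)):
    """Ring topology plus, for each element, a random distant neighbor
    with probability p (the fraction of long-range connections)."""
    nbrs = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    for i in range(n):
        if rng.random() < p:
            j = rng.randrange(n)
            if j != i:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs

def pdes_step(lvt, nbrs, rng=random.Random(1)):
    """One synchronous step: element i advances its local virtual time by a
    random increment only if it is not ahead of any neighbor; else it idles."""
    return [t + rng.expovariate(1.0) if all(t <= lvt[j] for j in nbrs[i]) else t
            for i, t in enumerate(lvt)]
```

The quantity of interest is then the width max(lvt) - min(lvt) of the local virtual time profile as the number of processing elements grows.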

Pair-wise sequence alignment is the basic method of comparative analysis of proteins and nucleic acids. Studying the results of an alignment, one has to consider two questions: (1) did the program find all the interesting similarities (“sensitivity”), and (2) are all the found similarities interesting (“selectivity”)? Of course, one has to specify which alignments are considered interesting. Analogous questions can be addressed to each of the obtained alignments: (3) which part of the aligned positions is aligned correctly (“confidence”), and (4) does the alignment contain all pairs of corresponding positions of the compared sequences (“accuracy”)? Naturally, the answers to these questions depend on the definition of a correct alignment. The presentation addresses the above two pairs of questions, which are extremely important in interpreting the results of sequence comparison.
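For concreteness, if the interesting similarities and the similarities reported by the program are treated as sets, questions (1) and (2) reduce to two ratios (a schematic sketch; the sets themselves are hypothetical):

```python
def sensitivity_selectivity(found, interesting):
    """(1) sensitivity: fraction of interesting similarities that were found;
    (2) selectivity: fraction of found similarities that are interesting."""
    hit = len(found & interesting)
    return hit / len(interesting), hit / len(found)
```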

The nearest neighbor search problem has been well known since the 1960s, and many approaches have been proposed. One is to build a graph over the set of objects from a given database and use a greedy walk as the basis of a search algorithm. If the greedy walk is able to find the nearest neighbor in the graph, starting from any vertex, in a small number of steps, such a graph is called a navigable small world. In this paper we propose a new algorithm for building graphs with navigable small world properties. The main advantage of the proposed algorithm is that it is free from input parameters and is able to adapt on the fly to any changes in the distribution of the data. The algorithm is based on the idea of removing local minima by adding new edges. We realize this idea by using the set of queries issued during the execution stage to improve the search properties of the structure. An empirical study of the proposed algorithm and a comparison with previous work are reported in the paper.
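One reading of the core idea can be sketched as follows: run the greedy walk for each incoming query, and whenever the walk stops at a local minimum that is not the true nearest neighbor, connect the two vertices so that similar queries no longer get stuck there. This is a simplification under that reading; the fixed entry point and the `exact_nn` oracle are illustrative, not the paper's algorithm:

```python
def greedy_walk(graph, dist, query, start):
    """Move to the neighbor closest to the query until no neighbor is closer
    (a local minimum of the distance function on the graph)."""
    cur = start
    while True:
        nbrs = graph[cur]
        nxt = min(nbrs, key=lambda v: dist(v, query)) if nbrs else cur
        if nbrs and dist(nxt, query) < dist(cur, query):
            cur = nxt
        else:
            return cur

def repair_with_queries(graph, dist, queries, exact_nn):
    """Remove local minima revealed by the query stream: if a walk stops at
    a vertex other than the true nearest neighbor, link the two vertices."""
    for q in queries:
        entry = next(iter(graph))          # fixed entry point, for simplicity
        found = greedy_walk(graph, dist, q, entry)
        nn = exact_nn(q)
        if found != nn:
            graph[found].add(nn)
            graph[nn].add(found)
```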

Parallel computing algorithms are explored to improve the efficiency of image recognition with large databases. A novel parallel version of the directed enumeration method (DEM) is proposed. Experimental results for the face recognition problem on the FERET and Essex datasets are presented. We compare the performance of our parallel DEM with the original DEM and with parallel implementations of the nearest neighbor rule and the conventional Best Bin First (BBF) k-d tree. It is shown that the proposed method achieves higher computational efficiency (2–10 times faster than exhaustive search and the BBF) and a lower error rate than the original DEM.

We consider the problem of sorting a sequence of n keys in a RAM-like environment where memory faults are possible. An algorithm is said to be δ-resilient if it can tolerate up to δ memory faults during its execution. A resilient sorting algorithm must produce a sequence in which every pair of uncorrupted keys is ordered correctly. Finocchi, Grandoni, and Italiano devised a δ-resilient deterministic mergesort algorithm that runs in O(n log n + δ²) time. We present a δ-resilient randomized algorithm (based on quicksort) that runs in O(n log n + δ√(n log n)) expected time, and a deterministic variation that runs in O(n log n + δ√(n log n)) worst-case time. This improves the previously known result for δ > √(n log n). Our deterministic sorting relies on the notion of an approximate k-th order statistic. For this auxiliary problem, we devise a deterministic algorithm that runs in O(n + δ√n) time and produces a key (either corrupted or not) whose order rank differs from k by at most O(δ).

In this paper we present a new approach to reducing the computational time spent on coloring in one of the recent branch-and-bound algorithms for the maximum clique problem. In this algorithm, candidates for the maximum clique are colored in every search tree node. We suggest reusing the coloring computed in the parent node for the child nodes when it does not lead to many new branches; that is, we reuse the same coloring only in the nodes for which the upper bound exceeds the current best solution by no more than a small value δ. The obtained increase in performance reaches 70% on benchmark instances.
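A minimal sketch of the decision rule (the greedy coloring and all function names are illustrative simplifications of the branch-and-bound machinery): reuse the parent's coloring restricted to the surviving candidates when the implied upper bound exceeds the incumbent by at most δ, and pay for a fresh coloring otherwise.

```python
def greedy_coloring(vertices, adj):
    """Sequential greedy coloring; the number of colors used upper-bounds
    the clique size among `vertices`."""
    colors = {}
    for v in vertices:
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

def node_coloring(parent_colors, candidates, adj, clique_size, best, delta):
    """Reuse the parent coloring when |clique| + #colors <= best + delta;
    otherwise recolor from scratch (tighter, but more expensive)."""
    reused = {v: parent_colors[v] for v in candidates}
    bound = clique_size + len(set(reused.values()))
    if bound <= best + delta:
        return reused
    return greedy_coloring(candidates, adj)
```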

The article addresses the problem of ringing artifact suppression. The ringing effect is caused by the corruption or loss of high-frequency information; it appears as waves or oscillations near strong edges. We propose a novel method for ringing artifact suppression after Fourier cut-off filtering. It can also be used for image deringing in the case of image resampling and in other applications where the frequency loss can be estimated. The method is based on the joint sparse coding approach. The proposed method preserves more small image details than state-of-the-art algorithms based on total variation minimization, and outperforms them in terms of image quality metrics.

In this paper we introduce a generalized learning algorithm for probabilistic topic models (PTM). Many known and new algorithms for the PLSA, LDA, and SWB models can be obtained as its special cases by choosing a subset of the following “options”: regularization, sampling, update frequency, sparsing, and robustness. We show that a robust topic model, which distinguishes specific, background, and topic terms, does not need Dirichlet regularization and provides a controllably sparse solution.

We propose a novel approach to solving the approximate nearest neighbor search problem in arbitrary metric spaces. The distinctive feature of our approach is that we can incrementally build a non-hierarchical distributed structure for given metric space data, with logarithmic complexity scaling in the size of the structure and probabilistic nearest neighbor queries of adjustable accuracy. The structure is based on a small world graph with vertices corresponding to the stored elements, edges for the links between them, and the greedy algorithm as the base search algorithm. Both the search and the addition algorithms require only local information from the structure. The simulation performed for data in Euclidean space shows that the structure built by the proposed algorithm has navigable small world properties, with logarithmic search complexity at fixed accuracy and weak (power-law) scalability with the dimensionality of the stored data.
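A minimal sketch of such incremental construction under the description above (the parameter names and values are illustrative): each new element is linked to its approximate nearest neighbors, found by several greedy searches from random starting vertices, so only local information is ever touched.

```python
import random

def greedy_search(graph, dist, q, start):
    """Greedy walk toward q; stops at a local minimum of the distance."""
    cur = start
    while True:
        nbrs = graph[cur]
        nxt = min(nbrs, key=lambda v: dist(v, q)) if nbrs else cur
        if nbrs and dist(nxt, q) < dist(cur, q):
            cur = nxt
        else:
            return cur

def add_element(graph, dist, x, f=3, m=5):
    """Insert x by linking it to the (approximate) nearest neighbors found
    with m independent greedy searches; f bounds the number of new links."""
    if not graph:
        graph[x] = set()
        return
    found = {greedy_search(graph, dist, x, random.choice(list(graph)))
             for _ in range(m)}
    neighbors = sorted(found, key=lambda v: dist(v, x))[:f]
    graph[x] = set(neighbors)
    for v in neighbors:
        graph[v].add(x)
```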

Since the early 1990s, speaker adaptation has become one of the intensively studied areas in speech recognition. State-of-the-art batch-mode adaptation algorithms assume that the speech of a particular speaker contains enough information about the user's voice. In this article we propose to let the user manually verify whether the adaptation is useful. Our procedure requires the speaker to pronounce syllables containing each vowel of a particular language. The algorithm consists of two steps, looping through all syllables. First, LPC analysis is performed on the extracted vowel, and the LPC coefficients are used to synthesize a new sound (with a fixed pitch period), which is then played back. If this synthesized sound is not perceived by the user as matching the original one, the syllable should be recorded again. At the second stage, the speaker is asked to produce another syllable with the same vowel to automatically verify the stability of pronunciation. If the two signals are close (in terms of the Itakura-Saito divergence), the sounds are marked as "good" for adaptation; otherwise both steps are repeated. In the experiment we examine the problem of vowel recognition for the Russian language in our voice control system, which fuses two classifiers: CMU Sphinx with a speaker-independent acoustic model, and Euclidean comparison of the MFCC features of the model vowel and the input signal frames. Our results support the statement that the proposed approach provides better accuracy and reliability in comparison with the traditional MAP/MLLR techniques implemented in CMU Sphinx.
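The stability check in the second stage can be sketched as a thresholded Itakura-Saito divergence between the power spectra (e.g. the LPC spectral envelopes) of the two vowel recordings; the symmetrisation and the threshold value below are illustrative assumptions, not the paper's settings:

```python
import math

def itakura_saito(p, q, eps=1e-12):
    """Itakura-Saito divergence between two power spectra; zero iff they
    coincide, and notably asymmetric in its arguments."""
    ratios = ((pi + eps) / (qi + eps) for pi, qi in zip(p, q))
    return sum(r - math.log(r) - 1.0 for r in ratios) / len(p)

def pronunciation_is_stable(spec_a, spec_b, threshold=0.1):
    """Mark a pair of recordings as "good" for adaptation when the
    symmetrised divergence falls below the threshold."""
    d = 0.5 * (itakura_saito(spec_a, spec_b) + itakura_saito(spec_b, spec_a))
    return d < threshold
```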

To this day, classification of documents as negative, neutral, or positive remains a key task in the analysis of text tonality/sentiment. There are several methods for the automatic analysis of text sentiment. The method based on network models, the most linguistically sound in our view, allows us to take into account the syntagmatic connections of words. It also utilizes the assumption that not all words in a text are equivalent: some words carry more weight and have a higher impact on the tonality of the text than others. We find it natural to represent a text as a network for sentiment studies, especially in the case of short texts, where grammar structures play a greater role in the formation of the text's pragmatics and the text cannot be viewed as just “a bag of words”. We propose a method of text analysis that combines the use of a lexical mask with an efficient clustering mechanism. In this setting, cluster analysis is one of the main methods of typology, and it demands formal rules for calculating the number of clusters. The choice of a set of clusters and the moment of completion of the clustering algorithm depend on each other. We show that cluster analysis of data from an n-dimensional vector space using the single-linkage method can be considered a discrete random process. Sequences of “minimum distances” define the trajectories of this process. An “approximation-estimating test” allows us to establish the Markov moment of completion of the agglomerative clustering process.
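The "trajectory of minimum distances" can be made concrete: at every step of agglomerative single-linkage clustering, record the minimum inter-cluster distance at which the merge occurs; a stopping rule then inspects this nondecreasing sequence. A naive O(n³) sketch of the recording step (illustrative only, not the paper's implementation or its stopping test):

```python
def single_linkage_merge_distances(points, dist):
    """Agglomerative single-linkage clustering; returns the sequence of
    'minimum distances' at which merges occur -- the trajectory of the
    discrete random process whose jumps a stopping rule would inspect."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```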

The paper presents experimental results on automatic word sense disambiguation (WSD). Contexts of polysemous and/or homonymic Russian nouns denoting physical objects serve as the empirical basis of the study. The sets of contexts were extracted from the Russian National Corpus (RNC). Machine learning software for WSD was developed within the framework of the project. The WSD tool used in the experiments is aimed at the statistical processing and classification of noun contexts. The WSD procedure was performed taking into account lexical markers of word meanings in contexts and the semantic annotation of contexts. The sets of experiments allowed us to define optimal conditions for WSD in Russian texts.