Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting the cross-lingual semantic similarity of short texts, however, rely on tools and resources (e.g., machine translation systems, syntactic parsers, or named entity recognizers) that do not exist for many languages or language pairs. In contrast, we propose an unsupervised and very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language into the vector space of the other via a linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity that exploit bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which a corpus large enough to learn monolingual word embeddings exists. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach comes close to the performance of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to that of complex, resource-intensive state-of-the-art models for the respective tasks. (C) 2017 Published by Elsevier B.V.
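The core of the approach above can be sketched in a few lines: learn a linear map from a small seed dictionary of translation pairs, project source-language embeddings into the target space, and align words by cosine similarity. The dimensions, random toy vectors, and helper names below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Toy monolingual embeddings (values and dimensionality are assumptions).
rng = np.random.default_rng(0)
d = 4
# Seed dictionary of word translation pairs: row i of X (source language)
# translates to row i of Z (target language).
X = rng.normal(size=(10, d))
Z = rng.normal(size=(10, d))

# Linear translation model: W minimising ||X W - Z||_F^2 (least squares).
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def project(x):
    """Map a source-language vector into the target embedding space."""
    return x @ W

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Align a projected source word with its most similar target word.
query = project(X[0])
best = max(range(len(Z)), key=lambda i: cosine(query, Z[i]))
```

The unsupervised similarity measures described in the abstract would then aggregate such word-level alignments over the two texts.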
The paper focuses on an application of sequential three-way decisions and granular computing to the problem of multi-class statistical recognition of objects that can be represented as a sequence of independent homogeneous (regular) segments. Since segmentation algorithms usually allow the degree of feature homogeneity within a segment to be chosen, we propose to associate each object with a set of such piecewise-regular representations (granules). Coarse-grained granules correspond to a small number of weakly homogeneous segments; conversely, a sequence with a large number of highly homogeneous small segments is considered a fine-grained granule. During recognition, each granularity level is analyzed sequentially: the next, finer level is processed only if the decision at the current level is unreliable. The conventional Chow's rule is used for the non-commitment option. The decision at each granularity level is itself proposed to be sequential: a probabilistic rough set over the distances between objects of different classes is created at each level, and if the distance between the query object and the next checked reference object falls into the negative region (i.e., it is less than a fixed threshold), the search procedure is terminated. Experimental results in face recognition with the Essex dataset and state-of-the-art HOG features are presented. It is demonstrated that the proposed approach can improve recognition performance by a factor of 2.5–6.5 in comparison with the conventional PHOG (pyramid HOG) method.
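The coarse-to-fine control loop with Chow's non-commitment option can be sketched as follows. The posterior distributions per level are stand-in callables (in the paper they are derived from segment distances), and the threshold value is an illustrative assumption.

```python
import numpy as np

def chow_decision(posteriors, threshold):
    """Chow's rule: commit to the top class only if its posterior
    reaches the threshold; otherwise return None (non-commitment)."""
    k = int(np.argmax(posteriors))
    return k if posteriors[k] >= threshold else None

def sequential_recognition(granularity_levels, threshold):
    """Process granules from coarse to fine; stop at the first level
    where the decision is reliable. Each level is a callable returning
    class posteriors (a simplification of the paper's distance-based
    probabilistic rough sets)."""
    for level in granularity_levels:
        decision = chow_decision(level(), threshold)
        if decision is not None:
            return decision
    # All levels were unreliable: fall back to the finest level.
    return int(np.argmax(granularity_levels[-1]()))

# Coarse level is ambiguous, fine level is confident.
levels = [lambda: np.array([0.40, 0.35, 0.25]),
          lambda: np.array([0.80, 0.10, 0.10])]
result = sequential_recognition(levels, threshold=0.6)
```

Here the coarse granule fails Chow's test, so the finer granule is processed and class 0 is accepted; the speed-up in the paper comes from the cases where the coarse level already suffices.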
A new approach to large-scale optimisation is evolving, based on co-evolutionary searches with interacting heterogeneous agent-processes that implement synchronised genetic algorithms with local populations. Individualising the heuristic operators at the level of agent-processes that run independent evolutionary searches improves the likelihood of obtaining the best solutions in the shortest time. Based on this property, a parallel multi-agent real-coded genetic algorithm for large-scale constrained black-box single-objective optimisation problems (LSOPs) is proposed. It enables an effective, frequent exchange of the best candidate solutions between interacting agent-processes with individual parameters, such as types of crossover and mutation operators with their own characteristics. We have improved both the solution quality and the time-efficiency of the multi-agent real-coded genetic algorithm (MA-RCGA). A novel framework was developed that aggregates MA-RCGA with simulation models by implementing a set of objective functions for real-world large-scale optimisation problems, such as the simulation model of an ecological-economic system implemented in the AnyLogic tool.
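The multi-agent scheme above resembles an island-model GA: each agent evolves its own population with individual operator settings and periodically shares its best solution with the others. The sketch below is a minimal serial approximation under that reading, with a toy sphere objective, simple blend crossover, and per-agent mutation strengths as the "individualised operators"; none of these choices come from the authors' MA-RCGA implementation.

```python
import random

def sphere(x):
    """Toy black-box objective (minimum 0 at the origin)."""
    return sum(v * v for v in x)

def mutate(x, sigma):
    return [v + random.gauss(0.0, sigma) for v in x]

def blend_crossover(a, b):
    return [(ai + bi) / 2.0 for ai, bi in zip(a, b)]

def run_islands(dim=5, agents=3, pop=20, generations=50, seed=1):
    random.seed(seed)
    # Operator individualisation: each agent-process has its own
    # mutation strength (assumed stand-in for distinct operator types).
    sigmas = [0.5, 0.1, 0.02][:agents]
    pops = [[[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
            for _ in range(agents)]
    for g in range(generations):
        for i in range(agents):
            # Truncation selection: keep the better half, refill with
            # mutated crossover children.
            pops[i].sort(key=sphere)
            parents = pops[i][: pop // 2]
            children = []
            while len(parents) + len(children) < pop:
                a, b = random.sample(parents, 2)
                children.append(mutate(blend_crossover(a, b), sigmas[i]))
            pops[i] = parents + children
        # Migration: periodically broadcast the overall best solution.
        if g % 10 == 0:
            best = min((min(p, key=sphere) for p in pops), key=sphere)
            for p in pops:
                p[-1] = list(best)
    return min((min(p, key=sphere) for p in pops), key=sphere)

best = run_islands()
```

The migration step is what lets a conservatively mutating agent refine the aggressive agents' discoveries, which is the intuition behind the improved solution quality claimed for MA-RCGA.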