We propose a new approach to Collaborative Filtering based on Boolean Matrix Factorisation (BMF) and Formal Concept Analysis. In a series of experiments on real data (the MovieLens dataset) we compare the approach with SVD- and NMF-based algorithms in terms of Mean Absolute Error (MAE). One of the experimental consequences is that binary-scaled rating data are sufficient for BMF to achieve almost the same MAE as the SVD-based algorithm achieves on non-scaled data.
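The comparison described above can be illustrated with a minimal sketch: fit a truncated SVD to a rating matrix in both its raw and binary-scaled form, and measure MAE over the observed entries. The tiny rating matrix, the rank, and the binarization threshold below are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

# Toy user-by-item rating matrix; 0 marks "no rating" in this sketch.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [1.0, 0.0, 0.0, 4.0],
              [0.0, 1.0, 5.0, 4.0]])

def svd_predict(M, k):
    """Rank-k reconstruction of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def mae(M, M_hat, mask):
    """Mean absolute error over the observed entries."""
    return np.abs((M - M_hat)[mask]).mean()

mask = R > 0
R_hat = svd_predict(R, k=2)

# Binary scaling (hypothetical threshold): rating >= 3 -> 1, else 0.
B = (R >= 3).astype(float)
B_hat = svd_predict(B, k=2)

print("MAE, raw ratings:   ", mae(R, R_hat, mask))
print("MAE, binary ratings:", mae(B, B_hat, mask))
```

The same MAE routine would apply unchanged if `svd_predict` were replaced with a BMF- or NMF-based reconstruction.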
According to recent studies, accounting for possible dependence between the random error components in a stochastic production function model improves the quality of the parameter estimates. It has been shown that copula functions can be used for this purpose: they describe the dependence between the error components by means of some fixed copula, whose choice should be driven by the goals of the study. In this work, particular attention is paid to the case where information on efficiency factors is available, and the effect of including efficiency factors in the model on the dependence between the random error components is shown.
Assume that $\NP\not\subset\RP$. Gutfreund, Shaltiel, and Ta-Shma [Computational Complexity 16(4):412-441 (2007)] proved that for every randomized polynomial-time decision algorithm $D$ for SAT there is a polynomial-time samplable distribution such that $D$ errs with probability at least $1/6-\eps$ on a random formula chosen with respect to that distribution. A challenging problem is to increase the error probability to the maximal possible $1/2-\eps$ (random guessing succeeds with probability $1/2$). In this paper, we make a small step towards this goal: we show how to increase the error probability to $1/3-\eps$.
An approach to using the DSM platform MetaLanguage for the integration of various modeling systems is presented. This tool makes it possible to design visual domain-specific modeling languages and to create domain models with the developed languages. The MetaLanguage system includes components for describing transformations of models from one formal notation to another. Domain-specific modeling permits various specialists to use concepts from different domains when creating and analyzing models. Integrating DSM platforms with model analysis tools makes it possible to involve domain experts and end users in the process of constructing and analyzing models, to reduce the complexity of model development, and to study models from various points of view using various methods and tools.
To compress large datasets of high-dimensional descriptors, modern quantization schemes learn multiple codebooks and then represent individual descriptors as combinations of codewords. Once the codebooks are learned, these schemes encode descriptors independently. In contrast, we present a new coding scheme that arranges dataset descriptors into a set of arborescence graphs and then encodes non-root descriptors by quantizing their displacements with respect to their parent nodes. By optimizing the structure of the arborescences, our coding scheme can decrease the quantization error considerably, while incurring only minimal overhead in memory footprint and nearest neighbor search speed in the compressed dataset compared to independent quantization. The advantage of the proposed scheme is demonstrated in a series of experiments with datasets of SIFT and deep descriptors.
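The core encoding step described above can be sketched as follows: given a fixed parent assignment (the arborescence) and a codebook of displacement vectors, each non-root descriptor is stored as the index of the codeword nearest to its displacement from its parent, and decoding reconstructs descriptors top-down. The random data, the fixed tree, and the single random codebook are illustrative assumptions; in the actual method the tree structure is optimized and the codebooks are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy descriptors and a fixed arborescence: parent[i] is the parent of
# node i, with parent[root] == -1; parents precede their children.
X = rng.normal(size=(6, 4))
parent = np.array([-1, 0, 0, 1, 1, 2])

# A small codebook of displacement vectors (would normally be learned).
codebook = rng.normal(size=(16, 4))

def encode(X, parent, codebook):
    """Store each non-root descriptor as the index of the codeword
    nearest to its displacement from its parent; the root gets -1."""
    codes = np.full(len(X), -1)
    for i in range(len(X)):
        if parent[i] >= 0:
            d = X[i] - X[parent[i]]
            codes[i] = np.argmin(((codebook - d) ** 2).sum(axis=1))
    return codes

def decode(codes, parent, codebook, root_vec):
    """Reconstruct descriptors top-down: parent approximation plus the
    quantized displacement."""
    X_hat = np.zeros((len(codes), codebook.shape[1]))
    for i in range(len(codes)):
        if parent[i] < 0:
            X_hat[i] = root_vec
        else:
            X_hat[i] = X_hat[parent[i]] + codebook[codes[i]]
    return X_hat

codes = encode(X, parent, codebook)
X_hat = decode(codes, parent, codebook, X[0])
```

Note that errors accumulate along a root-to-leaf path, which is one reason optimizing the arborescence structure matters for the quantization error.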
In this paper an extension of tf-idf weighting to the annotated suffix tree (AST) structure is described. The new weighting scheme can be used for computing similarity between texts, which can further serve as an input to a clustering algorithm. We present preliminary tests of using ASTs for computing similarity of Russian texts and show a slight improvement over the baseline cosine similarity after applying a spectral clustering algorithm.
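The baseline that the AST-based scheme is compared against can be sketched in a few lines: build word-level tf-idf vectors and compare documents by cosine similarity. The three toy documents below are illustrative assumptions; the paper's scheme instead computes weights over substrings in an annotated suffix tree.

```python
import math
from collections import Counter

# Toy corpus (the paper works with Russian texts).
docs = ["annotated suffix tree text similarity",
        "suffix tree weighting for text clustering",
        "spectral clustering of documents"]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
N = len(docs)
# Document frequency of each word.
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

def tfidf(doc):
    """tf-idf vector of a tokenized document over the shared vocabulary."""
    tf = Counter(doc)
    return [tf[w] / len(doc) * math.log(N / df[w]) if tf[w] else 0.0
            for w in vocab]

def cosine(u, v):
    """Cosine similarity of two vectors; 0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = [tfidf(doc) for doc in tokenized]
sim = cosine(vecs[0], vecs[1])
```

The resulting pairwise similarity matrix is what a spectral clustering algorithm would then take as input.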