Supplementary Proceedings ICFCA 2019 Conference and Workshops
This paper presents further development of distributed multimodal clustering. We introduce a new version of the multimodal clustering algorithm for distributed processing with Apache Hadoop on computer clusters. Its implementation allows a user to cluster data with modality greater than two. We provide the time and space complexity of the algorithm and justify its relevance. The algorithm is adapted to the MapReduce distributed processing model. The program, implemented with the Apache Hadoop framework, is able to perform parallel computing on thousands of nodes.
Triclustring Toolbox is a collection of triclustering methods consolidated under a single interface. It provides access to both box- and prime-based OAC (Object-Attribute-Condition) triclustering and Spectral triclustering, and features implementations of DataPeeler and Trias. The application also contains algorithms for mining triclusters of similar values: NOAC and Tri-K-Means. The quality of triclusters is measured in terms of density, diversity, coverage, and, where applicable, variance. The input and output data formats are the same for all methods, which makes comparison and interpretation of the results easier. The code is written in C# (.NET 4.5) and runs on Windows. Triclustring Toolbox was used to produce experimental results in several articles on triclustering.
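As an illustration of the density measure mentioned above, the following is a minimal Python sketch, not code from the toolbox (which is written in C#). It assumes the commonly used definition of tricluster density: the share of cells of the cuboid X × Y × Z that are present in the ternary relation. The function name `tricluster_density` and the toy data are hypothetical.

```python
from itertools import product


def tricluster_density(triples, objects, attributes, conditions):
    """Share of (object, attribute, condition) cells of the tricluster
    that are present in the ternary relation `triples`.

    Assumed definition: |I ∩ (X × Y × Z)| / (|X| * |Y| * |Z|).
    """
    volume = len(objects) * len(attributes) * len(conditions)
    if volume == 0:
        raise ValueError("all three components must be non-empty")
    filled = sum(
        cell in triples for cell in product(objects, attributes, conditions)
    )
    return filled / volume


# Toy ternary relation: a set of (object, attribute, condition) triples.
relation = {("o1", "a1", "c1"), ("o1", "a2", "c1"), ("o2", "a1", "c1")}
density = tricluster_density(relation, ["o1", "o2"], ["a1", "a2"], ["c1"])
```

Here three of the four cells of the candidate tricluster are filled, so the density is 0.75.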
This short paper addresses the problem of finding maximum quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in a bigraph is an “almost” complete subgraph; here, we assume that a subgraph is a quasi-biclique if it lacks γ · 100% of the edges needed to become a biclique. The problem of finding the maximum quasi-biclique(s) consists of finding subset(s) of vertices of an input bigraph such that the subgraph induced by these subsets is a quasi-biclique and its size is maximal. A model based on mixed integer programming (MIP) to search for a quasi-biclique is proposed and tested. Another variant of the model is also tested; it simultaneously maximizes both the size of the quasi-biclique and its density, using a least-squares criterion similar to the one exploited by the TriBox method for tricluster generation. Therefore, the output patterns can also be called large dense biclusters.
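The quasi-biclique condition above can be made concrete with a small Python sketch. It is not the MIP model from the paper; it only checks whether a given pair of vertex subsets induces a quasi-biclique. It assumes that the condition is read as "at most γ · 100% of the possible edges are missing", and the names `missing_fraction` and `is_quasi_biclique` are hypothetical.

```python
from itertools import product


def missing_fraction(edges, left, right):
    """Fraction of the |left| * |right| possible edges that are absent
    from the subgraph induced by the two vertex subsets."""
    left, right = set(left), set(right)
    possible = len(left) * len(right)
    if possible == 0:
        raise ValueError("both vertex subsets must be non-empty")
    present = sum((u, v) in edges for u, v in product(left, right))
    return 1 - present / possible


def is_quasi_biclique(edges, left, right, gamma):
    """Assumed reading: the induced subgraph may lack at most a gamma
    share of the edges of the complete biclique on (left, right)."""
    return missing_fraction(edges, left, right) <= gamma


# Toy bigraph: edges are (left vertex, right vertex) pairs.
edges = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 1)}
share = missing_fraction(edges, {"a", "b", "c"}, {1, 2})
```

In this toy bigraph the subsets {a, b, c} and {1, 2} miss one of six possible edges, so they form a quasi-biclique for γ = 0.2 but not for γ = 0.1.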