Book chapter
How many clusters? An Entropic Approach to Hierarchical Cluster Analysis
Clustering large and heterogeneous user-profile data from social media is problematic, because finding the optimal number of clusters becomes more critical than it is for smaller, homogeneous data. We propose a new approach based on the deformed Rényi entropy for determining the optimal number of clusters in hierarchical clustering of user-profile data. Our results show that this approach allows us to estimate the Rényi entropy at each level of a hierarchical model and to find the entropy minimum (information maximum). Our approach also shows that the solutions with the lowest and the highest numbers of clusters correspond to entropy maxima (information minima).
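For reference, the standard (undeformed) Rényi entropy of order $q$ of a discrete distribution $p = (p_1, \dots, p_n)$ is shown below. The abstract does not specify the deformed variant or how it is evaluated at each level of the hierarchy, so this is only the textbook baseline that the method builds on, not the chapter's own formula:

$$
H_q(p) = \frac{1}{1-q}\,\ln \sum_{i=1}^{n} p_i^{\,q}, \qquad q > 0,\; q \neq 1,
$$

which tends to the Shannon entropy $-\sum_{i} p_i \ln p_i$ as $q \to 1$. In the abstract's terms, a minimum of entropy corresponds to a maximum of information.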
In online social networks, high-level features of user behavior, such as character traits, can be predicted from user-profile data and users' connections. Recent publications use data from online social networks to detect people with a propensity for depression or a depression diagnosis. In this study, we investigate the capabilities of previously published methods and metrics when applied to the Russian online social network VKontakte. We gathered user-profile data from the most popular communities about suicide and depression on VK.com and compared these users with randomly sampled users. We used not only standard user attributes such as age, gender, and number of friends but also structural properties of the users' egocentric networks, and obtained results similar to those of the study of suicide propensity in the Japanese social network Mixi.com. Our goal is to test the approach and models in this new setting and to propose enhancements to the research design and analysis. We examine the resulting classifiers to identify profile features that can indicate users' propensity for depression, in order to provide tools for early depression detection. Finally, we discuss further work that might improve our analysis and help transfer the results to practical applications.
In this paper we propose two novel methods for analyzing data collected from online social networks, applying them in particular to data from Vkontakte, a Russian online social network. Using biclustering, we extract groups of users with similar interests and find communities of users who belong to similar groups. With triclustering, we reveal users' interests as tags and use them to describe Vkontakte groups. After this social tagging process, we can recommend to a particular user relevant groups to join, or new friends with similar tastes from groups of interest. We present some preliminary results and explain how we plan to apply these methods to massive data repositories.
We combine bi- and triclustering to analyse data collected from the Russian online social network Vkontakte. Using biclustering, we extract groups of users with similar interests and find communities of users who belong to similar groups. With triclustering, we reveal users' interests as tags and use them to describe Vkontakte groups. After this social tagging process, we can recommend to a particular user relevant groups to join, or new friends with similar tastes from groups of interest. We present some preliminary results and explain how we plan to apply these methods to massive data repositories.
The issue of determining “the right number of clusters” in K-Means has attracted considerable interest, especially in recent years. Cluster intermix appears to be the factor that most affects the clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters with controlled between- and within-cluster spread, which models cluster intermix. The setting allows centroid recovery to be evaluated alongside the conventional evaluation of cluster recovery. The subjects of our interest are two versions of the “intelligent” K-Means method, iK-Means, which find the “right” number of clusters by extracting “anomalous patterns” from the data one by one. We compare them with seven other methods, including Hartigan’s rule, the averaged Silhouette width and the Gap statistic, under different between- and within-cluster spread-shape conditions. Our experiments show several consistent patterns; for example, Hartigan’s rule reproduces the right K best, but not the clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experimental setting.
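Hartigan's rule, one of the baselines named above, can be sketched as follows. It uses the common textbook form $H(K) = (W_K / W_{K+1} - 1)(N - K - 1)$ with Hartigan's rule-of-thumb threshold of 10, where $W_K$ is the within-cluster sum of squares of a K-Means solution with $K$ clusters. The function name `hartigan_k`, the input layout, and the fallback when no $K$ qualifies are assumptions for illustration, not the paper's code:

```python
from typing import Sequence


def hartigan_k(wss: Sequence[float], n_points: int, threshold: float = 10.0) -> int:
    """Choose K by Hartigan's rule (illustrative sketch).

    wss[i] is assumed to hold W_K, the within-cluster sum of squares of a
    K-Means partition with K = i + 1 clusters, so wss[0] is W_1.
    Returns the smallest K with H(K) = (W_K / W_{K+1} - 1) * (N - K - 1) <= threshold.
    """
    if any(w <= 0 for w in wss):
        raise ValueError("all W_K values must be positive")
    for i in range(len(wss) - 1):
        k = i + 1
        h = (wss[i] / wss[i + 1] - 1.0) * (n_points - k - 1)
        if h <= threshold:
            return k
    # Hypothetical fallback: no K in the examined range met the rule,
    # so report the largest K that was tried.
    return len(wss)


# Made-up W_K values for K = 1..5 on N = 100 points.
print(hartigan_k([1000.0, 300.0, 100.0, 95.0, 92.0], n_points=100))
```

With these made-up values, $H(1) \approx 228.7$ and $H(2) = 194$, while $H(3) \approx 5.1$ drops below 10, so the sketch prints 3. In practice the $W_K$ values would come from running K-Means once per candidate $K$ (for example, scikit-learn's `KMeans` exposes this quantity as `inertia_`).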
Changes in modern Russian due to the expansion of new technologies; the Russian of the Internet (Runet). Social and cultural consequences of the CMC revolution.
Objective: The objective of the study was threefold. First, we examined whether extraversion contributes to professional recruiters' evaluations of an online social network user's physical attractiveness, and whether this relationship is mediated by the user's degree of activity and popularity among other users. Second, we presumed that this relationship could be specified in terms of the five-factor theory of personality. A type of characteristic adaptation named reflected extraversion was assumed to contribute incrementally to this relationship. Reflected extraversion is a meta-perception: one's opinion of how extraverted one is perceived to be by significant others. Third, user popularity, treated as an external influence in terms of the five-factor theory, was presumed to reciprocally affect reflected extraversion. Method: Profiles of 188 online social network users were assessed by four professional recruiters, who were asked to evaluate the users' physical attractiveness. The users completed a number of self-report measures, and various behavioural indicators extracted from the profiles were measured. Results: Extraversion enhanced recruiter-rated physical attractiveness via two paths: user activity and user popularity. Including reflected extraversion did not substantially improve the model. However, reflected extraversion mediated the link between trait extraversion and the indicators of user popularity, but not the indicators of user activity. The reciprocal path from user popularity to reflected extraversion was negligible. Conclusions: The study shows that extraversion may allow people to manage online networking efficiently and to convince recruiters that they are physically attractive, even in the absence of any offline communication.