Semi-supervised Tag Extraction in a Web Recommender System
An important characteristic feature of recommender systems for web pages is the abundance of textual information in and about the items being recommended (web pages). To improve recommendations and enhance user experience, we propose to use automatic tag (keyword) extraction for web pages entering the recommender system. We present a novel tag extraction algorithm that employs semi-supervised classification based on a dataset consisting of pre-tagged documents and (for the most part) partially tagged documents whose tags are automatically mined from the content. We also compare several classification algorithms for tag extraction in this context.
The problem of detecting terms that can be interesting to the advertiser is considered. If a company has already bought some advertising terms which describe certain services, it is reasonable to find out the terms bought by competing companies. A part of them can be recommended as future advertising terms to the company. The goal of this work is to propose better interpretable recommendations based on FCA and association rules.
This article represents a new technique for collaborative filtering based on pre-clustering of website usage data. The key idea involves using clustering methods to define groups of different users.
The topic of recommender systems is rapidly gaining interest in the user-behaviour modeling research domain. Over the years, various recommender algorithms based on different mathematical models have been introduced in the literature. Researchers interested in proposing a new recommender model or modifying an existing algorithm should take into account a variety of key performance indicators, such as execution time, recall and precision. Till date and to the best of our knowledge, no general cross-validation scheme to evaluate the performance of recommender algorithms has been developed. To fill this gap we propose an extension of conventional cross-validation. Besides splitting the initial data into training and test subsets, we also split the attribute description of the dataset into a hidden and visible part. We then discuss how such a splitting scheme can be applied in practice. Empirical validation is performed on traditional user-based and item-based recommender algorithms which were applied to the MovieLens dataset.
We present a new recommender system developed for the Russian interactive radio network FMhost. To the best of our knowledge, it is the first model and associated case study for recommending radio stations hosted by real DJs rather than automatically built streamed playlists. To address such problems as cold start, gray sheep, boosting of rankings, preference and repertoire dynamics, and absence of explicit feedback, the underlying model combines a collaborative user-based approach with personalized information from tags of listened tracks in order to match user and radio station profiles. This is made possible with adaptive tag-aware profiling that follows an online learning strategy based on user history. We compare the proposed algorithms with singular value decomposition (SVD) in terms of precision, recall, and normalized discounted cumulative gain (NDCG) measures; experiments show that in our case the fusion-based approach demonstrates the best results. In addition, we give a theoretical analysis of some useful properties of fusion-based linear combination methods in terms of graded ordered sets.
This book presents a course of English for Specific Purposes devoted specifically to the widely-discussed topic Web 2.0. It covers several aspects of online communication ranging from online friendship to business interacions. The activities presented in the coursebook are aimed at developing students’ communicative competence in both written and oral discourse. Web 2.0 includes a variety of authentic articles that arouse interest and provoke discussions. It also presents listening texts based on professional podcasts. Most grammar and vocabulary activities are developed from authentic texts as well.
Web 2.0 can be used at the B2-C1 levels of Common European Famework. The coursebook will help learn and practice the target vocabulary. It will be relevant to those interested in the development of Information and Communication Technologies in general and the Internet in particular.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD) held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues with analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by amongst others extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field including Formal Concept Analysis and Conceptual Graphs. Applications include but are not limited to text mining police reports, sociological definitions, movie reviews, etc.
The 4th International Conference on Educational Data Mining (EDM 2011) brings together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large datasets to answer educational research questions. The conference, held in Eindhoven, The Netherlands, July 6-9, 2011, follows the three previous editions (Pittsburgh 2010, Cordoba 2009 and Montreal 2008), and a series of workshops within the AAAI, AIED, EC-TEL, ICALT, ITS, and UM conferences. The increase of e-learning resources such as interactive learning environments, learning management systems, intelligent tutoring systems, and hypermedia systems, as well as the establishment of state databases of student test scores, has created large repositories of data that can be explored to understand how students learn. The EDM conference focuses on data mining techniques for using these data to address important educational questions.
In this paper we propose two new algorithms based on biclustering analysis, which can be used at the basis of a recommender system for educational orientation of Russian School graduates. The first algorithm was designed to help students make a choice between different university faculties when some of their preferences are known. The second algorithm was developed for the special situation when nothing is known about their preferences. The final version of this recommender system will be used by Higher School of Economics.