A New Cross-Validation Technique to Evaluate Quality of Recommender Systems
The topic of recommender systems is rapidly gaining interest in the user-behaviour modeling research domain. Over the years, various recommender algorithms based on different mathematical models have been introduced in the literature. Researchers interested in proposing a new recommender model or modifying an existing algorithm should take into account a variety of key performance indicators, such as execution time, recall and precision. To date, and to the best of our knowledge, no general cross-validation scheme for evaluating the performance of recommender algorithms has been developed. To fill this gap, we propose an extension of conventional cross-validation. Besides splitting the initial data into training and test subsets, we also split the attribute description of the dataset into a hidden and a visible part. We then discuss how such a splitting scheme can be applied in practice. Empirical validation is performed on traditional user-based and item-based recommender algorithms applied to the MovieLens dataset.
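The two-way split described in this abstract can be sketched as follows. This is only an illustrative reading, not the authors' procedure: it assumes that the rows of the data are users and that the "attributes" are items (for example, movies), and the function name `split_users_and_items` and its parameters are hypothetical.

```python
import random

def split_users_and_items(n_users, n_items, test_frac=0.2, hidden_frac=0.2, seed=0):
    """Split users into train/test and items into visible/hidden (assumed layout)."""
    rng = random.Random(seed)
    users = list(range(n_users))
    items = list(range(n_items))
    rng.shuffle(users)
    rng.shuffle(items)
    # Keep at least one test user and one hidden item.
    n_test = max(1, round(n_users * test_frac))
    n_hidden = max(1, round(n_items * hidden_frac))
    return {
        "test_users": sorted(users[:n_test]),
        "train_users": sorted(users[n_test:]),
        "hidden_items": sorted(items[:n_hidden]),
        "visible_items": sorted(items[n_hidden:]),
    }

split = split_users_and_items(n_users=10, n_items=5)
print(split)
```

Under this reading, a recommender would be fitted on the training users, shown only the visible items of each test user, and scored on how well it recovers that user's hidden items. Repeating the split with different seeds would give the cross-validation folds.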
This volume contains the papers selected for presentation at the 2014 IEEE/WIC/ACM International Conference on Web Intelligence (WI'14), held as part of the 2014 Web Intelligence Congress (WIC'14) at the University of Warsaw, Warsaw, Poland, from 11 to 14 August 2014. The conference was sponsored and co-organized by the IEEE Computer Society, the Web Intelligence Consortium (WIC), the Association for Computing Machinery (ACM), the University of Warsaw, the Polish Mathematical Society and the Warsaw University of Technology.
The series of Web Intelligence conferences started in Japan in 2001. Since then, it has been held yearly in several countries, including Canada, China, France, the USA, Australia and Italy. It is recognized as the world's leading forum focusing on the role of Web Intelligence as one of the most important directions for scientific research and for the development of solutions that contribute to the creation of the Knowledge-based Society. In 2014, WI visited Poland as a special event commemorating the 25th anniversary of the Web.
WI'14 received 242 paper submissions, in the areas of foundations of Web Intelligence, semantic aspects of Web Intelligence, World Wide Wisdom Web, Web search and recommendation, Web mining and warehousing, Human-Web interaction, as well as Web Intelligence technologies and applications. After a rigorous evaluation process, 85 papers were selected as regular contributions, giving an acceptance rate of 35.1%.
The first five sections of this volume include 40 regular contributions. Additionally, the first paper in the first section corresponds to one of the WIC'14 keynotes. The last four sections of this volume contain 23 papers selected for oral presentation in WI'14 workshops. The remaining 45 regular contributions and 25 papers accepted to WI'14 special sessions are published in another volume of the WI'14 proceedings.
We consider a way to improve the performance of recommender systems by first identifying groups of users with similar behaviour. To partition users into groups, we use a distributed version of the k-means algorithm, with the canopy algorithm used to determine the initial centroids.
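The combination of canopy seeding and k-means mentioned in this abstract can be illustrated with a small single-machine sketch. It is not the distributed version the abstract refers to. The function names, the toy points, and the threshold value `t2` are all assumptions made for illustration, and only the tight canopy threshold is used, since the loose threshold affects canopy membership rather than the choice of centers.

```python
import math

def canopy_centers(points, t2):
    """Pick canopy centers: each new center removes all points within t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return centers

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) started from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    # clusters holds the assignment made in the final iteration.
    return centroids, clusters

# Hypothetical 2-D user behaviour vectors forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = canopy_centers(points, t2=3.0)
centroids, clusters = kmeans(points, centers)
print(len(centers), [len(c) for c in clusters])
```

Seeding k-means with canopy centers also fixes the number of clusters, so k does not have to be chosen separately; the price is that the result depends on the threshold and on the order in which points are visited.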
The problem of detecting terms that may be of interest to an advertiser is considered. If a company has already bought some advertising terms that describe certain services, it is reasonable to find out which terms competing companies have bought. Some of these can be recommended to the company as future advertising terms. The goal of this work is to propose more interpretable recommendations based on Formal Concept Analysis (FCA) and association rules.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs, etc. to gain insight into the underlying conceptual structure of the data. Traditional machine learning techniques focus mainly on structured data, whereas most available data resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD), held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on the analysis of unstructured data. We are proud to welcome 13 valuable contributions to this volume. The majority of the accepted papers describe innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by, among other things, extracting keywords and opinion words from texts with Natural Language Processing methods. Several authors who participated in the workshop used methods from the conceptual structures field, including Formal Concept Analysis and Conceptual Graphs. Applications include, but are not limited to, text mining of police reports, sociological definitions, and movie reviews.
The Fourth International Conference on Educational Data Mining (EDM 2011) brought together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large data sets in order to answer educational research questions. The conference, held in Eindhoven, the Netherlands, on July 6-9, 2011, was the fourth in the series; the previous three took place in Pittsburgh (2010), Cordoba (2009), and Montreal (2008). The growing number of online educational resources, such as interactive educational systems, learning management systems, and intelligent tutoring systems, together with databases of student performance, has produced huge data sets that can be used to answer the question of how students learn. The EDM conference focuses on applying data mining methods to such data in order to solve a variety of problems in education.
An important characteristic feature of recommender systems for web pages is the abundance of textual information in and about the items being recommended (web pages). To improve recommendations and enhance user experience, we propose to use automatic tag (keyword) extraction for web pages entering the recommender system. We present a novel tag extraction algorithm that employs semi-supervised classification based on a dataset consisting of pre-tagged documents and (for the most part) partially tagged documents whose tags are automatically mined from the content. We also compare several classification algorithms for tag extraction in this context.
In this paper we propose two new algorithms based on biclustering analysis, which can be used as the basis of a recommender system for the educational orientation of Russian school graduates. The first algorithm was designed to help students choose between different university faculties when some of their preferences are known. The second algorithm was developed for the special situation in which nothing is known about their preferences. The final version of this recommender system will be used by the Higher School of Economics.