Chapter

Towards Stable Significant Subgroup Discovery

P. 287-292.
Jyoti -., Buzmakov Aleksey, Kailasam S.

Discovering subgroups that have a significant association with binary class labels has wide applications in drug discovery, market basket analysis, and other areas. The state-of-the-art technique, TopKWY, which mines the top-k significant subgroups, does not scale to large datasets, especially when the search space of concepts is very large. In this paper, we propose SD-SOFIA, an algorithm that mines stable significant subgroups rather than just significant subgroups. SD-SOFIA is able to mine the same significant subgroup as TopKWY, or a subgroup of comparable quality, while navigating only a reduced search space. We have verified this result on 19 real-world datasets. This insight gives us an opportunity to design an efficient and scalable algorithm for finding statistically significant subgroups in large datasets. The quality of the mined pattern and the time taken by our algorithm are governed by the initial delta threshold value. Our experiments show that when the initial delta threshold is set between 0.5 and 3 percent of the number of objects in the dataset, our algorithm generates a pattern of comparable quality to that of TopKWY.
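
As a rough illustration of the threshold range mentioned in the abstract, the short sketch below converts the 0.5 to 3 percent guideline into absolute values for a dataset of a given size. The function name `initial_delta_range` and its parameters are hypothetical and are not taken from the SD-SOFIA implementation; the sketch only performs the percentage arithmetic and does not reproduce the algorithm itself.

```python
from typing import Tuple


def initial_delta_range(
    n_objects: int, low_pct: float = 0.5, high_pct: float = 3.0
) -> Tuple[float, float]:
    """Return (low, high) bounds for the initial delta threshold.

    Assumption: the bounds are simple percentages of the number of
    objects in the dataset, as the abstract's guideline suggests.
    """
    if n_objects <= 0:
        raise ValueError("n_objects must be positive")
    low = n_objects * low_pct / 100.0
    high = n_objects * high_pct / 100.0
    return low, high


# Example: a dataset with 10,000 objects.
low, high = initial_delta_range(10_000)
print(low, high)  # 50.0 300.0
```

Under this reading, a dataset of 10,000 objects would call for an initial delta threshold somewhere between 50 and 300.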