Book chapter

Comparison of String Similarity Measures for Obscenity Filtering

P. 97-101.

In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words that are similar or identical to words from a stop list, and we establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some other well-known measures.
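The general stop-list matching approach described above can be illustrated with a minimal sketch. It is not the annotated-suffix-tree measure proposed in the chapter. Instead it swaps in normalized Levenshtein similarity, a common string similarity measure, as an assumed stand-in. The names `levenshtein`, `similarity`, `flag_words`, the `threshold` of 0.8, the whitespace tokenization and the placeholder stop list are all hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def flag_words(text: str, stop_list: list[str], threshold: float = 0.8):
    """Return (word, closest stop word, score) for each word whose best
    stop-list match reaches the (hypothetical) threshold.

    Assumption: naive lowercase + whitespace tokenization; real Russian
    text would need punctuation stripping and proper tokenization.
    """
    flagged = []
    for word in text.lower().split():
        best = max(stop_list, key=lambda s: similarity(word, s))
        score = similarity(word, best)
        if score >= threshold:
            flagged.append((word, best, score))
    return flagged
```

For example, with the placeholder stop list `["badword", "foobar"]`, calling `flag_words("this b4dword is fine", ...)` flags only `b4dword`, which matches `badword` with a score of 6/7 (one substitution over seven characters). An obfuscated spelling is still caught, which is the motivation for using similarity rather than exact lookup. The threshold trades recall against false positives and would need tuning on a test collection.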