Detecting distorted information: an approach using discourse relations
We propose a linguistic method for determining whether a given text is a rumor or disinformation. The method combines web mining with a linguistic technology that compares two text fragments. We hypothesize a family of content-generation algorithms capable of producing deceptive text from a portion of genuine, original text. We then propose a disinformation-detection algorithm that finds a candidate source text on the web and compares it with the given text using parse thicket technology. A parse thicket is a graph that combines the parse trees of a sequence of sentences, augmented with inter-sentence relations for anaphora and rhetorical structure. We evaluate the algorithm in the domain of customer reviews, treating a product review as an instance of possible deception. The evaluation confirms that the approach is a plausible way to detect rumors and deception in web documents.
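To make the parse thicket idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The `ParseThicket` class, the `similarity`, `changed_edges` and `review` helpers, and the hand-written parses are all hypothetical. The comparison step is a deliberately crude stand-in for parse thicket matching: the Jaccard overlap of labeled (word, relation, word) edges. The sketch shows the data structure (parse-tree edges plus inter-sentence coreference and rhetorical edges) and how comparing a given text with a candidate source can surface edges that were altered.

```python
from dataclasses import dataclass, field

# Hypothetical minimal parse thicket: the parse-tree edges of several sentences
# plus inter-sentence edges (anaphora, rhetorical relations). Parses are written
# by hand here; a real system would need a parser, a coreference resolver and a
# discourse parser.


@dataclass
class ParseThicket:
    # words[s][i] is the (lemmatized) word at position i of sentence s
    words: list[list[str]]
    # (sentence, head_idx, relation, dependent_idx) edges inside one parse tree
    tree_edges: list[tuple[int, int, str, int]] = field(default_factory=list)
    # ((sent_a, idx_a), relation, (sent_b, idx_b)) edges across sentences
    inter_edges: list[tuple[tuple[int, int], str, tuple[int, int]]] = field(
        default_factory=list
    )

    def labeled_edges(self) -> set[tuple[str, str, str]]:
        """Return all edges as (word, relation, word) triples, dropping positions."""
        triples = set()
        for s, h, rel, d in self.tree_edges:
            triples.add((self.words[s][h], rel, self.words[s][d]))
        for (sa, ia), rel, (sb, ib) in self.inter_edges:
            triples.add((self.words[sa][ia], rel, self.words[sb][ib]))
        return triples


def similarity(a: ParseThicket, b: ParseThicket) -> float:
    """Jaccard overlap of labeled edges (a crude stand-in for thicket matching)."""
    ea, eb = a.labeled_edges(), b.labeled_edges()
    if not ea and not eb:
        return 1.0
    return len(ea & eb) / len(ea | eb)


def changed_edges(given: ParseThicket, source: ParseThicket) -> set[tuple[str, str, str]]:
    """Edges present in the given text but absent from the candidate source."""
    return given.labeled_edges() - source.labeled_edges()


def review(adjective: str) -> ParseThicket:
    """Hypothetical two-sentence review: 'The camera takes <adj> photos.
    It has a long battery life.'"""
    return ParseThicket(
        words=[["camera", "take", adjective, "photo"],
               ["it", "have", "long", "battery", "life"]],
        tree_edges=[(0, 1, "nsubj", 0), (0, 1, "obj", 3), (0, 3, "amod", 2),
                    (1, 1, "nsubj", 0), (1, 1, "obj", 4),
                    (1, 4, "amod", 2), (1, 4, "compound", 3)],
        inter_edges=[((1, 0), "coref", (0, 0)),          # anaphora: "it" -> "camera"
                     ((0, 1), "elaboration", (1, 1))],   # rhetorical link between clauses
    )


if __name__ == "__main__":
    source, given = review("sharp"), review("blurry")
    print(similarity(given, source))     # 0.8
    print(changed_edges(given, source))  # {('photo', 'amod', 'blurry')}
```

In this toy case the given review shares 8 of 10 distinct labeled edges with its source, and the single differing edge pinpoints the altered attribute. Any threshold for flagging a text as distorted would be a further assumption not shown here.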