Text integrity assessment: Sentiment profile vs rhetoric structure
We formulate the problem of text integrity assessment as learning the discourse structure of text, given a dataset of texts with high integrity and low integrity. We use two approaches to formalizing discourse structure, sentiment profiles and rhetoric structures, relying on a sentence-level sentiment classifier and rhetoric structure parsers, respectively. To learn discourse structures, we use a graph-based nearest neighbor approach, which allows for explicit feature engineering, and also SVM tree kernel-based learning. Both learning approaches operate on graphs (parse thickets), which are sets of parse trees with either additional node labels for sentiments or additional arcs for rhetoric relations between different sentences. Evaluation in the domain of valid vs invalid customer complaints (the invalid ones being those with argumentation flaws, non-cohesive, or indicating a bad mood of the complainant) shows a stronger contribution of rhetoric structure information in comparison with sentiment profile information. Both learning approaches demonstrated that discourse structure as obtained by an RST parser is sufficient to conduct text integrity assessment. At the same time, the sentiment profile-based approach shows much weaker results and does not strongly complement the rhetoric structure-based one.
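To make the notion of a sentiment profile concrete, the following is a minimal illustrative sketch, not the graph-based method described above. It assumes that a sentence-level classifier has already assigned each sentence a polarity score in [-1, 1], resamples each text's sequence of scores to a fixed length, and labels a new text by its nearest labeled neighbor under Euclidean distance. The function names, the fixed profile length, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def resample(profile: Sequence[float], length: int = 5) -> List[float]:
    """Linearly interpolate a per-sentence sentiment profile to a fixed length."""
    if not profile:
        raise ValueError("profile must contain at least one sentence score")
    if length < 2:
        raise ValueError("length must be at least 2")
    if len(profile) == 1:
        return [float(profile[0])] * length
    out = []
    for i in range(length):
        # Position of the i-th sample on the original sentence axis.
        pos = i * (len(profile) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(profile) - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def classify(
    profile: Sequence[float],
    labeled: Sequence[Tuple[Sequence[float], str]],
    length: int = 5,
) -> str:
    """Return the label of the nearest labeled profile (1-nearest neighbor)."""
    query = resample(profile, length)
    _, label = min(
        ((distance(query, resample(p, length)), lab) for p, lab in labeled),
        key=lambda pair: pair[0],
    )
    return label


# Hypothetical per-sentence polarity scores; not data from the evaluation.
training = [
    ([0.2, -0.1, -0.3, 0.1], "valid"),
    ([-0.9, -0.8, -0.9, -1.0, -0.7], "invalid"),
]

if __name__ == "__main__":
    print(classify([-0.8, -0.9, -0.6], training))
```

A flat vector of scores discards the syntactic and rhetoric information that parse thickets retain, so this sketch only illustrates the sentiment-profile side of the comparison.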