Finding Maximal Common Sub-parse Thickets for Multi-sentence Search
We develop a graph representation and learning technique for the parse structures of paragraphs of text. We introduce the Parse Thicket (PT), a set of syntactic parse trees augmented with arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. Further arcs are derived from other sources, including Speech Act and Rhetoric Structure theories. We give a detailed illustration of how PTs are built from parse trees and how they are generalized, at the level of phrases, by computing maximal common subgraphs. We evaluate the proposed approach in the product search and recommendation domain, where search queries include multiple sentences. We compare the improvement in search relevance achieved by pair-wise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
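
As a rough illustration of the idea, not the implementation evaluated in this work, a PT can be sketched as a labeled graph whose nodes are (part-of-speech, lemma) pairs and whose edges are either syntactic arcs or inter-sentence arcs such as co-reference. The function name `common_subthicket`, the `"*"` placeholder for a generalized lemma, the relation names `"syntax"` and `"coref"`, and the toy data are all assumptions made for this example. The search is brute force over node mappings, so it is feasible only for very small graphs.

```python
from itertools import permutations

def generalize_label(a, b):
    """Generalize two (pos, lemma) labels; None if the POS tags differ."""
    if a[0] != b[0]:
        return None
    # Keep the lemma only when both nodes share it; otherwise use a wildcard.
    return (a[0], a[1] if a[1] == b[1] else "*")

def common_subthicket(nodes1, edges1, nodes2, edges2):
    """Return the largest list of edges shared by two toy thickets.

    Edges are (source_index, target_index, relation) tuples. Every injective
    node mapping is tried and the one preserving the most edges wins.
    """
    if len(nodes1) > len(nodes2):  # always map the smaller graph into the larger
        nodes1, edges1, nodes2, edges2 = nodes2, edges2, nodes1, edges1
    edge_set2 = set(edges2)
    best = []
    for perm in permutations(range(len(nodes2)), len(nodes1)):
        labels = [generalize_label(nodes1[i], nodes2[perm[i]])
                  for i in range(len(nodes1))]
        common = [
            (labels[s], labels[t], rel)
            for (s, t, rel) in edges1
            if labels[s] is not None and labels[t] is not None
            and (perm[s], perm[t], rel) in edge_set2
        ]
        if len(common) > len(best):
            best = common
    return best

# Toy thicket 1: "The camera takes pictures. It is light."
nodes1 = [("NN", "camera"), ("VB", "take"), ("NN", "picture"),
          ("PRP", "it"), ("JJ", "light")]
# Toy thicket 2: "The phone takes videos. It is cheap."
nodes2 = [("NN", "phone"), ("VB", "take"), ("NN", "video"),
          ("PRP", "it"), ("JJ", "cheap")]
# Both thickets share the same arc structure: three syntactic arcs within
# sentences and one co-reference arc linking "it" back to the first noun.
edges1 = [(1, 0, "syntax"), (1, 2, "syntax"), (4, 3, "syntax"), (3, 0, "coref")]
edges2 = list(edges1)

result = common_subthicket(nodes1, edges1, nodes2, edges2)
for edge in result:
    print(edge)
```

The co-reference arc survives generalization here only because the thicket is treated as one graph. Generalizing the sentences pair-wise would have no way to represent an arc that crosses a sentence boundary.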