Parse thicket representations of text paragraphs
We develop a graph representation and learning technique for parse structures
for sentences and paragraphs of text. We introduce the parse thicket
as a set of syntactic parse trees augmented by a number of arcs for inter-sentence
word-word relations such as coreference and taxonomic relations. These
arcs are also derived from other sources, including Rhetorical Structure Theory and
Speech Act Theory. We introduce corresponding indexing rules that identify inter-sentence
relations and join phrases connected by these relations in the
search index. We propose an algorithm for computing parse thickets from
parse trees. We develop a framework for automatic building and generalizing
of parse thickets. The proposed approach is evaluated in the
product search domain, where search queries include multiple sentences. We
compare the search relevance improvement achieved by pair-wise sentence
generalization with that achieved by thicket-level generalization.
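
As an illustration of the data structure described above, the following is a minimal sketch of a parse thicket, not the authors' implementation. It assumes each parse tree is simplified to a flat token list and that the inter-sentence arcs are supplied by hand rather than produced by a coreference or discourse parser. The names `ParseThicket`, `add_arc` and `linked_words` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A node is addressed by (sentence index, token index).
Node = Tuple[int, int]


@dataclass
class ParseThicket:
    # One parse tree per sentence; simplified here to a flat token list (assumption).
    sentences: List[List[str]]
    # Inter-sentence arcs stored as (source node, target node, relation label).
    arcs: List[Tuple[Node, Node, str]] = field(default_factory=list)

    def add_arc(self, src: Node, dst: Node, relation: str) -> None:
        # Only arcs that cross a sentence boundary are accepted.
        if src[0] == dst[0]:
            raise ValueError("arc must connect two different sentences")
        self.arcs.append((src, dst, relation))

    def linked_words(self, relation: str) -> List[Tuple[str, str]]:
        # Return the word pairs joined by arcs carrying the given relation label.
        return [
            (self.sentences[s[0]][s[1]], self.sentences[d[0]][d[1]])
            for s, d, r in self.arcs
            if r == relation
        ]


# Two-sentence example with one hand-labelled coreference arc.
thicket = ParseThicket([
    ["I", "bought", "a", "camera"],
    ["It", "takes", "sharp", "photos"],
])
thicket.add_arc((0, 3), (1, 0), "coreference")
```

Calling `thicket.linked_words("coreference")` returns the word pair joined by the arc, which is the kind of cross-sentence link that would let phrases from different sentences be joined in a search index.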