Finding Maximal Common Sub-parse Thickets for Multi-sentence Search

B. Galitsky; D. Ilvovsky; S. Kuznetsov; F. V. Strok



P. 39–57.


We develop a graph representation and a learning technique for the parse structures of text paragraphs. We introduce the Parse Thicket (PT) as a set of syntactic parse trees augmented with arcs for inter-sentence word-word relations such as coreference and taxonomic relations. These arcs are also derived from other sources, including Speech Act theory and Rhetorical Structure Theory. We provide a detailed illustration of how PTs are built from parse trees and generalized as phrases by computing maximal common subgraphs. The proposed approach is evaluated in the product search and recommendation domain, where search queries consist of multiple sentences. We compare the search relevance improvement obtained by pairwise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
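To make the idea of a PT and its generalization more concrete, here is a minimal Python sketch built on our own simplifying assumptions; it is not the authors' algorithm. Each parse tree is flattened to a set of labeled edges between (lemma, POS) nodes, and inter-sentence arcs such as coreference are stored as extra edges. The relation labels (`dobj`, `amod`, `coref`) and the names `ParseThicket`, `generalize_node`, `edge_subsumes` and `generalize` are hypothetical. Instead of computing a true maximal common subgraph (the maximum common subgraph problem is NP-hard in general), the sketch generalizes every pair of edges that share a relation, keeping a lemma when both words match and only the POS tag (lemma `*`) when just the tags agree, and then keeps only the most specific results.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical node label: (lemma, POS tag); lemma "*" means "any word".
Node = Tuple[str, str]
# Hypothetical labeled edge: (head node, relation, dependent node).
Edge = Tuple[Node, str, Node]


@dataclass
class ParseThicket:
    """Simplified PT: flattened parse-tree edges plus inter-sentence arcs."""
    tree_edges: Set[Edge] = field(default_factory=set)
    inter_sentence_edges: Set[Edge] = field(default_factory=set)

    def edges(self) -> Set[Edge]:
        return self.tree_edges | self.inter_sentence_edges


def generalize_node(a: Node, b: Node) -> Optional[Node]:
    """Same word -> keep it; same POS only -> ("*", POS); otherwise None."""
    if a == b:
        return a
    if a[1] == b[1]:
        return ("*", a[1])
    return None


def node_subsumes(general: Node, specific: Node) -> bool:
    """True if `general` equals `specific` or is its POS-only wildcard."""
    return general == specific or (general[0] == "*" and general[1] == specific[1])


def edge_subsumes(general: Edge, specific: Edge) -> bool:
    return (general[1] == specific[1]
            and node_subsumes(general[0], specific[0])
            and node_subsumes(general[2], specific[2]))


def generalize(pt1: ParseThicket, pt2: ParseThicket) -> Set[Edge]:
    """Generalize same-relation edge pairs, then drop any result that is
    subsumed by a more specific one (keep only maximal generalizations)."""
    common: Set[Edge] = set()
    for h1, rel1, d1 in pt1.edges():
        for h2, rel2, d2 in pt2.edges():
            if rel1 != rel2:
                continue
            head, dep = generalize_node(h1, h2), generalize_node(d1, d2)
            if head is not None and dep is not None:
                common.add((head, rel1, dep))
    return {e for e in common
            if not any(o != e and edge_subsumes(e, o) for o in common)}


# "I want a digital camera. It should have a wide lens." (only some edges shown)
pt1 = ParseThicket(
    tree_edges={(("want", "VB"), "dobj", ("camera", "NN")),
                (("camera", "NN"), "amod", ("digital", "JJ")),
                (("have", "VB"), "dobj", ("lens", "NN")),
                (("lens", "NN"), "amod", ("wide", "JJ"))},
    inter_sentence_edges={(("camera", "NN"), "coref", ("it", "PRP"))},
)
# "I need a digital camera. It must be small." (only some edges shown)
pt2 = ParseThicket(
    tree_edges={(("need", "VB"), "dobj", ("camera", "NN")),
                (("camera", "NN"), "amod", ("digital", "JJ"))},
    inter_sentence_edges={(("camera", "NN"), "coref", ("it", "PRP"))},
)

common = generalize(pt1, pt2)
for edge in sorted(common):
    print(edge)
```

In this toy example the shared structure is "some verb takes *camera* as its object", "digital camera", and the cross-sentence coreference between *camera* and *it*; the over-general edges such as `(*/NN) amod (*/JJ)` are pruned because a more specific common edge exists. Because edges are generalized independently, the result need not be a connected subgraph; matching whole connected structures, as a maximal common subgraph does, is more expensive.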

Language: English



Keywords: graph representation of text, parse thickets, multi-sentence search

Publication based on the results of:

Mathematical models, algorithms and software for data mining in the text and the structural form (2014)

In book

Graph Structures for Knowledge Representation and Reasoning: Third International Workshop, GKR 2013, Beijing, China, August 3, 2013. Revised Selected Papers. Editors: Madalina Croitoru, Sebastian Rudolph, Stefan Woltran, Christophe Gonzales. Springer International Publishing, 2014.

Berlin: Springer, 2014.

Improving Text Retrieval Efficiency with Pattern Structures on Parse Thickets

Kuznetsov S., Strok F. V., Ilvovsky D. et al., in: Proceedings of the Workshop Formal Concept Analysis Meets Information Retrieval. Vol. 977. M.: CEUR Workshop Proceedings, 2013. P. 6–21.

We develop a graph representation and a learning technique for the parse structures of text paragraphs. We introduce the Parse Thicket (PT) as a sum of syntactic parse trees augmented with arcs for inter-sentence word-word relations such as coreference and taxonomic relations. These arcs are also derived from other sources, including Speech Act and ...

Added: November 18, 2013