• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site

Book chapter

Term extraction for constructing subject index of educational scientific text

P. 143-152.
Bolshakova E. I., Ivanov K.

Subject index, or back-of-the-book index, is a device intended to provide an easy access to relevant fragments of a text document. Subject indexes usually contain particular single-word and multi-word terms from the corresponding documents. Such indexes are especially useful for reading large documents with specialized terminology, as well as educational texts in difficult scientific and technical areas. The central problem of back-of-the-book indexing is recognition of terms to be included into the index. The paper describes a method developed for extracting and filtering terms from a given educational scientific text, with the purpose of reliable term selection in computer indexing systems. The method is primarily based on rules with lexico-syntactic patterns representing linguistic information about terms and typical contexts of their usage in Russian scientific and educational texts; simple occurrences statistics of terms is used as well.  Experimental evaluation of the method has shown a considerable increase of precision and recall of term extraction compared with the widely-used standard techniques.

In book

Вып. 17(24). М.: Издательский центр «Российский государственный гуманитарный университет», 2018.