Positional skipgrams for Bambara: a resource for corpus-based studies
This article presents a new online dataset of linguistically rich n‑gram frequency data for Bambara, based on the disambiguated part of the Bambara Reference Corpus. The n‑grams in the dataset are positional skipgrams, which capture the co-occurrence of lexical items with grammatical categories at various relative positions. They were constructed with the aim of leveraging the kinds of information available in the morphologically annotated corpus of Bambara, given the limited amount of textual data. The methodology and data used to construct the n‑grams are discussed, followed by brief illustrations of how the positional skipgram data may be employed in corpus-based linguistic research.
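To make the notion of a positional skipgram concrete, the following minimal sketch counts triples of the form (lexical item, relative position, grammatical category of the neighbour) over one annotated sentence. It is only an illustration under stated assumptions, not the procedure used to build the dataset: the function name `positional_skipgrams`, the `max_offset` window, the (lemma, tag) input format, and the toy sentence with its tag labels are all hypothetical.

```python
from collections import Counter


def positional_skipgrams(sentence, max_offset=2):
    """Count (lemma, offset, neighbour_tag) triples in one sentence.

    `sentence` is a list of (lemma, tag) pairs. This input format is an
    assumption made for the illustration, not the corpus file format.
    """
    counts = Counter()
    for i, (lemma, _tag) in enumerate(sentence):
        for offset in range(-max_offset, max_offset + 1):
            j = i + offset
            # Skip the item itself and positions outside the sentence.
            if offset == 0 or not 0 <= j < len(sentence):
                continue
            neighbour_tag = sentence[j][1]
            counts[(lemma, offset, neighbour_tag)] += 1
    return counts


# Toy annotated sentence; lemmas and tag labels are illustrative only.
sentence = [("a", "pers"), ("ye", "pm"), ("ji", "n"), ("min", "v")]
counts = positional_skipgrams(sentence, max_offset=2)

for (lemma, offset, tag), freq in sorted(counts.items()):
    print(f"{lemma}\t{offset:+d}\t{tag}\t{freq}")
```

Summing such counters over all sentences of an annotated corpus would give a frequency table in which each lexical item is profiled by the grammatical categories found at each relative position around it.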