
Book

Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP 2019

The RepEval series of workshops started in the midst of the word embeddings boom, with the goals of promoting new benchmarks for vector space meaning representations, highlighting the issues with existing benchmarks, and improving on them. In addition to proposals for new evaluation tasks, it has played an important role as an outlet for critical analysis, negative results, and methodological caveats (reproducibility, the impact of parameters, the attribution of results to the representation versus the whole system, and dataset structure, balance, and representativeness). Three years later, mainstream NLP is switching to contextualized representations, but we still face many of the same issues: reliable intrinsic metrics are scarce, which means that we rarely know what features of representations make them successful for a given downstream task. This makes the development of new meaning representations and their fine-tuning a slow and expensive process with too many variables, even more so than before.

The 3rd edition of RepEval aims to foster discussion of the following issues:

• approaches to intrinsic and extrinsic evaluation of all kinds of distributional meaning representations;
• evaluation motivated by linguistic, psycholinguistic, or neurological evidence, its predictive power, and the interpretability of meaning representations;
• the (in)stability of vector representations and best practices for reproducible and reliable experiments;
• evaluation of representations at the subword level, especially for morphologically complex languages;
• evaluation of phrase-, sentence-, paragraph-, and document-level representations: evidence of compositionality, further diagnostic tests, and how much the preservation of abstract syntactic information actually contributes to performance;
• formal analysis of the properties of embedding spaces and their impact on downstream tasks;
• the contribution of representations per se vs. other modeling choices to system performance in extrinsic evaluations;
• validation of evaluation methodology and findings in cross-lingual studies;
• specialized vs. general-purpose representations, and whether the latter have inherent limits in downstream tasks;
• internal states of end-to-end systems as meaning representations, and ways to make more sense of them.

In the long run, the methodological and practical contributions of RepEval will add to the discussion of what kinds of representations work best for which tasks, how we can interpret and reliably optimize them, and to what extent it is possible to create the cross-task meaning representations that would be necessary for general AI.

Chapters
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP 2019