Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources


Vulić, Ivan and Glavaš, Goran and Mrkšić, Nikola and Korhonen, Anna


Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)



Figures & Tables

Figure 3: DST labels (user goals given by slot-value pairs) in a multi-turn dialogue (Mrkšić et al., 2015).
Table 2: Post-specialisation applied to two other post-processing methods. SL: SimLex; SV: SimVerb. Hold-out setting. NONLINEAR-MM.
Figure 2: Results of the hold-out experiments on SimLex-999 and SimVerb-3500 after applying our non-linear vector space transformation with different depths (hidden layer size H, see Fig. 1b). Results are averages over 20 runs with the NONLINEAR-MM variant; the shaded regions are spanned by the maximum and minimum scores obtained. Thick horizontal lines refer to Spearman's rank correlations achieved in the initial space X_d. H = 0 denotes the standard linear regression model (Mikolov et al., 2013a; Lazaridou et al., 2015) (LINEAR-MM shown since it outperforms LINEAR-MSE).
Figure 1: (a) High-level illustration of the post-specialisation approach: the subspace X_s of the initial
Table 5: Lexical simplification performance with post-specialisation applied to three input spaces.
Table 3: DST results in two evaluation settings (hold-out and all) with different GloVe variants.
Table 1: Spearman's ρ correlation scores for three word vector collections on two English word similarity datasets, SimLex-999 (SL) and SimVerb-3500 (SV), using different mapping variants, evaluation protocols, and word vector spaces: from the initial distributional space X_d to the fully specialised space X_f. H = 5.
Table 4: Results on word similarity (Spearman's ρ) and DST (joint goal accuracy) for German and Italian.
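The Figure 2 caption describes the core mapping: a feed-forward network with H hidden layers, trained with a max-margin (MM) objective, that maps vectors from the initial distributional space X_d towards the specialised space; H = 0 recovers the linear regression baseline (Mikolov et al., 2013a; Lazaridou et al., 2015). A minimal numpy sketch of such a mapping and loss follows — all names, layer sizes, and the exact hinge formulation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def init_mlp(dim, H, hidden=300, seed=0):
    """Initialise weights for a feed-forward map with H hidden layers.
    (Hypothetical sizes; H = 0 yields a single linear layer.)"""
    rng = np.random.default_rng(seed)
    sizes = [dim] + [hidden] * H + [dim]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp_map(params, X):
    """Map initial distributional vectors X (rows) to predicted
    specialised vectors; tanh on hidden layers only."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:   # no non-linearity on the output layer
            h = np.tanh(h)
    return h

def max_margin_loss(pred, gold, margin=1.0):
    """One plausible max-margin (MM) objective: each prediction should
    score higher with its own gold specialised vector than with the
    gold vectors of the other examples in the batch, by a margin."""
    pos = np.sum(pred * gold, axis=1)          # score with own gold vector
    neg = pred @ gold.T                        # scores with all gold vectors
    hinge = np.maximum(0.0, margin - pos[:, None] + neg)
    np.fill_diagonal(hinge, 0.0)               # skip the positive pair itself
    return hinge.mean()
```

With H = 0 the map reduces to `X @ W + b`, matching the linear-regression baseline the caption compares against.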

Table of Contents

  • Abstract
  • 1 Introduction
  • 2 Related Work and Motivation
  • 3 Methodology: Post-Specialisation
    • 3.1 Initial Specialisation Model: AR
  • 4 Experimental Setup
  • 5 Results and Discussion
    • 5.1 Intrinsic Evaluation: Word Similarity
    • Post-Specialisation with Other Post-Processors
    • 5.2 Downstream Task I: DST
    • 5.3 Downstream Task II: Lexical Simplification
  • 6 Conclusion and Future Work
  • Acknowledgments
  • References


  •   Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2013. Polyglot: Distributed word representations for multilingual NLP. In Proceedings of CoNLL, pages 183–192.
  •   Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of EMNLP, pages 2289–2294.
  •   Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of ACL, pages 451–462.
  •   Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2014. Tailoring continuous word representations for dependency parsing. In Proceedings of ACL, pages 809–815.
  •   Jiang Bian, Bin Gao, and Tie-Yan Liu. 2014. Knowledge-powered deep learning for word embedding. In Proceedings of ECML-PKDD, pages 132–148.
  •   Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the ACL, 5:135–146.