GloVe: Global Vectors for Word Representation


Pennington, Jeffrey and Socher, Richard and Manning, Christopher


Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)



Figures & Tables

Figure 4: Overall accuracy on the word analogy task as a function of training time, which is governed by the number of iterations for GloVe and by the number of negative samples for CBOW (a) and skip-gram (b). In all cases, we train 300-dimensional vectors on the same 6B token corpus (Wikipedia 2014 + Gigaword 5) with the same 400,000 word vocabulary, and use a symmetric context window of size 10.
Figure 3: Accuracy on the analogy task for 300-dimensional vectors trained on different corpora.
Table 1: Co-occurrence probabilities for target words ice and steam with selected context words from a 6 billion token corpus. Only in the ratio does noise from non-discriminative words like water and fashion cancel out, so that large values (much greater than 1) correlate well with properties specific to ice, and small values (much less than 1) correlate well with properties specific to steam.
Table 4: F1 score on NER task with 50d vectors. Discrete is the baseline without word vectors. We use publicly-available vectors for HPCA, HSMN,
Table 3: Spearman rank correlation on word similarity tasks. All vectors are 300-dimensional. The CBOW* vectors are from the word2vec website and differ in that they contain phrase vectors.
Figure 1: Weighting function f with α = 3/4.
Figure 2: Accuracy on the analogy task as a function of vector size and window size/type. All models are trained on the 6 billion token corpus. In (a), the window size is 10. In (b) and (c), the vector size is 100.
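
The ratio argument in the Table 1 caption can be illustrated with a minimal sketch. The counts below are invented for illustration and are not the paper's data; the names `toy_counts`, `cooccurrence_prob`, and `prob_ratio` are hypothetical. The sketch assumes P(k | w) is estimated as the count of context word k near target w, divided by the total count of all context words near w.

```python
# Toy co-occurrence counts X[target][context]; the numbers are invented
# for illustration and are NOT taken from the paper's 6 billion token corpus.
toy_counts = {
    "ice":   {"solid": 8, "gas": 1, "water": 10, "fashion": 1},
    "steam": {"solid": 1, "gas": 8, "water": 10, "fashion": 1},
}


def cooccurrence_prob(counts, word, context):
    """Estimate P(context | word) as X[word][context] / sum_k X[word][k]."""
    row = counts[word]
    total = sum(row.values())
    return row.get(context, 0) / total


def prob_ratio(counts, word_a, word_b, context):
    """Return P(context | word_a) / P(context | word_b).

    Raises ZeroDivisionError if `context` never co-occurs with `word_b`.
    """
    return (cooccurrence_prob(counts, word_a, context)
            / cooccurrence_prob(counts, word_b, context))


if __name__ == "__main__":
    for k in ("solid", "gas", "water", "fashion"):
        print(k, prob_ratio(toy_counts, "ice", "steam", k))
```

With these toy counts, the ratio for solid is well above 1, the ratio for gas is well below 1, and the ratios for water and fashion are close to 1, which mirrors the qualitative pattern the caption describes.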

Table of Contents

  • Abstract
  • 1 Introduction
  • 2 Related Work
  • 3 The GloVe Model
    • 3.1 Relationship to Other Models
  • 4 Experiments
    • 4.1 Evaluation methods
    • 4.2 Corpora and training details
    • 4.3 Results
    • 4.4 Model Analysis: Vector Length and Context Size
    • 4.5 Model Analysis: Corpus Size
    • 4.6 Model Analysis: Run-time
    • 4.7 Model Analysis: Comparison with word2vec
  • 5 Conclusion
  • Acknowledgments
  • References


