GloVe: Global Vectors for Word Representation

Authors

Jeffrey Pennington, Richard Socher, and Christopher D. Manning

Conference

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Year

2014

Figures & Tables

Figure 4: Overall accuracy on the word analogy task as a function of training time, which is governed by the number of iterations for GloVe and by the number of negative samples for CBOW (a) and skip-gram (b). In all cases, we train 300-dimensional vectors on the same 6B token corpus (Wikipedia 2014 + Gigaword 5) with the same 400,000 word vocabulary, and use a symmetric context window of size 10.
Figure 3: Accuracy on the analogy task for 300-dimensional vectors trained on different corpora.
Table 1: Co-occurrence probabilities for target words ice and steam with selected context words from a 6 billion token corpus. Only in the ratio does noise from non-discriminative words like water and fashion cancel out, so that large values (much greater than 1) correlate well with properties specific to ice, and small values (much less than 1) correlate well with properties specific to steam.
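The ratio described in this caption can be reproduced directly from raw co-occurrence counts. A minimal Python sketch with hypothetical numbers (illustrative only, not the paper's actual 6-billion-token statistics):

```python
# Hypothetical co-occurrence counts: counts[target][context] is how often
# the context word appears near the target word. Numbers are illustrative.
counts = {
    "ice":   {"solid": 190, "gas": 6,  "water": 3000, "fashion": 3},
    "steam": {"solid": 22,  "gas": 78, "water": 2200, "fashion": 2},
}

def cooccur_prob(target, context):
    """P(context | target) = X[target][context] / sum_k X[target][k]."""
    row = counts[target]
    return row[context] / sum(row.values())

for k in ["solid", "gas", "water", "fashion"]:
    ratio = cooccur_prob("ice", k) / cooccur_prob("steam", k)
    # Ratios much greater than 1 flag properties specific to ice ("solid"),
    # ratios much less than 1 flag properties specific to steam ("gas"),
    # and non-discriminative words ("water", "fashion") land near 1.
    print(f"{k:8s} P(k|ice)/P(k|steam) = {ratio:.2f}")
```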
Table 4: F1 score on NER task with 50d vectors. Discrete is the baseline without word vectors. We use publicly available vectors for HPCA, HSMN, and CW.
Table 3: Spearman rank correlation on word similarity tasks. All vectors are 300-dimensional. The CBOW∗ vectors are from the word2vec website and differ in that they contain phrase vectors.
Figure 1: Weighting function f with α = 3/4.
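For reference, the function plotted in Figure 1 is the paper's least-squares weighting: f(x) = (x/x_max)^α for x < x_max and f(x) = 1 otherwise, with the paper's choices x_max = 100 and α = 3/4. A minimal Python sketch:

```python
def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe's weighting f(x): rises as (x / x_max)^alpha, then
    saturates at 1 for x >= x_max. Rare co-occurrences are thus
    down-weighted, and very frequent ones are capped."""
    return (x / x_max) ** alpha if x < x_max else 1.0

# f(0) = 0, so word pairs that never co-occur drop out of the objective.
for x in [0, 1, 10, 50, 100, 1000]:
    print(f"f({x}) = {glove_weight(x):.3f}")
```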
Figure 2: Accuracy on the analogy task as a function of vector size and window size/type. All models are trained on the 6 billion token corpus. In (a), the window size is 10. In (b) and (c), the vector size is 100.
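Figures 2 through 4 all score the word analogy task, which answers "a is to b as c is to ?" by picking the vocabulary word whose vector has the highest cosine similarity to w_b − w_a + w_c. A minimal Python sketch with toy 2-dimensional vectors (illustrative only; the figures use trained 100- to 300-dimensional vectors):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Answer "a : b :: c : ?" by maximizing cos(w_d, w_b - w_a + w_c),
    excluding the three query words themselves."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates,
               key=lambda w: candidates[w] @ target / np.linalg.norm(candidates[w]))

# Toy vectors arranged so the "gender" offset is the second coordinate.
vecs = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
    "apple": np.array([0.0, 1.0]),
}
print(analogy("man", "king", "woman", vecs))  # -> queen
```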

Table of Contents

  • Abstract
  • 1 Introduction
  • 2 Related Work
  • 3 The GloVe Model
    • 3.1 Relationship to Other Models
    • 3.2 Complexity of the model
  • 4 Experiments
    • 4.1 Evaluation methods
    • 4.2 Corpora and training details
    • 4.3 Results
    • 4.4 Model Analysis: Vector Length and Context Size
    • 4.5 Model Analysis: Corpus Size
    • 4.6 Model Analysis: Run-time
    • 4.7 Model Analysis: Comparison with word2vec
  • 5 Conclusion
  • Acknowledgments
  • References

References

  • Tom M. Apostol. 1976. Introduction to Analytic Number Theory.
  • Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL.
  • Yoshua Bengio. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning.
  • Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. JMLR, 3:1137–1155.
  • John A. Bullinaria and Joseph P. Levy. 2007. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3):510–526.
  • Dan C. Ciresan, Alessandro Giusti, Luca M. Gambardella, and Jürgen Schmidhuber. 2012. Deep neural networks segment neuronal membranes in electron microscopy images. In NIPS, pages 2852–2860.
  • Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of ICML, pages 160–167.
  • Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) from Scratch. JMLR, 12:2493–2537.
  • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41.
  • John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12.
  • Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web, pages 406–414. ACM.
  • Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL.
  • Rémi Lebret and Ronan Collobert. 2014. Word embeddings through Hellinger PCA. In EACL.
  • Omer Levy and Yoav Goldberg. 2014. Linguistic regularities in sparse and explicit word representations. In CoNLL.
  • Kevin Lund and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28:203–208.
  • Minh-Thang Luong, Richard Socher, and Christopher D. Manning. 2013. Better word representations with recursive neural networks for morphology. In CoNLL.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient Estimation of Word Representations in Vector Space. In ICLR Workshop Papers.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119.
  • Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In HLT-NAACL.
  • George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.
  • Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In NIPS.
  • Douglas L. T. Rohde, Laura M. Gonnerman, and David C. Plaut. 2006. An improved model of semantic similarity based on lexical co-occurrence. Communications of the ACM, 8:627–633.
  • Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.
  • Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34:1–47.
  • Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing With Compositional Vector Grammars. In ACL.
  • Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, and Gregory Marton. 2003. Quantitative evaluation of passage retrieval algorithms for question answering. In Proceedings of the SIGIR Conference on Research and Development in Information Retrieval.
  • Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In CoNLL.
  • Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384–394.
  • Mengqiu Wang and Christopher D. Manning. 2013. Effect of non-linear deep architecture in sequence labeling. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP).