Improving Lexical Embeddings with Semantic Knowledge

Authors

Yu, Mo and Dredze, Mark

Conference

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Year

2014

Figures & Tables

Table 1: Sizes of semantic resource datasets.
Table 2: LM evaluation on held-out NYT data.
Table 3: MRR for semantic similarity on PPDB and WordNet dev and test data. Higher is better. All RCM objectives are trained with PPDB XXL. To preserve test data integrity, only the best-performing setting of each model is evaluated on the test data.
Table 4: Results for ranking the quality of PPDB pairs as compared to human judgements.
Table 5: MRR on PPDB dev data for training on an increasing number of relations.
Table 6: Effect of learning rate α_RCM on MRR for the RCM objective in Joint models.

Table of Contents

  • Abstract
  • 1 Introduction
  • 2 Learning Embeddings
    • 2.1 Word2vec
    • 2.2 Relation Constrained Model
    • 2.3 Joint Model
    • 2.4 Parameter Estimation
  • 3 Evaluation
  • 4 Experiments
    • 4.1 Language Modeling
    • 4.2 Measuring Semantic Similarity
    • 4.3 Human Judgements
    • 4.4 Analysis
  • 5 Conclusion
  • References
