Polyglot: Distributed Word Representations for Multilingual NLP

Author

Al-Rfou, Rami and Perozzi, Bryan and Skiena, Steven

Conference

Proceedings of the Seventeenth Conference on Computational Natural Language Learning

Year

2013

Figures & Tables

Table 2: Statistics of a subset of the languages pro-
Table 6: Accuracy of randomly initialized tagger compared to our results. Using the embeddings was generally helpful, especially in languages where we did not have many training examples. The scores presented are the best we found for each language (languages with more resources could afford to train longer before overfitting).
Table 3: Examples of the nearest five neighbors of every word in several languages. Translation is retrieved from http://translate.google.com.
Table 1: Words nearest neighbors as they appear in the English embeddings.
Table 4: Results of our model against several PoS datasets. The performance is measured using accuracy over the test datasets. The third column reports the total accuracy of the tagger, while the former two columns report the accuracy over known words and OOV (unknown) words. The results are compared to the
Table 5: Coverage statistics of the embedding’s vocabulary on the part of speech datasets after normalization. Token coverage is the raw percentage of words which were known, while the Word coverage ignores repeated words.
Figure 2: Training and test errors of the French model after 23 days of training. We did not notice any overfitting while training the model. The error curves are smoother the larger the language corpus is.
Figure 1: Neural network architecture. Words are

Table of Contents

  • Abstract
  • 1 Introduction
  • 2 Related Work
  • 3 Distributed Word Representation
  • 4 Corpus Preparation
  • 5 Training
  • 6 Qualitative Analysis
  • 7 Sequence Tagging
  • 8 Conclusion
  • Acknowledgments
  • References

References

  •   Susana Afonso, Eckhard Bick, Renato Haber, and Diana Santos. 2002. “Floresta sintá(c)tica”: a treebank for Portuguese. In Proc. of the Third Intern. Conf. on Language Resources and Evaluation (LREC), pages 1698–1703.
  •   Rami Al-Rfou’ and Steven Skiena. 2012. SpeedRead: a fast named entity recognition pipeline. In Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pages 53–61, Mumbai, India, December. Coling 2012 Organizing Committee.
  •   Eduard Bejček, Jarmila Panevová, Jan Popelka, Pavel Straňák, Magda Ševčíková, Jan Štěpánek, and Zdeněk Žabokrtský. 2012. Prague Dependency Treebank 2.5 – a revisited version of PDT 2.0. In Proceedings of COLING 2012, pages 231–246, Mumbai, India, December. The COLING 2012 Organizing Committee.
  •   Yoshua Bengio and J-S Senecal. 2008. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks, 19(4):713–722.
  •   Y. Bengio, H. Schwenk, J.S. Senécal, F. Morin, and J.L. Gauvain. 2006. Neural probabilistic language models. Innovations in Machine Learning, pages 137–186.
  •   Y. Bengio, J. Louradour, R. Collobert, and J. Weston. 2009. Curriculum learning. In International Conference on Machine Learning, ICML.
  •   James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June. Oral Presentation.
  •   John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, Sydney, Australia.
  •   Léon Bottou. 1991. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nîmes 91, Nîmes, France. EC2.
  •   Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. The TIGER treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, pages 24–41.
  •   Thorsten Brants. 2000. TnT: a statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, pages 224–231. Association for Computational Linguistics.
  •   Peter F Brown, Peter V Desouza, Robert L Mercer, Vincent J Della Pietra, and Jenifer C Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.
  •   Terry Koo, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised dependency parsing. In Proc. of ACL/HLT.
  •   Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML ’12), pages 767–774. ACM, New York, NY, USA, July.
  •   Yanqing Chen, Bryan Perozzi, Rami Al-Rfou’, and Steven Skiena. 2013. The expressive power of word embeddings. CoRR, abs/1301.3226.
  •   R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In International Conference on Machine Learning, ICML.
  •   Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537, November.
  •   Ronan Collobert. 2011. Deep learning for efficient discriminative parsing. In AISTATS.
  •   Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc Le, Mark Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Ng. 2012. Large scale distributed deep networks. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1232–1240.
  •   Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic, June. Association for Computational Linguistics.
  •   Sašo Džeroski, Tomaž Erjavec, Nina Ledinek, Petr Pajas, Zdenek Žabokrtsky, and Andreja Žele. 2006. Towards a Slovene dependency treebank. In Proc. of the Fifth Intern. Conf. on Language Resources and Evaluation (LREC).
  •   Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May. European Language Resources Association (ELRA).
  •   Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML’11), volume 27, pages 97–110, June.
  •   Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. 2012. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Proceedings of the Sixteenth Conference on Computational Natural Language Learning (CoNLL 2012), Jeju, Korea.
  •   Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009), June 4–5, Boulder, Colorado, USA.
  •   Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of COLING 2012, pages 1459–1474, Mumbai, India, December. The COLING 2012 Organizing Committee.
  •  David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 2002. Learning representations by back-propagating errors. Cognitive modeling, 1:213.
  •   Matthias Trautner Kromann. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), page 217.
  •   Lu Shuxiang. 2004. The Contemporary Chinese Dictionary (Xiandai Hanyu Cidian). Commercial Press.
  •   Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  •   Kiril Simov, Petya Osenova, Milena Slavcheva, Sia Kolkovska, Elisaveta Balabanova, Dimitar Doikoff, Krassimira Ivanova, Er Simov, and Milen Kouylekov. 2002. Building a linguistically interpreted corpus of Bulgarian: the BulTreeBank. In Proceedings of LREC 2002, Canary Islands.
  •   T. Mikolov, M. Karafiát, L. Burget, J. Cernocky, and S. Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of Interspeech.
  •   Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems 24.
  •   Andriy Mnih and Geoffrey E Hinton. 2009. A scalable hierarchical distributed language model. Advances in Neural Information Processing Systems, 21:1081–1088.
  •   Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  •   Frederic Morin and Yoshua Bengio. 2005. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, pages 246–252.
  •   Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 477–487. Association for Computational Linguistics.
  •   Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225. Association for Computational Linguistics.
  •   J. Turian, L. Ratinov, and Y. Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics.
  •   Joakim Nivre, Jens Nilsson, and Johan Hall. 2006. Talbanken05: A Swedish treebank with phrase structure and dependency annotation. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC), pages 1392–1395.
  •   Leonoor Van der Beek, Gosse Bouma, Rob Malouf, and Gertjan Van Noord. 2002. The Alpino dependency treebank. Language and Computers, 45(1):8–22.