Natural Language Processing
Revision as of 18:32, 3 July 2019

This page contains resources about Natural Language Processing, Text Mining, Speech Processing, Audio Signal Processing and Computational Linguistics.

Subfields and Concepts

  • Vector Space Model (VSM)
  • Latent Semantic Indexing
  • Latent Semantic Analysis
  • Latent Dirichlet Allocation (LDA)
  • Attention Mechanism
    • Transformer
    • Set Transformer
  • Speaker Recognition
  • Speaker Identification / Speaker Diarization
    • Speaker Segmentation
    • Speaker Clustering
  • Speech Synthesis / Text-to-Speech
  • Speech Recognition / Voice Recognition / Speech-to-Text / Transcription
    • Conversational Speech
    • Voice Dictation
    • Voice Commands
  • Audio Captioning / Subtitling
  • Automatic Lyrics Recognition
  • Topic Model
  • Text Preprocessing
    • Tokenization
    • Stemming
    • Lemmatisation
    • Word embeddings / Word vectors
      • Word2Vec Model
        • Continuous Skip-gram
        • Continuous Bag-of-Words (CBOW)
      • GloVe
      • FastText
    • Bag-of-Words (BoW) Model
    • N-grams
      • Unigrams
      • Bigrams
  • Term Frequency - Inverse Document Frequency (TF-IDF)
  • Sequence-to-Sequence (seq2seq) Model
  • Dynamic Memory Network (a specific architecture of Artificial Neural Networks)
  • Sequence Tagging
  • Natural Language Understanding (NLU)
  • Natural Language Generation (NLG)
  • Named-Entity Recognition (NER)
  • (Natural Language) Semantic Analysis
  • Sentiment Analysis
  • Emotion Recognition
  • Diacritization (e.g. in Hebrew or Arabic)
  • Chatbots
  • Question-Answering System
  • Machine Translation
  • Text summarization
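Several of the classical concepts listed above (N-grams, Bag-of-Words, TF-IDF, and the Word2Vec skip-gram objective) are simple enough to illustrate directly. The following is a minimal pure-Python sketch, not a library implementation: it uses the unsmoothed variant idf(t) = log(N / df(t)), whereas libraries such as gensim and scikit-learn apply smoothed formulas by default, so exact weights will differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token sequence as tuples (n=1: unigrams, n=2: bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgram_pairs(tokens, window=2):
    """Skip-gram training pairs (center word -> one context word).

    CBOW uses the same windows in reverse: it predicts the center
    word from the set of surrounding context words.
    """
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.

    TF is the raw term count in the document; IDF is the plain
    idf(t) = log(N / df(t)), where df(t) counts documents containing t.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cats and the dogs".split(),
]

print(ngrams(docs[0], 2)[:2])        # -> [('the', 'cat'), ('cat', 'sat')]
print(skipgram_pairs(docs[0], 1)[:2])  # -> [('the', 'cat'), ('cat', 'the')]
w = tf_idf(docs)
print(w[0]["the"])                   # "the" occurs in every document -> idf = 0 -> 0.0
print(round(w[0]["cat"], 3))         # "cat" occurs in 1 of 3 documents -> log(3) ~ 1.099
```

A Bag-of-Words representation is simply the `Counter(doc)` term-count vector before the IDF reweighting step.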

Online Courses

* [http://www.ee.columbia.edu/~stanchen/spring16/e6870/outline.html Speech Recognition by Bhuvana Ramabhadran, Markus Nussbaum-Thom, Michael Picheny and Stanley F. Chen]
* [https://sites.google.com/site/gothnlp/links/lecture-notes Natural Language Processing with NLTK]
* [http://web.stanford.edu/class/cs224u/ CS224U: Natural Language Understanding by Bill MacCartney and Christopher Potts]

Books

Natural Language Processing

  • Arumugam, R., & Shanmugamani, R. (2018). Hands-On Natural Language Processing with Python. Packt Publishing.
  • Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics. Packt Publishing.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers.
  • Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
  • Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress. (link)
  • Mihalcea, R., & Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. (link)
  • Zhai, C. (2008). Statistical language models for information retrieval. Morgan and Claypool Publishers.
  • Tiwary, U. S., & Siddiqui, T. (2008). Natural language processing and information retrieval. Oxford University Press.
  • Manning, C. D., & Schutze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Speech Processing

  • Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing. Pearson.
  • Gold, B., Morgan, N., & Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music. 2nd Ed. John Wiley & Sons.
  • Mitra, S. K., & Kuo, Y. (2010). Digital signal processing: a computer-based approach. 4th Ed. McGraw-Hill.
  • Spanias, A., Painter, T., & Atti, V. (2006). Audio signal processing and coding. John Wiley & Sons.
  • Huang, X., Acero, A., Hon, H. W., & Reddy, R. (2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall.
  • Quatieri, T. F. (2001). Discrete-time speech signal processing: principles and practice. Pearson.
  • Holmes, J., & Holmes, W. (2001). Speech Synthesis and Recognition. 2nd Ed. CRC press.
  • Jelinek, F. (1998). Statistical methods for speech recognition. MIT Press.
  • Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall.

Speech and Natural Language Processing (both)

  • Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. (link)

Scholarly articles

  • Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv preprint arXiv:1904.08779.
  • Das, A., Li, J., Zhao, R., & Gong, Y. (2018). Advancing connectionist temporal classification with attention modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4769-4773).
  • Moriya, T., Ueno, S., Shinohara, Y., Delcroix, M., Yamaguchi, Y., & Aono, Y. (2018). Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. In Proceedings of Interspeech (pp. 2399-2403).
  • Zeyer, A., Irie, K., Schlüter, R., & Ney, H. (2018). Improved Training of End-to-end Attention Models for Speech Recognition. In Proceedings of Interspeech (pp. 7-11).
  • Toshniwal, S., Tang, H., Lu, L., & Livescu, K. (2017). Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. In Proceedings of Interspeech (pp. 3532-3536).
  • Kim, S., Hori, T., & Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4835-4839).
  • Wen, T. H., Gasic, M., Mrksic, N., Su, P. H., Vandyke, D., & Young, S. (2015). Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. arXiv preprint arXiv:1508.01745.
  • Sethu, V., Epps, J., & Ambikairajah, E. (2015). Speech based emotion recognition. In Speech and Audio Processing for Coding, Enhancement and Recognition (pp. 197-228). Springer.
  • Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.

Tutorials

Software

Speech Recognition

* [https://nvidia.github.io/OpenSeq2Seq/ OpenSeq2Seq] - Python with TensorFlow
* [https://github.com/mozilla/DeepSpeech DeepSpeech] - Python with TensorFlow
* [https://pypi.org/project/SpeechRecognition/ SpeechRecognition] - Python library for performing speech recognition, with support for several engines and APIs, online and offline
* [http://kaldi-asr.org/ Kaldi] - C++
* [https://cmusphinx.github.io/ CMUSphinx] - Open Source Speech Recognition Toolkit
* [https://github.com/facebookresearch/wav2letter wav2letter] - C++ (previously in Lua)
* [https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API Web Speech API (Mozilla)] - Speech Synthesis (Text-to-Speech) and Speech Recognition (Asynchronous Speech Recognition)
* [https://w3c.github.io/speech-api/speechapi.html W3C Web Speech API] - JavaScript

Miscellaneous

* [https://nvidia.github.io/OpenSeq2Seq/ OpenSeq2Seq] - Python with TensorFlow
* [https://allennlp.org/ AllenNLP] - Python with PyTorch
* [http://www.nltk.org/ NLTK] - Python
* [https://radimrehurek.com/gensim/ gensim] - Python
* [https://github.com/eellak/nlpbuddy nlpbuddy] - Python
* [http://mallet.cs.umass.edu/ MALLET] - Java
* [https://github.com/plasticityai/magnitude magnitude] - Python
* [https://pypi.org/project/gTTS/ gTTS (Google Text-to-Speech)] - Python library and CLI tool to interface with Google Translate's text-to-speech API
* [https://www.idiap.ch/software/bob/docs/bob/bob.bio.spear/stable/index.html#bob-bio-spear SPEAR: A Speaker Recognition Toolkit based on Bob] - Python

See also

Other Resources

* [https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_computationallinguistics Computational Linguistics] - Google Scholar Metrics (Top Publications)
* [https://rajpurkar.github.io/SQuAD-explorer/ SQuAD] - The Stanford Question Answering Dataset
* [https://github.com/keon/awesome-nlp Awesome-NLP] - GitHub
* [https://github.com/talolard/seq2seq_learn seq2seq_learn] - GitHub
* [https://github.com/A2Zadeh/CMU-MultimodalSDK CMU-MOSI] - dataset
* [http://web.eecs.umich.edu/~mihalcea/downloads.html#MOUD MOUD] - dataset

[[Category:Machine Learning]]