Natural Language Processing

Revision as of 18:32, 3 July 2019

This page contains resources about Natural Language Processing, Text Mining, Speech Processing, Audio Signal Processing and Computational Linguistics.

Subfields and Concepts

  • Vector Space Model (VSM)
  • Latent Semantic Indexing
  • Latent Semantic Analysis
  • Latent Dirichlet Allocation (LDA)
  • Attention Mechanism
    • Transformer
    • Set Transformer
  • Speaker Recognition
  • Speaker Identification / Speaker Diarization
    • Speaker Segmentation
    • Speaker Clustering
  • Speech Synthesis / Text-to-Speech
  • Speech Recognition / Voice Recognition / Speech-to-Text / Transcription
    • Conversational Speech
    • Voice Dictation
    • Voice Commands
  • Audio Captioning / Subtitling
  • Automatic Lyrics Recognition
  • Topic Model
  • Text Preprocessing
    • Tokenization
    • Stemming
    • Lemmatisation
    • Word embeddings / Word vectors
      • Word2Vec Model
        • Continuous Skip-gram
        • Continuous Bag-of-Words (CBOW)
      • GloVe
      • FastText
    • Bag-of-Words (BoW) Model
    • N-grams
      • Unigrams
      • Bigrams
  • Term Frequency - Inverse Document Frequency (TF-IDF) - see the worked sketch after this list
  • Sequence-to-Sequence (seq2seq) Model
  • Dynamic Memory Network (a specific architecture of Artificial Neural Networks)
  • Sequence Tagging
  • Natural Language Understanding (NLU)
  • Natural Language Generation (NLG)
  • Named-Entity Recognition (NER)
  • (Natural Language) Semantic Analysis
  • Sentiment Analysis
  • Emotion Recognition
  • Diacritization (e.g. in Hebrew or Arabic)
  • Chatbots
  • Question-Answering System
  • Machine Translation
  • Text summarization
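
To make the preprocessing and weighting concepts above concrete, here is a minimal, self-contained Python sketch (the toy corpus and helper names are illustrative and not taken from any library listed on this page) that tokenizes a few documents, builds Bag-of-Words counts, and computes TF-IDF weights using one common smoothed-IDF convention.

    import math
    import re
    from collections import Counter

    # Toy corpus (illustrative only)
    docs = [
        "The cat sat on the mat.",
        "The dog sat on the log.",
        "Cats and dogs are popular pets.",
    ]

    def tokenize(text):
        # Very simple tokenization: lowercase and keep alphabetic runs
        # (no stemming or lemmatisation).
        return re.findall(r"[a-z]+", text.lower())

    # Bag-of-Words: raw term counts per document
    bow = [Counter(tokenize(doc)) for doc in docs]

    # Document frequency: in how many documents each term occurs
    df = Counter()
    for counts in bow:
        df.update(counts.keys())

    n_docs = len(docs)

    def tf_idf(term, counts):
        # Term frequency normalised by document length,
        # times a smoothed inverse document frequency.
        tf = counts[term] / sum(counts.values())
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        return tf * idf

    for i, counts in enumerate(bow):
        weights = {term: round(tf_idf(term, counts), 3) for term in counts}
        print(f"doc {i}: {weights}")

Frequent function words such as "the" receive low weights because they occur in nearly every document, while rarer content words are weighted up; a real pipeline would typically add stemming or lemmatisation before counting.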

Online Courses

Video Lectures

Lecture Notes

  • Advanced Natural Language Processing by Michael Collins and Regina Barzilay - MIT OpenCourseWare (https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-864-advanced-natural-language-processing-fall-2005/index.htm)
  • Automatic Speech Recognition by James Glass and Victor Zue - MIT OpenCourseWare (https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-345-automatic-speech-recognition-spring-2003/)
  • Speech Recognition by Bhuvana Ramabhadran, Markus Nussbaum-Thom, Michael Picheny and Stanley F. Chen (http://www.ee.columbia.edu/~stanchen/spring16/e6870/outline.html)
  • Natural Language Processing with NLTK (https://sites.google.com/site/gothnlp/links/lecture-notes)
  • CS224U: Natural Language Understanding by Bill MacCartney and Christopher Potts (http://web.stanford.edu/class/cs224u/)

Books

Natural Language Processing

  • Arumugam, R., & Shanmugamani, R. (2018). Hands-On Natural Language Processing with Python. Packt Publishing.
  • Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics. Packt Publishing.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers.
  • Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
  • Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress. (https://github.com/dipanjanS/text-analytics-with-python)
  • Mihalcea, R., & Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. (http://www.nltk.org/book/)
  • Zhai, C. (2008). Statistical language models for information retrieval. Morgan and Claypool Publishers.
  • Tiwary, U. S., & Siddiqui, T. (2008). Natural language processing and information retrieval. Oxford University Press.
  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Speech Processing

  • Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing. Pearson.
  • Gold, B., Morgan, N., & Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music. 2nd Ed. John Wiley & Sons.
  • Mitra, S. K., & Kuo, Y. (2010). Digital signal processing: a computer-based approach. 4th Ed. McGraw-Hill.
  • Spanias, A., Painter, T., & Atti, V. (2006). Audio signal processing and coding. John Wiley & Sons.
  • Huang, X., Acero, A., Hon, H. W., & Reddy, R. (2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall.
  • Quatieri, T. F. (2001). Discrete-time speech signal processing: principles and practice. Pearson.
  • Holmes, J., & Holmes, W. (2001). Speech Synthesis and Recognition. 2nd Ed. CRC press.
  • Jelinek, F. (1998). Statistical methods for speech recognition. MIT Press.
  • Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall.

Speech and Natural Language Processing (both)

  • Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. (https://web.stanford.edu/~jurafsky/slp3/)

Scholarly articles

  • Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv preprint arXiv:1904.08779.
  • Das, A., Li, J., Zhao, R., & Gong, Y. (2018). Advancing connectionist temporal classification with attention modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4769-4773).
  • Moriya, T., Ueno, S., Shinohara, Y., Delcroix, M., Yamaguchi, Y., & Aono, Y. (2018). Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. In Proceedings of Interspeech (pp. 2399-2403).
  • Zeyer, A., Irie, K., Schlüter, R., & Ney, H. (2018). Improved Training of End-to-end Attention Models for Speech Recognition. In Proceedings of Interspeech (pp. 7-11).
  • Toshniwal, S., Tang, H., Lu, L., & Livescu, K. (2017). Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. In Proceedings of Interspeech (pp. 3532-3536).
  • Kim, S., Hori, T., & Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4835-4839).
  • Wen, T. H., Gasic, M., Mrksic, N., Su, P. H., Vandyke, D., & Young, S. (2015). Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. arXiv preprint arXiv:1508.01745.
  • Sethu, V., Epps, J., & Ambikairajah, E. (2015). Speech based emotion recognition. In Speech and Audio Processing for Coding, Enhancement and Recognition (pp. 197-228). Springer.
  • Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.

Tutorials

Software

Speech Recognition

  • OpenSeq2Seq (https://nvidia.github.io/OpenSeq2Seq/) - Python with TensorFlow
  • DeepSpeech (https://github.com/mozilla/DeepSpeech) - Python with TensorFlow
  • SpeechRecognition (https://pypi.org/project/SpeechRecognition/) - Python library for performing speech recognition, with support for several engines and APIs, online and offline (see the sketch after this list)
  • Kaldi (http://kaldi-asr.org/) - C++
  • CMUSphinx (https://cmusphinx.github.io/) - Open Source Speech Recognition Toolkit
  • wav2letter (https://github.com/facebookresearch/wav2letter) - C++ (and previously in Lua)
  • Web Speech API (Mozilla) (https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API) - Speech Synthesis (Text-to-Speech) and Speech Recognition (Asynchronous Speech Recognition)
  • W3C Web Speech API (https://w3c.github.io/speech-api/speechapi.html) - JavaScript
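
As a usage illustration for the SpeechRecognition package listed above, here is a minimal sketch; the WAV filename is a placeholder, and the Google Web Speech engine requires network access (an offline engine such as CMU Sphinx can be swapped in if pocketsphinx is installed).

    # Minimal sketch using the SpeechRecognition package (pip install SpeechRecognition).
    # "example.wav" is a placeholder filename, not a file shipped with the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    with sr.AudioFile("example.wav") as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData object

    try:
        # Online engine (Google Web Speech API); recognizer.recognize_sphinx(audio)
        # is an offline alternative that requires pocketsphinx.
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Speech was unintelligible")
    except sr.RequestError as error:
        print(f"Could not reach the recognition service: {error}")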

Miscellaneous

  • OpenSeq2Seq (https://nvidia.github.io/OpenSeq2Seq/) - Python with TensorFlow
  • AllenNLP (https://allennlp.org/) - Python with PyTorch
  • NLTK (http://www.nltk.org/) - Python
  • gensim (https://radimrehurek.com/gensim/) - Python (see the sketch after this list)
  • nlpbuddy (https://github.com/eellak/nlpbuddy) - Python
  • MALLET (http://mallet.cs.umass.edu/) - Java
  • magnitude (https://github.com/plasticityai/magnitude) - Python
  • gTTS (Google Text-to-Speech) (https://pypi.org/project/gTTS/) - Python library and CLI tool to interface with Google Translate's text-to-speech API
  • SPEAR: A Speaker Recognition Toolkit based on Bob (https://www.idiap.ch/software/bob/docs/bob/bob.bio.spear/stable/index.html#bob-bio-spear) - Python
  • SIDEKIT (https://pypi.org/project/SIDEKIT/) - Python library for Speaker, Language Recognition and Diarization
  • OpenNMT (http://opennmt.net/) - Implemented in LuaTorch (https://github.com/OpenNMT/OpenNMT), PyTorch (https://github.com/OpenNMT/OpenNMT-py) and TensorFlow (https://github.com/OpenNMT/OpenNMT-tf)
  • HTK (http://htk.eng.cam.ac.uk/) - C
  • ffmpeg-python (https://github.com/kkroening/ffmpeg-python) - Python
  • pysox (https://github.com/rabitt/pysox) - Python
  • wave (https://docs.python.org/3/library/wave.html) - Python
  • TextBlob (https://textblob.readthedocs.io/) - Python
  • ParlAI (http://www.parl.ai/)
  • Moses (http://www.statmt.org/moses/)
  • sockeye (https://awslabs.github.io/sockeye/) - Python
  • T2T or Tensor2Tensor (https://github.com/tensorflow/tensor2tensor) - Python with TensorFlow
  • marian (https://github.com/marian-nmt/marian) - C++
  • SRILM (http://www.speech.sri.com/projects/srilm/) - C++
  • srilm-python (https://srilm-python.readthedocs.io/) - Python
  • LibROSA (https://librosa.github.io/librosa/index.html) - Python
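
To tie the gensim entry above back to the Word2Vec, Continuous Skip-gram and CBOW items under Subfields and Concepts, here is a minimal sketch on a toy corpus; it assumes gensim 4.x, where the embedding dimensionality parameter is vector_size (older releases called it size).

    from gensim.models import Word2Vec

    # Toy pre-tokenized corpus (illustrative only; real corpora need proper preprocessing)
    sentences = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "log"],
        ["cats", "and", "dogs", "are", "pets"],
    ]

    # sg=1 trains with the continuous skip-gram objective; sg=0 (the default) uses CBOW.
    skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
    cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

    print(skipgram.wv["cat"][:5])               # first few dimensions of the learned vector
    print(cbow.wv.most_similar("cat", topn=3))  # nearest neighbours by cosine similarity

On a corpus this small the vectors are essentially noise; the point is only the API shape and the sg switch between the two training objectives.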

See also

Other Resources

  • Computational Linguistics (https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_computationallinguistics) - Google Scholar Metrics (Top Publications)
  • SQuAD (https://rajpurkar.github.io/SQuAD-explorer/) - The Stanford Question Answering Dataset
  • Awesome-NLP (https://github.com/keon/awesome-nlp) - GitHub
  • seq2seq_learn (https://github.com/talolard/seq2seq_learn) - GitHub
  • Self-Attention Mechanisms in Natural Language Processing (https://medium.com/@Alibaba_Cloud/self-attention-mechanisms-in-natural-language-processing-9f28315ff905) - blog post
  • nmt (https://github.com/tensorflow/nmt) - code (GitHub)
  • Frontiers in Natural Language Processing (https://drive.google.com/file/d/15ehMIJ7wY9A7RSmyJPNmrBMuC7se0PMP/view)
  • Frontiers in Natural Language Processing Expert Responses (https://docs.google.com/document/d/18NoNdArdzDLJFQGBMVMsQ-iLOowP1XXDaSVRmYN0IyM/)
  • Building a Conversational Chatbot for Slack using Rasa and Python - Part 1 (https://towardsdatascience.com/building-a-conversational-chatbot-for-slack-using-rasa-and-python-part-1-bca5cc75d32f), Part 2 (https://towardsdatascience.com/building-a-conversational-chatbot-for-slack-using-rasa-and-python-part-2-ce7233f2e9e7)
  • CMU-MOSI (https://github.com/A2Zadeh/CMU-MultimodalSDK) - dataset
  • MOUD (http://web.eecs.umich.edu/~mihalcea/downloads.html#MOUD) - dataset

Category: Machine Learning