Natural Language Processing

This page contains resources about Natural Language Processing, Text Mining, Speech Processing, Audio Signal Processing and Computational Linguistics.

Subfields and Concepts

  • Vector Space Model (VSM)
  • Latent Semantic Indexing
  • Latent Semantic Analysis
  • Latent Dirichlet Allocation (LDA)
  • Part-of-speech tagging
  • Sequence-to-Sequence (seq2seq) Model
  • Dynamic Memory Network (a specific architecture of Artificial Neural Networks)
  • Attention Mechanism
    • RNN with attention
    • Transformer (i.e. self-attention & FNN)
    • Set Transformer
  • Speaker Recognition
  • Speaker Verification
  • Speaker Identification / Speaker Diarization
    • Speaker Segmentation
    • Speaker Clustering
  • Speech Synthesis / Text-to-Speech
  • Speech Recognition / Voice Recognition / Speech-to-Text / Transcription
    • Conversational Speech
    • Voice Dictation
    • Voice Commands
  • Audio Captioning / Subtitling
  • Automatic Lyrics Recognition
  • Topic Model
  • Text Preprocessing
    • Tokenization
    • Stemming
    • Lemmatization
    • Word embeddings / Feature vectors / Word representations
      • Sparse feature vectors
      • Word2Vec Model
        • Continuous Skip-gram
        • Continuous Bag-of-Words (CBOW)
      • GloVe
      • FastText
    • Bag-of-Words (BoW) Model
    • N-grams
      • Unigrams
      • Bigrams
  • Term Frequency - Inverse Document Frequency (TF-IDF)
  • Sequence Tagging
  • Natural Language Understanding (NLU)
  • Natural Language Generation (NLG)
  • Named-Entity Recognition (NER)
  • (Natural Language) Semantic Analysis
  • Sentiment Analysis
  • Emotion Recognition
  • Diacritization (e.g. in Hebrew or Arabic)
  • Dialogue System / Conversational Agents
    • Task-Oriented Dialogue System / Goal-Oriented Conversational Agent (usually built for speech input and output)
      • Pipeline Systems
        • Natural Language Understanding (NLU)
        • Dialogue state tracking
        • Dialogue policy learning
        • Natural Language Generation (NLG)
      • End-to-End trainable Systems
    • Non-Task-Oriented Dialogue System / Chatbot (in the strict sense) / Question-Answering (QA) System
      • Rule-based QA
      • ML-based QA / Corpus-based QA
        • Retrieval-based models (using Utterance selection)
        • Generative models
  • Visual Question-Answering (VQA)
  • Question Generation
  • Machine Translation (MT)
  • Text Summarization
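
Several of the text-representation concepts listed above (Tokenization, N-grams, TF-IDF) can be illustrated with a short standard-library Python sketch. It is a minimal illustration, not a reference implementation: the function names tokenize, ngrams and tf_idf and the two sample sentences are invented for this example, the tokenizer is deliberately naive (lowercase and split on whitespace), and the weighting uses length-normalized term frequency with idf = log(N / df), which is only one of several common TF-IDF variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token sequence as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat on the log"]
print(ngrams(tokenize(docs[0]), 2))  # bigrams of the first document
print(tf_idf(docs)[0])               # TF-IDF weights of the first document
```

Under this variant, a term that occurs in every document (such as "the" in the toy corpus) receives weight zero, while terms unique to one document receive positive weight.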

Online Courses

Video Lectures

Lecture Notes


Natural Language Processing

  • Arumugam, R., & Shanmugamani, R. (2018). Hands-On Natural Language Processing with Python. Packt Publishing.
  • Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics. Packt Publishing.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers.
  • Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
  • Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress.
  • Mihalcea, R., & Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.
  • Zhai, C. (2008). Statistical language models for information retrieval. Morgan and Claypool Publishers.
  • Tiwary, U. S., & Siddiqui, T. (2008). Natural language processing and information retrieval. Oxford University Press.
  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Speech Processing

  • Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing. Pearson.
  • Gold, B., Morgan, N., & Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music. 2nd Ed. John Wiley & Sons.
  • Mitra, S. K., & Kuo, Y. (2010). Digital signal processing: a computer-based approach. 4th Ed. McGraw-Hill.
  • Spanias, A., Painter, T., & Atti, V. (2006). Audio signal processing and coding. John Wiley & Sons.
  • Huang, X., Acero, A., Hon, H. W., & Reddy, R. (2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall.
  • Quatieri, T. F. (2001). Discrete-time speech signal processing: principles and practice. Pearson.
  • Holmes, J., & Holmes, W. (2001). Speech Synthesis and Recognition. 2nd Ed. CRC press.
  • Jelinek, F. (1998). Statistical methods for speech recognition. MIT Press.
  • Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall.

Speech and Natural Language Processing (both)

  • Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall.

Scholarly articles

  • Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv preprint arXiv:1904.08779.
  • Deriu, J., Rodrigo, A., Otegi, A., Echegoyen, G., Rosset, S., Agirre, E., & Cieliebak, M. (2019). Survey on Evaluation Methods for Dialogue Systems. arXiv preprint arXiv:1905.04071.
  • Das, A., Li, J., Zhao, R., & Gong, Y. (2018). Advancing connectionist temporal classification with attention modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4769-4773).
  • Moriya, T., Ueno, S., Shinohara, Y., Delcroix, M., Yamaguchi, Y., & Aono, Y. (2018). Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. In Proceedings of Interspeech (pp. 2399-2403).
  • Zeyer, A., Irie, K., Schlüter, R., & Ney, H. (2018). Improved Training of End-to-end Attention Models for Speech Recognition. In Proceedings of Interspeech (pp. 7-11).
  • Toshniwal, S., Tang, H., Lu, L., & Livescu, K. (2017). Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. In Proceedings of Interspeech (pp. 3532-3536).
  • Kim, S., Hori, T., & Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4835-4839).
  • Chen, H., Liu, X., Yin, D., & Tang, J. (2017). A survey on dialogue systems: Recent advances and new frontiers. ACM SIGKDD Explorations Newsletter, 19(2), 25-35.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
  • Mishra, A., & Jain, S. K. (2016). A survey on question answering systems with classification. Journal of King Saud University-Computer and Information Sciences, 28(3), 345-361.
  • Yin, J., Jiang, X., Lu, Z., Shang, L., Li, H., & Li, X. (2016). Neural generative question answering. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (pp. 2972-2978).
  • Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1412-1421).
  • Wen, T. H., Gasic, M., Mrksic, N., Su, P. H., Vandyke, D., & Young, S. (2015). Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. arXiv preprint arXiv:1508.01745.
  • Sethu, V., Epps, J., & Ambikairajah, E. (2015). Speech based emotion recognition. In Speech and Audio Processing for Coding, Enhancement and Recognition (pp. 197-228). Springer.
  • Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.



Speech Recognition

Text-to-Speech (TTS) Synthesis


See also

Other Resources