Natural Language Processing

From Ioannis Kourouklides

This page contains resources about Natural Language Processing, Text Mining, Speech Processing, Audio Signal Processing and Computational Linguistics.

Subfields and Concepts

  • Vector Space Model (VSM)
  • Latent Semantic Indexing
  • Latent Semantic Analysis
  • Latent Dirichlet Allocation (LDA)
  • Part-of-speech tagging
  • Sequence-to-Sequence (seq2seq) Model
  • Dynamic Memory Network (a specific architecture of Artificial Neural Networks)
  • Attention Mechanism
    • RNN with attention
    • Transformer (i.e. self-attention and feed-forward neural network layers)
    • Set Transformer
  • Speaker Recognition
  • Speaker Verification
  • Speaker Identification / Speaker Diarization
    • Speaker Segmentation
    • Speaker Clustering
  • Speech Synthesis / Text-to-Speech
  • Speech Recognition / Voice Recognition / Speech-to-Text / Transcription
    • Conversational Speech
    • Voice Dictation
    • Voice Commands
  • Audio Captioning / Subtitling
  • Automatic Lyrics Recognition
  • Topic Model
  • Text Preprocessing
    • Tokenization
    • Stemming
    • Lemmatization
    • Word embeddings / Feature vectors / Word representations
      • Sparse feature vectors
      • Word2Vec Model
        • Continuous Skip-gram
        • Continuous Bag-of-Words (CBOW)
      • GloVe
      • FastText
    • Bag-of-Words (BoW) Model
    • N-grams
      • Unigrams
      • Bigrams
  • Term Frequency - Inverse Document Frequency (TF-IDF)
  • Sequence Tagging
  • Natural Language Understanding (NLU)
  • Natural Language Generation (NLG)
  • Named-Entity Recognition (NER)
  • (Natural Language) Semantic Analysis
  • Sentiment Analysis
  • Emotion Recognition
  • Diacritization (e.g. in Hebrew or Arabic)
  • Dialogue System / Conversational Agents
    • Task-Oriented Dialogue System / Goal-Oriented Conversational Agent (usually built for speech input and output)
      • Pipeline Systems
        • Natural language understanding (NLU)
        • Dialogue state tracking
        • Dialogue policy learning
        • Natural language generation (NLG)
      • End-to-End trainable Systems
    • Non-Task-Oriented Dialogue System / Chatbot (in the strict sense) / Question-Answering (QA) System
      • Rule-based QA
      • ML-based QA / Corpus-based QA
        • Retrieval-based models (using Utterance selection)
        • Generative models
  • Visual Question-Answering (VQA)
  • Question Generation
  • Machine Translation (MT)
  • Text summarization
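
Several of the text-preprocessing concepts listed above (tokenization, n-grams, the Bag-of-Words model and TF-IDF) can be illustrated with a minimal sketch in plain Python. This is not taken from any of the resources on this page. The function names, the regular-expression tokenizer, the toy corpus and the particular TF-IDF variant (relative term frequency times log(N / df), with no smoothing) are all assumptions chosen for illustration, and libraries generally use somewhat different weighting schemes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Map each token to its raw count, discarding word order."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy corpus, for illustration only.
docs = ["the cat sat on the mat", "the dog sat on the log"]
tokens = tokenize(docs[0])
bigrams = ngrams(tokens, 2)
weights = tf_idf(docs)
```

In this variant, a term that occurs in every document (such as "the" in the toy corpus) receives a weight of zero, while terms specific to one document receive a positive weight.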

Online Courses

Video Lectures

Lecture Notes


Natural Language Processing

  • Arumugam, R., & Shanmugamani, R. (2018). Hands-On Natural Language Processing with Python. Packt Publishing.
  • Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics. Packt Publishing.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers.
  • Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
  • Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress. (link)
  • Mihalcea, R., & Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. (link)
  • Zhai, C. (2008). Statistical language models for information retrieval. Morgan and Claypool Publishers.
  • Tiwary, U. S., & Siddiqui, T. (2008). Natural language processing and information retrieval. Oxford University Press.
  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Speech Processing

  • Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing. Pearson.
  • Gold, B., Morgan, N., & Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music. 2nd Ed. John Wiley & Sons.
  • Mitra, S. K., & Kuo, Y. (2010). Digital signal processing: a computer-based approach. 4th Ed. McGraw-Hill.
  • Spanias, A., Painter, T., & Atti, V. (2006). Audio signal processing and coding. John Wiley & Sons.
  • Huang, X., Acero, A., Hon, H. W., & Reddy, R. (2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall.
  • Quatieri, T. F. (2001). Discrete-time speech signal processing: principles and practice. Pearson.
  • Holmes, J., & Holmes, W. (2001). Speech Synthesis and Recognition. 2nd Ed. CRC press.
  • Jelinek, F. (1998). Statistical methods for speech recognition. MIT Press.
  • Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall.

Speech and Natural Language Processing (both)

  • Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. (link)

Scholarly articles

  • Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv preprint arXiv:1904.08779.
  • Deriu, J., Rodrigo, A., Otegi, A., Echegoyen, G., Rosset, S., Agirre, E., & Cieliebak, M. (2019). Survey on Evaluation Methods for Dialogue Systems. arXiv preprint arXiv:1905.04071.
  • Das, A., Li, J., Zhao, R., & Gong, Y. (2018). Advancing connectionist temporal classification with attention modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4769-4773).
  • Moriya, T., Ueno, S., Shinohara, Y., Delcroix, M., Yamaguchi, Y., & Aono, Y. (2018). Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. In Proceedings of Interspeech (pp. 2399-2403).
  • Zeyer, A., Irie, K., Schlüter, R., & Ney, H. (2018). Improved Training of End-to-end Attention Models for Speech Recognition. In Proceedings of Interspeech (pp. 7-11).
  • Toshniwal, S., Tang, H., Lu, L., & Livescu, K. (2017). Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. In Proceedings of Interspeech (pp. 3532-3536).
  • Kim, S., Hori, T., & Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4835-4839).
  • Chen, H., Liu, X., Yin, D., & Tang, J. (2017). A survey on dialogue systems: Recent advances and new frontiers. ACM SIGKDD Explorations Newsletter, 19(2), 25-35.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
  • Mishra, A., & Jain, S. K. (2016). A survey on question answering systems with classification. Journal of King Saud University-Computer and Information Sciences, 28(3), 345-361.
  • Yin, J., Jiang, X., Lu, Z., Shang, L., Li, H., & Li, X. (2016). Neural generative question answering. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (pp. 2972-2978).
  • Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1412-1421).
  • Wen, T. H., Gasic, M., Mrksic, N., Su, P. H., Vandyke, D., & Young, S. (2015). Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. arXiv preprint arXiv:1508.01745.
  • Sethu, V., Epps, J., & Ambikairajah, E. (2015). Speech based emotion recognition. In Speech and Audio Processing for Coding, Enhancement and Recognition (pp. 197-228). Springer.
  • Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.



Speech Recognition

Text-to-Speech (TTS) Synthesis


See also

Other Resources