Natural Language Processing

This page contains resources about Natural Language Processing, Text Mining, Speech Processing, Audio Signal Processing and Computational Linguistics.

Subfields and Concepts

 * Vector Space Model (VSM)
 * Latent Semantic Indexing
 * Latent Semantic Analysis
 * Latent Dirichlet Allocation (LDA)
 * Part-of-speech tagging
 * Sequence-to-Sequence (seq2seq) Model
 * Dynamic Memory Network (a specific architecture of Artificial Neural Networks)
 * Attention Mechanism
 * RNN with attention
 * Transformer (i.e. self-attention & feed-forward network (FFN))
 * Set Transformer
 * Speaker Recognition
 * Speaker Verification
 * Speaker Identification / Speaker Diarization
 * Speaker Segmentation
 * Speaker Clustering
 * Speech Synthesis / Text-to-Speech
 * Speech Recognition / Voice Recognition / Speech-to-Text / Transcription
 * Conversational Speech
 * Voice Dictation
 * Voice Commands
 * Audio Captioning / Subtitling
 * Automatic Lyrics Recognition
 * Topic Model
 * Text Preprocessing
 * Tokenization
 * Stemming
 * Lemmatisation
 * Word embeddings / Feature vectors / Word representations
 * Sparse feature vectors
 * Word2Vec Model
 * Continuous Skip-gram
 * Continuous Bag-of-Words (CBOW)
 * GloVe
 * FastText
 * Bag-of-Words (BoW) Model
 * N-grams
 * Unigrams
 * Bigrams
 * Term Frequency - Inverse Document Frequency (TF-IDF)
 * Sequence Tagging
 * Natural Language Understanding (NLU)
 * Natural Language Generation (NLG)
 * Named-Entity Recognition (NER)
 * (Natural Language) Semantic Analysis
 * Sentiment Analysis
 * Emotion Recognition
 * Diacritization (e.g. in Hebrew or Arabic)
 * Dialogue System / Conversational Agents
 * Task-Oriented Dialogue System / Goal-Oriented Conversational Agent (usually built for speech input and output)
 * Pipeline Systems
 * Natural language understanding (NLU)
 * Dialogue state tracking
 * Dialogue policy learning
 * Natural language generation (NLG)
 * End-to-End trainable Systems
 * Non-Task-Oriented Dialogue System / Chatbot (in the strict sense) / Question-Answering (QA) System
 * Rule-based QA
 * ML-based QA / Corpus-based QA
 * Retrieval-based models (using Utterance selection)
 * Generative models
 * Visual Question-Answering (VQA)
 * Question Generation
 * Machine Translation (MT)
 * Text summarization
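
Several of the text-representation concepts listed above (tokenization, N-grams, the Bag-of-Words model and TF-IDF) can be illustrated together in a few lines. The following is only a minimal sketch and is not taken from any resource on this page. It assumes naive whitespace tokenization and the plain, unsmoothed weighting tf × log(N / df). The function names (tokenize, ngrams, tf_idf) and the toy corpus are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def ngrams(tokens, n):
    # Contiguous windows of n tokens: n=1 gives unigrams, n=2 gives bigrams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    # docs is a list of token lists; returns one {term: weight} dict per document.
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical), used only to exercise the functions above.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [tokenize(d) for d in corpus]
bigrams = ngrams(tokenized[0], 2)
scores = tf_idf(tokenized)
```

In this toy corpus, "cat" occurs in a single document and therefore receives a higher weight in the first document than "the", which occurs in two of the three. Production libraries usually add smoothing and normalization on top of this basic scheme, so their numbers will differ.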

Video Lectures

 * Natural Language Processing with Deep Learning by Chris Manning and Richard Socher
 * Natural Language Processing by Pushpak Bhattacharyya - NPTEL
 * Natural Language Processing by Dragomir Radev
 * Text Mining and Analytics by ChengXiang ("Cheng") Zhai
 * Deep Learning for Natural Language Processing: Applications of Deep Neural Networks to Machine Learning Tasks by Jon Krohn
 * Natural Language Processing - Coursera
 * Neural Networks for NLP by Graham Neubig (slides)

Lecture Notes

 * Advanced Language Processing by Michael Collins and Regina Barzilay - MIT OpenCourseWare
 * Automatic Speech Recognition by James Glass and Victor Zue - MIT OpenCourseWare
 * Speech Recognition by Bhuvana Ramabhadran, Markus Nussbaum-Thom, Michael Picheny and Stanley F. Chen
 * Natural Language Processing with NLTK
 * CS224U: Natural Language Understanding by Bill MacCartney and Christopher Potts

Natural Language Processing

 * Arumugam, R., & Shanmugamani, R. (2018). Hands-On Natural Language Processing with Python. Packt Publishing.
 * Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics. Packt Publishing.
 * Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers.
 * Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O'Reilly Media, Inc.
 * Sarkar, D. (2016). Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress. (link)
 * Mihalcea, R., & Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.
 * Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. (link)
 * Zhai, C. (2008). Statistical language models for information retrieval. Morgan and Claypool Publishers.
 * Tiwary, U. S., & Siddiqui, T. (2008). Natural language processing and information retrieval. Oxford University Press.
 * Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Speech Processing

 * Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing. Pearson.
 * Gold, B., Morgan, N., & Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music. 2nd Ed. John Wiley & Sons.
 * Mitra, S. K., & Kuo, Y. (2010). Digital signal processing: a computer-based approach. 4th Ed. McGraw-Hill.
 * Spanias, A., Painter, T., & Atti, V. (2006). Audio signal processing and coding. John Wiley & Sons.
 * Huang, X., Acero, A., Hon, H. W., & Reddy, R. (2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall.
 * Quatieri, T. F. (2001). Discrete-time speech signal processing: principles and practice. Pearson.
 * Holmes, J., & Holmes, W. (2001). Speech Synthesis and Recognition. 2nd Ed. CRC Press.
 * Jelinek, F. (1998). Statistical methods for speech recognition. MIT Press.
 * Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall.

Speech and Natural Language Processing (both)

 * Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall. (link)

Scholarly articles

 * Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv preprint arXiv:1904.08779.
 * Deriu, J., Rodrigo, A., Otegi, A., Echegoyen, G., Rosset, S., Agirre, E., & Cieliebak, M. (2019). Survey on Evaluation Methods for Dialogue Systems. arXiv preprint arXiv:1905.04071.
 * Das, A., Li, J., Zhao, R., & Gong, Y. (2018). Advancing connectionist temporal classification with attention modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4769-4773).
 * Moriya, T., Ueno, S., Shinohara, Y., Delcroix, M., Yamaguchi, Y., & Aono, Y. (2018). Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. In Proceedings of Interspeech (pp. 2399-2403).
 * Zeyer, A., Irie, K., Schlüter, R., & Ney, H. (2018). Improved Training of End-to-end Attention Models for Speech Recognition. In Proceedings of Interspeech (pp. 7-11).
 * Toshniwal, S., Tang, H., Lu, L., & Livescu, K. (2017). Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. In Proceedings of Interspeech (pp. 3532-3536).
 * Kim, S., Hori, T., & Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4835-4839).
 * Chen, H., Liu, X., Yin, D., & Tang, J. (2017). A survey on dialogue systems: Recent advances and new frontiers. ACM SIGKDD Explorations Newsletter, 19(2), 25-35.
 * Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
 * Mishra, A., & Jain, S. K. (2016). A survey on question answering systems with classification. Journal of King Saud University-Computer and Information Sciences, 28(3), 345-361.
 * Yin, J., Jiang, X., Lu, Z., Shang, L., Li, H., & Li, X. (2016). Neural generative question answering. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (pp. 2972-2978).
 * Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing (pp. 1412–1421).
 * Wen, T. H., Gasic, M., Mrksic, N., Su, P. H., Vandyke, D., & Young, S. (2015). Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. arXiv preprint arXiv:1508.01745.
 * Sethu, V., Epps, J., & Ambikairajah, E. (2015). Speech based emotion recognition. In Speech and Audio Processing for Coding, Enhancement and Recognition (pp. 197-228). Springer.
 * Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.

Tutorials

 * Trying to Understand Recurrent Neural Networks for Language Processing by Yoav Goldberg

Speech Recognition

 * OpenSeq2Seq - Python with TensorFlow
 * DeepSpeech - Python with TensorFlow
 * SpeechRecognition - Python library for performing speech recognition, with support for several engines and APIs, online and offline
 * Kaldi - C++
 * CMUSphinx - Open Source Speech Recognition Toolkit
 * DeepSpeech - Python
 * wav2letter - C++ (and previously in Lua)
 * Web Speech API (Mozilla) - Speech Synthesis (Text-to-Speech) and Speech Recognition (Asynchronous Speech Recognition)
 * W3C Web Speech API - JavaScript
 * wit.ai - free API using NLP for text and voice
 * Google Cloud Speech-to-Text API - paid API
 * Microsoft Bing Speech-to-Text API - paid API
 * IBM Watson Speech to Text - paid API
 * AssemblyAI - freeware API

Text-to-Speech (TTS) Synthesis

 * gTTS (Google Text-to-Speech) - Python library and CLI tool to interface with Google Translate's text-to-speech API
 * tacotron - Python and TensorFlow TTS
 * Tacotron-2 - Python and TensorFlow TTS
 * tacotron2 - Python and PyTorch TTS
 * multi-speaker-tacotron - Python and TensorFlow TTS
 * nv-wavenet - Python and C++ TTS
 * WaveGlow - Python and PyTorch TTS
 * tensorflow-wavenet - Python and TensorFlow TTS
 * wavenet_vocoder - Python and PyTorch TTS
 * The Festival Speech Synthesis System
 * CMU Flite

Miscellaneous

 * OpenSeq2Seq - Python with TensorFlow
 * AllenNLP - Python with PyTorch
 * NLTK - Python
 * gensim - Python
 * spaCy - Python
 * nlpbuddy - Python
 * MALLET - Java
 * magnitude - Python
 * SPEAR: A Speaker Recognition Toolkit based on Bob - Python
 * SIDEKIT - Python library for Speaker, Language Recognition and Diarization
 * OpenNMT - Implemented in LuaTorch, PyTorch and TensorFlow
 * HTK - C
 * ffmpeg-python - Python
 * SoX - command line utility
 * pysox - Python
 * wave - Python
 * TextBlob - Python
 * ParlAI
 * Moses
 * sockeye - Python
 * T2T or Tensor2Tensor - Python with TensorFlow
 * marian - C++
 * SRILM - C++
 * srilm-python - Python
 * LibROSA - Python
 * StarSpace - C++ and Python

Other Resources

 * Computational Linguistics - Google Scholar Metrics (Top Publications)
 * SQuAD - The Stanford Question Answering Dataset
 * Awesome-NLP - GitHub
 * seq2seq_learn - GitHub
 * Word Embeddings: A Natural Language Processing Crash Course
 * Complete Guide to Topic Modeling - NLP for hackers
 * Quick Tutorial: Topic Modelling with LDA
 * Smart Compose: Using Neural Networks to Help Write Emails - blog post
 * Wit-Speech-API-Wrapper - GitHub
 * python-speech-recognition - GitHub
 * A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text - blog post
 * The Ultimate Guide To Speech Recognition With Python - blog post
 * Practical seq2seq - blog post
 * Chatbots with Seq2Seq - blog post
 * Pre-Processing in Natural Language Machine Learning - blog post
 * How to Prepare Text Data for Machine Learning with scikit-learn - blog post
 * Word2Vec and FastText Word Embedding with Gensim - blog post
 * Introduction to NLP - Part 1: Overview - blog post
 * Digital signal processing through speech, hearing, and Python - presentation
 * MATLAB Functionality for Digital Speech Processing - presentation
 * attract-repel - GitHub
 * speaker-recognition - GitHub
 * Microsoft Speaker Recognition API: Python Sample - GitHub
 * Speaker-Recognition - GitHub
 * speaker-diarization - GitHub
 * VBDiarization - GitHub
 * pyAudioAnalysis - GitHub
 * seq2seq - GitHub
 * Automatic_Speech_Recognition - GitHub
 * neuralmonkey - GitHub
 * nlpnet - GitHub
 * nematus - GitHub
 * speech-processing - GitHub
 * Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks - blog post
 * Word morphing - blog post
 * Tutorial: Asynchronous Speech Recognition in Python - blog post
 * Web Speech API Specification - W3C Community Group
 * Speech recognition with wit.ai - blog post
 * LibriSpeech ASR corpus - dataset
 * NLP For Topic Modeling & Summarization Of Legal Documents - blog post
 * Automatic Speech Recognition (YouTube) - video
 * You Shall Not Speak - blog post
 * Simple Audio Recognition (TensorFlow) - blog post
 * open-speech-recording (GitHub) - code
 * Introduction to NLP - blog post
 * Deep Learning for Speech Recognition (YouTube) - video
 * Web Speech API (GitHub) - code
 * Web Speech API: Add Speech to your Website
 * Voice Driven Web Apps: Introduction to the Web Speech API - blog post
 * Launching the Speech Commands Dataset - blog post
 * Improving End-to-End Models For Speech Recognition - blog post
 * Sentiment Analysis on Reddit News Headlines with Python’s Natural Language Toolkit (NLTK) - blog post with code
 * Predicting Reddit News Sentiment with Naive Bayes and Other Text Classifiers - blog post with code
 * Attention in NLP - blog post
 * Building a Question-Answering System from Scratch— Part 1 - blog post
 * Han Xiao - blog
 * text-top-model (GitHub) - code for benchmarking text classification algorithms
 * Text Analytics Techniques - blog
 * Sentiment-Analysis-Twitter (GitHub) - code
 * machine-translation (GitHub) - code
 * Datasets on Linguistics (Kaggle)
 * NLP-progress - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks
 * Attention-Based-BiLSTM-relation-extraction (GitHub) - code
 * bert (GitHub) - code
 * dialogflow - chatbot (by Google)
 * SCOOT: Speech Communication Online Training - ISCA’s guide to online training resources in Speech Communication
 * uis-rnn (GitHub) - code
 * nlptutorial (GitHub) - code
 * transformer (GitHub) - code
 * transformer-tensorflow (GitHub) - code
 * Creating an open speech recognition dataset for (almost) any language - blog post
 * How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning - blog post
 * How To Create Data Products That Are Magical Using Sequence-to-Sequence Models - blog post
 * Self-Attention Mechanisms in Natural Language Processing - blog post
 * nmt (GitHub) - code
 * Frontiers in Natural Language Processing
 * Frontiers in Natural Language Processing Expert Responses
 * Building a Conversational Chatbot for Slack using Rasa and Python (Part 1, Part 2)
 * CMU-MOSI - dataset
 * MOUD - dataset
 * nlp_tasks (GitHub) - NLP Tasks and References
 * 3 silver bullets of word embeddings in NLP - blog post
 * seq2seq-chatbot (GitHub) - code
 * Evaluating Text Output in NLP: BLEU at your own risk - blog post
 * Introduction to Transformers Architecture - blog post
 * word-embeddings-for-nmt (GitHub) - code
 * How Do Task-Oriented Dialogue Systems Work and What Benefits They Bring for Business - blog post
 * Deep Learning for Chatbots, Part 1 – Introduction - blog post
 * Better Language Models and Their Implications - blog post