Natural Language Processing (NLP) Road Map

I have been asked more than once how to get started and become “fluent” in natural language processing, and I think it is probably a good idea to turn my answer into a blog post. Here are my honest answers to the question: “I can program in Python and I am interested in NLP. Where do I start?”

In my view, NLP is a very wide field with many applications, such as dialogue engines, machine translation, information extraction, and information retrieval. So the following list is just a very general starting point, and likely incomplete.


Books

I highly recommend you start with books. Today it is far more popular to look for an online tutorial on YouTube or a free online course; however, when you are starting out, it is better to begin with a book, because it gives you time to think and digest all the new knowledge. If you stay patient and really read the book, you can build a good foundation in NLP.

The first book I recommend is “Natural Language Processing with Python”. This book helps you get familiar with basic NLP tasks such as tokenization, text normalization, and part-of-speech (POS) tagging using only Python and the classical NLP library NLTK, without any fancy “deep learning”. It also shows you how to tackle basic classification problems in NLP using hand-crafted features and shallow machine learning.
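As a taste of what the book covers, here is a minimal tokenization-and-normalization sketch in plain Python. NLTK’s own `word_tokenize` is far more robust; this regex approach is just an illustration of the idea.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens with a simple regex."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The quick brown Fox jumps over 2 lazy dogs!")
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', '2', 'lazy', 'dogs']
```

Lowercasing here stands in for text normalization; real pipelines often add steps such as stemming, lemmatization, or stop-word removal, all of which NLTK provides out of the box.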

The next book I recommend is “Speech and Language Processing”. This book gives you an in-depth theoretical foundation for NLP, touching nearly every active research and application area of the field. After reading it, you will understand both what NLP is capable of and its limitations. It is all theory, without a line of code, so you should set aside proper time to read it.

The third book is “Natural Language Processing in Action”. This book takes you to real-world applications of NLP, such as chatbot systems, and also guides you to the bleeding edge of the field: applying deep learning to NLP tasks. You will learn about semantic word vector models (word2vec), study how to apply recurrent and convolutional neural networks to text classification problems, and more.
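To illustrate what a semantic word vector model buys you, here is a sketch of cosine similarity between word vectors. The three-dimensional vectors below are made-up toy values for illustration only; real word2vec embeddings typically have 100–300 dimensions and are learned from a large corpus.

```python
import math

# Toy "embeddings" -- invented numbers, NOT real word2vec output.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1.0
print(cosine_similarity(vectors["king"], vectors["apple"]))  # noticeably lower
```

In a trained model, semantically related words end up with high cosine similarity, which is what makes word vectors useful as input features for downstream tasks.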

Online courses

Online courses are helpful only once you have a foundation in NLP, so that you are not simply “driven” by the video content.

Dan Jurafsky & Chris Manning: Natural Language Processing is a very good introductory course that helps you learn and enjoy the content at the same time. It is not demanding in terms of prerequisite knowledge or coding skills, so I believe it makes for a fun and exciting journey.

Sequence Models is an exciting course by the beloved Dr. Andrew Ng. In this course, you get to see how deep learning can be used to solve NLP tasks by treating sentences as sequences. You will learn how to build and train recurrent neural networks (RNNs) and commonly used variants such as GRUs and LSTMs, apply sequence models to natural language problems such as text synthesis, and apply them to audio applications, including speech recognition and music synthesis.
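To make “sentences as sequences” concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. The weights are random toy values, not a trained model; the point is only to show how the hidden state is carried from one token to the next.

```python
import numpy as np

np.random.seed(0)

def rnn_forward(inputs, Wxh, Whh, bh):
    """Run a vanilla RNN over a sequence, returning the hidden state at each step."""
    h = np.zeros(Whh.shape[0])
    states = []
    for x in inputs:
        # The new hidden state mixes the current input with the previous state.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        states.append(h)
    return states

vocab_size, hidden_size = 4, 3
Wxh = np.random.randn(hidden_size, vocab_size) * 0.1  # input-to-hidden weights
Whh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
bh = np.zeros(hidden_size)

# A "sentence" of three one-hot encoded tokens over a 4-word vocabulary.
sentence = [np.eye(vocab_size)[i] for i in (0, 2, 1)]
states = rnn_forward(sentence, Wxh, Whh, bh)
print(len(states), states[-1].shape)  # 3 (3,)
```

GRUs and LSTMs, covered in the course, replace the single `tanh` update with gated updates that make long-range dependencies easier to learn.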

Natural Language Processing. This course covers a wide range of NLP tasks from basic to advanced: sentiment analysis, summarization, and dialogue state tracking, to name a few. Upon completing it, you will be able to recognize NLP tasks in your day-to-day work, propose approaches, and judge which techniques are likely to work well. The final project is devoted to one of the hottest topics in today’s NLP: you will build your own conversational chatbot that assists with search on the Stack Overflow website. The project builds on the course’s practical assignments, which give you hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.

Stanford CS224d: Deep Learning for Natural Language Processing. This course covers more advanced machine learning algorithms, deep learning, and neural network architectures for NLP.

DIY projects and data sets

You can find a thorough list of public NLP data sets created by the NLP community. Here are some projects I can recommend to NLP novices who want to get their hands dirty:

  • Implement semantic similarity between two given words in a collection of text, e.g. pointwise mutual information (PMI)
  • Implement a Naive Bayes classifier to filter spam
  • Implement a spell checker based on edit distances between words
  • Implement a Markov chain text generator
  • Implement a topic model using latent Dirichlet allocation (LDA)
  • Use word2vec to generate word embeddings from a large text corpus, e.g. Wikipedia
  • Use k-means to cluster tf-idf vectors of text, e.g. news articles
  • Implement a named-entity recognizer (NER) (also called a name tagger), e.g. following the CoNLL-2003 shared task
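As an example of one of these starter projects, here is a minimal sketch of an edit-distance-based spell checker. The vocabulary below is a made-up toy list; a real checker would load a dictionary or corpus word list.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings via dynamic programming."""
    dp = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # Cheapest of: deletion, insertion, substitution (free on a match).
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def correct(word, vocabulary):
    """Return the vocabulary word closest to `word` by edit distance."""
    return min(vocabulary, key=lambda w: edit_distance(word, w))

vocab = ["language", "natural", "processing", "python"]  # toy word list
print(correct("procesing", vocab))  # 'processing'
```

Ranking candidates purely by edit distance is the simplest version of the project; a natural extension is to break ties using word frequencies from a corpus, in the spirit of Peter Norvig’s classic spell-corrector essay.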

AI Researcher - NLP Practitioner