NLP Fundamentals — Sequence Modeling (P5)

A sequence is an ordered collection of items. Traditional machine learning assumes data points are independently and identically distributed (IID), but in many settings, such as language, speech, and time-series data, one item depends on the items that precede or follow it. Such data is called sequence data. Sequential information is everywhere in human language. For example, speech can be considered a sequence of basic units called phonemes, and in a language like English, the words in a sentence are not haphazard: they are constrained by the words that come before and after them.

In deep learning, modeling sequences involves maintaining hidden “state information,” or a hidden state. As each item in the sequence is encountered, for example, as the model sees each word in a sentence, the hidden state is updated. Thus, the hidden state (usually a vector) encapsulates everything the model has seen of the sequence so far. This hidden state vector, also called a sequence representation, can then be used in myriad ways depending on the task we are solving, ranging from classifying sequences to predicting sequences.
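To make this concrete, here is a minimal sketch, in plain Python with illustrative names, of how a hidden state is threaded through a sequence; the update function stands in for whatever recurrence a particular model defines:

    def encode_sequence(tokens, update_fn, initial_state):
        # Fold a sequence into a single sequence representation.
        # update_fn maps (current item, previous state) -> new state.
        state = initial_state
        for token in tokens:
            state = update_fn(token, state)
        return state  # hidden state after seeing the whole sequence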

We begin by introducing the most basic neural network sequence model: the recurrent neural network (RNN). After this, we present an end-to-end example of using an RNN in a classification setting.

Introduction to Recurrent Neural Networks

Let’s look at a slightly more specific description to understand what happens in the Elman RNN. As shown in the unrolled view in Figure 6–1, the input vector from the current time step and the hidden state vector from the previous time step are mapped to the hidden state vector of the current time step. Unrolling the network over time is also what enables its training algorithm, known as backpropagation through time (BPTT).
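Concretely, the Elman update computes the new hidden state from the current input and the previous hidden state. A minimal PyTorch sketch of one step, with weight shapes following the convention used by torch.nn.RNN (the function name and argument names are illustrative):

    import torch

    def elman_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh):
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_prev + b_hh)
        # x_t: (batch, input_size), h_prev: (batch, hidden_size)
        # W_ih: (hidden_size, input_size), W_hh: (hidden_size, hidden_size)
        return torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)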

Crucially, the hidden-to-hidden and input-to-hidden weights are shared across the different time steps. The intuition to take away from this is that, during training, these weights are adjusted so that the RNN learns to incorporate incoming information and to maintain a state representation summarizing the input seen so far. The RNN has no way of knowing which time step it is on. Instead, it simply learns how to transition from one time step to the next while maintaining a state representation that minimizes its loss function.
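The weight sharing is visible when the recurrence is unrolled in code: the same cell, and therefore the same parameter tensors, is applied at every time step. A small sketch using torch.nn.RNNCell (the sizes here are arbitrary):

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=8, hidden_size=16)   # one set of weights
    x = torch.randn(5, 3, 8)                          # (seq_len, batch, input_size)
    h = torch.zeros(3, 16)                            # initial hidden state
    for t in range(x.size(0)):
        h = cell(x[t], h)                             # same weights reused at every step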

Because words and sentences can be of different lengths, an RNN, or any sequence model, must be equipped to handle variable-length sequences. One possible technique is to artificially restrict sequences to a fixed length, truncating longer sequences and padding shorter ones, as sketched below.
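A minimal sketch of that fixed-length technique (the pad index 0 is an assumption, not something fixed by the text):

    def to_fixed_length(token_ids, max_len, pad_idx=0):
        # Truncate or right-pad a list of token ids to exactly max_len.
        clipped = token_ids[:max_len]
        return clipped + [pad_idx] * (max_len - len(clipped))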

Implementing an Elman RNN

An example of using an RNN to classify names

The data can be found at
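A minimal sketch of such a classifier: a character-level Elman RNN whose final hidden state feeds a linear prediction layer. All names and sizes here are illustrative assumptions, not the original implementation:

    import torch
    import torch.nn as nn

    class NameClassifier(nn.Module):
        # Embed characters, run an Elman RNN over them, and classify
        # the name from the final hidden state.
        def __init__(self, num_chars, embed_dim, hidden_dim, num_classes, pad_idx=0):
            super().__init__()
            self.emb = nn.Embedding(num_chars, embed_dim, padding_idx=pad_idx)
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # Elman RNN
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):               # x: (batch, seq_len) of character ids
            emb = self.emb(x)               # (batch, seq_len, embed_dim)
            _, h_n = self.rnn(emb)          # h_n: (1, batch, hidden_dim)
            return self.fc(h_n.squeeze(0))  # (batch, num_classes) logits

Training would pair these logits with a standard loss such as nn.CrossEntropyLoss over the class labels.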
