Create a Strong Text Classifier with Help from ELMo

What is ELMo?

ELMo embeddings, developed by AllenNLP, come from a state-of-the-art pre-trained model that is available on TensorFlow Hub. They are learned from the internal state of a bidirectional LSTM and represent contextual features of the input text. They have been shown to outperform earlier pre-trained word embeddings such as word2vec and GloVe on a wide variety of NLP tasks, including question answering, named entity extraction and sentiment analysis.

ELMo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. They can be easily added to existing models and significantly improve the state of the art across a broad range of challenging NLP problems, including question answering, textual entailment and sentiment analysis.

What is TensorFlow Hub?

TensorFlow Hub is a platform to publish, discover, and reuse parts of machine learning modules in TensorFlow. By a module, I mean a self-contained piece of a TensorFlow graph, along with its weights, that can be reused across other, similar tasks. By reusing a module, a developer can train a model using a smaller dataset, improve generalization, or simply speed up training.
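To make the idea of a module concrete, here is a minimal sketch of loading ELMo and embedding a few sentences. It assumes TensorFlow 1.x with the `tensorflow_hub` package; the module handle `https://tfhub.dev/google/elmo/2` and the helper name `embed_sentences` are my assumptions and are not taken from the original post.

```python
# Assumed module handle; the post does not say which ELMo version it used.
ELMO_URL = "https://tfhub.dev/google/elmo/2"


def embed_sentences(sentences):
    """Return one 1024-dimensional ELMo vector per sentence (TensorFlow 1.x)."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_hub as hub

    with tf.Graph().as_default():
        # The module bundles a piece of graph together with its trained weights.
        elmo = hub.Module(ELMO_URL, trainable=False)
        # The "default" signature takes untokenized strings; its "default"
        # output is the mean-pooled sentence representation.
        vectors = elmo(sentences, signature="default", as_dict=True)["default"]
        with tf.Session() as session:
            session.run([tf.global_variables_initializer(),
                         tf.tables_initializer()])
            return session.run(vectors)
```

Calling `embed_sentences(["the cat is on the mat", "dogs are in the fog"])` should give an array with one row per sentence. The first call downloads the module, so it needs network access.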

TensorFlow Hub matters for application-oriented machine learning because it gives developers a very easy way to integrate pre-trained models into their applications.

Use pre-trained ELMo from TensorFlow Hub to create a text classifier

The data for this small project comes from the Kaggle competition “Toxic Comment Classification Challenge”. The competition describes its theme as follows:

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

The Conversation AI team, a research initiative founded by Jigsaw and Google (both a part of Alphabet) are working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion). So far they’ve built a range of publicly available models served through the Perspective API, including toxicity. But the current models still make errors, and they don’t allow users to select which types of toxicity they’re interested in finding (e.g. some platforms may be fine with profanity, but not with other types of toxic content).

In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate better than Perspective’s current models. You’ll be using a dataset of comments from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

Toxic Comment Classification Challenge

The data for this competition can be downloaded from the competition page on Kaggle.

The first step is to turn the train and test data into lists of strings:
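The preprocessing step could look something like the sketch below. It uses only the standard library and assumes the column layout of the competition's `train.csv` (`id`, `comment_text`, then six label columns). The helper name `load_comments` and the two sample rows are made up for illustration.

```python
import csv
import io

# Label columns as they appear in the competition's train.csv.
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene",
                 "threat", "insult", "identity_hate"]


def load_comments(csv_file, with_labels=True):
    """Read comments (and optionally labels) from an open CSV file object."""
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        # Collapse newlines and repeated whitespace into single spaces.
        texts.append(" ".join(row["comment_text"].split()))
        if with_labels:
            labels.append([int(row[col]) for col in LABEL_COLUMNS])
    return texts, labels


# Tiny in-memory stand-in for train.csv (made-up rows, illustration only).
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    '1,"Thanks for the\nhelpful edit!",0,0,0,0,0,0\n'
    '2,"You are an idiot",1,0,0,0,1,0\n'
)
train_texts, train_labels = load_comments(sample)
print(train_texts)
print(train_labels)
```

For the real files, pass `open("train.csv", newline="", encoding="utf-8")` instead of the in-memory sample, and call `load_comments(..., with_labels=False)` for `test.csv`, which has no label columns.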

Then you can create the embedding layer using TensorFlow Hub:
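One common way to do this in TensorFlow 1.x is to wrap the hub module in a Keras `Lambda` layer. The sketch below is a reconstruction under that assumption and not necessarily the original code. The module handle and the helper name `build_elmo_input` are mine.

```python
# Assumed module handle; the post does not say which ELMo version it used.
ELMO_URL = "https://tfhub.dev/google/elmo/2"


def build_elmo_input():
    """Return (input_tensor, embedding_tensor) for a Keras functional model."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_hub as hub
    from tensorflow.keras.layers import Input, Lambda

    # With the default trainable=False, ELMo acts as a frozen feature extractor.
    elmo = hub.Module(ELMO_URL)

    def elmo_embedding(x):
        # Keras feeds strings with shape (batch, 1); the module expects (batch,).
        sentences = tf.squeeze(tf.cast(x, tf.string), axis=1)
        # "default" is the mean-pooled, 1024-dimensional sentence vector.
        return elmo(sentences, signature="default", as_dict=True)["default"]

    input_text = Input(shape=(1,), dtype="string")
    embedding = Lambda(elmo_embedding, output_shape=(1024,))(input_text)
    return input_text, embedding
```

Using the pooled `"default"` output keeps the rest of the model simple, because every comment becomes a single fixed-size vector regardless of its length.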

Now we can add two dense layers:
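A sketch of the two dense layers on top of the embedding follows. The hidden size of 256, the six sigmoid outputs (one per label in the competition) and the helper name `build_classifier` are assumptions on my side, not values from the original post.

```python
NUM_LABELS = 6  # toxic, severe_toxic, obscene, threat, insult, identity_hate


def build_classifier(input_text, embedding, hidden_units=256):
    """Stack two dense layers on the ELMo embedding and compile the model."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Model

    hidden = Dense(hidden_units, activation="relu")(embedding)
    # Sigmoid rather than softmax: one comment can carry several labels at once.
    outputs = Dense(NUM_LABELS, activation="sigmoid")(hidden)

    model = Model(inputs=[input_text], outputs=outputs)
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

Here `input_text` and `embedding` are the Keras input tensor and the ELMo embedding tensor from the previous step.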

Surprisingly, we got a very promising result already in the very first epoch of training.
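For reference, a training step in TensorFlow 1.x might look like the sketch below. The batch size, validation split and the helper name `train` are assumptions, and the code makes no claim about the scores you will see.

```python
def train(model, texts, labels, epochs=1, batch_size=32):
    """Fit a compiled Keras model on a list of strings and a list of label rows."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import backend as K

    # Keras expects shape (num_samples, 1) for the string input.
    x = np.array(texts, dtype=object)[:, np.newaxis]
    y = np.array(labels, dtype="float32")

    with tf.Session() as session:
        K.set_session(session)
        # The hub module needs both its variables and its lookup tables initialized.
        session.run(tf.global_variables_initializer())
        session.run(tf.tables_initializer())
        return model.fit(x, y, epochs=epochs, batch_size=batch_size,
                         validation_split=0.1)
```

The returned history object holds the per-epoch loss and accuracy, which is where a first-epoch result like the one mentioned above would show up.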

And that’s it! There are lots of great models on TensorFlow Hub, so make sure to experiment with them!

AI Researcher - NLP Practitioner