NLP Fundamentals — Text Classifier(P2)

A taxonomy of text classification


A Pipeline for Building Text Classification Systems

Text classification pipeline
  1. Collect or create a labeled dataset suitable for the task
  2. Split the dataset into two parts (train and test) or three parts (train, validation, also known as development, and test), and decide on the evaluation metric(s)
  3. Transform raw text into feature vectors
  4. Train a classifier using the feature vectors and the corresponding labels from the training set
  5. Using the evaluation metric(s) from step 2, benchmark the model performance on the test set
  6. Deploy the model to serve the real-world use case and monitor its performance.
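
Steps 1 to 5 above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code behind this article: the tiny dataset, the label names, and every function name (`split`, `featurize`, `train_naive_bayes`, `predict`, `accuracy`) are assumptions made up for the example, and a real project would normally rely on a library implementation instead.

```python
import math
import random
from collections import Counter, defaultdict

# Step 1: a tiny hand-made labeled dataset (hypothetical, for illustration only).
DATA = [
    ("stocks rally as markets rise", "relevant"),
    ("central bank raises interest rates", "relevant"),
    ("inflation slows as prices fall", "relevant"),
    ("markets fall on interest rate fears", "relevant"),
    ("local team wins the football match", "non-relevant"),
    ("new movie breaks box office record", "non-relevant"),
    ("singer announces summer concert tour", "non-relevant"),
    ("football team signs new striker", "non-relevant"),
]


def split(data, test_fraction=0.25, seed=0):
    """Step 2: shuffle, then split into a train set and a test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def featurize(text):
    """Step 3: turn raw text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def train_naive_bayes(train_rows):
    """Step 4: collect class counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train_rows:
        feats = featurize(text)
        class_counts[label] += 1
        word_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word, count in featurize(text).items():
            if word not in vocab:
                continue  # words never seen in training are ignored
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows):
    """Step 5: the evaluation metric chosen in step 2 (accuracy here)."""
    correct = sum(predict(model, text) == label for text, label in rows)
    return correct / len(rows)


train_rows, test_rows = split(DATA)
model = train_naive_bayes(train_rows)
print("test accuracy:", accuracy(model, test_rows))
```

With only eight examples the reported accuracy says little; the point is the shape of the pipeline, where the classifier in step 4 can be swapped without touching the other steps.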

One Pipeline, Many Classifiers

Naive Bayes Classifier

Possible reasons for the classifier's weak performance, and directions for improving it:

  1. Since we extracted all possible features, we ended up with a large, sparse feature vector in which most features are too rare and act as noise. A sparse feature set also makes training hard.
  2. There are very few examples of relevant articles (~20%) compared to non-relevant articles (~80%) in the dataset. This class imbalance skews learning towards the non-relevant category.
  3. Perhaps we need a better learning algorithm
  4. Perhaps we need better pre-processing and feature extraction
  5. Perhaps we should tune the classifier’s parameters and hyperparameters
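
Points 1 and 2 in the list above can be made concrete with a small sketch. It assumes two common remedies that the article itself does not prescribe: dropping words that appear in too few documents, and weighting each class inversely to its frequency. The helper names (`prune_vocabulary`, `balanced_class_weights`), the `min_df` threshold, and the sample documents are all hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each word appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def prune_vocabulary(docs, min_df=2):
    """Keep only words seen in at least `min_df` documents (rare words are dropped)."""
    df = document_frequency(docs)
    return {word for word, count in df.items() if count >= min_df}


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical documents: only words shared by at least two of them survive.
docs = [
    "economy grows as exports rise",
    "economy slows as exports fall",
    "team wins cup final",
]
vocab = prune_vocabulary(docs, min_df=2)
print(sorted(vocab))  # ['as', 'economy', 'exports']

# The 20% / 80% split described above, on 100 hypothetical labels.
labels = ["relevant"] * 20 + ["non-relevant"] * 80
weights = balanced_class_weights(labels)
print(weights)  # {'relevant': 2.5, 'non-relevant': 0.625}
```

With these weights, an error on a relevant article costs four times as much as an error on a non-relevant one, which counteracts the pull towards the majority class.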

Logistic Regression
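
Because only the training step changes, logistic regression can take the place of Naive Bayes in the same pipeline. The sketch below is a from-scratch binary logistic regression trained by batch gradient descent, with optional per-class weights to counter class imbalance. It is an illustration under stated assumptions, not this article's code: the toy sentences, the learning rate, the number of epochs, and all function names are made up for the example.

```python
import math

# Hypothetical toy data: label 1 = relevant, label 0 = non-relevant.
TRAIN = [
    ("interest rates rise again", 1),
    ("markets fall as rates rise", 1),
    ("team wins the cup final", 0),
    ("singer starts concert tour", 0),
    ("new movie opens this week", 0),
    ("fans cheer as team wins", 0),
]


def featurize(text, vocab):
    """Bag-of-words counts laid out in the fixed order given by `vocab`."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, vocab, class_weight=None, lr=0.5, epochs=200):
    """Minimize the (optionally class-weighted) log loss with batch gradient descent."""
    class_weight = class_weight or {0: 1.0, 1: 1.0}
    data = [(featurize(text, vocab), y) for text, y in rows]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = class_weight[y] * (p - y)  # gradient of the weighted log loss
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        n = len(data)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, text, vocab):
    """Probability that `text` belongs to class 1 (relevant)."""
    weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


vocab = sorted({w for text, _ in TRAIN for w in text.lower().split()})
# Inverse-frequency weights for 2 relevant vs. 4 non-relevant examples.
model = train_logistic_regression(TRAIN, vocab, class_weight={1: 1.5, 0: 0.75})
print(predict_proba(model, "rates rise", vocab))
print(predict_proba(model, "team wins", vocab))
```

The first probability should land above 0.5 and the second below it. Setting `class_weight=None` trains the unweighted model, which makes it easy to compare the two on an imbalanced dataset.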




AI Researcher - NLP Practitioner

Duy Anh Nguyen
