NLP Fundamentals: Text Classifier (P2)

A taxonomy of text classification

Applications

A Pipeline for Building Text Classification Systems

Text classification pipeline:
  1. Collect or create a labeled dataset suitable for the task
  2. Split the dataset into two parts (train and test) or three parts (train, validation (a.k.a. development), and test), and decide on the evaluation metric(s)
  3. Transform raw text into feature vectors
  4. Train a classifier using the feature vectors and the corresponding labels from the training set
  5. Using the evaluation metric(s) from step 2, benchmark the model performance on the test set
  6. Deploy the model to serve the real-world use case and monitor its performance.
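Steps 1 to 5 above can be sketched end to end in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy dataset, the helper names (train_test_split, build_vocabulary, vectorize, train_naive_bayes, predict, accuracy) and the choice of a bag-of-words representation with a multinomial Naive Bayes classifier are all hypothetical choices made for this example, not the article's original code. In practice a library such as scikit-learn would normally handle these steps.

```python
import math
import random

# Step 1: a tiny hand-made labeled dataset (hypothetical, for illustration only).
DATA = [
    ("stocks rally as markets rise", "relevant"),
    ("central bank raises interest rates", "relevant"),
    ("markets fall on inflation fears", "relevant"),
    ("bank profits rise as rates climb", "relevant"),
    ("local team wins the cup final", "non-relevant"),
    ("new recipe for summer salad", "non-relevant"),
    ("team signs new striker for summer", "non-relevant"),
    ("festival brings music to the park", "non-relevant"),
]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Step 2: shuffle a copy of the rows and hold out a fraction for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def build_vocabulary(texts):
    """Map every token seen in the given texts to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Step 3: turn raw text into a bag-of-words count vector."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] += 1
    return vec

def train_naive_bayes(vectors, labels, alpha=1.0):
    """Step 4: fit multinomial Naive Bayes with Laplace smoothing (alpha)."""
    n_features = len(vectors[0])
    model = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        counts = [sum(col) for col in zip(*rows)]  # per-token counts in class c
        total = sum(counts) + alpha * n_features
        model[c] = {
            "log_prior": math.log(len(rows) / len(vectors)),
            "log_likelihood": [math.log((n + alpha) / total) for n in counts],
        }
    return model

def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(c):
        params = model[c]
        return params["log_prior"] + sum(
            x * w for x, w in zip(vec, params["log_likelihood"])
        )
    return max(model, key=score)

def accuracy(model, vectors, labels):
    """Step 5: fraction of examples whose predicted label matches the true one."""
    correct = sum(predict(model, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)

train, test = train_test_split(DATA)
vocab = build_vocabulary([text for text, _ in train])  # fit on training data only
X_train = [vectorize(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
X_test = [vectorize(text, vocab) for text, _ in test]
y_test = [label for _, label in test]

model = train_naive_bayes(X_train, y_train)
print("test accuracy:", accuracy(model, X_test, y_test))
```

Note that the vocabulary is built from the training split only, so the test set stays unseen until the evaluation step. With a dataset this small the reported accuracy is not meaningful; the point is only the order of the steps.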

One Pipeline, Many Classifiers

Naive Bayes Classifier

  1. Since we extracted all possible features, we ended up with a large, sparse feature vector in which most features are too rare and act as noise. A sparse feature set also makes training harder.
  2. There are far fewer relevant articles (~20%) than non-relevant articles (~80%) in the dataset. This class imbalance skews the learning process towards the non-relevant category.
  3. Perhaps we need a better learning algorithm.
  4. Perhaps we need a better pre-processing and feature extraction mechanism.
  5. Perhaps we should tune the classifier’s parameters and hyper-parameters.
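The first two points in the list above suggest two common remedies: keeping only the most frequent features, and rebalancing the classes. The sketch below shows one possible way to do each in plain Python. The function names (top_k_vocabulary, undersample), the toy rows and the value of k are hypothetical and chosen for illustration; random undersampling is only one of several ways to handle imbalance, and the source does not confirm that this is the approach used in the original experiment.

```python
import random
from collections import Counter

def top_k_vocabulary(texts, k):
    """Keep only the k most frequent tokens (ties broken alphabetically)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {tok: i for i, (tok, _) in enumerate(ranked[:k])}

def undersample(rows, seed=0):
    """Randomly drop rows so that every class is as small as the rarest class."""
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy rows with the same 20/80 imbalance described above.
rows = (
    [("rates rise again", "relevant")] * 2
    + [("team wins match", "non-relevant")] * 8
)

balanced = undersample(rows)
print(Counter(label for _, label in balanced))

vocab = top_k_vocabulary([text for text, _ in rows], k=3)
print(sorted(vocab))
```

Undersampling should be applied to the training split only, so that the test set keeps the real class distribution. A frequency cut-off that is too aggressive can also discard the very tokens that identify the minority class, as happens in this toy example, so k is worth treating as a hyper-parameter to tune.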

Logistic Regression


AI Researcher - NLP Practitioner

Duy Anh Nguyen
