Text Classification

By Ivan Wong
Introduction
• In machine learning, classification is the problem of categorizing a
data instance into one or more known classes.
• A data point can originally be in different formats, such as text, speech,
image, or numeric data.
• Text classification is a special instance of the classification
problem, where the input data points are text and the goal is to
categorize each piece of text into one or more buckets (classes)
from a set of pre-defined buckets (classes).
• The “text” can be of arbitrary length: a character, a word, a
sentence, a paragraph, or a full document.
Introduction
• Any supervised classification approach, including text
classification, can be further distinguished into three types based
on the number of categories involved:
• Binary [spam or not-spam],
• Multiclass [news categories], and
• Multilabel [a tweet may express multiple emotions].
Applications
• Text classification has been of interest in a number of application
scenarios, ranging from identifying the author of an unknown text
in the 1800s to the efforts of USPS in the 1960s to perform optical
character recognition on addresses and zip codes.
• In the 1990s, researchers began to successfully apply ML
algorithms for text classification for large datasets.
• Email filtering, popularly known as “spam classification,” is one of
the earliest examples of automatic text classification, which
impacts our lives to this day.
Applications (Cont.)
• Content classification and organization: This refers to the task
of classifying/tagging large amounts of textual data. This, in turn,
is used to power use cases like content organization, search
engines, and recommendation systems, to name a few.
• Customer support: Customers often use social media to express
their opinions about and experiences of products or services. Text
classification is often used to identify the tweets that brands must
respond to and those that don’t require a response (i.e., noise).
Applications (Cont.)
• E-commerce: Customers leave reviews for a range of products on
e-commerce websites like Amazon, eBay, etc. An example use of
text classification in this kind of scenario is to understand and
analyze customers’ perception of a product or service based on
their comments.
• Other applications:
• language identification
• Authorship attribution
• Triaging posts in an online support forum
A Pipeline for Building Text Classification Systems
Naive Bayes Classifier
• Naive Bayes is a probabilistic classifier that uses Bayes’ theorem
to classify texts based on the evidence seen in training data.
• For each class, it estimates the conditional probability of each feature
of a given text based on how often that feature occurs in that class,
then multiplies the probabilities of all the features of the text
(together with the class prior) to compute a final score for that
class.
• Finally, it chooses the class with maximum probability.
• Details: https://web.stanford.edu/~jurafsky/slp3/
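A minimal sketch of this idea with scikit-learn; the tiny spam/ham dataset below is an illustrative placeholder, not from the slides. Word counts serve as features, and MultinomialNB estimates the per-class feature probabilities described above.

# Naive Bayes text classification sketch (scikit-learn).
# The training texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["free money now", "meeting at noon", "win a free prize", "lunch tomorrow?"]
train_labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()                    # bag-of-words feature extraction
X_train = vectorizer.fit_transform(train_texts)   # sparse term-count matrix

clf = MultinomialNB()                             # multiplies per-class feature probabilities
clf.fit(X_train, train_labels)

X_new = vectorizer.transform(["free prize meeting"])
print(clf.predict(X_new))                         # class with the maximum posterior probability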
Potential reasons for poor classifier performance
• Reason 1: Since we extracted all possible features, we ended up with a
large, sparse feature vector, where most features are too rare and end
up being noise. A sparse feature set also makes training hard.
• Reason 2: There are very few examples of relevant articles (~20%)
compared to non-relevant articles (~80%) in the dataset. This class
imbalance skews the learning process toward the non-relevant category.
• Reason 3: Perhaps we need a better learning algorithm.
• Reason 4: Perhaps we need a better pre-processing and feature
extraction mechanism.
• Reason 5: Perhaps we should look at tuning the classifier’s
parameters and hyperparameters.
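Reasons 1 and 2 can be addressed directly at feature-extraction and training time. A hedged sketch follows; the specific parameter values are illustrative assumptions, not the slides' settings.

# Sketch: limiting the feature space (Reason 1) and correcting class
# imbalance (Reason 2). The values below are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Keep only the 5,000 most frequent terms and drop very rare ones,
# shrinking the large, sparse, noisy feature vector.
vectorizer = CountVectorizer(max_features=5000, min_df=2)

# With imbalanced data, the class priors can be made uniform so the model
# is not skewed toward the majority (non-relevant) class.
clf = MultinomialNB(fit_prior=False)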
Logistic Regression
• Unlike Naive Bayes, which estimates probabilities based on
feature occurrence in classes, logistic regression “learns” the
weights for individual features based on how important they are to
make a classification decision.
• The goal of logistic regression is to learn a linear separator
between classes in the training data with the aim of maximizing
the likelihood of the data.
Our logistic regression classifier instantiation has an argument class_weight, which is
given the value “balanced”. This tells the classifier to boost the weights for classes in
inverse proportion to the number of samples in each class.
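A minimal sketch of such an instantiation with scikit-learn; the tiny dataset and TF-IDF features are placeholders standing in for the notebook's actual data.

# Logistic regression with balanced class weights (scikit-learn sketch).
# The toy dataset below is an illustrative placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product", "terrible service", "loved it", "awful, broken", "works fine"]
labels = ["pos", "neg", "pos", "neg", "pos"]      # imbalanced in a real corpus

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# class_weight="balanced" boosts each class's weight in inverse proportion
# to its number of samples, countering class imbalance during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["service was terrible"])))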
Logistic Regression (Cont.)
Support Vector Machine
• A support vector machine (SVM), first invented in the early 1960s,
is a discriminative classifier like logistic regression.
• However, unlike logistic regression, it looks for an optimal
hyperplane in a higher-dimensional space that separates the
classes in the data by the maximum possible margin.
• Further, SVMs are capable of learning even non-linear separations
between classes, unlike logistic regression.
• However, they may also take longer to train.
Support Vector Machine (Cont.)

Ref.: https://scikit-learn.org/stable/modules/svm.html
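A hedged sketch of both variants with scikit-learn (the toy data is an illustrative assumption; see the scikit-learn reference above for details): LinearSVC learns a maximum-margin linear separator, while SVC with an RBF kernel can learn non-linear boundaries, typically at a higher training cost.

# Linear vs. kernel (non-linear) SVM sketch with scikit-learn.
# Texts and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC, SVC

texts = ["free money now", "meeting at noon", "win a free prize", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

X = TfidfVectorizer().fit_transform(texts)

linear_svm = LinearSVC(C=1.0)            # maximum-margin linear separator
linear_svm.fit(X, labels)

kernel_svm = SVC(kernel="rbf", C=1.0)    # non-linear separation via the RBF kernel
kernel_svm.fit(X, labels)                # typically slower to train than the linear model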
Using Neural Embeddings in Text
Classification
• In feature engineering, techniques using neural networks, such as
word embeddings, character embeddings, and document
embeddings, have the advantage that they create a dense, low-
dimensional feature representation instead of the sparse, high-
dimensional structure of BoW/TF-IDF and other such features.
• There are different ways of designing and using features based on
neural embeddings.
Word Embeddings
• Doc2Vec
• In the Doc2vec embedding scheme, we learn a direct representation for the
entire document (sentence/paragraph) rather than each word.
• Word2Vec
• We’ll use a pre-trained embedding model.
• Subword Embeddings and fastText
• If a word in our dataset was not present in the pre-trained model’s vocabulary,
how will we get a representation for it? This is the out-of-vocabulary (OOV) problem.
• fastText embeddings address it by enriching word embeddings with
subword-level information.
• The embedding for each word is represented as the sum of the
embeddings of its individual character n-grams.
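As a hedged sketch of using pre-trained word embeddings as classification features (the gensim model name and toy data are assumptions for illustration): each document is represented by the average of its word vectors, and OOV words are simply skipped, which is exactly the gap fastText's subword embeddings are meant to fill.

# Sketch: averaged pre-trained word embeddings as document features.
# Model name and toy data are assumptions for illustration only.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

wv = api.load("glove-wiki-gigaword-50")   # pre-trained word vectors

def embed(text):
    # Average the vectors of in-vocabulary words; OOV words are skipped.
    vecs = [wv[w] for w in text.lower().split() if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

texts = ["great phone, love it", "awful battery, broken screen"]
labels = ["pos", "neg"]

X = np.vstack([embed(t) for t in texts])   # dense, low-dimensional features
clf = LogisticRegression().fit(X, labels)
print(clf.predict([embed("love the screen")]))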
Deep Learning for Text Classification
• Two of the most commonly used neural network architectures for
text classification are convolutional neural networks (CNNs) and
recurrent neural networks (RNNs).
• Long short-term memory (LSTM) networks are a popular form of
RNNs.
• Recent approaches also involve starting with large, pre-trained
language models and fine-tuning them for the task at hand.
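A hedged sketch of the fine-tuning route using the Hugging Face transformers library; the model name, data, and hyperparameters are illustrative assumptions, not the slides' setup.

# Sketch: fine-tuning a pre-trained language model for text classification.
# Model name, data, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

texts = ["great phone, love it", "awful battery, broken screen"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)   # forward pass returns loss and logits
outputs.loss.backward()                   # one fine-tuning step (normally looped over epochs)
optimizer.step()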
Deep Learning for Text Classification (Cont.)
1. Tokenize the texts and convert them into word index vectors.
2. Pad the text sequences so that all text vectors are of the same
length.
3. Map every word index to an embedding vector. We do that by looking
up each index in an embedding matrix (equivalently, multiplying
one-hot index vectors with the embedding matrix). The embedding
matrix can either be populated using pre-trained embeddings or be
trained from scratch on this corpus.
4. Use the output from Step 3 as the input to a neural network
architecture.
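A minimal sketch of these four steps with Keras; the vocabulary size, sequence length, toy data, and the LSTM-based architecture are illustrative assumptions.

# Sketch of steps 1-4 with Keras; sizes and architecture are assumptions.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["great phone, love it", "awful battery, broken screen"]
labels = np.array([1, 0])

# Step 1: tokenize the texts and convert them into word index vectors.
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Step 2: pad the sequences so all text vectors have the same length.
X = pad_sequences(sequences, maxlen=100)

# Steps 3 and 4: the Embedding layer maps word indices to dense vectors
# (optionally initialized from pre-trained embeddings), feeding an
# LSTM-based classifier.
model = Sequential([
    Embedding(input_dim=20000, output_dim=100),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3)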
