Text Classification
Have you seen your parents order pizza for you? Or buy clothes for you? When you
look at the apps, you can see many categories such as veg pizza, non-veg pizza, sides,
desserts, and more. Another example is your school, where students are grouped
into specific classes according to their grades.
Similarly, online content contains a huge amount of data that needs to be classified
before it can be used. So, how do you classify it? A machine-learning model can help you
with that. First, you need to train your model to classify the data. There are various
algorithms that can help you train the model. We will look into the algorithms later;
first, let us deep-dive into text classification.
For example, news articles are classified by topic, such as sports, entertainment, and
politics.
Did you know? Sentiment analysis, which you learned in the previous class, is also a type
of text classification. There, you classified text into three categories, namely positive,
negative, and neutral.
Tokenization means assigning a number to each word in a given input. Let us take an
example and understand how it works. Here is our sentence:
“The quick brown fox jumped over the lazy dog”
Once we input the data, the tokenization process is carried out, and each distinct word is
assigned its own number. The sentence has nine words, but “the” appears twice. A repeated
word is counted as one word, so there are 8 distinct words, numbered from 0 to 7. The
sentence is then encoded by how many times each word appears, as shown below.
[ 1, 1, 1, 1, 1, 1, 1, 2]
Because “the” appears twice, it is encoded as 2; every other word appears once and is
encoded as 1. This process is repeated for all the input data.
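The encoding above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes that words are lowercased (so “The” and “the” count as the same word) and that the distinct words are numbered in alphabetical order, which puts “the” in the last position. The variable names are made up for illustration.

```python
from collections import Counter

sentence = "The quick brown fox jumped over the lazy dog"

# Lowercase so that "The" and "the" count as the same word, then split on spaces.
tokens = sentence.lower().split()

# Vocabulary: each distinct word gets one number (alphabetical order is assumed here).
vocabulary = {word: index for index, word in enumerate(sorted(set(tokens)))}

# Count how many times each vocabulary word appears in the sentence.
counts = Counter(tokens)
vector = [counts[word] for word in vocabulary]

print(vocabulary)  # the word-to-number mapping, 0 to 7
print(vector)      # [1, 1, 1, 1, 1, 1, 1, 2]
```

The last entry is 2 because “the”, numbered 7, appears twice in the sentence.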
Term frequency (TF): This summarizes how often a given word appears within a document.
Inverse document frequency (IDF): This downscales words that appear a lot across documents.
To understand this, let us add two more sentences to our example.
The last word, “the”, has the lowest weight, so it is the least important according to IDF.
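Here is a small sketch of how term frequency and inverse document frequency could be computed by hand. The two extra sentences in the code are made up for this illustration and are not the ones from the lesson. The sketch also assumes the plain formula IDF = log(number of documents / number of documents containing the word); real libraries often use slightly different, smoothed versions.

```python
import math

# The first sentence is from the text; the other two are made-up examples.
documents = [
    "The quick brown fox jumped over the lazy dog",
    "The dog",
    "The fox",
]

tokenized = [doc.lower().split() for doc in documents]
n_docs = len(tokenized)

def term_frequency(word, tokens):
    """Share of the words in one document that are `word`."""
    return tokens.count(word) / len(tokens)

def inverse_document_frequency(word):
    """Plain (unsmoothed) IDF: log of total documents over documents containing `word`."""
    docs_with_word = sum(1 for tokens in tokenized if word in tokens)
    return math.log(n_docs / docs_with_word)

# Weight every distinct word of the first sentence.
first = tokenized[0]
weights = {}
for word in sorted(set(first)):
    tf = term_frequency(word, first)
    idf = inverse_document_frequency(word)
    weights[word] = tf * idf
    print(f"{word:>7}  tf={tf:.3f}  idf={idf:.3f}  tf-idf={weights[word]:.3f}")
```

Because “the” appears in every document, its IDF is log(3/3) = 0, so it ends up with the lowest weight even though it is the most frequent word in the first sentence.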
Step 4: After all these steps, an algorithm uses the encoded data to train our machine-learning model.
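To give a feel for the training step, here is a tiny sketch that trains a classifier on word counts. The lesson does not name the algorithm at this point, so Naive Bayes is used here purely as an assumption, and the training sentences and the function names (`tokenize`, `train`, `classify`) are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (text, topic) pairs invented for this sketch.
training_data = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the minister won the election", "politics"),
    ("the party announced a new policy", "politics"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count the words seen under each label."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of texts
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the highest Naive Bayes score for `text`."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start with how common the label is, then add evidence from each word.
        score = math.log(label_counts[label] / total_texts)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen word/label pairs from scoring zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training_data)
print(classify("the player won the match", model))  # sports
print(classify("the new election policy", model))   # politics
```

With only four training sentences this is a toy; a real model needs far more examples for each category.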