Text Classification

Text classification is a process that assigns predefined categories to text. It starts by taking text data as input and assigning a number to each unique word. The text is then encoded as a list of word counts. Term frequency measures how often each word appears, and inverse document frequency downscales words that are common across documents. A machine learning algorithm is trained on the encoded text to learn how to classify new text, and the trained model is then tested on new text data.

Uploaded by

Shravya M

Text Classification

Have you seen your parents ordering pizza for you, or buying clothes for you? When you
look at the apps, you can see many categories like veg pizza, non-veg pizza, sides, desserts,
and many more. Another example is your school, where students are placed into specific
classes according to their grades.

Similarly, there is a huge amount of online content that needs to be classified before it can
be used. So, how do you classify it? A machine-learning model can help you with that. First,
you need to train your model to classify the data. There are various algorithms that will
help you train your model. We will look into the algorithms later; first, let us deep-dive
into text classification.

What is text classification?

Text classification is the process of assigning a set of pre-defined categories to a piece of text.

Text classification can be used to organize, structure, and categorize free text.

For example, news articles are classified by topic, such as sports, entertainment, and
politics.

Did you know? Sentiment analysis, which you learned in the previous class, is also a type of
text classification, where you classified text into three categories: positive, negative,
and neutral.

How does it work?

Step 1: The first step in text classification is to input the text data.

Step 2: Word count / Count Vectorization

This step assigns a number to each word in the input. Let us take an example to
understand how it works. Here is our sentence:

“The quick brown fox jumped over the lazy dog”

Once we input the data, the tokenization process is carried out: the sentence is split into
words, and each unique word is assigned a number (a token index). The numbers are not random;
here they follow the alphabetical order of the words. The sentence has nine words, but “the”
appears twice, so it is counted as one unique word. That leaves eight unique words, numbered
from 0 to 7, as shown below.

the: 7, lazy: 4, jumped: 3, brown: 0, over: 5, quick: 6, dog: 1, fox: 2

Now, let us re-arrange the words according to their token numbers.

brown, dog, fox, jumped, lazy, over, quick, the

Once the tokens are re-arranged, we need to encode the sentence (the input data). That means
counting how many times each word occurs in the sentence; for example, the word “brown” occurs once.

[brown, dog, fox, jumped, lazy, over, quick, the]

[ 1, 1, 1, 1, 1, 1, 1, 2]

As “the” occurs twice, it is encoded as 2. This process is repeated for all the input
data.
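The counting step can be tried out in a few lines of Python. This is only a minimal sketch using the standard library; the helper names `tokenize`, `build_vocabulary`, and `encode` are made up for illustration and are not from any particular library.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Give each unique word an index, assigned in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def encode(text, vocabulary):
    """Return a list of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    ordered_words = sorted(vocabulary, key=vocabulary.get)
    return [counts[word] for word in ordered_words]

sentence = "The quick brown fox jumped over the lazy dog"
vocabulary = build_vocabulary([sentence])
vector = encode(sentence, vocabulary)

print(vocabulary)  # brown gets 0, dog gets 1, ..., the gets 7
print(vector)      # [1, 1, 1, 1, 1, 1, 1, 2]
```

A new sentence is encoded against the same vocabulary, so `encode("The dog", vocabulary)` gives a count of 1 for “dog” and “the” and 0 everywhere else.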

Step 3: TF and IDF

Term frequency (TF): This summarizes how often a given word appears within a document.

Inverse document frequency (IDF): This downscales words that appear in many documents.

To understand this, let us add two more sentences to our example, so that we have three documents:

“The quick brown fox jumped over the lazy dog”

“The dog”

“The fox”

Using the IDF formula, a weight is calculated for each word. Here, the word “the” appears
4 times in total and occurs in all three documents. The weights, in the same order as the
tokens (brown, dog, fox, jumped, lazy, over, quick, the), are:

[1.69, 1.28, 1.28, 1.69, 1.69, 1.69, 1.69, 1]

The last word, “the”, has the lowest weight, so it is the least important according to IDF.
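The notes do not say which IDF formula was used. The sketch below assumes the smoothed form idf = ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df is the number of documents that contain the word. This is the default in scikit-learn's TfidfVectorizer, and it roughly reproduces the weights listed above. The function name `smoothed_idf` is made up for illustration.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def smoothed_idf(documents):
    """Compute a smoothed IDF weight for every word in the documents.

    Assumed formula: idf = ln((1 + n) / (1 + df)) + 1
    """
    n_docs = len(documents)
    doc_words = [set(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*doc_words))
    idf = {}
    for word in vocabulary:
        # df: number of documents in which the word appears at least once
        df = sum(1 for words in doc_words if word in words)
        idf[word] = math.log((1 + n_docs) / (1 + df)) + 1
    return idf

documents = [
    "The quick brown fox jumped over the lazy dog",
    "The dog",
    "The fox",
]
idf = smoothed_idf(documents)

for word, weight in idf.items():
    print(f"{word}: {weight:.2f}")
```

Words that occur in only one document (such as “brown”) get the highest weight, words in two documents (“dog”, “fox”) get a lower one, and “the”, which occurs in every document, gets the minimum weight of 1.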

Step 4: After all these steps, a machine-learning algorithm is used to train a model on the encoded text.
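The notes do not name a specific algorithm for this step. As one possible illustration, here is a minimal sketch of a multinomial Naive Bayes classifier, a common choice for word-count data, written with the standard library only. The training sentences, the labels, and the function names `train_naive_bayes` and `predict` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(texts, labels):
    """Collect per-class word counts and class counts from labelled text."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocabulary = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            count = word_counts[label][token] + 1  # add-one (Laplace) smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training data
texts = [
    "the team won the match",
    "the striker scored a late goal",
    "the minister announced a new policy",
    "parliament voted on the election bill",
]
labels = ["sports", "sports", "politics", "politics"]

model = train_naive_bayes(texts, labels)
print(predict(model, "the team scored a goal"))
print(predict(model, "parliament passed the bill"))
```

With real data you would train on many more examples per category; four sentences are only enough to show the mechanics.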

Step 5: Test the trained model on new text data.
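One simple way to test a model is to give it text it has not seen before and measure how often its answer matches the correct label (accuracy). The sketch below assumes the model is available as a function that takes a text and returns a label. The `keyword_classifier` here is only a hypothetical stand-in for a trained model, and the test sentences are made up.

```python
def accuracy(predict, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(1 for text, label in zip(texts, labels) if predict(text) == label)
    return correct / len(labels)

def keyword_classifier(text):
    """Hypothetical stand-in for a trained model: looks for sports keywords."""
    sports_words = {"match", "goal", "team"}
    tokens = set(text.lower().split())
    return "sports" if tokens & sports_words else "politics"

# Hypothetical held-out test data that was not used for training
test_texts = [
    "the team won the match",
    "the election results are in",
    "a late goal decided the game",
    "the coach praised the players",
]
test_labels = ["sports", "politics", "sports", "sports"]

score = accuracy(keyword_classifier, test_texts, test_labels)
print(f"accuracy: {score:.2f}")  # accuracy: 0.75
```

The last sentence is about sports but contains none of the keywords, so the stand-in model gets it wrong. Testing on unseen data is what reveals this kind of mistake.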

This is how text classification works.

Text Classification steps:
