Nitsa Herzog's Guide to Text Analysis

Uploaded by

priteshbari

Text Analysis

Dr Nitsa Herzog

Nitsa Herzog Text Analysis 1

What is Text Analysis

• The process of computationally retrieving information from text, such as
books, articles, emails, speeches, and social media posts.
• Text analysis or text analytics refers to the representation, processing, and
modelling of textual data to derive useful insights.
• Text analysis uses natural language processing (NLP) to transform the free
(unstructured) text in documents and databases into normalized,
structured data suitable for analysis or to drive machine learning (ML)
algorithms.
• Text analysis often suffers from the curse of high dimensionality, where
each word is a dimension.


Natural Language Processing

• Natural language processing (NLP) combines computational linguistics,
machine learning, and deep learning models to process human language.
• Computational linguistics is the science of understanding and constructing
human language models with computers and software tools.

NLP Business Applications
• Text to speech: Converting written text into natural-sounding speech
• Chatbots: Helping chatbots understand and respond to customer inquiries
• Urgency detection: Analyzing language to prioritize tasks
• Natural language understanding: Converting speech to text and analyzing its intent
• Autocorrect: Detecting and removing text errors and suggesting alternatives
• Sentiment analysis: Revealing the perceptions people have of your goods and services and those of your competitors
• Speech recognition: Powering applications that understand users’ voices and decipher their meaning

NLP Steps
• Lexical analysis
• Syntactic analysis
• Semantic analysis
• Discourse integration
• Pragmatic analysis

NLP Steps – Lexical Analysis
• Lexicon describes the understandable vocabulary that makes up a language.
• Lexical analysis deciphers and segments language into units—or lexemes—like
paragraphs, sentences, phrases, and words.
• NLP algorithms categorize words into parts of speech (POS) and split lexemes into
morphemes—meaningful language units that you can’t further divide.
Example
When performing a lexical analysis on the paragraph, the analysis isolates and
divides the first sentence into lexeme phrases, like “the understandable
vocabulary that makes up a language.”
This analysis further divides the phrase into word lexemes, like “vocabulary” and
“language,” categorizing both as noun POS.
Then, the analysis derives free morphemes, like “words,” “vocabulary,” and
“understand-,” and bound morphemes, like “-able.”
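The segmentation step described above can be sketched in a few lines. This is only a rough illustration using regular expressions; the function names and the splitting rules are assumptions made for this example, and real lexical analysers handle abbreviations, morphemes, and POS categories far more carefully.

```python
import re

def split_sentences(text):
    # Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    # Word lexemes are runs of letters, digits, or apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", sentence)

paragraph = (
    "Lexicon describes the vocabulary of a language. "
    "Lexical analysis segments it into units."
)
sentences = split_sentences(paragraph)
words = [split_words(s) for s in sentences]
```

Here `sentences` holds the two sentence lexemes and `words` holds the word lexemes of each sentence.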
NLP Steps – Syntactic Analysis
• Syntax describes how a language’s words and phrases are arranged to
form sentences. Syntactic analysis checks word arrangements for
proper grammar.
Example
The sentence “Dave wrote the paper” passes a syntactic analysis check because it’s grammatically
correct. Conversely, a syntactic analysis categorizes a sentence like “Dave do jumps” as syntactically
incorrect.


NLP Steps – Semantic Analysis
• Semantics describes the meaning of words, phrases, sentences, and
paragraphs.
• Semantic analysis attempts to understand the literal meaning of
individual language selections, not syntactic correctness.
• However, a semantic analysis doesn’t check language data before and
after a selection to clarify its meaning.
Example
“Manhattan calls out to Dave” passes a syntactic analysis because it’s a grammatically
correct sentence.
However, it fails a semantic analysis, because Manhattan is a place (and can’t literally call out
to people), and the sentence’s meaning doesn’t make sense.
NLP Steps – Discourse Integration
• Discourse describes communication between two or more individuals.
• Discourse integration analyzes prior words and sentences to
understand the meaning of ambiguous language.
Example
If one sentence reads, “Manhattan speaks to all its people,” and the following sentence reads, “It
calls out to Dave,” discourse integration checks the first sentence for context to understand that “It”
in the latter sentence refers to Manhattan.


NLP Steps – Pragmatic Analysis
• Pragmatics describes the interpretation of language’s intended
meaning.
• Pragmatic analysis attempts to derive language’s intended—not
literal—meaning.
Example
A pragmatic analysis can uncover the intended meaning of “Manhattan speaks to all its
people.”
Methods like neural networks assess the context to understand that the sentence isn’t
literal, and most people won’t interpret it as such.
A pragmatic analysis deduces that this sentence is a metaphor for how people emotionally
connect with places.

Natural Language Processing Approach

Natural Language Understanding – analyses the syntactic structure of language and derives semantic meaning.
Examples
○ Speech Recognition
○ Named Entity Recognition
○ Text Classification

Natural Language Generation – produces natural written or spoken language from structured and unstructured data.
Examples
○ Text Generation (a college essay written by PaLM or GPT)
○ Speech Generation (found in virtual assistants)
○ Question Answering

Natural Language Generation: Humor
Berkeley School of Information (2020):
“Our analysis suggests that the state-of-the-art models perform well for
classifying jokes, but quality is uneven when generating. Combined with
what we think is a Clever Hans effect in our classification results, we
think deep learning does not "understand" humour but does pick up on
the patterns of jokes and puns.”


Natural Language Generation: Humor
Screengrab from Google I/O 2022 Keynote with Alphabet CEO Sundar Pichai explaining the PaLM AI model and its ability to understand jokes.
Google shows how PaLM understands a novel joke not found on the internet.

What is Text Mining
• Text mining is a component of text analysis.
• Text Mining is the process of
transforming unstructured text into a
structured format to identify
meaningful patterns and new insights.
• Text mining discovers relationships and specific
patterns in large text collections.
• The text can be obtained from the web using
web scrapers or web crawlers.

Text Mining and Web Search
Text mining is different from traditional web search.
• In search, the user is typically looking for something already known
and written by someone else.
• In text mining, the user has to sift through all the material that is currently
irrelevant to their needs in order to find the information.

Text Mining and Information Extraction
Text mining is different from information extraction.
• Information Extraction (IE) is about getting facts out of unstructured
information.
• There are programs that can, with reasonable accuracy, extract information
from text such as names of people, organisations, locations and so on, and
find relations between them (e.g. John works for the BBC).
• IE is a major component of text mining, but it doesn't tell the whole story.
For example,
in a criminal investigation, finding the facts (names of witnesses, alibis for the night of the
murder) is like the IE component. Text mining is similar to the process of deducing
who could or could not have committed the murder.
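
To make the "John works for the BBC" relation concrete, the following is a minimal sketch of rule-based information extraction. The pattern, the `works_for` label, and the function name are hypothetical choices for this illustration; practical IE systems rely on named entity recognition and learned relation extractors rather than a single regular expression.

```python
import re

# Hypothetical pattern: "<Person> works for [the] <Organisation>",
# where both names start with a capital letter.
WORKS_FOR = re.compile(r"\b([A-Z][a-z]+) works for (?:the )?([A-Z][A-Za-z]+)\b")

def extract_employment(text):
    # Return (person, relation, organisation) triples found in the text.
    return [(m.group(1), "works_for", m.group(2)) for m in WORKS_FOR.finditer(text)]

facts = extract_employment("John works for the BBC. Mary works for Acme.")
# facts == [("John", "works_for", "BBC"), ("Mary", "works_for", "Acme")]
```

The extracted triples are the "facts"; deciding what follows from them is the part the slide attributes to text mining.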
Text Mining Stages
• Document selection and filtering (IR – information retrieval techniques): Involves identifying and retrieving potentially relevant documents from a large set (e.g., the web) to reduce the search space. Standard or semantically-enhanced IR techniques can be used for this.
• Document pre-processing (NLP – natural language processing techniques): Involves cleaning and preparing the documents, e.g. removal of extraneous information, error correction, spelling normalisation, tokenisation, Part-of-speech (POS) tagging, etc.
• Document processing (NLP / ML / statistical techniques): Consists of information extraction (Named entity recognition (NER), relation/event recognition, etc.) and potentially opinion mining.

Challenges in Text Mining
Data collection is “free text” – data is not well-organized
• Semi-structured (web pages, server logs, social network APIs) or unstructured (texts, news articles, emails)
Natural language text contains ambiguities on many levels
• Lexical, syntactic, semantic, and pragmatic
Learning techniques for processing text typically need annotated training examples
• Expensive to acquire at scale

What are Corpora
A corpus (plural: corpora) is a large collection of texts used for various purposes in Natural Language Processing.

Example Corpora in NLP

Text Preprocessing

Text Preprocessing (2017)
The first step in text analysis is preprocessing (cleaning) the corpus:
● Tokenize: parse documents into smaller units, such as words or phrases
● Remove stop words (e.g., a, the, and, etc.) and punctuation
● Use stemming and lemmatisation: standardize words with similar meaning
● Normalize: convert to lowercase (carefully: e.g., US vs. us)
! Preprocessing should be customized based on the type of corpus.
! Tweets should be preprocessed differently than academic texts.
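
A possible sketch of these preprocessing steps, using only the Python standard library, is given below. The stop-word list and the `keep_case` set are tiny assumptions chosen for the example (the latter illustrates the "US vs. us" caveat), and stemming or lemmatisation is left out here.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "is", "to"}

def preprocess(text, keep_case=frozenset({"US"})):
    # Tokenize: keep alphabetic runs, which also drops punctuation and digits.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Normalize: lowercase everything except tokens whose case carries meaning.
    normalised = [t if t in keep_case else t.lower() for t in tokens]
    # Remove stop words.
    return [t for t in normalised if t not in STOP_WORDS]

cleaned = preprocess("The US is asking us to analyse the tweets, and the papers.")
# cleaned == ["US", "asking", "us", "analyse", "tweets", "papers"]
```

A corpus of tweets would likely need a different tokenizer (hashtags, mentions, emojis), which is the customization the notes above call for.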

Text Preprocessing (2023)
The first step in text analysis is preprocessing (cleaning) the corpus:
● Lower casing
● Removal of Punctuations
● Removal of Stopwords
● Removal of Frequent words
● Removal of Rare words
● Stemming
● Lemmatization
● Removal of emojis (pictogram, logogram, ideogram, or smiley embedded in text and used in electronic messages and web pages – Wikipedia)
● Removal of emoticons (short for "emotion icon", is a pictorial representation of a facial expression using characters, usually punctuation marks, numbers, and letters, to express a person's feelings, mood, or reaction without needing to describe it in detail – Wikipedia)
● Conversion of emoticons to words
● Conversion of emojis to words
● Removal of URLs
● Removal of HTML tags
● Chat word conversions
● Spelling correction

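Several of the web-oriented steps in this list (HTML tag removal, URL removal, emoticon conversion, chat word conversion) can be sketched as follows. The two lookup tables are hypothetical and deliberately tiny; a real pipeline would use much larger dictionaries and a proper HTML parser.

```python
import re

# Hypothetical lookup tables, invented for this illustration.
EMOTICONS = {":)": "smile", ":(": "sad"}
CHAT_WORDS = {"brb": "be right back", "imo": "in my opinion"}

def clean(text):
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    for emoticon, word in EMOTICONS.items():   # convert emoticons to words
        text = text.replace(emoticon, f" {word} ")
    # Expand chat words and collapse repeated whitespace.
    words = [CHAT_WORDS.get(w.lower(), w) for w in text.split()]
    return " ".join(words)

result = clean("<p>brb :) see https://example.com</p>")
# result == "be right back smile see"
```

Tags are stripped before URLs in this sketch so that a closing tag glued to the end of a link is not swallowed by the URL pattern.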
Part-of-Speech (POS) Tagging
POS tagging aims to build a model whose input is a sentence and whose output is a sequence of tags, one per word.
Example:
• he saw a fox
Each tag marks the POS for the corresponding word, such as:
• PRP VBD DT NN (according to the Penn Treebank POS tags)
The four words are mapped to
• pronoun (personal), verb (past tense), determiner, and noun (singular).
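
The input/output shape of a tagger can be illustrated with a toy dictionary lookup. This is not a trained model: the lexicon below is a hand-written assumption covering only a few words, and the fallback tag `NN` is an arbitrary default, so the sketch cannot resolve words that take different tags in different contexts.

```python
# Hand-written toy lexicon mapping words to Penn Treebank tags (an assumption).
LEXICON = {
    "he": "PRP", "she": "PRP",
    "saw": "VBD", "ran": "VBD",
    "a": "DT", "the": "DT",
    "fox": "NN", "dog": "NN",
}

def tag(sentence, default="NN"):
    # Return one (word, tag) pair per whitespace-separated word.
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]

tags = tag("he saw a fox")
# tags == [("he", "PRP"), ("saw", "VBD"), ("a", "DT"), ("fox", "NN")]
```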

Stemming & Lemmatization
• Goal: standardize words with a similar meaning
• Stemming reduces words to their base, or root, form. A well-known rule-based stemming algorithm is Porter’s stemming algorithm.
• Lemmatization makes words grammatically comparable (e.g., am, are, is → be)
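
The difference between the two can be sketched with a toy example. The suffix list below is a crude stand-in and is not Porter's algorithm, and the lemma table is a hand-written assumption covering only forms of "be"; both are for illustration only.

```python
# Crude suffix stripping (NOT Porter's algorithm); suffixes are tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hand-written lemma dictionary covering a few forms of "be" (an assumption).
LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}

def stem(word):
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    # Dictionary lookup; unknown words are returned unchanged.
    return LEMMAS.get(word, word)

stems = [stem(w) for w in ["jumping", "jumped", "jumps"]]   # all become "jump"
lemmas = [lemmatize(w) for w in ["am", "are", "is"]]        # all become "be"
```

Stemming works on the surface form and can produce non-words, whereas lemmatization maps to a dictionary form and therefore needs lexical knowledge.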

Normalization – Case Folding and Removals
Examples:
• make all words lowercase
• remove any punctuation
• remove unwanted tags

Normalization - Tokenization
Split up a document into tokens
• Common tokens
  • Words: e.g., “hello”, “blue”, “no”, “laptop”, etc.
  • Punctuation: e.g., “.”, “,”, “!”, “?”, quotation marks, etc.
• Other tokens
  • Replace a very uncommon word with an unknown token: <UNKNOWN>
  • End sentences (or sentence-like structures) with a stop token: <STOP>
  • Replace all numbers with a single token: e.g., 100 → <NUM>
  • Replace common words (“a”, “the”, etc.) with <SWRD>
Example
“The dog ran in the park joyously!” →
“<SWRD>”, “dog”, “ran”, “<SWRD>”, “<SWRD>”, “park”, “<UNKNOWN>”, “!”, “<STOP>”
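
The token replacement rules above can be sketched as a small function. The stop-word set and the known vocabulary are assumptions picked so that the sketch reproduces the slide's example; in practice the vocabulary would come from corpus frequency counts.

```python
import re

STOP_WORDS = {"a", "the", "in"}       # assumed stop-word list ("in" is mapped in the example)
VOCABULARY = {"dog", "ran", "park"}   # assumed set of sufficiently common words

def tokenize(sentence):
    tokens = []
    # Split into numbers, words, and single punctuation characters.
    for piece in re.findall(r"\d+|\w+|[^\w\s]", sentence):
        lower = piece.lower()
        if piece.isdigit():
            tokens.append("<NUM>")
        elif lower in STOP_WORDS:
            tokens.append("<SWRD>")
        elif lower in VOCABULARY or not piece.isalnum():
            tokens.append(lower)          # known word or punctuation mark
        else:
            tokens.append("<UNKNOWN>")    # word outside the vocabulary
    tokens.append("<STOP>")               # mark the end of the sentence
    return tokens

tokens = tokenize("The dog ran in the park joyously!")
# ['<SWRD>', 'dog', 'ran', '<SWRD>', '<SWRD>', 'park', '<UNKNOWN>', '!', '<STOP>']
```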

Text Modelling


Text Modelling
• Text modelling is based on topic modelling
• Topic modelling is a type of statistical
modelling that uses unsupervised Machine
Learning to identify clusters or groups of
similar words within a body of text.
• Topic modelling in NLP is a set of
algorithms that can be used to automatically
summarise a large corpus of texts.
• Text modelling can be represented by a
vector of counts or features for each
distinct word.

Bag-of-Words (BOW) Model
Represents a corpus as an unordered set of
word counts, ignoring stop words.

Example
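
As a minimal sketch of the representation, the snippet below counts words with `collections.Counter` after dropping stop words. The stop-word set and the sample sentence are assumptions made for this illustration.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "the", "and", "in"}  # tiny illustrative list (an assumption)

def bag_of_words(text):
    # Lowercase, split into words, and count everything that is not a stop word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

bow = bag_of_words("The dog ran in the park and the dog barked")
# bow["dog"] == 2; "ran", "park" and "barked" each have count 1
```

Because a `Counter` has no notion of position, any reordering of the same words yields an identical bag.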


Bag-of-Words (BOW) Model
BOW is represented by Term Frequency (TF).
Term frequency represents the weight of each term in a document, and it is proportional to the number of occurrences of the term in that document.
The figure shows the 50 most frequent words and the number of occurrences from Shakespeare’s Hamlet.
The frequency of a word is roughly inversely proportional to its rank in the frequency table (Zipf's law).

Word Clouds

Visualizes words in a document with sizes proportional to how
frequently the words are used

BOW – Final Notes
• Bag-of-words takes quite a naïve approach, as order plays an important role in the semantics of text.
• With bag-of-words, many texts with different meanings are combined into one form.
For example,
The texts “a dog bites a man” and “a man bites a dog” have very different meanings, but they share the same representation with bag-of-words.
• The bag-of-words technique oversimplifies the problem, but it is still considered a good approach to start with and is widely used for text analysis.

Morphological Features
• Besides extracting the terms, their morphological features may need to be included.
• The morphological features specify additional information about the terms, which may include root words, affixes, part-of-speech tags, named entities, or intonation (variations of spoken pitch).
• The features from this step contribute to the downstream analysis in classification or sentiment analysis.

N-gram Model
N-grams are contiguous sequences of words, symbols, or tokens in a document
(corpus).
In technical terms, they can be defined as the neighbouring sequences of items in a
document.
In N-grams, unlike in BOW, the word order is important.
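
A short sketch can show why order matters here. It reuses the two sentences from the bag-of-words notes; the `ngrams` helper is a name chosen for this example rather than a library function.

```python
def ngrams(tokens, n):
    # Slide a window of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = "a dog bites a man".split()
b = "a man bites a dog".split()

same_bag = sorted(a) == sorted(b)    # True: identical word multisets
bigrams_a = ngrams(a, 2)  # [('a','dog'), ('dog','bites'), ('bites','a'), ('a','man')]
bigrams_b = ngrams(b, 2)  # [('a','man'), ('man','bites'), ('bites','a'), ('a','dog')]
```

The bigram ('dog', 'bites') occurs only in the first sentence and ('man', 'bites') only in the second, so the two texts are no longer indistinguishable.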

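TF-IDF: Term frequency-Inverse document frequency

Term frequency: tf(t, d) = (number of times term t appears in document d) / (total number of terms in d)

Inverse document frequency: idf(t, D) = log(N / n_t), where N is the number of documents in the corpus D and n_t is the number of documents that contain t

tfidf(t, d, D) = tf(t, d) * idf(t, D)

Term frequency (TF) measures how common a word is within a document, and inverse document frequency (IDF) measures how unique or rare a word is across the corpus.
TF-IDF is useful for decreasing the weight of common, low-information words.
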
TF-IDF: Example
Consider a document containing 100 words wherein the word “apple”
appears 5 times. The term frequency (i.e., TF) for apple is then (5 / 100)
= 0.05.
Now, assume we have 10 million documents, and the word “apple”
appears in 1,000 of these. Then, the inverse document frequency (i.e.,
IDF) is calculated with a base-10 logarithm as log10(10,000,000 / 1,000) = 4.
Thus, the TF-IDF weight is the product of these quantities: 0.05 * 4 =
0.20.
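
The arithmetic of this example can be checked with a few lines of Python. The function names are chosen for this sketch, and the base-10 logarithm is assumed because it is the base that yields the value 4.

```python
import math

def tf(term_count, total_terms):
    # Share of the document's words that are the term.
    return term_count / total_terms

def idf(n_docs, n_docs_with_term):
    # Base-10 logarithm of (corpus size / documents containing the term).
    return math.log10(n_docs / n_docs_with_term)

tf_apple = tf(5, 100)                # 0.05
idf_apple = idf(10_000_000, 1_000)   # log10(10,000) = 4
tfidf_apple = tf_apple * idf_apple   # approximately 0.20
```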

TF, DF and IDF: Example (from Brown corpus’s news category)
• TFIDF gives higher scores to words that appear more often in a document but occur less often across all documents in the corpus.
• Note that TFIDF applies to a term in a specific document, so the same term will likely receive different TFIDF scores in different documents (because the TF values may differ).
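
To illustrate that the same term receives different scores in different documents, here is a small sketch over a made-up four-document corpus. The corpus is not taken from the Brown corpus, and the `tfidf` helper, with its base-10 logarithm, is an assumption of this example.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus):
    # tf: relative frequency of the term in this document.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # df: number of documents in the corpus that contain the term.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log10(len(corpus) / df)

# Made-up toy corpus; each inner list is one tokenized document.
corpus = [
    "apple pie with apple sauce".split(),
    "apple and banana smoothie".split(),
    "banana bread recipe".split(),
    "the bread was good".split(),
]

score_doc0 = tfidf("apple", corpus[0], corpus)  # tf = 2/5
score_doc1 = tfidf("apple", corpus[1], corpus)  # tf = 1/4
score_doc2 = tfidf("apple", corpus[2], corpus)  # term absent, so 0.0
```

The IDF factor is the same for all three scores; only the TF factor changes, which is why the first document scores highest.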

TF-IDF (Example)
• TFIDF can be used to highlight
the informative words in the
reviews.
• The figure shows a subset of the
reviews in which each word with
a larger font size corresponds to
a higher TFIDF value.
• Each review is considered a
document.


Review Categorization with LDA
Topic models such as LDA can categorize the reviews into topics.

The topics of 5-star reviews

Review Categorization with LDA

The topics of 1-star reviews

Text Analysis and Classification

Document Classification
• Separates papers according to the authors and topics
• Performs text analysis of the paper
  • word frequency,
  • distribution,
  • patterns,
  • and meaning.

ML Methods in Text Analysis

Supervised
• Classic ML (the most popular: Naive Bayes (NB), Support Vector Machine (SVM))

Unsupervised
• Clustering
• Deep Learning (the most popular: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN))

Document Clustering
Topic modelling: assign topics (politics, sports, fashion, etc.) to documents (e.g., articles or web pages)

Biomedical articles clustering. Source: [Link]

Spam Detection (example)


Spam Detection (cont.)
• An email that contains the words hello and friend, but not money and
password:
8 0.0024
3 0.06831


Spam Detection (cont.)
• An email that contains the words hello, money, and password:
4 28 0.00336
1 03 0.0010692
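
The kind of calculation behind these two emails can be sketched with a Naive Bayes classifier over word presence and absence. All probabilities below are hypothetical values invented for this illustration; they are not the figures from the slides, and the function names are likewise assumptions.

```python
# Hypothetical probabilities, invented for illustration (not the slide's values).
P_SPAM = 0.4
P_WORD_GIVEN_SPAM = {"hello": 0.3, "friend": 0.2, "money": 0.8, "password": 0.7}
P_WORD_GIVEN_HAM = {"hello": 0.6, "friend": 0.5, "money": 0.1, "password": 0.1}

def joint(words_present, prior, likelihoods):
    # Naive Bayes assumption: words are independent given the class.
    score = prior
    for word, p in likelihoods.items():
        score *= p if word in words_present else (1 - p)
    return score

def spam_probability(words_present):
    spam = joint(words_present, P_SPAM, P_WORD_GIVEN_SPAM)
    ham = joint(words_present, 1 - P_SPAM, P_WORD_GIVEN_HAM)
    return spam / (spam + ham)   # normalise the two joint scores

p_friendly = spam_probability({"hello", "friend"})             # low spam probability
p_phishing = spam_probability({"hello", "money", "password"})  # high spam probability
```

With these made-up numbers the first email is classified as not spam and the second as spam, because "money" and "password" are far more likely under the spam class.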


End of Lecture


