
NLP

The document provides an introduction to Natural Language Processing (NLP) and its applications, including text summarization, sentiment analysis, and chatbots. It discusses the Natural Language Toolkit (nltk) in Python, which offers tools and datasets for processing human language data. Additionally, it covers concepts like tokenization, bag-of-words models, and sentiment analysis using a Naive Bayes Classifier.

Uploaded by

vagifsamadov2003


11/23/2023

Natural Language Processing: nltk

Intro to Natural Language Processing (NLP)

▪ Algorithms to analyze, understand and derive meaning from human language
▪ A hard computational problem, because human language is ambiguous and understanding it requires context and the ability to link concepts
▪ Applications: summarizing text, generating keywords, identifying the sentiment of a text


Real-life examples of NLP

▪ Speech recognition engines like Siri, Google Now or Alexa
▪ Automatic translation like Google Translate or Facebook's automatic translation of statuses
▪ Chatbots that can answer questions via Facebook Messenger, for example those provided by TechCrunch, Disney or Whole Foods

nltk

Natural Language Toolkit in Python

▪ Work with human language data
▪ Includes over 50 datasets
▪ Complete library of easy-to-use algorithms for processing text
▪ Available for free under an open source license

http://nltk.org


Natural Language Processing: nltk corpora

A corpus (plural: corpora) is a collection of texts in digital form, assembled for text processing.

nltk provides a download interface to pre-processed text datasets.


List of nltk corpora

nltk movie reviews corpus

nltk.download("movie_reviews")

ls nltk_data/corpora/movie_reviews

README neg pos

2000 files:

▪ 1000 positive reviews in the pos/ folder

▪ 1000 negative reviews in the neg/ folder

▪ average 800 words per review

Natural Language Processing: tokenize


Tokenization

The first step in analyzing text is to split it into words. This step is called tokenization.

Corner cases:

▪ punctuation
▪ contractions
▪ hyphenated words

Example: "New York-based"

First Attempt without nltk

Naively, just split on whitespace.

See Tokenize text in words
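A minimal sketch of the naive approach, using only Python's built-in `str.split`; the sample sentence is made up to hit the corner cases listed above.

```python
def naive_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace only; punctuation stays attached to words."""
    return text.split()


sample = "I didn't like the New York-based studio's movie."
tokens = naive_tokenize(sample)
print(tokens)
# ['I', "didn't", 'like', 'the', 'New', 'York-based', "studio's", 'movie.']
```

The last token keeps its period ('movie.') and the contractions stay glued together, which is exactly what a dedicated tokenizer has to handle.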


Tokenize with nltk

nltk.word_tokenize

A sophisticated tokenizer specific to English; it requires the punkt corpus (the punkt sentence tokenizer models).

It also correctly separates punctuation into its own tokens.
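The same kind of sentence through `nltk.word_tokenize`, as a sketch assuming nltk is installed. Passing `preserve_line=True` (an existing keyword argument) skips the punkt sentence-splitting step, so this example runs without downloading punkt; with the default, the text is first split into sentences with punkt.

```python
import nltk

sample = "I didn't like the New York-based studio's movie."

# preserve_line=True treats the whole input as one sentence, so punkt is not needed
tokens = nltk.word_tokenize(sample, preserve_line=True)
print(tokens)
```

Compared with a plain whitespace split, the final period becomes its own token and contractions are split (did + n't, studio + 's), while "York-based" stays intact.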

Natural Language Processing: build a bag-of-words model


Bag-of-words Model

Bag-of-words = text as an unordered collection of words

▪ simple model
▪ discards sentence structure
▪ useful to identify topic or sentiment
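A quick standard-library illustration of what "discards sentence structure" means: two made-up sentences with opposite meanings produce exactly the same bag of words.

```python
from collections import Counter


def bag_of_words(tokens):
    """Map each word to its count, throwing away word order."""
    return Counter(tokens)


a = bag_of_words("the plot is not good , it is bad".split())
b = bag_of_words("the plot is not bad , it is good".split())
print(a == b)  # True: opposite sentences, identical bags
```

This is why the model works well for topic or overall sentiment, where individual word choice matters most, but cannot capture negation or word order.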

Building Features with Words

           outstanding  movie  family  worse  uninvolving  interesting
Review 1   True         True   False   False  False        False
Review 2   False        True   False   True   True         False
Review 3   True         True   True    False  False        False
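A sketch of how rows like these can be produced. The helper `build_bag_of_words_features` is a hypothetical name chosen for this example; it marks every word present in a review as `True`, and words absent from the review read as `False`.

```python
VOCABULARY = ["outstanding", "movie", "family", "worse", "uninvolving", "interesting"]


def build_bag_of_words_features(words):
    """Hypothetical helper: one presence feature (True) per word in the review."""
    return {word: True for word in words}


def feature_row(features, vocabulary=VOCABULARY):
    """Read the features back as a table row; absent words count as False."""
    return [features.get(word, False) for word in vocabulary]


review = "an outstanding movie".split()
print(feature_row(build_bag_of_words_features(review)))
# [True, True, False, False, False, False]
```

Only words actually present are stored, so each review's feature dictionary stays small even when the vocabulary is large.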


Filter out Stopwords and Punctuation

• The movie_reviews tokenized words also include punctuation and stopwords.
• Stopwords are very common words that carry little meaning on their own, like "the", "is", "which".

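A minimal filtering sketch. Instead of downloading nltk's English stopword list (`nltk.corpus.stopwords.words("english")` after `nltk.download("stopwords")`), it uses a tiny hand-picked set, a hypothetical stand-in kept small so the example is self-contained; punctuation comes from Python's `string.punctuation`.

```python
import string

# Hypothetical, deliberately tiny stopword set; swap in
# nltk.corpus.stopwords.words("english") for real work
STOPWORDS = {"the", "is", "which", "a", "an", "and", "of", "it"}
USELESS = STOPWORDS | set(string.punctuation)


def filter_words(words):
    """Drop stopwords (case-insensitive) and single-character punctuation tokens."""
    return [w for w in words if w.lower() not in USELESS]


print(filter_words(["The", "movie", ",", "which", "is", "great", "."]))
# ['movie', 'great']
```

Note that `string.punctuation` holds single characters, so multi-character tokens such as "--" would survive this filter.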
Natural Language Processing: Plotting Frequency of Words


Number of Words in Movie Reviews Corpus

▪ ~1.6 million words

▪ only ~710 thousand after filtering out punctuation and stopwords

Using Counter

▪ Part of the collections package in the Python Standard Library
▪ Counts how many times each item is repeated

from collections import Counter
counter = Counter(filtered_words)
counter["movie"]
5771
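To show what else `Counter` offers beyond single lookups, here is a toy run; the word list is made-up stand-in data, not the filtered corpus.

```python
from collections import Counter

# Toy stand-in for the filtered corpus words (hypothetical data)
filtered_words = ["movie", "plot", "movie", "film", "plot", "movie"]
counter = Counter(filtered_words)

print(counter["movie"])        # 3
print(counter.most_common(2))  # [('movie', 3), ('plot', 2)]
print(counter["zombie"])       # 0: missing words count as zero, no KeyError
```

`most_common(n)` is the usual way to list the top words before plotting their frequencies.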


Plotting Word Frequency

Histogram of Word Counts

▪ Use hist from matplotlib to create a histogram
▪ Choose the number of bins and, optionally, logarithmic axes
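A sketch of the histogram step, assuming matplotlib is installed; the word list is toy stand-in data. Each value plotted is how often one distinct word occurs, so the histogram shows how many words are rare versus frequent.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt

# Toy stand-in for the filtered corpus words (hypothetical data)
words = "a a a a b b b c c d e f".split()
counts = list(Counter(words).values())  # one count per distinct word: [4, 3, 2, 1, 1, 1]

# bins sets the number of bars; log=True puts the y axis on a log scale
n, bins, patches = plt.hist(counts, bins=4, log=True)
plt.xlabel("Number of occurrences")
plt.ylabel("Number of distinct words")
# plt.show() in a notebook, or plt.savefig("word_counts.png") in a script
```

For a logarithmic x axis as well, one would typically pass log-spaced bin edges (for example from `numpy.logspace`) and call `plt.xscale("log")`.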


Natural Language Processing: Sentiment Analysis

What is Sentiment Analysis?

▪ Identify the attitude or emotion expressed in a text
▪ Can be implemented as a machine learning classifier
▪ Example: predict whether a review is positive or negative from the words that appear in it


Build features/label pairs

• The feature-building function implemented previously turns each review into a set of features.

• For each review, create a pair of its features and its label (positive or negative).
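A sketch of building the (features, label) pairs. The feature helper is a hypothetical stand-in for the function from the earlier slides, and the tokenized reviews are toy data; with the real corpus, each review's tokens would come from `movie_reviews.words(fileid)` for the fileids in the "pos" and "neg" categories.

```python
def build_bag_of_words_features(words):
    """Hypothetical stand-in for the feature function from the earlier slides."""
    return {word: True for word in words}


# Toy tokenized reviews (hypothetical data, not from movie_reviews)
positive_reviews = [["outstanding", "movie"], ["interesting", "family", "movie"]]
negative_reviews = [["worse", "movie"], ["uninvolving", "movie"]]

pos_features = [(build_bag_of_words_features(r), "pos") for r in positive_reviews]
neg_features = [(build_bag_of_words_features(r), "neg") for r in negative_reviews]
labeled = pos_features + neg_features

print(labeled[0])  # ({'outstanding': True, 'movie': True}, 'pos')
```

A list of `(dict, label)` tuples is the input format that nltk's classifiers expect for training.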

Naive Bayes Classifier

The Naive Bayes Classifier is a simple classifier based on conditional probabilities.

In the training phase, it estimates the probability that each feature (word) appears in each category (positive or negative).

Once trained, it collects the "votes" of all the words in a new review and picks the most probable label. It is "naive" because it assumes the words are independent of each other given the label.
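A sketch of training and using nltk's `NaiveBayesClassifier` on a handful of toy (features, label) pairs, assuming nltk is installed (no corpus download is needed). The data is made up; a real run would train on pairs built from the movie reviews and keep some reviews aside for testing.

```python
import nltk

# Toy (features, label) pairs (hypothetical data)
train_set = [
    ({"outstanding": True, "movie": True}, "pos"),
    ({"interesting": True, "family": True, "movie": True}, "pos"),
    ({"worse": True, "movie": True}, "neg"),
    ({"uninvolving": True, "movie": True}, "neg"),
]

# Training estimates P(label) and P(feature value | label) from the counts
classifier = nltk.NaiveBayesClassifier.train(train_set)

# "movie" appears under both labels, so "outstanding" decides this vote
print(classifier.classify({"outstanding": True, "movie": True}))  # pos
print(classifier.classify({"worse": True}))  # neg

# Lists the features whose probabilities differ most between labels
classifier.show_most_informative_features(3)
```

`show_most_informative_features` is a convenient check that the learned "votes" make sense, for example that words like "outstanding" lean positive.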
