Natural Language Processing

The document discusses natural language processing (NLP). It provides an introduction to NLP, describing it as the intersection of linguistics, computer science, and artificial intelligence. It also outlines some of the key components and applications of NLP systems, including natural language understanding, natural language generation, and applications like machine translation, question answering, and sentiment analysis. The document then describes some of the main challenges in NLP like ambiguities that can occur at the lexical, syntactic, semantic, discourse, and pragmatic levels.

Natural Language Processing

CSE4022

Lecture-02: Intro to NLP - Module 1

Dr. Durgesh Kumar


Assistant Professor, SCOPE, VIT Vellore
Table of contents

1 NLP - Introduction

2 Levels/Stages of NLP

3 Ambiguities and Computational Challenges in NLP

4 Heroes of NLP and related online courses

5 Project Component

Dr. Durgesh Kumar, Lecture-02, NLP, CSE4022, July 28, 2022
NLP - Intro

Natural Language Processing (NLP)

NLP is the intersection of Linguistic Science, Computer Science, and
Artificial Intelligence.
It deals with the processing and understanding of human
languages by computers.

Related Research Areas


Speech Processing, Natural Language Understanding (NLU)
Search Engines, Information Retrieval, Information Extraction
Social Network Analysis, Recommender Systems

Real Life Applications of NLP

NLP - Applications
Language Translation - Google Translate
Grammar and Spelling Correction - Grammarly, LanguageTool
Information Retrieval - Google News, Semantic Scholar
Named Entity Recognition (NER) - Explosion AI NER
Part-of-Speech (POS) Tagging - POS online tool
Sentence Autocomplete
Document Summarization
Sentiment Analysis
Question Answering
Chatbots - ILA: SBI online chatbot
Document Generation
Generating Images from a Description - DALL-E 2

NLP systems

I/O of NLP systems


The inputs and outputs of an NLP system can be:
Speech
Text

Components of NLP systems


Natural Language Understanding (NLU) - understanding the
meaning of natural language by virtue of word meaning,
word-sentence-paragraph combination, and context.
Natural Language Generation (NLG) - the process of producing
meaningful phrases and sentences in the form of natural language.
Text planning: retrieving the relevant content from the
knowledge base.
Sentence planning: choosing the required words, forming
meaningful phrases, and setting the tone of the sentence.
Text realization: mapping the sentence plan onto a sentence structure.
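
The three NLG steps above can be sketched as a tiny template-based pipeline. This is only an illustrative sketch: the knowledge base, the function names (plan_text, plan_sentence, realize, generate), and the weather facts are all made up for the example and do not come from any real system.

```python
# Toy NLG pipeline: text planning -> sentence planning -> text realization.
# KNOWLEDGE_BASE and every function name here are hypothetical.

KNOWLEDGE_BASE = {
    "Vellore": {"temperature_c": 31, "condition": "sunny"},
    "Chennai": {"temperature_c": 34, "condition": "humid"},
}

def plan_text(city):
    """Text planning: retrieve the relevant content from the knowledge base."""
    return {"city": city, **KNOWLEDGE_BASE[city]}

def plan_sentence(content, tone="neutral"):
    """Sentence planning: choose words and phrases, and set the tone."""
    friendly = tone == "friendly" and content["condition"] == "sunny"
    return {
        "opener": "Good news!" if friendly else "",
        "subject": f"the weather in {content['city']}",
        "verb": "is",
        "complement": f"{content['condition']} at {content['temperature_c']} degrees Celsius",
    }

def realize(plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    clause = f"{plan['subject']} {plan['verb']} {plan['complement']}"
    clause = clause[0].upper() + clause[1:] + "."
    return f"{plan['opener']} {clause}".strip()

def generate(city, tone="neutral"):
    return realize(plan_sentence(plan_text(city), tone))

print(generate("Vellore"))
print(generate("Vellore", tone="friendly"))
```

Keeping the three stages as separate functions mirrors the slide: the content, the wording, and the final surface form can each be changed without touching the others.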

NLP terminologies

Phonology: the study of how sounds are organized systematically in a language.
Morphology: the study of how words are constructed from primitive
meaningful units.
Morpheme: the primitive unit of meaning in a language.
Syntax: the arrangement of words to make a sentence. It also
involves determining the structural role of words in the sentence and
in phrases.
Semantics: concerned with the meaning of words and how to
combine words into meaningful phrases and sentences.
Pragmatics: deals with using and understanding sentences in
different situations and how the situation affects the interpretation
of the sentence.
Discourse: deals with how the immediately preceding sentence
can affect the interpretation of the next sentence.
World Knowledge: general knowledge about the
world.
Stages of NLP

Lexical Analysis: Analysis of word forms
Syntax Analysis: Structure processing
Semantic Analysis: Meaning representation
Discourse: Processing of interrelated sentences
Pragmatics: The purposeful use of sentences in situations, using world knowledge.

Figure: Stages of NLP

Stages of NLP - Lexical Analysis

Lexical Analysis - Identifying and analyzing the structure of words.
Lexical analysis divides the text into paragraphs, sentences, and words.
After breaking the text into words, lexicon normalization is performed. The
most common lexicon normalization techniques are:
Stemming: the process of reducing derived words to
their word stem, base, or root form, generally by stripping suffixes
such as “ing”, “ly”, “es”, and “s”. The stem need not be a dictionary word.
Lemmatization: the process of reducing a group
of words to their lemma, or dictionary form.

Table: Stemming vs Lemmatization

Word      Suffix  Stemming  Lemmatization
studies   es      studi     study
studying  ing     study     study
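
The table can be reproduced with a minimal sketch of tokenization, suffix-stripping stemming, and dictionary-based lemmatization. The suffix list and the lemma dictionary below are toy assumptions for illustration; real stemmers (for example the Porter stemmer) and lemmatizers use much richer rules and lexicons.

```python
import re

# Toy suffix list and lemma dictionary; both are illustrative assumptions.
SUFFIXES = ("ing", "es", "ly", "s")
LEMMAS = {"studies": "study", "studying": "study"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the lemma dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for token in tokenize("She studies NLP. He is studying too."):
    print(f"{token:10} stem={stem(token):8} lemma={lemmatize(token)}")
```

The stemmer returns "studi" for "studies" because it only chops characters, while the lemmatizer returns the dictionary form "study" for both words.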

Stages of NLP - Syntactic Analysis

Syntactic Analysis - Syntactic analysis is used to check grammar,
the arrangement of words, and the interrelationships between words.
Mumbai goes to Sara. The word order is grammatical, but the meaning is odd; strictly, it is the semantic stage that should reject it.
Incorrect Syntax: Rise in sun the east.
Correct Syntax: Sun rises in the east.
Part-of-speech (POS) tagging, dependency parsing, and grammar
checking are important for syntactic analysis.
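
As a rough sketch of how POS tags support a syntax check, the code below tags each word from a tiny hand-written lexicon and tests the tag sequence against one toy sentence pattern. The lexicon, the pattern, and the function names are assumptions made for this example; a real system would use a trained tagger and a parser.

```python
import re

# Hypothetical mini-lexicon mapping words to POS tags.
LEXICON = {
    "sun": "NOUN", "east": "NOUN",
    "rises": "VERB", "rise": "VERB",
    "in": "PREP",
    "the": "DET",
}

# Toy grammar over tag sequences:
#   S  -> NP VERB (PREP NP)?
#   NP -> (DET)? NOUN
SENTENCE_PATTERN = re.compile(r"^(DET )?NOUN VERB( PREP (DET )?NOUN)?$")

def pos_tags(sentence):
    """Tag each word using the lexicon; unknown words get the tag UNK."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [LEXICON.get(word, "UNK") for word in words]

def is_grammatical(sentence):
    """Accept the sentence only if its tag sequence fits the toy grammar."""
    return bool(SENTENCE_PATTERN.match(" ".join(pos_tags(sentence))))

for sentence in ["Sun rises in the east.", "Rise in sun the east."]:
    print(sentence, pos_tags(sentence), is_grammatical(sentence))
```

The correct sentence yields NOUN VERB PREP DET NOUN, which fits the pattern; the scrambled one starts with a verb and is rejected.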

Stages of NLP - Semantic Analysis

Semantic Analysis - Semantic analysis extracts the meaningful information
from the text and rejects or ignores sentences that do not make sense.
“The apple ate a banana”. Semantically wrong - Rejected
“Truck is eating oranges”. Semantically wrong - Rejected
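
One simple way to picture this rejection step is a selectional-restriction check: each verb states what kind of subject and object it accepts. The sketch below is only illustrative; the feature sets and the is_meaningful function are hypothetical, and it assumes the sentence has already been reduced to a (subject, verb, object) triple.

```python
# Hypothetical semantic features; a real system would use a lexical resource.
ANIMATE = {"boy", "girl", "monkey", "dog"}
EDIBLE = {"apple", "banana", "orange", "oranges"}

# Each verb lists the feature sets its subject and object must belong to.
VERB_CONSTRAINTS = {"eat": (ANIMATE, EDIBLE)}

def is_meaningful(subject, verb, obj):
    """Check a (subject, verb, object) triple against the verb's constraints."""
    allowed_subjects, allowed_objects = VERB_CONSTRAINTS[verb]
    return subject in allowed_subjects and obj in allowed_objects

print(is_meaningful("monkey", "eat", "banana"))  # animate eater, edible object
print(is_meaningful("apple", "eat", "banana"))   # an apple cannot eat
print(is_meaningful("truck", "eat", "oranges"))  # a truck cannot eat
```

Both slide examples fail because their subjects are not animate, even though each sentence is grammatically well-formed.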

Stages of NLP - Discourse Integration

Discourse Integration - Its scope is not limited to a single word or sentence;
rather, discourse integration helps in studying the whole text.
“John got ready at 9 AM. Later he took the train to California.”
Here the machine should understand that the word “he” in the second
sentence refers to “John”.
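
The John example can be sketched with a very naive pronoun resolver that links each pronoun to the most recent earlier name of matching gender. The gender lexicon and the resolve_pronouns function are assumptions made for this illustration; real co-reference resolvers use far richer evidence.

```python
import re

# Hypothetical gender lexicon for a handful of names.
NAME_GENDER = {"John": "male", "Mary": "female"}
PRONOUN_GENDER = {"he": "male", "him": "male", "she": "female", "her": "female"}

def resolve_pronouns(text):
    """Link each pronoun to the most recent earlier name with matching gender."""
    links = []
    seen_names = []  # names in order of appearance
    for token in re.findall(r"[A-Za-z]+", text):
        if token in NAME_GENDER:
            seen_names.append(token)
        elif token.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[token.lower()]
            antecedent = next(
                (name for name in reversed(seen_names) if NAME_GENDER[name] == gender),
                None,  # no suitable name has been seen yet
            )
            links.append((token, antecedent))
    return links

print(resolve_pronouns("John got ready at 9 AM. Later he took the train to California."))
```

The resolver has to look across the sentence boundary to find "John", which is exactly why this stage works on the whole text rather than on one sentence at a time.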

Stages of NLP - Pragmatic Analysis

Pragmatic Analysis - A complex phase in which the machine needs
knowledge not only about the provided text but also about the real
world. There are many scenarios where the intent of a sentence
can be misunderstood if the machine lacks real-world knowledge.
“Thank you for coming so late, we have wrapped up the meeting.”
(Contains sarcasm.)
“Can you share your screen?” (Here the context is sharing a
computer screen during a remote meeting.)

Ambiguities in NLP

Different levels of ambiguity:

1 Lexical ambiguity: arises at the most primitive level, the word level. For
example, should the word board be treated as a noun or a verb? Solution:
POS tagging and Word Sense Disambiguation.
2 Syntax-level ambiguity: a sentence can be parsed in different ways.
For example, “He lifted the beetle with red cap.” - Did he use the cap
to lift the beetle, or did he lift a beetle that had a red cap? Solution:
probabilistic parsing.
3 Semantic-level ambiguity: occurs when the sentence
can still be interpreted in more than one way even after the syntax and
the meanings of the individual words have been resolved.
Consider the example
“Seema loves her mother and Sriya does too”. ⇒ Whose mother does
Sriya love - her own or Seema’s?

Ambiguities in NLP (Contd.)
Different levels of Ambiguity
4 Discourse-level ambiguity: the interpretation is ambiguous by
virtue of context (previous word/sentence/paragraph),
e.g. referential ambiguity.
Referential ambiguity
Referring to something using pronouns. For example: Rima went to
Gauri. She said, “I am tired.” ⇒ Exactly who is tired? Solution:
co-reference resolution. Referential ambiguity is also known as
anaphoric ambiguity.
5 Pragmatic-level ambiguity: a situation
where the context of a phrase gives it multiple interpretations and
real-world knowledge is required for the correct interpretation.
One of the hardest tasks in NLP.
The problem involves processing user intention, sentiment, belief,
world knowledge, etc., all of which are highly complex tasks.
Pragmatic ambiguity occurs when the context does not provide enough
information to clarify the statement.

Ambiguities in NLP (Contd.)

Pragmatic Ambiguity example

The status “#Panchayat 2”, shown along with a water tank, resembles the
situation of the hero of Panchayat 2 (web series), where he feels
disconnected from village life and from his ambitions.

Figure: A WhatsApp status

Ambiguities in NLP - Lexical Ambiguities
A word as different parts of speech (POS)

1 board as noun or verb?
2 book as noun or verb?
I board the flight. ⇒ Verb
The board has discussed the finances of 2022. ⇒ Noun
I book a flight ticket. ⇒ Verb
I am reading a book. ⇒ Noun
Solution: Part-of-speech tagging.

A word with different senses

1 make as to create or to engage? (senses of make from WordNet)
make love not war. implies do or engage in.
make a mess in one’s office. implies create.
make a mistake. implies carry out or commit.
Solution: Word Sense Disambiguation (WSD), e.g. using WordNet.
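
Both kinds of lexical disambiguation can be sketched in a few lines. The first function guesses the POS of an ambiguous word from the word before it; the second is a simplified Lesk-style overlap count between the context and a set of signature words per sense. The rules, the signature words, and the function names are made up for this sketch; they are not the WordNet entries, and real taggers and WSD systems are statistical.

```python
import re

DETERMINERS = {"a", "an", "the"}
PRONOUNS = {"i", "we", "you", "they"}

def tag_by_context(sentence, word):
    """Tag `word` as NOUN after a determiner and as VERB after a pronoun."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    position = tokens.index(word)
    previous = tokens[position - 1] if position > 0 else ""
    if previous in DETERMINERS:
        return "NOUN"
    if previous in PRONOUNS:
        return "VERB"
    return "UNK"

# Hypothetical signature words for three senses of "make".
SENSES = {
    "make": {
        "create": "create produce build cause mess",
        "engage": "engage do perform activity love war",
        "commit": "carry out commit act mistake error",
    }
}

def simplified_lesk(word, context):
    """Pick the sense whose signature shares the most words with the context."""
    context_words = set(re.findall(r"[a-z]+", context.lower())) - {word}

    def overlap(sense):
        return len(context_words & set(SENSES[word][sense].split()))

    return max(SENSES[word], key=overlap)

print(tag_by_context("I board the flight.", "board"))
print(tag_by_context("The board has discussed the finances of 2022.", "board"))
print(simplified_lesk("make", "make love not war"))
print(simplified_lesk("make", "make a mess in one's office"))
print(simplified_lesk("make", "make a mistake"))
```

In both functions the decision comes entirely from the neighbouring words, which is the core idea behind POS tagging and word sense disambiguation.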
Ambiguities in NLP - Syntax level Ambiguity
“He lifted the beetle with red cap.”
Did he lift a beetle that had a red cap?

Figure: beetle with red cap

Did he use the cap to lift the beetle?

Figure: beetle with red cap

Solution: probabilistic parsing
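
To make "probabilistic parsing" concrete, here is a small CKY sketch over a toy probabilistic context-free grammar. It counts the parses of the beetle sentence and keeps the most probable one. The grammar, and especially every probability in it, is invented for illustration; with different numbers the other reading would win.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; all probabilities are made up.
BINARY_RULES = [
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),   # attach the PP to the verb phrase (instrument)
    ("NP", "NP", "PP", 0.2),   # attach the PP to the noun phrase (beetle's cap)
    ("NP", "Det", "N", 0.4),
    ("NP", "Adj", "N", 0.2),
    ("PP", "P", "NP", 1.0),
]
LEXICAL_RULES = {
    "he": [("NP", 0.2)], "lifted": [("V", 1.0)], "the": [("Det", 1.0)],
    "beetle": [("N", 0.5)], "cap": [("N", 0.5)],
    "with": [("P", 1.0)], "red": [("Adj", 1.0)],
}

def cky(words):
    """Return (number of parses, best probability, best tree) for symbol S."""
    n = len(words)
    # chart[(i, j)][symbol] = (parse count, best probability, best tree)
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for symbol, prob in LEXICAL_RULES[word]:
            chart[(i, i + 1)][symbol] = (1, prob, (symbol, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right, rule_prob in BINARY_RULES:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        l_count, l_prob, l_tree = chart[(i, k)][left]
                        r_count, r_prob, r_tree = chart[(k, j)][right]
                        count, best_p, best_t = chart[(i, j)].get(parent, (0, 0.0, None))
                        prob = rule_prob * l_prob * r_prob
                        if prob > best_p:
                            best_p, best_t = prob, (parent, l_tree, r_tree)
                        chart[(i, j)][parent] = (count + l_count * r_count, best_p, best_t)
    return chart[(0, n)].get("S", (0, 0.0, None))

count, probability, tree = cky("he lifted the beetle with red cap".split())
print("parses:", count)
print("best probability:", probability)
print("best tree:", tree)
```

The chart finds two parses, one per reading. Under these invented probabilities the parse that attaches "with red cap" to the verb phrase scores higher, so the parser would prefer the reading in which the cap is the instrument.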


Ambiguities in NLP - Syntax level Ambiguity (Contd.)
“Fed raises interest rates.” Phrase tree, sentence dependency tree, or
parse tree:
Fed raises interest rates. [Fed] raises [interest rates]

Figure: Raise as main verb.

Fed raises interest rates. [Fed raises] interest [rates]

Figure: Interest as main verb.


Different types of Models and Algorithms for
NLP tasks
1 State machines: deterministic and non-deterministic finite state
automata, finite state transducers, weighted automata, e.g. Markov
models and Hidden Markov Models, which combine state machines
with a probabilistic model.
2 Formal rule systems: regular grammars, regular relations, Context
Free Grammars (CFG), feature-augmented grammars. They are used
when dealing with phonology, morphology, and syntax, e.g. a compiler
checking syntax.
3 Logic-based models: first-order logic, predicate calculus. They help in
dealing with semantics, pragmatics, and discourse.
4 Probabilistic models: help to resolve ambiguities. They are one class of
machine-learning-based models.
Algorithms built on state machines and formal rule systems
involve a search through a space of states representing hypotheses
about the input.
Dynamic programming is used to make this search efficient.
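
A standard instance of dynamic programming over a probabilistic state machine is the Viterbi algorithm for a Hidden Markov Model. The sketch below tags "Fed raises interest rates" with only two tags, N (noun) and V (verb). Every probability is invented for the example and was chosen so that the reading with "raises" as the main verb wins; a real tagger would estimate these numbers from a tagged corpus.

```python
# Toy HMM; all probabilities below are made up for illustration.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {
    "N": {"N": 0.5, "V": 0.5},
    "V": {"N": 0.8, "V": 0.2},
}
EMIT_P = {
    "N": {"fed": 0.3, "raises": 0.1, "interest": 0.3, "rates": 0.3},
    "V": {"fed": 0.1, "raises": 0.5, "interest": 0.2, "rates": 0.2},
}

def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    # best[t][s] = (probability of the best path ending in s at position t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            # Reuse the best scores of the previous column instead of
            # re-scoring every complete path: this is the dynamic programming step.
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score * emit_p[s][word], prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

words = "fed raises interest rates".split()
print(list(zip(words, viterbi(words, STATES, START_P, TRANS_P, EMIT_P))))
```

With T words and K tags there are K^T possible tag sequences, but the table above needs only on the order of T * K^2 multiplications, which is the saving that dynamic programming provides.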
Challenges of NLP

Ambiguity makes NLP hard and interesting

Violinist Linked to JAL Crash Blossoms - a 2009 headline from the
newspaper Japan Today.
[Violinist linked to JAL crash] blossoms. (blossoms as verb)
Violinist linked to [JAL crash blossoms]. (blossoms as noun)
Teacher Strikes Idle Kids
Teacher strikes (noun) idle (verb) kids: strikes by teachers leave kids idle.
Teacher strikes (verb) idle (adjective) kids: a teacher hits idle kids.
Red Tape Holds Up New Bridges
holds up ⇒ supports.
holds up ⇒ delays or blocks.

The first two are cases of lexical POS ambiguity.
The last one is lexical word-sense ambiguity.

Challenges of NLP (Contd.)

Figure: Challenges of NLP

1 The content of this slide is borrowed from Prof. Dan Jurafsky's lecture slides.
Real world Application of NLP

Question Answering: IBM’s Watson
Information Extraction
Sentiment Analysis
Machine Translation
POS tagging, NER tagging
Spam detection
Co-reference resolution
Paraphrasing
Dialogue

2 Pages 2-6 of the Lecture-1 slides of Prof. Dan Jurafsky


Heroes of NLP

1 Prof. Christopher D. Manning, Professor of Computer Science and
Linguistics at Stanford University
One of the most highly cited authors in the world.
Co-author of the GloVe word vectors and of an Information Retrieval book.
Director of the Stanford Artificial Intelligence Laboratory (SAIL).
Instructor of Stanford CS224N: NLP with Deep Learning.
Co-instructor of the NLP course with Prof. Daniel Jurafsky.
2 Kathleen McKeown, Professor of Computer Science at Columbia
University
Founding Director of the Data Science Institute at Columbia.
Designed Newsblaster, a multi-document summarization program that
derives summary news stories from the contents of several news sites
and is also multilingual.

Heroes of NLP (Contd.)
3 Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence.
Introduced a tool called Semantic Scholar, which summarizes large
textual PDF files.
Professor Emeritus at the University of Washington.
Awarded Seattle’s Geek of the Year (2013).
Co-founded several companies, including Farecast (acquired by
Microsoft).
4 Quoc Le, Research Scientist at Google Brain
Initially worked on deep-learning-based image classification.
From 2014 onwards he has worked on Automating
Machine Learning (AutoML).
5 Prof. Dan Jurafsky, Professor of Computer Science and Linguistics
at Stanford University.
Recipient of the 2002 MacArthur Fellowship.
Author of the book “Speech and Language Processing: An
Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition”.
Developed the first automated system for semantic role labeling with
Daniel Gildea.
Heroes of NLP (Contd.)

Further reading on Heroes of NLP


DeepLearning.AI has video interviews of the top 4 personalities
in the list with Prof. Andrew Ng about their journey and future
research.
Medium blog post by Ekta Shah
Colearning blog post by Ekta Shah

Related Online Courses

NLP 2012 by Prof. Dan Jurafsky and Prof. Chris Manning.
Slides of the above course.
Stanford CS224N: NLP with Deep Learning by Prof. Chris Manning.

Project Component

Google form for Title and Team finalization - Deadline next class
Review 1 – 15 marks (for satisfactory demonstration)
Review 2 – 25 marks (20 marks for satisfactory demo, +5 for
additional work)
Review 3 – 45 marks (40 marks for satisfactory demo, +5 for
additional work)
Submission of review or research paper – 15 marks (subject to
plagiarism report)

Summary Slides

What is NLP?
Stages of NLP
Ambiguity in NLP
Challenges of NLP.
Models and Algorithm categories in NLP.
Real world Applications of NLP
Heroes of NLP and related online courses.

Next lectures

Module 2 : Text Processing


Word tokenization
Sentence tokenization
