NLp_lab1

Natural Language Processing (NLP) is a subfield of AI that focuses on the interaction between computers and human language, aiming to understand and generate language. It encompasses various tasks such as text categorization, sentiment analysis, information extraction, and machine translation, each facing unique challenges like ambiguity and context interpretation. The document outlines the stages of language processing, including phonetics, lexical analysis, syntactic and semantic analysis, and discusses the difficulties in achieving effective NLP.

Uploaded by

cocap80620

Lab 1: Natural Language Processing (NLP)

What's the difference between Machine Learning (ML), AI, and NLP?

• AI = building systems that can do intelligent things

• NLP = building systems that can understand language ⊊ AI

• ML = building systems that can learn from experience ⊊ AI

• NLP ∩ ML = building systems that can learn how to understand language
What is NLP?
➢ NLP is a subfield of linguistics, computer science, and artificial intelligence
concerned with the interactions between computers and human (natural) languages,
in particular how to program computers to process and analyze large amounts of
natural language data.

➢ NLP has 2 goals:

1. Science goal: understand the way language operates

2. Engineering goal: build systems that analyze and generate
language; reduce the man-machine gap
What is NLP? (cont.)
➢ Human language is often ambiguous. For example:

- Why did you go to the bank?
Which meaning of “bank” is intended?
Components of NLP
NLP can be divided into two basic components.
• Natural Language Understanding
• Natural Language Generation
Understanding vs. Generation
▪ Natural language understanding (NLU): mapping the given input (i.e. text)
into a useful representation:
- By “understand” we do not mean that the computer has humanlike thoughts,
feelings, and knowledge,

- but that it can recognize and use information expressed in a human language.

- The system needs to disambiguate the input sentence to produce the
machine representation language (an appropriate syntactic and semantic schema).

• NLU faces the challenge of interpreting a text unambiguously.

Ex. Automatically tagging the part of speech of words (easy); automatic grading of student essays (hard).
Understanding vs. Generation
▪ Natural language understanding (NLU):
❖ Lexical ambiguity occurs when a word carries different senses, i.e. has more than one
meaning, and the sentence in which it is contained can be interpreted differently depending on its
correct sense. Lexical ambiguity can be resolved to some extent using part-of-speech tagging
techniques. For instance, the word “bank” can mean “financial institution” or “edge of a river”.

❖ Syntactic ambiguity occurs when we see more than one meaning in a sequence
of words. It is also termed grammatical ambiguity.
Ex. “The chicken is ready to eat.” Is the chicken ready to eat its food, or is the
chicken ready for someone else to eat?

❖ Referential ambiguity: very often a text mentions an entity (something/someone)
and then refers to it again, possibly in a different sentence, using another word.
A pronoun causes ambiguity when it is not clear which noun it refers to.
Ex. “Ahmed met Ali and Yassine. They went to a restaurant.” Does “They” refer to
Ali and Yassine, or to all three?
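
The idea of resolving lexical ambiguity from context can be sketched in a few lines of Python. This is only a toy illustration, not a method from the slides: the sense inventory SENSES, its cue words, and the function disambiguate are all made up for the example. The function picks the sense whose cue words overlap most with the sentence and gives up when nothing overlaps.

```python
# Toy sense inventory (an assumption for this sketch, not a real lexicon).
SENSES = {
    "bank": {
        "financial institution": {"money", "account", "loan", "deposit", "cash"},
        "edge of a river": {"river", "water", "shore", "fishing", "boat"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    cleaned = sentence.lower().replace("?", " ").replace(".", " ")
    context = set(cleaned.split())
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    # With no overlap at all, the word stays ambiguous.
    return best if scores[best] > 0 else None


print(disambiguate("bank", "Why did you go to the bank?"))
print(disambiguate("bank", "I went to the bank to deposit money."))
print(disambiguate("bank", "We sat on the bank of the river fishing."))
```

The first sentence gives no usable context, so the sketch returns None; that is the situation in which a human would also have to ask which bank is meant.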
Understanding vs. Generation (cont.)

• Natural language generation (NLG): starts from data and produces
a text that results from the interpretation and analysis of this data. Here the goal
is much more complex: from data scattered here and there, we must produce text,
but in what order, about what subject, and in what form?
Ex. Automatic summarization.
Application of NLP

1- Text Categorization
Given a document, decide what it is about:

• Plants?
• Sports?
• Political?
• Health and fitness?
• ...
• Stock market?

Ex. uClassify
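
A very rough sketch of topic categorization, assuming hand-written keyword lists: the dictionary CATEGORY_KEYWORDS and the function categorize are hypothetical names for this example, and a real classifier would learn its vocabulary from labelled documents instead.

```python
from collections import Counter

# Hypothetical keyword lists per category (assumption for this sketch).
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "goal", "player"},
    "health and fitness": {"diet", "exercise", "sleep", "doctor"},
    "stock market": {"shares", "stock", "investors", "index"},
}


def categorize(document):
    """Return the category whose keywords occur most often in the document."""
    words = Counter(document.lower().split())
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(categorize("The team scored a late goal to win the match"))
print(categorize("Investors sold shares as the index fell"))
```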
Application of NLP (cont)

2- Sentiment classification
Given a document, is the overall sentiment in the document positive or negative?

In general, sentiment classification appears to be harder than
categorizing by topic. (Ex. “Epinions” consumer reviews)
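
One way to see why sentiment is harder than topic is to try the same word-counting approach. The sketch below assumes two tiny hand-made word lists (POSITIVE and NEGATIVE) and a hypothetical function sentiment; it is not a real sentiment system.

```python
# Tiny hand-made lexicons (assumption for this sketch).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def sentiment(review):
    """Count positive and negative words and compare the totals."""
    cleaned = review.lower()
    for mark in ".,!?":
        cleaned = cleaned.replace(mark, " ")
    words = cleaned.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Great phone, I love it!"))
print(sentiment("Terrible battery and poor screen."))
# Word counting ignores negation, so this review is labelled positive by mistake.
print(sentiment("The camera is not good."))
```

The last call shows the weakness: the word “good” counts as positive even though “not good” is a complaint.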
Application of NLP (cont)

3- Information Extraction (IE)

An information extraction system takes a text collection and fills in
who, where, what, when, and how.

Input text (an email):
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we’ve now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris

Extracted record:
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
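
As a minimal sketch of this kind of extraction, the email can be mined with regular expressions from Python's standard re module. The patterns and the function extract_event are assumptions written for this one message; they would not generalize to arbitrary email.

```python
import re
from datetime import datetime, timedelta

EMAIL = """Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we've now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris"""


def extract_event(text):
    """Fill an event record from one email using hand-written patterns."""
    subject = re.search(r"^Subject:\s*(.+)$", text, re.MULTILINE)
    sent = re.search(r"^Date:\s*(.+)$", text, re.MULTILINE)
    times = re.search(r"from (\d{1,2}:\d{2})-(\d{1,2}:\d{2})", text)
    place = re.search(r"\bin ([A-Z]\w+ \d+)", text)

    event_date = None
    if sent:
        sent_date = datetime.strptime(sent.group(1).strip(), "%B %d, %Y")
        # "tomorrow" is relative to the day the message was sent.
        offset = 1 if "tomorrow" in text.lower() else 0
        event_date = (sent_date + timedelta(days=offset)).strftime("%Y-%m-%d")

    return {
        "event": subject.group(1) if subject else None,
        "date": event_date,
        "start": times.group(1) if times else None,
        "end": times.group(2) if times else None,
        "where": place.group(1) if place else None,
    }


print(extract_event(EMAIL))
```

Note that the event date is not written anywhere in the message: it has to be inferred from the sending date plus the word “tomorrow”.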
Application of NLP (cont)

3- Information Extraction (IE) cont.

➢ Recognition, tagging, and extraction into a structured
representation of certain key elements of information, e.g.
persons, companies, locations, organizations, from large
collections of text.

➢ These extractions can then be utilized for a range of
applications including question-answering, visualization, and
data mining.
➢ Ex. Monster.com, HotJobs.com (job finders).
Application of NLP (cont)

4- Question-Answering
➢ In contrast to Information Retrieval, which provides a list
of potentially relevant documents in response to a user’s
query, question answering provides the user with either just
the text of the answer itself or answer-providing passages.

➢ Ex. Ask Jeeves


Application of NLP (cont)

5- Summarization
➢ Summarization reduces a larger text to a shorter, yet richly constituted,
abbreviated narrative representation of the original
document.
➢ Very context-dependent!
➢ Ex. Tools for noobs.
Application of NLP (cont)

6- Machine translation
➢ Perhaps the oldest of all NLP applications. Various levels
of NLP have been utilized in MT systems, ranging from the
‘word-based’ approach to applications that include higher
levels of analysis.
➢ Ex. Google Translate, SYSTRAN
Level of difficulty
[Figure: NLP tasks grouped into three columns: mostly solved (easy), good progress (intermediate), still hard (hard). Task labels recoverable from the slide: paraphrasing; automatic short answer scoring.]
The Problem of NLP

➢ When people see text, they understand its meaning (by and large)
➢ When computers see text, they get only character strings (and perhaps HTML
tags)
➢ We'd like computer agents to see meanings and be able to intelligently process
text
➢ These desires have led to many proposals for structured, semantically marked
up formats
➢ But often human beings still resolutely make use of text in human languages
➢ This problem isn’t likely to just go away.
➢ Ambiguities (syntactic, semantic)
General NLP—Too Difficult!
• Word-level ambiguity
• “design” can be a noun or a verb (Ambiguous POS)
• “root” has multiple meanings (Ambiguous sense)
• Syntactic ambiguity
• “natural language processing” (Modification)
• “A man saw a boy with a telescope.” (PP Attachment)
• Anaphora resolution
• “John persuaded Bill to buy a TV for himself.”
(himself = John or Bill?)
• Presupposition (Assumption)
• “He has quit smoking.” implies that he smoked before.

Humans rely on context to interpret (when possible).


This context may extend beyond a given document!
Language Processing Tasks

• Processing spoken language involves all NLP stages, plus speech
recognition

• Processing written text uses lexical, syntactic, and semantic
knowledge about the language, as well as the required real-world
information

• Another dimension: understanding (analysis, parsing) vs. generation
(synthesis)
Stages of language processing
1- Phonetics and phonology: speech sounds

2- Lexical analysis: dividing the whole chunk of text into paragraphs, sentences, and words

3- Morphology & lexicon: words & their forms

4- Syntactic analysis: structure of sentences

5- Semantic analysis: meaning of words & sentences

6- Pragmatics: meaning in context & for a purpose

7- Discourse: connected sentence processing in a larger body of text
Stages of language processing (Cont.)

Phonetics and phonology

• How words are related to their sounds
• Every language has an “alphabet” of sounds called phonemes
• A phoneme is the smallest unit of sound that distinguishes one word from another
• Sound waves are continuous but phonemes are discrete.
• In order to understand speech, a computer must segment the
continuous stream of speech into discrete sounds, then classify
each sound as a particular phoneme.
Stages of language processing (Cont.)

Phonetics and phonology: human speech

• Difficult medium
– Background noise
– Words can be pronounced very differently
• different people: accents, age, sex
• same person: emotional state, illness
– Different words may be pronounced alike
• week / weak
• to / two
• sandwich / sand which
• Computer speech processing relies heavily on waveform analysis and pattern recognition
Stages of language processing (Cont.)

Lexical analysis: tokenization

• A sentence is a sequence of tokens ended by a period, a colon,
a semicolon, an exclamation point, or a question mark
• The process of segmenting a string of characters into words is
known as tokenization; it may also assign a part of speech
(POS) to each word
• A text can be viewed as a sequence of tokens separated by blanks. Blank characters
are white spaces, carriage returns, tabulations, etc.
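
The two definitions above can be sketched directly with Python's standard re module. This is a naive illustration: the function names split_sentences and split_words are invented for the example, and it will wrongly split after abbreviations such as “Dr.”.

```python
import re


def split_sentences(text):
    """Split after . : ; ! or ? when the mark is followed by blanks."""
    parts = re.split(r"(?<=[.:;!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence):
    """A token is a run of word characters or a single punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", sentence)


for sentence in split_sentences("Hello World. Is this a test? Yes!"):
    print(split_words(sentence))
```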
Stages of language processing (Cont.)

Lexical analysis: sentence tokenization

How to use the sentence tokenizer in NLTK?
After installing nltk and nltk_data, you can launch python and import
the sent_tokenize tool from nltk:
>>> from nltk.tokenize import sent_tokenize
>>> text = "this's a sent tokenize test. this is sent two. is this sent three? sent 4 is cool! Now it's your turn."
>>> sent_tokenize_list = sent_tokenize(text)
>>> len(sent_tokenize_list)
5
>>> sent_tokenize_list
["this's a sent tokenize test.", 'this is sent two.', 'is this sent three?', 'sent 4 is cool!', "Now it's your turn."]
Stages of language processing (Cont.)

Lexical analysis: word tokenization

Tokenizing text into words in NLTK is very simple: just
call word_tokenize from the nltk.tokenize module:

>>> from nltk.tokenize import word_tokenize
>>> word_tokenize('Hello World.')
['Hello', 'World', '.']
>>> word_tokenize("this's a test")
['this', "'s", 'a', 'test']
Stages of language processing (Cont.)

Morphological Analysis
• Purpose: determine the meanings of individual words. Morphology is the study of how root words
and affixes (the morphemes) are composed to form words. A morpheme is the
primitive unit of meaning in a language.
• Analyzing words into their linguistic components
– Replace original word by root+affixes
• unbreakable → un + break + able ( ‘under’)
• Lookup the root in a database of meanings : a lexicon
• Problem: word-level ambiguity. Words may have several meanings, and the correct
one cannot be chosen from the word alone
– Example: the word “bank”, the word “mean”
– Further problem: domain-specialized meanings
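
The root+affixes step can be sketched as simple affix stripping against a lexicon. Everything here is a toy assumption: the affix lists, the two-entry LEXICON, and the function analyze exist only for this example, and real morphological analyzers also handle spelling changes.

```python
# Toy affix lists and lexicon (assumptions for this sketch).
PREFIXES = ["un", "re"]
SUFFIXES = ["able", "ing", "ed", "s"]
LEXICON = {
    "break": "to separate into pieces",
    "play": "to take part in a game",
}


def analyze(word):
    """Return (prefix, root, suffix) if the root is in the lexicon, else None."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)] if suffix else rest
            if root in LEXICON:
                return prefix, root, suffix
    return None


print(analyze("unbreakable"))
print(analyze("replayed"))
print(analyze("table"))
```

The word “table” ends in “able” but has no root in the lexicon, so the sketch does not decompose it.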
Stages of language processing (Cont.)

Syntactic Analysis

• Parsing involves analyzing the words in the sentence for grammar and arranging
the words in a manner that shows the relationships among them.

• Parsing: given a sentence and a grammar, check that the sentence is correct
according to the grammar and, if so, return a parse tree representing the
structure of the sentence.
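
A minimal sketch of this check-then-build-a-tree behaviour, assuming a tiny hand-written grammar (S → NP VP, NP → Det N, VP → V NP | V) and a seven-word lexicon. The grammar, the lexicon, and the function names are invented for illustration; the parse tree is returned as nested tuples.

```python
# Toy lexicon mapping words to parts of speech (assumption for this sketch).
LEXICON = {
    "the": "Det", "a": "Det",
    "man": "N", "boy": "N", "dog": "N",
    "saw": "V", "chased": "V",
}


def parse_np(tags, words, i):
    """NP -> Det N"""
    if i + 1 < len(tags) and tags[i] == "Det" and tags[i + 1] == "N":
        return ("NP", ("Det", words[i]), ("N", words[i + 1])), i + 2
    return None, i


def parse_vp(tags, words, i):
    """VP -> V NP | V"""
    if i < len(tags) and tags[i] == "V":
        np, j = parse_np(tags, words, i + 1)
        if np is not None:
            return ("VP", ("V", words[i]), np), j
        return ("VP", ("V", words[i])), i + 1
    return None, i


def parse(sentence):
    """Return a parse tree for S -> NP VP, or None if the sentence is rejected."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]
    np, i = parse_np(tags, words, 0)
    if np is None:
        return None
    vp, j = parse_vp(tags, words, i)
    if vp is None or j != len(words):
        return None
    return ("S", np, vp)


print(parse("A man saw a boy."))
print(parse("Saw man the a."))
```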
Stages of language processing (Cont.)

Semantic Analysis

• It draws the exact meaning, or the dictionary meaning, from the text. The text is
checked for meaningfulness. This is done by mapping syntactic structures to
objects in the task domain.

• The semantic analyzer rejects phrases such as “hot ice-cream”.
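
One possible way to sketch such a meaningfulness check is a table of properties per noun plus a list of contradicting properties. The tables NOUN_FEATURES and CONTRADICTIONS and the function is_meaningful are hypothetical and cover only a few words; real semantic analyzers rely on far richer knowledge.

```python
# Hypothetical properties of nouns in the task domain (assumption for this sketch).
NOUN_FEATURES = {
    "ice-cream": {"cold"},
    "soup": {"liquid"},
    "fire": {"hot"},
}

# Pairs of properties that contradict each other.
CONTRADICTIONS = {("hot", "cold"), ("cold", "hot")}


def is_meaningful(adjective, noun):
    """Reject an adjective that contradicts a known property of the noun."""
    features = NOUN_FEATURES.get(noun, set())
    return not any((adjective, feature) in CONTRADICTIONS for feature in features)


print(is_meaningful("hot", "ice-cream"))
print(is_meaningful("hot", "soup"))
```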
Stages of language processing (Cont.)

Pragmatic Analysis

• During this stage, what was said is re-interpreted in terms of what it actually meant. It
involves deriving those aspects of language which require real-world
knowledge.

Ex. Backward & forward references (coreference resolution):
“The man went near the dog. It hits him.”
Often coreference & ambiguity go together, as in:
“The dog went near the cat. It hits it.”
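
The difference between these two examples can be sketched by listing the possible antecedents of each pronoun. The type tables below are assumptions made for the example (treating “it” as referring to an animal is a deliberate simplification), and candidates is a hypothetical helper, not a real coreference resolver.

```python
# Hypothetical type information (assumption for this sketch).
ENTITY_TYPE = {"man": "person", "dog": "animal", "cat": "animal"}
PRONOUN_TYPE = {"he": "person", "him": "person", "it": "animal"}


def candidates(pronoun, previous_sentence):
    """List nouns in the previous sentence whose type matches the pronoun."""
    words = previous_sentence.lower().rstrip(".").split()
    wanted = PRONOUN_TYPE[pronoun.lower()]
    return [word for word in words if ENTITY_TYPE.get(word) == wanted]


print(candidates("It", "The man went near the dog."))
print(candidates("him", "The man went near the dog."))
# Two candidates remain here, so the reference stays ambiguous.
print(candidates("It", "The dog went near the cat."))
```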
Stages of language processing (Cont.)

Discourse

• The meaning of any sentence depends upon the meaning of the sentence
just before it. In addition, it also influences the meaning of the
immediately succeeding sentence.

Ex. User situation & context:
“Is that water?” The action to be performed is different in a chemistry lab and
at a dining table.
