Natural Language Processing PDF
Key Points
• What is NLP, and why study it?
• Steps involved in language processing:
Morphological, syntactic, semantic, pragmatic,
discourse integration
• Parsing: bottom up vs. top down
• Parser Implementation: ATN
• Why is semantics important?
• CFG/CSG/TGG
Natural Language Processing (NLP)
• Natural Language?
– Refers to the language spoken/written by people, e.g. English,
Japanese, Bengali, as opposed to artificial languages, like C++,
Java, LISP, Prolog, etc.
• Natural Language Processing
– Applications that deal with natural language in one way or another
– Computers use (analyze, understand, generate) natural language
– A somewhat applied field
• Computational Linguistics
– Doing linguistics on computers
– More on the linguistic side than NLP, but closely related
– More theoretical
Why NLP?
• Language is meant for communicating about the world
– NLP offers insights into language
• Language is the medium of the web
• Help in communication
– With computers
– With other humans (MT)
– HCI/HRI
[Figure: Kismet, MIT Media Lab]
Why NLP?
Applications for processing large amounts of text require NLP expertise:
• Classify text into categories
• Index and search large texts
• Automatic translation
• Speech understanding
– Understand phone conversations
• Information extraction
– Extract useful information from resumes
• Automatic summarization
– Condense 1 book into 1 page
• Question answering
• Knowledge acquisition
• Text generation / dialogue
NLP Involves ….
• Two tasks:
❑ Processing written text
- Lexical
- Syntactic
- Semantic
- Pragmatic
❑ Processing spoken language
- Lexical, syntactic, semantic, pragmatic, plus phonology
NLP Involves ….
• Natural Language Understanding (NLU): understanding and reasoning when the input is natural language
- Works out the internal structure of the NL input
• Natural Language Generation (NLG): producing natural language as output
Linguistics and Language Processing
[Figure: diagram with levels labeled High-level and Low-level]
Steps in NLP
• Morphological/Lexical Analysis: individual words are analyzed into their components
- Non-word tokens such as punctuation are separated from the words
- The text is divided into paragraphs, sentences, and words
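The segmentation part of this step can be sketched in a few lines of Python. This is only a rough illustration, not the method the slides prescribe: the function names (split_sentences, tokenize, analyze) are hypothetical, the sentence rule is deliberately naive (it would wrongly split after abbreviations such as "e.g."), and no analysis of words into morphemes is attempted.

```python
import re

def split_sentences(paragraph):
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def tokenize(sentence):
    # \w+ matches a word; [^\w\s] matches a single punctuation character,
    # so punctuation comes out as its own non-word token.
    return re.findall(r"\w+|[^\w\s]", sentence)

def analyze(text):
    # Paragraphs are assumed to be separated by a blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [[tokenize(s) for s in split_sentences(p)] for p in paragraphs]

if __name__ == "__main__":
    sample = "Susan printed the file. It worked!\n\nDone?"
    for paragraph in analyze(sample):
        print(paragraph)
```

The result is a nested list: paragraphs contain sentences, and sentences contain word and punctuation tokens.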
• Case grammars:
(printed (agent Susan)
(object File))
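One possible way to hold such a case frame in a program is a verb plus a mapping from case roles to fillers. The sketch below is an assumption made for illustration: the helper names case_frame and to_sexpr are hypothetical, and the bracketed output simply mirrors the notation used above.

```python
def case_frame(verb, **roles):
    # A case frame pairs a verb with named case roles (agent, object, ...).
    return {"verb": verb, "roles": roles}

def to_sexpr(frame):
    # Render the frame in the bracketed notation used above.
    parts = [f"({role} {filler})" for role, filler in frame["roles"].items()]
    return "(" + " ".join([frame["verb"]] + parts) + ")"

if __name__ == "__main__":
    frame = case_frame("printed", agent="Susan", object="File")
    print(to_sexpr(frame))  # (printed (agent Susan) (object File))
```

Keeping roles as named entries means a later stage can ask for the agent or the object directly, without depending on word order in the sentence.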
Example continued
• The syntactic structures are the same
• Case 1: Mother is the subject of baked
• Case 2: the pie is the subject