NLP_Conventional
1
NLP : Broad Classification
2
NLP : History
3
Forms of Natural Language
• The input/output of an NLP system can be:
– written text
– speech
• We will mostly be concerned with written text (not speech).
• To process written text, we need:
– lexical, syntactic, semantic knowledge about the language
– discourse information, real world knowledge
• To process spoken language, we need everything required
to process written text, plus the challenges of speech recognition
and speech synthesis.
4
Components of NLP
• Natural Language Understanding
– Mapping the given input in the natural language into a useful representation.
– Different levels of analysis are required:
morphological analysis,
syntactic analysis,
semantic analysis,
discourse analysis, …
• Natural Language Generation
– Producing output in the natural language from some internal representation.
– Different levels of synthesis are required:
deep planning (what to say),
syntactic generation
• NL Understanding is much harder than NL Generation,
but both of them are hard.
5
Why is NL Understanding hard?
• Natural language is extremely rich in form and structure, and
very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity can be at
different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of that
sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.
6
Knowledge of Language
• Phonology – concerns how words are related to the sounds that
realize them.
8
Ambiguity
9
Ambiguity (cont.)
• Some interpretations of: I made her duck.
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
• duck – morphologically and syntactically ambiguous:
noun or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous:
– Transitive – takes a direct object. => 2
– Ditransitive – takes two objects. => 5
– Takes a direct object and a verb. => 4
10
Resolve Ambiguities
• We will introduce models and algorithms to resolve ambiguities
at different levels.
• part-of-speech tagging -- Deciding whether duck is verb or
noun.
• word-sense disambiguation -- Deciding whether make is
create or cook.
• lexical disambiguation -- Resolution of part-of-speech and
word-sense ambiguities are two important kinds of lexical
disambiguation.
• syntactic ambiguity -- her duck is an example of syntactic
ambiguity, and can be addressed by probabilistic parsing.
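Part-of-speech tagging for "I made her duck" can be sketched with a tiny hidden Markov model decoded by the Viterbi algorithm. This is a minimal sketch: the tag set and every probability below are invented for illustration, not estimated from a corpus, and the function names are hypothetical.

```python
# All tags and probabilities are illustrative assumptions, not corpus estimates.
TAGS = ["PRON", "VERB", "POSS", "NOUN"]
START = {"PRON": 0.8, "NOUN": 0.1, "VERB": 0.05, "POSS": 0.05}
TRANS = {
    "PRON": {"VERB": 0.7, "NOUN": 0.2, "PRON": 0.05, "POSS": 0.05},
    "VERB": {"PRON": 0.3, "POSS": 0.3, "NOUN": 0.3, "VERB": 0.1},
    "POSS": {"NOUN": 0.9, "VERB": 0.05, "PRON": 0.025, "POSS": 0.025},
    "NOUN": {"VERB": 0.5, "NOUN": 0.3, "PRON": 0.1, "POSS": 0.1},
}
EMIT = {
    "PRON": {"I": 0.5, "her": 0.5},
    "VERB": {"made": 0.6, "duck": 0.4},
    "POSS": {"her": 1.0},
    "NOUN": {"duck": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[i][t] = (probability, previous tag) of the best path ending in tag t at word i
    best = [{t: (START.get(t, 0.0) * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in TAGS:
            e = EMIT[t].get(word, 0.0)
            column[t] = max(
                ((prev[p][0] * TRANS[p].get(t, 0.0) * e, p) for p in TAGS),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Pick the best final tag, then follow the back-pointers to the start.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


def tag(sentence):
    return viterbi(sentence.split())
```

With these made-up numbers, `tag("I made her duck")` prefers the reading in which her is possessive and duck is a noun. Different probabilities could just as well favor the verb reading, which is the point of estimating them from data.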
11
Resolve Ambiguities (cont.)
I made her duck
[S [NP I] [VP [V made] [NP her] [NP duck]]]
[S [NP I] [VP [V made] [NP her duck]]]
12
Models to Represent Linguistic Knowledge
• We use certain formalisms (models) to represent the required
linguistic knowledge.
• State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs
• Formal Rule Systems -- Context Free Grammars, Unification
Grammars, Probabilistic CFGs.
• Logic-based Formalisms -- first order predicate logic, some
higher order logic.
• Models of Uncertainty -- Bayesian probability theory.
13
Language and Intelligence
Turing Test
(Figure: a human judge interacts with both a computer and a human.)
14
Some NLP Applications
• Machine Translation – Translation between two natural
languages.
– See the Babel Fish translation system on AltaVista.
• Information Retrieval – Web search (uni-lingual or multi-lingual).
• Query Answering/Dialogue – Natural language interface with a
database system, or a dialogue system.
• Report Generation – Generation of reports such as weather
reports.
• Some Small Applications –
– Grammar Checking, Spell Checking, Spelling Correction
15
Natural Language Understanding
Words
Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
Syntactic Analysis
Syntactic Structure
Semantic Analysis
Context-independent meaning representation
Discourse Processing
Final meaning representation
16
Natural Language Generation
Meaning representation
Utterance Planning
Meaning representations for sentences
Sentence Planning and Lexical Choice
Syntactic structures of sentences with lexical choices
Sentence Generation
Morphologically analyzed words
Morphological Generation
Words
17
Morphological Analysis
• Analyzing words into their linguistic components (morphemes).
• Morphemes are the smallest meaningful units of language.
cars car+PLU
giving give+PROG
geliyordum gel+PROG+PAST+1SG - I was coming
• Ambiguity: More than one alternative
flies fly+VERB+3SG
fly+NOUN+PLU
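A toy suffix-stripping analyzer illustrates the idea for the English examples above. This is a minimal sketch, assuming a three-word lexicon and a handful of hand-written suffix rules; the `root+CATEGORY+TAG` output notation and all names are hypothetical, and real analyzers are usually built as finite-state transducers.

```python
# Hypothetical toy lexicon: root -> categories it can belong to.
LEXICON = {"car": {"NOUN"}, "fly": {"NOUN", "VERB"}, "give": {"VERB"}}

# (surface suffix, string that restores the root, required category, tag)
RULES = [
    ("ies", "y", "NOUN", "PLU"),
    ("ies", "y", "VERB", "3SG"),
    ("s", "", "NOUN", "PLU"),
    ("s", "", "VERB", "3SG"),
    ("ing", "e", "VERB", "PROG"),
    ("ing", "", "VERB", "PROG"),
]


def analyze(word):
    """Return every analysis licensed by the rules and the lexicon."""
    analyses = []
    for suffix, restore, category, tag in RULES:
        if word.endswith(suffix):
            root = word[: len(word) - len(suffix)] + restore
            if category in LEXICON.get(root, ()):
                analyses.append(f"{root}+{category}+{tag}")
    return sorted(analyses)
```

Here `analyze("cars")` yields one analysis, while `analyze("flies")` yields two (a plural noun and a third-person singular verb), so the ambiguity has to be resolved by a later stage.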
18
Morphological Analysis (cont.)
• Relatively simple for English. But for some languages such as
Turkish, it is more difficult.
uygarlaştıramadıklarımızdanmışsınızcasına
uygar-laş-tır-ama-dık-lar-ımız-dan-mış-sınız-casına
uygar +BEC +CAUS +NEGABLE +PPART +PL +P1PL +ABL +PAST +2PL +AsIf
“(behaving) as if you are among those whom we could not civilize/cause to become civilized”
+BEC is “become” in English
+CAUS is the causative voice marker on a verb
+PPART marks a past participle form
+P1PL is 1st person plural possessive marker
+2PL is 2nd person plural
+ABL is the ablative (from/among) case marker
+AsIf is a derivational marker that forms an adverb from a finite verb form
+NEGABLE is “not able” in English
20
Lexical Processing
• The purpose of lexical processing is to determine meanings of
individual words.
• The basic method is to look up each word in a database of meanings -- a lexicon.
• We should also identify non-words such as punctuation marks.
• Word-level ambiguity -- words may have several meanings, and
the correct one cannot be chosen based solely on the word itself.
– bank in English
– yüz in Turkish
• Solution -- resolve the ambiguity on the spot by POS tagging
(if possible) or pass the ambiguity on to the other levels.
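Lexicon lookup with optional POS filtering can be sketched as follows. The lexicon entries, the glosses and the `senses` function are illustrative assumptions, not a real lexical resource.

```python
# Hypothetical mini-lexicon: word -> list of (part of speech, gloss).
LEXICON = {
    "bank": [
        ("NOUN", "financial institution"),
        ("NOUN", "side of a river"),
        ("VERB", "tilt an aircraft in a turn"),
    ],
    "duck": [("NOUN", "water bird"), ("VERB", "lower the head or body")],
    "yüz": [("NOUN", "face"), ("NUM", "hundred"), ("VERB", "swim")],
}


def senses(word, pos=None):
    """Look up a word; if a POS tag is known, keep only the matching senses."""
    entries = LEXICON.get(word.lower(), [])
    return [gloss for p, gloss in entries if pos is None or p == pos]
```

A POS tag settles `senses("duck", "VERB")` on the spot, but `senses("bank", "NOUN")` still returns two senses, so that ambiguity is passed on to later levels.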
21
Syntactic Processing
• Parsing -- converting a flat input sentence into a hierarchical
structure that corresponds to the units of meaning in the sentence.
• There are different parsing formalisms and algorithms.
• Most formalisms have two main components:
– grammar -- a declarative representation describing the syntactic structure of
sentences in the language.
– parser -- an algorithm that analyzes the input and outputs its structural
representation (its parse) consistent with the grammar specification.
• CFGs are at the center of many parsing mechanisms, but
they are complemented by additional features that make the
formalism more suitable for handling natural languages.
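The split between a declarative grammar and a parsing algorithm can be sketched with a tiny CFG and a top-down parser that enumerates every parse. The grammar, the lexicon and the function names are assumptions made for this illustration, and the simple recursive strategy only works because the grammar has no left-recursive rules.

```python
# Declarative part: a toy CFG and lexicon (illustrative assumptions).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP", "NP"], ["V", "NP"]],
    "NP": [["Pron"], ["Poss", "N"], ["N"]],
}
LEXICON = {"Pron": {"I", "her"}, "Poss": {"her"}, "N": {"duck"}, "V": {"made"}}


def parse(symbol, tokens, i):
    """Yield (tree, end) for every way `symbol` can cover tokens[i:end]."""
    if symbol in LEXICON:
        if i < len(tokens) and tokens[i] in LEXICON[symbol]:
            yield (symbol, tokens[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in parse_sequence(rhs, tokens, i):
            yield (symbol, *children), j


def parse_sequence(symbols, tokens, i):
    """Yield (subtrees, end) for every way the symbol sequence matches from position i."""
    if not symbols:
        yield (), i
        return
    for first, j in parse(symbols[0], tokens, i):
        for rest, k in parse_sequence(symbols[1:], tokens, j):
            yield (first, *rest), k


def all_parses(sentence):
    """Return all parse trees that cover the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]
```

For "I made her duck", `all_parses` returns two trees: one where made takes two objects (her, duck) and one where it takes the single object her duck. Choosing between them is the job of disambiguation, for example with a probabilistic grammar.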
22
Semantic Analysis
• Assigning meanings to the structures created by syntactic
analysis.
• Mapping words and structures to particular domain objects in a way
consistent with our knowledge of the world.
• Semantics can play an important role in selecting among competing
syntactic analyses and discarding illogical analyses.
– I robbed the bank -- bank is a river bank or a financial institution
• We have to decide which formalisms will be used for the
meaning representation.
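One simple way semantics can discard an illogical reading is a selectional restriction: the verb constrains the semantic type of its object. The sketch below assumes a hand-made sense inventory and restriction table; the type labels and names are hypothetical.

```python
# Hypothetical sense inventory: noun -> {sense: semantic type}.
SENSES = {
    "bank": {"financial institution": "ORGANIZATION", "river bank": "LOCATION"},
}

# Hypothetical selectional restrictions: verb -> semantic types allowed for its object.
OBJECT_RESTRICTIONS = {
    "rob": {"ORGANIZATION", "PERSON"},
    "erode": {"LOCATION"},
}


def plausible_senses(verb, noun):
    """Keep only the noun senses whose type the verb accepts as its object."""
    allowed = OBJECT_RESTRICTIONS[verb]
    return [sense for sense, sem_type in SENSES[noun].items() if sem_type in allowed]
```

Under these assumptions `plausible_senses("rob", "bank")` keeps only the financial institution reading, because a river bank is not something that can be robbed.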
23
Knowledge Representation for NLP
• Which knowledge representation will be used depends on the
application -- Machine Translation, Database Query System.
• Requires the choice of a representational framework, as well as the
specific meaning vocabulary (the concepts and the relationships
between these concepts -- the ontology).
• Must be computationally effective.
• Common representational formalisms:
– first order predicate logic
– conceptual dependency graphs
– semantic networks
– frame-based representations
24
Discourse
• A discourse is a collection of coherent sentences (not an arbitrary set
of sentences).
• Discourses also have hierarchical structure (similar to sentences).
• anaphora resolution -- resolving referring expressions
– Mary bought a book for Kelly. She didn’t like it.
• Does She refer to Mary or Kelly? -- possibly Kelly.
• What does it refer to? -- the book.
– Mary had to lie for Kelly. She didn’t like it.
• Discourse structure may depend on application.
– Monologue
– Dialogue
– Human-Computer Interaction
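A very naive anaphora resolver picks the most recent earlier mention that agrees with the pronoun. This is a sketch of a recency-plus-agreement heuristic only: the mention list is hand-annotated and the feature labels and function name are assumptions.

```python
# Mentions from "Mary bought a book for Kelly.", in order of appearance.
# The gender labels are hand-annotated for this example.
MENTIONS = [
    {"text": "Mary", "gender": "f"},
    {"text": "a book", "gender": "n"},
    {"text": "Kelly", "gender": "f"},
]

PRONOUN_GENDER = {"she": "f", "her": "f", "he": "m", "him": "m", "it": "n"}


def resolve(pronoun, mentions):
    """Return the most recent mention that agrees with the pronoun, or None."""
    gender = PRONOUN_GENDER[pronoun.lower()]
    candidates = [m["text"] for m in mentions if m["gender"] == gender]
    return candidates[-1] if candidates else None
```

For the first example this picks Kelly for She and the book for it. For "Mary had to lie for Kelly. She didn't like it." there is no neuter noun to match, so the heuristic returns None for it: the pronoun refers to the lying event, which a mention list like this cannot represent.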
25
Natural Language Generation
• NLG is the process of constructing natural language outputs from
non-linguistic inputs.
• NLG can be viewed as the reverse process of NL understanding.
• An NLG system may have two main parts:
– Discourse Planner -- decides what will be generated, i.e. which sentences.
– Surface Realizer -- realizes a sentence from its internal
representation.
• Lexical Selection -- selecting the correct words describing the
concepts.
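The three roles above can be sketched for a weather report with a template-based generator. The input fields (`temp_c`, `rain_mm`), the temperature thresholds and the templates are all invented for this illustration; real systems use far richer planners and grammars.

```python
def plan(data):
    """Discourse planner: decide which messages to express."""
    messages = [("temperature", data["temp_c"])]
    if data["rain_mm"] > 0:  # mention rain only when some is forecast
        messages.append(("rain", data["rain_mm"]))
    return messages


def choose_word(temp_c):
    """Lexical selection: pick a word for the temperature concept (arbitrary thresholds)."""
    if temp_c < 10:
        return "cold"
    if temp_c < 22:
        return "mild"
    return "warm"


def realize(message):
    """Surface realizer: turn one message into a sentence using a fixed template."""
    kind, value = message
    if kind == "temperature":
        return f"It will be {choose_word(value)}, with a high of {value} degrees."
    return f"Expect about {value} mm of rain."


def generate(data):
    return " ".join(realize(m) for m in plan(data))
```

For example, `generate({"temp_c": 8, "rain_mm": 3})` produces two sentences, while a dry forecast produces only the temperature sentence.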
26
Machine Translation
• Machine Translation -- converting text (or speech) in language A into
the corresponding text (or speech) in language B.
• Different Machine Translation architectures:
– interlingua based systems
– transfer based systems
• How do we acquire the required knowledge resources, such as
mapping rules and a bilingual dictionary? Build them by hand, or
acquire them automatically from corpora.
• Example Based Machine Translation acquires the required
knowledge (some of it or all of it) from corpora.
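A transfer-based system can be caricatured in a few lines: look each word up in a bilingual dictionary, then apply a structural mapping rule. The sketch below assumes a hand-written three-word English-Spanish dictionary and a single rule (adjective before noun becomes noun before adjective). It ignores agreement, morphology and unknown words, and all names are hypothetical.

```python
# Hand-written bilingual dictionary: English word -> (Spanish word, part of speech).
DICTIONARY = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}


def translate(sentence):
    """Lexical transfer followed by one structural transfer rule."""
    # Lexical transfer: replace each source word with its dictionary entry.
    words = [DICTIONARY[w] for w in sentence.lower().split()]
    # Structural transfer: ADJ NOUN in English becomes NOUN ADJ in Spanish.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in words)
```

Here `translate("the red car")` gives "el coche rojo". Both the dictionary and the reordering rule were written by hand, which is exactly the kind of knowledge that corpus-based approaches try to acquire automatically.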
27