
Lec1 Introduction

NPL on report seminar new

Uploaded by

S Manoj
Copyright © All Rights Reserved

Natural Language Processing



What is Natural Language Processing?
• Natural Language Processing (NLP): the computer analysis of input provided in a human language
(natural language), and the conversion of this input into a useful form of representation.
• The field of NLP is primarily concerned with getting computers to perform useful
and interesting tasks with human languages.
• The field of NLP is secondarily concerned with helping us come to a better
understanding of human language.

• The goal of the NLP field is to get computers to perform useful tasks involving human
language, such as enabling human-machine communication, improving human-human
communication, or simply doing useful processing of text or speech.



Forms of Natural Language
• The input/output of an NLP system can be:
– written text
– speech
• In this course we will mostly be concerned with written text (not speech).
• To process written text, we need:
– lexical, syntactic, semantic knowledge about the language
– discourse information, real world knowledge
• To process spoken language, we need everything required to process written text,
plus the challenges of speech recognition and speech synthesis.



NLP Tasks
• An application that requires knowledge about human languages can be seen as an NLP task.
– Word count is an NLP application, since we need to know what a word is. That’s knowledge
of language.
– Line or byte count is not an NLP application.

• Some big NLP tasks require a tremendous amount of knowledge of language.


– Conversational agents
– Machine translation
– Question answering
– Information extraction

• … and many more NLP tasks
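A minimal Python sketch of the word count vs. byte count contrast: byte and line counts need no knowledge of language, whereas even a crude word count has to commit to a definition of "word". The regular expression used here (runs of letters, optionally joined by an apostrophe or hyphen) is a simplifying assumption, not a standard tokenizer.

```python
import re

# Assumed, simplified definition of a word: a run of letters, optionally joined
# to further letter runs by an apostrophe or a hyphen ("didn't", "well-known").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def byte_count(text: str) -> int:
    # No knowledge of language needed: count the UTF-8 encoded bytes.
    return len(text.encode("utf-8"))


def line_count(text: str) -> int:
    # Also language-independent: count line breaks.
    return len(text.splitlines())


def word_count(text: str) -> int:
    # Requires a decision about what a word is, i.e. knowledge of language.
    return len(WORD_RE.findall(text))


if __name__ == "__main__":
    sentence = "Adamı gördüm."
    print(byte_count(sentence), word_count(sentence))  # prints: 16 2
```

The Turkish example also shows that bytes are not characters: ı, ö and ü each take two bytes in UTF-8, so a byte count says nothing about words.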



NLP Tasks: Conversational agents
• The HAL computer in the movie “2001: A Space Odyssey” is an artificial agent capable of
such advanced language-processing behavior as speaking and understanding English.
• We call programs like HAL that converse with humans via natural language
conversational agents or dialogue systems.

• These kinds of applications require a tremendous amount of knowledge of language.


– Speech recognition and synthesis
– Knowledge of the English words involved
– How groups of words clump together and what the clumps mean
– Discourse



NLP Tasks: Machine translation
• The goal of machine translation is to automatically translate a document from one
language to another.
• Example systems: Stanford’s Phrasal and Google Translate.



NLP Tasks: Question answering
• The question answering task is to find answers to complete questions, ranging from easy
to hard ones:
– What does “divergent” mean?
– What year was Abraham Lincoln born?
– How many states were in the United States that year?
– How much Chinese silk was exported to England by the end of the 18th century?
– What do scientists think about the ethics of human cloning?
• Some of these questions, such as definition questions or simple factoid questions about
dates and locations, can be answered easily.
• Answering more complicated questions might require extracting information that is
embedded in the text, or doing inference (drawing conclusions based on known facts),
or synthesizing and summarizing information from multiple sources.



NLP Tasks: Information extraction

• Information extraction is the extraction of events and their attributes from natural
language texts.



Language Technology



Knowledge in Language Processing
• What distinguishes language processing applications from other data processing
systems is their use of knowledge of language.
• Some simple NLP tasks require limited knowledge of language.
• Big NLP tasks such as conversational agents, machine translation systems, and robust
question-answering systems require much broader and deeper knowledge of language.

• Phonology – concerns how words are related to the sounds that realize them.

• Morphology – concerns how words are constructed from more basic meaning units
called morphemes. A morpheme is the primitive unit of meaning in a language.

• Syntax – concerns how words can be put together to form correct sentences, and determines
what structural role each word plays in the sentence and which phrases are subparts of
other phrases.



Knowledge in Language Processing
• Semantics – concerns what words mean and how these meanings combine in sentences
to form sentence meaning; the study of context-independent meaning.

• Pragmatics – concerns how sentences are used in different situations and how use
affects the interpretation of the sentence.

• Discourse – concerns how the immediately preceding sentences affect the
interpretation of the next sentence.
– For example, interpreting pronouns and interpreting the temporal aspects of the
information.

• World Knowledge – includes general knowledge about the world.
– What each language user must know about the other’s beliefs and goals.



Why is NLP hard?
• Natural language is extremely rich in form and structure, and very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things, and ambiguity can occur at different levels:
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of that sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.



Ambiguity

I made her duck.

• How many different interpretations does this sentence have?


• What are the reasons for the ambiguity?
• The categories of knowledge of language can be thought of as ambiguity resolving
components.
• How can each ambiguous piece be resolved?
• Does speech input make the sentence even more ambiguous?
– Yes – deciding word boundaries
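Word-boundary ambiguity can be illustrated on text with the spaces removed, which loosely mimics a continuous speech signal. The sketch below enumerates every segmentation of an unspaced string against a small lexicon; both the lexicon and the example strings are hypothetical, and memoizing the results per starting position is a simple form of dynamic programming.

```python
from functools import lru_cache


def segmentations(text: str, lexicon: set[str]) -> list[str]:
    """Return every way to split an unspaced string into lexicon words."""

    @lru_cache(maxsize=None)
    def from_pos(start: int) -> tuple[str, ...]:
        # All space-separated segmentations of text[start:], computed once per position.
        if start == len(text):
            return ("",)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in from_pos(end):
                    results.append(word if not rest else word + " " + rest)
        return tuple(results)

    return list(from_pos(0))


if __name__ == "__main__":
    print(segmentations("nowhere", {"no", "now", "where", "here"}))
    # prints: ['no where', 'now here']
```

With a larger lexicon the number of candidate segmentations grows quickly, which is why later levels (syntax, semantics) are needed to choose among them.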



Ambiguity (cont.)
• Some interpretations of: I made her duck.
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.

• duck – morphologically and syntactically ambiguous: noun or verb.


• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous:
– Transitive – takes a direct object. => 2
– Ditransitive – takes two objects. => 5
– Takes a direct object and a verb. => 4



Ambiguity in a Turkish Sentence
• Some interpretations of: Adamı gördüm.
1. I saw the man.
2. I saw my island.
3. I visited my island.
4. I bribed the man.
• Morphological Ambiguity:
– ada-m-ı ada+P1SG+ACC
– adam-ı adam+ACC
• Semantic Ambiguity:
– gör to see
– gör to visit
– gör to bribe



Resolve Ambiguities
• We will introduce models and algorithms to resolve ambiguities at different levels.

• part-of-speech tagging -- deciding whether duck is a verb or a noun.
• word-sense disambiguation -- deciding whether make means create or cook.
• lexical disambiguation -- part-of-speech tagging and word-sense disambiguation are two
important kinds of lexical disambiguation.
• syntactic disambiguation -- her duck is an example of syntactic ambiguity, which can be
addressed by probabilistic parsing.



Resolve Ambiguities (cont.)



Models to Represent Linguistic Knowledge
• We will use certain formalisms (models) to represent the required linguistic
knowledge.

• State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs

• Formal Rule Systems -- Context-Free Grammars, Unification Grammars, Probabilistic CFGs.

• Logic-based Formalisms -- first order predicate logic, some higher order logic.

• Models of Uncertainty -- Bayesian probability theory.

• Vector-space models – to represent meanings of words



Algorithms to Manipulate Linguistic Knowledge
• We will use algorithms to manipulate the models of linguistic knowledge to produce
the desired behavior.
• Most of the algorithms we will study are transducers and parsers.
– These algorithms construct some structure based on their input.
• Since language is ambiguous at all levels, these algorithms are never simple
processes.
• Most of the algorithms we will use fall into the following categories:
– state space search
– dynamic programming



Language and Intelligence
Turing Test

[Figure: a human judge questions both a computer and a human]

• The human judge asks teletyped questions to both the computer and the human.
• The computer’s job is to act like a human.
• The human’s job is to convince the judge that they are not the machine.
• The computer is judged “intelligent” if it can fool the judge.
• The judgment of intelligence is linked to how appropriate the system’s answers to the questions are.



Natural Language Understanding

Words
  ↓ Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
  ↓ Syntactic Analysis
Syntactic Structure
  ↓ Semantic Analysis
Context-independent meaning representation
  ↓ Discourse Processing
Final meaning representation



Morphological Analysis
• Analyzing words into their linguistic components (morphemes).
• Morphemes are the smallest meaningful units of language.
   cars         car+PLU
   giving       give+PROG
   geliyordum   gel+PROG+PAST+1SG   (I was coming)
• Ambiguity: more than one alternative analysis
   flies        flyVERB+AOR
                flyNOUN+PLU
   adamı        adam+ACC            (the man, accusative)
                adam+P3SG           (his/her man)
                ada+P1SG+ACC        (my island, accusative)
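A toy version of this analysis can be sketched in Python. The two stems and two suffixes below are a hypothetical, drastically reduced fragment of Turkish (no vowel harmony, no consonant alternations). Suffix "slots" stand in for morphotactic ordering, so a possessive may be followed by a case marker but not vice versa. Real analyzers typically compile such rules into finite-state transducers rather than searching like this.

```python
# Hypothetical toy fragment of Turkish: two noun stems and two suffixes.
STEMS = ("adam", "ada")  # "man", "island"

# surface suffix -> possible readings as (tag, slot); slot 1 = possessive,
# slot 2 = case. Suffixes must appear in strictly increasing slot order.
SUFFIXES = {
    "m": [("P1SG", 1)],
    "ı": [("P3SG", 1), ("ACC", 2)],
}


def parse_suffixes(rest: str, last_slot: int) -> list[list[str]]:
    """Return every tag sequence that spells out `rest` in a legal order."""
    if not rest:
        return [[]]
    parses = []
    for surface, readings in SUFFIXES.items():
        if rest.startswith(surface):
            for tag, slot in readings:
                if slot > last_slot:
                    for tail in parse_suffixes(rest[len(surface):], slot):
                        parses.append([tag] + tail)
    return parses


def analyze(word: str) -> list[str]:
    """Return all stem+suffix analyses; more than one means the word is ambiguous."""
    analyses = []
    for stem in STEMS:
        if word.startswith(stem):
            for tags in parse_suffixes(word[len(stem):], 0):
                analyses.append("+".join([stem] + tags))
    return analyses


if __name__ == "__main__":
    print(analyze("adamı"))  # prints: ['adam+P3SG', 'adam+ACC', 'ada+P1SG+ACC']
```

Even the bare form adam comes out ambiguous in this toy: it is analyzed both as adam "man" and as ada+P1SG "my island".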



Morphological Analysis (cont.)
• Relatively simple for English. But for some languages such as Turkish, it is more
difficult.
uygarlaştıramadıklarımızdanmışsınızcasına
uygar-laş-tır-ama-dık-lar-ımız-dan-mış-sınız-casına
uygar +BEC +CAUS +NEGABLE +PPART +PL +P1PL +ABL +PAST +2PL +AsIf
“(behaving) as if you are among those whom we could not civilize/cause to become civilized”
+BEC is “become” in English
+CAUS is the causative voice marker on a verb
+PPART marks a past participle form
+P1PL is 1st person plural possessive marker
+2PL is 2nd person plural
+ABL is the ablative (from/among) case marker
+AsIf is a derivational marker that forms an adverb from a finite verb form
+NEGABLE is “not able” in English

• Inflectional and Derivational Morphology.


• Common tools: Finite-state transducers



Part-of-Speech (POS) Tagging
• Each word has a part-of-speech tag to describe its category.
• The part-of-speech tag of a word is one of the major word groups (or one of its subgroups).
– open classes -- noun, verb, adjective, adverb
– closed classes -- prepositions, determiners, conjunctions, pronouns, particles
• POS taggers try to find the POS tags for the words.
• Is duck a verb or a noun? (A morphological analyzer cannot make this decision.)
• A POS tagger may make that decision by looking at the surrounding words.
– Duck! (verb)
– Duck is delicious for dinner. (noun)
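One common way for a tagger to use the surrounding words is a hidden Markov model (HMM) decoded with the Viterbi algorithm (dynamic programming). The sketch below uses four tags and hand-picked probabilities invented purely for illustration (nothing is estimated from data), and the second example sentence is shortened to words in the toy vocabulary.

```python
# Toy HMM: every probability below is invented for illustration only.
TAGS = ["NOUN", "VERB", "ADJ", "PUNCT"]
START = {"NOUN": 0.5, "VERB": 0.4, "ADJ": 0.1, "PUNCT": 0.0}
TRANS = {  # TRANS[previous_tag][next_tag]
    "NOUN": {"NOUN": 0.15, "VERB": 0.8, "ADJ": 0.0, "PUNCT": 0.05},
    "VERB": {"NOUN": 0.2, "VERB": 0.0, "ADJ": 0.3, "PUNCT": 0.5},
    "ADJ": {"NOUN": 0.4, "VERB": 0.0, "ADJ": 0.0, "PUNCT": 0.6},
    "PUNCT": {"NOUN": 0.5, "VERB": 0.5, "ADJ": 0.0, "PUNCT": 0.0},
}
EMIT = {  # EMIT[tag][word]; unlisted words have probability 0
    "NOUN": {"duck": 0.5, "dinner": 0.5},
    "VERB": {"duck": 0.3, "is": 0.7},
    "ADJ": {"delicious": 1.0},
    "PUNCT": {"!": 0.5, ".": 0.5},
}


def viterbi(words: list[str]) -> list[str]:
    """Most probable tag sequence for a non-empty word list under the toy HMM."""
    # best[i][tag] = probability of the best tag path for words[:i+1] ending in tag
    best = [{t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in TAGS:
            prev, score = max(
                ((p, best[i - 1][p] * TRANS[p][tag]) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][tag] = score * EMIT[tag].get(words[i], 0.0)
            back[i][tag] = prev
    # Follow back-pointers from the most probable final tag.
    tag = max(best[-1], key=best[-1].get)
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["duck", "!"]))                     # ['VERB', 'PUNCT']
    print(viterbi(["duck", "is", "delicious", "."]))  # ['NOUN', 'VERB', 'ADJ', 'PUNCT']
```

The word duck gets the same emission probabilities in both sentences; only the neighbouring tags (a following "!" versus a following "is") change the decision. Real taggers work with log probabilities to avoid underflow on long sentences.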



Lexical Processing
• The purpose of lexical processing is to determine meanings of individual words.
• The basic method is to look up meanings in a database of meanings (a lexicon).
• We should also identify non-words such as punctuation marks.
• Word-level ambiguity -- words may have several meanings, and the correct one cannot
be chosen based solely on the word itself.
– bank in English
– yüz in Turkish
• Solution -- resolve the ambiguity on the spot by POS tagging (if possible), or pass the
ambiguity on to the other levels.
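A minimal sketch of lexical processing as described on this slide: split off punctuation as non-words, look each word up in a lexicon, and, when a word has several entries, keep all of them for later levels. The lexicon contents and the tokenizer pattern are hypothetical simplifications.

```python
import re

# Hypothetical mini-lexicon: word -> list of (part of speech, meaning).
LEXICON = {
    "i": [("PRON", "the speaker")],
    "robbed": [("VERB", "stole from")],
    "the": [("DET", "definite article")],
    "bank": [
        ("NOUN", "financial institution"),
        ("NOUN", "sloping land beside a river"),
        ("VERB", "to deposit money"),
    ],
}

# A token is either a word (with an optional apostrophe part) or a single
# punctuation character.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)?|[^\w\s]")


def lexical_lookup(text):
    """Return (token, entries) pairs; punctuation is marked as a non-word."""
    results = []
    for token in TOKEN_RE.findall(text):
        if not token[0].isalnum():
            results.append((token, [("PUNCT", None)]))
        else:
            # Unknown words get an empty entry list rather than a guess.
            results.append((token, LEXICON.get(token.lower(), [])))
    return results


def keep_pos(entries, pos):
    """Narrow the entries with a POS tag (e.g. from a tagger), when one is known."""
    return [entry for entry in entries if entry[0] == pos]


if __name__ == "__main__":
    tokens = lexical_lookup("I robbed the bank.")
    bank_entries = tokens[3][1]
    print(keep_pos(bank_entries, "NOUN"))  # two noun senses remain
```

POS tagging removes the verb entry for bank, but the two noun senses survive and must be passed on to semantic analysis.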



Syntactic Processing
• Parsing -- converting a flat input sentence into a hierarchical structure that
corresponds to the units of meaning in the sentence.
• There are different parsing formalisms and algorithms.
• Most formalisms have two main components:
– grammar -- a declarative representation describing the syntactic structure of sentences in
the language.
– parser -- an algorithm that analyzes the input and outputs its structural representation (its
parse) consistent with the grammar specification.
• CFGs are at the center of many parsing mechanisms, but they are complemented by
additional features that make the formalism more suitable for handling natural
languages.
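The CKY algorithm is one standard parser for CFGs in Chomsky normal form, and it shows how a grammar plus a parser exposes syntactic ambiguity. The grammar below is a hypothetical toy (including an invented small-clause category SC) that covers two readings of "I made her duck": her duck as a noun phrase, and her as an object followed by the verb duck. It does not cover the ditransitive reading. The sketch only counts parse trees; storing back-pointers in the chart would recover the trees themselves.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),   # made [her duck]        (transitive)
    ("VP", "V", "SC"),   # made [her] [duck]      (object + verb)
    ("NP", "Det", "N"),  # her duck, her as possessive determiner
    ("SC", "NP", "V"),   # her duck, duck as a verb
]
# Lexical rules: word -> categories it can belong to.
LEXICAL = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"Det", "NP"},
    "duck": {"N", "V"},
}


def cky_count(words):
    """Return the number of S parse trees covering all of `words` (CKY)."""
    n = len(words)
    # chart[(i, j)][cat] = number of ways category cat can cover words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word, ()):
            chart[(i, i + 1)][cat] += 1
    for length in range(2, n + 1):          # span length, shortest first
        for i in range(0, n - length + 1):  # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY:
                    count = chart[(i, k)][left] * chart[(k, j)][right]
                    if count:
                        chart[(i, j)][parent] += count
    return chart[(0, n)]["S"]


if __name__ == "__main__":
    print(cky_count("I made her duck".split()))  # prints: 2
```

A probabilistic CFG would attach a probability to each rule and keep the best-scoring tree per chart cell instead of a count, which is how probabilistic parsing chooses among such readings.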



Semantic Analysis
• Assigning meanings to the structures created by syntactic analysis.
• Mapping words and structures to particular domain objects in a way consistent with our
knowledge of the world.
• Semantics can play an important role in selecting among competing syntactic analyses and
discarding illogical analyses.
– I robbed the bank -- bank is a river bank or a financial institution

• We have to decide the formalisms which will be used in the meaning representation.
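Selectional restrictions are one simple way semantics can discard illogical readings. In this hypothetical sketch, each sense of bank carries invented semantic features, and the verb rob requires an object that holds valuables, which rules out the river-bank sense. A verb without restrictions, such as see, leaves the ambiguity in place.

```python
# Hypothetical sense inventory: (gloss, semantic features) for each sense.
SENSES = {
    "bank": [
        ("financial institution", {"institution", "holds_valuables"}),
        ("river bank", {"landform"}),
    ],
}

# Hypothetical selectional restrictions: features the verb's object must have.
OBJECT_REQUIRES = {
    "rob": {"holds_valuables"},
}


def compatible_senses(verb, noun):
    """Keep only the noun senses whose features satisfy the verb's restrictions."""
    required = OBJECT_REQUIRES.get(verb, set())
    return [gloss for gloss, features in SENSES.get(noun, []) if required <= features]


if __name__ == "__main__":
    print(compatible_senses("rob", "bank"))  # ['financial institution']
    print(compatible_senses("see", "bank"))  # ['financial institution', 'river bank']
```

Hard feature checks like this are brittle (metaphor, unusual contexts), which is one reason later models treat such restrictions as preferences rather than rules.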



Knowledge Representation for NLP
• Which knowledge representation is used depends on the application.
– Requires the choice of a representational framework, as well as the specific meaning
vocabulary (what the concepts are and how they relate to each other -- an ontology)
– Must be computationally effective.
• Common representational formalisms:
– first order predicate logic
– conceptual dependency graphs
– semantic networks
– Frame-based representations
– Vector-space models
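A minimal sketch of the vector-space idea: represent each word by counts of the words it co-occurs with, then compare words by the cosine of the angle between their vectors. The six-sentence corpus and the stop-word list are invented for illustration; realistic models are built from large corpora, usually with weighting such as PPMI or with learned embeddings.

```python
import math
from collections import Counter

# Invented toy corpus and stop-word list, for illustration only.
CORPUS = [
    "deposit money at the bank",
    "deposit cash at the bank",
    "withdraw money from an account",
    "withdraw cash from an account",
    "swim in the river",
    "fish in the river",
]
STOP_WORDS = {"at", "the", "from", "an", "in"}


def context_vector(target, sentences):
    """Count content words that share a sentence with `target`."""
    vec = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            vec.update(w for w in words if w != target and w not in STOP_WORDS)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    money, cash, river = (context_vector(w, CORPUS) for w in ("money", "cash", "river"))
    print(cosine(money, cash), cosine(money, river))  # prints: 1.0 0.0
```

Words used in the same contexts (money, cash) end up with similar vectors, while a word from different contexts (river) does not; this is the intuition behind using vectors to represent word meaning.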



Discourse
• Discourses are collections of coherent sentences (not arbitrary sets of sentences).
• Discourses also have hierarchical structure (similar to sentences).
• anaphora resolution -- resolving referring expressions
– Mary bought a book for Kelly. She didn’t like it.
• She refers to Mary or Kelly -- possibly Kelly.
• It refers to the book.
– Mary had to lie for Kelly. She didn’t like it.

• Discourse structure may depend on the application.


– Monologue
– Dialogue
– Human-Computer Interaction
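A deliberately crude resolver makes the anaphora example concrete: choose the most recent preceding candidate that agrees with the pronoun in gender, number, and animacy. The candidate list and its features are annotated by hand here (a real system would have to find and classify them), and the recency heuristic is an assumption. It happens to pick Kelly for She, matching the "possibly Kelly" judgment, but it cannot handle the second example, where it refers to an event (the lying) rather than a noun phrase.

```python
# Hand-annotated candidate antecedents from "Mary bought a book for Kelly.",
# in textual order. The features are assumptions a real system would infer.
CANDIDATES = [
    {"text": "Mary", "gender": "fem", "number": "sg", "animate": True},
    {"text": "a book", "gender": "neut", "number": "sg", "animate": False},
    {"text": "Kelly", "gender": "fem", "number": "sg", "animate": True},
]

# Features each pronoun requires of its antecedent.
PRONOUNS = {
    "she": {"gender": "fem", "number": "sg", "animate": True},
    "it": {"gender": "neut", "number": "sg", "animate": False},
}


def resolve(pronoun, candidates):
    """Return the most recent agreeing candidate, or None if nothing agrees."""
    wanted = PRONOUNS[pronoun.lower()]
    for cand in reversed(candidates):  # most recent mention first
        if all(cand[feature] == value for feature, value in wanted.items()):
            return cand["text"]
    return None


if __name__ == "__main__":
    print(resolve("She", CANDIDATES))  # Kelly
    print(resolve("it", CANDIDATES))   # a book
```

Agreement filtering is cheap and reliable; the hard part is ranking the candidates that survive it, which needs syntax, discourse structure, and world knowledge.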



Natural Language Generation (NLG)
• Natural Language Generation (NLG) is the process of constructing natural language
outputs from non-linguistic inputs.

• NLG can be viewed as the reverse process of NL understanding.

• An NLG system may have two main parts:


– Discourse Planner -- what will be generated: which sentences.
– Surface Realizer -- realizes a sentence from its internal representation.
• Lexical Selection -- selecting the correct words describing the concepts.

