Chapter 1
Ch. 1 Introduction
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field of research and application that explores how computers
can be used to understand and manipulate natural language text or speech to do useful things.
History of NLP
1950- NLP started when Alan Turing published an article called "Computing Machinery and Intelligence."
1960- The work of Chomsky and others on formal language theory and generative syntax
Background
Solving language-related problems is the main concern of the fields known as Natural Language
Processing, Computational Linguistics, and Speech Recognition and Synthesis, which together we call
Speech and Language Processing (SLP).
Generic NLP
Natural Language Processing Notes By Prof. Suresh R. Mestry
Levels of NLP
NLP can broadly be divided into various levels, as shown in the figure.
1. Phonology: It deals with the interpretation of speech sounds within and across words.
2. Morphology: It is the study of the way words are built up from smaller meaning-bearing units called
morphemes. For example, the word ‘fox’ has a single morpheme, while the word ‘cats’ has two: the stem
‘cat’ and the morpheme ‘-s’, which distinguishes the plural from the singular. The morphological
lexicon is the list of stems and affixes together with basic information, such as whether a stem is a
Noun stem or a Verb stem.
3. Syntax: It is the study of formal relationships between words. It studies how words are clustered
into classes in the form of Parts of Speech (POS), how they are grouped with their neighbors into
phrases, and the way words depend on each other in a sentence.
4. Semantics: It is the study of the meaning of words and how that meaning is associated with
grammatical structure. It consists of two kinds of approaches: syntax-driven semantic analysis and
semantic grammar. This level is explained in detail in chapter 4. In the discourse context, this level
of NLP works with text longer than a sentence. There are two types of discourse processing: anaphora
resolution and discourse/text structure recognition. Anaphora resolution is the replacement of words
such as pronouns with the entities they refer to. Discourse structure recognition determines the
function of each sentence in the text, which adds to a meaningful representation of the text.
5. Reasoning: To produce an answer to a question that is not explicitly stored in a database, a Natural
Language Interface to Database (NLIDB) carries out reasoning based on the data stored in the database.
For example, consider a database that holds academic information about students, and a user who poses
a query such as: ‘Which student is likely to fail in Maths?’. To answer the query, the NLIDB needs a
domain expert to narrow down the reasoning process.
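The morphological lexicon described in item 2 can be sketched in a few lines of Python. This is a minimal sketch and not a real analyzer. It assumes a tiny hand-built lexicon (`LEXICON`, hypothetical) and a single rule for the suffix ‘-s’. Real morphological analyzers use much larger lexicons and finite-state rules to handle spelling changes such as fox/foxes.

```python
# Hypothetical toy morphological lexicon: stem -> stem category (Noun or Verb)
LEXICON = {"fox": "Noun", "cat": "Noun", "dog": "Noun", "walk": "Verb"}


def analyze(word):
    """Split a word into a known stem plus an optional '-s' suffix.

    Returns a list of (morpheme, information) pairs, or None if the word
    cannot be analyzed with this toy lexicon.
    """
    if word in LEXICON:
        # The whole word is a single stem morpheme, e.g. 'fox'
        return [(word, LEXICON[word])]
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        # On a Noun stem '-s' marks plural; on a Verb stem, 3rd person singular
        meaning = "plural" if LEXICON[stem] == "Noun" else "3sg present"
        return [(stem, LEXICON[stem]), ("-s", meaning)]
    return None


for w in ["fox", "cats", "walks", "foxes"]:
    print(w, analyze(w))
```

The sketch has no rule for the ‘-es’ spelling, so it returns None for ‘foxes’. This is why a lexicon of stems and affixes has to be paired with spelling (orthographic) rules.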
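The reasoning step in item 5 can be illustrated with a small sketch. The fact ‘likely to fail’ is not stored in the database. It is derived from the stored marks using a rule that a domain expert would supply. The table of marks (`MATHS_MARKS`) and the rule (average Maths mark below a pass mark of 40) are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical academic records: student -> list of Maths test marks
MATHS_MARKS = {
    "Asha": [72, 68, 80],
    "Ravi": [35, 42, 30],
    "Meena": [55, 38, 47],
}

# Hypothetical expert-supplied threshold used by the reasoning rule
PASS_MARK = 40


def likely_to_fail(records, pass_mark=PASS_MARK):
    """Return, sorted by name, the students whose average Maths mark is
    below the pass mark. 'Likely to fail' is never stored; it is inferred
    from the stored marks using the expert rule."""
    return sorted(name for name, marks in records.items()
                  if mean(marks) < pass_mark)


print(likely_to_fail(MATHS_MARKS))
```

Changing the rule, for example raising `pass_mark`, changes the answer without touching the stored data. The domain knowledge lives in the reasoning step, not in the database.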
What distinguishes these language processing applications from other data processing systems is their use
of knowledge of language.
Consider the Unix wc program, which is used to count the total number of bytes, words, and lines in a
text file. When used to count bytes and lines, wc is an ordinary data processing application. However,
when it is used to count the words in a file it requires knowledge about what it means to be a word, and
thus becomes a language processing system. Of course, wc is an extremely simple system with an
extremely limited and impoverished knowledge of language.
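The difference can be seen in a few lines of Python that imitate `wc`. Counting bytes and lines needs no knowledge of language, but counting words needs a definition of ‘word’. This sketch uses the same simple definition `wc` uses, a maximal run of non-whitespace characters. As a result, it counts ‘don't’ as one word and ‘New York’ as two. The function name `wc` here is just an illustrative Python function, not the Unix tool itself.

```python
def wc(text):
    """Imitate `wc`: return (lines, words, bytes) for a string."""
    lines = text.count("\n")            # wc -l counts newline characters
    words = len(text.split())           # word = maximal run of non-whitespace
    size = len(text.encode("utf-8"))    # wc -c counts bytes, not characters
    return lines, words, size


sample = "I don't live in New York.\nRavi eats an apple.\n"
print(wc(sample))
```

Whether ‘don't’ should count as one word or two, and ‘New York’ as one or two, is exactly the kind of linguistic decision that this whitespace rule sidesteps.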
To summarize, the knowledge of language needed to engage in complex language behavior can be separated into
six distinct categories.
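1. Phonetics and Phonology – The study of linguistic sounds.
2. Morphology – The study of the meaningful components of words.
3. Syntax – The study of the structural relationships between words.
4. Semantics – The study of meaning.
5. Pragmatics – The study of how language is used to accomplish goals.
6. Discourse – The study of linguistic units larger than a single utterance.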
Ambiguity can occur at all NLP levels. It is a property of linguistic expressions: if an expression
(word/phrase/sentence) has more than one interpretation, we call it ambiguous.
For example, consider the sentence
“The chicken is ready to eat.”
The possible interpretations of this sentence are:
The chicken (bird) is ready to be fed, or the chicken (food) is ready to be eaten.
Stages in NLP
Syntactic Analysis
Syntax concerns the proper ordering of words and its effect on meaning
This involves analyzing the words in a sentence to depict the grammatical structure of the sentence
The words are transformed into a structure that shows how the words are related to each other
E.g. “the girl the go to the school”. An English syntactic analyzer would reject this sentence
E.g. “Ravi apple eats”
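A syntactic analyzer rejects word strings that no rule of its grammar can build. The following is a minimal recognizer sketch. It assumes a hypothetical toy grammar (S -> NP VP, VP -> V NP, NP -> Name | Det N) and a lexicon written only for this example. With it, “Ravi eats an apple” is accepted. “Ravi apple eats” is rejected because no rule allows a noun in the position where the verb must appear.

```python
# Hypothetical toy lexicon: word -> lexical category
LEXICON = {"ravi": "Name", "an": "Det", "the": "Det",
           "apple": "N", "school": "N", "eats": "V"}

# Hypothetical toy grammar: nonterminal -> alternative right-hand sides
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Name"], ["Det", "N"]],
}


def parse(symbol, words, i):
    """Return every position j such that words[i:j] can be derived from symbol."""
    if symbol not in GRAMMAR:
        # Lexical category: match exactly one word with that category
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return {i + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {i}
        for part in rhs:
            # Extend every partial match by one more right-hand-side symbol
            positions = {j for p in positions for j in parse(part, words, p)}
        ends |= positions
    return ends


def is_grammatical(sentence):
    """Accept the sentence only if S derives all of its words."""
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)


print(is_grammatical("Ravi eats an apple"))   # True
print(is_grammatical("Ravi apple eats"))      # False
```

This simple top-down strategy works only because the toy grammar has no left-recursive rules. Practical parsers use chart-based algorithms such as CKY or Earley instead.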
Semantic Analysis
Semantics concerns the (literal) meaning of words, phrases, and sentences
This abstracts the dictionary meaning or the exact meaning from context
The structures which are created by the syntactic analyzer are assigned meaning
E.g. “colorless blue idea”. This would be rejected by the analyzer, as “colorless” and “blue” do not
make any sense together
E.g. “Stone eat apple”
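One common way to make this check concrete is through selectional restrictions, in which each verb states what kind of arguments it accepts. The sketch assumes hypothetical semantic features for a handful of words (`FEATURES`). It also assumes a hypothetical restriction that the subject of ‘eat’ must be animate and its object edible. It operates on a (subject, verb, object) triple that the syntactic analyzer is assumed to have already produced.

```python
# Hypothetical semantic features for a few nouns (illustration only)
FEATURES = {
    "ravi": {"animate", "human"},
    "dog": {"animate"},
    "stone": {"inanimate"},
    "apple": {"edible"},
}

# Hypothetical mapping from inflected verb forms to their base form
LEMMA = {"eats": "eat"}

# Hypothetical selectional restrictions:
# verb -> (feature the subject must have, feature the object must have)
RESTRICTIONS = {"eat": ("animate", "edible")}


def is_meaningful(subject, verb, obj):
    """Check a (subject, verb, object) triple, assumed to come from the
    syntactic analyzer, against the verb's selectional restrictions."""
    subj_req, obj_req = RESTRICTIONS[LEMMA.get(verb, verb)]
    return (subj_req in FEATURES.get(subject, set())
            and obj_req in FEATURES.get(obj, set()))


print(is_meaningful("ravi", "eats", "apple"))   # True
print(is_meaningful("stone", "eat", "apple"))   # False: a stone is not animate
```

“Stone eat apple” is acceptable to a syntax that allows Noun Verb Noun. It is only the semantic check that rejects it.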
Discourse Integration
Sense of the context
The meaning of any single sentence depends on the sentences that precede it, and it may also influence
the meaning of the sentences that follow it
E.g. the word “it” in the sentence “she wanted it” depends upon the prior discourse context
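A very simple heuristic for resolving ‘it’ is to pick the most recently mentioned non-human noun in the preceding discourse. The sketch assumes a hypothetical hand-built set of non-human nouns (`THINGS`). It ignores number, gender, and syntax, all of which real anaphora resolvers use. Its only purpose is to show that ‘it’ cannot be interpreted without the prior context.

```python
# Hypothetical set of nouns that 'it' may refer to (non-human entities)
THINGS = {"dress", "book", "car", "window"}


def resolve_it(previous_sentences):
    """Return the most recent non-human noun mentioned in the prior
    discourse, or None if nothing suitable was mentioned."""
    for sentence in reversed(previous_sentences):
        words = [w.strip(".,!?").lower() for w in sentence.split()]
        for word in reversed(words):
            if word in THINGS:
                return word
    return None


print(resolve_it(["Priya saw a red dress."]))   # dress
print(resolve_it([]))                           # None
```

Given “Priya saw a red dress.” as prior context, “she wanted it” is read as “she wanted the dress”. With no prior context, the pronoun stays unresolved.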
Pragmatic Analysis
Pragmatics concerns the overall communicative and social context and its effect on interpretation
It means abstracting or deriving the purposeful use of language in situations
In particular, it covers those aspects of language that require world knowledge
The main focus is on reinterpreting what was said in terms of what it actually means
E.g. “close the window?” should be interpreted as a request rather than an order
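Pragmatic interpretation can be approximated, very roughly, by rules that map the surface form of an utterance to the act the speaker intends. The sketch below uses hypothetical rules of thumb. An utterance that starts with a bare verb, or with ‘can you’, ‘could you’ or ‘would you’, and ends with a question mark is read as a polite request rather than a literal question or an order. The verb list is likewise hypothetical. Real systems need far more context and world knowledge than this.

```python
# Hypothetical verbs that can open an imperative sentence
IMPERATIVE_VERBS = {"close", "open", "pass", "bring", "turn"}
# Hypothetical polite openers that signal an indirect request
POLITE_OPENERS = ("can you", "could you", "would you")


def speech_act(utterance):
    """Classify an utterance as 'request', 'order', 'question' or
    'statement' using surface cues only (a toy pragmatic rule set)."""
    text = utterance.strip().lower()
    words = text.rstrip("?.!").split()
    is_question = text.endswith("?")
    starts_imperative = bool(words) and words[0] in IMPERATIVE_VERBS
    if (starts_imperative or text.startswith(POLITE_OPENERS)) and is_question:
        return "request"
    if starts_imperative:
        return "order"
    return "question" if is_question else "statement"


print(speech_act("Close the window?"))            # request
print(speech_act("Could you close the window?"))  # request
print(speech_act("Close the window!"))            # order
print(speech_act("Is the window open?"))          # question
```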
Challenges of NLP
Ambiguity
o Lexical/morphological: change (V,N), training (V,N), even (ADJ, ADV) …
o Syntactic: Helicopter powered by human flies
o Semantic: He saw a man on the hill with a telescope.
o Discourse: anaphora,
Classical solution
o Using a later analysis to resolve the ambiguity left by an earlier step
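Lexical ambiguities multiply: every word with two possible tags doubles the number of candidate tag sequences. The classical solution, letting a later analysis filter the choices left open by an earlier one, can be sketched as follows. The sketch assumes a hypothetical lexicon in which ‘change’ and ‘training’ can each be V or N. It also assumes a hypothetical later-stage syntactic constraint: exactly one verb per sentence, and every determiner followed by a noun. Both are illustrative only.

```python
from itertools import product

# Hypothetical lexicon: each word maps to every POS tag it can take
LEXICON = {
    "they": ["PRON"],
    "the": ["DET"],
    "change": ["V", "N"],
    "training": ["V", "N"],
}


def candidate_taggings(words):
    """Earlier step: lexical lookup alone yields every tag combination."""
    return [list(tags) for tags in product(*(LEXICON[w] for w in words))]


def syntactically_valid(tags):
    """Later step (toy syntax): exactly one verb, and every DET is
    immediately followed by an N."""
    if tags.count("V") != 1:
        return False
    return all(i + 1 < len(tags) and tags[i + 1] == "N"
               for i, t in enumerate(tags) if t == "DET")


words = "they change the training".split()
candidates = candidate_taggings(words)
print(len(candidates))                                     # 4
print([t for t in candidates if syntactically_valid(t)])   # [['PRON', 'V', 'DET', 'N']]
```

Lexical analysis alone leaves four readings, and the later syntactic check keeps only the one in which ‘change’ is a verb and ‘training’ a noun.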
Applications of NLP
Machine Translation
Database Access
Information Retrieval
o Selecting from a set of documents the ones that are relevant to a query
Text Categorization
Question-answering systems, where natural language is used to query a database (for example, a
query system to a personnel database)
Automated customer service over the telephone (for example, to perform banking transactions or
order items from a catalogue)
Tutoring systems, where the machine interacts with a student (for example, an automated
mathematics tutoring system)
Spoken language control of a machine (for example, voice control of a VCR or computer)
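Information retrieval, selecting from a set of documents the ones relevant to a query, can be illustrated with a minimal ranking sketch. Each document is scored by the TF-IDF-weighted count of query terms it contains, and documents that share no informative term with the query are dropped. The three-document collection and the plain whitespace tokenizer are hypothetical. Production systems use better weighting schemes (such as BM25), stemming, and inverted indexes.

```python
import math
from collections import Counter

# Hypothetical document collection
DOCS = {
    "d1": "machine translation converts text from one language to another",
    "d2": "information retrieval selects documents relevant to a query",
    "d3": "a tutoring system interacts with a student",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def rank(query, docs):
    """Rank documents by the sum of tf * idf over the query terms."""
    n = len(docs)
    tokens = {name: Counter(tokenize(text)) for name, text in docs.items()}
    # Document frequency: in how many documents each term appears
    df = Counter(term for counts in tokens.values() for term in counts)
    scores = {}
    for name, counts in tokens.items():
        score = sum(counts[t] * math.log(n / df[t])
                    for t in tokenize(query) if df[t])
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by document name
    return sorted(scores, key=lambda name: (-scores[name], name))


print(rank("relevant documents for a query", DOCS))
```

A term that occurs in every document gets an IDF of log(1) = 0, so very common words contribute nothing to the ranking.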