1. Introduction to Natural Language Processing (NLP)
Why NLP?
• According to industry estimates, only 21% of the available
data is present in structured form (Report: 12 Jan 2017).
• Data is being generated as we speak, as we tweet, as we send
messages on WhatsApp, and in various other activities.
• The majority of this data exists in textual form, which is highly
unstructured in nature.
• Notable examples include posts on social media, user-to-user
chat conversations, news, blogs and articles, product or service
reviews, and patient records in the healthcare sector. More recent
sources include chatbots and other voice-driven bots.
• Although this data is high-dimensional, the information it contains
is not directly accessible unless it is processed (read and
understood) manually or analyzed by an automated system.
• Unlike common word-processor operations that treat text as
a mere sequence of symbols, NLP considers the hierarchical
structure of language: several words make a phrase, several
phrases make a sentence and, ultimately, sentences convey ideas.
• By analyzing language for its meaning, NLP systems have long filled
useful roles, such as correcting grammar, converting speech to text
and automatically translating between languages.
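As a rough illustration of that hierarchy, the Python sketch below splits a short text into sentences and each sentence into words. It only covers the sentence and word levels (grouping words into phrases needs a parser), it assumes naive splitting on ".", "!" and "?" followed by whitespace, and the helper name `split_hierarchy` is hypothetical. Real tokenizers handle abbreviations such as "Dr." and other edge cases that this sketch gets wrong.

```python
import re

def split_hierarchy(text):
    """Split text into a list of sentences, each given as a list of word tokens.

    Naive assumption (hypothetical helper): a sentence ends at '.', '!' or '?'
    followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Inside each sentence, keep runs of letters, digits and apostrophes as words.
    return [re.findall(r"[A-Za-z0-9']+", s) for s in sentences if s]

doc = "Sue took the trip to New York. She had a great time there."
for i, words in enumerate(split_hierarchy(doc), start=1):
    print(i, words)
```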
What is Natural Language Processing?
• NLP is used to analyze text, allowing machines to understand
how humans speak
• Spelling correctors
• Optical Character Recognition software
• Grammar and style checkers
Big Applications
• Question answering
• Conversational agents (live chat, etc.)
• Text summarization
• Machine translation
Modern Applications
Next, we will discuss:
• Knowledge of language
• Ambiguity
• Models and algorithms
Note: The field of NLP involves making computers perform useful tasks with
the natural languages humans use. The input and output of an NLP system can
be:
• Speech
• Written Text
Knowledge of Language
• Phonetics and phonology: speech sounds, their
production, and the rule systems that govern their
use
• Morphology: words and their composition from
more basic units
- Cat, cats (inflectional morphology)
- Child, children
- Friend, friendly (derivational morphology)
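The morphology examples above can be illustrated with a toy analyzer. This is a minimal sketch, assuming a tiny hand-written lexicon of stems and a table of irregular forms (the names `STEMS`, `IRREGULAR`, `SUFFIXES` and `analyze` are hypothetical). It only knows the plural suffix -s (inflectional) and the adjective-forming suffix -ly (derivational); practical morphological analyzers typically use finite-state transducers with far richer rule sets.

```python
# Hypothetical toy lexicon: the stems this analyzer knows about.
STEMS = {"cat": "noun", "child": "noun", "friend": "noun"}
# Irregular forms cannot be found by suffix stripping, so they are listed explicitly.
IRREGULAR = {"children": ("child", "+PL", "inflectional")}
# Suffix rules: (suffix, morpheme tag, kind of morphology).
SUFFIXES = [("ly", "+ADJ", "derivational"), ("s", "+PL", "inflectional")]

def analyze(word):
    """Return (stem, tag, kind); a bare stem gives (word, None, None)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in STEMS:
        return (word, None, None)
    for suffix, tag, kind in SUFFIXES:
        stem = word[: -len(suffix)]
        # Only strip the suffix if what remains is a known stem.
        if word.endswith(suffix) and stem in STEMS:
            return (stem, tag, kind)
    raise ValueError(f"unknown word: {word}")

for w in ["cat", "cats", "children", "friendly"]:
    print(w, analyze(w))
```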
Knowledge of Language
• Comparison:
- Phonetics: Analyzes the production of all human speech
sounds, regardless of language.
- Phonology: Analyzes the sound patterns of a particular
language by determining which phonetic sounds
are significant, and explaining how these sounds
are interpreted by the native speaker.
Note
• Phonology is the basis for further work in morphology, syntax, discourse, and
orthography design.
• Constructing words from phonemes (e.g. “th” + “i” + “ng” = thing)
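One standard way phonologists decide which sounds are significant in a particular language is to look for minimal pairs: two words that differ in exactly one sound and have different meanings (for example "thin" vs "sin" shows that /θ/ and /s/ contrast in English). The sketch below finds minimal pairs in a small lexicon. The ARPAbet-style transcriptions are written by hand for illustration, and the function name `minimal_pairs` is hypothetical.

```python
from itertools import combinations

# Hand-written ARPAbet-style phonemic transcriptions (illustrative only).
LEXICON = {
    "thin": ("TH", "IH", "N"),
    "sin": ("S", "IH", "N"),
    "thing": ("TH", "IH", "NG"),
    "sing": ("S", "IH", "NG"),
    "sit": ("S", "IH", "T"),
}

def minimal_pairs(lexicon):
    """Yield (word1, word2, phoneme1, phoneme2) for words differing in exactly one phoneme."""
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if len(p1) != len(p2):
            continue  # this simple check only compares same-length transcriptions
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            yield (w1, w2, *diffs[0])

for pair in minimal_pairs(LEXICON):
    print(pair)
```

Each printed pair is evidence that the two differing phonemes are contrastive (significant) in the language the lexicon describes.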
Knowledge of Language
• Syntax: the structuring of words into larger, grammatically
legal phrases and sentences
Note:
Syntax refers to the structure of sentences.
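A context-free grammar is one common way to formalize which word sequences are legal. The sketch below assumes a tiny hand-written grammar and lexicon (`GRAMMAR`, `LEXICON`, `parse`, `parse_seq` and `is_grammatical` are all hypothetical names) and uses a naive top-down backtracking recognizer. It is exponential in the worst case and cannot handle left-recursive rules, so practical parsers use chart-based algorithms such as CKY or Earley instead.

```python
# Hypothetical toy grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "PP"]],
    "PP": [["P", "NP"]],
}
# Hypothetical lexicon: word -> part-of-speech category (a pre-terminal).
LEXICON = {
    "the": "Det", "a": "Det",
    "cat": "N", "mat": "N", "dog": "N",
    "sat": "V", "saw": "V",
    "on": "P",
}

def parse(symbol, words, i):
    """Yield every end position j such that words[i:j] can be derived from symbol."""
    if symbol not in GRAMMAR:  # pre-terminal: match one word by its category
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from parse_seq(rhs, words, i)

def parse_seq(symbols, words, i):
    """Yield end positions after matching a sequence of symbols starting at i."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], words, i):
        yield from parse_seq(symbols[1:], words, j)

def is_grammatical(sentence):
    words = sentence.lower().split()
    # Legal only if some derivation of S covers the whole sentence.
    return any(j == len(words) for j in parse("S", words, 0))

print(is_grammatical("the cat sat on the mat"))   # True
print(is_grammatical("cat the mat on sat the"))   # False
```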
Knowledge of Language
• Semantics: The meaning of words and phrases
- Word-sense disambiguation:
River bank vs. financial bank
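A classic baseline for word-sense disambiguation is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below assumes two hand-written glosses for "bank" and a small stop-word list, both illustrative and not taken from a real dictionary such as WordNet; `simplified_lesk` is a hypothetical helper name.

```python
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "by", "we", "i", "at"}

# Illustrative glosses (assumption); a real system would take these from a lexical resource.
SENSES = {
    "bank/river": "sloping land beside a body of water such as a river or lake",
    "bank/finance": "financial institution that accepts deposits and lends money",
}

def content_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def simplified_lesk(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context (ties keep the first sense)."""
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("we sat on the bank of the river and watched the water"))  # bank/river
print(simplified_lesk("i need to deposit money at the bank"))  # bank/finance
```

Exact word matching is crude ("deposit" does not match "deposits" here); stemming or lemmatizing both sides is a common refinement.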
Knowledge of Language
• Pragmatics: It deals with how sentences are used and understood
in different situations, and how the situation affects their
interpretation. It relates to a practical point of view or
practical considerations.
• Uses context of utterance
– Where, by whom, to whom, why, and when it was said
– Intentions: inform, request, promise, criticize, ...
Examples (questions in form, but usually requests in intent)
– Do you have a stapler?
– What is the time by your watch?
Handling ambiguity
– Pragmatic ambiguity: “you’re late”: What’s the speaker’s
intention: informing or criticizing?
Knowledge of Language
• Discourse: It deals with how the immediately preceding sentence
can affect the interpretation of the next sentence. Discourse is spoken or
written communication between people, especially serious discussion of a
particular subject.
• Discourse defines what statements can be said about a topic.
Example:
– Sue took the trip to New York. She had a great
time there.
• Sue/she;
• New York/there;
• took/had (time)
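Linking "she" to Sue and "there" to New York is a coreference (anaphora) resolution task. A very naive baseline links each pronoun to the most recent earlier mention of a compatible type. The sketch below assumes the mentions have already been extracted and typed by hand (the `PERSON_F`/`LOCATION` labels, `COMPATIBLE` table and `resolve` function are hypothetical). Real resolvers also use syntax, agreement features and learned models, and recency alone often fails.

```python
# Mentions in textual order: (text, type). Types are hand-assigned for this illustration.
MENTIONS = [
    ("Sue", "PERSON_F"),
    ("the trip", "EVENT"),
    ("New York", "LOCATION"),
    ("She", "PRONOUN_F"),
    ("there", "PRONOUN_LOC"),
]

# Assumption: which antecedent type each pronoun type may refer to.
COMPATIBLE = {"PRONOUN_F": "PERSON_F", "PRONOUN_LOC": "LOCATION"}

def resolve(mentions):
    """Map each pronoun to the most recent preceding mention of a compatible type."""
    links = {}
    for i, (text, kind) in enumerate(mentions):
        wanted = COMPATIBLE.get(kind)
        if wanted is None:
            continue  # not a pronoun this sketch knows how to resolve
        for prev_text, prev_kind in reversed(mentions[:i]):
            if prev_kind == wanted:
                links[text] = prev_text
                break
    return links

print(resolve(MENTIONS))  # {'She': 'Sue', 'there': 'New York'}
```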
Ambiguity
• There is ambiguity at all levels of language
Example:
• I saw the woman with the telescope
• Syntactically ambiguous:
– I saw (NP the woman with the telescope): the woman has the telescope
– I saw (NP the woman) (PP with the telescope): I used the telescope to see her
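The two bracketings correspond to two different parse trees, and a chart parser makes this explicit. The sketch below is a CKY chart extended to count parse trees, using a tiny hand-written grammar in Chomsky Normal Form that is an assumption made only for this sentence (`BINARY`, `LEXICAL` and `count_parses` are hypothetical names).

```python
from collections import defaultdict

# Toy grammar in Chomsky Normal Form (hypothetical, written for this sentence only).
BINARY = [              # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase: I used the telescope
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase: the woman has the telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICAL = {"i": ["NP"], "saw": ["V"], "the": ["Det"], "woman": ["N"],
           "telescope": ["N"], "with": ["P"]}

def count_parses(sentence, start="S"):
    """Fill a CKY chart where chart[i][j][X] counts the trees in which X derives words[i:j]."""
    words = sentence.lower().split()
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in LEXICAL.get(w, []):
            chart[i][i + 1][tag] += 1
    for length in range(2, n + 1):          # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # split point between the two children
                for parent, left, right in BINARY:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)

print(count_parses("I saw the woman with the telescope"))  # 2
```

Without the prepositional phrase ("I saw the woman") the same grammar yields exactly one parse, which isolates the PP attachment as the source of the ambiguity.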
Models and Algorithms
• Models (as we are using the term here):
– Formalisms to represent linguistic knowledge
• Algorithms:
– Used to manipulate the representations and
produce the desired behavior
• choosing among possibilities and
combining pieces
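To illustrate the split between a model and an algorithm, the sketch below uses a bigram language model as the model (a table of word-pair probabilities) and a simple search that scores candidate transcriptions and picks the most probable one as the algorithm (choosing among possibilities). The candidate sentences and every probability value are invented for illustration, and `score`/`choose` are hypothetical names.

```python
import math

# The MODEL: hypothetical bigram probabilities P(next word | previous word).
BIGRAM = {
    ("<s>", "recognize"): 0.01,
    ("recognize", "speech"): 0.2,
    ("speech", "</s>"): 0.3,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.01,
    ("beach", "</s>"): 0.3,
}
UNSEEN = 1e-6  # crude floor for unseen bigrams (real models use proper smoothing)

def score(sentence):
    """Log-probability of a sentence under the bigram model, with start/end markers."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, UNSEEN)) for pair in zip(words, words[1:]))

# The ALGORITHM: choose among possibilities by taking the highest-scoring candidate.
def choose(candidates):
    return max(candidates, key=score)

print(choose(["wreck a nice beach", "recognize speech"]))  # recognize speech
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on longer sentences.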
Note:
World knowledge: It includes general knowledge
about the world.
Google Translate
• https://translate.google.com/#zh-CN/ko/my%20name%20is%20asad%2C%20and%20ur%20name