Introducing Natural Language Processing
Winter 2023
Source: https://www.ibm.com/topics/data-governance
Natural language processing strives to
build machines that understand and
respond to text or voice data—and respond
with text or speech of their own—in much
the same way humans do.
What is Natural Language Processing?
• Branch of Artificial Intelligence
• Concerned with giving computers the ability to understand text and
spoken words in much the same way human beings can.
• Combines computational linguistics (rule-based modeling of human
language) with statistical, machine learning, and deep learning
models.
• These enable computers to process human language in the form of
text or voice data and to ‘understand’ its full meaning, complete with
the speaker or writer’s intent and sentiment.
Source: https://www.ibm.com/topics/natural-language-processing
Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers the ability
to understand text and spoken words in much the same way human beings can.
What is Natural Language Processing?
• Drives computer programs that:
• translate text from one language to another,
• respond to spoken commands, and
• summarize large volumes of text rapidly
• Plays a growing role in enterprise solutions that:
• help streamline business operations,
• increase employee productivity, and
• simplify mission-critical business processes.
NLP drives computer programs that translate text from one language to another, respond to
spoken commands, and summarize large volumes of text rapidly—even in real time. There’s a
good chance you’ve interacted with NLP in the form of voice-operated GPS systems, digital
assistants, speech-to-text dictation software, customer service chatbots, and other consumer
conveniences. But NLP also plays a growing role in enterprise solutions that help streamline
business operations, increase employee productivity, and simplify mission-critical business
processes.
Natural Language Processing - Tasks
• Human language is filled with ambiguities
• It is incredibly difficult to write software that accurately determines the
intended meaning of text or voice data.
• Homonyms, homophones, sarcasm, idioms, metaphors,
grammar and usage exceptions, variations in sentence
structure
• Programmers must teach natural language-driven applications to
recognize and understand these irregularities accurately from the start
Human language is filled with ambiguities that make it incredibly difficult to write software that
accurately determines the intended meaning of text or voice data. Homonyms, homophones,
sarcasm, idioms, metaphors, grammar and usage exceptions, and variations in sentence structure
are just a few of the irregularities of human language that take humans years to learn, but that
programmers must teach natural language-driven applications to recognize and understand
accurately from the start, if those applications are going to be useful.
Natural Language Processing - Tasks
NLP tasks break down human text and voice data to help the computer make
sense of what it's ingesting, including:
• Speech recognition
• Part of speech tagging
• Word sense disambiguation
• Named entity recognition
• Co-reference resolution
• Sentiment analysis
• Natural language generation
Several NLP tasks break down human text and voice data in ways that help the computer make
sense of what it's ingesting. Some of these tasks include the following:
• Speech recognition, also called speech-to-text, is the task of reliably converting voice data into
text data. Speech recognition is required for any application that follows voice commands or
answers spoken questions. What makes speech recognition especially challenging is the way
people talk—quickly, slurring words together, with varying emphasis and intonation, in
different accents, and often using incorrect grammar.
• Part of speech tagging, also called grammatical tagging, is the process of determining the part
of speech of a particular word or piece of text based on its use and context. Part of speech
tagging identifies ‘make’ as a verb in ‘I can make a paper plane,’ and as a noun in ‘What make
of car do you own?’
• Word sense disambiguation is the selection of the meaning of a word with multiple meanings
through a process of semantic analysis that determines the sense that fits best in
the given context. For example, word sense disambiguation helps distinguish the meaning of
the verb 'make' in ‘make the grade’ (achieve) vs. ‘make a bet’ (place).
• Named entity recognition, or NER, identifies words or phrases as useful entities. NER
identifies ‘Kentucky’ as a location or ‘Fred’ as a man's name.
• Co-reference resolution is the task of identifying if and when two words refer to the same
entity. The most common example is determining the person or object to which a certain
pronoun refers (e.g., ‘she’ = ‘Mary’), but it can also involve identifying a metaphor or an idiom
in the text (e.g., an instance in which 'bear' isn't an animal but a large hairy person).
• Sentiment analysis attempts to extract subjective qualities—attitudes, emotions, sarcasm,
confusion, suspicion—from text.
• Natural language generation is sometimes described as the opposite of speech recognition or
speech-to-text; it's the task of putting structured information into human language.
See the blog post “NLP vs. NLU vs. NLG: the differences between three natural language
processing concepts” for a deeper look into how these concepts relate.
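The part of speech example above (‘make’ as a verb vs. a noun) can be sketched with a toy context rule. This is only an illustration, not how production taggers work: the cue-word sets and the function name `tag_make` are assumptions made up for this sketch, and a real tagger would learn such patterns from labeled data.

```python
import re

# Hypothetical, hand-picked cue words; a real tagger learns these from data.
VERB_CUES = {"can", "will", "to", "i", "you", "we", "they"}
NOUN_CUES = {"what", "which", "the", "a", "this", "that"}


def tag_make(sentence):
    """Guess whether 'make' is a verb or a noun from the word just before it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if "make" not in tokens:
        return "absent"
    i = tokens.index("make")
    prev = tokens[i - 1] if i > 0 else ""
    if prev in NOUN_CUES:
        return "noun"
    if prev in VERB_CUES:
        return "verb"
    return "unknown"


print(tag_make("I can make a paper plane"))      # verb
print(tag_make("What make of car do you own?"))  # noun
```

The rule only looks one word to the left, so it fails on anything outside its cue lists; that brittleness is the reason statistical taggers replaced hand-written rules.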
NLP – Tools and Approaches
Python and the Natural Language Toolkit (NLTK)
• Wide range of tools and libraries for attacking specific NLP tasks - many found
in the Natural Language Toolkit, or NLTK
• The NLTK includes libraries for many of the NLP tasks listed above, plus
libraries for subtasks: sentence parsing, word segmentation, stemming and
lemmatization, and tokenization
Statistical NLP, machine learning, and deep learning
• Combines computer algorithms with machine learning and deep learning
models to automatically extract, classify, and label elements of text and voice
data
The NLTK includes libraries for many of the NLP tasks listed above, plus libraries for subtasks, such
as sentence parsing, word segmentation, stemming and lemmatization (methods of trimming
words down to their roots), and tokenization (for breaking phrases, sentences, paragraphs and
passages into tokens that help the computer better understand the text). It also includes libraries
for implementing capabilities such as semantic reasoning, the ability to reach logical conclusions
based on facts extracted from text.
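To make tokenization and stemming concrete, here is a minimal sketch that uses only the Python standard library, so it runs without installing NLTK or downloading its data. NLTK itself offers `nltk.word_tokenize` and `nltk.stem.PorterStemmer` for these jobs; the regular expression, the suffix list, and the names `tokenize` and `crude_stem` below are assumptions for illustration and are far cruder than the Porter algorithm.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation marks."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?|[^\sa-z0-9]", text.lower())


# Assumed suffix list, checked in this order; real stemmers apply many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The computer parsed the sentences quickly.")
print(tokens)
print([crude_stem(t) for t in tokens])
```

Note that the stems are not always dictionary words (‘sentences’ becomes ‘sentenc’). Lemmatization differs in that it maps each word to a real base form, which generally requires a vocabulary lookup.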
Statistical NLP combines computer algorithms with machine learning and deep
learning models to automatically extract, classify, and label elements of text and voice data and
then assign a statistical likelihood to each possible meaning of those elements. Today, deep
learning models and learning techniques based on convolutional neural networks (CNNs) and
recurrent neural networks (RNNs) enable NLP systems that 'learn' as they work and extract ever
more accurate meaning from huge volumes of raw, unstructured, and unlabeled text and voice
data sets.
For a deeper dive into the nuances between these technologies and their learning approaches,
see “AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference?”
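The idea of assigning a statistical likelihood to each possible meaning can be sketched with simple relative frequencies. The labeled sample and the function name `tag_likelihoods` are invented for this illustration; the estimate is a plain count ratio, much simpler than the CNN and RNN models mentioned above.

```python
from collections import Counter, defaultdict

# Tiny hand-labeled sample, invented for illustration only.
LABELED = [
    ("make", "VERB"), ("make", "VERB"), ("make", "VERB"), ("make", "NOUN"),
    ("bear", "NOUN"), ("bear", "NOUN"), ("bear", "VERB"),
]


def tag_likelihoods(pairs):
    """Estimate P(tag | word) as the share of times the word carried that tag."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {
        word: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for word, tags in counts.items()
    }


probs = tag_likelihoods(LABELED)
print(probs["make"])  # {'VERB': 0.75, 'NOUN': 0.25}
```

A system built this way would pick the most likely reading by default and let surrounding context shift the probabilities, which is the step the learned models handle.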
NLP – Use Cases
Natural language processing is the driving force behind machine
intelligence in many modern real-world applications:
• Spam detection
• Machine translation
• Virtual agents and chatbots
• Social media sentiment analysis
• Text summarization
Natural language processing is the driving force behind machine intelligence in many modern
real-world applications. Here are a few examples:
• Spam detection: You may not think of spam detection as an NLP solution, but the best spam
detection technologies use NLP's text classification capabilities to scan emails for language
that often indicates spam or phishing. These indicators can include overuse of financial terms,
characteristic bad grammar, threatening language, inappropriate urgency, misspelled
company names, and more. Spam detection is one of a handful of NLP problems that experts
consider 'mostly solved' (although you may argue that this doesn’t match your email
experience).
• Virtual agents and chatbots: Virtual agents such as Apple's Siri and Amazon's Alexa use speech
recognition to recognize patterns in voice commands and natural language generation to
respond with appropriate action or helpful comments. Chatbots perform the same magic in
response to typed text entries. The best of these also learn to recognize contextual clues
about human requests and use them to provide even better responses or options over time.
The next enhancement for these applications is question answering, the ability to respond to
our questions—anticipated or not—with relevant and helpful answers in their own words.
• Social media sentiment analysis: NLP has become an essential business tool for uncovering
hidden data insights from social media channels. Sentiment analysis can analyze language
used in social media posts, responses, reviews, and more to extract attitudes and emotions in
response to products, promotions, and events: information that companies can use in product
designs, advertising campaigns, and more.
• Text summarization: Text summarization uses NLP techniques to digest huge volumes of digital
text and create summaries and synopses for indexes, research databases, or busy readers who
don't have time to read full text. The best text summarization applications use semantic
reasoning and natural language generation (NLG) to add useful context and conclusions to
summaries.
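The text classification behind spam detection, described in the first use case above, can be sketched with a small naive Bayes classifier. Naive Bayes is one common choice and is an assumption here, since the source does not name a specific algorithm; the training messages and the function names `words`, `train`, and `classify` are likewise invented for illustration.

```python
import math
import re
from collections import Counter

# Invented training messages; real filters train on large labeled mail corpora.
TRAIN = [
    ("urgent wire transfer needed now", "spam"),
    ("claim your free prize money now", "spam"),
    ("meeting notes attached for review", "ham"),
    ("lunch tomorrow with the project team", "ham"),
]


def words(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(words(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the higher log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in words(text):
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("urgent free money transfer", model))  # spam
print(classify("project meeting tomorrow", model))    # ham
```

With only four training messages the classifier is a toy, but the structure (count words per class, smooth, compare log-probabilities) is the same one that scales to real mail corpora.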
References
Natural Language Processing:
NLP vs. NLU vs. NLG: the differences between three natural language processing concepts
https://www.ibm.com/blogs/watson/2020/11/nlp-vs-nlu-vs-nlg-the-differences-between-three-natural-language-processing-concepts/
AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference?
https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks