NLP Notes
What is Natural Language Processing? Explain ambiguity in natural languages with suitable examples.
Natural Language Processing (NLP) is a branch of computer science and artificial intelligence that
helps computers understand, interpret, and interact with human language. NLP enables machines to
process text or speech in ways that are meaningful, such as translating languages, understanding
spoken commands, or summarizing long documents.
Think of NLP as the technology behind systems like Siri, Google Translate, or chatbots that respond to
your questions. The goal of NLP is to teach machines to understand the way humans naturally
communicate.
Some common applications of NLP include:
1. Chatbots
Enables systems to have conversations with users, answering questions or performing tasks.
Chatbots are a form of artificial intelligence that are programmed to interact with humans in such a
way that they sound like humans themselves. Depending on the complexity of the chatbots, they can
either just respond to specific keywords or they can even hold full conversations that make it tough
to distinguish them from humans. Chatbots are created using Natural Language Processing and
Machine Learning, which means they understand the complexities of the English language, find the
actual meaning of a sentence, and learn from their conversations with humans, becoming better over
time. Chatbots work in two simple steps. First, they identify the
meaning of the question asked and collect all the data from the user that may be required to answer
the question. Then they answer the question appropriately.
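A very rough sketch of the keyword-matching idea is shown below. A real chatbot would use NLP and machine learning rather than fixed rules; the keywords and replies here are made up purely to illustrate the two steps described above:

# Minimal keyword-based chatbot sketch (illustrative only).
RESPONSES = {
    "hours": "We are open from 9 am to 6 pm, Monday to Friday.",
    "price": "Our basic plan starts at $10 per month.",
    "refund": "You can request a refund within 30 days of purchase.",
}

def reply(user_message):
    text = user_message.lower()
    # Step 1: identify the intent by looking for a known keyword.
    for keyword, answer in RESPONSES.items():
        if keyword in text:
            # Step 2: answer the question appropriately.
            return answer
    return "Sorry, I did not understand. Could you rephrase that?"

print(reply("What are your opening hours?"))
print(reply("How do I get a refund?"))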
2. Autocomplete
Have you noticed that search engines tend to guess what you are typing and automatically complete
your sentences? For example, on typing "game" in Google, you may get further suggestions for
“game of thrones”, “game of life” or if you are interested in maths then “game theory”. All these
suggestions are provided using autocomplete that uses Natural Language Processing to guess what
you want to ask. Search engines use their enormous data sets to analyze what their customers are
probably typing when they enter particular words and suggest the most common possibilities. They
use Natural Language Processing to make sense of these words and how they are interconnected to
form different sentences.
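As a small illustration, here is a toy autocomplete sketch that suggests the most frequent completions for a typed prefix. The query log below is invented; real search engines learn these frequencies from enormous query datasets:

from collections import Counter

query_log = [
    "game of thrones", "game of thrones", "game of life",
    "game theory", "game of thrones", "game theory",
]
frequencies = Counter(query_log)

def autocomplete(prefix, k=3):
    # Return the k most common logged queries that start with the prefix.
    matches = [(q, c) for q, c in frequencies.items() if q.startswith(prefix)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, _ in matches[:k]]

print(autocomplete("game"))  # ['game of thrones', 'game theory', 'game of life']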
3. Voice Assistants
These days voice assistants are all the rage! Whether it's Siri, Alexa, or Google Assistant, almost
everyone uses one of these to make calls, place reminders, schedule meetings, set alarms, surf the
internet, etc. These voice assistants have made life much easier. But how do they work? They use a
complex combination of speech recognition, natural language understanding, and natural language
processing to understand what humans are saying and then act on it. The long-term goal of voice
assistants is to become a bridge between humans and the internet and provide all manner of
services based on just voice interaction. However, they are still a little far from that goal seeing as Siri
still can’t understand what you are saying sometimes!
4. Language Translator
Want to translate a text from English to Hindi but don’t know Hindi? Well, Google Translate is the
tool for you! While it’s not exactly 100% accurate, it is still a great tool to convert text from one
language to another. Google Translate and other translation tools use sequence-to-sequence
modeling, a technique in Natural Language Processing. Earlier, language translators used
statistical machine translation (SMT), which meant they analyzed millions of documents that
were already translated from one language to another (English to Hindi in this case) and then looked
for common patterns and the basic vocabulary of the language. However, this method was not as
accurate as sequence-to-sequence modeling.
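As a hedged sketch, a pretrained sequence-to-sequence model can be run through the Hugging Face transformers library, assuming that library is installed and that the "Helsinki-NLP/opus-mt-en-hi" English-to-Hindi checkpoint is available on the Hugging Face Hub (both are assumptions, not part of these notes):

from transformers import pipeline

# Load a pretrained English-to-Hindi sequence-to-sequence model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

result = translator("How are you today?")
print(result[0]["translation_text"])  # Hindi translation of the sentence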
5. Sentiment Analysis
Almost the whole world is on social media these days! And companies can use sentiment analysis to
understand how a particular type of user feels about a particular topic, product, etc. They can use
natural language processing, computational linguistics, text analysis, etc. to understand the general
sentiment of the users for their products and services and find out if the sentiment is good, bad, or
neutral. Companies can use sentiment analysis in a lot of ways such as to find out the emotions of
their target audience, to understand product reviews, to gauge their brand sentiment, etc.
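A minimal sentiment analysis sketch, assuming NLTK and its VADER lexicon are available (VADER is a rule-based sentiment tool that works well on short, informal text such as reviews or tweets):

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon
sia = SentimentIntensityAnalyzer()

reviews = [
    "I absolutely love this product, it works perfectly!",
    "Terrible experience, the package arrived broken.",
    "The item is okay, nothing special.",
]
for review in reviews:
    scores = sia.polarity_scores(review)
    # 'compound' ranges from -1 (very negative) to +1 (very positive).
    if scores["compound"] > 0.05:
        label = "positive"
    elif scores["compound"] < -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(label, scores["compound"], review)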
6. Grammar Checkers
Grammar and spelling are very important when writing professional reports for your superiors or
even assignments for your lecturers. After all, having major errors may get you fired or failed! That's
why grammar and spell checkers are a very important tool for any professional writer. They can not
only correct grammar and check spellings but also suggest better synonyms and improve the overall
readability of your content. And guess what, they utilize natural language processing to provide the
best possible piece of writing! The NLP algorithm is trained on millions of sentences to understand
the correct format.
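Real grammar checkers are trained on millions of sentences; the tiny sketch below only illustrates the spell-suggestion part by matching each word against a small, made-up vocabulary using string similarity:

import difflib

# Toy vocabulary for illustration; a real checker uses a huge dictionary.
VOCABULARY = ["grammar", "spelling", "professional", "report", "assignment", "important"]

def suggest(word):
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else word  # keep the word if no close match

sentence = "Speling and gramer are importent"
corrected = " ".join(suggest(w) for w in sentence.split())
print(corrected)  # "spelling and grammar are important"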
7. Email Classification
Emails are still the most important method for professional communication. However, all of us still
get thousands of promotional emails that we don't want to read. Thankfully, our emails are
automatically divided into three sections, namely Primary, Social, and Promotions, which means we never
have to open the Promotional section! But how does this work? Email services use natural language
processing to identify the contents of each Email with text classification so that it can be put in the
correct section.
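A minimal text-classification sketch in the same spirit, assuming scikit-learn is installed; the example emails and labels below are invented purely for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Meeting agenda for Monday attached",        # primary
    "Your friend tagged you in a photo",         # social
    "Huge discount! 50% off everything today",   # promotions
    "Project report review scheduled tomorrow",  # primary
    "New follower on your profile",              # social
    "Limited time offer, buy one get one free",  # promotions
]
labels = ["primary", "social", "promotions", "primary", "social", "promotions"]

# TF-IDF features plus a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Flash sale ends tonight, 70% off"]))  # likely 'promotions'
print(model.predict(["Weekly status meeting notes"]))       # likely 'primary'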
Natural Language Processing (NLP) involves making computers understand and interact with human
language, which is incredibly complex and nuanced. Here are some of the major challenges NLP
faces:
1. Ambiguity in Language
Problem: Words, phrases, and sentences can have multiple meanings depending on context.
Example: The word "bank" can mean a financial institution or the side of a river.
Challenge: Teaching machines to correctly interpret the intended meaning requires deep
context understanding.
2. Diversity of Language
Problem: Different languages have unique grammar, syntax, and vocabulary. Even within one
language, there are variations in dialects, slang, and regional expressions.
Example: American English uses "elevator," while British English uses "lift."
Challenge: Building systems that work across all languages and their nuances is extremely
resource-intensive.
3. Contextual Understanding
Problem: Machines struggle to grasp context, tone, and implied meaning in conversations.
Example: "I can't recommend this movie enough" is literally positive, but with a sarcastic tone it means the opposite.
4. Evolving Language
Problem: Human language constantly evolves with new words, phrases, and abbreviations.
Example: Words like "selfie" and "yeet" didn't exist years ago but are now commonly used.
5. Morphological Complexity
Problem: Words change forms based on tense, plurality, gender, or case, especially in
morphologically rich languages.
Challenge: Developing NLP systems that can handle such transformations accurately.
6. Data Scarcity
Problem: NLP models require large, annotated datasets for training, which are often unavailable for many languages or domains.
Example: It's easy to find English datasets, but much harder for regional Indian languages.
7. Polysemy and Synonymy
Problem: One word can have multiple meanings (polysemy), and different words can mean the same thing (synonymy).
Example: The word "light" can mean not heavy or bright. Similarly, "happy" and "joyful" are synonyms.
8. Named Entity Recognition (NER)
Problem: Identifying proper nouns (e.g., people, places, organizations) in text is tricky, especially for new or uncommon names.
Example: In "Apple acquired Beats," distinguishing the company Apple from the fruit can be challenging.
Challenge: NER systems often fail with overlapping entities or unseen names.
9. Noise in Data
Problem: Real-world text data is often messy, containing spelling errors, incomplete
sentences, and irrelevant content.
Example: Tweets or social media posts with hashtags, emojis, and abbreviations like "OMG
ur gr8!"
Challenge: Cleaning and preprocessing noisy data is essential for accurate NLP.
10. Sarcasm and Mixed Sentiment
Problem: Understanding sarcasm, irony, and mixed sentiments is hard for machines.
Example: "What a fantastic day" could mean the opposite if spoken sarcastically.
11. Multimodal Data
Problem: Real-world applications often combine text, speech, and visual inputs.
Challenge: Integrating data from multiple modalities into a single coherent system.
12. Bias in Models
Problem: NLP models can pick up and amplify biases present in their training data.
Example: Gender bias in machine translation (e.g., translating "nurse" to female pronouns and "doctor" to male pronouns).
13. Real-Time Processing
Problem: Many NLP applications, like virtual assistants, require instant responses.
Example: A delay in responding to a voice command can ruin the user experience.
14. Low-Resource Languages
Problem: Many languages (especially regional ones) lack sufficient training data and tools for NLP development.
Example: Indian regional languages like Assamese or Odia often lack robust NLP systems.
Ambiguity happens when a word, phrase, or sentence has more than one possible meaning. This is a
common challenge in NLP because computers need to figure out the correct meaning based on the
context, just like humans do.
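For example, WordNet (via NLTK, assuming the wordnet data has been downloaded) lists the many senses an ambiguous word like "bank" can take, which is exactly what an NLP system must choose between:

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet")  # one-time download of the WordNet data

# Each synset is one possible sense of the word "bank".
for synset in wordnet.synsets("bank")[:4]:
    print(synset.name(), "-", synset.definition())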
Discuss the challenges in various stages of natural language processing.
1. Lexical Analysis (Tokenization)
What happens: The input text is divided into smaller units (tokens) like words, phrases, or symbols.
Challenges:
o Handling punctuation, contractions (e.g., "don't"), hyphenated words, and symbols.
o Languages like Chinese or Japanese write words without spaces, so word boundaries must be inferred.
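A quick tokenization sketch with NLTK (assuming its tokenizer data is downloaded) shows how punctuation and contractions complicate the split:

import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")      # tokenizer models
nltk.download("punkt_tab")  # name used by newer NLTK releases; harmless otherwise

tokens = word_tokenize("Don't stop, it's only $9.99!")
print(tokens)
# Contractions ("Do", "n't", "'s") and symbols ("$", "9.99", "!") become separate tokens.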
2. Syntactic Analysis (Parsing)
What happens: Analyzes the grammatical structure of the text and builds a parse tree.
Example: For the sentence "The dog barked," parsing identifies:
o Noun (subject): dog
o Verb: barked.
Challenges:
o Grammatical ambiguities: Sentences like "The old man and the woman sat down" can have multiple interpretations.
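A small syntactic-analysis sketch: part-of-speech tagging with NLTK (assuming its tagger data is downloaded) is the first step toward building a parse tree:

import nltk

nltk.download("averaged_perceptron_tagger")      # POS tagger model
nltk.download("averaged_perceptron_tagger_eng")  # name used by newer NLTK releases

tokens = "The dog barked".split()
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('dog', 'NN'), ('barked', 'VBD')] -> determiner, noun, past-tense verb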
3. Semantic Analysis
What happens: The system interprets the meaning of the text based on word meanings and
relationships.
Challenges:
o Word sense disambiguation: Choosing the correct meaning of ambiguous words like
"bank" or "light."
o Idiomatic expressions: Phrases like "kick the bucket" (meaning "to die") cannot be
interpreted literally.
o Context dependency: The same words can mean different things in different
contexts.
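A brief word sense disambiguation sketch using the simplified Lesk algorithm from NLTK (assuming the wordnet data is downloaded); Lesk picks the WordNet sense whose definition overlaps most with the surrounding words:

import nltk
from nltk.wsd import lesk

nltk.download("wordnet")

# Disambiguate "bank" in a financial context.
context = "I deposited the cheque at the bank".split()
sense = lesk(context, "bank")
if sense is not None:
    print(sense.name(), "-", sense.definition())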
4. Pragmatic Analysis
What happens: Determines the real-world context and intent behind the text.
Example: For "Can you pass the salt?" pragmatic analysis understands that it is a polite request to pass the salt, not a literal question about the listener's ability.
Challenges:
o Sarcasm and irony: Machines struggle to detect sarcasm or hidden meanings, e.g.,
"Oh great, another meeting!".
o Context understanding: Requires background knowledge or conversation history to
infer intent.
o Cultural differences: Pragmatic meaning can vary widely across cultures and
languages.
5. Morphological Analysis
What happens: Words are broken into their root forms or morphemes.
Example: The word "running" is broken into:
o Root: run
o Suffix: -ing.
Challenges:
o Irregular forms: Handling irregular words like "went" (past tense of "go").
o Complex languages: Languages like Finnish or Turkish have highly inflected words
with many suffixes and prefixes.
o Homographs: Words like "lead" (to guide) and "lead" (a metal) have the same form
but different meanings.
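A short stemming and lemmatization sketch with NLTK (assuming the wordnet data is downloaded), including the irregular form "went", which a rule-based stemmer leaves alone but a lemmatizer maps to "go":

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "went", "leaves"]:
    # Stemming chops suffixes by rule; lemmatization looks up the dictionary form.
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))
# running -> run / run, went -> went / go, leaves -> leav / leave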