NLP SEM IMP
Lemmatization
Lemmatization is a process in Natural Language Processing (NLP) that, like stemming, reduces words
to their base or root form. However, unlike stemming, which simply strips off suffixes, lemmatization
uses vocabulary and morphological analysis to find the correct base or dictionary form of a word,
called a lemma.
The key difference between lemmatization and stemming is that lemmatization ensures that the
output is a valid word in the language, while stemming may produce non-existent words.
Lemmatization takes into account the word's meaning and its use in context to derive the correct
lemma.
Example of Lemmatization:
• "running" → "run" (the base form of the verb)
• "better" → "good" (since "better" is the comparative form of "good")
• "leaves" → "leaf" (the singular form of "leaves")
• "went" → "go" (the base form of the verb in the past tense)
• "dogs" → "dog" (the singular form of the noun)
Lemmatization vs. Stemming:
• Stemming: Strips off suffixes, may lead to non-dictionary forms.
o "running" → "run" (correct), "better" → "bett" (incorrect).
• Lemmatization: Reduces the word to its proper base form based on its meaning and part of
speech.
o "running" → "run" (correct), "better" → "good" (correct).
How Lemmatization Works:
Lemmatization involves a deeper understanding of the word and its context. It typically uses a
lemmatizer, which is often based on:
• Part-of-speech (POS) tagging: The lemma depends on whether the word is a noun, verb,
adjective, etc.
o For example, the lemma of "running" (verb) is "run," but the lemma of "running"
(noun) could be "running" (as the action of running).
Example of Lemmatization in Action:
• Input Text:
"The cats are running swiftly across the field."
• After Lemmatization:
"The cat be run swiftly across the field."
The words "cats" and "running" are lemmatized to "cat" and "run," and "are" is reduced to "be," the
base form of the verb.
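A minimal sketch of lemmatization in code, using NLTK's WordNetLemmatizer (this assumes the nltk package is installed and the WordNet data has been downloaded):

import nltk
nltk.download("wordnet", quiet=True)  # fetch the WordNet data on first run

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# The part-of-speech tag matters: "v" = verb, "a" = adjective, "n" = noun.
print(lemmatizer.lemmatize("running", pos="v"))  # run
print(lemmatizer.lemmatize("went", pos="v"))     # go
print(lemmatizer.lemmatize("better", pos="a"))   # good
print(lemmatizer.lemmatize("leaves", pos="n"))   # leaf
print(lemmatizer.lemmatize("dogs", pos="n"))     # dog

Note how the same examples from the list above come out as valid dictionary words, which a stemmer cannot guarantee.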
Discriminative Models
A discriminative model is a type of machine learning model that directly learns the boundary
between different classes (e.g., tags, categories, etc.) rather than modeling the distribution of each
class separately. In other words, it focuses on what makes one class different from another.
How Discriminative Models Work:
Discriminative models learn the relationship between the input (e.g., words in a sentence) and the
output (e.g., part-of-speech tags) directly. They try to predict the boundary between different classes
rather than the overall distribution of the data.
• Focus: Given an input, the model learns to classify it into one of the possible categories
based on the input features.
• Formula: The model learns the conditional probability P(y | x), which represents the probability of the output (label) y given the input x.
Discriminative vs. Generative Models
• Discriminative Models: Learn the boundary between classes.
o Example: Logistic Regression, Support Vector Machines (SVM), Conditional Random
Fields (CRF).
• Generative Models: Learn how the data is generated (both input and output) and model the
entire data distribution.
o Example: Naive Bayes, Hidden Markov Models (HMM).
Key Difference:
• Discriminative models try to predict directly: “What is the most likely output for this input?”
• Generative models try to model the whole data: “How did this input-output pair come to
be?”
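A minimal sketch of a discriminative model in code: a logistic regression classifier built with scikit-learn. The tiny training set below is made up purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up sentences labeled with the sense of "bank" they use.
texts = ["the bank approved my loan", "deposit money at the bank",
         "fish swim near the river bank", "we sat on the grassy bank"]
labels = ["finance", "finance", "river", "river"]

# Logistic regression learns P(y | x) directly from the word features;
# it never models how the sentences themselves are generated.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loan from the bank"]))        # likely ['finance']
print(model.predict_proba(["loan from the bank"]))  # the learned P(y | x)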
Parse tree (for the sentence fragment "the cat chased"):
S
├── NP
│   ├── Det ("the")
│   └── N ("cat")
└── VP
    └── V ("chased")
Most Probable Parse Tree:
o In PCFG parsing, each rule has a probability, the probability of a tree is the product of the probabilities of the rules it uses, and we choose the parse tree with the highest probability.
Shift-Reduce Parser
A Shift-Reduce Parser is a type of bottom-up parser used for syntax analysis in compilers and natural
language processing. It processes the input string from left to right and tries to construct a parse tree
by repeatedly applying shift and reduce operations.
Key Operations
1. Shift:
o Move the next input symbol to the stack.
o This means you are not yet ready to apply any grammar rule, so you move forward in
the input.
2. Reduce:
o Replace a sequence of symbols (from the stack) that matches the right-hand side of
a grammar rule with its left-hand side.
o For example, if the stack contains NP VP, and there is a rule S → NP VP, replace NP VP with S.
3. Accept:
o The parser stops when the stack contains the start symbol (S), and the input is fully
processed.
4. Error:
o If no valid action (shift or reduce) can be performed, the parser reports an error.
Example of a Shift-Reduce Parse
Grammar:
1. S → NP VP
2. NP → Det N
3. VP → V
4. Det → the
5. N → dog
6. V → sees
Input:
the dog sees
Steps:
Step  Action               Stack        Input
1     Shift                [the]        dog sees
2     Reduce (Det → the)   [Det]        dog sees
3     Shift                [Det, dog]   sees
4     Reduce (N → dog)     [Det, N]     sees
5     Reduce (NP → Det N)  [NP]         sees
6     Shift                [NP, sees]   (empty)
7     Reduce (V → sees)    [NP, V]      (empty)
8     Reduce (VP → V)      [NP, VP]     (empty)
9     Reduce (S → NP VP)   [S]          (empty)
10    Accept               [S]          (empty)
How it Works
1. Start with an empty stack and the input string.
2. Decide between shift or reduce based on:
o Whether you can match a grammar rule to reduce symbols in the stack.
o If no rule applies, perform a shift to move the next input symbol to the stack.
3. Keep processing until:
o The stack contains the start symbol S (parse completed).
o Or no valid action can be performed (syntax error).
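A minimal sketch of this loop in Python, using the toy grammar from the example above. It is greedy (always reduce when possible, otherwise shift), which happens to work here but would need lookahead or backtracking for real grammars:

# Grammar: right-hand side (as a tuple) -> left-hand side.
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V",): "VP",
    ("the",): "Det",
    ("dog",): "N",
    ("sees",): "V",
}

def shift_reduce(tokens):
    stack, buffer = [], list(tokens)
    while True:
        # Reduce: match the top of the stack against a rule's RHS.
        for rhs, lhs in RULES.items():
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if buffer:                # Shift: no rule applied
                stack.append(buffer.pop(0))
            elif stack == ["S"]:      # Accept
                return True
            else:                     # Error
                return False

print(shift_reduce("the dog sees".split()))  # True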
Lexical Semantics
Lexical semantics is about understanding the meaning of words and how they relate to each other in
a language. It helps us figure out what individual words mean and how different words connect or
interact to form the meaning of a sentence.
Key Points:
1. Word Meaning:
o Words have meanings, and lexical semantics helps us understand what those
meanings are.
o For example, the word "dog" refers to a type of animal.
2. Word Relationships:
o Words can be related to other words in different ways:
▪ Synonyms (similar meanings): happy ↔ joyful.
▪ Antonyms (opposite meanings): hot ↔ cold.
▪ Homonyms (same word, different meanings): bank (river side) ↔ bank
(financial institution).
3. Ambiguity:
o Some words have multiple meanings, and we need to figure out which meaning is
correct based on the context.
o Example: "I went to the bank" could mean a financial institution or the side of a
river.
4. Context Matters:
o The meaning of a word can change depending on where and how it is used in a
sentence.
o Example: "He is a cool guy" means something different than "It’s cool outside."
Example to Understand It:
Step 1: Input Sentence
• Sentence: "He went to the bank to fish."
The word "bank" can mean two things:
1. A financial institution (place where you keep money).
2. The edge of a river or lake (where you might fish).
We need to figure out which meaning of "bank" is being used in this sentence.
Step 2: Context Clues
• The word "fish" in the sentence suggests that the bank here refers to the edge of a river,
because people fish near rivers.
Step 3: Checking Word Relationships
• Words like "fish" and "river" are related to "bank" when it means the edge of a river. This
helps us decide which meaning of "bank" to choose.
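The classic Lesk algorithm automates exactly this kind of context-overlap check. NLTK ships a simplified version; a rough sketch (assumes the WordNet data is downloaded, and note that simplified Lesk is far from perfect):

from nltk.wsd import lesk

sentence = "He went to the bank to fish".split()
sense = lesk(sentence, "bank", pos="n")   # picks the synset whose definition
print(sense, "-", sense.definition())     # overlaps the context words most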
Corpus Study
Corpus Study refers to the analysis of a large collection of texts (called a corpus) to study language
patterns, structures, or trends. It is widely used in Natural Language Processing (NLP), linguistics,
and data analysis.
What is a Corpus?
A corpus is a structured collection of texts that are used for linguistic research. The texts can be
anything:
• Books
• News articles
• Conversations
• Social media posts
• Scientific papers
Corpus studies help us:
1. Understand how language is used in the real world.
2. Find patterns, such as the most common words or phrases.
3. Analyze trends over time, like how the use of certain words changes.
4. Train AI models for tasks like translation, chatbots, and sentiment analysis.
What Do Corpus Studies Analyze?
1. Word Frequency:
o What are the most common words?
Example: In English, words like "the", "and", and "is" are very frequent.
2. Collocations:
o What words often appear together?
Example: "Fast food" appears together more often than "quick food."
3. Grammatical Patterns:
o How is grammar used?
Example: Finding how often people use passive voice.
4. Semantic Analysis:
o What do words mean in context?
Example: The word "bank" can mean a financial institution or the side of a river.
5. Trends Over Time:
o How has language changed?
Example: New words like "selfie" becoming common in the past decade.
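A small sketch of the first two analyses (word frequency and collocations) with NLTK, over a toy corpus string invented for illustration:

from nltk import FreqDist
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

corpus = ("the dog chased the ball and the dog caught the ball "
          "fast food is popular and fast food is cheap")
tokens = corpus.split()

# 1. Word frequency: the most common words in the corpus.
print(FreqDist(tokens).most_common(3))

# 2. Collocations: word pairs that co-occur more often than chance predicts.
finder = BigramCollocationFinder.from_words(tokens)
print(finder.nbest(BigramAssocMeasures().pmi, 3))  # e.g. ('fast', 'food')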
What Is WordNet?
• WordNet is a big collection of words grouped by their meanings.
• It doesn’t just give definitions; it shows how words are connected:
o Synonyms: Words with similar meanings (happy ↔ joyful).
o Antonyms: Words with opposite meanings (hot ↔ cold).
o Categories: Specific words under broader terms (dog is an animal).
o Parts of something: wheel is a part of a car.
How It Helps
1. Find Word Meanings and Connections:
o Example: The word bank can mean a place for money or the side of a river. WordNet
helps figure out the right meaning based on context.
2. Synonyms for Writing:
o Need another word for big? WordNet suggests large, huge, or gigantic.
3. Understand Relationships:
o It shows how words are linked. Example:
▪ Rose is a type of flower.
▪ Flower is a type of plant.
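A short sketch of querying WordNet through NLTK (assumes the WordNet data is downloaded):

from nltk.corpus import wordnet as wn

# All senses (synsets) of "bank": the money sense, the river sense, etc.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Synonyms: the lemmas that share one sense of "big".
print({lemma.name() for lemma in wn.synset("big.a.01").lemmas()})

# Categories: one step up the hypernym chain from "rose".
print(wn.synset("rose.n.01").hypernyms())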
What is BabelNet?
BabelNet is like a super dictionary that works for many languages. It not only tells you the meanings
of words but also shows how they are connected to other words and concepts across the world. It
combines dictionaries (like WordNet), encyclopedias (like Wikipedia), and other sources to create
one big knowledge network.
Key Features of BabelNet
1. Multilingual:
o Works in many languages (e.g., English, Hindi, Spanish, etc.).
o Example: The word dog in English links to perro in Spanish, kutta in Hindi, and chien in French.
2. Multiple Meanings:
o Helps you understand all possible meanings of a word.
o Example: Apple could mean the fruit or the company.
3. Connections Between Words:
o Shows relationships like:
▪ Synonyms (big ↔ large).
▪ Opposites (hot ↔ cold).
▪ Broader terms (car ↔ vehicle).
4. Encyclopedia Information:
o Includes facts and details about famous people, places, and things.
o Example: Apple Inc. includes info about its founders, products like iPhone, etc.
Homonymy
Homonymy refers to a situation where two or more words have the same spelling or pronunciation,
but they have different meanings. These words are called homonyms.
Types of Homonymy
There are three main types of homonymy:
1. Homophones:
o Words that sound the same but have different meanings and may also have different
spellings.
o Example:
▪ Flour (used for baking) and Flower (a plant).
▪ Both are pronounced the same but mean different things.
2. Homographs:
o Words that are spelled the same but have different meanings. They may or may not
be pronounced the same.
o Example:
▪ Lead (to guide) and Lead (a type of metal). They are spelled the same but
have different meanings. Lead (guide) is pronounced with a long "e" sound,
while Lead (metal) is pronounced with a short "e."
3. True Homonyms:
o Words that sound the same and are spelled the same, but have entirely different
meanings.
o Example:
▪ Bat:
▪ A flying mammal.
▪ A piece of sports equipment used in baseball.
Examples of Homonymy
1. Homophones:
o Right (correct) and Write (to form words).
o Sea (large body of water) and See (to look at something).
2. Homographs:
o Bow:
▪ A curved weapon for shooting arrows (bow).
▪ To bend forward as a gesture of respect (bow).
o Tear:
▪ To rip something (tear).
▪ A drop of water from the eye (tear).
Polysemy
Polysemy refers to a situation where a single word has multiple meanings that are related to each
other. These different meanings are all connected by some core idea or concept, making them
different from homonymy, where words have unrelated meanings.
Key Features of Polysemy
1. One Word, Multiple Related Meanings:
o A polysemous word has different meanings, but all of these meanings are linked by a
common concept.
o Example:
▪ Head:
▪ The top part of the body (e.g., "He hurt his head").
▪ The leader of a group or organization (e.g., "She is the head of the
company").
▪ The front part of something (e.g., "He is at the head of the line").
All these meanings of head are related to the idea of being the top or front of something.
2. Meaning Changes Based on Context:
o The exact meaning of a polysemous word depends on the context in which it's used.
Examples of Polysemy
1. Bank:
o Financial institution: "I need to go to the bank to withdraw money."
o Side of a river: "The fish swam near the riverbank."
o Both meanings are related by the concept of a location or place.
2. Mouth:
o The opening of the body: "He opened his mouth to speak."
o The opening of a river: "The river's mouth is wide."
o Both meanings are related by the idea of an opening.
3. Light:
o Visible illumination: "The room was full of light."
o Not heavy: "This box is light to carry."
o Both meanings are loosely connected by the idea of lightness (little weight or little intensity).
Synonymy
Synonymy refers to the relationship between two or more words that have the same or very similar
meanings. These words can be used interchangeably in many contexts, although sometimes one
word might be preferred over another depending on the situation.
Key Features of Synonymy
1. Same or Similar Meaning:
o Synonyms are words that convey similar meanings, but they might not be exactly
the same in every context.
o Example: Happy and Joyful are synonyms because they both describe a positive
emotional state.
2. Context Matters:
o Even though synonyms have similar meanings, context matters in choosing the right
word.
o Example:
▪ "He is wealthy" vs. "He is rich."
▪ Both words mean having a lot of money, but wealthy sounds a bit more
formal than rich.
3. Subtle Differences:
o Some synonyms might have slightly different connotations or be used in different
contexts.
o Example:
▪ Big and huge both mean large, but huge usually emphasizes an even larger
size than big.
Examples of Synonymy
1. Happy ↔ Joyful
o Both words describe a state of positive emotion.
2. Fast ↔ Quick
o Both describe speed, but quick is often used for actions, and fast can describe
objects (e.g., a fast car or a fast runner).
3. Smart ↔ Intelligent
o Both refer to someone who is mentally capable.
4. House ↔ Home
o Both words refer to a place where you live, but home has a warmer, more personal
connotation than house.
Hyponymy
Hyponymy refers to a relationship between words where one word (the hyponym) represents a
more specific concept that is included under a broader category (the hypernym). In simple terms,
hyponymy shows how one word is a subcategory or kind of another, more general word.
Key Features of Hyponymy
1. Specific vs. General:
o The hyponym is a more specific term, while the hypernym is the general category
under which the hyponym falls.
o Example:
▪ Dog (hyponym) is a type of Animal (hypernym).
▪ Carrot (hyponym) is a type of Vegetable (hypernym).
2. Hierarchical Relationship:
o Hyponymy creates a hierarchy between words, where the hyponym is a subclass or
type of the broader hypernym.
3. Multiple Hyponyms for a Single Hypernym:
o A hypernym can have many hyponyms, each representing a different kind within
that broader category.
o Example:
▪ Fruit (hypernym) can have hyponyms like Apple, Banana, Orange, etc.
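In WordNet terms, hypernyms and hyponyms can be read off directly; a minimal sketch (again assuming the NLTK WordNet data is available):

from nltk.corpus import wordnet as wn

# Hypernyms: broader terms above "dog" in the hierarchy.
print(wn.synset("dog.n.01").hypernyms())

# Hyponyms: more specific kinds under "fruit".
print([h.name() for h in wn.synset("fruit.n.01").hyponyms()[:5]])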
Semantic Ambiguity
Semantic ambiguity occurs when a word or phrase has multiple meanings and its meaning is unclear
without additional context. In other words, a word or sentence can be interpreted in different ways
because it has more than one possible meaning.
Key Features of Semantic Ambiguity
1. Multiple Meanings of a Word:
o A word with semantic ambiguity can have different meanings depending on how it's
used in a sentence.
o Example: The word "bat" can refer to:
▪ A flying mammal.
▪ A piece of sports equipment used in baseball.
2. Context Determines Meaning:
o To resolve semantic ambiguity, we need to look at the context in which the word is
used.
o Example:
▪ "She saw the bat in the sky." (Here, bat refers to a flying mammal.)
▪ "He hit the ball with a bat." (Here, bat refers to the sports equipment.)
3. Different Senses:
o The different meanings of a word in semantic ambiguity are often called senses.
o The same word may have distinct senses based on how it's used in different
contexts.
Examples of Semantic Ambiguity
1. Bank:
o Bank can mean:
▪ A financial institution where you store money.
▪ The side of a river.
o Without context, it's unclear which meaning is intended.
2. Light:
o Light can mean:
▪ Not heavy (e.g., "The box is light").
▪ Visible illumination (e.g., "The room is full of light").
o The meaning depends on the context of the sentence.
Yarowsky Algorithm (Example)
The Yarowsky algorithm is a semi-supervised word sense disambiguation method: it starts from a few labeled seed examples and bootstraps labels for unlabeled sentences.
Input:
• Seed data for "bank":
o Riverbank: "river bank", "beside the water".
o Financial: "bank account", "loan from the bank".
• Unlabeled sentences:
1. "He sat on the bank near the river."
2. "She opened a bank account last week."
3. "The bank is flooded after the heavy rain."
Process:
1. Step 1: Start with the labeled examples and define patterns:
o Riverbank: Words like "river", "water", "flood".
o Financial: Words like "account", "loan", "money".
2. Step 2: Analyze the unlabeled data:
o Sentence 1: Contains "river" → Assign "riverbank" sense.
o Sentence 2: Contains "account" → Assign "financial institution" sense.
o Sentence 3: Contains "flood" → Assign "riverbank" sense.
3. Step 3: Use the newly labeled data to refine the patterns.
4. Step 4: Repeat the process with updated patterns until no significant changes occur.
Output:
• Labeled Data:
o Sentence 1: "He sat on the bank near the river." → Riverbank.
o Sentence 2: "She opened a bank account last week." → Financial.
o Sentence 3: "The bank is flooded after the heavy rain." → Riverbank.
Why is Yarowsky Semi-Supervised?
• It starts with a small set of labeled data (supervised).
• It then uses unlabeled data and self-training to expand the labeled dataset, making it semi-
supervised.
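A heavily simplified sketch of this loop, using the seed cues and sentences from the example above (a real Yarowsky implementation learns a ranked decision list and re-trains it each round):

# Seed cue words for each sense of "bank".
patterns = {"riverbank": {"river", "water", "flood"},
            "financial": {"account", "loan", "money"}}

sentences = ["He sat on the bank near the river.",
             "She opened a bank account last week.",
             "The bank is flooded after the heavy rain."]

def sense_of(sentence):
    words = sentence.lower().replace(".", "").split()
    for sense, cues in patterns.items():
        # Crude matching: a cue may be a prefix ("flood" ~ "flooded").
        if any(w.startswith(c) for w in words for c in cues):
            return sense
    return None

# One labeling pass; the full algorithm would now re-learn the cue
# lists from the newly labeled sentences and repeat until stable.
for s in sentences:
    print(sense_of(s), "->", s)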
Discourse Processing
• Discourse looks at how sentences connect to make a meaningful conversation or text.
• It answers: "How do sentences work together?"
Examples:
1. Coreference:
o Text: "John bought a car. He loves it."
o "He" refers to John, "it" refers to the car.
2. Coherence:
o Good: "I was hungry, so I ate."
o Bad: "I was hungry. The sky is blue."
3. Ellipsis:
o Text: "I’ll go to the market, and she will too."
o Full meaning: "She will go to the market."
Reference Resolution (Simplified)
Reference Resolution is the process of figuring out what a word or phrase refers to in a sentence or
conversation. It is crucial in understanding the meaning of text, especially when pronouns (like "he,"
"she," "it") or other vague words are used.
Types of References
1. Anaphora:
o Refers back to something mentioned earlier.
o Example:
▪ "John ate an apple. He enjoyed it."
▪ "He" = John, "it" = apple.
2. Cataphora:
o Refers to something mentioned later.
o Example:
▪ "When he arrived, John greeted everyone."
▪ "He" = John.
3. Coreference:
o Multiple expressions refer to the same entity.
o Example:
▪ "Mary loves her dog. She plays with it every day."
▪ "Mary" and "She" refer to the same person.
4. Exophora:
o Refers to something outside the text, relying on context.
o Example:
▪ "Look at that!"
▪ "That" depends on what’s being pointed at in the environment.
Summary (why reference resolution is hard):
1. Pronouns (like "he" or "she") might refer to different people.
2. Ambiguous nouns (like "bank") can have multiple meanings.
3. Anaphora (like "this" or "those") can refer to different things in the sentence.
Reference Phenomena
Reference phenomena deal with how words or phrases in a sentence refer to people, objects, or
events in the context of a conversation or text. This helps link ideas and maintain coherence.
Types of Reference Phenomena
1. Anaphora:
o Refers to something mentioned earlier in the text.
o Example:
▪ "Lisa bought a car. She loves it."
▪ "She" = Lisa, "it" = the car.
2. Cataphora:
o Refers to something mentioned later in the text.
o Example:
▪ "Before he arrived, John called to say he was late."
▪ "He" = John (appears later).
3. Coreference:
o When two or more words refer to the same entity.
o Example:
▪ "Mark said Mark’s car broke down. He called a mechanic."
▪ "Mark" and "He" refer to the same person.
4. Exophora:
o Refers to something outside the text, relying on external context.
o Example:
▪ "Pass me that."
▪ "That" refers to an object in the environment.
5. Endophora:
o Refers to something inside the text (includes anaphora and cataphora).
o Example:
▪ "The cake was delicious. Everyone loved it."
▪ "It" = the cake.
6. Ellipsis Resolution:
o Resolves missing information implied by the context.
o Example:
▪ "John likes pizza, and Mary does too."
▪ Full meaning: "Mary likes pizza too."
7. Bridging:
o Refers to something indirectly connected to what’s mentioned.
o Example:
▪ "I saw a car. The wheels were shiny."
▪ "The wheels" are part of the car but not explicitly mentioned earlier.
Key Differences:
• Hobbs Algorithm: Searches for the closest suitable noun phrase (a simple, syntax-based approach).
• Centering Algorithm: Tracks the focus of attention across sentences, using the larger context to decide what the pronoun refers to.
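A toy sketch of the "closest preceding noun" intuition (the real Hobbs algorithm walks parse trees; here the tokens are hand-tagged so the example stays self-contained):

# (word, part-of-speech) pairs for: "John bought a car. He loves it."
tagged = [("John", "NOUN"), ("bought", "VERB"), ("a", "DET"),
          ("car", "NOUN"), (".", "PUNCT"),
          ("He", "PRON"), ("loves", "VERB"), ("it", "PRON")]

def resolve(tokens):
    antecedents = {}
    for i, (word, pos) in enumerate(tokens):
        if pos == "PRON":
            # Scan backwards for the nearest preceding noun.
            for prev_word, prev_pos in reversed(tokens[:i]):
                if prev_pos == "NOUN":
                    antecedents[word] = prev_word
                    break
    return antecedents

print(resolve(tagged))  # {'He': 'car', 'it': 'car'}

Notice the sketch wrongly resolves "He" to "car": closest-noun alone is not enough, which is why real resolvers add gender, number, and animacy agreement checks.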
Challenges
• Ambiguity: Some words have multiple meanings, which can confuse the system.
o Example: "Apple" could mean the fruit or the tech company.
• Relevance: The system needs to understand what is most relevant to your question, not just
similar words.
Text Summarization
Text summarization is the process of creating a shorter version of a text while keeping its important
information. It's like reading a long article and then writing a shorter version that includes the key
points. There are two main types: extractive and abstractive.
1. Extractive Summarization
• What It Is: It picks important sentences or phrases directly from the text and combines them
to make a summary.
• How It Works: The system looks at the text, scores sentences based on importance, and then
selects the best ones to form the summary.
Example:
• Original Text: "The dog ran in the park. It was a sunny day. The dog played with a ball."
• Extractive Summary: "The dog ran in the park. The dog played with a ball."
Advantages:
• Simple and easy to do.
• Often accurate because it uses original sentences.
Disadvantages:
• May not sound smooth or flow well since it just selects sentences.
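A minimal frequency-based extractive summarizer sketch in pure Python (scoring sentences by the frequency of their words is just one simple scoring choice):

from collections import Counter

def extractive_summary(text, n=2):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(text.lower().replace(".", "").split())
    # A sentence's score is the summed frequency of its words.
    def score(sentence):
        return sum(freq[w] for w in sentence.lower().split())
    top = sorted(sentences, key=score, reverse=True)[:n]
    # Keep the chosen sentences in their original order.
    return ". ".join(s for s in sentences if s in top) + "."

text = ("The dog ran in the park. It was a sunny day. "
        "The dog played with a ball.")
print(extractive_summary(text))
# "The dog ran in the park. The dog played with a ball."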
2. Abstractive Summarization
• What It Is: It creates new sentences that explain the main ideas of the text, often using
simpler words or rephrasing.
• How It Works: The system understands the meaning of the text and then generates a short
version using its own words.
Example:
• Original Text: "The dog ran in the park. It was a sunny day. The dog played with a ball."
• Abstractive Summary: "The dog had fun playing outside."
Advantages:
• The summary sounds more natural and easier to read.
• It can shorten the text more effectively by rephrasing.
Disadvantages:
• It's harder to do and may sometimes make mistakes in meaning.
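In practice, abstractive summarization is usually done with a pretrained sequence-to-sequence model. A hedged sketch with the Hugging Face transformers library (the first call downloads a default model, and the exact output depends on that model):

from transformers import pipeline

summarizer = pipeline("summarization")  # loads a default model

text = ("The dog ran in the park. It was a sunny day. "
        "The dog played with a ball.")
print(summarizer(text, max_length=20, min_length=5)[0]["summary_text"])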
Q Explain the ambiguities associated with each level of Natural Language Processing, with examples.
1. Lexical Ambiguity (Word Level)
A word can have different meanings depending on the context.
• Example:
o "Bank" can mean a financial institution ("I went to the bank.") or the side of a river
("We sat by the river bank.").
2. Syntactic Ambiguity (Sentence Level)
A sentence can be unclear because its structure allows multiple meanings.
• Example:
o "I saw the man with the telescope."
▪ Does it mean I saw a man who had a telescope?
▪ Or does it mean I used a telescope to see the man?
3. Semantic Ambiguity (Meaning Level)
A sentence can be unclear because a word can mean different things in that context.
• Example:
o "He gave her a ring."
▪ Did he give her a piece of jewelry (a ring)?
▪ Or did he call her on the phone (ring)?
4. Pragmatic Ambiguity (Context Level)
The meaning of a sentence can depend on the situation it's used in.
• Example:
o "Can you pass the salt?"
▪ Does it mean "Are you able to pass the salt?"
▪ Or is it a polite request for someone to pass the salt?
5. Discourse Ambiguity (Conversation Level)
A sentence can be unclear because of what came before or after it in a conversation.
• Example:
o "John went to the store. He bought some bread."
▪ Who is "he"? Is it John, or someone else?
6. Quantifier Ambiguity (Amount or Number)
Words like "all," "some," or "many" can be unclear about what they are referring to.
• Example:
o "Some students passed the exam."
▪ Does it mean only some students passed (but not all), or merely that at least a few did?
7. Anaphoric Ambiguity (Pronoun Reference)
A pronoun like "he," "she," or "it" can be unclear about who or what it refers to.
• Example:
o "Maria gave Sara her book."
▪ Whose book is it? Maria’s or Sara’s?
8. Temporal Ambiguity (Time Reference)
A sentence can be unclear about when something happens.
• Example:
o "She will meet him tomorrow."
▪ Does it mean she will meet him tomorrow or she already planned it but is
confirming the time?
Q Explain FSA for nouns and verbs. Also design a Finite State Automaton (FSA) for the words of English numbers 1-99.
Finite State Automaton (FSA) for Nouns and Verbs in Simple Terms
A Finite State Automaton (FSA) is like a simple machine that reads a word piece by piece and follows specific rules (transitions between states) to decide whether the word belongs to a certain category, like nouns or verbs.
• FSA for nouns: start at a noun stem (e.g., "dog") and optionally follow a "-s" transition for the plural, ending in an accepting state ("dog", "dogs").
• FSA for verbs: start at a verb stem (e.g., "walk") and optionally follow a "-s", "-ed", or "-ing" transition, ending in an accepting state ("walk", "walks", "walked", "walking").
For the numbers 1-99, the FSA needs three word classes: units ("one"-"nine"), teens ("ten"-"nineteen"), and tens ("twenty"-"ninety"). From the start state, a unit or teen word goes straight to an accepting state; a tens word goes to an accepting state that may optionally be followed by one unit word ("twenty one").
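A minimal sketch of the 1-99 FSA as a Python transition function (state names and word lists as described above):

# States: "start" -> "done" (units/teens) or "tens" (twenty..ninety);
# "tens" -> "done" (an optional trailing unit, e.g. "twenty one").
UNITS = {"one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine"}
TEENS = {"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"}
TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"}

ACCEPTING = {"tens", "done"}  # "twenty" alone and "twenty one" both accept

def accepts(words):
    state = "start"
    for w in words:
        if state == "start" and w in UNITS | TEENS:
            state = "done"
        elif state == "start" and w in TENS:
            state = "tens"
        elif state == "tens" and w in UNITS:
            state = "done"
        else:
            return False  # no valid transition -> reject
    return state in ACCEPTING

print(accepts("seven".split()))       # True  (7)
print(accepts("twenty one".split()))  # True  (21)
print(accepts("ten one".split()))     # False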
Q What are five types of referring expressions? Explain with the help of
example
In Natural Language Processing (NLP), referring expressions are used to refer to entities, people,
objects, or concepts within a conversation or text. These expressions can vary depending on the
context and role they play in the sentence. The five main types of referring expressions are:
1. Pronouns
Pronouns are words that stand in for a noun or noun phrase. They are commonly used to avoid
repetition and make sentences more concise.
• Example:
o "John went to the store. He bought some groceries."
Here, "He" is a pronoun referring to John.
2. Proper Nouns
Proper nouns refer to specific names of people, places, or things. They are usually capitalized and do
not require additional context to identify the entity.
• Example:
o "Alice is going to the party."
Here, "Alice" is a proper noun referring to a specific person.
3. Definite Descriptions
A definite description refers to a particular entity known to both the speaker and listener, often using
"the" or "this."
• Example:
o "I saw the man who lives next door."
Here, "the man" refers to a specific man who is known to both the speaker and
listener, likely because of prior context.
4. Indefinite Descriptions
Indefinite descriptions refer to non-specific entities, often using words like "a" or "an." These
expressions introduce new information to the conversation.
• Example:
o "I saw a cat in the garden."
Here, "a cat" refers to any cat, not a specific one, introducing a new entity to the
conversation.
5. Demonstratives
Demonstratives are words that point to specific things, often based on proximity or context.
Common demonstratives include "this," "that," "these," and "those."
• Example:
o "Can you pass me this pen?"
Here, "this" points to a specific pen near the speaker.
Summary:
• Open class words are the "meaning" words like nouns, verbs, and adjectives. They can grow
and change.
• Closed class words are the "grammar" words like pronouns, prepositions, and conjunctions.
They don't change much or get new words added.