NLP-PT 1
Q) N-Grams
Unigram
A unigram is a single item (typically a single word or token) taken from a text.
For example, in the sentence "The cat sleeps," the unigrams are "The," "cat,"
and "sleeps."
Unigrams are useful for understanding individual words' frequencies and their
standalone occurrences.
Bigram
A bigram is a sequence of two adjacent items (words or tokens) from a text.
For instance, in "The cat sleeps," the bigrams are "The cat" and "cat sleeps."
Bigrams are useful for modelling and predicting sequences based on the
immediately preceding item.
Trigram
A trigram is a sequence of three adjacent items. For example, in "The cat
sleeps on the mat," the trigrams include "The cat sleeps" and "cat sleeps on."
Trigrams help in capturing more context and can improve predictive models
by incorporating a broader range of surrounding information.
n-gram
In general, an n-gram is a contiguous sequence of n items (words, characters,
or tokens) from a text; unigrams, bigrams, and trigrams are the cases n = 1, 2,
and 3. For example, a 4-gram in the text "The cat sleeps on the mat" would be
"The cat sleeps on."
Applications of N-grams
Language Modelling
Text Classification
Machine Translation
Speech Recognition
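As a rough illustration, the sketch below extracts unigrams, bigrams, and trigrams in Python (the whitespace tokenization and the example sentence are simplifying assumptions):

def ngrams(tokens, n):
    """Return all contiguous n-grams from a list of tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "The cat sleeps on the mat".split()  # naive whitespace tokenization
print(ngrams(tokens, 1))  # unigrams: [('The',), ('cat',), ('sleeps',), ...]
print(ngrams(tokens, 2))  # bigrams:  [('The', 'cat'), ('cat', 'sleeps'), ...]
print(ngrams(tokens, 3))  # trigrams: [('The', 'cat', 'sleeps'), ...]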
Q) Levels in NLP
Phonology
Phonology concerns how words are related to the sounds that realize them.
Phonemes are the basic units of sound that differentiate words in a language.
For example, the phoneme /k/ appears in both "skit" and "kit."
Morphology
Morphology deals with how words are constructed from basic meaning units
called morphemes. A morpheme is the smallest unit of meaning in a
language. For instance, the word "unhappiness" can be broken down into
three morphemes: the prefix "un-" (meaning "not"), the stem "happy," and the
suffix "-ness" (indicating a state of being).
Morphological analysis involves breaking words down into their constituent
parts, such as roots, prefixes, and suffixes, to analyze their meaning and
grammatical properties.
FSA:
Finite State Automata (FSA), also known as Finite State Machines (FSM),
play a crucial role in various Natural Language Processing (NLP) tasks,
particularly in morphological analysis, syntax parsing, and text processing.
1. Morphological Analysis:
For instance, an FSA can represent the different ways a verb can be
conjugated or how prefixes and suffixes can be attached to a root word.
This allows the system to recognize and generate correct word forms by
transitioning through states that represent valid morphological constructions.
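A minimal Python sketch of this idea is shown below; the toy root lexicon, suffix set, and the way the two transitions are simulated are illustrative assumptions rather than a full finite-state library:

# Simulate a two-transition automaton: start --root--> stem state --suffix--> accept.
ROOTS = {"walk", "talk", "jump"}      # assumed toy lexicon
SUFFIXES = {"", "s", "ed", "ing"}     # assumed inflectional suffixes

def accepts(word):
    """Return True if the word can be read as root + optional suffix."""
    for root in ROOTS:                # transition out of the start state on a valid root
        if word.startswith(root) and word[len(root):] in SUFFIXES:
            return True               # suffix transition reaches the accepting state
    return False

print(accepts("walked"))   # True
print(accepts("walkly"))   # False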
2. Lexical Analysis:
FSAs are used to tokenize input text into words, recognize patterns, or
identify parts of speech.
For example, an FSA can be used to match regular expressions in text, such
as identifying all instances of a particular pattern (e.g., email addresses or
phone numbers) within a body of text.
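For instance, a short sketch using Python's re module (the sample text and the simplified e-mail pattern are assumptions; real-world e-mail matching is more involved):

import re

text = "Contact alice@example.com or bob@test.org for details."
pattern = r"[\w.+-]+@[\w-]+\.[\w.]+"      # simplified e-mail pattern
print(re.findall(pattern, text))          # ['alice@example.com', 'bob@test.org']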
3. Spell Checking:
FSAs can also be used in spell checkers to recognize valid word forms and
suggest corrections for misspelled words by transitioning through states that
represent valid word sequences.
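The sketch below illustrates this recognize-then-suggest idea with a toy word list and single-substitution candidates; it is a simplification rather than a true finite-state implementation:

import string

VALID_WORDS = {"cat", "cats", "hop", "hope", "mat"}   # assumed toy dictionary

def substitutions(word):
    """Yield every string obtained by replacing one letter of the word."""
    for i in range(len(word)):
        for c in string.ascii_lowercase:
            yield word[:i] + c + word[i + 1:]

def suggest(word):
    if word in VALID_WORDS:                 # already a valid word form
        return [word]
    return sorted(set(substitutions(word)) & VALID_WORDS)

print(suggest("cet"))   # ['cat']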
Q) Porter Stemmer
Vowels: The letters "a", "e", "i", "o", "u" are always considered vowels. The
letter "y" is also treated as a vowel when it is preceded by a consonant (e.g.,
"cry," where "y" acts as a vowel).
The Porter Stemmer works in several steps, each applying specific rules to
modify the word. Here's an outline:
Remove "s" to "" if the preceding part contains a vowel (e.g., "cats" -> "cat").
If a word ends with "eed," and the part before "eed" contains a vowel, replace
"eed" with "ee" (e.g., "agreed" -> "agree").
If a word ends with "ed" or "ing" and the preceding part contains a vowel,
remove "ed" or "ing" (e.g., "hopping" -> "hop").
After removing "ed" or "ing," if the word ends with "at," "bl," or "iz," add "e"
(e.g., "hopping" -> "hope").
If the word ends with a double consonant (except "l", "s", "z"), remove the last
consonant (e.g., "hopping" -> "hop").
Step 1c: Y to I
If a word ends with "y" and the preceding part contains a vowel, replace "y"
with "i" (e.g., "happy" -> "happi").
Step 2: Suffix Reduction
This step applies various rules to reduce longer suffixes to simpler forms (e.g.,
"-ational" -> "-ate," as in "relational" -> "relate").
Step 3: Final Suffix Removal
Further suffixes are stripped here; whether a rule applies depends on the
structure of the remaining stem, in particular on the vowel-consonant
sequences it contains.
Remove an "e" if the preceding part contains more than one consonant (e.g.,
"alike" -> "alik").
Do not remove "e" if the word ends with "le" (e.g., "single" -> "singl").
Step 5: Clean Up
If the word ends in a single "e" and the preceding part contains more than
one vowel, remove the "e" (e.g., "agree" -> "agre").
Example
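For instance, a minimal sketch using NLTK's PorterStemmer (assuming the nltk package is installed; NLTK applies a few extensions, so outputs can differ slightly from the textbook description):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["cats", "running", "hopping", "hoping", "happiness", "happy"]:
    print(word, "->", stemmer.stem(word))
# cats -> cat, running -> run, hopping -> hop,
# hoping -> hope, happiness -> happi, happy -> happi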
Q) POS Tagging
Part-of-Speech (POS) tagging is the process of assigning a grammatical
category (noun, verb, adjective, etc.) to each word in a sentence.
POS tagging is essential for various NLP tasks, including syntactic parsing,
sentiment analysis, and machine translation, as it provides crucial
grammatical information.
Rule-Based Tagging:
Rule-based taggers assign tags using hand-written rules and a lexicon of
possible tags for each word.
Statistical and Machine Learning-Based Tagging:
These models learn patterns from large annotated datasets and can
generalize better to unseen data compared to rule-based systems.
Challenges in POS Tagging
Ambiguity:
Words can belong to multiple parts of speech depending on the context. For
example, "book" can be a noun ("I read a book") or a verb ("I will book a
ticket"). Disambiguating the correct POS tag based on context is one of the
main challenges.
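The sketch below shows this with NLTK's default tagger (assuming nltk plus its tokenizer and tagger resources are installed; the sentences are the ones above):

import nltk  # needs the punkt tokenizer and averaged-perceptron tagger resources

for sentence in ["I read a book", "I will book a ticket"]:
    print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# "book" is typically tagged as a noun (NN) in the first sentence
# and as a verb (VB) in the second, based on the surrounding context.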
Out-of-Vocabulary (OOV) Words:
Words that are not present in the training data or lexicon, such as newly
coined terms, slang, or proper nouns, can be difficult to tag accurately. These
OOV words require the POS tagger to rely heavily on context or make
educated guesses, which may not always be accurate.
Morphologically Rich Languages:
Languages with rich morphology, where a single word can have many
inflected forms, pose a significant challenge for POS tagging. For example, in
languages like Finnish or Turkish, a word's form can change drastically based
on tense, case, number, or other grammatical features, making it difficult to
tag correctly.
Idioms and Phrasal Verbs:
Idioms and phrasal verbs (e.g., "give up," "look forward to") can complicate
POS tagging because the meaning of the entire phrase differs from the sum
of its parts. Identifying and correctly tagging these phrases requires
understanding beyond individual word tags.
Errors in Text:
Spelling errors, typos, and informal language (e.g., in social media text) can
make POS tagging more difficult, as these errors may result in
misinterpretation of words or sentences.
Q) Affixes
In Natural Language Processing (NLP), affixes are morphemes that are attached
to a word stem to modify its meaning or grammatical function. Affixes play a
crucial role in morphological analysis, which is the process of studying the
structure of words and how they are formed.
1. Prefixes
Prefixes are affixes added to the beginning of a root word to change its
meaning.
They often alter the word’s semantic value or grammatical category. For
example, in the word "unhappy," the prefix "un-" is added to the root word
"happy" to create its antonym, meaning "not happy."
2. Suffixes
Suffixes are affixes attached to the end of a root word to modify its meaning
or grammatical role.
They can indicate tense, number, or part of speech. For instance, in the word
"running," the suffix "-ing" changes the verb "run" into its present participle
form.
Another example is "happiness," where the suffix "-ness" turns the adjective
"happy" into a noun representing the state of being happy.
3. Infixes
Infixes are affixes inserted within a root word rather than at the beginning or
end.
They are less common in English but play a significant role in some
languages.
For example, in the Tagalog language, the infix "-um-" can be inserted into
the root word "sulat" (write) to form "sumulat" (wrote). Infixes can alter the
meaning of the root word by changing its grammatical function.
4. Circumfixes
Circumfixes are affixes that surround a root word, with one part attached at
the beginning and the other at the end.
For example, in the German language, the circumfix "ge-...-t" is used in the
past participle form of verbs, as in "gespielt" (played), where "ge-" is the
prefix and "-t" is the suffix added to the root "spiel" (play).
Q) Open Class and Closed Class Words
Open class words, also known as content words, are categories of words that
can freely accept new members and frequently change over time. These
words typically carry significant meaning and contribute most of the content
in a sentence. They include:
Nouns: Represent people, places, things, or concepts (e.g., "computer,"
"city," "happiness").
Verbs: Express actions, events, or states (e.g., "run," "jumps," "think").
Adjectives: Describe or qualify nouns (e.g., "quick," "lazy," "clever").
Adverbs: Modify verbs, adjectives, or other adverbs (e.g., "quickly," "very").
Examples:
In the sentence "The quick brown fox jumps over the lazy dog," "fox,"
"jumps," "quick," and "lazy" are all open class words because they provide
core meaning and can be replaced or expanded with new words (e.g.,
"clever" instead of "quick," or "dog" instead of "fox").
Closed class words, also known as function words, belong to categories that
are generally fixed and do not readily accept new members. These words
primarily serve grammatical functions and help structure sentences rather
than providing substantial content. They include:
Pronouns: Stand in for nouns (e.g., "she," "it," "they").
Determiners/Articles: Specify or limit nouns (e.g., "the," "a," "this").
Prepositions: Express relations of place, time, or direction (e.g., "to," "in,"
"over").
Conjunctions: Link words, phrases, or clauses (e.g., "because," "and," "but").
Auxiliary Verbs: Help form various tenses or aspects of verbs (e.g., "is,"
"have," "will").
Examples:
In the sentence "She went to the store because she needed groceries,"
"she," "to," "the," "because," and "needed" are closed class words. They
primarily serve grammatical roles and do not change frequently or expand
with new forms.
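A small sketch that separates the two classes by Penn Treebank tag prefixes (assuming nltk and its tagger resources are installed; mapping tag prefixes to open vs. closed class is a simplification):

import nltk  # needs the punkt tokenizer and averaged-perceptron tagger resources

OPEN_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")   # nouns, verbs, adjectives, adverbs

tagged = nltk.pos_tag(nltk.word_tokenize("The quick brown fox jumps over the lazy dog"))
open_class = [word for word, tag in tagged if tag.startswith(OPEN_TAG_PREFIXES)]
closed_class = [word for word, tag in tagged if not tag.startswith(OPEN_TAG_PREFIXES)]
print("open:", open_class)      # e.g., ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
print("closed:", closed_class)  # e.g., ['The', 'over', 'the']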
Inflectional Morphology
Inflectional morphology modifies a word to express grammatical features
such as tense, number, or case. The affix is typically added to the end of the
word (a suffix), and the word's syntactic category remains the same (e.g.,
"cat" -> "cats," "walk" -> "walked").
Derivational Morphology
Derivational morphology creates new words from existing ones, often
changing the word's syntactic category or core meaning (e.g., "happy" ->
"happiness," "compute" -> "computation").