NLP Final Answer
1. Explain the challenges of Natural Language Processing.
• Contextual words and phrases and homonyms: The same words and phrases can have diverse meanings according to the context of a sentence, and many words have the exact same pronunciation but completely different meanings. For example:
- I ran to the store because we ran out of milk.
- Can I run something past you really quick?
- The house is looking really run down.
In the above three sentences the meaning of "run" differs according to the context. Homonyms are words whose pronunciation is the same but whose meanings differ.
• Synonyms: Synonyms can cause issues for contextual understanding, since we use many different words to express the same idea. Some of these words convey exactly the same meaning, while others may differ in levels of complexity, and different people use synonyms to denote slightly different meanings within their personal vocabulary.
• Ambiguity: Ambiguity in NLP refers to sentences and phrases that potentially have two or more possible interpretations. There is lexical, syntactic and semantic ambiguity.
• Errors in text or speech: Misspelled or misused words can create problems for text analysis. Autocorrect and grammar-correction applications can handle common mistakes, but they do not always understand the writer's intention. With spoken language it is difficult for a machine to handle mispronunciations, different accents, stammers, etc.
• Low-resource languages: AI and machine-learning NLP applications have mostly been built for the most common, widely used languages, and translation systems for those languages have become remarkably precise. However, many languages, especially those spoken by people with less access to technology, often go overlooked and under-processed.

2. Explain various stages of Natural Language Processing.
• Lexical Analysis: Lexical analysis is the first stage in NLP. It is also known as morphological analysis. At this stage the structure of the words is identified and analysed. The lexicon of a language is the collection of words and phrases in that language. Lexical analysis divides the whole portion of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing): It involves analysing the words in a sentence for grammar and ordering them in a way that shows the relationships among the words. A sentence such as "the school goes to girl" is rejected by an English syntactic analyser.
• Semantic Analysis: It draws the exact meaning, or the dictionary meaning, from the text. The text is checked for meaningfulness. This is done by mapping syntactic structures onto objects in the task domain. The semantic analyser rejects a sentence such as "hot ice-cream".
• Discourse Integration: The meaning of any sentence depends upon the meaning of the sentence just before it. Furthermore, it also contributes to the meaning of the immediately following sentence. For example, in "Meena is a girl. She goes to school", "she" is a dependency pointing to Meena.
• Pragmatic Analysis: During this stage, what was said is re-interpreted as what was truly meant. It involves deriving those aspects of language which require real-world knowledge. For example, in "John saw Mary in a garden with a cat", we cannot say whether John is with the cat or Mary is with the cat.
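To make the first two stages concrete, here is a minimal Python sketch using NLTK; it assumes the nltk package is installed and that the punkt and averaged_perceptron_tagger data resources are available (names may vary slightly across NLTK versions). The sentence is the Meena example from above.

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "Meena is a girl. She goes to school."

# Lexical analysis: divide the text into sentences and words.
for sentence in nltk.sent_tokenize(text):
    tokens = nltk.word_tokenize(sentence)
    # A first step toward syntactic analysis: tag each token's part of speech.
    print(nltk.pos_tag(tokens))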
3. Explain the ambiguities associated at each level with example for Natural Language Processing.
Natural language has a very rich form and structure, and it is very ambiguous. Ambiguity means not having a single well-defined interpretation: any sentence in a language with a large-enough grammar can have another interpretation. The various forms of ambiguity related to natural language are:
• Lexical Ambiguity: When a word has multiple senses, this is known as lexical ambiguity. For example, the word "back" can be a noun or an adjective:
- Noun: back stage
- Adjective: back door
• Syntactic Ambiguity: A sentence can be parsed in multiple syntactic forms. For example: "I saw the girl on the beach with my binoculars." In this sentence confusion in meaning is created, because the phrase "with my binoculars" could modify the verb "saw" or the noun "girl".
• Semantic Ambiguity: It is related to sentence interpretation. For example, "I saw the girl on the beach with my binoculars" can mean that I saw the girl through my binoculars, or that the girl had my binoculars with her.
• Metonymy Ambiguity: It is the most difficult ambiguity. It deals with phrases in which the literal meaning differs from the figurative assertion. For example, "Nokia is screaming for new management" does not mean that the company is literally screaming.

4. Explain Natural Language Understanding and Natural Language Generation.
• Natural Language Understanding (NLU): In this part of the process, the speech input gets transformed into useful representations in order to analyse various aspects of the language. As natural language is very rich in forms and structures, it is also very ambiguous. There can be different forms of ambiguity. First, there is lexical ambiguity, which is a very basic, i.e. word-level, ambiguity: for example, "document" can be a noun or a verb. Secondly, there can be syntactic ambiguity, which is about parsing the sentence, as in "Madam said on Monday she would give an exam". Thirdly, there can be referential ambiguity: in "Meera went to Geeta. She said, 'I am hungry'", who is hungry is not well resolved from the sentence. In many cases one sentence can have many meanings, and conversely many sentences can mean the same. Hence NLU is a complicated process.
• Natural Language Generation (NLG): In order to generate the output text, the intermediate representation must be converted back to the natural language format. This process therefore involves multiple sub-processes:
a) Text Planning: It includes extracting relevant content from the knowledge base.
b) Sentence Planning: This involves selecting correct words, forming meaningful sentences following the language's grammar, and setting the tone for the same.
c) Text Realization: This is the process of mapping the planned sentence into its final structure.
(Also asked as: Explain generic NLP system & ambiguities of NLP. (W-22) (net))

5. Discuss in detail any application considering any Indian regional language of your choice.
NLTK (Natural Language Toolkit): The NLTK Python framework is typically employed as a tool for teaching and conducting research. In apps intended for production it is rarely utilized. However, because of how simple it is to use, fascinating applications may be created with it.
• Features: Tokenization, Part-of-Speech tagging (POS), Named Entity Recognition (NER), Classification, Sentiment analysis, Packages for chatbots.
• Use-cases: Recommendation systems, Sentiment analysis, Building chatbots.
• Advantages: The most renowned and complete NLP library; it supports many languages.
• Disadvantages: It is difficult to use and learn, it ignores the context of the word, it is slow, and it has no neural network models.
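An illustrative sketch of the NLTK features listed above (tokenization, POS tagging and NER); the sentence is a made-up example, and the data package names below are assumptions that may vary by NLTK version.

import nltk

for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

sentence = "Meera met Geeta in Mumbai on Monday."

tokens = nltk.word_tokenize(sentence)   # tokenization
tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
print(tagged)
print(nltk.ne_chunk(tagged))            # NER: entities such as PERSON and GPE appear as subtrees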
6. Define affixes. Explain the types of affixes.
Affixes are small word particles, usually only a few letters, added to a root word to change its meaning or grammatical properties. Most affixes are one or two syllables, and some, like -s and -es, are just sounds. Often, affixes modify a word's definition. For instance, adding the affix re- before "read" creates "reread", which means "read again". They can also be used in grammar, such as adding -ed at the end of a verb to create the simple past tense, or adding an -s to the end of a noun to make it plural. In morphology, affixes are a type of morpheme, a part of a word with its own meaning. For example, the word "disappearance" has three morphemes: the root word "appear" and the two affixes dis- and -ance.
• Prefixes: These are affixes that come at the beginning of a word, before the root word. Sometimes they are added to a word to change its meaning, as in "legal" and "illegal". Other times they combine with other affixes to create new words, such as adding the prefix bio- to the affix -ology to create "biology".
• Suffixes: These are affixes that come at the end of a word, after the root word. Unlike prefixes, which mostly change a word's meaning, suffixes are mainly used for grammatical purposes: verb conjugation (work -> worked), plurality (fox -> foxes), possession (Juliana -> Juliana's), reflexive pronouns (them -> themselves), and comparatives and superlatives (fast -> faster, fastest).
• Infixes: These are a special type of affix that comes in the middle of a word. However, the English language doesn't use infixes. Infixes are more common in other languages, including Greek, Austronesian languages like Tagalog, and Indigenous American languages like Choctaw.
• Circumfixes: These are pairs of prefixes and suffixes always used together. Circumfixes in English are very rare, but the circumfix of en- and -en is seen in the common word "enlighten", and the circumfix of em- and -en is seen in "embolden".

7. Represent output of morphological analysis for regular verb, irregular verb, singular noun, plural noun. Also explain the role of FST in morphological parsing with an example.
The lexicon can be summarised by word class (the irregular-plural column shows the lexical-to-surface letter pairs):

reg-noun    irreg-pl-noun       irreg-sg-noun
fox         g o:e o:e s e       goose
cat         sheep               sheep
dog         m o:i u:ε s:c e     mouse

For example, "fox" stands for f:f o:o x:x. When these two transducers are composed, we have an FST which maps lexical forms to intermediate forms of words for simple English noun inflections. The next thing to handle is to design the FSTs for the orthographic rules, and to combine all these transducers. We use these properties of FSTs in the creation of the FST for a morphological processor.
• Morphological Parsing with FST: The objective of morphological parsing is to produce output lexicons for a single input lexicon. Let's consider parsing just the productive nominal plural (-s) and the verbal progressive (-ing). Our aim is to take input forms like those in the first column below and produce output forms like those in the second column:

cats   -> cat +N +PL
cat    -> cat +N +SG
goose  -> goose +N +SG or goose +V
geese  -> goose +N +PL

The second column contains the stem of the corresponding word (lexicon) in the first column, along with its morphological features, which specify additional information about the stem: +N means the word is a noun, +SG means it is singular, +PL means it is plural, +V marks a verb, and pres-part marks a present participle. There can be more than one lexical-level representation for a given word. We achieve this through two-level morphology, which represents a word as a correspondence between the lexical level - a simple concatenation of morphemes, as shown in the second column of the table - and the surface level, the actual spelling of the word.
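As a toy illustration of the parsing in this question, here is a hand-written lexicon of irregulars plus one productive -s rule standing in for the composed transducers; this is a simplified dictionary sketch, not a real two-level FST.

# A toy morphological parser for the forms above: irregular plural lookup
# plus a single productive rule for the nominal plural -s.

IRREGULARS = {
    "geese": "goose +N +PL",
    "mice":  "mouse +N +PL",
    "sheep": "sheep +N +SG or sheep +N +PL",
}

KNOWN_STEMS = {"cat", "dog", "fox", "goose", "mouse", "sheep"}

def parse(surface: str) -> str:
    if surface in IRREGULARS:                      # irregular plural lookup
        return IRREGULARS[surface]
    if surface in KNOWN_STEMS:                     # bare stem: singular noun
        return f"{surface} +N +SG"
    if surface.endswith("es") and surface[:-2] in KNOWN_STEMS:
        return f"{surface[:-2]} +N +PL"            # foxes -> fox +N +PL
    if surface.endswith("s") and surface[:-1] in KNOWN_STEMS:
        return f"{surface[:-1]} +N +PL"            # cats -> cat +N +PL
    return f"{surface} +?"                         # unknown word

for w in ["cats", "cat", "goose", "geese", "foxes"]:
    print(w, "->", parse(w))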
8. Explain the role of FSA in morphological analysis.
An automaton having a finite number of states is called a Finite Automaton (FA) or Finite State Automaton (FSA). Finite automata are used to recognize patterns. An FSA takes a string of symbols as input and changes its state accordingly. When the required symbol is found, the transition happens; on a transition the automaton can either move to the succeeding state or stay in the current state. There are two outcomes in an FA: accept or reject. When the input string is processed successfully and the automaton reaches a final state, the string is accepted. Mathematically, an automaton can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
- Q is a finite set of states.
- Σ is a finite set of symbols, called the alphabet of the automaton.
- δ is the transition function.
- q0 is the initial state from where any input is processed (q0 ∈ Q).
- F is the set of final states (F ⊆ Q).
• Deterministic Finite Automaton (DFA): It may be defined as the type of finite automaton wherein, for every input symbol, we can determine the state to which the machine will move. It has a finite number of states, which is why the machine is called a Deterministic Finite Automaton (DFA).
• Non-deterministic Finite Automaton (NDFA): It is defined as the type of finite automaton where, for each input symbol, we cannot determine the state to which the machine will move; the machine can move to any combination of states. That is, for each state there can be more than one transition on a given symbol, each leading to a different state. It also has a finite number of states, hence the name Non-deterministic Finite Automaton.

9. Explain Porter Stemmer algorithm with rules.
While the conventional approach to morphological parsing involves building a transducer from a lexicon and rules, there are simpler algorithms that don't require the extensive online lexicon that such a transducer needs. These are used particularly in information retrieval (IR) tasks like web search, where a query like (marsupial OR kangaroo OR koala) retrieves pages that include these terms. Some IR systems first apply a stemmer to the query and document terms, because a document containing the word "marsupials" might not otherwise match the keyword "marsupial". Thus, suffixes are discarded in IR, and morphological information is needed solely to establish that two words share the same stem. The Porter stemmer is a simple and efficient stemming algorithm and one of the most often used. Some information retrieval applications do not perform full morphological processing; they only need the stem of the word. The Porter stemmer is a cascade of rewrite rules, where the output of one stage is the input to the next. Sample rules:
- ATIONAL -> ATE (relational -> relate)
- ING -> ε (motoring -> motor)
- SSES -> SS (grasses -> grass)
Stemming algorithms are efficient, but they may introduce errors because they do not use a lexicon; the Porter stemming algorithm is a lexicon-free FST. Some errors of commission are: ORGANIZATION -> ORGAN, DOING -> DOE, GENERALIZATION -> GENERIC, NUMERICAL -> NUMEROUS, POLICY -> POLICE.
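The cascade idea can be sketched in a few lines and compared against NLTK's full PorterStemmer (assuming nltk is installed); the toy rules below are only the three sample rules listed above, not the full algorithm.

# A tiny cascade: each rule is tried in order, and the first match rewrites
# the suffix. Compared against NLTK's complete Porter stemmer.
from nltk.stem import PorterStemmer

def toy_stem(word: str) -> str:
    for suffix, replacement in [("ational", "ate"), ("sses", "ss"), ("ing", "")]:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word

stemmer = PorterStemmer()
for w in ["relational", "motoring", "grasses", "organization"]:
    print(w, "->", toy_stem(w), "|", stemmer.stem(w))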
10. Explain regular expression in NLP.
Regular expressions were introduced in 1956 by Kleene and were originally studied as part of the theory of computation. A regular expression is an algebraic formula whose value is a pattern describing a set of strings, known as the language of the expression. Regular expressions are also called regexes. They are a pattern-matching standard for string parsing and replacement, and a powerful way to find and replace strings that follow a defined format. For example, regular expressions are used to parse email addresses, URLs, dates, log files, configuration files, command lines, programming scripts or switches. Regular expressions are a useful tool for designing language compilers, and in natural language processing they are used for tokenization, describing lexicons, morphological analysis, etc. Many of us have used a simple form of regular expression when searching for file patterns in MS-DOS, for example dir *.txt.
• Brackets: Characters are grouped by putting them between square brackets; any character in the class will match one character in the input. For example:
- /[abcd]/ will match any of a, b, c, and d.
- /[0123456789]/ specifies any single digit.
• Range: Listing every character can lead to cumbersome notation, e.g. /[abcdefghijklmnopqrstuvwxyz]/ for any lowercase letter. In such cases a dash is used to specify a range: /[a-z]/.
• Caret ^: The caret is used at the beginning of a character class to specify what a single character cannot be. For example:
- /[^x]/ matches any single character except x.
- /[^A-Z]/ -> not an upper-case letter.
- /[^Tt]/ -> neither 'T' nor 't'.
- /[^\.]/ -> not a period.
- /[p^]/ -> either 'p' or '^' (the caret only negates when it comes first in the class).
- /x^y/ -> the literal pattern 'x^y'.
• ?, * and +: The question mark ? matches zero or one occurrence of the preceding character, * (the Kleene star) matches zero or more, and + matches one or more. For example:
- /woodchucks?/ -> woodchuck or woodchucks.
- /colou?r/ -> color or colour.

11. Explain how N-gram model is used in spelling correction.
Spelling correction consists of detecting and correcting errors. Error detection is the process of finding the misspelled word; error correction is the process of suggesting correct words for a misspelled word. Spelling errors are mainly phonetic, where the misspelled word is pronounced the same way as the correct word. Spelling errors belong to two categories: non-word errors and real-word errors. When an error results in a word that does not appear in a given lexicon, or is not a valid orthographic word form, it is known as a non-word error. A real-word error results in an actual word of the language; it occurs because of typographical mistakes or spelling errors. The n-gram model can be used for both non-word and real-word error detection, because in the English alphabet certain bigrams or trigrams of letters never occur or rarely do so - for example, the trigram 'qst' and the bigram 'qd'. This information can be used to handle non-word errors. The n-gram technique generally requires a large corpus or dictionary as training data, so that an n-gram table of possible combinations of letters can be compiled. The n-gram model uses the chain rule of probability as follows:
P(s) = P(w1 w2 w3 ... wn) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 w2 ... wn-1)
• Example: "The Arabian Knights are the fairy tales of the east"
P(The|<s>) x P(Arabian|The) x P(Knights|Arabian) x P(are|Knights) x P(the|are) x P(fairy|the) x P(tales|fairy) x P(of|tales) x P(the|of) x P(east|the)
= 0.67 x 0.5 x 1.0 x 1.0 x 0.5 x 0.2 x 1.0 x 1.0 x 1.0 x 0.2
= 0.0067
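The bracket, range, caret and ?-operator notations above map directly onto Python's re module; a short demonstration with made-up input strings:

import re

print(re.findall(r"[abcd]", "a big cat"))         # ['a', 'b', 'c', 'a']
print(re.findall(r"[0-9]", "room 42"))            # ['4', '2']
print(re.findall(r"[^A-Z]", "NLP rocks"))         # every non-uppercase character
print(re.findall(r"woodchucks?", "a woodchuck and two woodchucks"))
print(re.findall(r"colou?r", "color or colour"))  # ['color', 'colour']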
12. Explain perplexity of any language model.
Intuitively, perplexity measures how surprised a model is when it sees new data: the lower the perplexity, the better the training. Perplexity is calculated as the exponent of the loss obtained from the model; the formula is the exponential of the negative mean log-likelihood of all the words in an input sequence:
PP(W) = exp( -(1/N) Σ log P(wi | w1 ... wi-1) )
For example, an example model might assign the phrase "beautiful scenery" a perplexity of 9.97. If the first of two sentences is one of the sequences on which the model was trained, its perplexity will be much lower than that of a second, unseen sentence: a GPT-2 model, for instance, is more perplexed by a sentence it has not seen before. Perplexity is usually used only to determine how well a model has learned the training set; other metrics like BLEU, ROUGE, etc. are used on the test set to measure test performance.

13. Describe open class words and closed class words in English with examples.
Parts of speech can be divided into two broad super-categories: closed class types and open class types. Closed classes are those that have relatively fixed membership. For example, prepositions are a closed class because there is a fixed set of them in English; new prepositions are rarely coined. By contrast, nouns and verbs are open classes because new nouns and verbs are continually coined or borrowed from other languages. Closed class words are generally also function words - grammatical words like "of", "it", "and", or "you", which tend to be very short, occur frequently, and play an important role in grammar. The closed classes differ more from language to language than the open classes do. Some of the more important closed classes in English, all of them function words, are prepositions, pronouns, determiners, conjunctions, numerals, auxiliary verbs and particles (prepositions or adverbs in phrasal verbs).

14. Explain Maximum Entropy Model for POS Tagging.
While an HMM can achieve very high accuracy, it requires a number of architectural innovations to deal with unknown words, backoff, suffixes, and so on. It would be much easier if we could add arbitrary features directly into the model in a clean way, but that's hard for generative models like HMMs. Luckily, a logistic regression model can do this. But logistic regression isn't a sequence model; it assigns a class to a single observation. However, we can turn logistic regression into a discriminative sequence model simply by running it on successive words, using the class assigned to the prior word as a feature in the classification of the next word. When we apply logistic regression in this way, it's called the maximum entropy Markov model, or MEMM. Let the sequence of words be W = w1 ... wn and the sequence of tags T = t1 ... tn. In an HMM, to compute the best tag sequence that maximizes P(T|W) we rely on Bayes' rule and the likelihood P(W|T):
T^ = argmax_T P(T|W) = argmax_T P(W|T) P(T) = argmax_T Π P(word_i | tag_i) Π P(tag_i | tag_{i-1})
In an MEMM, by contrast, we compute the posterior P(T|W) directly, training it to discriminate among the possible tag sequences:
T^ = argmax_T P(T|W) = argmax_T Π P(t_i | w_i, t_{i-1})
Consider tagging just one word: a multinomial logistic regression classifier computes the single probability P(t_i | w_i, t_{i-1}) in a different way than an HMM. The intuition of the difference lies in the direction of the arrows in the graphical model: HMMs compute the likelihood (observation word conditioned on tags), while MEMMs compute the posterior (tags conditioned on observation words). In a schematic view of the HMM (top) and MEMM (bottom) computations of the probability of the correct tag sequence for a sentence containing the ambiguous word "back", the HMM computes the likelihood of the observation given the hidden state, while the MEMM computes the posterior of each state, conditioned on the previous state and the current observation.
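A minimal numeric sketch of the perplexity formula; the per-word probabilities below are made-up values for illustration only.

# Perplexity as the exponential of the negative mean log-probability.
import math

word_probs = [0.2, 0.1, 0.05, 0.3]  # P(wi | w1..wi-1) for each word in a sequence

log_likelihood = sum(math.log(p) for p in word_probs)
perplexity = math.exp(-log_likelihood / len(word_probs))
print(round(perplexity, 2))  # about 7.6; lower is better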
15. What is POS tagging?
Words are grouped into classes called parts of speech (POS), word classes, morphological classes, or lexical tags; these classes give information about a word and its neighbours. The Greeks and traditional grammars had 8 basic parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, adjective, and article. Modern models have much larger tag sets: 45 for the Penn Treebank, 87 for the Brown corpus, and 146 for the C7 tag set. Tag sets, for example, distinguish between possessive pronouns (my, your, his, her, its) and personal pronouns (I, you, he, me). Knowledge about how words occur in sentences - for example, that possessive pronouns are likely to be followed by a noun and personal pronouns by a verb - can be useful in a language model for speech recognition. Part of speech also tells us how a word is pronounced: CONtent and conTENT are a noun and an adjective respectively, so speech synthesis and speech recognition systems benefit from knowing the part of speech, as with OBject (noun) and obJECT (verb).

16. Explain rule-based tagging.
One of the oldest techniques of tagging is rule-based POS tagging. Rule-based taggers use a dictionary or lexicon for getting the possible tags for each word. If a word has more than one possible tag, rule-based taggers use hand-written rules to identify the correct one. Disambiguation can also be performed by analyzing the linguistic features of a word along with its preceding and following words. For example, if the preceding word is an article, then the word must be a noun. As the name suggests, all such information in rule-based POS tagging is coded in the form of rules. These rules may be either context-pattern rules or regular expressions compiled into finite-state automata, intersected with a lexically ambiguous sentence representation. We can also understand rule-based POS tagging through its two-stage architecture:
- First stage: it uses a dictionary to assign each word a list of potential parts of speech.
- Second stage: it uses large lists of hand-written disambiguation rules to sort the list down to a single part of speech for each word.
• Properties of Rule-Based POS Tagging:
- These taggers are knowledge-driven taggers.
- The rules are built manually.
- The information is coded in the form of rules.
- There is a limited number of rules, approximately around 1000.
- Smoothing and language modeling are defined explicitly in rule-based taggers.
So, we can define Part-Of-Speech (POS) tagging as the process of assigning a tag to a word in a corpus.

17. Explain with suitable example following relationships between word meanings: Homonymy, Polysemy, Synonymy, Antonymy.
• Hyponymy and Hypernymy: Hyponymy and hypernymy refer to a relationship between a general term and the more specific terms that fall under the category of the general term. For example, the colors red, green, blue and yellow are hyponyms; they fall under the general term color, which is the hypernym.
• Synonymy: Synonymy refers to words that are pronounced and spelled differently but carry the same meaning. Example: happy, joyful, glad.
• Antonymy: It refers to words that are related by having opposite meanings. There are three types of antonyms: graded antonyms, complementary antonyms, and relational antonyms. Example: dead/alive, long/short.
• Homonymy: It refers to the relationship between words that are spelled or pronounced the same way but hold different meanings. Example: bank (of a river) / bank (financial institution).
• Polysemy: It refers to a word having two or more related meanings. Example: bright (shining), bright (intelligent).
• Meronymy: It is a logical arrangement of text and words that denotes a part of or member of something. Example: a segment of an apple.
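The relations in question 17 can be explored with NLTK's WordNet interface; a brief sketch, assuming the wordnet corpus has been downloaded (the specific synsets printed depend on the WordNet version).

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

color = wn.synsets("color")[0]
print([s.name() for s in color.hyponyms()][:5])  # hyponyms of "color", e.g. specific colors
print([s.name() for s in color.hypernyms()])     # its hypernym(s)

happy = wn.synsets("happy", pos=wn.ADJ)[0]
print([l.name() for l in happy.lemmas()])        # synonyms sharing the synset
print(happy.lemmas()[0].antonyms())              # antonymy: happy -> unhappy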
18. What is Word Sense Disambiguation (WSD)? Explain the dictionary-based approach to WSD.
Word-Sense Disambiguation (WSD) is a well-known problem in NLP. WSD identifies which sense of a word is meant in a sentence when the word has multiple meanings. When a single word has multiple meanings, it is difficult for a machine to identify the correct one; to solve this challenging issue we can use rule-based systems or machine learning techniques. WSD is a natural classification problem: given a word and its possible senses, as defined by a dictionary, classify an occurrence of the word in context into one or more of its sense classes. The features of the context, such as neighbouring words, provide the evidence for classification. A famous example is determining the sense of "pen" in the following passage: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
• Dictionary and knowledge-based methods (Lesk's algorithm): The Lesk method is a dictionary-based method. It is based on the hypothesis that words used together in a text are related to one another, and that the relation can be observed in the definitions of the words and their senses. Two or more words are disambiguated by finding the pair of dictionary senses with the greatest word overlap in their dictionary definitions. For example, when disambiguating the words in "pine cone", the definitions of the appropriate senses both include the words "evergreen" and "tree" in at least one dictionary. An alternative to using the definitions is to consider general word-sense relatedness and to compute the semantic similarity of each pair of word senses based on a given lexical knowledge base such as WordNet. Graph-based methods, like the spreading-activation research of the early days of AI, have been applied with some success. The use of selectional preferences or selectional restrictions is also useful: for example, knowing that one typically cooks food, one can disambiguate the word "bass" in "I am cooking bass", i.e. it is not the musical instrument.

19. Explain Yarowsky bootstrapping approach of semi-supervised learning.
The bootstrapping approach starts from a small amount of seed data for each word: either manually tagged training examples or a small number of sure-fire decision rules (e.g., "play" in the context of "bass" nearly always indicates the musical instrument). The seeds are used to train an initial classifier, using any supervised method. This classifier is then used on the untagged portion of the corpus to extract a larger training set, in which only the most confident classifications are included. The process repeats, each new classifier being trained on a successively larger training corpus, until the entire corpus is consumed or until a given maximum number of iterations is reached. Other semi-supervised techniques use large quantities of untagged corpora to supply co-occurrence information that supplements the tagged corpora. These techniques have the potential to help adapt supervised models to different domains. Also, an ambiguous word in one language is often translated into different words in a second language depending on the sense of the word; word-aligned bilingual corpora have been used to infer cross-lingual sense distinctions, a sort of semi-supervised system.
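NLTK ships an implementation of the Lesk overlap method described above; a short usage sketch, assuming the wordnet and punkt resources are available (the sense returned depends on the dictionary glosses).

import nltk
nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)
from nltk.wsd import lesk

context = nltk.word_tokenize("I am cooking bass for dinner tonight")
sense = lesk(context, "bass")  # picks the sense whose gloss overlaps most with the context
print(sense, "-", sense.definition() if sense else "no sense found")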
20. Explain Discourse reference resolution in detail.
Processing natural language by computers is the most difficult problem of artificial intelligence. Language consists of collocated, structured and coherent groups of sentences rather than isolated and unrelated sentences, much like scenes in a movie. These coherent groups of sentences are referred to as discourse.
• Concept of Coherence: A sequence of sentences is a "text" when there is some kind of dependence between the sentences. The task of textual analysis is to identify the features that cause this dependence. These features have been classified in terms of COHESION and COHERENCE. Cohesion refers to linguistic features which link sentences together and are generally easy to identify.
• Discourse Structure: Human discourse often exhibits structures that are intended to indicate common experiences and respond to them. For example, research abstracts are intended to inform readers who are in the same community as the authors and engaged in similar work.
• Discourse Segmentation: Documents are automatically partitioned into fragments, also known as passages, which are distinct discourse segments.
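A naive sketch of the reference resolution named in the question, resolving a pronoun to the most recently mentioned name; real resolvers also use gender/number agreement, syntactic salience and discourse structure, so this recency heuristic is only illustrative.

import re

def resolve_pronouns(text: str) -> dict:
    mentions = []        # names seen so far, in order of appearance
    resolutions = {}
    for token in re.findall(r"[A-Za-z]+", text):
        if token.istitle() and token not in ("She", "He", "They", "I"):
            mentions.append(token)
        elif token in ("She", "He", "she", "he") and mentions:
            resolutions[token] = mentions[-1]  # antecedent = most recent name
    return resolutions

print(resolve_pronouns("Meena is a girl. She goes to school."))  # {'She': 'Meena'}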
Inflectional Morphology vs. Derivational Morphology
- Definition: Inflectional morphology is a morphological process that adapts existing words so that they function effectively in sentences, without changing the POS of the base morpheme. Derivational morphology is concerned with the way morphemes are attached to existing lexical forms as affixes.
- Regularity: Inflection is more regular; derivation is much less regular.
- Use: Inflectional affixes can only be suffixes or infixes, not prefixes; derivational affixes can be either prefixes or suffixes.
- Change in part of speech: Inflection never changes the grammatical category (POS); derivation can change it.
- Example: cat + -s = cats (inflection); danger (noun) + -ous = dangerous (adjective) (derivation).

Top-Down Parsing vs. Bottom-Up Parsing
- Strategy: Top-down parsing first looks at the highest level of the parse tree and works down the tree by using the rules of the grammar. Bottom-up parsing first looks at the lowest level of the parse tree and works up the tree by using the rules of the grammar.
- Goal: Top-down parsing attempts to find the leftmost derivation for an input string. Bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
- Direction: In top-down parsing we start from the top (the start symbol) and parse down to the leaf nodes; in bottom-up parsing we start from the bottom (the leaf nodes) and parse up to the start symbol.
- Derivation: Top-down parsing uses the leftmost derivation; bottom-up parsing traces the rightmost derivation (in reverse).
- Main decision: In top-down parsing, the main decision is which production rule to use in order to construct the string; in bottom-up parsing, it is when to use a production rule to reduce the string towards the start symbol.
- Example: Recursive Descent parser (top-down); Shift-Reduce parser (bottom-up).
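Both strategies can be tried with NLTK's demonstration parsers on a tiny hypothetical grammar; a sketch (the grammar and sentence are made up for illustration).

import nltk

grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> 'Meena' | 'school'
  VP -> V NP
  V -> 'likes'
""")

sentence = "Meena likes school".split()

# Top-down: recursive descent expands rules from S toward the words.
for tree in nltk.RecursiveDescentParser(grammar).parse(sentence):
    print(tree)

# Bottom-up: shift-reduce combines the words upward toward S.
for tree in nltk.ShiftReduceParser(grammar).parse(sentence):
    print(tree)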
