NLP UNIT 4

https://www.youtube.com/watch?v=X92-Chomhw8

https://www.youtube.com/watch?v=hGuXUIefVkc

https://www.youtube.com/watch?v=wSONlMwa9rE
NLP PARSING BASICS
• Parsing in NLP is the process of determining the syntactic structure of a text by analysing its constituent words according to an underlying grammar of the language.
• Consider an example grammar in which each line is a rule to be applied to the example sentence “Tom ate an apple” (a toy version is sketched in code below).
• The outcome of the parsing process is a parse tree in which sentence is the root; intermediate nodes such as noun_phrase and verb_phrase have children, and hence are called non-terminals; and the leaves of the tree, ‘Tom’, ‘ate’, ‘an’, ‘apple’, are called terminals.
• https://devopedia.org/natural-language-parsing
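A minimal sketch of this idea (the slides do not name a toolkit, so NLTK is assumed here): a toy grammar for “Tom ate an apple” and a chart parser that prints the resulting tree, with sentence as the root, noun_phrase/verb_phrase as non-terminals, and the words as terminals.

# A toy context-free grammar for "Tom ate an apple", encoded with NLTK (assumed available).
import nltk

grammar = nltk.CFG.fromstring("""
    sentence -> noun_phrase verb_phrase
    noun_phrase -> 'Tom' | det noun
    verb_phrase -> verb noun_phrase
    det -> 'an'
    noun -> 'apple'
    verb -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Tom ate an apple".split()):
    tree.pretty_print()   # sentence is the root; noun_phrase, verb_phrase are non-terminals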
Statistical Parsing
• Statistical parsing is a group of parsing methods within natural language processing. The
methods have in common that they associate grammar rules with a probability.
Grammar rules are traditionally viewed in computational linguistics as defining the valid
sentences in a language.
• Within this mindset, the idea of associating each rule with a probability then provides
the relative frequency of any given grammar rule and, by deduction, the probability of a
complete parse for a sentence. (The probability associated with a grammar rule may be
induced, but the application of that grammar rule within a parse tree and the
computation of the probability of the parse tree based on its component rules is a form
of deduction.)
• Using this concept, statistical parsers search over the space of all candidate parses, computing each candidate's probability, in order to derive the most probable parse of a sentence. The Viterbi algorithm is one popular method of searching for the most probable parse.
• "Search" in this context is an application of search algorithms in artificial intelligence.
• As an example, think about the sentence "The can can hold water".
• A reader would instantly see that there is an object called "the can" and that this object
is performing the action 'can' (i.e. is able to); and the thing the object is able to do is
"hold"; and the thing the object is able to hold is "water".
• Using more linguistic terminology, "The can" is a noun phrase composed of a
determiner followed by a noun, and "can hold water" is a verb phrase which is itself
composed of a verb followed by a verb phrase.
• But is this the only interpretation of the sentence? Certainly "The can-can" is a perfectly
valid noun-phrase referring to a type of dance, and "hold water" is also a valid verb-
phrase, although the coerced meaning of the combined sentence is non-obvious.
• This lack of meaning is not seen as a problem by most linguists, but from a pragmatic point of view it is desirable to obtain the first interpretation rather than the second, and statistical parsers achieve this by ranking the interpretations based on their probability.
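A minimal sketch of this ranking (NLTK and the toy rule probabilities are assumptions for illustration, not from the slides): a small probabilistic grammar that admits both readings of “The can can hold water”, with the Viterbi parser returning the more probable, modal reading.

# Toy probabilistic grammar for "The can can hold water"; probabilities are made up.
import nltk

grammar = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.55] | Det N N [0.05] | 'water' [0.4]
    VP -> Modal VP [0.4] | V NP [0.6]
    Det -> 'The' [1.0]
    N -> 'can' [1.0]
    Modal -> 'can' [1.0]
    V -> 'hold' [1.0]
""")

tokens = "The can can hold water".split()

# The Viterbi algorithm searches the space of candidate parses for the most probable one.
viterbi = nltk.ViterbiParser(grammar)
for tree in viterbi.parse(tokens):
    print(tree, tree.prob())   # prints the "modal" reading, not the "can-can dance" reading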
PCFG
• A Probabilistic Context-Free Grammar (PCFG) is simply a Context-Free Grammar with probabilities assigned
to the rules such that the sum of all probabilities for all rules expanding the same non-terminal is equal to
one.
• A PCFG is a simple extension of a CFG in which every production rule is associated with a probability (Booth
and Thompson 1973).
• Formally, a PCFG is a quintuple G = (Σ, N, S, R, D), where Σ is a finite set of terminal symbols, N is a finite set of nonterminal symbols, S ∈ N is the start symbol, R is a finite set of production rules of the form A → α, where A ∈ N and α is a string of terminals and nonterminals, and D : R → [0, 1] is a function that assigns a probability to each member of R.
• A PCFG can be used to estimate a number of useful probabilities concerning a sentence and its parse tree(s),
including the probability of a particular parse tree (useful in disambiguation) and the probability of a
sentence or a piece of a sentence (useful in language modeling).
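For instance, the probability of a parse tree is the product of the probabilities D(r) of the rules used to build it, and the probabilities of the rules expanding any one non-terminal must sum to 1. A minimal sketch with made-up numbers:

# Probability of a parse tree = product of the probabilities of the rules it uses.
# The rule set and probabilities below are toy values, assumed for illustration.
from math import prod

D = {
    "S -> NP VP": 1.0,
    "NP -> Det N": 0.55, "NP -> Det N N": 0.05, "NP -> 'water'": 0.4,   # sums to 1.0
    "VP -> Modal VP": 0.4, "VP -> V NP": 0.6,                           # sums to 1.0
    "Det -> 'The'": 1.0, "N -> 'can'": 1.0, "Modal -> 'can'": 1.0, "V -> 'hold'": 1.0,
}

# Rules used in the "modal" parse of "The can can hold water"
rules_in_tree = ["S -> NP VP", "NP -> Det N", "Det -> 'The'", "N -> 'can'",
                 "VP -> Modal VP", "Modal -> 'can'", "VP -> V NP",
                 "V -> 'hold'", "NP -> 'water'"]

p_tree = prod(D[r] for r in rules_in_tree)
print(p_tree)   # 1.0 * 0.55 * 0.4 * 0.6 * 0.4 * ... = 0.0528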
Generative Models
• People all around the world speak many different languages, but a computer system or any other computerized machine only understands a single language: binary (1s and 0s).
• Natural Language Processing (NLP) is the process of converting human language into a form a computer can work with. Although many diverse models have been suggested so far, the need for a generative, predictive model that can be optimised for the nature of the problem being addressed is still an area of ongoing research.
• A generative model offers a single platform for diverse areas of NLP, addressing specific problems such as reading text, hearing speech, interpreting it, measuring sentiment, and determining which parts are important.
• This is achieved by a process of elimination once the relevant components are identified; the single platform provides the same model for generating and reproducing optimised solutions to different problems.
• Generative models are a class of statistical models that can generate new data instances. These models are used in unsupervised machine learning to perform tasks such as:
– Probability and likelihood estimation,
– Modeling data points,
– Describing phenomena in the data,
– Distinguishing between classes based on these probabilities.
• Because these models capture the joint probability (and often use Bayes' theorem to obtain posteriors from it), generative models can tackle more complex tasks than analogous discriminative models.
• Generative models thus focus on the distribution of the individual classes in a dataset, and their learning algorithms model the underlying patterns or distribution of the data points. They use the joint probability of an input feature vector (x) and the desired output or label (y) occurring together.
• These models use probability estimates and likelihood to model data points and
differentiate between different class labels present in a dataset. Unlike discriminative
models, these models are also capable of generating new data points.
Mathematical things involved in Generative Models
• Training a generative classifier involves estimating a function f : X -> Y, or the probability P(Y|X):
• Assume some functional form for the probabilities P(Y) and P(X|Y)
• Estimate the parameters of P(X|Y) and P(Y) from the training data
• Use Bayes' theorem to calculate the posterior probability P(Y|X)
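A minimal sketch of these three steps (a hand-written Bernoulli Naive Bayes is assumed as the generative model; the data and numbers are made up for illustration):

# Minimal sketch of a generative classifier (Bernoulli Naive Bayes, written out by hand)
# following the three steps above.
import numpy as np

X = np.array([[1, 0, 1],          # binary feature vectors (toy data)
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])        # class labels

classes = np.unique(y)

# Steps 1-2: assume a functional form and estimate P(Y) and P(X|Y) from the training data.
prior = np.array([np.mean(y == c) for c in classes])               # P(Y = c)
likelihood = np.array([X[y == c].mean(axis=0) for c in classes])   # P(x_j = 1 | Y = c)
likelihood = np.clip(likelihood, 1e-6, 1 - 1e-6)                   # avoid zero probabilities

def posterior(x):
    # Step 3: Bayes' theorem, P(Y = c | x) proportional to P(Y = c) * prod_j P(x_j | Y = c).
    joint = prior * np.prod(likelihood ** x * (1 - likelihood) ** (1 - x), axis=1)
    return joint / joint.sum()

print(posterior(np.array([1, 0, 0])))   # posterior distribution over the two classes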
Some Examples of Generative Models
• ‌Naïve Bayes
• Bayesian networks
• Markov random fields
• ‌Hidden Markov Models (HMMs)
• Latent Dirichlet Allocation (LDA)
• Generative Adversarial Networks (GANs)
• Autoregressive Model
Discriminative Models

• Discriminative models are a class of models used in statistical classification, mainly for supervised machine learning. They are also known as conditional models, since they learn the boundaries between classes or labels in a dataset.
• Discriminative models (true to their name) separate classes rather than modeling how the data was generated, and they make few assumptions about the distribution of the data points. However, they are not capable of generating new data points. The ultimate objective of a discriminative model is therefore to separate one class from another.
• If there are outliers present in the dataset, discriminative models work better than generative models, i.e. discriminative models are more robust to outliers.
Mathematical things involved in Discriminative Models
• Training a discriminative classifier involves estimating a function f : X -> Y, or the probability P(Y|X)
• Assume some functional form for the probability P(Y|X)
• Estimate the parameters of P(Y|X) from the training data
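A minimal sketch of these steps (logistic regression is assumed as the discriminative model; the data is made up for illustration):

# Minimal sketch of a discriminative classifier (logistic regression, written out by hand):
# assume P(Y=1|X) = sigmoid(w.x + b) and estimate w, b from the training data by
# gradient ascent on the conditional log-likelihood.
import numpy as np

X = np.array([[0.5, 1.2], [1.0, 0.8], [-0.7, -1.0], [-1.2, -0.3]])
y = np.array([1, 1, 0, 0])

w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # assumed functional form for P(Y=1|X)
    # gradient of sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]
    w += lr * (X.T @ (y - p))
    b += lr * np.sum(y - p)

x_new = np.array([0.8, 0.9])
print(1.0 / (1.0 + np.exp(-(x_new @ w + b))))     # estimated P(Y = 1 | x_new)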
Some Examples of Discriminative Models
• ‌Logistic regression
• Support Vector Machines (SVMs)
• ‌Traditional neural networks
• ‌Nearest neighbor
• Conditional Random Fields (CRFs)
• Decision Trees and Random Forest
Difference between Generative and Discriminative Models
• A father has two kids, Kid A and Kid B. Kid A has a special character: he can learn everything in depth. Kid B has a special character: he can only learn the differences between the things he has seen.
• One fine day, the father takes his two kids (Kid A and Kid B) to a zoo. This zoo is very small and has only two kinds of animals, a lion and an elephant. After they come out of the zoo, the father shows them an animal and asks both of them, “Is this animal a lion or an elephant?”
• Kid A quickly draws pictures of a lion and an elephant on a piece of paper based on what he saw inside the zoo. He compares both images with the animal standing before him and, based on the closest match between image and animal, answers: “The animal is a lion.”
• Kid B knows only the differences; based on the distinguishing properties he has learned, he answers: “The animal is a lion.”
• Both of them identify the kind of animal, but the way they learn and the way they arrive at the answer are entirely different. In machine learning, we generally call Kid A a generative model and Kid B a discriminative model.
• In general, a discriminative model models the decision boundary between the classes, while a generative model explicitly models the actual distribution of each class. In the end, both predict the conditional probability P(Animal | Features), but the two models learn different probabilities along the way.
• A generative model learns the joint probability distribution p(x, y) and predicts the conditional probability with the help of Bayes' theorem. A discriminative model learns the conditional probability distribution p(y|x) directly. Both kinds of model are generally used in supervised learning problems.
Difference between Discriminative and
Generative Models
• Discriminative models draw boundaries in the data space, while generative models try to model how data is
placed throughout the space.
• A generative model focuses on explaining how the data was generated, while a discriminative model focuses
on predicting the labels of the data.
• In mathematical terms, a discriminative model is trained by learning parameters that maximize the conditional probability P(Y|X), while a generative model is trained by learning parameters that maximize the joint probability P(X, Y).
• Discriminative models work with existing data, i.e. discriminative modeling identifies tags and sorts data and can be used to classify it, while generative modeling produces new data.
• Because the two families use different approaches to machine learning, each is suited to particular tasks: generative models are useful for unsupervised learning tasks, while discriminative models are useful for supervised learning tasks.
• Discriminative models are computationally cheaper than generative models.
• Generative models need less data to train than discriminative models, since generative models are more biased: they make stronger assumptions (e.g. conditional independence).
• In general, generative models can cope with missing data in the dataset, whereas discriminative models cannot.
• Local discriminative models generally take the form of conditional history-
based models, where the derivation of a candidate analysis y is modeled as
a sequence of decisions with each decision conditioned on relevant parts of
the derivation history.
• In a local discriminative model, the score of an analysis y, given the
sentence x, factors into the scores of different decisions in the derivation of
y. In a global discriminative model, by contrast, no such factorization is
assumed, and component scores can all be defined on the entire analysis y.
• This has the advantage that the model may incorporate features that
capture global properties of the analysis, without being restricted to a
particular history-based derivation of the analysis (whether generative or
discriminative).
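A minimal sketch of this distinction (all names and signatures here are illustrative assumptions, not from any particular parser): a local model whose score factors into per-decision scores conditioned on the history, versus a global model whose features can inspect the entire analysis.

# Local vs global discriminative scoring, in the abstract.

def local_score(x, decisions, decision_score):
    # Local (history-based) model: the score of an analysis factors into per-decision
    # scores, each conditioned on the sentence x and the derivation history so far.
    total, history = 0.0, []
    for d in decisions:
        total += decision_score(x, history, d)
        history.append(d)
    return total

def global_score(x, y, weights, global_features):
    # Global model: no factorization is assumed; features may look at the entire analysis y.
    return sum(weights.get(name, 0.0) * value
               for name, value in global_features(x, y).items())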
END OF UNIT 4