NLP Unit 4
https://www.youtube.com/watch?v=X92-Chomhw8
https://www.youtube.com/watch?v=hGuXUIefVkc
https://www.youtube.com/watch?v=wSONlMwa9rE
NLP PARSING BASICS
• Parsing in NLP is the process of determining the syntactic structure of a text by analysing
its constituent words based on an underlying grammar (of the language).
• Consider an example grammar in which each line indicates a rule of the grammar to be
applied to the example sentence "Tom ate an apple"; a runnable sketch of such a grammar
appears after this list.
• The outcome of the parsing process is a parse tree in which sentence is the root;
intermediate nodes such as noun_phrase, verb_phrase etc. have children and are hence
called non-terminals; and the leaves of the tree, 'Tom', 'ate', 'an', 'apple', are called
terminals.
• https://devopedia.org/natural-language-parsing
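Below is a minimal sketch of this idea using NLTK (this assumes the nltk package is
available; the rule set is an invented toy grammar for the sentence above, not one taken
from the linked page):

import nltk

# Toy grammar: quoted symbols are terminals, unquoted symbols are non-terminals.
grammar = nltk.CFG.fromstring("""
    sentence    -> noun_phrase verb_phrase
    noun_phrase -> 'Tom' | determiner noun
    verb_phrase -> verb noun_phrase
    determiner  -> 'an'
    noun        -> 'apple'
    verb        -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Tom ate an apple".split()):
    tree.pretty_print()   # root is sentence; leaves 'Tom', 'ate', 'an', 'apple' are terminals

Running the sketch prints a single parse tree whose internal nodes (noun_phrase,
verb_phrase, etc.) are the non-terminals described above.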
Statistical Parsing
• Statistical parsing is a group of parsing methods within natural language processing. The
methods have in common that they associate grammar rules with a probability.
Grammar rules are traditionally viewed in computational linguistics as defining the valid
sentences in a language.
• Within this mindset, the idea of associating each rule with a probability then provides
the relative frequency of any given grammar rule and, by deduction, the probability of a
complete parse for a sentence. (The probability associated with a grammar rule may be
induced, but the application of that grammar rule within a parse tree and the
computation of the probability of the parse tree based on its component rules is a form
of deduction.)
• Using this concept, statistical parsers search over the space of all candidate parses,
computing each candidate's probability, in order to derive the most probable parse of a
sentence. The Viterbi algorithm is one popular method of searching for the most probable
parse.
• "Search" in this context is an application of search algorithms in artificial intelligence.
• As an example, think about the sentence "The can can hold water".
• A reader would instantly see that there is an object called "the can" and that this object
is performing the action 'can' (i.e. is able to); and the thing the object is able to do is
"hold"; and the thing the object is able to hold is "water".
• Using more linguistic terminology, "The can" is a noun phrase composed of a
determiner followed by a noun, and "can hold water" is a verb phrase which is itself
composed of a verb followed by a verb phrase.
• But is this the only interpretation of the sentence? Certainly "The can-can" is a perfectly
valid noun-phrase referring to a type of dance, and "hold water" is also a valid verb-
phrase, although the coerced meaning of the combined sentence is non-obvious.
• This lack of meaning is not seen as a problem by most linguists, but from a pragmatic
point of view it is desirable to obtain the first interpretation rather than the second;
statistical parsers achieve this by ranking the interpretations based on their probability.
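As a hedged illustration of such ranking, here is a small sketch using NLTK's probabilistic
chart parser (the grammar and its probabilities are invented for this example; in practice
they would be estimated from a treebank):

import nltk
from nltk.parse.pchart import InsideChartParser

# Toy PCFG: the probabilities of rules expanding the same non-terminal sum to one.
grammar = nltk.PCFG.fromstring("""
    S     -> NP VP        [1.0]
    NP    -> Det N        [0.5]
    NP    -> Det N N      [0.2]
    NP    -> N            [0.3]
    VP    -> Modal VP     [0.3]
    VP    -> V NP         [0.7]
    Det   -> 'The'        [1.0]
    N     -> 'can'        [0.6]
    N     -> 'water'      [0.4]
    Modal -> 'can'        [1.0]
    V     -> 'hold'       [1.0]
""")

parser = InsideChartParser(grammar)
parses = sorted(parser.parse("The can can hold water".split()),
                key=lambda t: t.prob(), reverse=True)
for tree in parses:
    print(tree.prob(), tree)

With these invented numbers, the "[The can] [can [hold water]]" reading scores higher than
the "[The can can] [hold water]" reading, so ranking by probability selects the first
interpretation. An alternative, nltk.ViterbiParser, returns only the single most probable
parse.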
PCFG
• A Probabilistic Context-Free Grammar (PCFG) is simply a Context-Free Grammar with probabilities assigned
to the rules such that the sum of all probabilities for all rules expanding the same non-terminal is equal to
one.
• A PCFG is a simple extension of a CFG in which every production rule is associated with a probability (Booth
and Thompson 1973).
• Formally, a PCFG is a quintuple G = (Σ, N, S, R, D), where Σ is a finite set of terminal symbols, N is a
finite set of nonterminal symbols, S ∈ N is the start symbol, R is a finite set of production rules of the form A
→ α, where A ∈ N and α ∈ (Σ ∪ N)*, and D : R → [0, 1] is a function that assigns a probability to each member of R.
• A PCFG can be used to estimate a number of useful probabilities concerning a sentence and its parse tree(s),
including the probability of a particular parse tree (useful in disambiguation) and the probability of a
sentence or a piece of a sentence (useful in language modeling).
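These two probabilities can be made concrete with a short, self-contained sketch in plain
Python (the toy grammar for "Tom ate an apple" and its rule probabilities are invented for
illustration):

from functools import reduce

# D : R -> [0, 1]; the probabilities of rules expanding the same non-terminal sum to one.
D = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("Tom",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("Det", ("an",)): 1.0,
    ("N", ("apple",)): 1.0,
    ("V", ("ate",)): 1.0,
}

def tree_prob(rules_used):
    # Probability of one parse tree = product of the probabilities of its rules.
    return reduce(lambda p, r: p * D[r], rules_used, 1.0)

# The rules used by the (single) parse of "Tom ate an apple".
parse = [
    ("S", ("NP", "VP")), ("NP", ("Tom",)), ("VP", ("V", "NP")),
    ("V", ("ate",)), ("NP", ("Det", "N")), ("Det", ("an",)), ("N", ("apple",)),
]
print(tree_prob(parse))   # 1.0 * 0.4 * 1.0 * 1.0 * 0.6 * 1.0 * 1.0 = 0.24

For disambiguation one compares tree_prob across the competing trees of an ambiguous
sentence; for language modeling, the probability of a sentence is the sum of tree_prob over
all of its parse trees.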
Generative Models
• People all around the world speak many different languages, but a computer system or
any other computerized machine understands only a single language: binary (1s and 0s).
• The system or process that converts human language into a computer-understandable
form is known as Natural Language Processing (NLP). Although various diversified models
have been suggested so far, the need for a generative predictive model that can optimize
itself depending upon the nature of the problem being addressed is still an open area of
research.
• The generative model is a single platform for diversified areas of NLP that can
address specific problems: reading text, hearing speech, interpreting it, measuring
sentiment, and determining which parts are important.
• This is achieved by a process of elimination once the relevant components are
identified. A single platform provides the same model for generating and reproducing
optimized solutions and for addressing different issues.
• Generative models are a class of statistical models that can generate new data
instances. These models are used in unsupervised machine learning to perform tasks such as:
Probability and likelihood estimation,
Modeling data points,
Describing the phenomenon in the data,
Distinguishing between classes based on these probabilities.
• Because these models often rely on Bayes' theorem to find the joint probability,
generative models can tackle more complex tasks than analogous discriminative models.
• Generative models thus focus on the distribution of individual classes in a dataset, and
their learning algorithms model the underlying patterns or distribution of the data points.
These models use the concept of joint probability, modeling the instances where a given
feature or input (x) and the desired output or label (y) occur together.
• These models use probability estimates and likelihood to model data points and
differentiate between different class labels present in a dataset. Unlike discriminative
models, these models are also capable of generating new data points.
Mathematical things involved in Generative Models
• Training generative classifiers involves estimating a function f: X -> Y, or the
probability P(Y|X):
• Assume some functional form for the probabilities, such as P(Y) and P(X|Y)
• With the help of the training data, estimate the parameters of P(X|Y) and P(Y)
• Use Bayes' theorem to calculate the posterior probability P(Y|X)
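A minimal sketch of these three steps, assuming a tiny invented dataset with binary
features and a naive (conditional-independence) form for P(X|Y), i.e. a Naive Bayes
classifier:

from collections import Counter, defaultdict

# Invented training data: (feature vector x, label y).
data = [
    ((1, 0), "spam"), ((1, 1), "spam"), ((1, 0), "spam"),
    ((0, 1), "ham"),  ((0, 0), "ham"),  ((0, 1), "ham"),
]

# Steps 1-2: assume a functional form and estimate P(Y) and P(X|Y) from the data.
label_counts = Counter(y for _, y in data)
prior = {y: n / len(data) for y, n in label_counts.items()}          # P(Y)

feature_counts = defaultdict(lambda: defaultdict(int))
for x, y in data:
    for i, xi in enumerate(x):
        feature_counts[y][(i, xi)] += 1

def likelihood(x, y, alpha=1.0):
    # P(X = x | Y = y) under the naive independence assumption, with add-one smoothing.
    p = 1.0
    for i, xi in enumerate(x):
        p *= (feature_counts[y][(i, xi)] + alpha) / (label_counts[y] + 2 * alpha)
    return p

# Step 3: Bayes' theorem gives the posterior P(Y|X) from P(Y) and P(X|Y).
def posterior(x):
    joint = {y: prior[y] * likelihood(x, y) for y in prior}          # P(X, Y)
    evidence = sum(joint.values())                                   # P(X)
    return {y: p / evidence for y, p in joint.items()}               # P(Y|X)

print(posterior((1, 0)))   # most of the probability mass falls on "spam"

Because the model estimates the joint probability P(X, Y) = P(Y) P(X|Y), it could also be
sampled to generate new (x, y) instances, which is what distinguishes it from a purely
discriminative classifier.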
Some Examples of Generative Models
• Naïve Bayes
• Bayesian networks
• Markov random fields
• Hidden Markov Models (HMMs)
• Latent Dirichlet Allocation (LDA)
• Generative Adversarial Networks (GANs)
• Autoregressive Model
Discriminative Models