
KHULNA UNIVERSITY OF ENGINEERING & TECHNOLOGY

B.Sc. Engineering 4th Year 2nd Term Examination, 2019


Department of Computer Science and Engineering
CSE 4221
Natural Language Processing
TIME: 3 hours FULL MARKS: 210
N.B. i) Answer ANY THREE questions from each section in separate scripts.
ii) Figures in the right margin indicate full marks.

SECTION A
(Answer ANY THREE questions from this section in Script A)
1. a) Why is pattern matching by regular expressions greedy? Explain with an example. (09)
b) Define text normalization. What is the difference between lemmatization and stemming? (10)
Explain with an example.
c) Explain Porter’s algorithm with an example. (11)
d) What are types and tokens? How many types and tokens are there in the following sentence? (05)
“He stepped out into the hall, was delighted to encounter a water brother.”
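A quick check for part (d), as a minimal Python sketch; stripping punctuation and keeping case are tokenization assumptions, not part of the question:

    import re

    sentence = ("He stepped out into the hall, was delighted to "
                "encounter a water brother.")
    tokens = re.findall(r"[A-Za-z]+", sentence)   # drop punctuation
    print(len(tokens), len(set(tokens)))          # 13 tokens, 13 types (all distinct)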

2. a) “Finding the minimum edit distance is a problem of the ‘dynamic programming’ category.” (12)
Explain the statement for converting a string X of length n to a string Y of length m.
b) Apply the minimum edit distance algorithm (using insertion cost = 1, deletion cost = 1 and (10)
substitution cost = 2) to convert “leda” to “deal”. Show your work in an edit distance grid.
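The recurrence behind parts (a) and (b): D[i][j], the cheapest way to convert X[:i] to Y[:j], depends only on three smaller subproblems, which is what makes this dynamic programming. A minimal sketch with the stated costs:

    def min_edit_distance(x, y, ins=1, dele=1, sub=2):
        n, m = len(x), len(y)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele                         # delete all of x[:i]
        for j in range(1, m + 1):
            d[0][j] = j * ins                          # insert all of y[:j]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = 0 if x[i - 1] == y[j - 1] else sub
                d[i][j] = min(d[i - 1][j] + dele,      # delete x[i-1]
                              d[i][j - 1] + ins,       # insert y[j-1]
                              d[i - 1][j - 1] + s)     # copy or substitute
        return d[n][m]

    print(min_edit_distance("leda", "deal"))           # 4 with these costs

The grid the question asks for is the full d table; its bottom-right cell gives the distance of 4.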
c) Define the bag-of-words model. How is the bag-of-words conditional independence assumption (13)
applied in Naive Bayes text classification? Explain.

3. a) Define stop word. How do you handle stop words in text classification? (08)
b) Given the following text classifications for classes C1 and C2: (10)
i) PQR → C1
ii) PPS → C1
iii) PT → C1
iv) UVP → C2
Compute the most likely class for the text “PPPUV”.
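The class subscripts in (i)-(iv) are partly illegible in this copy; assuming (i)-(iii) map to C1 and (iv) to C2, here is a multinomial Naive Bayes sketch with add-one smoothing (without smoothing, the unseen U and V would zero out C1):

    from collections import Counter

    train = [("PQR", "C1"), ("PPS", "C1"), ("PT", "C1"), ("UVP", "C2")]  # assumed labels
    counts = {c: Counter() for _, c in train}
    prior = Counter(c for _, c in train)
    for text, c in train:
        counts[c].update(text)                  # each letter is one word
    vocab = set("".join(t for t, _ in train))

    def score(text, c):
        p = prior[c] / sum(prior.values())
        n = sum(counts[c].values())
        for w in text:
            p *= (counts[c][w] + 1) / (n + len(vocab))   # add-one smoothing
        return p

    for c in prior:
        print(c, score("PPPUV", c))   # C1 ~ 1.23e-4, C2 = 8.0e-5 under these assumptions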
c) Consider the following likelihoods for a review: (05)

            Pos  | Neg
    I       0.09 | 0.16
    always  0.07 | 0.06
    like    0.29 | 0.06
    foreign 0.04 | 0.15
    films   0.08 | 0.11

What class will Naive Bayes assign to the sentence “I always like foreign films”?
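Part (c) is direct arithmetic once equal priors are assumed (none are given); the assigned class is whichever likelihood product is larger:

    from math import prod

    pos = [0.09, 0.07, 0.29, 0.04, 0.08]   # I, always, like, foreign, films
    neg = [0.16, 0.06, 0.06, 0.15, 0.11]
    print(prod(pos), prod(neg))   # ~5.85e-6 vs ~9.50e-6, so the class is Neg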
d) Find the context-free rules and hence the context-free grammar (CFG) for the following (12)
English sentences.
i) I want a morning flight.
ii) Which flight serves breakfast?
iii) Show the lowest fare.
iv) The flight should be at 11 am.
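Part (d) has no unique answer; one possible rule set, written in NLTK's grammar syntax so sentence (i) can be checked mechanically (the nonterminal names and rules are illustrative and cover only sentence (i) here):

    import nltk

    grammar = nltk.CFG.fromstring("""
        S -> NP VP
        NP -> Pron | Det Nom
        Nom -> Noun | Noun Nom
        VP -> Verb NP
        Pron -> 'I'
        Det -> 'a'
        Noun -> 'morning' | 'flight'
        Verb -> 'want'
    """)
    for tree in nltk.ChartParser(grammar).parse("I want a morning flight".split()):
        print(tree)   # one parse: (S (NP (Pron I)) (VP (Verb want) (NP ...)))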

4. a) Consider the following grammar in CNF (10)


S → AB | BB
A → CC | AB | a
B → BB | CA | b
C → BA | AA | b
Is ‘aabb’ in L(G)? Justify your answer using the CYK algorithm.
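A compact CYK recogniser sketch; the rule table transcribes the grammar as printed, and the string is in L(G) iff S lands in the cell spanning all of it:

    from itertools import product

    rules = {("A", "B"): {"S", "A"}, ("B", "B"): {"S", "B"}, ("C", "C"): {"A"},
             ("C", "A"): {"B"}, ("B", "A"): {"C"}, ("A", "A"): {"C"},
             "a": {"A"}, "b": {"B", "C"}}

    def cyk(w):
        n = len(w)
        t = [[set() for _ in range(n + 1)] for _ in range(n)]   # t[i][l]: span w[i:i+l]
        for i, ch in enumerate(w):
            t[i][1] = rules.get(ch, set())
        for l in range(2, n + 1):                 # span length
            for i in range(n - l + 1):            # span start
                for k in range(1, l):             # split point
                    for b, c in product(t[i][k], t[i + k][l - k]):
                        t[i][l] |= rules.get((b, c), set())
        return "S" in t[0][n]

    print(cyk("aabb"))   # True: 'aabb' is in L(G)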
b) How does the Earley parser differ from the CYK parser? Explain finite-state chunking with an example. (10)
c) Define PCFG. How can you disambiguate a syntactic parse with a PCFG? (06)
d) Explain IR-based factoid question answering with an example. (09)

Page: 1 of 2
SECTION B
(Answer ANY THREE questions from this section in Script B)
5. a) Define Natural Language Processing (NLP). Why do we need NLP? Write some industrial (10)
applications of NLP.
b) What does smoothing mean? Derive the equation for Laplace smoothing. (10)
c) Given the following corpus: (07)
I like NLP.
I like Deep Learning.
NLP and Deep Learning have huge area for research.
Using a bigram language model with add-one smoothing, what is P(NLP | like)?
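A sketch for part (c). The vocabulary size depends on tokenisation; dropping the periods and adding no sentence-boundary markers (one reasonable reading) gives V = 11:

    corpus = ["I like NLP", "I like Deep Learning",
              "NLP and Deep Learning have huge area for research"]
    tokens = [w for s in corpus for w in s.split()]
    bigrams = [b for s in corpus for b in zip(s.split(), s.split()[1:])]
    V = len(set(tokens))                                 # 11
    p = (bigrams.count(("like", "NLP")) + 1) / (tokens.count("like") + V)
    print(p)                                             # (1+1)/(2+11) = 2/13 ~ 0.154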
d) Define the zero-probability (unseen n-gram) situation and discuss its effect. (08)
6. a) “POS tagging is a disambiguation task.” — Justify the statement with an example. (08)
b) Define perplexity. Derive the equation for computing the perplexity of a word sequence W with a bigram language model. (10)
c) For Hidden Markov Model (HMM) POS tagging, using the following formula, find the (10)
equation for calculating tag transition probabilities:
$\hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)$

d) Consider the sentence: “Eight/CD horses/NNS will/MD race/? for/IN the/DT cup/NN.” (07)
Given the probabilities below, find the right POS tag for the word “race” (VB or NN):
P(VB | MD) = 0.0045, P(NN | MD) = 0.062, P(race | NN) = 0.048, P(race | VB) = 0.014,
P(IN | VB) = 0.0012, P(IN | NN) = 0.0024
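Part (d) compares two three-term products; note that with the figures as printed, the NN path comes out larger:

    vb = 0.0045 * 0.014 * 0.0012   # P(VB|MD) * P(race|VB) * P(IN|VB) ~ 7.6e-8
    nn = 0.062 * 0.048 * 0.0024    # P(NN|MD) * P(race|NN) * P(IN|NN) ~ 7.1e-6
    print("NN" if nn > vb else "VB")   # NN with these figures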
7. a) Define Markov chain. Explain the components of a Markov chain. (06)
b) For logistic regression, show that $P(y=\text{true} \mid x) = \dfrac{1}{1 + e^{-\sum_i w_i f_i}}$. (10)
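A sketch of the derivation for part (b), starting from the log odds being linear in the features and writing $w \cdot f$ for $\sum_i w_i f_i$:

    \ln\frac{P(y=\text{true}\mid x)}{1-P(y=\text{true}\mid x)} = w\cdot f
    \quad\Longrightarrow\quad
    \frac{P(y=\text{true}\mid x)}{1-P(y=\text{true}\mid x)} = e^{w\cdot f}
    \quad\Longrightarrow\quad
    P(y=\text{true}\mid x) = \frac{e^{w\cdot f}}{1+e^{w\cdot f}} = \frac{1}{1+e^{-w\cdot f}}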
c) Given the sequence of ice-cream observations 3 1 3 and an HMM $\lambda = (A, B)$ in the following (12)
figure:

[Figure: two-state HMM with hidden states HOT and COLD; the transition probabilities appear in the figure. Emission probabilities:
B1 (HOT): P(1 | HOT) = 0.2, P(2 | HOT) = 0.4, P(3 | HOT) = 0.4
B2 (COLD): P(1 | COLD) = 0.5, P(2 | COLD) = 0.4, P(3 | COLD) = 0.1]
Find the best hidden weather sequence Q (e.g., H H H).
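A Viterbi sketch for part (c). The start and transition probabilities were in the lost figure, so pi and A below are placeholders, not the exam's values; only B is as printed. Rerun with the figure's actual numbers:

    states = ["H", "C"]
    pi = {"H": 0.8, "C": 0.2}              # ASSUMED: not recoverable from this copy
    A = {"H": {"H": 0.6, "C": 0.4},        # ASSUMED transition matrix
         "C": {"H": 0.4, "C": 0.6}}
    B = {"H": {1: 0.2, 2: 0.4, 3: 0.4},    # emissions as printed
         "C": {1: 0.5, 2: 0.4, 3: 0.1}}

    def viterbi(obs):
        v = [{s: pi[s] * B[s][obs[0]] for s in states}]
        back = []
        for o in obs[1:]:
            prev, col, ptr = v[-1], {}, {}
            for s in states:
                best = max(states, key=lambda r: prev[r] * A[r][s])
                ptr[s], col[s] = best, prev[best] * A[best][s] * B[s][o]
            v.append(col)
            back.append(ptr)
        path = [max(states, key=lambda s: v[-1][s])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]

    print(viterbi([3, 1, 3]))   # ['H', 'C', 'H'] under the placeholder values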
d) What are the statistical models for processing text and speech? Give examples of fully (07)
connected and left-to-right HMMs.

8. a) Define sentence tokenization. Draw the architecture of a TTS system. (07)


b) Discuss the steps of speech synthesis (text to waveform). Show the hourglass metaphor. (10)
c) What is prosody? Discuss the three aspects of prosody. (10)
d) What is accent ratio? Why is accent ratio important? (08)

Page: 2 of 2
KHULNA UNIVERSITY OF ENGINEERING & TECHNOLOGY
B.Sc. Engineering 4th Year 2nd Term Examination, 2018
Department of Computer Science and Engineering
CSE 4221
Natural Language Processing
TIME: 3 hours FULL MARKS: 210
N.B. i) Answer ANY THREE questions from each section in separate scripts.
ii) Figures in the right margin indicate full marks.

SECTION A
(Answer ANY THREE questions from this section in Script A)
1. a) What are disjunction, grouping and precedence in pattern matching with regular expressions? (10)
Explain with examples.
b) “Pattern matching by regular expressions is greedy.” — Justify the statement. (09)
c) Design a regular expression to find all instances of the word “the” in a text. (10)
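A standard answer checked with a short sketch: the word boundaries keep the pattern from matching inside words such as 'other' or 'theatre', and [tT] catches a sentence-initial capital.

    import re

    text = "The other one, the blithe one, went to the theatre."
    print(re.findall(r"\b[tT]he\b", text))   # ['The', 'the', 'the']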
d) Define types and tokens. How many types and tokens are there in the following sentence: (06)
“They picnicked by the pool, then lay back on the grass and looked at the stars.”
2. a) What are lemmatization and stemming? How is lemmatization done? (08)
b) What are the operations for editing one string into another? Explain. (06)
c) Explain the algorithm to edit one string X of length n into a string Y of length m. Show the steps (12)
of your algorithm for
X = INTENTION and Y = EXECUTION.
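Reusing the min_edit_distance sketch given under question 2 of the 2019 paper (insertion 1, deletion 1, substitution 2):

    print(min_edit_distance("INTENTION", "EXECUTION"))   # 8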
d) Discuss the problem with maximum likelihood estimation. How does Laplace (add-1) (09)
smoothing solve the problem?
3. a) “Accuracy is not a good metric when the goal is to discover something that is rare.” — Justify (10)
the statement with an example. Propose a metric that addresses the drawbacks of accuracy.
b) Given the following short movie reviews, each labeled with a genre, either comedy or action: (10)
i) fun, couple, love, love → comedy
ii) fast, furious, shoot → action
iii) couple, fly, fast, fun, fun → comedy
iv) furious, shoot, shoot, fun → action
v) fly, fast, shoot, love → action
Consider a new document D: fast, couple, shoot, fly. Compute the most likely class for D.
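The arithmetic for part (b), assuming add-one smoothing is intended: the priors are 2/5 (comedy) and 3/5 (action), the vocabulary has 7 words, and the classes have 9 and 11 training tokens respectively:

    comedy = (2/5) * (2/16) * (3/16) * (1/16) * (2/16)   # fast, couple, shoot, fly
    action = (3/5) * (3/18) * (1/18) * (5/18) * (2/18)
    print(comedy, action)   # ~7.3e-5 vs ~1.7e-4, so D is most likely action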
c) Find the context-free rules and hence the context-free grammar (CFG) for the following (07)
English sentences:
i) I want a morning flight.
ii) I want a flight from Ontario to Chicago.
iii) Show me the cheapest fare that has lunch.
iv) Do any of these flights have stops?
v) Which flights serve breakfast?

4. a) Consider the following grammar in CNF: (10)
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Is ‘baaba’ in L(G)? Explain your answer using the CYK algorithm.
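Reusing the cyk sketch from question 4 of the 2019 paper, with the rule table transcribed from this grammar (and reading the string as 'baaba', since the grammar generates only a's and b's):

    rules = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"}, ("B", "A"): {"A"},
             ("C", "C"): {"B"}, "a": {"A", "C"}, "b": {"B"}}
    print(cyk("baaba"))   # True: 'baaba' is in L(G)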
b) Define shallow parsing. What are the applications of shallow parsing? (05)
c) Define Probabilistic Context-Free Grammar (PCFG). Consider the following PCFG: (12)
S → NP VP | Aux NP VP | VP [0.8 | 0.1 | 0.1]
NP → Pronoun | Proper-noun | Det Nominal [0.2 | 0.2 | 0.6]
Nominal → Noun | Nominal Noun | Nominal PP [0.3 | 0.2 | 0.5]
VP → Verb | Verb NP | VP PP [0.2 | 0.5 | 0.3]
PP → Prep NP [1.0]
Det → the | a | that | this [0.6 | 0.2 | 0.1 | 0.1]
Noun → book | flight | meal | money [0.1 | 0.5 | 0.2 | 0.2]
Verb → book | include | prefer [0.5 | 0.2 | 0.3]
Pronoun → I | he | she | me [0.5 | 0.1 | 0.1 | 0.3]
Proper-noun → Houston | NWA [0.8 | 0.2]
Prep → from | to | on | near | through [0.25 | 0.25 | 0.1 | 0.2 | 0.2]
i) Find the probability of the sentence “book the flight through Houston”.
ii) Using the disambiguation algorithm, select the proper parse tree.
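For part (i), the sentence has two parses, differing in whether the PP attaches inside the Nominal or to the VP; assuming those are the intended trees, the arithmetic with the probabilities as printed:

    from math import prod

    # Rules shared by both parses: S->VP, VP->Verb NP, Verb->book, NP->Det Nominal,
    # Det->the, Nominal->Noun, Noun->flight, PP->Prep NP, Prep->through,
    # NP->Proper-noun, Proper-noun->Houston
    shared = prod([0.1, 0.5, 0.5, 0.6, 0.6, 0.3, 0.5, 1.0, 0.2, 0.2, 0.8])
    nominal_attach = shared * 0.5   # plus Nominal -> Nominal PP
    vp_attach = shared * 0.3        # plus VP -> VP PP
    print(nominal_attach, vp_attach, nominal_attach + vp_attach)
    # ~2.16e-5 and ~1.30e-5; the sentence probability is their sum, ~3.46e-5.
    # For (ii), disambiguation picks the higher-probability tree: the Nominal attachment.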
d) What are the stages of IR-based question answering? Explain. (08)

Page: 1 of 2
SECTION B
(Answer ANY THREE questions from this section in Script B)
5. a) Define Natural Language Processing (NLP). What are the major areas of research and (10)
development in NLP?
b) What does n-gram mean? Derive the equation for calculating the probability of an n-gram model. (10)
c) Consider the following corpus: (08)
<s> I am Sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
Using a bigram language model with add-one smoothing, what is P(Sam | am)? Include <s>
and </s> in your counts just like any other token.
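A sketch for part (c), counting <s> and </s> as ordinary tokens per the instructions, so V = 11:

    corpus = [["<s>", "I", "am", "Sam", "</s>"],
              ["<s>", "Sam", "I", "am", "</s>"],
              ["<s>", "I", "am", "Sam", "</s>"],
              ["<s>", "I", "do", "not", "like", "green", "eggs", "and", "Sam", "</s>"]]
    tokens = [w for s in corpus for w in s]
    bigrams = [b for s in corpus for b in zip(s, s[1:])]
    V = len(set(tokens))                                             # 11
    print((bigrams.count(("am", "Sam")) + 1) / (tokens.count("am") + V))
    # (2+1)/(3+11) = 3/14 ~ 0.214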
d) What is absolute discounting? What are its advantages? (07)
6. a) What are the closed classes and open classes of part-of-speech (POS)? Explain with examples. (08)
b) Discuss rule-based POS tagging. Write the ADVERBIAL-THAT RULE. (12)
c) For Hidden Markov Model (HMM) POS tagging, using the following formula, find the (08)
equation for calculating tag transition probabilities:
$\hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)$
d) Consider the sentence: “Secretariat/NNP is/BEZ expected/VBN to/TO race/? tomorrow/NR”. (07)
The word “race” is often used as VB or NN. Given the probabilities below, find the right POS
tag for the word “race”:
P(NN | TO) = 0.00047, P(VB | TO) = 0.83, P(race | NN) = 0.00057, P(race | VB) = 0.00012,
P(NN | VB) = 0.0027, P(NR | NN) = 0.0012
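The comparison for part (d); with these figures the verb reading wins by about three orders of magnitude, matching the textbook conclusion:

    vb = 0.83 * 0.00012 * 0.0027      # P(VB|TO) * P(race|VB) * P(NN|VB) ~ 2.7e-7
    nn = 0.00047 * 0.00057 * 0.0012   # P(NN|TO) * P(race|NN) * P(NR|NN) ~ 3.2e-10
    print("VB" if vb > nn else "NN")  # VB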
7. a) An HMM is characterized by three fundamental problems. Name and discuss the problems. (09)
b) Given the sequence of ice-cream observations 3 1 3 and an HMM $\lambda = (A, B)$ in the following (12)
figure, find the best hidden weather sequence Q (e.g., H H H).

[Figure: two-state HMM with hidden states HOT and COLD; the transition probabilities appear in the figure. Emission probabilities:
P(1 | HOT) = 0.2, P(2 | HOT) = 0.4, P(3 | HOT) = 0.4
P(1 | COLD) = 0.5, P(2 | COLD) = 0.4, P(3 | COLD) = 0.1]

c) Define the term odds for logistic regression. Show that an observation should be labeled true (09)
if $\sum_i w_i f_i > 0$.
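A sketch of the step part (c) needs: the observation is labeled true exactly when the odds exceed 1,

    \frac{P(y=\text{true}\mid x)}{1-P(y=\text{true}\mid x)} = e^{\sum_i w_i f_i} > 1
    \quad\Longleftrightarrow\quad \sum_i w_i f_i > 0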
d) Write the three steps of the forward algorithm. (05)
8. a) Name and discuss the types of TTS. (06)
b) Speech synthesis performs text-to-waveform mapping in two steps. Name and discuss the (12)
steps using the hourglass metaphor.
c) What is homograph disambiguation? What are the problems of CMU? How does UNISYN (10)
overcome the problems of CMU?
d) Define text normalization. Why is text normalization important for speech synthesis? (07)

Page: 2 of 2
