
Volume 3, Issue 12, December – 2018 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Analysis of Statistical Parsing in Natural Language Processing

Krishna Karoo
Research Scholar, Department of Electronics & Computer Science, R.T.M. Nagpur University, Nagpur

Dr. Girish Katkar
Head of Department, Department of Computer Science, Arts, Commerce & Science College, Koradi, Dist. Nagpur

Abstract:- A statistical language model is a probability distribution P(s) over all possible word sequences (or any other linguistic unit like words, sentences, paragraphs, documents, or spoken utterances). A number of statistical language models have been proposed in the literature. The dominant approach in statistical language modeling is the n-gram model.

I. INTRODUCTION

A. n-gram Model
As discussed earlier, the goal of a statistical language model is to estimate the probability (likelihood) of a sentence. This is achieved by decomposing the sentence probability into a product of conditional probabilities using the chain rule as follows:

P(s) = P(w1, w2, w3, ..., wn)
     = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 w2 ... wn-1)
     = Π i=1..n P(wi|hi)

where hi is the history of word wi, defined as w1 w2 ... wi-1.

Example-1

 Training set
The Arabian knights
These are the fairy tales of the east
The stories of the Arabian knights are translated in many languages

 Bi-gram model
P(the|<s>) = 0.67, P(Arabian|the) = 0.4, P(knights|Arabian) = 1.0,
P(are|these) = 1.0, P(the|are) = 0.5, P(fairy|the) = 0.2,
P(tales|fairy) = 1.0, P(of|tales) = 1.0, P(the|of) = 1.0,
P(east|the) = 0.2, P(stories|the) = 0.2, P(of|stories) = 1.0,
P(are|knights) = 1.0, P(translated|are) = 0.5, P(in|translated) = 1.0,
P(many|in) = 1.0, P(languages|many) = 1.0

Test sentence (s): The Arabian knights are the fairy tales of the east.

P(s) = P(The|<s>) x P(Arabian|the) x P(knights|Arabian) x P(are|knights) x P(the|are) x P(fairy|the) x P(tales|fairy) x P(of|tales) x P(the|of) x P(east|the)
= 0.67 x 0.4 x 1.0 x 1.0 x 0.5 x 0.2 x 1.0 x 1.0 x 1.0 x 0.2
= 0.00536

As each probability is necessarily less than 1, multiplying the probabilities may cause a numerical underflow, particularly for long sentences. To avoid this, calculations are made in log space, where the calculation corresponds to adding the logs of the individual probabilities and taking the antilog of the sum.

B. Add-one Smoothing
This is the simplest smoothing technique. It adds one to each n-gram frequency before normalizing the frequencies into probabilities. In general, add-one smoothing is not considered a good smoothing technique. It assigns the same probability to all missing n-grams, even though some of them could be more intuitively appealing than others. Gale and Church (1994) reported that the variance of the counts produced by add-one smoothing is worse than that of the unsmoothed MLE method. Another problem with this technique is that it shifts too much of the probability mass towards the unseen n-grams (n-grams with zero counts), as their number is usually quite large. Good-Turing smoothing (Good 1953) attempts to improve the situation by looking at the number of n-grams occurring at each frequency in order to estimate the probability mass that needs to be assigned to missing or low-frequency n-grams.

C. Good-Turing Smoothing
Good-Turing smoothing (Good 1953) adjusts the frequency f of an n-gram using the count of n-grams having a frequency of occurrence f+1. It converts the frequency of an n-gram from f to f* using the following expression:

f* = (f + 1) x n(f+1) / n(f)

where n(f) is the number of n-grams that occur exactly f times in the training corpus. As an example, consider that the number of n-grams that occur 4 times is 25,108 and the number of n-grams that occur 5 times is 20,542. Then, the smoothed count for 4 will be f* = 5 x 20,542 / 25,108 ≈ 4.09.
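The bigram computation of Example-1 can be sketched in Python. This is a minimal illustration, not code from the paper; the helper names are ours, and the training sentences are taken from the example (lowercased for counting):

```python
import math
from collections import Counter

training = [
    "the arabian knights",
    "these are the fairy tales of the east",
    "the stories of the arabian knights are translated in many languages",
]

# Count bigrams, using <s> as the sentence-start marker (no end marker,
# matching the hand computation in the example).
bigrams = Counter()
contexts = Counter()
for sentence in training:
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1

def bigram_prob(prev, cur):
    # Maximum-likelihood estimate: count(prev cur) / count(prev as context)
    return bigrams[(prev, cur)] / contexts[prev]

test = "the arabian knights are the fairy tales of the east".split()
pairs = list(zip(["<s>"] + test, test))

# Direct product of the conditional probabilities.
p = 1.0
for prev, cur in pairs:
    p *= bigram_prob(prev, cur)

# The same computation in log space: sum the logs, then take the antilog.
log_p = sum(math.log(bigram_prob(prev, cur)) for prev, cur in pairs)

print(round(p, 5))                # 0.00533 (exactly 2/375 with unrounded counts)
print(round(math.exp(log_p), 5))  # 0.00533, identical via log space
```

With the rounded probabilities used in the text (0.67, 0.4, ...), the same product comes to 0.00536; the unrounded counts give 2/375 ≈ 0.00533, the difference being only rounding.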

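Both smoothing formulas are easy to check numerically. The sketch below is ours, not the paper's: add-one smoothing turns a count c in a context seen n times into (c + 1)/(n + V) for a vocabulary of V types, and the Good-Turing re-estimate applies f* = (f + 1) x n(f+1) / n(f) to the counts quoted above.

```python
# Add-one (Laplace) smoothing: add 1 to every n-gram count before
# normalizing.  c = observed count, n = count of the context,
# v = vocabulary size (number of distinct word types, e.g. the 14
# types of the Example-1 training set).
def add_one_prob(c, n, v):
    return (c + 1) / (n + v)

# An unseen bigram (c = 0) now gets a small but non-zero probability.
print(add_one_prob(0, 5, 14))

# Good-Turing re-estimate: f* = (f + 1) * n_{f+1} / n_f, where n_f is
# the number of n-grams occurring exactly f times.
def good_turing(f, n_f, n_f_plus_1):
    return (f + 1) * n_f_plus_1 / n_f

# Using the counts from the text: 25,108 n-grams occur 4 times and
# 20,542 occur 5 times, so the smoothed count for f = 4 is about 4.09.
print(round(good_turing(4, 25108, 20542), 2))  # 4.09
```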
IJISRT18DC138 www.ijisrt.com 109


D. Caching Technique
Another improvement over the basic n-gram model is caching. The frequency of an n-gram is not uniform across text segments or corpora. Certain words occur more frequently in certain segments (or documents) and rarely in others. For example, in this section, the frequency of the word 'n-gram' is high, whereas it occurs rarely in earlier sections. The basic n-gram model ignores this sort of variation of n-gram frequency. The cache model combines the most recent n-gram frequencies with the standard n-gram model to improve its performance locally. The underlying assumption here is that recently seen words are more likely to be repeated.
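The caching idea can be sketched as a simple interpolation between a static model and recent-history counts. This is an illustrative sketch rather than the formulation in the text; the cache size, the interpolation weight lam, and the class name are all our own choices:

```python
from collections import Counter, deque

# Sketch of a unigram cache model: interpolate a static probability
# with the relative frequency of the word in a recent-history cache.
class CacheLM:
    def __init__(self, static_probs, cache_size=100, lam=0.2):
        self.static = static_probs           # dict: word -> static P(word)
        self.cache = deque(maxlen=cache_size)  # most recent words seen
        self.lam = lam                       # weight of the cache component

    def prob(self, word):
        counts = Counter(self.cache)
        p_cache = counts[word] / len(self.cache) if self.cache else 0.0
        return self.lam * p_cache + (1 - self.lam) * self.static.get(word, 0.0)

    def observe(self, word):
        self.cache.append(word)

lm = CacheLM({"n-gram": 0.001, "the": 0.05})
before = lm.prob("n-gram")
for w in ["the", "n-gram", "n-gram"]:   # 'n-gram' is frequent locally
    lm.observe(w)
after = lm.prob("n-gram")
print(before < after)   # True: recently seen words become more likely
```

The interpolation keeps the model well defined even when the cache is empty, while letting locally frequent words dominate as the cache fills.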

Fig 1:- Part-of-speech example

II. PART-OF-SPEECH TAGGING

Part-of-speech tagging is the process of assigning a part-of-speech (such as a noun, verb, pronoun, preposition, adverb, or adjective) to each word in a sentence. The input to a tagging algorithm is the sequence of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is the single best part-of-speech tag for each word. Many words may belong to more than one lexical category. For example, the English word 'book' can be a noun, as in 'I am reading a good book', or a verb, as in 'The police booked the snatcher'. The same is true for other languages. For example, the Hindi word 'sona' may mean 'gold' (noun) or 'sleep' (verb). However, only one of the possible meanings is used at a time. In tagging, we try to determine the correct lexical category of a word in its context. No tagger is efficient enough to identify the correct lexical category of each word in a sentence in every case. The tag assigned by a tagger is the most likely one for a particular use of a word in a sentence.

A. Rule-based Tagger
Most rule-based taggers have a two-stage architecture. The first stage is simply a dictionary look-up procedure, which returns a set of potential tags (parts-of-speech) and appropriate syntactic features for each word. The second stage uses a set of hand-coded rules to discard contextually illegitimate tags to get a single part-of-speech for each word. For example, consider the noun-verb ambiguity in the following sentence: The show must go on.

B. Stochastic Tagger
The standard stochastic tagger algorithm is the HMM tagger. A Markov model applies the simplifying assumption that the probability of a chain of symbols can be approximated in terms of its parts or n-grams. The simplest n-gram model is the unigram model, which assigns the most likely tag (part-of-speech) to each token.

C. Hybrid Taggers
Hybrid approaches to tagging combine the features of both the rule-based and stochastic approaches. Like rule-based systems, they use rules to assign tags to words; like stochastic systems, they induce those rules automatically from a tagged training corpus. Transformation-based learning (TBL) of tags, also known as Brill tagging, is an example of the hybrid approach. TBL is a machine learning method introduced by E. Brill in 1995. Transformation-based error-driven



learning has been applied to a number of natural language problems, including part-of-speech tagging, speech generation, and syntactic parsing (Brill 1993, 1994; Huang et al. 1994).

III. WORD SENSE DISAMBIGUATION

Having discussed various types of ambiguities, we now focus on identifying the correct sense of a word in a particular use. The first attempt at automatic sense disambiguation was made in the context of machine translation. The famous memorandum of Weaver (1949) discusses the need for word sense disambiguation (WSD) in machine translation and outlines an approach to WSD which underlies all subsequent work on the topic.

A. Selectional Restriction-based Word Sense Disambiguation
Selectional restrictions or preferences can be used in parsing to eliminate flawed meaning representations. This can be viewed as a form of indirect word sense disambiguation. We now explore this idea. Consider the following sentences:

(a) The institute will employ new employees. ('to hire')
(b) The committee employed her proposal. ('to accept')

One can intuitively differentiate the senses of employ in sentences (a) and (b) from the complements of each employ. To be more precise, employ in (a) restricts its subject and object nouns to those associated with the semantic features human/organization and human, respectively. On the other hand, employ in (b) restricts its subject and object nouns to those associated with the semantic features human/organization and idea, respectively. Consequently, given employees as the object, the sense 'to hire' is selected as the interpretation of employ in (a), and the sense 'to accept' is ruled out. The same reasoning can be used to select the sense 'to accept' as the interpretation of employ in (b).

B. Context-based Word Sense Disambiguation Approaches
Approaches to stand-alone WSD that make use of the context of an ambiguous word basically fall into one of the following two general categories:
• Knowledge-based
• Corpus-based

IV. BAYESIAN CLASSIFICATION

The specific algorithm we describe here was introduced by Gale (1992). The classifier assumes that we have a corpus in which each occurrence of an ambiguous word is labelled with its correct sense. The words around the ambiguous word are used to define a context window. The classifier treats the context of word w as a bag of words without structure. No feature selection is done. All the words occurring in the context window contribute in deciding which sense of the ambiguous word is likely to be used with it. What we want to find is the most likely sense s' for an input context c of an ambiguous word w. This is obtained as

s' = arg max_k P(s_k | c)

As it is difficult to collect statistics for this equation directly, we apply the Bayesian formula to compute it.

A. Bootstrapping
The Bayes classifier attempts to combine evidence from all words in the context window to help disambiguation. This requires a large sense-tagged training set to collect evidence. Hearst (1991) proposed the bootstrapping approach to eliminate the need for a large training set. The bootstrapping method relies on a relatively small number of instances labeled with senses with a high degree of confidence. This can be accomplished by manually tagging those instances of an ambiguous word for which the sense is clear (Hearst 1991). These labeled instances are used as seeds to train an initial classifier. The classifier is then used to extract more training instances from the remaining untagged corpus. As the process is repeated, the training corpus grows and the number of untagged instances is reduced. The iteration continues until the remaining untagged corpus is empty or no new instance can be annotated.

B. Bilingual Corpora
A bilingual corpus consists of two corpora, one of which is a translation of the other. As different senses of an ambiguous word often translate differently in another language, a bilingual corpus can be used for disambiguating word senses. For example, the Hindi word 'kalam' is translated as pen in the writing sense and graft in the transplant sense. Gale et al. (1992b, 1993) use the bilingual Hansard corpus to avoid manual sense tagging of a corpus. The Hansard corpus consists of transcriptions in French and English of the proceedings of the Canadian parliament. They first automatically aligned the bilingual corpus and then tagged the words of the aligned corpus using the basic assumption that the translations of a word reflect its senses.

V. UNSUPERVISED METHODS OF WSD

Unsupervised methods of WSD eliminate the need for sense-tagged training data. Instead, these approaches take feature-value representations of unlabelled contexts (instances) and group them into clusters. Each cluster can be assumed to represent one sense of an ambiguous word. These clusters can be represented as the average of their constituent feature vectors. Unknown instances are classified as having the sense of the cluster to which they are closest according to the similarity measure. Strictly speaking, using a completely unsupervised sense disambiguation method, we can only discriminate word senses. That is, we can group together instances of a word used in different senses without knowing what those senses are. However, Yarowsky (1995) proposed an unsupervised algorithm that can accurately disambiguate word senses in a large, completely untagged corpus. He exploited two powerful properties of human language in an iterative bootstrapping setup to avoid the need for manually tagged training data (adapted from Yarowsky 1995):



 One sense per discourse: The sense of a target word is highly consistent within any given document or discourse.
 One sense per collocation: Nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order, and syntactic relationship.

A. Knowledge Sources in WSD
A variety of information, including syntactic (part-of-speech, grammatical structure), semantic (selectional restriction), and pragmatic (topic) information, as well as dictionary (definitions) and corpus (collocation) specific information, can be utilized as a knowledge source in WSD. Here is a list of some of the information sources deemed useful in disambiguation.

 Context of a word
The context of a word can be regarded as the words surrounding the ambiguous word. A word can be disambiguated only in its context. The context is therefore useful in determining the meaning of a word in a particular usage.

 Frequency of a sense
This information is generally used in statistical approaches to measure the likelihood of each possible sense. Usually, these statistics are gathered over some sense-tagged corpus.

 Part-of-speech
Part-of-speech information can reduce the number of possible senses a word can have. For example, in WordNet 2.0, bitter has 3 senses as a noun, 7 senses as an adjective, and one sense as a verb. The use of bitter as a verb does not lead to ambiguity.

 Collocations
These may provide useful information about the sense of a word. For instance, the noun match has 9 senses listed in WordNet, but only one of these applies to 'football match'.

 Selectional preferences
Semantic restrictions that predicates place on their arguments can be used for disambiguation. For instance, eat in the 'have a meal' sense prefers humans as subjects. This knowledge is similar to the argument-head relation, but selectional preferences are given in terms of semantic classes instead of plain words.

 Domain
In a particular domain, only one sense of a word is likely to be used. Thus, information about the domain furnishes useful information for disambiguation. For example, in the domain of sports, the cricket-bat sense of bat is preferred.

Besides these, the thematic role of a word (subject or object), sentence structure, semantic word properties, and pragmatic information may also be utilized in sense disambiguation. All this information can be used together with general knowledge about the situation to rule out impossible readings.

B. Applications of WSD
Word sense disambiguation (WSD) is only an intermediate task in NLP, like POS tagging or parsing. Accurate WSD is important for many applications, e.g., machine translation and information retrieval.

One of the first applications of WSD was machine translation, for which disambiguating the sense of a source language word is crucial for accurately selecting its translation equivalent in the target language. The Hindi word 'phal', for example, can have either the sense of the English word fruit or the sense of result. In order to correctly translate a text containing 'phal', we need to know which sense is intended.

C. WSD Evaluation
Evaluation is important in all NLP tasks. It has always been a problem in disambiguation research, as the only way to judge the performance of a disambiguator is to manually check its output. Manual checking is time consuming, and because of this most disambiguators have been evaluated only on a small number of words. The SENSEVAL initiatives have simplified the evaluation task. The basic metrics used for evaluating word sense disambiguation algorithms are precision and recall. Precision measures the fraction of correctly tagged instances among those the system tagged, while recall measures the fraction of correctly tagged instances in the total set. This requires access to an annotated corpus. Two such corpora are now available: the SEMCOR corpus (Landes et al. 1998) and the SENSEVAL corpus (Kilgarriff and Rosenzweig 2000). These metrics fail to give any credit to an algorithm that makes only broad distinctions between senses, as they consider a sense match to be exact. Some metrics have been proposed to give partial credit to instances where a broader sense is selected.

VI. CONCLUSION

Semantic analysis is concerned with the meaning representation of linguistic inputs. A meaning representation bridges the gap between linguistic and commonsense knowledge. A meaning representation language must be verifiable and unambiguous. It should support the use of variables and inferencing, and must be expressive enough to handle the wide variety of content found in natural language. Syntax-driven semantic analysis uses the syntactic constituents of a sentence to build its meaning representation. Semantic grammar provides an alternative way of creating meaning representations. Word sense disambiguation is concerned with identifying the correct sense of a word. The knowledge sources used by word sense disambiguation algorithms include the context of the word, sense frequency, selectional preferences, collocations, and domain.
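The precision and recall metrics described in the WSD evaluation section can be sketched as follows, using the common convention that precision is computed over the instances the system attempted and recall over all instances; the sense labels and instance ids here are invented for illustration:

```python
# Sketch of precision/recall for a WSD system that may leave some
# instances untagged.  gold and predicted map instance id -> sense;
# instances the system skipped are simply absent from predicted.
def precision_recall(gold, predicted):
    attempted = [i for i in gold if i in predicted]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {1: "pen", 2: "graft", 3: "pen", 4: "pen"}
predicted = {1: "pen", 2: "pen", 3: "pen"}   # instance 4 left untagged
p, r = precision_recall(gold, predicted)
print(round(p, 3), round(r, 3))   # 0.667 0.5
```

When a system tags every instance, the two metrics coincide; the gap between them reflects coverage.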



