
Volume 3, Issue 12, December – 2018 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Analysis of Statistical Parsing in Natural Language Processing

Krishna Karoo
Research Scholar, Department of Electronics & Computer Science, R.T.M. Nagpur University, Nagpur

Dr. Girish Katkar
Head of Department, Department of Computer Science, Arts, Commerce & Science College, Koradi, Dist. Nagpur

Abstract:- A statistical language model is a probability distribution P(s) over all possible word sequences (or any other linguistic unit like words, sentences, paragraphs, documents, or spoken utterances). A number of statistical language models have been proposed in the literature. The dominant approach in statistical language modeling is the n-gram model.

I. INTRODUCTION

A. n-gram Model
As discussed earlier, the goal of a statistical language model is to estimate the probability (likelihood) of a sentence. This is achieved by decomposing the sentence probability into a product of conditional probabilities using the chain rule as follows:

P(s) = P(w1, w2, w3, ..., wn)
     = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 w2 ... wn-1)
     = Π i=1..n P(wi|hi)

where hi is the history of word wi, defined as w1 w2 ... wi-1.

Example-1

 Training set
The Arabian knights
These are the fairy tales of the east
The stories of the Arabian knights are translated in many languages

 Bi-gram model
P(the|<s>) = 0.67, P(Arabian|the) = 0.4, P(knights|Arabian) = 1.0,
P(are|these) = 1.0, P(the|are) = 0.5, P(fairy|the) = 0.2,
P(tales|fairy) = 1.0, P(of|tales) = 1.0, P(the|of) = 1.0,
P(east|the) = 0.2, P(stories|the) = 0.2, P(of|stories) = 1.0,
P(are|knights) = 1.0, P(translated|are) = 0.5, P(in|translated) = 1.0,
P(many|in) = 1.0, P(languages|many) = 1.0

Test sentence (s): The Arabian knights are the fairy tales of the east.

P(s) = P(The|<s>) x P(Arabian|the) x P(knights|Arabian) x P(are|knights) x P(the|are) x P(fairy|the) x P(tales|fairy) x P(of|tales) x P(the|of) x P(east|the)
= 0.67 x 0.4 x 1.0 x 1.0 x 0.5 x 0.2 x 1.0 x 1.0 x 1.0 x 0.2
= 0.00536

As each probability is necessarily less than 1, multiplying the probabilities may cause a numerical underflow, particularly for long sentences. To avoid this, calculations are made in log space, where the calculation corresponds to adding the logs of the individual probabilities and taking the antilog of the sum.

B. Add-one Smoothing
This is the simplest smoothing technique. It adds one to each n-gram frequency before normalizing the frequencies into probabilities. In general, add-one smoothing is not considered a good smoothing technique. It assigns the same probability to all missing n-grams, even though some of them could be more intuitively appealing than others. Gale and Church (1994) reported that the variance of the counts produced by add-one smoothing is worse than that of the unsmoothed MLE method. Another problem with this technique is that it shifts too much of the probability mass towards the unseen n-grams (n-grams with zero counts), as their number is usually quite large. Good-Turing smoothing (Good 1953) attempts to improve the situation by looking at the number of n-grams occurring at each frequency in order to estimate the probability mass that needs to be assigned to missing or low-frequency n-grams.

C. Good-Turing Smoothing
Good-Turing smoothing (Good 1953) adjusts the frequency f of an n-gram using the count of n-grams having a frequency of occurrence f+1. It converts the frequency of an n-gram from f to f* using the following expression:

f* = (f + 1) x n(f+1) / n(f)

where n(f) is the number of n-grams that occur exactly f times in the training corpus. As an example, consider that the number of n-grams that occur 4 times is 25,108 and the number of n-grams that occur 5 times is 20,542. Then, the smoothed count for 4 will be f* = 5 x 20,542 / 25,108 ≈ 4.09.
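The bigram computation of Example-1 can be sketched in Python. This is a minimal illustration, not code from the paper; the helper names are ours, and the training sentences are taken from the example (lowercased for counting):

```python
import math
from collections import Counter

training = [
    "the arabian knights",
    "these are the fairy tales of the east",
    "the stories of the arabian knights are translated in many languages",
]

# Count bigrams, using <s> as the sentence-start marker (no end marker,
# matching the hand computation in the example).
bigrams = Counter()
contexts = Counter()
for sentence in training:
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1

def bigram_prob(prev, cur):
    # Maximum-likelihood estimate: count(prev cur) / count(prev as context)
    return bigrams[(prev, cur)] / contexts[prev]

test = "the arabian knights are the fairy tales of the east".split()
pairs = list(zip(["<s>"] + test, test))

# Direct product of the conditional probabilities.
p = 1.0
for prev, cur in pairs:
    p *= bigram_prob(prev, cur)

# The same computation in log space: sum the logs, then take the antilog.
log_p = sum(math.log(bigram_prob(prev, cur)) for prev, cur in pairs)

print(round(p, 5))                # 0.00533 (exactly 2/375 with unrounded counts)
print(round(math.exp(log_p), 5))  # 0.00533, identical via log space
```

With the rounded probabilities used in the text (0.67, 0.4, ...), the same product comes to 0.00536; the unrounded counts give 2/375 ≈ 0.00533, the difference being only rounding.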

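Both smoothing formulas are easy to check numerically. The sketch below is ours, not the paper's: add-one smoothing turns a count c in a context seen n times into (c + 1)/(n + V) for a vocabulary of V types, and the Good-Turing re-estimate applies f* = (f + 1) x n(f+1) / n(f) to the counts quoted above.

```python
# Add-one (Laplace) smoothing: add 1 to every n-gram count before
# normalizing.  c = observed count, n = count of the context,
# v = vocabulary size (number of distinct word types, e.g. the 14
# types of the Example-1 training set).
def add_one_prob(c, n, v):
    return (c + 1) / (n + v)

# An unseen bigram (c = 0) now gets a small but non-zero probability.
print(add_one_prob(0, 5, 14))

# Good-Turing re-estimate: f* = (f + 1) * n_{f+1} / n_f, where n_f is
# the number of n-grams occurring exactly f times.
def good_turing(f, n_f, n_f_plus_1):
    return (f + 1) * n_f_plus_1 / n_f

# Using the counts from the text: 25,108 n-grams occur 4 times and
# 20,542 occur 5 times, so the smoothed count for f = 4 is about 4.09.
print(round(good_turing(4, 25108, 20542), 2))  # 4.09
```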
IJISRT18DC138 www.ijisrt.com 109


D. Caching Technique
Another improvement over the basic n-gram model is caching. The frequency of an n-gram is not uniform across text segments or corpora. Certain words occur more frequently in certain segments (or documents) and rarely in others. For example, in this section, the frequency of the word 'n-gram' is high, whereas it occurs rarely in earlier sections. The basic n-gram model ignores this sort of variation of n-gram frequency. The cache model combines the most recent n-gram frequencies with the standard n-gram model to improve its performance locally. The underlying assumption here is that recently seen words are more likely to be repeated.
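The caching idea can be sketched as a simple interpolation between a static model and recent-history counts. This is an illustrative sketch rather than the formulation in the text; the cache size, the interpolation weight lam, and the class name are all our own choices:

```python
from collections import Counter, deque

# Sketch of a unigram cache model: interpolate a static probability
# with the relative frequency of the word in a recent-history cache.
class CacheLM:
    def __init__(self, static_probs, cache_size=100, lam=0.2):
        self.static = static_probs           # dict: word -> static P(word)
        self.cache = deque(maxlen=cache_size)  # most recent words seen
        self.lam = lam                       # weight of the cache component

    def prob(self, word):
        counts = Counter(self.cache)
        p_cache = counts[word] / len(self.cache) if self.cache else 0.0
        return self.lam * p_cache + (1 - self.lam) * self.static.get(word, 0.0)

    def observe(self, word):
        self.cache.append(word)

lm = CacheLM({"n-gram": 0.001, "the": 0.05})
before = lm.prob("n-gram")
for w in ["the", "n-gram", "n-gram"]:   # 'n-gram' is frequent locally
    lm.observe(w)
after = lm.prob("n-gram")
print(before < after)   # True: recently seen words become more likely
```

The interpolation keeps the model well defined even when the cache is empty, while letting locally frequent words dominate as the cache fills.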

Fig 1:- Part-of-speech example

II. PART-OF-SPEECH TAGGING

Part-of-speech tagging is the process of assigning a part-of-speech (such as a noun, verb, pronoun, preposition, adverb, or adjective) to each word in a sentence. The input to a tagging algorithm is the sequence of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is the single best part-of-speech tag for each word. Many words may belong to more than one lexical category. For example, the English word 'book' can be a noun, as in 'I am reading a good book', or a verb, as in 'The police booked the snatcher'. The same is true for other languages. For example, the Hindi word 'sona' may mean 'gold' (noun) or 'sleep' (verb). However, only one of the possible meanings is used at a time. In tagging, we try to determine the correct lexical category of a word in its context. No tagger is efficient enough to identify the correct lexical category of each word in a sentence in every case. The tag assigned by a tagger is the most likely one for a particular use of a word in a sentence.

A. Rule-based Tagger
Most rule-based taggers have a two-stage architecture. The first stage is simply a dictionary look-up procedure, which returns a set of potential tags (parts-of-speech) and appropriate syntactic features for each word. The second stage uses a set of hand-coded rules to discard contextually illegitimate tags to get a single part-of-speech for each word. For example, consider the noun-verb ambiguity in the following sentence: The show must go on.

B. Stochastic Tagger
The standard stochastic tagger algorithm is the HMM tagger. A Markov model applies the simplifying assumption that the probability of a chain of symbols can be approximated in terms of its parts or n-grams. The simplest n-gram model is the unigram model, which assigns the most likely tag (part-of-speech) to each token.

C. Hybrid Taggers
Hybrid approaches to tagging combine the features of both the rule-based and stochastic approaches. Like rule-based systems, they use rules to assign tags to words; like stochastic systems, they induce those rules automatically from a tagged training corpus. Transformation-based learning (TBL) of tags, also known as Brill tagging, is an example of the hybrid approach. TBL is a machine learning method introduced by E. Brill in 1995. Transformation-based error-driven



learning has been applied to a number of natural language problems, including part-of-speech tagging, speech generation, and syntactic parsing (Brill 1993, 1994; Huang et al. 1994).

III. WORD SENSE DISAMBIGUATION

Having discussed various types of ambiguities, we now focus on identifying the correct sense of a word in a particular use. The first attempt at automatic sense disambiguation was made in the context of machine translation. The famous memorandum of Weaver (1949) discusses the need for word sense disambiguation (WSD) in machine translation and outlines an approach to WSD which underlies all subsequent work on the topic.

A. Selectional Restriction-based Word Sense Disambiguation
Selectional restrictions or preferences can be used in parsing to eliminate flawed meaning representations. This can be viewed as a form of indirect word sense disambiguation. We now explore this idea. Consider the following sentences:

(a) The institute will employ new employees. ('to hire')
(b) The committee employed her proposal. ('to accept')

One can intuitively differentiate the senses of employ in sentences (a) and (b) from the complements of each employ. To be more precise, employ in (a) restricts its subject and object nouns to those associated with the semantic features human/organization and human, respectively. On the other hand, employ in (b) restricts its subject and object nouns to those associated with the semantic features human/organization and idea, respectively. Consequently, given employees as the object, the sense 'to hire' is selected as the interpretation of employ in (a), and the sense 'to accept' is ruled out. The same reasoning can be used to select the sense 'to accept' as the interpretation of employ in (b).

B. Context-based Word Sense Disambiguation Approaches
Approaches to stand-alone WSD that make use of the context of an ambiguous word basically fall into one of the following two general categories:
• Knowledge-based
• Corpus-based

IV. BAYESIAN CLASSIFICATION

The specific algorithm we describe here was introduced by Gale (1992). The classifier assumes that we have a corpus in which each occurrence of an ambiguous word is labelled with its correct sense. The words around the ambiguous word are used to define a context window. The classifier treats the context of word w as a bag of words without structure. No feature selection is done. All the words occurring in the context window contribute in deciding which sense of the ambiguous word is likely to be used with it. What we want to find is the most likely sense s' for an input context c of an ambiguous word w. This is obtained as

s' = arg max_k P(s_k | c)

As it is difficult to collect statistics for this equation directly, we apply the Bayesian formula to compute it.

A. Bootstrapping
The Bayes classifier attempts to combine evidence from all words in the context window to help disambiguation. This requires a large sense-tagged training set to collect evidence. Hearst (1991) proposed the bootstrapping approach to eliminate the need for a large training set. The bootstrapping method relies on a relatively small number of instances labeled with senses with a high degree of confidence. This can be accomplished by manually tagging those instances of an ambiguous word for which the sense is clear (Hearst 1991). These labeled instances are used as seeds to train an initial classifier. The classifier is then used to extract more training instances from the remaining untagged corpus. As the process is repeated, the training corpus grows and the number of untagged instances is reduced. The iteration continues until the remaining untagged corpus is empty or no new instance can be annotated.

B. Bilingual Corpora
A bilingual corpus consists of two corpora, one of which is a translation of the other. As different senses of an ambiguous word often translate differently in another language, a bilingual corpus can be used for disambiguating word senses. For example, the Hindi word 'kalam' is translated as pen in the writing sense and graft in the transplant sense. Gale et al. (1992b, 1993) use the bilingual Hansard corpus to avoid manual sense tagging of a corpus. The Hansard corpus consists of transcriptions in French and English of the proceedings of the Canadian parliament. They first automatically aligned the bilingual corpus and then tagged the words of the aligned corpus using the basic assumption that the translations of a word reflect its senses.

V. UNSUPERVISED METHODS OF WSD

Unsupervised methods of WSD eliminate the need for sense-tagged training data. Instead, these approaches take feature-value representations of unlabelled contexts (instances) and group them into clusters. Each cluster can be assumed to represent one sense of an ambiguous word. These clusters can be represented as the average of their constituent feature vectors. Unknown instances are classified as having the sense of the cluster to which they are closest according to the similarity measure. Strictly speaking, using a completely unsupervised sense disambiguation method, we can only discriminate word senses. That is, we can group together instances of a word used in different senses without knowing what those senses are. However, Yarowsky (1995) proposed an unsupervised algorithm that can accurately disambiguate word senses in a large, completely untagged corpus. He exploited two powerful properties of human language in an iterative bootstrapping setup to avoid the need for manually tagged training data (adapted from Yarowsky 1995):



 One sense per discourse: The sense of a target word is highly consistent within any given document or discourse.
 One sense per collocation: Nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order, and syntactic relationship.

A. Knowledge Sources in WSD
A variety of information, including syntactic (part-of-speech, grammatical structure), semantic (selectional restriction), and pragmatic (topic) information, as well as dictionary (definitions) and corpus (collocation) specific information, can be utilized as a knowledge source in WSD. Here is a list of some of the information sources deemed useful in disambiguation.

 Context of a word
The context of a word can be regarded as the words surrounding the ambiguous word. A word can be disambiguated only in its context. The context is therefore useful in determining the meaning of a word in a particular usage.

 Frequency of a sense
This information is generally used in statistical approaches to measure the likelihood of each possible sense. Usually, these statistics are gathered over some sense-tagged corpus.

 Part-of-speech
Part-of-speech information can reduce the number of possible senses a word can have. For example, in WordNet 2.0, bitter has 3 senses as a noun, 7 senses as an adjective, and one sense as a verb. The use of bitter as a verb does not lead to ambiguity.

 Collocations
These may provide useful information about the sense of a word. For instance, the noun match has 9 senses listed in WordNet, but only one of these applies to 'football match'.

 Selectional preferences
Semantic restrictions that predicates place on their arguments can be used for disambiguation. For instance, eat in the 'have a meal' sense prefers humans as subjects. This knowledge is similar to the argument-head relation, but selectional preferences are given in terms of semantic classes instead of plain words.

 Domain
In a particular domain, only one sense of a word is likely to be used. Thus, information about the domain furnishes useful information for disambiguation. For example, in the domain of sports, the cricket-bat sense of bat is preferred.

Besides these, the thematic role of a word (subject or object), sentence structure, semantic word properties, and pragmatic information may also be utilized in sense disambiguation. All this information can be used together with general knowledge about the situation to rule out impossible readings.

B. Applications of WSD
Word sense disambiguation (WSD) is only an intermediate task in NLP, like POS tagging or parsing. Accurate WSD is important for many applications, e.g., machine translation and information retrieval.

One of the first applications of WSD was machine translation, for which disambiguating the sense of a source language word is crucial for accurately selecting its translation equivalent in the target language. The Hindi word 'phal', for example, can have either the sense of the English word fruit or the sense of result. In order to correctly translate a text containing 'phal', we need to know which sense is intended.

C. WSD Evaluation
Evaluation is important in all NLP tasks. It has always been a problem in disambiguation research, as the only way to judge the performance of a disambiguator is to manually check its output. Manual checking is time consuming, and because of this most disambiguators have been evaluated only on a small number of words. The SENSEVAL initiatives have simplified the evaluation task. The basic metrics used for evaluating word sense disambiguation algorithms are precision and recall. Precision measures the fraction of correctly tagged instances among those the system tagged, while recall measures the fraction of correctly tagged instances in the total set. This requires access to an annotated corpus. Two such corpora are now available: the SEMCOR corpus (Landes et al. 1998) and the SENSEVAL corpus (Kilgarriff and Rosenzweig 2000). These metrics fail to give any credit to an algorithm that makes only broad distinctions between senses, as they consider a sense match to be exact. Some metrics have been proposed to give partial credit to instances where a broader sense is selected.

VI. CONCLUSION

Semantic analysis is concerned with the meaning representation of linguistic inputs. A meaning representation bridges the gap between linguistic and commonsense knowledge. A meaning representation language must be verifiable and unambiguous. It should support the use of variables and inferencing, and must be expressive enough to handle the wide variety of content found in natural language. Syntax-driven semantic analysis uses the syntactic constituents of a sentence to build its meaning representation. Semantic grammar provides an alternative way of creating meaning representations. Word sense disambiguation is concerned with identifying the correct sense of a word. The knowledge sources used by word sense disambiguation algorithms include the context of the word, sense frequency, selectional preferences, collocations, and domain.
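The precision and recall metrics described in the WSD evaluation section can be sketched as follows, using the common convention that precision is computed over the instances the system attempted and recall over all instances; the sense labels and instance ids here are invented for illustration:

```python
# Sketch of precision/recall for a WSD system that may leave some
# instances untagged.  gold and predicted map instance id -> sense;
# instances the system skipped are simply absent from predicted.
def precision_recall(gold, predicted):
    attempted = [i for i in gold if i in predicted]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {1: "pen", 2: "graft", 3: "pen", 4: "pen"}
predicted = {1: "pen", 2: "pen", 3: "pen"}   # instance 4 left untagged
p, r = precision_recall(gold, predicted)
print(round(p, 3), round(r, 3))   # 0.667 0.5
```

When a system tags every instance, the two metrics coincide; the gap between them reflects coverage.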



