
Natural Language Processing [BAD613B]

NATURAL LANGUAGE PROCESSING


Course Code: BAD613B Semester: VI
Teaching Hours/Week (L: T:P: S): 3:0:0:0 CIE Marks: 50
Total Hours of Pedagogy: 40 SEE Marks: 50
Credits: 03 Total Marks: 100
Examination type (SEE): Theory Exam Hours: 03

Course objectives:
• Learn the importance of natural language modelling.
• Understand the applications of natural language processing.
• Study spelling, error detection and correction methods and parsing techniques in NLP.
• Illustrate the information retrieval models in natural language processing.

Module-1
Introduction: What is Natural Language Processing? Origins of NLP, Language and
Knowledge, The Challenges of NLP, Language and Grammar, Processing Indian Languages,
NLP Applications.
Language Modeling: Statistical Language Model - N-gram model (unigram, bigram),
Paninian Framework, Karaka theory.
Textbook 1: Ch. 1, Ch. 2.
Module-2

Word Level Analysis: Regular Expressions, Finite-State Automata, Morphological Parsing,


Spelling Error Detection and Correction, Words and Word Classes, Part-of Speech Tagging.

Syntactic Analysis: Context-Free Grammar, Constituency, Top-down and Bottom-up Parsing,


CYK Parsing.
Textbook 1: Ch. 3, Ch. 4.
Module-3
Naive Bayes, Text Classification and Sentiment: Naive Bayes Classifiers, Training the
Naive Bayes Classifier, Worked Example, Optimizing for Sentiment Analysis, Naive Bayes
for Other Text Classification Tasks, Naive Bayes as a Language Model.
Textbook 2: Ch. 4.
Module-4
Information Retrieval: Design Features of Information Retrieval Systems, Information
Retrieval Models - Classical, Non-classical, Alternative Models of Information Retrieval -
Cluster model, Fuzzy model, LSTM model, Major Issues in Information Retrieval.


Lexical Resources: WordNet, FrameNet, Stemmers, Parts-of-Speech Tagger, Research


Corpora.
Textbook 1: Ch. 9, Ch. 12.
Module-5
Machine Translation: Language Divergences and Typology, Machine Translation using
Encoder-Decoder, Details of the Encoder-Decoder Model, Translating in Low-Resource
Situations, MT Evaluation, Bias and Ethical Issues.
Textbook 2: Ch. 13.

Course outcome (Course Skill Set)


At the end of the course, the student will be able to:
1. Apply the fundamental concept of NLP, grammar-based language model and statistical-
based language model.
2. Explain morphological analysis and different parsing approaches.
3. Develop the Naïve Bayes classifier and sentiment analysis for Natural language problems
and text classifications.
4. Apply the concepts of information retrieval, lexical semantics, lexical dictionaries.
5. Identify the Machine Translation applications of NLP using Encode and Decoder.
Suggested Learning Resources:
Text Books:
1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information
Retrieval”, Oxford University Press.
2. Daniel Jurafsky, James H. Martin, “Speech and Language Processing, An Introduction
to Natural Language Processing, Computational Linguistics, and Speech Recognition”,
Pearson Education, 2023.
Reference Books:
1. Akshay Kulkarni, Adarsha Shivananda, “Natural Language Processing Recipes -
Unlocking Text Data with Machine Learning and Deep Learning using Python”, Apress,
2019.
2. T V Geetha, “Understanding Natural Language Processing – Machine Learning and Deep
Learning Perspectives”, Pearson, 2024.
3. Gerald J. Kowalski and Mark. T. Maybury, “Information Storage and Retrieval systems”,
Kluwer Academic Publishers.
Web links and Video Lectures (e-Resources):
• https://www.youtube.com/watch?v=M7SWr5xObkA
• https://youtu.be/02QWRAhGc7g
• https://www.youtube.com/watch?v=CMrHM8a3hqw
• https://onlinecourses.nptel.ac.in/noc23_cs45/preview
• https://archive.nptel.ac.in/courses/106/106/106106211/


Module-1

Introduction & Language Modelling


• Introduction: What is Natural Language Processing? Origins of NLP, Language and
Knowledge, The Challenges of NLP, Language and Grammar, Processing Indian
Languages, NLP Applications.
• Language Modelling: Statistical Language Model - N-gram model (unigram, bigram),
Paninian Framework, Karaka theory.

Textbook 1: Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information
Retrieval”, Oxford University Press. Ch. 1, Ch. 2.

1. INTRODUCTION

1.1 What is Natural Language Processing (NLP)


Language is the primary means of communication used by humans and the tool with which we express the greater part of our ideas and emotions. Language shapes thought, has a structure, and carries meaning. To express a thought, we rely on the content of the language to represent it in real time.

NLP is concerned with the development of computational models of aspects of human language processing. There are two main reasons for doing this:

1. To develop automated tools for language processing.


2. To gain a better understanding of human communication.

Building computational models with human language-processing abilities requires a


knowledge of how humans acquire, store, and process language.

Historically, there have been two major approaches to NLP:

1. Rationalist approach
2. Empiricist approach

Rationalist approach: The earlier approach; it assumes the existence of some language faculty in the human brain. Supporters of this approach argue that it is not possible to learn something as complex as natural language from limited sensory inputs.

Empiricist approach: Empiricists do not believe in the existence of a language faculty. They believe in the existence of some general organizing principles such as pattern recognition, generalization, and association.


Learning of detailed structures takes place through the application of these principles on sensory
inputs available to the child.

1.2 Origins of NLP


NLP, which includes speech processing and is sometimes mistakenly termed natural language understanding, originated from machine translation research. Natural language processing includes both understanding (interpretation) and generation (production). Here, we are concerned with text processing only - the area of computational linguistics and its applications.

Computational linguistics: is similar to theoretical- and psycho-linguistics, but uses different tools.
While theoretical linguistics is more about the structural rules of language, psycho-linguistics focuses on
how language is used and processed in the mind.
Theoretical linguistics explores the abstract rules and structures that govern language. It investigates
universal grammar, syntax, semantics, phonology, and morphology. Linguists create models to explain
how languages are structured and how meaning is encoded. Eg. Most languages have constructs like noun
and verb phrases. Theoretical linguists identify rules that describe and restrict the structure of languages
(grammar).
Psycho-linguistics focuses on the psychological and cognitive processes involved in language use. It
examines how individuals acquire, process, and produce language. Researchers study language
development in children and how the brain processes language in real-time. Eg. Studying how children
acquire language, such as learning to form questions ("What’s that?").

Computational Linguistics Models:


Computational linguistics is concerned with the study of language using computational models of
linguistic phenomena. It deals with the application of linguistic theories and computational techniques
for NLP. In computational linguistics, representing a language is a major problem; Most knowledge
representations tackle only a small part of knowledge. Representing the whole body of knowledge is
almost impossible.
Computational models may be broadly classified under knowledge-driven and data-driven categories.
Knowledge-driven systems rely on explicitly coded linguistic knowledge, often expressed as a set of
handcrafted grammar rules. Acquiring and encoding such knowledge is difficult and is the main
bottleneck in the development of such systems.
Data-driven approaches presume the existence of a large amount of data and usually employ some
machine learning technique to learn syntactic patterns. Performance of these systems is dependent on the
quantity of the data and usually adaptive to noisy data.
Main objective of the models is to achieve a balance between semantic (knowledge-driven) and
data-driven approaches on one hand, and between theory and practice on the other.


With the unprecedented amount of information now available on the web, NLP has become one
of the leading techniques for processing and retrieving information.
Information retrieval includes a number of information processing applications such as information
extraction, text summarization, question answering, and so forth. It includes multiple modes of
information, including speech, images, and text.

1.3 Language & Knowledge


Language is the medium of expression in which knowledge is deciphered. We are here considering
the text form of the language and the content of it as knowledge.
Language, being a medium of expression, is the outer form of the content it expresses. The same
content can be expressed in different languages.
Hence, to process a language means to process the content of it. As computers are not able to
understand natural language, methods are developed to map its content in a formal language.
The language and speech community considers a language as a set of sounds that, through
combinations, conveys meaning to a listener. However, we are concerned with representing and
processing text only. Language (text) processing has different levels, each involving different types of
knowledge.
1.3.1 Lexical analysis
• Analysis of words.
• Word-level processing requires morphological knowledge, i.e., knowledge about the
structure and formation of words from basic units (morphemes).
• The rules for forming words from morphemes are language specific.
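
A toy sketch can make the idea of forming words from morphemes concrete. The affix lists and the helper below are illustrative only (they are not a real morphological lexicon of English), and real parsers also apply spelling rules; a minimal sketch in Python:

# A toy morphological parser: strips at most one known prefix and one known suffix.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ed", "s"]

def morph_parse(word):
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return [m for m in (prefix, stem, suffix) if m]

print(morph_parse("unhappiness"))  # ['un', 'happi', 'ness']  (a spelling rule would restore 'happy')
print(morph_parse("played"))       # ['play', 'ed']

Because such rules are language specific, the affix lists would have to be replaced for every language being processed.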

1.3.2 Syntactic analysis


• Considers a sequence of words as a unit, usually a sentence, and finds its structure.
• Decomposes a sentence into its constituents (or words) and identifies how they relate to each
other.
• It captures grammaticality or non-grammaticality of sentences by looking at constraints like
word order, number, and case agreement.
• This level of processing requires syntactic knowledge (How words are combined to form
larger units such as phrases and sentences)
• For example:
o 'I went to the market' is a valid sentence whereas 'went the I market to' is not.
o 'She is going to the market' is valid, but 'She are going to the market' is not.


1.3.3 Semantic analysis


• It is associated with the meaning of the language.
• Semantic analysis is concerned with creating meaningful representation of linguistic inputs.
• Eg. 'Colorless green ideas sleep furiously' - syntactically correct, but semantically anomalous.
• A word can have a number of possible meanings associated with it. But in a given context,
only one of these meanings participates.


• Finding out the correct meaning of a particular use of word is necessary to find meaning of larger
units.
• Eg. Kabir and Ayan are married.
Kabir and Suha are married.
• Syntactic structure and compositional semantics fail to explain these interpretations.
• This means that semantic analysis requires pragmatic knowledge besides semantic and syntactic
knowledge.
• Pragmatics helps us understand how meaning is influenced by context, social factors, and
speaker intentions.

1.3.4 Discourse Analysis


• Attempts to interpret the structure and meaning of even larger units, e.g., at the paragraph and
document level, in terms of words, phrases, clusters, and sentences.
• It requires the resolution of anaphoric references and identification of discourse structure.

Anaphoric Reference
• Pragmatic knowledge may be needed for resolving anaphoric references.
Example: The district administration refused to give the trade union
permission for the meeting because they feared violence. (a)


The district administration refused to give the trade union


permission for the meeting because they opposed the government. (b)
• For example, in the above sentences, resolving the anaphoric reference 'they' requires pragmatic
knowledge.

1.3.5 Pragmatic analysis


• The highest level of processing, deals with the purposeful use of sentences in situations.
• It requires knowledge of the world, i.e., knowledge that extends beyond the contents of the text.

1.4 The Challenges of NLP


• Natural languages are highly ambiguous and vague, so achieving a precise representation of content can be difficult.
• The inability to capture all the required knowledge.
• Identifying its semantics.
• A language keeps on evolving. New words are added continually, and existing words are introduced in new contexts (e.g., 9/11 referring to the terrorist attack on the WTC).
Solution: The only way machines can learn such usage is by considering context; the context of a word is defined by its co-occurring words.
• The frequency of a word being used in a particular sense also affects its meaning.
• Idioms, metaphor, and ellipses add more complexity to identify the meaning of the written text.
o Example: “The old man finally kicked the bucket” → "kicked the bucket" is a well-known
Idiom, meaning is to "to die."
o "Time is a thief." → Metaphor suggests “time robs you of valuable moments or
experiences in life”.
o "I’m going to the store, and you’re going to the party, right?"
"Yes, I am…"
Ellipses refer to the omission of words or phrases in a sentence. (represented by "…")
• The ambiguity of natural languages is another difficulty (explicit as well as implicit sources of
knowledge).
o Word Ambiguity: Example: 'Taj' - a monument, a brand of tea, or a hotel.
▪ “Can” is ambiguous in its part-of-speech, resolved by 'part-of-speech tagging' algorithms (see the sketch after this list).
▪ “Bank” is ambiguous in its meaning, resolved by 'word sense disambiguation' algorithms.
o Structural ambiguity - A sentence may be ambiguous
▪ 'Stolen rifle found by tree.'
▪ Verb sub-categorization may help to resolve
▪ Probabilistic parsing - statistical models to predict the most likely syntactic
structure.
• A number of grammars have been proposed to describe the structure of sentences.
o It is almost impossible for grammar to capture the structure of all and only meaningful
text.
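
The part-of-speech ambiguity of a word such as 'can' is resolved from its context by a tagger. A minimal sketch using NLTK's default tagger (an assumption: the nltk package and its 'punkt' and 'averaged_perceptron_tagger' resources are installed):

import nltk
# One-time setup (assumes network access):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("I can open the can")
print(nltk.pos_tag(tokens))
# Approximate output: [('I', 'PRP'), ('can', 'MD'), ('open', 'VB'), ('the', 'DT'), ('can', 'NN')]
# The same word 'can' is tagged as a modal (MD) in one position and a noun (NN) in the other.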


1.5 Language and Grammar


• Language Grammar: Grammar defines language and consists of rules that allow parsing and
generation of sentences, serving as a foundation for natural language processing.
• Syntax vs. Semantics: Although syntax and semantics are closely related, a separation is made
in processing due to the complexity of world knowledge influencing both language structure and
meaning.
• Challenges in Language Specification: Natural languages constantly evolve, and the numerous
exceptions make language specification challenging for computers.
• Different Grammar Frameworks: Various grammar frameworks have been developed,
including transformational grammar, lexical functional grammar, and dependency grammar, each
focusing on different aspects of language such as derivation or relationships.
• Chomsky’s Contribution: Noam Chomsky’s generative grammar framework, which uses rules
to specify grammatically correct sentences, has been fundamental in the development of formal
grammar hierarchies.
Chomsky argued that phrase structure grammars are insufficient for natural language and proposed
transformational grammar in Syntactic Structures (1957). He suggested that each sentence has two levels:
a deep structure and a surface structure (as shown in Fig 1), with transformations mapping one to the
other.

Fig 1. Surface and Deep Structures of sentence


• Chomsky argued that an utterance is the surface representation of a 'deeper structure' representing
its meaning.
• The deep structure can be transformed in a number of ways to yield many different surface-level
representations.
• Sentences with different surface-level representations having the same meaning, share a common
deep-level representation.
Pooja plays veena.
Veena is played by Pooja.
Both sentences have the same meaning, despite having different surface structures (roles of subject and
object are inverted).


Transformational grammar has three components:


1. Phrase structure grammar: Defines the basic syntactic structure of sentences.
2. Transformational rules: Describe how deep structures can be transformed into different surface
structures.
3. Morphophonemic rules: Govern how the structure of a sentence (its syntax) influences the form of the words in terms of sound and pronunciation (phonology).

Phrase structure grammar consists of rules that generate natural language sentences and assign a
structural description to them. As an example, consider the following set of rules:

Eg: “The police will catch the snatcher.”

S → NP + VP
VP → V + NP
NP → Det + Noun
V → Aux + Verb
Det → the, a, an, ...
Verb → catch, write, eat, ...
Noun → police, snatcher, ...
Aux → will, is, can, ...
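
These rules can be tried out directly. A minimal sketch using NLTK's chart parser (assuming the nltk package is installed); the grammar is the rule set above restated in NLTK's CFG notation:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> Det Noun
V -> Aux Verb
Det -> 'the'
Aux -> 'will'
Verb -> 'catch'
Noun -> 'police' | 'snatcher'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the police will catch the snatcher".split()):
    print(tree)
# (S (NP (Det the) (Noun police))
#    (VP (V (Aux will) (Verb catch)) (NP (Det the) (Noun snatcher))))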

Transformation rules transform one phrase-marker (underlying) into another phrase-marker (derived). These rules are applied on the terminal string generated by the phrase structure rules. They transform one surface representation into another, e.g., an active sentence into a passive one.
Consider the active sentence: “The police will catch the snatcher.”

Eg. [NP1 - Aux - V - NP2] → [NP2 - Aux + be + en - V - by + NP1]

The application of phrase structure rules will assign the structure shown in Fig 2 (a)

(a) Phrase structure (b) Passive Transformation

The passive transformation rules will convert the sentence into


The + snatcher + will + be + en + catch + by + the + police


Morphophonemic Rule: Another transformational rule will then reorder 'en + catch' to 'catch + en' and
subsequently one of the morphophonemic rules will convert 'catch + en' to 'caught'.

Note: Long-distance dependency refers to syntactic phenomena where a verb and its subject or object can be arbitrarily far apart. Wh-movement is a specific case of these types of dependencies.

E.g.

"I wonder who John gave the book to" involves a long-distance dependency between the verb "wonder"
and the object "who". Even though "who" is not directly adjacent to the verb, the syntactic relationship
between them is still clear.
The problem in the specification of appropriate phrase structure rules occurs because these phenomena
cannot be localized at the surface structure level.

1.6 Processing Indian Languages


There are a number of differences between Indian languages and English:
• Unlike English, Indic scripts have a non-linear structure.
• Unlike English, Indian languages have SOV (Subject-Object-Verb) as the default sentence
structure.
• Indian languages have a free word order, i.e., words can be moved freely within a sentence
without changing the meaning of the sentence.
• Spelling standardization is more subtle in Hindi than in English.
• Indian languages have a relatively rich set of morphological variants.
• Indian languages make extensive and productive use of complex predicates (CPs).
• Indian languages use post-position (Karakas) case markers instead of prepositions.
• Indian languages use verb complexes consisting of sequences of verbs,
o e.g., गा रहा है (ga raha hai-singing) and खेल रही है (khel rahi hai-playing).
o The auxiliary verbs in this sequence provide information about tense, aspect, modality,
etc

Paninian grammar provides a framework for Indian language models. These can be used for
computation of Indian languages. The grammar focuses on extraction of relations from a
sentence.


1.7 NLP Applications


1.7.1 Machine Translation
This refers to automatic translation of text from one human language to another. In order to carry out
this translation, it is necessary to have an understanding of words and phrases, grammars of the two
languages involved, semantics of the languages, and word knowledge.

1.7.2 Speech Recognition


This is the process of mapping acoustic speech signals to a set of words. The difficulties arise due to
wide variations in the pronunciation of words, homophones (e.g., dear and deer), and acoustic ambiguities (e.g., 'in the rest' and 'interest').

1.7.3 Speech Synthesis


Speech synthesis refers to automatic production of speech (utterance of natural language sentences). Such
systems can read out your mails on telephone, or even read out a storybook for you.

1.7.4 Information Retrieval


This focuses on identifying relevant documents for a user's query using NLP techniques in
information retrieval. Methods like indexing, word sense disambiguation, query modification, and
knowledge bases improve IR performance, including query expansion. Lexical resources like WordNet,
LDOCE, and Roget's Thesaurus enhance these systems. These tools help refine search results and
improve accuracy.

1.7.5 Information Extraction


An information extraction system captures and outputs factual information contained within a
document. The query is specified as a pre-defined template, and the system identifies the subset of information within a document that fits the template.

1.7.6 Question Answering


Given a question and a set of documents, a question answering system attempts to find the precise
answer, or at least the precise portion of text in which the answer appears. A question answering system
requires more NLP than an information retrieval system or an information extraction system. It requires
not only precise analysis of questions and portions of texts but also semantic as well as background
knowledge to answer certain type of questions.

1.7.7 Text Summarization


This deals with the creation of summaries of documents and involves syntactic, semantic, and
discourse level processing of text.


1.8 Some Successful Early NLP Systems

1.8.1 ELIZA (Weizenbaum 1966)


ELIZA is one of the earliest natural language understanding programs. It uses syntactic patterns to
mimic human conversation with the user. Here is a sample conversation.

Eliza: Hello. I am ELIZA. How may I help you?


User: I am feeling a little bit sleepy.
Eliza: How long have you been feeling a little bit sleepy?
User: For almost half an hour.
Eliza: Please go on.
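
ELIZA-style behaviour can be approximated with a handful of regular-expression rules that echo part of the user's input back. The patterns and responses below are illustrative only (not Weizenbaum's original script); a minimal sketch in Python:

import re

# A tiny, illustrative rule set: (pattern, response template).
RULES = [
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
]

def reply(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Echo the captured phrase back, mimicking ELIZA's reflection trick.
            return template.format(match.group(1).rstrip(".!?"))
    return "Please go on."  # default response when no rule matches

print(reply("I am feeling a little bit sleepy."))  # How long have you been feeling a little bit sleepy?
print(reply("For almost half an hour."))           # Please go on.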
1.8.2 SysTran (System Translation)

The first SysTran machine translation system was developed in 1969 for Russian-English translation.
SysTran also provided the first on-line machine translation service, called Babel Fish, which was used by the AltaVista search engine for handling translation requests from users.

1.8.3 TAUM METEO

This is a natural language generation system used in Canada to generate weather reports. It accepts
daily weather data and generates weather reports in English and French.

1.8.4 SHRDLU (Winogard 1972)

This is a natural language understanding system that simulates actions of a robot in a block world
domain. It uses syntactic parsing and semantic reasoning to understand instructions. The user can ask the
robot to manipulate the blocks, to tell the blocks configurations, and to explain its reasoning.

1.8.5 LUNAR (Woods 1977)

This was an early question answering system that answered questions about moon rocks.

1.9 Information Retrieval

The availability of vast amounts of electronic text has made it challenging to find relevant
information. Information retrieval (IR) systems aim to address this issue by providing efficient access to
relevant content. Unlike 'entropy' in communication theory, which measures uncertainty, information
here refers to the content or subject matter of text, not digital communication or data transmission. Words
serve as carriers of information, and text is seen as the message encoded in natural language.

In IR, "retrieval" refers to accessing information from computer-based representations, requiring


processing and storage. Only relevant information, based on a user's query, is retrieved. IR involves


organizing, storing, retrieving, and evaluating information that matches a query, working with
unstructured data. Retrieval is based on content, not structure, and systems typically return a ranked list
of relevant documents.

IR has been integrated into various systems, including database management systems, bibliographic
retrieval systems, question answering systems, and search engines. Approaches for accessing large text
collections fall into two categories: one builds topic hierarchies (e.g., Yahoo), which requires manual classification of new documents and is therefore not cost-effective; the other ranks documents by relevance, offering more scalability and efficiency for large collections.

Major issues in designing and evaluating Information Retrieval (IR) systems include selecting
appropriate document representations. Current models often use keyword-based representation, which
suffers from problems like polysemy, homonymy, and synonymy, as well as ignoring semantic and
contextual information. Additionally, vague or inaccurate user queries lead to poor retrieval performance,
which can be addressed through query modification or relevance feedback.

Matching query representation to document representation is another challenge, requiring effective


similarity measures to rank results. Evaluating IR system performance typically relies on recall and
precision, though relevance itself is subjective and difficult to measure accurately. Relevance
frameworks, such as the situational framework, attempt to address this by considering context and time.
Moreover, varying user needs and document collection sizes further complicate retrieval, requiring
specialized methods for different scopes.
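
Since recall and precision are the standard evaluation measures mentioned above, a small worked sketch may help; the document IDs and relevance judgements below are made up for illustration:

# Precision = |relevant ∩ retrieved| / |retrieved|
# Recall    = |relevant ∩ retrieved| / |relevant|
relevant = {"d1", "d3", "d5", "d7"}     # documents judged relevant to the query
retrieved = {"d1", "d2", "d3", "d9"}    # documents the IR system returned

hits = relevant & retrieved
precision = len(hits) / len(retrieved)  # 2 / 4 = 0.5
recall = len(hits) / len(relevant)      # 2 / 4 = 0.5
print(precision, recall)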


2. LANGUAGE MODELLING
To create a general model of any language is a difficult task. There are two approaches for language
modelling.

1. To define a grammar that can handle the language.


2. To capture the patterns of the language statistically.

2.1 Introduction
Our purpose is to understand and generate natural languages from a computational viewpoint.

1st approach: Try to understand every word and sentence of it, and then come to a conclusion (has not
succeeded).
2nd approach: To study the grammar of various languages, compare them, and if possible, arrive at
reasonable models that facilitate our understanding of the problem and designing of natural-language
tools.
Language Model: A model is a description of some complex entity or process. Natural language is a
complex entity and in order to process it through a computer-based program, we need to build a
representation (model) of it.
Two categories of language modelling approaches:
Grammar-based language model:

• Uses the grammar of a language to create its model.


• It attempts to represent the syntactic structure of language.
• Hand-coded rules defining the structure and ordering of various constituents appearing in a
linguistic unit.

Eg. A sentence usually consists of a noun phrase and a verb phrase. The grammar-based approach attempts
to utilize this structure and also the relationships between these structures.

Statistical language modelling:

• Creates a language model by training it from a corpus.


• To capture regularities of a language, the training corpus needs to be sufficiently large.
• Statistical language models are fundamental to many NLP applications, including speech recognition, spelling correction, handwriting recognition, and machine translation.
• They are also used in information retrieval, text summarization, and question answering.
• Most popular: n-gram models (a small sketch follows).
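
As a preview of the n-gram model, the sketch below estimates bigram probabilities from a toy corpus using maximum-likelihood counts; the corpus and the <s>/</s> sentence markers are illustrative assumptions:

from collections import Counter

corpus = [
    "<s> the police will catch the snatcher </s>",
    "<s> the police saw the snatcher </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w_prev, w):
    # Maximum-likelihood estimate: P(w | w_prev) = count(w_prev, w) / count(w_prev)
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(p_bigram("the", "police"))    # 2 / 4 = 0.5
print(p_bigram("the", "snatcher"))  # 2 / 4 = 0.5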


2.2 Various Grammar-Based Language Models


• Generative Grammars
• Hierarchical Grammar
• Government and Binding (GB)
• Lexical Functional Grammar (LFG) Model
• Paninian Framework

2.2.1 Generative Grammars


• We can generate sentences in a language if we know a collection of words and rules in that
language (Noam Chomsky).
• Sentences that can be generated as per the rules are grammatical; this view dominated computational linguistics for a long time.
• The approach addressed the syntactic structure of language.
• But language is a relation between sound (or written text) and its meaning.

2.2.2 Hierarchical Grammar


• Chomsky (1956) described classes of grammars in a hierarchical manner, where the top layer contains the grammars represented by its subclasses.
• Hence, Type 0 (or unrestricted) grammar contains Type 1 (context-sensitive) grammar, which in turn contains Type 2 (context-free) grammar, and that again contains Type 3 (regular) grammar.

2.2.3 Government and Binding (GB)


(Eliminated rules of Grammar – since rules were language particular)
Linguists often argue that language structure, especially in resolving structural ambiguity, can be
understood through meaning. However, the transformation between meaning and syntax is not well
understood. Transformational grammars distinguish between surface-level and deep-root-level sentence
structures.

Government and Binding (GB) theories rename these as s-level and d-level, adding phonetic and
logical forms as parallel levels of representation for analysis, as shown in Figure.


• 'Meaning' and 'sound' are represented as logical form (LF) and phonetic form (PF), respectively, in the above figure.
• The GB is concerned with LF, rather than PF.
• The GB imagines that if we define rules for structural units at the deep level, it will be possible
to generate any language with fewer rules.

Let us take an example to explain d- and s- Structures in GB:


Mukesh was killed
i) In Transformational grammar, this can be expressed as:
S → NP AUX VP, as given below

ii) In GB, s-structure & d-structure are as follows:

Surface structure Deep structure


Note:
• The surface structure is the actual form of the sentence as it appears in speech or writing.
• The deep structure represents the underlying syntactic and semantic structure that is abstract and not
directly visible (Represents the core meaning of the sentence). "Someone killed Mukesh" or "A person
killed Mukesh."


Components of GB

• Government and binding (GB) comprise a set of theories that map the structures from d-structure
to s-structure and to logical form (LF).
• A general transformational rule called 'Move 𝛼' is applied at d-structure level as well as at s-
structure level.
• Simplest form GB can be represented as below.

GB consists of 'a series of modules that contain constraints and principles' applied at various
levels of its representations and the transformation rule, Move α.
The GB considers all three levels of representations (d-, s-, and LF) as syntactic, and LF is also
related to meaning or semantic-interpretive mechanisms.
GB applies the same Move 𝛼 transformation to map d-levels to s-levels or s-levels to LF level.
LF level helps in quantifier scoping and also in handling various sentence constructions such as passive
or interrogative constructions.
Example:
Consider the sentence: “Two countries are visited by most travellers.”
Its two possible logical forms are:
LF1: [s Two countries are visited by [NP most travellers]]
LF2: Applying Move 𝛼
[NP Most travellersi ] [s two countries are visited by ei]

• In LF1, the interpretation is that most travellers visit the same two countries (say, India and
China).
• In LF2, when we move [most travellers] outside the scope of the sentence, the interpretation can
be that most travellers visit two countries, which may be different for different travellers.
• One of the important concepts in GB is that of constraints. It is the part of the grammar which
prohibits certain combinations and movements; otherwise Move α can move anything to any
possible position.


• Thus, GB, is basically the formulation of theories or principles which create constraints to
disallow the construction of ill-formed sentences.
The organization of GB is as given below:

X̄ Theory:

• The X̄ theory (pronounced X-bar theory) is one of the central concepts in GB. Instead of defining several phrase structures and the sentence structure with separate sets of rules, X̄ theory defines them both as maximal projections of some head.
• Noun phrase (NP), verb phrase (VP), adjective phrase (AP), and prepositional phrase (PP) are maximal projections of noun (N), verb (V), adjective (A), and preposition (P) respectively, and can be represented as head X of their corresponding phrases (where X = {N, V, A, P}).
• Even the sentence structure can be regarded as the maximal projection of inflection (INFL).
• The GB envisages projections at two levels:
o The projection of the head at the semi-phrasal level, denoted by X̄,
o The maximal projection at the phrasal level, denoted by X̿ (X double-bar).

Figure depicts the general and particular structures with examples


Maximal projection of sentence structure

Sub-categorization: It refers to the process of classifying words or phrases (typically verbs) according
to the types of arguments or complements they can take. It's a form of syntactic categorization that is
important for understanding the structure and meaning of sentences.

For example, different verbs in English can have different sub-categorization frames (also called
argument structures). A verb like "give" might take three arguments (subject, object, and indirect object),
while a verb like "arrive" might only take a subject and no objects.

"He gave her a book." ("gave" requires a subject, an indirect object, and a direct object)

"He arrived." ("arrived" only requires a subject)

In principle, any maximal projection can be the argument of a head, but sub-categorization is used as a
filter to permit various heads to select a certain subset of the range of maximal projections.

Projection Principle:
Three syntactic representations:
1. Constituency Parsing (Tree Structure):
• Sentences are broken into hierarchical phrases or constituents (e.g., noun phrases, verb
phrases), represented as a tree structure.
2. Dependency Parsing (Directed Graph):
• Focuses on the direct relationships between words, where words are connected by directed
edges indicating syntactic dependencies.
3. Semantic Role Labelling (SRL):
• Identifies the semantic roles (e.g., agent, patient) of words in a sentence, focusing on the
meaning behind the syntactic structure.
The projection principle, a basic notion in GB, places a constraint on the three syntactic representations
and their mapping from one to the other.

The principle states that representations at all syntactic levels (i.e., d-level, s-level, and LF level) are
projections from the lexicon (collection or database of words and their associated linguistic information).


Thus, lexical properties of categorical structure (sub-categorization) must be observed at each level.
Suppose 'the object' is not present at d-level, then another NP cannot take this position at s-level.

Example:

• At D-structure, each argument of a verb is assigned a thematic role (e.g., Agent, Theme, Goal,
etc.).
• In a sentence like "John gave Mary the book", the verb "gave" requires three arguments: Agent
(John), Recipient (Mary), and Theme (the book).
• If the object (Theme) is not present at the deep structure, it cannot be filled at the surface structure
(S-structure) by another NP (e.g., a different noun phrase).

Theta Theory (Ɵ-Theory) or The Theory of Thematic Relations

• 'Sub-categorization' only places a restriction on syntactic categories which a head can accept.
• GB puts another restriction on the lexical heads through which it assigns certain roles to its
arguments.
• These roles are pre-assigned and cannot be violated at any syntactical level as per the projection
principle.
• These role assignments are called theta-roles and are related to 'semantic-selection'.

Theta Role and Theta Criterion


There are certain thematic roles from which a head can select. These are called Ɵ-roles and they are
mentioned in the lexicon, say for example the verb 'eat' can take arguments with Ɵ-roles '(Agent, Theme)'.

Agent is a special type of role which can be assigned by a head to outside arguments (external
arguments) whereas other roles are assigned within its domain (internal arguments).

Hence in 'Mukesh ate food',

the verb 'eat' assigns the 'Agent' role to 'Mukesh' (outside VP)

and 'Theme' (or 'patient') role to 'food'.

Theta-Criterion states that 'each argument bears one and only one Ɵ-role, and each Ɵ-role is
assigned to one and only one argument'.

C-command and Government


C-Command: It is a syntactic relation that defines a type of hierarchical relationship between two
constituents (words or phrases) in a sentence. It plays a critical role in the distribution of certain syntactic
phenomena, such as binding, agreement, and pronoun reference.


If any word or phrase (say α or ß) falls within the scope of and is determined by a maximal projection,
we say that it is dominated by the maximal projection.

If there are two structures α and ß related in such a way that 'every maximal projection dominating α dominates ß', we say that α C-commands ß, and this is the necessary and sufficient condition (iff) for C-command.

Government
α governs ß iff: α C-commands ß
α is an X (head, e.g., noun, verb, preposition, adjective, and inflection), and every maximal projection
dominating ß dominates α.
Additional information
C-COMMAND
A c-command is a syntactic relationship in linguistics, particularly in the theory of syntax, where one node (word
or phrase) in a tree structure can "command" or "govern" another node in certain ways. In simpler terms, it's a rule
that helps determine which parts of a sentence can or cannot affect each other syntactically.
Simple Definition:
C-command occurs when one word or phrase in a sentence has a syntactic connection to another word or phrase,
typically by being higher in the syntactic tree (closer to the top).
Example 1:
In the sentence "John saw Mary,"
"John" c-commands "Mary" because "John" is higher up in the tree structure and can potentially affect "Mary"
syntactically.
Example 2:
In the sentence "She thinks that I am smart,"
The pronoun "She" c-commands "I" because "She" is higher in the syntactic tree, governing the phrase where "I"
occurs.
In essence, c-command helps explain which words in a sentence are connected in ways that allow for things like
pronoun interpretation or binding relations (e.g., which noun a pronoun refers to).
GOVERNMENT
-is a special case of C-COMMAND
government refers to the syntactic relationship between a head (typically a verb, noun, or adjective) and its
dependent elements (such as objects or complements) within a sentence. It determines how certain words control
the form or case of other words in a sentence.
On the other hand, c-command is a syntactic relationship between two constituents in a sentence. A constituent A
c-commands another constituent B if the first constituent (A) is higher in the syntactic structure (usually in the tree)
and can potentially govern or affect the second constituent (B), provided no intervening nodes.
To put it together in context:
Government: This is a formal rule determining how certain words govern the case or form of other words in a
sentence (e.g., verbs can govern the object noun in accusative case in languages like Latin or German).
C-command: This is a structural relationship in which one constituent can influence another, typically affecting
operations like binding, scope, and sometimes government.


In short, government often operates within the structures of c-command, but c-command itself is a broader syntactic
relationship that is also relevant for other linguistic phenomena, such as binding theory, where one element can bind
another if it c-commands it.
Here are a few examples of government in syntax, showing how one word governs the form or case of another word in a sentence:
1. Verb Government
In many languages, verbs can govern the case of their objects. Here’s an example in Latin:
Latin: "Vidēre puellam" (to see the girl)
The verb "vidēre" (to see) governs the accusative case of "puellam" (the girl).
In this case, the verb "vidēre" governs the object "puellam" by requiring it to be in the accusative case.
2. Preposition Government
Prepositions can also govern the case of their objects. Here’s an example from German:
German: "Ich gehe in den Park" (I am going to the park)
The preposition "in" governs the accusative case of "den Park" (the park).
The preposition "in" governs the accusative case for the noun "Park" in this sentence.
3. Adjective Government
Adjectives can govern the case, gender, or number of the noun they modify. Here's an example from Russian:
Russian: "Я вижу красивую девочку" (I see a beautiful girl)
The adjective "красивую" (beautiful) governs the accusative case of "девочку" (girl).
In this case, the adjective "красивую" (beautiful) governs the accusative case of "девочку".
4. Noun Government
In some languages, nouns can govern the case of their arguments. In Russian, for example, some nouns govern a
particular case:
Russian: "Я горжусь успехом" (I am proud of the success)
The noun "успехом" (success) governs the instrumental case in this sentence.
Here, the noun "успехом" governs the instrumental case of its argument "успех".
Summary:
Government involves syntactic relationships where a head (verb, preposition, adjective, etc.) dictates or determines
the form (such as case) of its dependent elements.
In these examples, verbs, prepositions, and adjectives have a "governing" influence on the cases of nouns or objects
in the sentence, which is a core part of the syntax in many languages.

Movement, Empty Category, and Co-indexing


Movement & Empty Category:
In GB, Move α is described as 'move anything anywhere', though it provides restrictions for valid
movements.
In GB, the active-to-passive transformation is the result of NP movement, as shown in the earlier sentence ('Mukesh was killed'). Another well-known movement is wh-movement, where the wh-phrase is moved as follows.


[Mukesh INFL eat what]


As discussed in the projection principle, lexical categories must exist at all the three levels. This principle,
when applied to some cases of movement leads to the existence of an abstract entity called empty category.

In GB, there are four types of empty categories:

Two being empty NP positions called wh-trace and NP trace, and the remaining two being pronouns
called small 'pro' and big 'PRO'.

This division is based on two properties: anaphoric (+a or -a) and pronominal (+p or -p).
Wh-trace: -a, -p
NP-trace: +a, -p
small 'pro': -a, +p
big 'PRO': +a, +p

The traces help ensure that the proper binding relationships are maintained between moved elements
(such as how pronouns or reflexives bind to their antecedents, even after movement).
Additional Information:
• +a (Anaphor): A form that must refer back to something mentioned earlier (i.e., it has an
antecedent). For example, "himself" in "John washed himself." The form "himself" is an anaphor
because it refers back to "John."
• -a (Non-Anaphor): A form that does not require an antecedent to complete its meaning. A regular
pronoun like "he" in "He went to the store" is not an anaphor because it doesn't explicitly need to
refer back to something within the same sentence or clause.
• +p (Pronominal): A form that can function as a pronoun, standing in for a noun or noun phrase.
For example, "she" in "She is my friend" is a pronominal because it refers to a specific person
(though not necessarily previously mentioned).
• -p (Non-Pronominal): A word or form that isn't used as a pronoun. It could be a noun or other
word that doesn't serve as a replacement for a noun phrase in a given context. For example, in
"John went to the store," "John" is not pronominal—it is a noun phrase.

Co-indexing
It is the indexing of the subject NP and AGR (agreement) at d-structure which are preserved by Move α
operations at s-structure.

When an NP-movement takes place, a trace of the movement is created by having an indexed empty
category (e) from the position at which the movement began to the corresponding indexed NP.

For defining constraints on movement, the theory identifies two kinds of positions in a sentence. Positions assigned θ-roles are called θ-positions, while others are called θ̄-positions. In a similar way, core grammatical positions (where subject, object, indirect object, etc., are positioned) are called A-positions (argument positions), and the rest are called Ā-positions.


Binding theory:

Binding Theory is a syntactic theory that explains how pronouns and noun phrases are interpreted and
distributed in a sentence. It's concerned with the relationships between pronouns and their antecedents
(myself, herself, himself).

Binding is defined by Sells (1985) as follows:


α binds ß iff
α C-commands ß, and
α and ß are co-indexed
As we noticed for the sentence 'Mukesh was killed':
[ei INFL kill Mukesh]
[Mukeshi was killed (by ei)]
The empty category (ei) and Mukesh (NPi) are bound. This theory gives a relationship between NPs (including pronouns and reflexive pronouns). Now, binding theory can be given as follows:
(a) An anaphor (+a) is bound in its governing category.
(b) A pronominal (+p) is free in its governing category.
(c) An R-expression (-a, -p) is free.
Example
A: Mukeshi knows himselfi
B: Mukeshi believes that Amrita knows himi
C: Mukesh believes that Amritaj knows Nupurk (Referring expression)

Similar rules apply on empty categories also:


NP-trace: +a, -p: Mukeshi was killed ei
wh-trace: -a, -p: Whoi does he like ei
Empty Category Principle (ECP):

The 'proper government' is defined as:


α properly governs ß iff:
α governs ß and α is lexical (i.e., N, V, A, or P), or
α locally A-binds ß.
The ECP says 'A trace must be properly governed'.
This principle justifies the creation of empty categories during NP- trace and wh-trace and also explains
the subject/object asymmetries to some extent. As in the following sentences:


(a) Whati do you think that Mukesh ate ei?


(b) Whati do you think Mukesh ate ei?
Mukesh is subject, ate is a verb and what is object that moves to the front. Mukesh remains in its original
position.

Bounding and Control Theory:

Note: There are many other types of constraints on Move α and not possible to explain all of them.

In English, long-distance movement of a complement clause can be explained by bounding theory if NP and S are taken to be bounding nodes. The theory says that an application of Move α may not cross more than one bounding node. The theory of control involves syntax, semantics, and pragmatics.

Case Theory and Case Filter:

In GB, case theory deals with the distribution of NPs and mentions that each NP must be assigned a case.
In English, we have the nominative, objective, genitive, etc., cases, which are assigned to NPs at particular
positions. Indian languages are rich in case-markers, which are carried even during movements.

Example:
He is running ("He" is the subject of the sentence, performing the action. - nominative)
She sees him. ("Him" is the object of the verb "sees." - Objective)
The man's book. (The genitive case expresses possession or a relationship between nouns,)

Case filter: An NP that has phonetic content, or that is an argument, is ungrammatical if it is not case-marked. Phonetic content here refers to some physical realization, as opposed to empty categories.

Thus, case filters restrict the movement of NP at a position which has no case assignment. It works in a
manner similar to that of the θ-criterion.

Summary of GB:

In short, GB presents a model of the language which has three levels of syntactic representation.

• It assumes phrase structures to be the maximal projection of some lexical head and in a similar
fashion, explains the structure of a sentence or a clause.
• It assigns various types of roles to these structures and allows them a broad kind of movement
called Move α.
• It then defines various types of constraints which restrict certain movements and justifies others.


2.2.4 Lexical Functional Grammar (LFG) Model


Watch this video: https://www.youtube.com/watch?v=EoCLhS_0cmE

• LFG represents sentences at two syntactic levels - constituent structure (c-structure) and
functional structure (f-structure).
• Kaplan proposed a concrete form for the register names and values which became the functional
structures in LFG.
• Bresnan was more concerned with the problem of explaining some linguistic issues, such as
active/passive and dative alternations, in transformational approach. She proposed that such
issues can be dealt with by using lexical redundancy rules.
• The unification of these two diverse approaches (with a common concern) led to the development
of the LFG theory.

The term 'lexical functional' is composed of two terms:

• The 'functional' part is derived from 'grammatical functions', such as subject and object, or roles
played by various arguments in a sentence.
• The 'lexical' part is derived from the fact that the lexical rules can be formulated to help define
the given structure of a sentence and some of the long-distance dependencies, which is difficult
in transformational grammars.

C-structure and f-structure in LFG


The c-structure is derived from the usual phrase and sentence structure syntax, as in a CFG.

The grammatical-functional roles cannot be derived directly from phrase and sentence structure, so functional specifications are annotated on the nodes of the c-structure; when applied to a sentence, these result in the f-structure.

Example: She saw stars in the sky

[
SUBJ: [ PERS: 3, NUM: SG ], // "She" is the subject, 3rd person, singular
PRED: "see", // The verb "saw" represents the predicate "see"
OBJ: [ NUM: PL, PRED: "star" ], // "stars" is the object, plural, and the predicate is "star"
LOC: [ PRED: "sky", DEF: + ] // "sky" is the location, with a definite determiner ("the")
]

f-structure

c- structure


Example:
She saw stars in the sky
CFG rules to handle this sentence are:
S → NP VP
VP → V {NP} PP* {NP} {S'}
PP → P NP
NP → Det N {PP}
S' → Comp S
Where: S: Sentence V: Verb P: Preposition N: Noun

S': clause Comp: complement { }: optional

* : Phrase can appear any number of times including blank

When annotated with functional specifications, the rules become

• Here, ↑ (up arrow) refers to the f-structure of the mother node, i.e., the node on the left-hand side of the rule.
• The ↓ (down arrow) symbol refers to the f-structure of the node under which it is written.
• Hence, in Rule 1, (↑ SUBJ) = ↓ indicates that the f-structure of the first NP goes to the f-structure of the subject of the sentence, while ↑ = ↓ indicates that the f-structure of the VP node goes directly to the f-structure of the sentence.


Consistency: In a given f-structure, a particular attribute may have at most one value. Hence, while unifying two f-structures, if the attribute Num has the value SG in one and PL in the other, the unification will be rejected.

Completeness: When an f-structure and all its subsidiary f-structures (as the value of any attribute of an f-structure can again contain other f-structures) contain all the functions that their predicates govern, then and only then is the f-structure complete. For example, since the predicate 'see <(↑Subj) (↑Obj)>' contains an object as its governable function, a sentence like 'She saw' will be incomplete.

Coherence: Coherence maps the completeness property in the reverse direction. It requires that all governable functions of an f-structure, and of all its subsidiary f-structures, must be governed by their respective predicates. Hence, in the f-structure of a sentence, an object cannot be taken if its verb does not allow that object. Thus, it will reject the sentence 'I laughed a book.'
Example:

Let us see first the lexical entries of various words in the sentence:

She saw stars

Lexical entries

c – structure

Finally, the f-structure is the set of attribute-value pairs, represented as

It is interesting to note that the final f-structure is obtained through the unification of various f-structures for subject, object, verb, complement, etc. This unification is based on the functional specifications of the verb, which predicts the overall sentence structure.
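
The unification of f-structures, together with the consistency check, can be sketched in a few lines of code. This is a simplified illustration (not the textbook's algorithm): f-structures are nested dictionaries, and a clash such as Num = SG against Num = PL violates consistency.

def unify(f1, f2):
    # Unify two f-structures (nested dicts); raise an error on a consistency violation.
    result = dict(f1)
    for attr, val in f2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            result[attr] = unify(result[attr], val)   # recurse into sub-f-structures
        elif result[attr] != val:
            raise ValueError("Inconsistent values for " + attr)
    return result

# Illustrative fragments: what the subject NP contributes and what the verb requires.
subj_np = {"SUBJ": {"PRED": "she", "PERS": 3, "NUM": "SG"}}
verb    = {"PRED": "see", "SUBJ": {"PERS": 3, "NUM": "SG"}, "OBJ": {"PRED": "star", "NUM": "PL"}}

print(unify(subj_np, verb))   # succeeds: the SUBJ features are consistent
# unify(verb, {"SUBJ": {"NUM": "PL"}}) would raise ValueError (SG vs PL), i.e., fail the consistency condition.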


Lexical Rules in LFG


Different theories have different kinds of lexical rules and constraints for handling various sentence-
constructs (active, passive, dative, causative, etc.).

In LFG, for the passive, the verb is converted to its participial form and the sub-categorization is changed directly by a lexical rule.

Consider the following example:


Oblique agent (Oblag) phrase:
Active: Tara ate the food. Pred = 'eat <(↑Subj) (↑Obj)>'
Passive: The food was eaten by Tara. Pred = 'eat <(↑Oblag) (↑Subj)>'
Here, Oblag represents the oblique agent phrase.
Similar rules can be applied in active and dative constructs for the verbs that accept two objects.
Oblique goal (Oblgo) phrase:
Active: Tara gave a pen to Monika. Pred = 'give <(↑Subj) (↑Obj2) (↑Obj)>'
Dative: Tara gave Monika a pen. Pred = 'give <(↑Subj) (↑Obj) (↑Oblgo)>'
Here, Oblgo stands for the oblique goal phrase.
Similar rules are also applicable to the process of causativization. This can be seen in Hindi, where the
verb form is changed as follows:

Example

Active: तारा हँसी

Taaraa hansii
Tara laughed

Causative: मोनिका िे तारा को हँ साया

Monika ne Tara ko hansaayaa Here, a new predicate is formed which


Monika Subj Tara Obj laugh-cause-past causes the action and requires a new
subject, while the old subject becomes the
Monika made Tara to laugh. object of the new predicate and the old verb
becomes the X-complement (complement
Active: Pred='Laugh < Subj>’ to infinital VPs).

Causative: Pred='cause <( Subj) ( Obj) (Comp)>’


Long Distance Dependencies and Coordination


In GB, when a category moves, it leaves behind an empty category (a trace).

In LFG, unbounded movement and coordination are handled through functional identity and by correlation
with the corresponding f-structure.

Example: Consider the wh-movement in the following sentence.


Which picture does Tara like most?
The f-structure can be represented as follows:

2.2.5 Paninian Framework


Paninian grammar (PG) was written by Panini around 500 BC in Sanskrit (the original text being titled
Ashtadhyayi). The framework can be used for other Indian languages and possibly for some other Asian
languages as well.

Unlike English, which is SVO (Subject-Verb-Object) ordered, Indian languages are typically SOV
(Subject-Object-Verb) ordered and inflectionally rich. The inflections provide important syntactic and
semantic cues for language analysis and understanding, and the Paninian framework takes advantage of
these features.

Note: Inflectional – refers to the changes a word undergoes to express different grammatical categories
such as tense, number, gender, case, mood, and aspect without altering the core meaning of the word.

Indian languages have traditionally used oral communication for knowledge propagation. In Hindi, we
can change the position of subject and object. For example:

(a) माँ बच्चे को खाना देती है।
    Maan bachche ko khaanaa detii hai
    Mother child-to food give(s)
    Mother gives food to the child.

(b) बच्चे को माँ खाना देती है।
    Bachche ko maan khaanaa detii hai
    Child-to mother food give(s)
    Mother gives food to the child.
The auxiliary verbs follow the main verb. In Hindi, they remain as separate words:

(a) खा रहा है
    khaa rahaa hai
    eat-ing
    eating

(b) करता रहा है
    kartaa rahaa hai
    doing been has
    has been doing


In Hindi, some main verbs, e.g., give (देना) and take (लेना), also combine with other main verbs to
change the aspect and modality of the verb.

(a) उसने खाना खाया।
    Usne khaanaa khaayaa
    He (Subj) food ate
    He ate food.

(b) उसने खाना खा लिया।
    Usne khaanaa khaa liyaa
    He (Subj) food eat taken
    He ate food (completed the action).

(c) वह चला
    Vah chalaa
    He moved.

(d) वह चल दिया
    Vah chal diyaa
    He move given
    He moved (started the action).

The nouns are followed by post-positions instead of prepositions. They generally remain as separate
words in Hindi:

रेखा के पिता
Rekha ke pitaa
Rekha of father
Father of Rekha

उसके पिता
Uske pitaa
He/she of father
His/Her father

All nouns are categorized as feminine or masculine, and the verb form must have gender agreement with
the subject:

ताला खो गया
Taalaa kho gayaa
lock lose (past, masculine)
The lock was lost.

चाभी खो गयी
Chaabhii kho gayii
key lose (past, feminine)
The key was lost.
Layered Representation in PG
The GB theory represents three syntactic levels: deep structure, surface structure, and logical form (LF),
where the LF is nearer to semantics. This theory tries to resolve all language issues at syntactic levels
only.

Paninian grammar framework is said to be syntactico-semantic, that


is, one can go from surface layer to deep semantics by passing
through intermediate layers.

• The surface and the semantic levels are obvious. The other
two levels should not be confused with the levels of GB.
• Vibhakti literally means inflection, but here it refers to word (noun, verb, or other) groups formed
on the basis of case endings, post-positions, compound verbs, or main and auxiliary verbs, etc.


• Karaka (pronounced Kaaraka) literally means Case, and in GB, we have already discussed case
theory, θ-theory, and sub-categorization, etc. Paninian Grammar has its own way of defining
Karaka relations.

Karaka Theory

• Karaka theory is the central theme of PG framework.


• Karaka relations are assigned based on the roles played by various participants in the main
activity.
• The various Karakas are Karta (subject), Karma (object), Karana (instrument), Sampradana
(beneficiary), Apadan (separation), and Adhikaran (locus).

Example:

माँ बच्ची को आँगन में हाथ से रोटी खिलाती है।

Maan bachchi ko aangan mein haath se rotii khilaatii hai


Mother child-to courtyard-in hand-by bread feed (s).
The mother feeds bread to the child by hand in the courtyard.

• 'maan' (mother) is the Karta. The Karta generally takes the 'ne' or the zero (unmarked) case marker.
• 'rotii' (bread) is the Karma. (Karma is similar to the object and is the locus of the result of the activity.)
• 'haath' (hand) is the Karana (the noun group through which the goal is achieved). It takes the marker
"se" or "dwara" (by).
• 'bachchi' (child) is the Sampradan, the beneficiary of the activity. It takes the marker "ko" (to) or
"ke liye" (for).
• 'Apaadaan' denotes separation; its marker is attached to the part that serves as the reference point
(the stationary one). (There is no Apaadaan in this sentence.)
• 'aangan' (courtyard) is the Adhikaran, the locus (support in space or time) of the Karta or Karma.
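A toy illustration of how such marker-based Karaka assignment might be coded is sketched below (the
marker-to-role table, the word grouping, and the function name assign_karakas are simplifying
assumptions made for this example; actual Paninian parsers use far richer constraints, since the same
marker can signal different Karakas in different contexts):

# A toy Vibhakti -> Karaka mapping for the example sentence (highly simplified).
MARKER_TO_KARAKA = {
    "ne": "Karta",        # agent
    "ko": "Sampradan",    # beneficiary (can also mark Karma in other contexts)
    "se": "Karana",       # instrument (can also mark Apaadaan/separation)
    "mein": "Adhikaran",  # locus
}

def assign_karakas(word_groups):
    """word_groups: list of (head_word, post_position_or_None) pairs."""
    roles = {}
    for head, marker in word_groups:
        if marker is None:
            roles.setdefault("Karta/Karma (unmarked)", []).append(head)  # needs agreement cues to resolve
        else:
            roles.setdefault(MARKER_TO_KARAKA.get(marker, "?"), []).append(head)
    return roles

# maan bachchi-ko aangan-mein haath-se rotii khilaatii hai
groups = [("maan", None), ("bachchi", "ko"), ("aangan", "mein"), ("haath", "se"), ("rotii", None)]
print(assign_karakas(groups))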

Issues in Paninian Grammar


The two problems challenging linguists are:
(i) the computational implementation of PG, and
(ii) the adaptation of PG to Indian and other similar languages.
However, many issues remain unresolved, especially in cases of shared Karaka relations. Another
difficulty arises when the mapping between the Vibhakti (case markers and post-positions) and the
semantic relation (with respect to the verb) is not one-to-one: two different Vibhaktis can represent the
same relation, or the same Vibhakti can represent different relations in different contexts.


2.3 Statistical Language Model


A statistical language model is a probability distribution P(s) over all possible word sequences (or any
other linguistic unit like words, sentences, paragraphs, documents, or spoken utterances).

2.3.1 n-gram Model (https://www.youtube.com/watch?v=Vc2C1NZkH0E )


Applications: suggestions in messages, spelling correction, machine translation, handwriting recognition, etc.

It is a statistical method that predicts the probability of the next word in a sequence based on the
previous n-1 words (an n-gram being a contiguous sequence of n words).

Why n-gram?

The goal of a statistical language model is to estimate the probability (likelihood) of a sentence. This is
achieved by decomposing the sentence probability into a product of conditional probabilities using the
chain rule as follows:

P(s) = P(w1) × P(w2/w1) × P(w3/w1w2) × ... × P(wn/w1w2...wn-1) = ∏ P(wi/hi)

where hi, the history of word wi, is defined as w1 w2 ... wi-1.

So, in order to calculate sentence probability, we need to calculate the probability of a word, given the
sequence of words preceding it. This is not a simple task.

An n-gram model simplifies the task by approximating the probability of a word given all the previous
words by the conditional probability given only the previous n-1 words:

P(wi/hi) ≈ P(wi/wi-n+1 ... wi-1)

Thus, an n-gram model calculates P(w/h) by modelling language as a Markov model of order n-1, i.e., by
looking at the previous n-1 words only.

A model that limits the history to the previous one word only is termed a bi-gram (n = 2) model.

A model that conditions the probability of a word on the previous two words is called a tri-gram (n = 3)
model.

Using the bi-gram and tri-gram estimates, the probability of a sentence can be calculated as:

bi-gram:  P(s) ≈ ∏ P(wi/wi-1)        tri-gram:  P(s) ≈ ∏ P(wi/wi-2 wi-1)

Example: The Arabian knights are fairy tales of the east

bi-gram approximation of the last word: P(east/the), tri-gram approximation: P(east/of the)


One pseudo-word <s> is introduced to mark the beginning of the sentence in bi-gram estimation.
Two pseudo-words <s1> and <s2> are introduced for tri-gram estimation.
How to estimate these probabilities?
1. Train the n-gram model on a training corpus.
2. Estimate the n-gram parameters using the maximum likelihood estimation (MLE) technique, i.e.,
using relative frequencies: count a particular n-gram in the training corpus and divide it by the sum
of the counts of all n-grams that share the same prefix.
3. The sum of the counts of all n-grams that share the first n-1 words is simply the count of that
common prefix wi-n+1, ..., wi-1, so

P(wi/wi-n+1 ... wi-1) = C(wi-n+1 ... wi-1 wi) / C(wi-n+1 ... wi-1)

Example tri-gram:

Predicted word for “The girl bought”


Example
Training set:

The Arabian Knights


These are the fairy tales of the east
The stories of the Arabian knights are translated in many languages

Bi-gram model:
P(the/<s>) =0.67 P(Arabian/the) = 0.4 P(knights /Arabian) =1.0
P(are/these) = 1.0 P(the/are) = 0.5 P(fairy/the) =0.2
P(tales/fairy) =1.0 P(of/tales) =1.0 P(the/of) =1.0
P(east/the) = 0.2 P(stories/the) =0.2 P(of/stories) =1.0
P(are/knights) =1.0 P(translated/are) =0.5 P(in /translated) =1.0
P(many/in) =1.0
P(languages/many) =1.0

Test sentence(s): The Arabian knights are the fairy tales of the east.
P(The/<s>)×P(Arabian/the)×P(Knights/Arabian)x P(are/knights)
× P(the/are)×P(fairy/the)xP(tales/fairy)×P(of/tales)× P(the/of)
x P(east/the)
=0.67×0.4×1.0×1.0×0.5×0.2×1.0×1.0×1.0×0.2
≈0.0054
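A small script can reproduce this estimate. The sketch below is illustrative only: the lowercasing, the
simple whitespace tokenization, and the absence of an end-of-sentence marker are assumptions chosen to
match the counts above. It also accumulates log-probabilities, as discussed under Limitations below, to
avoid numerical underflow:

from collections import Counter
import math

train = [
    "The Arabian Knights",
    "These are the fairy tales of the east",
    "The stories of the Arabian knights are translated in many languages",
]

unigrams, bigrams = Counter(), Counter()
for sent in train:
    words = ["<s>"] + sent.lower().split()        # prepend the sentence-start pseudo-word
    for w1, w2 in zip(words, words[1:]):
        unigrams[w1] += 1                         # count of w1 as a bigram prefix
        bigrams[(w1, w2)] += 1

def p(w2, w1):
    """MLE bigram probability P(w2 / w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

test = "The Arabian knights are the fairy tales of the east"
words = ["<s>"] + test.lower().split()
log_prob = sum(math.log(p(w2, w1)) for w1, w2 in zip(words, words[1:]))
print(round(math.exp(log_prob), 4))   # ≈ 0.0053 (the hand calculation above rounds 2/3 to 0.67, giving ≈ 0.0054)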
Limitations:

• Multiplying the probabilities might cause a numerical underflow, particularly in long sentences.
To avoid this, calculations are made in log space, where a calculation corresponds to adding log
of individual probabilities and taking antilog of the sum.
• The n-gram model faces data sparsity: n-grams that never occur in the training data are assigned zero
probability, leading to many zero entries in the bigram matrix, even though many of these unseen word
sequences are perfectly acceptable in the language.
• The underlying Markov assumption, that a word's probability depends solely on the preceding n-1
words, does not always hold, so the model fails to capture long-distance dependencies in natural
language sentences.

Solution:

• A number of smoothing techniques have been developed to handle the data sparseness problem.
• Smoothing in general refers to the task of re-evaluating zero-probability or low-probability n-
grams and assigning them non-zero values.


2.3.2 Add-one Smoothing

• It adds a value of one to each n-gram frequency before normalizing the frequencies into probabilities.
Thus, the conditional probability becomes:

P(wi/wi-n+1 ... wi-1) = (C(wi-n+1 ... wi-1 wi) + 1) / (C(wi-n+1 ... wi-1) + V)

where V is the vocabulary size.

• Yet, it is not very effective, since it assigns the same probability to all missing n-grams, even though
some of them could be intuitively more plausible than others.

Example:

Consider the following toy corpus:

• "I love programming"

• "I love coding"

We want to calculate the probability of the bigram "I love" using Add-one smoothing.

Step 1: Count the occurrences

• Unigrams:

o "I" appears 2 times

o "love" appears 2 times

o "programming" appears 1 time

o "coding" appears 1 time

• Bigrams:

o "I love" appears 2 times

o "love programming" appears 1 time

o "love coding" appears 1 time

• Vocabulary size V: There are 4 unique words: "I", "love", "programming", "coding".

Step 2: Apply Add-one smoothing

For the bigram "I love":

P(love/I) = (C("I love") + 1) / (C("I") + V) = (2 + 1) / (2 + 4) = 3/6 = 0.5


Step 3: For an unseen bigram

Let us calculate the probability of the bigram "I coding" (which doesn't appear in the training data):

P(coding/I) = (C("I coding") + 1) / (C("I") + V) = (0 + 1) / (2 + 4) = 1/6 ≈ 0.17
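The same calculation in code (illustrative only; the vocabulary and counts are taken from the toy corpus
above):

from collections import Counter

corpus = ["I love programming", "I love coding"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    words = sent.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

V = len(unigrams)                       # vocabulary size = 4

def p_add_one(w2, w1):
    """Add-one (Laplace) smoothed bigram probability P(w2 / w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_add_one("love", "I"))           # (2 + 1) / (2 + 4) = 0.5
print(p_add_one("coding", "I"))         # (0 + 1) / (2 + 4) ≈ 0.17 (unseen bigram gets a non-zero value)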

2.3.3 Good-Turing Smoothing

• Good-Turing smoothing improves probability estimates by adjusting for unseen n-grams based
on the frequency distribution of observed n-grams.
• It adjusts the frequency f of an n-gram using the count of n-grams having a frequency of occurrence
f+1. It converts the frequency of an n-gram from f to a smoothed count f* using the following
expression:

f* = (f + 1) × n(f+1) / n(f)

where n(f) is the number of n-grams that occur exactly f times in the training corpus. As an example,
consider that the number of n-grams that occur 4 times is 25,108 and the number of n-grams that occur
5 times is 20,542. Then, the smoothed count for 4 will be:

f* = (4 + 1) × 20,542 / 25,108 ≈ 4.09
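A minimal sketch of this adjustment, assuming the frequency-of-frequency counts n(f) have already been
collected (the values below are the hypothetical ones from the example):

def good_turing_count(f, freq_of_freq):
    """Smoothed count f* = (f + 1) * n(f+1) / n(f)."""
    return (f + 1) * freq_of_freq[f + 1] / freq_of_freq[f]

freq_of_freq = {4: 25108, 5: 20542}                    # n(4) = 25,108 and n(5) = 20,542
print(round(good_turing_count(4, freq_of_freq), 2))    # ≈ 4.09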

2.3.4 Caching Technique


The caching model is an enhancement to the basic n-gram model that addresses the variation of n-gram
frequencies across different segments of text or documents. In traditional n-gram models, the probability
of an n-gram is calculated based solely on its occurrence in the entire training corpus, which does not
take the local context or recent patterns into account. The caching model improves on this by
incorporating recently encountered n-grams (those seen earlier in the text currently being processed) into
the probability calculation.
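One common way to realize this idea (an assumption here, since the notes do not fix a particular
formulation) is to interpolate the static n-gram probability with a probability estimated from a cache of
recently seen words, using an interpolation weight lambda. A minimal sketch:

from collections import Counter, deque

class CachedModel:
    """P(w) = lam * P_static(w) + (1 - lam) * count_cache(w) / |cache|"""

    def __init__(self, static_prob, cache_size=100, lam=0.9):
        self.static_prob = static_prob          # function: word -> probability from the base n-gram model
        self.cache = deque(maxlen=cache_size)   # the most recent words of the text being processed
        self.lam = lam

    def observe(self, word):
        self.cache.append(word)

    def prob(self, word):
        cache_counts = Counter(self.cache)
        cache_p = cache_counts[word] / len(self.cache) if self.cache else 0.0
        return self.lam * self.static_prob(word) + (1 - self.lam) * cache_p

# Dummy static model assigning every word probability 0.001:
model = CachedModel(static_prob=lambda w: 0.001)
for w in "the rare word zygote appears and then zygote appears again".split():
    model.observe(w)
print(model.prob("zygote"))   # boosted above 0.001 because "zygote" occurs in the recent cache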
