Sanskrit Parser Presentation CSE

The document discusses developing a Sanskrit parser to analyze Sanskrit sentences by breaking them into logical components. It covers the motivation, goals and approach which includes lexical analysis to tokenize words and recognize parts of speech, as well as parsing to understand sentence structure.

Uploaded by

PrateekSharma
Copyright
© All Rights Reserved

SANSKRIT PARSER

(PARSING A SANSKRIT SENTENCE IN SOME RECOGNIZABLE FORMAT)

Project Guide
Mr. Pantha Nath

Aman Singhai
Ashish Mani Tripathi
Mukesh Kumar
Nitish Kumar
Prasoon


World's First Programming Language

Why we chose this project

According to many researchers, Sanskrit is a very scientific language.
Sanskrit behaves much like a programming language.
So if we are able to build a translator that translates Sanskrit into machine code, it would be a significant development in the field of NLP (Natural Language Processing).

Why we became interested


NASA scientist Rick Briggs had invited 1,000 Sanskrit scholars from India to work at NASA, but the scholars refused to allow the language to be put to foreign use. - Dainik
Being understandable to both computers and humans, Sanskrit was considered useful in space and many other natural language processing applications.

What we are trying to do here

In this project we try to parse a Sanskrit sentence so that it can later be translated easily into another language.
We take a Sanskrit sentence or paragraph as input.
We tokenize the whole sentence (lexical analysis).
We recognize the part of speech of each individual token (lexical analysis).
We then parse the sentence, that is, try to make sense of it (parsing).

Index
We will first introduce some concepts and then apply them:
1. Lexical Analysis
2. Parsing
3. Advantages of using Sanskrit
4. Approach
A. Lexical Analysis in Sanskrit
B. Parsing in Sanskrit
5. Where we are now.

Advantages of using Sanskrit (Why Sanskrit)

Linguistically, Sanskrit is a common base for a large group of Indo-European languages
Limited vocabulary: words represent properties
Prefix+Word+Suffix
Fixed Morphology
Concept of Vibhakti

Vibhakti as Pointer

Consider the sentence

'The man saw the girl with the binoculars.'

The man(S) saw(V) the girl(O) with the binoculars(I)

OR

The man(S) saw(V) the girl with the binoculars(O)

In English the sentence is ambiguous between these two readings. In Sanskrit, the vibhakti (case ending) on each word points to its role, which is the reason for UNAMBIGUITY in a sentence: shuffling the words has NO effect.
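
As a rough illustration of the "vibhakti as pointer" idea, the sketch below finds a word's role from its case tag alone, so the result does not depend on word order. The `Case` enum, the `Word` struct and `find_by_case` are hypothetical names chosen for this example, and English glosses stand in for the Sanskrit words; none of this is taken from the project's code.

```c
#include <stddef.h>

/* Hypothetical case tags; only the ones needed for the example sentence. */
typedef enum { CASE_NOMINATIVE, CASE_ACCUSATIVE, CASE_INSTRUMENTAL, CASE_NONE } Case;

typedef struct {
    const char *gloss;    /* English gloss standing in for the Sanskrit word */
    Case        vibhakti; /* case carried by the word's ending */
} Word;

/* Return the gloss of the first word carrying the wanted case, or NULL.
 * The position of the word in the array plays no part in the answer. */
const char *find_by_case(const Word *words, size_t n, Case wanted)
{
    for (size_t i = 0; i < n; i++) {
        if (words[i].vibhakti == wanted)
            return words[i].gloss;
    }
    return NULL;
}
```

Calling `find_by_case` with `CASE_INSTRUMENTAL` on the words of the example sentence returns the gloss "binoculars" however the array is ordered, which is the property the slide describes.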

Fixed Morphology

Words in Sanskrit belong to 3 categories, namely:
DhatuRoop: root of all verbs
ShabdaRoop: root of all nouns
Avyaya: words with no morphology (indeclinables)
Each word belonging to
DhatuRoop has 36 morphed versions
ShabdaRoop has 21 morphed versions
Avyaya words can represent a single meaning

Approach

Programming language used: C and C++

Database Used: Linux file system, indexed

Data Structures: Array, Linked List, Structure, Tree, Indexing and Hashing

INPUT: A Sanskrit sentence or paragraph

eg: "!

OUTPUT: Recognize all the parts of speech

Form a tree structure to be able to understand the sentence.

How the output will be shown in the terminal

::: this is a avyaya.. and the meaning is: where_there ]

::: Nominative,Singular, Gender-Masculine ,noun

and the root is:

::: The root is: the meaning is: go

present-tense,first-person,singular

::: this is a avyaya.. and the meaning is: there

::: Nominative,Plural Gender-Masculine ,noun ,and the root is:

and the meaning is god

::: Instrumental,Singular,

and the meaning is lord_raama

Gender-Masculine ,noun, and the root is:

and the meaning is boy


::: Accusative,Singular, Gender-Feminine ,nounand the root is:
and the meaning is river

Approach(coding concept)

We first tokenize the input using strtok(str, " ");
Each token can be one of 4 types: noun, verb, preposition, etc.
The task is to identify these tokens, which is done by matching them against an indexed database.
Each token is stored in a structure along with its meaning and its morphology.
Then the parser comes into play and forms a tree-like structure using these tokens.
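
The tokenize-and-look-up step described above could look roughly like the following sketch. It assumes tokens are separated by single spaces, uses a three-entry in-memory array in place of the indexed database, and writes the Sanskrit words in Harvard-Kyoto romanization; the `Token` struct, `LEXICON`, `lookup` and `tokenize` are illustrative names, not the project's actual code.

```c
#include <string.h>

typedef enum { POS_NOUN, POS_VERB, POS_AVYAYA, POS_UNKNOWN } PartOfSpeech;

typedef struct {
    const char  *text;    /* token as it appears in the input */
    PartOfSpeech pos;     /* part of speech found in the lexicon */
    const char  *meaning; /* English meaning, "" when unknown */
} Token;

/* Tiny stand-in for the indexed database (entries are illustrative). */
static const Token LEXICON[] = {
    { "devaH",  POS_NOUN,   "god"   },
    { "tapati", POS_VERB,   "heats" },
    { "tatra",  POS_AVYAYA, "there" },
};

/* Linear search; a real system would use the index or a hash table. */
static Token lookup(const char *word)
{
    for (size_t i = 0; i < sizeof LEXICON / sizeof LEXICON[0]; i++) {
        if (strcmp(LEXICON[i].text, word) == 0) {
            Token t = LEXICON[i];
            t.text = word; /* point at the caller's copy of the word */
            return t;
        }
    }
    Token unknown = { word, POS_UNKNOWN, "" };
    return unknown;
}

/* Split `sentence` in place on spaces and tag each token.
 * Returns the number of tokens written to `out` (at most `max`). */
size_t tokenize(char *sentence, Token out[], size_t max)
{
    size_t n = 0;
    for (char *w = strtok(sentence, " "); w != NULL && n < max;
         w = strtok(NULL, " "))
        out[n++] = lookup(w);
    return n;
}
```

Note that `strtok` modifies its argument, so `sentence` must be a writable buffer (for example a `char` array), not a string literal.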

Lexical Analysis

Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.
A lexer often exists as a single function which is called by a parser or another function, or can be combined with the parser in scannerless parsing.
The lexical analyzer is the first phase of a translator. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

The role of lexical analyzer

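[Figure: the source program is read by the Lexical Analyzer, which hands a token to the Parser each time the Parser calls getNextToken. Both consult the Symbol table, and the Parser's output goes on to semantic analysis.]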

What's a Token?

Output of lexical analysis is a stream of tokens
A token is a syntactic category
In English: noun, verb, adjective, ...
In the Sanskrit language: vibhakti, kriya, visheshana, ...

Parser relies on the token distinctions:


Lexical Analyzer: Implementation

An implementation must do the following:

1. Recognize substrings corresponding to tokens
2. Search the identified token in the database to recognize its context
3. According to the context, it may be a different part of speech of the Sanskrit language, e.g. verb (kriya), vibhakti (dhatu roop).
4. Every token is tagged accordingly.


Lookahead

Two important points:

1. The goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time
2. Lookahead may be required to decide where one token ends and the next token begins

Even our simple example has lookahead issues:
i vs. if
= vs. ==
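
The two lookahead cases above (i vs. if, = vs. ==) can be sketched with a small hand-written scanner. This is a generic programming-language illustration, not part of the Sanskrit lexer; `TokKind` and `next_token` are names invented for this example.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_ASSIGN, TOK_EQ, TOK_IF, TOK_IDENT, TOK_END, TOK_ERROR } TokKind;

/* Scan one token starting at *p and advance *p past it. */
TokKind next_token(const char **p)
{
    const char *s = *p;
    while (*s == ' ')
        s++;                                  /* skip blanks between tokens */
    if (*s == '\0') { *p = s; return TOK_END; }

    if (*s == '=') {
        /* One character of lookahead decides between "=" and "==". */
        if (s[1] == '=') { *p = s + 2; return TOK_EQ; }
        *p = s + 1;
        return TOK_ASSIGN;
    }

    if (isalpha((unsigned char)*s)) {
        /* Read the longest run of letters and digits first; only then
         * decide whether the whole word is the keyword "if". */
        const char *start = s;
        while (isalnum((unsigned char)*s))
            s++;
        *p = s;
        if (s - start == 2 && strncmp(start, "if", 2) == 0)
            return TOK_IF;
        return TOK_IDENT;
    }

    *p = s + 1;                               /* unrecognized character */
    return TOK_ERROR;
}
```

Reading the longest possible word before classifying it is what keeps `i` and `ifx` identifiers while `if` alone becomes a keyword.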

LEXICAL ANALYSIS

Sanskrit's property of FIXED MORPHOLOGY lays the basis for analyzing individual verbs and nouns programmatically.
The input word's suffix is analyzed to obtain the following result:
Verbs: tense, number, person
Nouns: gender, number, case

LEXICAL ANALYSIS
Consider the dhatu (verb root) meaning "to heat".
The following inflections are analyzed lexically:
HEATS
HEATED
WILL HEAT
HEAT IT (order)

LEXICAL ANALYSIS
Consider the noun representing God
The following inflections are possible:
1. Nominative (subject)
2. Accusative (object)
3. Instrumental (by)
4. Dative (to)
5. Ablative (from)
6. Genitive (of)
7. Locative (in)
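
The seven cases listed above can be attached to a noun by looking at its ending. The sketch below handles only the singular forms of a masculine a-stem noun such as "deva" (god), again in Harvard-Kyoto romanization; the `NounEnding` table and `match_noun_case` are hypothetical names for this example, and the dual and plural forms that make up the full set of 21 are left out.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *suffix;    /* ending in Harvard-Kyoto romanization */
    const char *case_name; /* name of the vibhakti */
    const char *marker;    /* English word that usually renders it, "" if none */
} NounEnding;

/* Singular endings of a masculine a-stem noun (illustrative subset). */
static const NounEnding SINGULAR_ENDINGS[] = {
    { "aH",   "Nominative",   ""     },
    { "am",   "Accusative",   ""     },
    { "ena",  "Instrumental", "by"   },
    { "Aya",  "Dative",       "to"   },
    { "At",   "Ablative",     "from" },
    { "asya", "Genitive",     "of"   },
    { "e",    "Locative",     "in"   },
};

/* Return the table entry whose suffix `word` ends with, or NULL. */
const NounEnding *match_noun_case(const char *word)
{
    size_t wlen = strlen(word);
    size_t count = sizeof SINGULAR_ENDINGS / sizeof SINGULAR_ENDINGS[0];
    for (size_t i = 0; i < count; i++) {
        size_t slen = strlen(SINGULAR_ENDINGS[i].suffix);
        if (wlen > slen &&
            strcmp(word + wlen - slen, SINGULAR_ENDINGS[i].suffix) == 0)
            return &SINGULAR_ENDINGS[i];
    }
    return NULL;
}
```

With this table "devena" is tagged Instrumental and can be glossed as "by god", while "devasya" is tagged Genitive ("of god").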

LEXICAL ANALYSIS
Input Sentence
Tokenize
Avyaya Analysis
Verb Analysis
Noun Analysis
Unknown word(add to database)

Parsing

The scanner recognizes words

The parser recognizes syntactic units

Parser operations:

Check and verify syntax based on specified syntax rules

Report errors

Build IR

Automation:

The process can be automated

Why separate lexical analysis and parsing

1. Simplicity of design
2. Improving translator efficiency
3. Enhancing translator portability

Parsing Sanskrit Text

Now we move towards translating a Sanskrit sentence into its English equivalent.
PARSING
Analyze (a sentence) into its component parts and describe their syntactic roles.
Analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.

Thus we need to syntactically re-align the lexical output to represent a meaningful context in English.

Parsing Sanskrit Text

Sanskrit Sentence Structure: SOV
English Sentence Structure: SVO

Sanskrit order (SOV): boy (S) chapter (O) reads (V)
English order (SVO): Boy (S) reads (V) chapter (O)
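
Re-aligning lexer output from Sanskrit's SOV order to English's SVO order can be sketched as a stable reordering by role. The sketch assumes each token has already been tagged as subject, object or verb, and uses English glosses in place of the Sanskrit words; `Tagged`, `Role` and `to_svo` are names invented for this example.

```c
#include <stddef.h>

typedef enum { ROLE_SUBJECT, ROLE_OBJECT, ROLE_VERB } Role;

typedef struct {
    const char *gloss; /* English gloss of the Sanskrit word */
    Role        role;  /* role assigned during lexical analysis */
} Tagged;

/* Copy `in` to `out` in subject, verb, object order, keeping the
 * original relative order of words that share a role.
 * `out` must have room for `n` entries. Returns the count written. */
size_t to_svo(const Tagged *in, size_t n, Tagged *out)
{
    static const Role order[] = { ROLE_SUBJECT, ROLE_VERB, ROLE_OBJECT };
    size_t k = 0;
    for (size_t r = 0; r < sizeof order / sizeof order[0]; r++) {
        for (size_t i = 0; i < n; i++) {
            if (in[i].role == order[r])
                out[k++] = in[i];
        }
    }
    return k;
}
```

Given the tokens boy (S), chapter (O), reads (V), the output array holds boy, reads, chapter. Because roles come from the tags rather than from position, the same result is obtained for any input order.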

Example Sanskrit Sentence

Avyaya's Role in Sanskrit

Avyaya words (indeclinables) are used to connect 2 or more simple sentences. Examples:
- (if-then)
- (where-there)
- (but)
- (hence)
- (provided, if)

Not only do avyaya connect sentences, but they also affect the structure of a simple sentence.

Where We are Now

A big chunk of our time was invested in researching the Sanskrit language and its grammar, which was quite difficult.
So far we have implemented the lexer part, which by itself is useful mainly for educational purposes.
For the end of the semester we will implement the parser part of this project.

Thank You
