CS3501-COMPILER DESIGN
INPUT BUFFERING
presented by
MITHRA.S.J
INTRODUCTION
Input buffering is a concept used in the lexical analyzer.
Lexical analyzer: reads the source program
character by character and generates tokens.
How are the tokens read by the lexical analyzer?
There must be some strategy to read the input
program, and this strategy is called input buffering.
PROGRAM STORAGE FORMAT
The input program is stored on the hard disk.
i n t a , b ;
Each line of code is stored in the
above format on the hard disk.
The lexical analyzer reads a block of
characters in a single system call
and generates tokens.
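Reading a whole block at once can be sketched in C as below. This is only an illustration: `read_block` and `BLOCK_SIZE` are hypothetical names, and the standard library call `fread` stands in for the system call mentioned on the slide.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_SIZE 4096   /* assumed block size; the slides do not give one */

/* Read up to BLOCK_SIZE characters from `src` into `buf` with a single
   fread call. Returns the number of characters actually read, which is
   0 once the end of the input has been reached. */
size_t read_block(FILE *src, char *buf)
{
    assert(src != NULL && buf != NULL);
    return fread(buf, 1, BLOCK_SIZE, src);
}
```

The lexical analyzer then scans the characters in `buf` instead of asking the operating system for each character separately.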
WORKING
Uses 2 pointers:
lexeme begin pointer
forward pointer
Lexeme begin pointer: always points to the beginning of the current lexeme.
Forward pointer: moves forward until a lexeme is recognized.
Initially, both pointers point to the beginning of the input.
lbp
i n t a , b ;
fp
(Figure, repeated over six frames: lbp stays at the beginning of the lexeme while fp moves forward.)
“int” is recognized as a lexeme.
When fp reaches a blank space (i.e., when the condition for the lexeme
is no longer satisfied), lbp is repositioned to the start of the next lexeme.
Similarly, all the lexemes are read.
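The movement of the two pointers can be sketched in C as follows. This is a minimal sketch under stated assumptions: `next_lexeme` is a hypothetical helper, the input is a NUL-terminated string, and the rule "letters, digits and underscores form one lexeme, anything else is a one-character lexeme" is a simplification chosen for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Find the next lexeme at or after *lbp.
   On return, *lbp points at the first character of the lexeme and the
   returned forward pointer points just past its last character.
   Returns NULL when no lexeme is left. */
const char *next_lexeme(const char **lbp)
{
    assert(lbp != NULL && *lbp != NULL);

    /* Reposition lbp: skip the blanks that ended the previous lexeme. */
    while (isspace((unsigned char)**lbp))
        (*lbp)++;
    if (**lbp == '\0')
        return NULL;

    const char *fp = *lbp;   /* both pointers start at the same place */
    if (isalnum((unsigned char)*fp) || *fp == '_') {
        /* Keyword, identifier or number: advance while the condition holds. */
        while (isalnum((unsigned char)*fp) || *fp == '_')
            fp++;
    } else {
        fp++;                /* single-character symbol or operator */
    }
    return fp;
}
```

After each call the caller sets lbp to the returned fp and calls again, so for `int a,b;` the lexemes `int`, `a`, `,`, `b` and `;` are found in turn.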
SCHEMES OF BUFFERING
1. One Buffer Scheme
2. Two Buffer Scheme
1. One Buffer Scheme
Uses one buffer to read the input.
So, when a program has more than one line, the buffer
must be overwritten with the next line.
The overwriting operation takes extra time.
To detect the end of the buffer cheaply, we use sentinels.
SENTINELS
We can combine the buffer-end test with the test for
the current character if we extend each buffer to hold
a sentinel character at the end.
The sentinel is a special character that cannot be part
of the source program.
Sentinels are simply characters that denote the end of
the buffer (here, one line), e.g., the EOF (end of file) character.
i n t a , b ; EOF   <- sentinel
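The saving from a sentinel can be seen by comparing two scanning loops. The sketch below is illustrative only: both function names are hypothetical, counting blanks stands in for real lexeme recognition, and `'\0'` is an assumed stand-in for the EOF sentinel because it cannot occur inside ordinary source text.

```c
#include <assert.h>
#include <stddef.h>

#define SENTINEL '\0'   /* assumed stand-in for the EOF sentinel */

/* Without a sentinel: two tests for every character. */
size_t count_blanks_two_tests(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t blanks = 0;
    for (const char *fp = buf; fp != buf + len; fp++)   /* test 1: buffer end */
        if (*fp == ' ')                                 /* test 2: the character */
            blanks++;
    return blanks;
}

/* With a sentinel stored right after the last character:
   one switch on the character covers both tests. */
size_t count_blanks_sentinel(const char *buf)
{
    assert(buf != NULL);
    size_t blanks = 0;
    for (const char *fp = buf;; fp++) {
        switch (*fp) {
        case SENTINEL: return blanks;    /* buffer end found by the same test */
        case ' ':      blanks++; break;
        default:       break;
        }
    }
}
```

Both functions return the same count; the second one simply never compares fp against the end of the buffer.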
2. Two Buffer Scheme
Uses two buffers.
Overwriting does not take separate time: while the 1st buffer is being read, the 2nd buffer
is overwritten, and vice versa.
To indicate the end of each buffer, we place a sentinel at the end of each buffer.
When the EOF of buffer-1 is reached, lbp and fp are moved to the beginning of buffer-2,
and vice versa.
buffer-1                buffer-2
lbp                     lbp
i n t a , b ; EOF       a = 5 ; EOF
fp                      fp
LEXEMES:
int keyword
a identifier
b identifier
; symbol
= operator
5 constant
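The classification in the list above can be sketched as a small C function. This is an assumption-laden illustration: `TokenClass` and `classify` are hypothetical names, and the rules cover only the lexemes that appear in the example (`int` is the only keyword known to it, `=` the only operator).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { KEYWORD, IDENTIFIER, CONSTANT, OPERATOR, SYMBOL } TokenClass;

/* Classify one lexeme according to the categories used in the example. */
TokenClass classify(const char *lexeme)
{
    assert(lexeme != NULL && lexeme[0] != '\0');

    if (strcmp(lexeme, "int") == 0)
        return KEYWORD;                 /* only keyword in the example */
    if (isdigit((unsigned char)lexeme[0]))
        return CONSTANT;                /* e.g. 5 */
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_')
        return IDENTIFIER;              /* e.g. a, b */
    if (strcmp(lexeme, "=") == 0)
        return OPERATOR;                /* only operator in the example */
    return SYMBOL;                      /* e.g. ; */
}
```

A real lexical analyzer would use a full keyword table and operator set; the point here is only that each recognized lexeme is mapped to a token class.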
PSEUDO CODE FOR 2 BUFFER SCHEME
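/* EOFbuffer1 and EOFbuffer2 are the positions of the sentinels
   that end buffer1 and buffer2. */
switch (*fp++) {
case EOF:
    if (fp - 1 == EOFbuffer1) {          /* sentinel of buffer-1 */
        reload buffer2;
        fp = &buffer2[0];
    } else if (fp - 1 == EOFbuffer2) {   /* sentinel of buffer-2 */
        reload buffer1;
        fp = &buffer1[0];
    } else {
        stop;   /* EOF inside a buffer: end of the input */
    }
    break;
default:        /* any other character */
    process the character as part of the current lexeme;
    break;
}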
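The pseudo code above can be turned into a small runnable C sketch. Everything named here is an assumption made for illustration: the `Scanner` struct, `reload`, `scanner_init` and `next_char` are hypothetical, the input comes from an in-memory string instead of a file, the buffer size `N` is kept tiny so that the buffers actually switch, and `'\0'` stands in for the EOF sentinel. The lexeme begin pointer is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4              /* tiny buffer size so the example switches buffers */
#define SENTINEL '\0'    /* assumed stand-in for the EOF sentinel */

typedef struct {
    char buffer1[N + 1]; /* N characters plus one slot for the sentinel */
    char buffer2[N + 1];
    const char *fp;      /* forward pointer */
    const char *src;     /* input not yet loaded (in-memory stand-in for a file) */
} Scanner;

/* Fill `buf` with up to N characters and put the sentinel after them. */
static void reload(Scanner *s, char *buf)
{
    size_t n = strlen(s->src);
    if (n > N)
        n = N;
    memcpy(buf, s->src, n);
    buf[n] = SENTINEL;   /* at buf[N] when full; earlier means end of input */
    s->src += n;
}

void scanner_init(Scanner *s, const char *input)
{
    assert(s != NULL && input != NULL);
    s->src = input;
    reload(s, s->buffer1);
    s->fp = s->buffer1;
}

/* Return the next input character, or -1 at the end of the input. */
int next_char(Scanner *s)
{
    assert(s != NULL);
    for (;;) {
        char c = *s->fp++;
        switch (c) {
        case SENTINEL:
            if (s->fp == s->buffer1 + N + 1) {        /* end of buffer-1 */
                reload(s, s->buffer2);
                s->fp = s->buffer2;
            } else if (s->fp == s->buffer2 + N + 1) { /* end of buffer-2 */
                reload(s, s->buffer1);
                s->fp = s->buffer1;
            } else {
                s->fp--;     /* sentinel inside a buffer: end of input */
                return -1;
            }
            break;           /* go on scanning in the other buffer */
        default:
            return (unsigned char)c;
        }
    }
}
```

Calling `scanner_init` with `"int a,b;"` and then `next_char` repeatedly yields the eight characters in order, crossing from buffer-1 to buffer-2 after the fourth, and then -1. Note that the ordinary characters cost one test each; the extra comparisons run only when a sentinel is seen.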
THANK YOU