0% found this document useful (0 votes)
80 views

Regular Expressions To Finite Automata: - High-Level Sketch

The document describes the process of converting regular expressions to finite automata. It involves breaking down regular expressions into constituent subexpressions, creating simple NFAs for basic symbols, and combining the NFAs using rules for operators like concatenation, alternation, and exponentiation. The resulting NFA is then converted to a DFA by simulating the NFA and representing each state as a non-empty subset of NFA states.

Uploaded by

Ankit Rajpal
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
80 views

Regular Expressions To Finite Automata: - High-Level Sketch

The document describes the process of converting regular expressions to finite automata. It involves breaking down regular expressions into constituent subexpressions, creating simple NFAs for basic symbols, and combining the NFAs using rules for operators like concatenation, alternation, and exponentiation. The resulting NFA is then converted to a DFA by simulating the NFA and representing each state as a non-empty subset of NFA states.

Uploaded by

Ankit Rajpal
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

Regular Expressions to Finite Automata

• High-level sketch

NFA

Regular
DFA
expressions

Lexical Table-driven
Specification Implementation of DFA

59
Regular Expressions to NFA (0)

• The steps are:


– Break down the RE into constituent sub-
expressions
– Create simple NFA’s for each basic symbol
– Combine the NFA’s using rules for the
operators (concatenation, alternation,
exponentiation
• Note: details here differ slightly from book, but
are equivalent
60
Regular Expressions to NFA (1)

• For each kind of rexp, define an NFA


– Notation: NFA for rexp M

For ϵ
ϵ

For input a
a

61
Regular Expressions to NFA (2)

For AB
A B

(slightly different in book)

For A | B
ϵ B ϵ
ϵ
ϵ A

62
Regular Expressions to NFA (3)

For A*
ϵ

ϵ A ϵ

63
Example of RegExp -> NFA conversion

• Consider the regular expression


(1|0)*1
• The NFA is

ϵ C
1 ϵ
E 1
A ϵ B ϵ
ϵ 0 ϵ
G H I
D F
ϵ

64
NFA to DFA. The Trick

• Simulate the NFA


• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ϵ-moves from NFA
start state
See ϵ -closure() algorithm, p.69 in textbook
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in S
after seeing the input a
• considering ϵ -moves as well

65
NFA -> DFA Example: Machine for (1|0)*1

C
1 ϵ
ϵ E
ϵ 1
A B 0 G ϵ H I
ϵ D F ϵ

•NFA Start state is marked

66
ϵ -closure(A) = DFA Start State
ϵ -closure(A) ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
(States reachable from A on ϵ−transitions)
S
Call this “S”.
ABCDH Where can we get from
S on input “0”?
(ϵ -closure(F))
67
Second DFA state, Reachable on 0 from
Start State
ϵ -closure(F) ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
T 0
S 0 BCDFGH
ABCDH
Call this “T”. Next: back to “S” to look
Where can we get for another possible state
from T on input “0”? 68
Answer: only back to T
DFA Start State Again

ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

From “S” again:


Where can we get on
input “1”?
ϵ-closure(E), + I
69
Third DFA state, Reachable on “1” From
Start State
ϵ-closure(E), + I ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
T 0 Call this “U”.
Because this
S 0 BCDFGH
includes
0 1 state I, it is
ABCDH 1 an accepting
1 BCDEGHI state.
U 70
The Resultant DFA: Three States

(1|0)*1
0
0 T
S 0 1
1
1 U

No more distinct states were found, so


this is the complete DFA.

71
NFA to DFA. Remarks

• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in


some subset of those N states

• How many non-empty subsets are there?


– 2N - 1 = finitely many

72
Implementation

• A DFA can be implemented by a 2D table T[]


– One dimension is “states”
– Other dimension is “input symbols”
– For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
– If in state Si and input a, read T[i,a] = k and skip to
state Sk
– Very efficient

73
Table Implementation of a DFA

0
0 T

S 0 1
1
1 U

(1|0)*1
0 1 “U” is the
S T U accepting
state, so it
T T U is marked
with a “*”.
U* T U

74
Algorithm to simulate DFA (Fig 3.22, p.116)
array move[ ] input int final(int s) {/* “1” iff s is final */
return (s == 2);
state 0 1
}
0 1 2 int scan() {
1 1 2 int s = 0; -- current state
int c, move[3,2];
2* 1 2 c = getchar();
while (c != EOF) {
s = move[s,c];
S,T, & U are now states c = getchar();
0, 1, and 2
}
if (final(s)) return (YES);
This algorithm reads
an input string, else return (NO);
attempting to verify }
it using a DFA encoded
as a table.
75
Implementation (Wrap-up)

• NFA -> DFA conversion is at the heart of tools


such as lex and flex

• But, DFAs can be huge

• There are algorithms for reducing DFA’s but


we will not look at this issue

76
Lex and Flex: Scanner Generators

Lex source lex.yy.c


program Lex contains
foo.l compiler
function
yylex( )

a.out
lex.yy.c C (binary
compiler
executable)

candidate token
a.out
text file stream

77
Lex “Program” Structure

%{
.....C header stuff, copied into lex.yy.c.....
%}
.....Regular Definitions....
%%
.....Translation Rules for all tokens.....
%%
.....Auxiliary C Procedures, also copied
into lex.yy.c

78
Regular Definitions and Translation Rules

D1 R1
D2 R2
D3 R3
%%
P1 { action1 }
P2 { action2 }
P3 { action3 }
P4 { action4 }

79
Earlier Example Token Specs
(Added ASSIGN token)

Token Code lexval


“if” IF -
“then” THEN -
“else” ELSE -
id ID (symtab entry)
num NUM (value)
:= ASSIGN -
< RELOP LT
<= RELOP LE
= RELOP EQ
<> RELOP NE
> RELOP GT
>= RELOP GE
80
A Lex Implementation
%{
#define IF 3
#define THEN 4
#define ELSE 5 /* Regular definitions */
#define ID 6 delim [ \t\n]
#define NUM 7 ws {delim}+
#define RELOP 8 letter [A-Za-z]
#define ASSIGN 10 digit [0-9]
/* Subtypes of RELOP */ id {letter}({letter}|{digit})*
#define LT 1 num {digit}+
#define LE 2 %%
#define EQ 3
#define NE 4
#define GT 5
#define GE 6
int yylval; /* tokenval */
#include "sym.h"
Sym *insert(),*lookup();
%}
81
A Few Notes about Lex

• When a pattern is matched, the corresponding action


is executed
• Static char array “yytext” contains matched string
• If the action does not return, Lex tries to consume
more input, looking for another token
– Therefore actions normally execute a return statement with
a token code
– To “throw away” something, just use an “empty” action
• If more than one pattern matches a prefix of the
remaining input, chooses the longest matching string
• If more than one pattern matches the same input,
chooses the first one defined
• Unrecognized input is copied to stdout
82
The Translation Rules
{ws} { /* throw it away */ }
if { return(IF); }
then { return(THEN); }
else { return(ELSE); }
{id} { if ((yylval=(int)lookup(yytext))==NULL)
yylval=(int)insert(yytext,ID); return(ID); }
{num} { yylval=atoi(yytext); return(NUM); }
"<" { yylval=LT; return(RELOP); }
"<=" { yylval=LE; return(RELOP); }
"<>" { yylval=NE; return(RELOP); }
"=" { yylval=EQ; return(RELOP); }
">" { yylval=GT; return(RELOP); }
">=” { yylval=GE; return(RELOP); }
":=" { return (ASSIGN); }
%%
/* Code for insert( ) and lookup( ) can go here... */

83
Demonstration of Lex Example

/* A Short mainline to test the lexical analyser*/

extern int yylval;


main() {
int tok;
init_strtab(20000); /* part of symbol table package */
do {
tok = yylex( );
printf("Token %d Value %d\n", tok, yylval);
}
while (tok != 0);
}

84
Compile, Link and Run..

torch% a.out typed input


torch% lex scanner.l if a<=26 then b:=sum else b:=0 string
Token 3 Value 0
torch% cc main.c sym.c lex.yy.c -ll Token 6 Value 53416
main.c: Token 8 Value 2
sym.c: Token 7 Value 26 “left-overs”
lex.yy.c: Token 4 Value 26
Linking: Token 6 Value 53440
Token 10 Value 53440
torch%
Token 6 Value 53464
Token 5 Value 53464
Token 6 Value 53440
Token 10 Value 53440
IF 3 Token 7 Value 0
THEN 4 Token 0 Value 0
ELSE 5
ID 6 torch%
NUM 7
RELOP 8
ASSIGN 10 85
86
First Glimpse of the Term Project...

• A small object-oriented language, compiled to MIPS


• “Cool”: Classroom Objected-Oriented Language”
• “UnCool” (because I simplified it, so it isn’t Cool
anymore!)
• Multiple classes, all in single source file
• Int, Bool, String, and Int array primitive types
• Public Methods and private Attribute fields
– With Attribute initializers
• Assignment, expressions, if and while
• Simple input-output of numbers, and output of Strings
– (Uses built-in facilities in MIPS simulator)

87
Example Program: Factorial.cl

class Fac {
computeFac(num : Int) : Int {
if num < 1 then 1
else num * (computeFac(num-1)) fi
};
};
class Main {
main() : Object {
{ out_string("Enter a number ");
out_int((new Fac).computeFac(in_int()));
out_string("\n");
}
};
}; 88
Project (continued)

• Language spec is on the web


– (see "UnCool Manual" in Downloads section)
• Bison-ready grammar file will be provided
– You will need to provide a lexical analyser (Can use flex)
• On-line bison and flex manuals available from course web page
• Symbol table manager and several test programs will be provided
• “Reference compiler” binary on torch (see manual)
• Three passes:
– Parsing and construction of an AST
– Semantic Analysis Type checking
– Code Generation
• The reference compiler is about 1900 lines of code +bison & flex
files
89
Programming Assignment 1

• Familiarize yourself with the UnCool language


– and the MIPS simulator
• by writing and debugging a small program
– Quick Sort, to sort a list of numbers
– starting point provided (as a Java program)

• Complete assignment posted on web


• Due one week from today (January 23)
– worth 5 points, of the 35% for the term Project

90

You might also like