Unit 4 Symbol Table


Symbol Tables

• A symbol table is a major data structure used in a compiler:
  - Associates attributes with identifiers used in a program
  - For instance, a type attribute is usually associated with each identifier
  - A symbol table is a necessary component
    - Definition (declaration) of identifiers appears once in a program
    - Use of identifiers may appear in many places of the program text
  - Identifiers and attributes are entered by the analysis phases
    - When processing a definition (declaration) of an identifier
    - In simple languages with only global variables and implicit declarations:
      - The scanner can enter an identifier into a symbol table if it is not already there
    - In block-structured languages with scopes and explicit declarations:
      - The parser and/or semantic analyzer enter identifiers and corresponding attributes
  - Symbol table information is used by the analysis and synthesis phases
    - To verify that used identifiers have been defined (declared)
    - To verify that expressions and assignments are semantically correct (type checking)
    - To generate intermediate or target code

Symbol Tables, Hashing, and Hash Tables – 1 Compiler Design – © Muhammed Mudawwar
Symbol Table Interface
• The basic operations defined on a symbol table include:
  - allocate: to allocate a new empty symbol table
  - free: to remove all entries and free the storage of a symbol table
  - insert: to insert a name in a symbol table and return a pointer to its entry
  - lookup: to search for a name and return a pointer to its entry
  - set_attribute: to associate an attribute with a given entry
  - get_attribute: to get an attribute associated with a given entry

• Other operations can be added depending on requirements
  - For example, a delete operation removes a name previously inserted
    - Some identifiers become invisible (out of scope) after exiting a block

• This interface provides an abstract view of a symbol table
• Supports the simultaneous existence of multiple tables
• Implementation can vary without modifying the interface
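
The six operations listed above could be sketched in C roughly as follows. This is only an illustration: the names `SymTab`, `Entry` and the `st_` prefix are hypothetical, the single `int` attribute stands in for whatever attributes a real compiler keeps, and the unordered linked list behind the interface is just the simplest possible choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: Entry, SymTab, st_*. */
typedef struct Entry {
    char *name;            /* identifier text (owned copy) */
    int type;              /* one illustrative attribute */
    struct Entry *next;
} Entry;

typedef struct SymTab { Entry *head; } SymTab;

/* allocate: create a new empty table (NULL if out of memory) */
SymTab *st_allocate(void) {
    return calloc(1, sizeof(SymTab));
}

/* lookup: return the entry for name, or NULL if it is absent */
Entry *st_lookup(const SymTab *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (Entry *e = t->head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* insert: add name if it is missing; return its entry either way */
Entry *st_insert(SymTab *t, const char *name) {
    Entry *e = st_lookup(t, name);
    if (e != NULL) return e;
    e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) { free(e); return NULL; }
    strcpy(e->name, name);
    e->type = 0;
    e->next = t->head;     /* link at the front of the list */
    t->head = e;
    return e;
}

void st_set_attribute(Entry *e, int type) { e->type = type; }
int st_get_attribute(const Entry *e) { return e->type; }

/* free: remove all entries and release the table itself */
void st_free(SymTab *t) {
    if (t == NULL) return;
    Entry *e = t->head;
    while (e != NULL) {
        Entry *next = e->next;
        free(e->name);
        free(e);
        e = next;
    }
    free(t);
}
```

Because callers only see `SymTab *` and `Entry *`, the list could later be swapped for a hash table without changing any call site, and several tables can exist at the same time.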
Basic Implementation Techniques
• First consideration is how to insert and look up names
• Variety of implementation techniques
• Unordered List
  - Simplest to implement
  - Implemented as an array or a linked list
  - Linked list can grow dynamically, which alleviates the problem of a fixed-size array
  - Insertion is fast, O(1), but lookup is slow for large tables: O(n) on average

• Ordered List
  - If an array is sorted, it can be searched using binary search: O(log₂ n)
  - Insertion into a sorted array is expensive: O(n) on average
  - Useful when the set of names is known in advance, e.g. a table of reserved words

• Binary Search Tree
  - Can grow dynamically
  - Insertion and lookup are O(log₂ n) on average
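
As an illustration of the ordered-list case, a reserved-word table can be kept sorted and searched with the standard library's `bsearch`. The word list below is a small made-up subset, and `is_reserved` and `cmp_word` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of reserved words; the array must stay sorted. */
static const char *const reserved[] = {
    "else", "for", "if", "int", "return", "while"
};

/* bsearch passes the key first and a pointer to an array element second. */
static int cmp_word(const void *key, const void *elem) {
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* Returns 1 if name is a reserved word, 0 otherwise. */
int is_reserved(const char *name) {
    assert(name != NULL);
    return bsearch(name, reserved, sizeof reserved / sizeof reserved[0],
                   sizeof reserved[0], cmp_word) != NULL;
}
```

Since the set of words is fixed before compilation starts, the O(n) insertion cost of a sorted array never arises here.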

Hash Tables and Hash Functions
• A hash table is an array with index range: 0 to TableSize – 1

• Most commonly used data structure to implement symbol tables

• Insertion and lookup can be made very fast: O(1) on average

• A hash function maps an identifier name into a table index
  - A hash function, h(name), should depend solely on name
  - h(name) should be computed quickly
  - h should be uniform and randomizing in distributing names
  - All table indices should be mapped with equal probability
  - Similar names should not cluster to the same table index

Hash Functions
• Hash functions can be defined in many ways . . .

• A string can be treated as a sequence of integer words
  - Several characters are fit into an integer word
  - Strings longer than one word are folded using exclusive-or or addition
  - Hash value is obtained by taking integer word modulo TableSize

• We can also compute a hash value character by character:
  - h(name) = (c_0 + c_1 + … + c_{n-1}) mod TableSize, where n is name length
  - h(name) = (c_0 * c_1 * … * c_{n-1}) mod TableSize
  - h(name) = (c_{n-1} + α(c_{n-2} + … + α(c_1 + α c_0))) mod TableSize, where α is a constant multiplier
  - h(name) = (c_0 * c_{n-1} * n) mod TableSize

Implementing a Hash Function
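// Hash string s
// Hash value = (s[n-1] + 16(s[n-2] + .. + 16(s[1] + 16 s[0])))
// Return hash value (independent of table size)

unsigned hash(const char* s) {
    unsigned hval = 0;
    while (*s != '\0') {
        // Shifting left by 4 multiplies by 16; the cast keeps
        // characters above 127 from being sign-extended
        hval = (hval << 4) + (unsigned char) *s;
        s++;
    }
    return hval;
}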

Another Hash Function
// Treat string s as an array of unsigned integers
// Fold array into an unsigned integer using addition
// Return hash value (independent of table size)
// Assumes that unsigned is 4 bytes wide
#include <string.h>

unsigned hash(const char* s) {
    unsigned hval = 0;
    while (s[0]!=0 && s[1]!=0 && s[2]!=0 && s[3]!=0) {
        unsigned u;
        memcpy(&u, s, 4);   // copy 4 characters; avoids a misaligned read
        hval += u; s += 4;
    }
    // Last 3 characters are handled in a special way
    if (s[0] == 0) return hval;
    hval += (unsigned char) s[0];
    if (s[1] == 0) return hval;
    hval += (unsigned) (unsigned char) s[1] << 8;
    if (s[2] == 0) return hval;
    hval += (unsigned) (unsigned char) s[2] << 16;
    return hval;
}
Resolving Collisions – Open Addressing
• A collision occurs when h(name1) = h(name2) and name1 ≠ name2
• Collisions are inevitable because
  - The name space of identifiers is much larger than the table size

• How to deal with collisions?
  - If entry h(name) is occupied, try h2(name), h3(name), etc.
  - This approach is called open addressing
  - Linear probing:
    - h2(name) can be (h(name) + 1) mod TableSize
    - h3(name) can be (h(name) + 2) mod TableSize

Hash Value      Name    Attributes
0               sort
1
2               size
.               j
.               a
TableSize – 1
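
A minimal sketch of open addressing with linear probing, assuming a tiny fixed table and storing only the name pointer in each slot. `TABLE_SIZE`, `probe_insert`, `probe_lookup` and `hash_str` are hypothetical names, and the hash is the shift-and-add function shown earlier.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 8   /* hypothetical, tiny on purpose */

/* NULL marks an empty slot. Names are not copied, so the caller
   must keep each string alive. */
static const char *table[TABLE_SIZE];

static unsigned hash_str(const char *s) {
    unsigned h = 0;
    while (*s != '\0') h = (h << 4) + (unsigned char) *s++;
    return h;
}

/* Returns the slot holding name, inserting it if absent;
   returns -1 if the table is full. */
int probe_insert(const char *name) {
    assert(name != NULL);
    unsigned start = hash_str(name) % TABLE_SIZE;
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned slot = (start + i) % TABLE_SIZE;   /* h, h+1, h+2, ... */
        if (table[slot] == NULL) { table[slot] = name; return (int) slot; }
        if (strcmp(table[slot], name) == 0) return (int) slot;
    }
    return -1;
}

/* Returns the slot holding name, or -1 if it is not in the table. */
int probe_lookup(const char *name) {
    assert(name != NULL);
    unsigned start = hash_str(name) % TABLE_SIZE;
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned slot = (start + i) % TABLE_SIZE;
        if (table[slot] == NULL) return -1;   /* an empty slot ends the probe */
        if (strcmp(table[slot], name) == 0) return (int) slot;
    }
    return -1;
}
```

Stopping at the first empty slot is only valid because this sketch never deletes entries; supporting deletion would need some extra marker for vacated slots.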

Chaining by Separate Lists
• Drawbacks of open addressing:
  - As the array fills, collisions become more frequent, which reduces performance
  - Table size is an issue: dynamically increasing the table size is difficult

• An alternative to open addressing is chaining by separate lists
  - The hash table is an array of pointers to linked lists called buckets
  - Collisions are resolved by inserting a new identifier into a linked list
  - Number of identifiers is no longer restricted to table size
  - Lookup is O(n/TableSize) when number of identifiers exceeds TableSize

Hash Value      Chain of entries (each entry: Name, Attrib, Next)
0               sort
1
2               j -> size
.
.
.               a
TableSize – 1
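
Chaining by separate lists could be sketched as below. The table is deliberately small so that chains form; `Node`, `chain_insert`, `chain_lookup` and `hash_str` are hypothetical names, and attributes are omitted to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4   /* hypothetical; small so that chains form */

typedef struct Node {
    const char *name;      /* not copied: the caller keeps the string alive */
    struct Node *next;
} Node;

static Node *buckets[TABLE_SIZE];   /* each bucket heads a linked list */

static unsigned hash_str(const char *s) {
    unsigned h = 0;
    while (*s != '\0') h = (h << 4) + (unsigned char) *s++;
    return h;
}

/* Walk only the chain that name hashes to. */
Node *chain_lookup(const char *name) {
    assert(name != NULL);
    for (Node *n = buckets[hash_str(name) % TABLE_SIZE]; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0) return n;
    return NULL;
}

/* Insert at the front of the chain unless the name is already present. */
Node *chain_insert(const char *name) {
    Node *n = chain_lookup(name);
    if (n != NULL) return n;
    n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    unsigned b = hash_str(name) % TABLE_SIZE;
    n->name = name;
    n->next = buckets[b];
    buckets[b] = n;
    return n;
}
```

With six names in four buckets at least one chain holds two entries, yet every name is still found, which is the sense in which the number of identifiers is no longer limited by the table size.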

Definition
Symbol table: A data structure used by a compiler to keep
track of the semantics of names.
• Data type.
• When it is used: scope.
. The effective context where a name is valid.
• Where it is stored: storage address.
Operations:
• Search: whether a name has been used.
• Insert: add a name.
• Delete: remove a name when its scope is closed.

Compiler notes #5, 20060512, Tsan-sheng Hsu 2


Some possible implementations
Unordered list:
. for a very small set of variables;
. coding is easy, but performance is bad for large number of variables.

Ordered linear list:


. use binary search;
. insertion and deletion are expensive;
. coding is relatively easy.

Binary search tree:


. O(log n) time per operation (search, insert or delete) for n variables;
. coding is relatively difficult.

Hash table:
. most commonly used;
. very efficient provided the memory space is adequately larger than the number
of variables;
. performance may be bad if unlucky or the table is saturated;
. coding is not too difficult.



Hash table
Hash function h(n): returns a value from 0, . . . , m − 1, where n
is the input name and m is the hash table size.
• Uniformly and randomly.
Many possible good designs.
• Add up the integer values of characters in a name and then take the
remainder of it divided by m.
• Add up a linear combination of integer values of characters in a name,
and then take the remainder of it divided by m.
Resolving collisions:
• Linear resolution: try (h(n) + 1) mod m, where m is a large prime
number, and then (h(n) + 2) mod m, . . ., (h(n) + i) mod m.
• Chaining: most popular.
. Keep a chain on the items with the same hash value.
. Open hashing.

• Quadratic-rehashing:
. try (h(n) + 1^2) mod m, and then
. try (h(n) + 2^2) mod m, . . .,
. try (h(n) + i^2) mod m.
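
The two probe sequences can be compared side by side with a pair of small helper functions. Both names are hypothetical; `h` is the hash value of the name, `i` the attempt number and `m` the table size.

```c
#include <assert.h>

/* i-th position tried by linear resolution: (h + i) mod m */
unsigned linear_probe(unsigned h, unsigned i, unsigned m) {
    assert(m > 0);
    return (unsigned) (((unsigned long long) h + i) % m);
}

/* i-th position tried by quadratic rehashing: (h + i*i) mod m */
unsigned quadratic_probe(unsigned h, unsigned i, unsigned m) {
    assert(m > 0);
    unsigned long long sq = (unsigned long long) i * i;   /* avoid overflow */
    return (unsigned) ((h % m + sq % m) % m);
}
```

For example, with h = 7 and m = 11 the linear sequence visits 8, 9, 10 while the quadratic sequence visits 8, 0, 5, so entries that collide at one index are spread further apart.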



Performance of hash table
Performance issues on using different collision resolution
schemes.
Hash table size must be adequately larger than the maximum
number of possible entries.
Frequently used names should hash to distinct values.
• Keywords or reserved words.
• Short names, e.g., i, j and k.
• Frequently used identifiers, e.g., main.
Hash values should be uniformly distributed.



Contents in a symbol table
Possible entries in a symbol table:
• Name: a string.
• Attribute:
. Reserved word
. Variable name
. Type name
. Procedure name
. Constant name
. ···
• Data type.
• Storage allocation, size, . . .
• Scope information: where and when it can be used.
• ···



How names are stored
Fixed-length name: allocate a fixed amount of space for each name.
• Too little: names must be short.
• Too much: wastes a lot of space.

NAME                 ATTRIBUTES   STORAGE ADDR   ...
s o r t
a
r e a d a r r a y
i 2

Variable-length name:
• A single string space is used to store all names.
• For each name, store its length and starting index in the string space.

NAME              ATTRIBUTES   STORAGE ADDR   ...
index   length
0       5
5       2
7       10
17      3

index:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19
char:   s  o  r  t  $  a  $  r  e  a  d  a  r  r  a  y  $  i  2  $
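
The variable-length scheme above can be sketched with one character array and an (index, length) pair per name. The `$` terminator and the convention that the length includes it follow the table above; `POOL_SIZE`, `NameRef`, `pool_add` and `pool_equals` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 256   /* hypothetical capacity of the string space */

static char pool[POOL_SIZE];   /* all names, each followed by '$' */
static int pool_used = 0;

typedef struct { int index; int length; } NameRef;   /* length counts the '$' */

/* Appends name to the pool; the returned index is -1 if it does not fit. */
NameRef pool_add(const char *name) {
    assert(name != NULL);
    NameRef r = { -1, 0 };
    int len = (int) strlen(name);
    if (pool_used + len + 1 > POOL_SIZE) return r;
    memcpy(pool + pool_used, name, (size_t) len);
    pool[pool_used + len] = '$';
    r.index = pool_used;
    r.length = len + 1;
    pool_used += len + 1;
    return r;
}

/* Compares a stored name with an ordinary C string. */
int pool_equals(NameRef r, const char *name) {
    size_t len = strlen(name);
    return r.index >= 0 && (size_t) r.length == len + 1
        && memcmp(pool + r.index, name, len) == 0;
}
```

Adding `sort`, `a`, `readarray` and `i2` in that order yields the pairs (0, 5), (5, 2), (7, 10) and (17, 3), matching the table.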



Handling block structures
main()                        /* C code */
{                             /* open a new scope */
    int H,A,L;                /* parse point A */
    ...
    {                         /* open another new scope */
        float x,y,H;          /* parse point B */
        ...
        /* x and y can only be used here */
        /* H used here is float */
        ...
    }                         /* close an old scope */
    ...
    /* H used here is integer */
    ...
    {   char A,C,M;           /* parse point C */
        ...
    }
}

Nested blocks mean nested scopes.


Two major ways for implementation:
• Approach 1: multiple symbol tables in one stack.
• Approach 2: one symbol table with chaining.



Multiple symbol tables in one stack
An individual symbol table for each scope.
• Use a stack to maintain the current scope.
• Search top of stack first.
• If not found, search the next one in the stack.
• Use the first one matched.
• Note: a popped scope can be destroyed in a one-pass compiler, but it
must be saved in a multi-pass compiler.
Figure: stack of symbol tables (S.T.) at each parse point of the code in
"Handling block structures". The searching direction is from the top of
the stack downward.

                 parse point A     parse point B     parse point C
top of stack     S.T. for H,A,L    S.T. for x,y,H    S.T. for A,C,M
below                              S.T. for H,A,L    S.T. for H,A,L
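
A minimal sketch of the stack-of-tables approach, assuming small fixed-size arrays in place of real per-scope tables. All names here (`Scope`, `open_scope`, `close_scope`, `declare`, `lookup_type`) are hypothetical, and a popped scope is simply discarded, as a one-pass compiler could do.

```c
#include <assert.h>
#include <string.h>

#define MAX_SCOPES 16
#define MAX_NAMES  16

typedef struct { const char *name; const char *type; } Sym;
typedef struct { Sym syms[MAX_NAMES]; int count; } Scope;

static Scope stack[MAX_SCOPES];
static int depth = 0;    /* number of currently open scopes */

void open_scope(void)  { assert(depth < MAX_SCOPES); stack[depth++].count = 0; }
void close_scope(void) { assert(depth > 0); depth--; }

/* Enter a name into the table on top of the stack. */
void declare(const char *name, const char *type) {
    assert(depth > 0 && stack[depth - 1].count < MAX_NAMES);
    Scope *s = &stack[depth - 1];
    s->syms[s->count].name = name;
    s->syms[s->count].type = type;
    s->count++;
}

/* Search the top table first, then each enclosing one; first match wins. */
const char *lookup_type(const char *name) {
    for (int d = depth - 1; d >= 0; d--)
        for (int i = 0; i < stack[d].count; i++)
            if (strcmp(stack[d].syms[i].name, name) == 0)
                return stack[d].syms[i].type;
    return NULL;
}
```

Replaying the example, `H` resolves to float at parse point B and to int again once the inner scope is closed, while `x` stops being visible.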



Pros and cons for multiple symbol tables
Advantage:
• Easy to close a scope.
Disadvantage: Difficulties are encountered when a new scope is
opened.
• Need to allocate an adequate number of entries for each symbol table if
it is a hash table.
. Wastes a lot of space.
. A block within a procedure does not usually have many local variables.
. There may be many global variables, and many local variables when
a procedure is entered.



One symbol table with chaining (1/2)
A single global table marked with the scope information.
. Each scope is given a unique scope number.
. Incorporate the scope number into the symbol table.

Two possible codings (among others):


• Hash table with chaining.
. Chaining at the front when names are hashed into the same location.

Figure (symbol table: hash with chaining) for the code in "Handling block
structures". Each entry is written name(scope number); an arrow points from
the front of a chain to the next entry.

parse point B          parse point C
H(2) -> H(1)           H(1)
L(1)                   L(1)
x(2)                   C(3)
y(2)                   M(3)
A(1)                   A(3) -> A(1)
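
The single-table approach with scope numbers and chaining at the front could be sketched as follows. The names `Sym`, `sc_declare`, `sc_lookup`, `sc_close_scope` and `hash_str` are hypothetical. Closing a scope here scans every bucket, which is one simple option; it relies on scopes being closed innermost first, so that the entries of the closing scope sit at the front of their chains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 7   /* hypothetical table size */

typedef struct Sym {
    const char *name;      /* not copied: the caller keeps the string alive */
    int scope;             /* unique number of the declaring scope */
    struct Sym *next;
} Sym;

static Sym *buckets[TABLE_SIZE];

static unsigned hash_str(const char *s) {
    unsigned h = 0;
    while (*s != '\0') h = (h << 4) + (unsigned char) *s++;
    return h;
}

/* New entries go to the front, so inner declarations shadow outer ones. */
Sym *sc_declare(const char *name, int scope) {
    assert(name != NULL);
    Sym *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    unsigned b = hash_str(name) % TABLE_SIZE;
    s->name = name;
    s->scope = scope;
    s->next = buckets[b];
    buckets[b] = s;
    return s;
}

/* The first match on the chain is the innermost visible declaration. */
Sym *sc_lookup(const char *name) {
    for (Sym *s = buckets[hash_str(name) % TABLE_SIZE]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

/* Unlink and free the entries of the innermost scope from every chain. */
void sc_close_scope(int scope) {
    for (int b = 0; b < TABLE_SIZE; b++)
        while (buckets[b] != NULL && buckets[b]->scope == scope) {
            Sym *dead = buckets[b];
            buckets[b] = dead->next;
            free(dead);
        }
}
```

Freeing the entries suits a one-pass compiler only; a multi-pass compiler would have to keep them on a per-scope list instead of discarding them.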



One symbol table with chaining (2/2)
A second coding choice:
• Binary search tree with chaining.
. Use a doubly linked list to chain all entries with the same name.
Figure: binary search tree with chaining at two parse points of the code in
"Handling block structures". Each entry is written name(scope number), entries
joined by a dash share a name and are chained together, and each row is one
level of the tree.

            parse point B         parse point C
level 0     H(2) - H(1)           H(1)
level 1     A(1), L(1)            A(1) - A(3), L(1)
level 2     x(2)                  C(3), M(3)
level 3     y(2)



Pros and cons for a unique symbol table
Advantage:
• Does not waste space.
• Little overhead in opening a scope.
Disadvantage: It is difficult to close a scope.
• Need to maintain a list of entries in the same scope.
• Use this list to close a scope and to reactivate it for the second pass if
needed.



Records and fields
The “with” construct in PASCAL can be considered an
additional scope rule.
• Field names are visible in the scope that surrounds the record declaration.
• Field names need only be unique within the record.
Another example is the “using namespace” directive in C++.
Example (PASCAL code):
A, R: record
        A: integer;
        X: record
             A: real;
             C: boolean;
           end
      end
...
R.A := 3;      /* means R.A := 3; */
with R do
  A := 4;      /* means R.A := 4; */
...



Implementation of field names
Two choices for handling field names:
• Allocate a symbol table for each record type used.

main symbol table
  A   record  -> another symbol table
                   A   integer
                   X   record  -> another symbol table
                                    A   real
                                    C   boolean
  R   record  -> another symbol table
                   A   integer
                   X   record  -> another symbol table
                                    A   real
                                    C   boolean

• Associate a record number with the field names.
. Assign record number #0 to names that are not in records.
. A bit time consuming in searching the symbol table.
. Similar to the scope numbering technique.



Locating field names
Example:
with R do
begin
  A := 3;
  with X do
    A := 3.3
end
If each record (each scope) has its own symbol table,
• then push the symbol table for the record onto the stack.
If the record number technique is used,
• then keep a stack containing the current record number;
• during searching, succeed only if an entry matches both the name and the
current record number;
• if the search fails, use the next record number in the stack as the current
record number and continue to search;
• if everything fails, search the normal main symbol table.
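
The record-number search could be sketched like this. The table contents mirror the PASCAL example, but the concrete record numbers (1 for the record type of R, 2 for the nested record X) and all identifiers (`Field`, `enter_with`, `leave_with`, `resolve`) are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *name; int record; const char *type; } Field;

/* Record #0 holds names that are not in any record. */
static const Field table[] = {
    { "A", 0, "record"  },
    { "R", 0, "record"  },
    { "A", 1, "integer" },
    { "X", 1, "record"  },
    { "A", 2, "real"    },
    { "C", 2, "boolean" },
};

#define MAX_WITH 8
static int with_stack[MAX_WITH];   /* record numbers of the open "with" blocks */
static int with_depth = 0;

void enter_with(int record) { assert(with_depth < MAX_WITH); with_stack[with_depth++] = record; }
void leave_with(void)       { assert(with_depth > 0); with_depth--; }

/* Succeeds only if both the name and the record number match. */
static const Field *find(const char *name, int record) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].record == record && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Try each open record from innermost to outermost, then record #0. */
const Field *resolve(const char *name) {
    assert(name != NULL);
    for (int d = with_depth - 1; d >= 0; d--) {
        const Field *f = find(name, with_stack[d]);
        if (f != NULL) return f;
    }
    return find(name, 0);
}
```

Inside `with R do`, `A` resolves to the integer field; inside the nested `with X do`, it resolves to the real field, and `X` itself is still found through the enclosing record number.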
