Unit 3 Symbol Table Management
Symbol Tables
❖ A symbol table is a major data structure used in a compiler:
◆ Associates attributes with identifiers used in a program
◆ For instance, a type attribute is usually associated with each identifier
◆ A symbol table is a necessary component
# Definition (declaration) of identifiers appears once in a program
# Use of identifiers may appear in many places of the program text
◆ Identifiers and attributes are entered by the analysis phases
# When processing a definition (declaration) of an identifier
# In simple languages with only global variables and implicit declarations:
• The scanner can enter an identifier into a symbol table if it is not already there
# In block-structured languages with scopes and explicit declarations:
• The parser and/or semantic analyzer enter identifiers and corresponding attributes
Symbol Table Interface
❖ The basic operations defined on a symbol table include:
◆ allocate – to allocate a new empty symbol table
◆ free – to remove all entries and free the storage of a symbol table
◆ insert – to insert a name in a symbol table and return a pointer to its entry
◆ lookup – to search for a name and return a pointer to its entry
◆ set_attribute – to associate an attribute with a given entry
◆ get_attribute – to get an attribute associated with a given entry
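The operations above can be sketched as a small C++ interface. This is only a minimal sketch: it assumes std::unordered_map as the backing store and string-valued attributes, and the names SymbolTable, Entry, and the attribute keys are illustrative, not taken from the slides. Here allocate and free correspond to constructing and destroying a SymbolTable object.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical entry: a name plus a map of attribute values (illustration only).
struct Entry {
    std::string name;
    std::unordered_map<std::string, std::string> attributes;
};

class SymbolTable {
public:
    // insert: add name if absent and return a pointer to its entry.
    Entry* insert(const std::string& name) {
        auto result = table.emplace(name, Entry{name, {}});
        return &result.first->second;
    }
    // lookup: return a pointer to the entry, or nullptr if not found.
    Entry* lookup(const std::string& name) {
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, Entry> table;
};

// set_attribute: associate an attribute value with a given entry.
inline void set_attribute(Entry* e, const std::string& key, const std::string& value) {
    e->attributes[key] = value;
}

// get_attribute: return the attribute value, or an empty string if unset.
inline std::string get_attribute(const Entry* e, const std::string& key) {
    auto it = e->attributes.find(key);
    return it == e->attributes.end() ? std::string() : it->second;
}
```

For example, after Entry* e = table.insert("x"); set_attribute(e, "type", "int"); a later lookup("x") returns the same entry, whose "type" attribute is "int".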
Basic Implementation Techniques
❖ The first consideration is how to insert and look up names
❖ Variety of implementation techniques
❖ Unordered List
◆ Simplest to implement
◆ Implemented as an array or a linked list
◆ Linked list can grow dynamically – alleviates problem of a fixed size array
◆ Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
❖ Ordered List
◆ If an array is sorted, it can be searched using binary search – O(log₂ n)
◆ Insertion into a sorted array is expensive – O(n) on average
◆ Useful when set of names is known in advance – table of reserved words
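The ordered-list case can be illustrated with a table of reserved words that is fixed in advance. The sketch below assumes a small, made-up subset of keywords and a hypothetical function name isReserved; it relies on std::binary_search from the standard library.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Illustrative subset of reserved words; it must stay sorted for binary search.
static const char* const RESERVED[] = {
    "break", "else", "for", "if", "return", "while"
};

// Return true if name is a reserved word, using O(log n) binary search.
bool isReserved(const char* name) {
    auto less = [](const char* a, const char* b) { return std::strcmp(a, b) < 0; };
    return std::binary_search(std::begin(RESERVED), std::end(RESERVED), name, less);
}
```

Because the set of names never changes, the O(n) cost of inserting into a sorted array is never paid at run time.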
Hash Tables and Hash Functions
❖ A hash table is an array with index range: 0 to TableSize – 1
❖ Most commonly used data structure to implement symbol tables
❖ Insertion and lookup can be made very fast – O(1) on average
❖ A hash function maps an identifier name into a table index
◆ A hash function, h(name), should depend solely on name
◆ h(name) should be computed quickly
◆ h should be uniform and randomizing in distributing names
◆ All table indices should be mapped with equal probability
◆ Similar names should not cluster to the same table index
Hash Functions
❖ Hash functions can be defined in many ways . . .
❖ A string can be treated as a sequence of integer words
◆ Several characters fit into one integer word
◆ Strings longer than one word are folded using exclusive-or or addition
◆ Hash value is obtained by taking the folded word modulo TableSize
❖ We can also compute a hash value character by character:
◆ h(name) = (c₀ + c₁ + … + cₙ₋₁) mod TableSize, where n is name length
◆ h(name) = (c₀ * c₁ * … * cₙ₋₁) mod TableSize
◆ h(name) = (cₙ₋₁ + a (cₙ₋₂ + … + a (c₁ + a c₀))) mod TableSize
◆ h(name) = (c₀ * cₙ₋₁ * n) mod TableSize
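Two of the character-by-character formulas above can be sketched as follows. The function names additiveHash and polynomialHash and the choice to pass the table size as a parameter are assumptions made for illustration. The polynomial version uses Horner's rule, and its intermediate value wraps around on unsigned overflow, so for long names it matches the formula only modulo the word size.

```cpp
#include <string>

// Additive hash: h(name) = (c0 + c1 + ... + c(n-1)) mod tableSize.
unsigned additiveHash(const std::string& name, unsigned tableSize) {
    unsigned sum = 0;
    for (unsigned char c : name) sum += c;
    return sum % tableSize;
}

// Polynomial hash evaluated by Horner's rule:
// h(name) = (c(n-1) + a(c(n-2) + ... + a(c1 + a*c0))) mod tableSize.
// Unsigned overflow wraps around, which is well defined in C++.
unsigned polynomialHash(const std::string& name, unsigned a, unsigned tableSize) {
    unsigned h = 0;
    for (unsigned char c : name) h = h * a + c;
    return h % tableSize;
}
```

With ASCII character codes and a table size of 1021, additiveHash maps both "ab" and "ba" to 195, so names that are permutations of each other cluster at the same index. polynomialHash with a = 16 gives 629 for "ab" and 644 for "ba", because it depends on character order.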
Implementing a Hash Function
// Hash string s
// Hash value = (s[n-1] + 16*(s[n-2] + ... + 16*(s[1] + 16*s[0])))
// Return hash value (independent of table size)
unsigned hash(char* s) {
  unsigned hval = 0;
  while (*s != '\0') {
    hval = (hval << 4) + *s;
    s++;
  }
  return hval;
}
Another Hash Function
  // Last 3 characters are handled in a special way
  hval += s[0];
  if (s[1] == 0) return hval;
  hval += s[1]<<8;
  if (s[2] == 0) return hval;
  hval += s[2]<<16;
  return hval;
}
Resolving Collisions – Open Addressing
❖ A collision occurs when h(name1) = h(name2) and name1 ≠ name2
❖ Collisions are inevitable because
◆ The name space of identifiers is much larger than the table size
❖ How to deal with collisions?
◆ If entry h(name) is occupied, try h2(name), h3(name), etc.
◆ This approach is called open addressing
◆ h2(name) can be (h(name) + 1) mod TableSize
◆ h3(name) can be (h(name) + 2) mod TableSize
◆ Probing consecutive entries in this way is called linear probing
Hash Value       Name    Attributes
0                sort
1
2                size
.                j
.                a
TableSize – 1
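Linear probing as described above can be sketched as follows. The class name ProbingTable, the additive hash, and the convention of returning -1 when the table is full are assumptions made for illustration; the table size is assumed to be greater than zero.

```cpp
#include <string>
#include <vector>

// Open-addressing table with linear probing (illustrative sketch).
class ProbingTable {
public:
    explicit ProbingTable(unsigned size) : slots(size), used(size, false) {}

    // Return the slot index holding name, inserting it if absent.
    // Return -1 if the table is full and name is not present.
    int lookupInsert(const std::string& name) {
        unsigned size = static_cast<unsigned>(slots.size());
        unsigned start = hash(name) % size;
        for (unsigned i = 0; i < size; i++) {
            unsigned index = (start + i) % size;   // h, h+1, h+2, ... mod size
            if (!used[index]) {                    // free entry: insert here
                slots[index] = name;
                used[index] = true;
                return static_cast<int>(index);
            }
            if (slots[index] == name) return static_cast<int>(index);
        }
        return -1;                                 // every entry was probed
    }

private:
    // Assumed hash: sum of character codes, chosen so collisions are easy to see.
    static unsigned hash(const std::string& name) {
        unsigned sum = 0;
        for (unsigned char c : name) sum += c;
        return sum;
    }
    std::vector<std::string> slots;
    std::vector<bool> used;
};
```

In a table of size 4, "ab" hashes to index 3. "ba" collides with it and is placed at (3 + 1) mod 4 = 0, the next free entry.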
Chaining by Separate Lists
❖ Drawbacks of open addressing:
◆ As the array fills, collisions become more frequent – reduced performance
◆ Table size is an issue – dynamically increasing the table size is a difficulty
Symbol Class Definition
class Symbol {                   // Symbol class definition
  friend class Table;            // To access private members
public:
  Symbol(char* s);               // Initialize symbol with name s
  ~Symbol();                     // Delete name and clear pointers
  const char* id();              // Return pointer to symbol name
  Symbol* nextinlist();          // Next symbol in list
  Symbol* nextinbucket();        // Next symbol in bucket
  . . .                          // Other methods
private:
  char* name;                    // Symbol name
  Symbol* list;                  // Next symbol in list
  Symbol* next;                  // Next symbol in bucket
};
Symbol Class Implementation
// Initialize symbol and copy s
Symbol::Symbol(char* s) {
  name = new char[strlen(s)+1];
  strcpy(name,s);
  next = list = 0;
}
// Delete name and clear pointers
Symbol::~Symbol() {
  delete [] name;
  name = 0;
  next = list = 0;
}
const char* Symbol::id() {return name;}
Symbol* Symbol::nextinbucket() {return next;}
Symbol* Symbol::nextinlist() {return list;}
Symbol Table Class Definition
const unsigned HT_SIZE = 1021;         // Hash Table Size
class Table {                          // Symbol Table class
public:
  Table();                             // Initialize table
  Symbol* clear();                     // Clear symbol table
  Symbol* lookup(char*s);              // Lookup name s
  Symbol* lookup(char*s,unsigned h);   // Lookup s with hash h
  Symbol* insert(char*s,unsigned h);   // Insert s with hash h
  Symbol* lookupInsert(char*s);        // Lookup and insert s
  Symbol* symlist() {return first;}    // List of symbols
  unsigned symbols(){return count;}    // Symbol count
  . . .                                // Other methods
private:
  Symbol* ht[HT_SIZE];                 // Hash table
  Symbol* first;                       // First inserted symbol
  Symbol* last;                        // Last inserted symbol
  unsigned count;                      // Symbol count
};
Initialize and Clear a Symbol Table
// Initialize a symbol table
Table::Table() {
  for (unsigned i=0; i<HT_SIZE; i++) ht[i] = 0;
  first = last = 0;
  count = 0;
}
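Only the constructor is shown here; the body of clear() does not appear. One plausible way to clear such a table is to walk the list of symbols in insertion order, delete each one, and reset the buckets. The sketch below is an assumption, not the original implementation: it uses simplified stand-in types with the hypothetical names Sym and SymTable, and its clear() returns void because what the declared Symbol* return value should hold is not shown.

```cpp
// Simplified stand-ins for Symbol and Table (hypothetical names, no name field).
struct Sym {
    Sym* next = nullptr;   // next symbol in the same bucket
    Sym* list = nullptr;   // next symbol in insertion order
};

struct SymTable {
    static const unsigned SIZE = 8;   // small table for illustration
    Sym* ht[SIZE] = {};
    Sym* first = nullptr;
    Sym* last = nullptr;
    unsigned count = 0;

    // Insert a fresh symbol into bucket h % SIZE and append it to the list.
    Sym* insert(unsigned h) {
        Sym* sym = new Sym;
        sym->next = ht[h % SIZE];
        ht[h % SIZE] = sym;
        if (count == 0) first = last = sym;
        else { last->list = sym; last = sym; }
        count++;
        return sym;
    }

    // Delete every symbol by walking the insertion list, then reset the table.
    void clear() {
        Sym* sym = first;
        while (sym != nullptr) {
            Sym* following = sym->list;   // save the link before deleting
            delete sym;
            sym = following;
        }
        for (unsigned i = 0; i < SIZE; i++) ht[i] = nullptr;
        first = last = nullptr;
        count = 0;
    }
};
```

Walking the insertion list visits each symbol exactly once, so there is no need to traverse the individual bucket lists.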
Lookup a Name in a Symbol Table
// Lookup name s in symbol table
// Return pointer to found symbol
// Return NULL if symbol not found
Symbol* Table::lookup(char* s) {
  unsigned h = hash(s);
  return lookup(s,h);
}
// Lookup name s with hash value h
// Hash value is passed to avoid its computation
Symbol* Table::lookup(char* s, unsigned h) {
  unsigned index = h % HT_SIZE;
  Symbol* sym = ht[index];
  while (sym != 0) {
    if (strcmp(sym->name, s) == 0) break;
    sym = sym->next;
  }
  return sym;
}
Insert a Name into a Symbol Table
// Insert name s with a given hash value h
// New symbol is allocated
// New symbol is inserted at front of a bucket list
// New symbol is also linked at end of symbol list in table
// Return pointer to newly allocated symbol
Symbol* Table::insert(char* s, unsigned h) {
  unsigned index = h % HT_SIZE;
  Symbol* sym = new Symbol(s);
  sym->next = ht[index];
  ht[index] = sym;
  if (count == 0) { first = last = sym; }
  else {
    last->list = sym;
    last = sym;
  }
  count++;
  return sym;
}
Illustrating Symbol Insertion
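[Figure: Table Structure. The hash table ht[0] to ht[HT_SIZE-1] points to bucket lists holding the symbols "i", "n", "a", and "main". Each symbol has name, list, and next fields; first and last point into the symbol list; count is 4; the last inserted symbol is shown in blue.]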
Lookup and then Insert a Name
// Lookup first and then Insert name s
// If name s exists then return pointer to its symbol
// Otherwise, insert a new symbol and copy name s
// Return address of newly added symbol
Symbol* Table::lookupInsert(char* s) {
  unsigned h = hash(s);        // Computed once
  Symbol* sym;
  sym = lookup(s,h);           // Locate symbol first
  if (sym == 0) {              // If not found
    sym = insert(s,h);         // Insert a new symbol
  }
  return sym;
}
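The behavior of lookupInsert can be tried out with a compact, self-contained stand-in for the classes above. This is a sketch under assumptions: the names Node and MiniTable and the helper names() are hypothetical, std::string replaces the manually managed char array, and lookup and insert are merged into one method for brevity.

```cpp
#include <string>
#include <vector>

// Compact stand-in for the Symbol/Table pair (hypothetical names).
struct Node {
    std::string name;
    Node* next = nullptr;   // next symbol in bucket
    Node* list = nullptr;   // next symbol in insertion order
};

class MiniTable {
public:
    MiniTable() = default;
    MiniTable(const MiniTable&) = delete;              // owns raw pointers
    MiniTable& operator=(const MiniTable&) = delete;
    ~MiniTable() {
        for (Node* n = first; n != nullptr; ) { Node* f = n->list; delete n; n = f; }
    }
    // Return the existing symbol for s, or insert a new one.
    Node* lookupInsert(const char* s) {
        unsigned index = hash(s) % SIZE;               // hash computed once
        for (Node* n = ht[index]; n != nullptr; n = n->next)
            if (n->name == s) return n;                // found in bucket
        Node* n = new Node;
        n->name = s;
        n->next = ht[index];                           // front of bucket list
        ht[index] = n;
        if (count == 0) first = last = n;              // end of symbol list
        else { last->list = n; last = n; }
        count++;
        return n;
    }
    // Names in insertion order, following the list links.
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (Node* n = first; n != nullptr; n = n->list) out.push_back(n->name);
        return out;
    }
    unsigned symbols() const { return count; }
private:
    // Same shift-and-add scheme as the hash function shown earlier.
    static unsigned hash(const char* s) {
        unsigned hval = 0;
        while (*s != '\0') { hval = (hval << 4) + static_cast<unsigned char>(*s); s++; }
        return hval;
    }
    static const unsigned SIZE = 1021;
    Node* ht[SIZE] = {};
    Node* first = nullptr;
    Node* last = nullptr;
    unsigned count = 0;
};
```

Calling lookupInsert("main"), then lookupInsert("i"), then lookupInsert("main") again returns the same pointer for both "main" calls. The table then holds two symbols, and names() lists them in insertion order: "main", "i".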