0% found this document useful (0 votes)
122 views11 pages

Grab Age Collector in C Language

This document discusses developing a garbage collector library for the C programming language. It aims to improve C's functionality by adding automatic dynamic memory management. The library includes routines that support garbage collection with an interface similar to standard library functions. The document covers myths and facts about garbage collection, common garbage collection algorithms like reference counting and tracing, the architecture and design of the proposed library, and its implementation and methodology. It aims to provide C programmers the benefits of automatic memory management without needing to track memory usage and freeing.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
122 views11 pages

Grab Age Collector in C Language

This document discusses developing a garbage collector library for the C programming language. It aims to improve C's functionality by adding automatic dynamic memory management. The library includes routines that support garbage collection with an interface similar to standard library functions. The document covers myths and facts about garbage collection, common garbage collection algorithms like reference counting and tracing, the architecture and design of the proposed library, and its implementation and methodology. It aims to provide C programmers the benefits of automatic memory management without needing to track memory usage and freeing.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 11

“A Garbage Collector For the

C Programming Language “

ABSTRACT

Garbage collection absolves the developer from tracking memory usage and knowing
when to free memory. If unreferenced locations are not collected, they pollute the memory
and may lead to exhaustion of available memory during the execution of the program.
Garbage collection (GC) is an important part of many modern language implementations.
The benefits of garbage collection in comparison with manual memory management
techniques are well known to programmers who have used high-level languages like Java and
C#. Software professionals estimate that the programming effort required to manually
perform dynamic memory management is approximately 40% of the total cost of developing
a large software system. Unfortunately, these benefits have not traditionally been available to
developers in ‘C’ programming language because of the lack of support for the GC feature.
In this paper, I proposed a library that builds in this feature, into the ‘C’ programming
language and thus improving its functionality. We have in this library, a set of routines that
embed the feature of automatic dynamic memory management in to the ‘C’ programming
language. These routines defined in our library form, the programmer’s Interface to our
garbage collector. The interface is simple and the syntax is very much similar to that of the
one already available in the standard library (stdlib.h) with very little modifications. In
addition to the routines that support GC we also have routines that support Finalizers amongst
others.
06K41A0548 A SEMINAR REPORT

CONTENTS

1. Introduction

a. Myths about GC

b. Facts about GC

2. Garbage collection algorithms

a. Reference counting collectors

b. Tracing collectors

3. Architecture and Design

4. Implementation and Methedoloty

a. Detailed algorithm adopted by our GC

5. Conclusion and Futre Improvements

6. Bibliography.

2
06K41A0548 A SEMINAR REPORT

INTRODUCTION
C is and has always been one of the most widely and popularly used programming
languages in the world. We, as a part of our B.Tech Degree course have done a lot of
programming in C and have always used the built in functions defined in the standard library
like “malloc”, “calloc” etc. to allocate memory and when the allocated memory is not needed
any more we use the built in function “free” to free up the allocated memory. What I have
attempted to improve the functionality of the ‘C’ programming language by developing a
library that supports garbage collection.
I was inspired by the presence of the GC features in JAVA and proceeded to develop
one for C. So now, the ‘C’ programmer is relived of the task of keeping track of what
memory locations are currently being referenced and what are not. If the programmer uses
our library for allocating memory dynamically, we will make sure (conservatively) that the
allocated memory is automatically freed (i.e.) garbage is collected when it is no longer
referenced.

Myths about GC
• GC is necessarily slower than manual memory management.
• GC will necessarily make my program pause.
• Manual memory management won't cause pauses.
• GC is incompatible with C.

Facts about GC
• Most allocated objects are dynamically referenced by a very small number of
Pointers. The most important small number is ONE.
• Most allocated objects have short lifetimes.
• Allocation patterns (size distributions, lifetime distributions) are bursty, not uniform.
• Optimal strategies can fail miserably.

3
06K41A0548 A SEMINAR REPORT

Garbage Collection Algorithms


Any garbage collection algorithm must ensure two basic things. First it must detect
garbage objects. Second it must reclaim the heap space used by the garbage objects and make
the space available again to the program.

Garbage detection is ordinarily accomplished by defining a set of roots and


determining reachability from the roots. An object is said to be reachable if there is some
path of reference from the roots by which the executing program can access the object. The
roots are always accessible to the program. Any object that is reachable from the roots is
considered to be live. Objects that are not reachable are considered garbage as they can no
longer affect the future course of program execution.

Some garbage collectors, however, may choose not to distinguish between genuine
object references and look-alikes. Such garbage collectors are called conservative because
they may not always free every unreferenced object. Sometimes a garbage object will be
wrongly considered to be live by a conservative collector, because an object reference look
alike referred to it. Conservative collectors trade-off an increase in garbage collection speed
for occasionally not freeing some actual garbage.

Two basic approaches to distinguishing live objects from garbage are reference
counting and tracing. Reference counting garbage collectors distinguish live objects from
garbage objects by keeping a count for each object on the heap. The count keeps track of the
number of references to that object. Tracing garbage collectors actually trace out the graph of
references starting with the root nodes. Objects that are encountered during the trace are
marked in some way. After the trace is complete, unmarked objects are known to be
unreachable and can be garbage collected

4
06K41A0548 A SEMINAR REPORT

Reference Counting Collectors

Reference counting was an early garbage collection strategy. In this approach, a


Reference count is maintained for each object on the heap. When an object is first created and
a reference to it is assigned to a variable, the object's reference count is set to one. When any
other variable is assigned a reference to that object, the object's count is incremented. When a
reference to an object goes out of scope or is assigned a new value, the object's count is
decremented. Any object with a reference count of zero can be garbage collected. When an
object is garbage collected, any objects that it refers to have their reference counts
decremented. In this way the garbage collection of one object may lead to the subsequent
garbage collection of other objects.
An advantage of this approach is that a reference counting collector can run in small
chunks of time closely interwoven with the execution of the program. This characteristic
makes it particularly suitable for real-time environments where the program can't be
interrupted for very long. A disadvantage is that reference counting does not detect cycles
wherein two or more objects that refer to one another. An example of a cycle is a parent
object that has a reference to a child object that has a reference back to the parent. These
objects will never have a reference count of zero even though they may be unreachable by the
roots of the executing program. Another disadvantage of reference counting is the overhead
of incrementing and decrementing the reference count each time. Because of the
disadvantages inherent in the reference counting approach, this technique is currently out of
favour.
Tracing Collectors
Tracing garbage collectors trace out the graph of object references starting with the
root nodes. Objects that are encountered during the trace are marked in some way. Marking is
generally done by either setting flags in the objects themselves or by setting flags in a
separate bitmap. After the trace is complete, unmarked objects are known to be unreachable
and can be garbage collected. The basic tracing algorithm is called "mark and sweep". This
name refers to the two phases of the garbage collection process. In the mark phase, the
garbage collector traverses the tree of references and marks each object it encounters. In the

5
06K41A0548 A SEMINAR REPORT

sweep phase, unmarked objects are freed, and the resulting memory is made available to the
executing program..

ARCHITECTURE AND DESIGN


Our garbage collection algorithm is invoked when any one of the following conditions
are met. The conditions are as follows,
a) Whenever the number of bytes that have been allocated by our allocation routine exceeds a
predefined limit. That is why we call our collector to be periodic and the programmer can set
this period if he wishes to. By default this value has been set to 640K bytes of memory.
b) Whenever the allocation routine is unable to allocate the memory as such. That is if malloc
returns NULL. In this case we call our collector to collect the garbage and then try to re-
satisfy the request for memory.
Our GC Library provides the following routines and they form the programmers
interface to the collector

a) void * GC_malloc( size_t gc_size )


Input : gc_size
Output : Starting address of the memory location that has just been allocated if successful
and NULL if allocation fails.
Description : This function allocates a memory location of size gc_size and returns the
address of that memory location if successful in allocation. If the requested memory cannot
be allocated even after Garbage collection then this routine returns null.

b) Void * GC_malloc_uncollectable( size_t gc_size )


Input : gc_size
Output : Starting address of the memory location that has just been allocated if successful
and NULL if allocation fails.
Description : This function allocates a memory location of size gc_size and returns the
address of that memory location if successful in allocation.
If therequested memory cannot be allocated even after Garbage collection then this
routine returns NULL. The speciality of this function is the memory allocated by
GC_malloc_uncollectable will not be freed during the garbage collection

6
06K41A0548 A SEMINAR REPORT

c) void * GC_malloc_finalizer( size_t gc_size , void ( * gc_finalizer)() )


Input: gc_size , gc_finalizer
Output: Starting address of the memory location that has just been allocated if
successful and NULL if allocation fails.
Description : This function allocates a memory location of size gc_size and returns the
address of that memory location if successful in allocation. If the requested memory cannot
be allocated even after Garbage collection then this routine returns NULL. The speciality of
this function is the memory allocated by GC_malloc_finalizer will be freed during the
garbage collection and during the freeing of this memory location the function gc_finalizer
will get executed.

d) void GC_collect_garbage()
Input: None
Output: None
Description: This function locates the garbage mark them and then it collects the same. The
user can periodically call this function to collect the garbage. This may improve the
performance.

e) void GC_free( void * gc_waste )


Input: The address of the memory location to be freed.
Output: None
Description: This function is responsible for freeing the memory location gc_waste and add
them to the free list. The memory allocated using the function GC_malloc_uncollectable can
be freed by using this function.

f) void GC_set_gc_rate ( size_t max_heap_size )


Input : It takes the memory size as argument.
Output : None

7
06K41A0548 A SEMINAR REPORT

Description : This function is responsible for setting the threshold value for allocated
memory. If the memory allocated dynamically by the user exceeds the threshold, the garbage
collection routine will be called automatically.

IMPLEMENTATION METHODOLOGY

The Two-Phase Abstraction:


Garbage collection automatically reclaims the space occupied by data objects that the
running program can never access again. Such data objects are referred to as garbage. The
basic functioning of our garbage collector consists, abstractly speaking of two phases.
 Distinguishing the live objects from the garbage or garbage detection.
 Reclaiming the garbage objects storage so that the running program can use it or
Garbage collection.
Algorithm – An Overview
1) User asks for allocation of memory.
2) We try to allocate the requested amount of memory.
3) If memory request is satisfied and memory so far allocated < GC Limit
Begin
Do book-keeping activity;
Return the address that corresponds to the allocated memory region.
End
4) If memory request is satisfied and memory so far allocated > GC Limit
Begin
Call the GC routine;
Do book-keeping activity;
Return the address that corresponds to the allocated memory region;
End
5) If memory request is not satisfied
Begin
Call the GC routine;
Try to reallocate;
If (allocation satisfied)
Return address of the allocated object
8
06K41A0548 A SEMINAR REPORT

Else
Return ERROR;
End if
End
Detailed Algorithm Adopted By Our Garbage Collector Steps

1) The user requests for allocation using the function “GC_malloc” or any of its Variants.
2) We try to satisfy his request by allocation an object of the size requested by the user from
the heap.
3) If the allocation in step 2 succeeds then we return the address of the allocated object after
doing book-keeping activity
4) If during the process of allocation in step 2 we were unable to allocate or if the allocation
so far done has exceeded a threshold limit (640kb by default), then we call our GC routine
that tries to collect unused memory (i.e. garbage).
5) Our GC Routine works as follows.
From the addresses that have been allocated (these are available from the bookkeeping
entries) search which of the objects pointed to by these addresses are live and which are not.
This can be found by

I. First making a search that will tell us help us to find the set of root pointers.
II. The book keeping list gives us the information about what to search? And the root
pointers will give us the information about where to search?.
III. Having got the information about what to search? and where to search?, our algorithm
gets in to the mark phase first for finding and marking the garbage and then in to the
sweep phase for collecting the garbage.
IV. Mark Phase
If an object is found to be alive then mark it.This is done by setting a field in
the book-keeping node corresponding to this object.
V. Sweep Phase
Scan all the objects. If an object is marked then unmark it. If an object is not
marked then free it as it has been declared as garbage by the mark phase of our
collector.
VI. Our algorithm

9
06K41A0548 A SEMINAR REPORT

Exits printing the message “ Your request cannot be satisfied even after
garbage collection”, if allocation fails even after trying to garbage collect.
 Returns the pointer to the allocated object of the size requested by the user if
allocation was successful

CONCLUSION AND FUTURE IMPROVEMENTS

 Our garbage collection routine currently pauses the program when it runs. What can
be done is to make the garbage collection routine to run in a separate thread in
parallel in such a way that the program does not stop when GC routine/collector
runs.
 We do not expect the addresses of the memory locations allocated by our garbage
collection routine to be stored in the CPU registers because we are yet to implement
the collector module which searches for the availability of pointers to live objects in
the CPU registers.
 Our garbage collection routine currently uses a sequential search in the mark and
sweep phases to check for the liveliness of the object. In order to improve the
performance a hashing function can be implemented replacing the sequential search.
 Our garbage collection routine is currently platform independent. Modifications can
be made to the code in such a way that the collector can be used in a variety of
platforms.
 A comparative study can be done between our collector and some other
existing garbage collectors for C based on some standard criteria so that modifications
(if any needed) can be done to our collector to identify any shortcomings and pitfalls
and improve its efficiency if so required
 I need to develop the code for it, Its still pending.

10
06K41A0548 A SEMINAR REPORT

BIBLIOGRAPHY

Books:

 Advanced Programming In The Unix Environment by W. Richard Stevens.


 The Design Of The Unix Operating System by Maurice J. Bach.
 The Unix Programming Environment by Brian W. Kernighan and Rob Pike.
 The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie.
 Inside The Java 2 Virtual Machine – Garbage Collection by Bill Venners.

Papers:
 Uniprocessor Garbage Collection Techniques by Paul R. Wilson.
 Garbage Collection: Automatic Memory Management In The Microsoft’s .NET
Framework by Jeffrey Richter.
 Incremental Mature Garbage Collection Using The Train Algorithm by Jacob
Seligmann and Steffen Grarup.

Websites:
 www.hpl.hp.com/personal/Hans_Boehm/gc/
 www.memorymanagement.com

11

You might also like