The Black Art of Programming
Mark McIlroy
www.blueskytechnology.com.au
www.markmcilroy.com
Contents
1. Prelude
2. Program Structure
2.1. Procedural Languages
2.2. Declarative Languages
2.3. Other Languages
Execution Platforms
3.3. Data structures
3.4. Algorithms
3.5. Techniques
3.6. Code Models
3.7. Data Storage
4.9. Coding
4.10. Testing
4.11. Debugging
4.12. Documentation
1. Prelude
A computer program is a set of statements that is used to
create an output, such as a screen display, a printed
report, a set of data records, or a calculated set of
numbers.
Most programs involve statements that are executed in
sequence.
A program is written using the statements of a
programming language.
Individual statements perform simple operations such as
printing an item of text, calculating a single value, and
comparing values to determine which set of statements to
execute.
Simple instructions are performed in hardware by the
computer's central processing unit.
Complex instructions are written in programming
languages and translated into the internal instruction set
by another program.
Computer memory is generally composed of bytes,
which are data items that contain a binary number. These
values can range from 0 to 255.
Memory locations are referred to by number, known as
an address.
A memory location can be used to record information
such as a small number, data from a graphics image, part
of a memory address, a program instruction, and a
numeric value representing a single letter.
2. Program Structure
2.1. Procedural Languages
Programs written in procedural languages involve a set
of statements that are performed in sequence. Most
programs are written using procedural languages.
Third generation languages are languages that operate at
the level of individual data items, if statements, loops
and subroutines.
A large proportion of programs are written using
third-generation languages.
2.1.1. Data
2.1.1.3. Variables
A variable is a data item used within a program, and
identified by a variable name.
Variables may consist of fundamental data types such as
strings and numeric data types, or a variable name may
refer to multiple individual data items.
Variables can be used in expressions for calculations,
and also for comparisons to perform different sections of
code under different conditions.
The value of a variable can be changed using an
assignment statement, which changes the value of a
variable to equal the value of an expression.
2.1.1.4. Constants
Constants such as fixed numbers and strings can be
included directly within program code.
Constants can also be given a name, similar to a variable
name, and used in several places within the program.
2.1.2. Execution
2.1.2.1. Expressions
An expression is a combination of constants, variables
and operators that is used to calculate a value.
An assignment operation involves a variable name and
an expression. The expression is evaluated, and the value
of the variable is changed to equal the result of the
expression.
Expressions are also used within control flow statements
such as if statements and loops.
Numeric expressions include the standard arithmetic
operations of addition, subtraction, multiplication,
division and exponentiation.
The basic string operations are concatenating two strings
to form a single string, extracting a substring, and
comparing strings.
String expressions may include constant strings, string
variables, and operators such as a concatenation operator.
Boolean variables and expressions have only two
possible values, true and false.
An expression containing a relational operator, such as
<=, is a Boolean expression. For example, 5 < 3 has
the value false.
The Boolean operators and, or and not can also be
used in expressions. An and expression has the value
true when both parts are true, an or expression has
the value true when either part is true, and a not
expression reverses the value.
Boolean expressions are used within if statements to
execute code under certain conditions and within loops
2.1.2.2. Statements
2.1.2.2.1. Assignment Statements
An assignment statement contains a variable name, an
assignment symbol such as an = sign, and an
expression.
The expression is evaluated, and the value of the variable
is set to equal the result of the expression.
Some languages are expression-focused rather than
statement-focused. In these languages, an assignment
operation may itself be an expression, and may be used
within other expressions.
2.1.2.2.2.2. Loops
A loop statement may contain a Boolean expression. The
expression is evaluated, and if it is true then the code
2.1.2.2.2.3. Goto
Some languages support a goto statement. A goto
statement causes a jump to a different point in the
program to continue execution.
Code that uses goto statements can develop very
complex control flow and may be difficult to debug and
modify.
Some languages also support structured goto operations,
such as a statement that terminates the current loop midway through the loop code.
These operations do not complicate the control flow to
the same extent as general goto statements; however,
they can be easily missed when code is being read.
For example, a statement in an early part of a complex
loop may result in the loop being exited when it is
executed. This statement complicates the control flow
and may make interpreting the loop code more difficult.
2.1.2.2.2.4. Exceptions
In some languages, exception handling subroutines and
sections of code can be defined.
2.1.2.3. Subroutines
Subroutines are independent blocks of code that are
referred to by name.
Programs are composed of a collection of subroutines.
When execution reaches a subroutine call the program
execution jumps to the beginning of the subroutine.
Control flow returns to the point following the
subroutine call when the subroutine terminates.
Subroutines may include parameters. These are variables
that can be accessed within the subroutine. The value of
the parameters is set by the calling code when the
subroutine call is performed.
Calling code can pass constant data values or variables as
the parameters to a subroutine call.
2.1.2.4. Comments
Comments are included within program code for the
benefit of a human reader. Comments are identified as
separate text items, and are ignored when the program is
compiled.
Comments are used to include additional information
within the code that is relevant to a particular calculation
or process, and to describe details of the function within
a complex section of code.
2.2.1. Code Structure
Hardware
3.1.2. Operating systems
3.1.3. Compilers
3.1.4. Interpreters
3.1.5. Virtual Machines
3.1.6.
3.1.7. Linking
3.2.2. Time Slicing
3.2.3.
3.2.4. Parallel Programming
3.2.5.
3.2.6.
3.3.1.1. Arrays
3.3.1.1.1. Standard Arrays
Arrays are the fundamental data structure that is used
within third-generation languages for storing collections
of data.
An array contains multiple data items of the same type.
Each item is referred to by a number, known as the array
index.
Indexes are integer values and may start at 0, 1, or some
other value depending on the definition and the language.
Arrays can have multiple dimensions. For example, data
in a two-dimensional array would be indexed using two
independent numbers. A two-dimensional array is similar
to a grid layout of data, with the row and column number
being used to refer to an individual data item.
Arrays can generally contain any data type, such as
strings, integers and structures.
Access to an array element may be extremely fast, and
may be only slightly slower than accessing an individual
data variable.
Arrays are also known as tables.
This particularly applies to an array of structures, which
may be similar to a table with rows of the same format
but different data in each column. A table also refers to
an array of data that is used for reference while a
program executes.
3.3.1.2. Structures
A structure is a collection of individual data items.
Structures are also known as records in some languages.
A programming structure is similar in format to a
database record.
Arrays of structures are visually similar to a grid layout
of data with each row having the same type, but different
columns containing different data types.
3.3.1.3. Objects
In object orientated programming, a data structure
known as an object is used.
An object is a structure type, and contains a collection of
individual data items.
However, subroutines known as methods are also
defined with the object definition, and methods can be
executed by using the method name with a data variable
of that object type.
3.3.2.
3.3.2.4. B-trees
A B-tree is a tree structure that contains multiple
branches at each node.
A B-tree is more complex to implement than a binary
tree or other structures; however, a B-tree is
self-balancing.
3.3.3.
3.3.3.1. Stacks
A stack is a data structure that stores a series of items.
When items are removed from the stack, they are
retrieved in the opposite order to the order in which they
were placed on the stack.
This is also known as a LIFO, Last-In-First-Out
structure.
The fundamental operations with a stack are PUSH,
which places a new data item on the top of the stack, and
POP, which removes the item from the top of the stack.
3.3.3.2. Queues
A queue is used to store a number of items.
Items that are removed from the queue appear in the
same order that they were placed into the queue.
A queue is also known as a FIFO, First-In-First-Out
structure.
Queues are used in transferring data between
independent processes, such as interfaces with hardware
devices and inter-process communication.
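
The difference between the two retrieval orders can be seen in a minimal Python sketch. The choice of a list for the stack and `collections.deque` for the queue is an assumption of this example, not something the text prescribes, and the item values are purely illustrative.

```python
from collections import deque

# A Python list used as a stack: append() acts as PUSH,
# pop() removes the item most recently added.
stack = []
for item in ["a", "b", "c"]:
    stack.append(item)
popped = [stack.pop() for _ in range(3)]        # LIFO order

# A deque used as a queue: append() adds at the back,
# popleft() removes the item that has waited longest.
queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)
dequeued = [queue.popleft() for _ in range(3)]  # FIFO order

print(popped)    # ['c', 'b', 'a']
print(dequeued)  # ['a', 'b', 'c']
```

The same three items come back reversed from the stack and unchanged from the queue.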
3.3.4.
3.3.5.
3.3.5.2. Heap
A heap is an area of memory that contains memory
blocks of different sizes. These blocks may be linked
together using a linked list arrangement.
Heaps are used for dynamic memory allocation. This
may include memory allocation for strings, and memory
allocated when new data items are created as a program
runs.
Implementing a heap can be done using pointers and a
large block of memory. This requires accessing the
memory as a binary block, and creating links and spaces
within the block, rather than treating the memory space
as a program variable.
Unused blocks are linked together to form a free list,
which is used when new allocations are required.
3.3.5.3. Buffer
A buffer is an area of memory that is designed to be
treated as a block of binary data, rather than an
individual data variable.
Buffers are used to hold database records, store data
during a conversion process that involves accessing
individual bytes within the block, and as a transfer
location when transferring data to other processes or
hardware devices.
Buffers can be accessed using pointers. In some
languages, a buffer may be handled as an array definition
with the array containing small integer data types, with
the assumption that the memory block occupies a
contiguous section of memory.
3.3.6. Language-Specific Structures
3.3.7.
Structur
e
Array
Access
Method
Random
Access
Time
Addition
&
Deletion
Time
Full
Scan
Memor
y
Usage
Yes
1 item
Direct
Index
Search
(sorted)
Search
(unsorted)
Log2(n)
1
n/2
n/2
Linked
List
Search
n/2
Yes
1 item
+1
link
Binary
Tree
Search
(Fully
Balanced)
Search
(Fully
Unbalance
d)
log2(n) 1
log2(n) 1
(addition)
Yes
1 item
+2
links
n/2
n/2
(addition)
String
1 hash
function
1 hash
function
No
1 item
+
imple
mentat
ion
overhe
ad
Hash
Table
3.4. Algorithms
An algorithm is a step by step method for calculating a
particular result or performing a process.
For example, the following steps define the sorting
algorithm known as a bubble sort.
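
As an illustration, one common formulation of a bubble sort is sketched below in Python. This is an assumed version for illustration and may differ in detail from the steps the text has in mind; the function name `bubble_sort` and the early-exit check are choices made for this sketch.

```python
def bubble_sort(items):
    """Return a sorted copy of items using repeated adjacent swaps."""
    data = list(items)
    # After each pass the largest remaining value has "bubbled"
    # to position `end`, so the next pass can stop one item earlier.
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass: the list is sorted
    return data


print(bubble_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```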
3.4.1. Sorting
3.4.1.2. Quicksort
These algorithms involve using an order of n*log2(n)
comparisons to complete the sorting process. In the
previous example, this would be equal to approximately
20 million comparisons for the list of one million items.
The quicksort algorithm involves selecting an element at
random within the list, known as the pivot element. All
the items that have a lower value than the pivot element
are moved to the beginning of the list, while the items
with a value that is greater than the pivot element are
moved to the end of the list.
This process is then applied separately to each of the two
parts of the list, and the process continues recursively
until the entire list is sorted.
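
The recursive partitioning can be sketched in Python as follows. For clarity this sketch builds new lists rather than moving items within the original list, which is a simplification assumed here; the function name `quicksort` is also just illustrative.

```python
import random


def quicksort(items):
    """Return a sorted copy of items using a random pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)
    lower = [x for x in items if x < pivot]    # goes before the pivot
    equal = [x for x in items if x == pivot]   # the pivot and duplicates
    higher = [x for x in items if x > pivot]   # goes after the pivot
    # Apply the same process separately to each part.
    return quicksort(lower) + equal + quicksort(higher)


print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```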
3.4.2.
subroutine tree_scan
    if left node exists
        call tree_scan on left node
    end
    output current node value
    if right node exists
        call tree_scan on right node
    end
end
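
The tree_scan subroutine above can be written in Python roughly as follows. The `Node` class and the use of a list to collect the output are assumptions of this sketch. Applied to a binary search tree, this scan outputs the values in ascending order.

```python
class Node:
    """A binary tree node (hypothetical helper for this sketch)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def tree_scan(node, output):
    # Left subtree first, then the current node, then the right subtree.
    if node.left is not None:
        tree_scan(node.left, output)
    output.append(node.value)
    if node.right is not None:
        tree_scan(node.right, output)


root = Node(4, Node(2, Node(1), Node(3)), Node(6))
values = []
tree_scan(root, values)
print(values)  # [1, 2, 3, 4, 6]
```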
3.4.3. Binary Search
top_item as integer
bottom_item as integer
middle_item as integer
found = False
bottom_item = start_item
top_item = end_item
while not found And bottom_item < top_item
    middle_item = (bottom_item + top_item) / 2
    if search_val = data(middle_item)
        found = True
    else
        if search_val < data(middle_item)
            top_item = middle_item - 1
        else
            bottom_item = middle_item + 1
        end
    end
end
if not found
    if search_val = data(bottom_item)
        found = True
        middle_item = bottom_item
    end
end
binary_search = middle_item
end
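
For comparison, a compact Python sketch of a binary search over a sorted list is shown below. It is not a line-by-line translation of the listing above: the loop condition uses `<=` so no separate final check is needed, and returning -1 when the value is absent is a convention assumed for this sketch.

```python
def binary_search(data, search_val):
    """Return the index of search_val in sorted list data, or -1."""
    bottom, top = 0, len(data) - 1
    while bottom <= top:
        middle = (bottom + top) // 2   # integer division
        if data[middle] == search_val:
            return middle
        if search_val < data[middle]:
            top = middle - 1           # continue in the lower half
        else:
            bottom = middle + 1        # continue in the upper half
    return -1                          # value is not present


data = [1, 3, 5, 7, 9]
print(binary_search(data, 7))  # 3
print(binary_search(data, 4))  # -1
```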
3.4.4.
3.4.5. Solving Equations
3.4.6.
3.4.7.
In some applications, structures may contain
sub-structures or connections that have the same form as
the main structure.
For example, an engineering design may be based on a
structure that contains sub-structures with the same form
as the main structure.
An investment portfolio may contain several
investments, including investments that are parts of other
investment portfolios.
In these cases, the values relating to the main structure
can be determined recursively.
This involves calling a subroutine to process each of the
sub-structures, which in turn may involve the subroutine
calling itself to process sub-structures within the
sub-structure.
This process continues until the end of the chain is
reached and no further sub-structures are present. When
this occurs, the calculation can be performed directly.
This returns a result to the previous level, which
calculates the result for that level and returns to the
previous level and so forth, until the process unwinds to
the main level and the result for the main structure can be
calculated.
In some cases a loop may occur. This could not happen
in a standard physical structure, but in other applications
an inner substructure may also contain the entire outer
structure.
In the investment portfolio example, portfolio A may
contain an investment in portfolio B, which invests in
portfolio C, which invests back into portfolio A.
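
A minimal Python sketch of this recursive calculation, including a check for loops, is shown below. The data layout (a dictionary mapping each portfolio name to a list of holdings, where a number is a direct investment and a string names another portfolio) is a simplified model assumed for illustration, as are the names `portfolio_value` and `has_loop`.

```python
def portfolio_value(name, portfolios, in_progress=None):
    """Total value of a portfolio, descending into sub-portfolios."""
    if in_progress is None:
        in_progress = set()
    if name in in_progress:
        # This portfolio is already being valued further up the chain.
        raise ValueError("loop detected at portfolio " + name)
    in_progress.add(name)
    total = 0.0
    for holding in portfolios[name]:
        if isinstance(holding, str):
            total += portfolio_value(holding, portfolios, in_progress)
        else:
            total += holding          # direct investment: end of the chain
    in_progress.remove(name)
    return total


def has_loop(name, portfolios):
    try:
        portfolio_value(name, portfolios)
    except ValueError:
        return True
    return False


portfolios = {"A": [100.0, "B"], "B": [50.0, "C"], "C": [25.0]}
looped = {"A": [100.0, "B"], "B": [50.0, "C"], "C": [25.0, "A"]}
print(portfolio_value("A", portfolios))  # 175.0
print(has_loop("A", looped))             # True
```

Tracking the set of portfolios currently being processed is one simple way to detect the A, B, C, A situation before the recursion runs indefinitely.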
3.4.8.
3.4.9. Check Digits
3.4.10.2. Postfix Expressions
A postfix expression is an alternative format for writing
an expression that places the operators after the values
that they operate on.
Using this format, brackets are not required, and operator
precedence does not need to be applied to the expression
as the precedence is implied in the order of the symbols.
For example, the infix expression 2 + 3 * 5 would be
converted to a postfix expression of 3 5 * 2 +.
Postfix expressions can be evaluated directly from left to
right.
This can be done using a stack, where a value in the
expression is pushed on to the stack, and an operator
pops the arguments from the stack, calculates the result,
and pushes the result on to the stack.
When a valid expression is evaluated, a single result
should remain on the stack after the expression
evaluation is complete, and this should equal the result of
the expression.
Expressions may be stored internally in a postfix format,
so that they can be directly evaluated.
Code generation effectively generates code to evaluate
expressions in a postfix order.
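
The stack-based evaluation described above can be sketched in Python as follows. The function name `evaluate_postfix`, the space-separated input format, and the operand order for subtraction and division (the value pushed first is the left operand) are assumptions of this sketch.

```python
def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression."""
    stack = []
    for token in expression.split():
        if token in ("+", "-", "*", "/"):
            right = stack.pop()        # most recently pushed value
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(float(token))  # a value: push it
    if len(stack) != 1:
        raise ValueError("invalid postfix expression")
    return stack[0]


print(evaluate_postfix("3 5 * 2 +"))  # 17.0
```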
3.4.10.4. Evaluation
The expression can be evaluated by reading each
instruction in sequence. If the instruction is a push
instruction, then the data value is pushed on to the stack.
If the instruction is an operator, then the operator pops
the arguments from the stack, calculates the result, and
pushes the result on to the stack.
Operation    Stack contents
push 4       4
push 5       5 4
multiply     20
push 3       3 20
subtract     -17
push 2       2 -17
push 7       7 2 -17
multiply     14 -17
add          -3
This process ends with the stack containing the result -3,
which is the correct result of the original expression.
3.5. Techniques
3.5.1.
State   Next Character   Next State   Within a comment
1       not /            1            No
1       /                2            No
2       not * or /       1            No
2       /                2            No
2       *                3            No
3       not *            3            Yes
3       *                4            Yes
4       not * or /       3            Yes
4       *                4            Yes
4       /                1            Yes
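
The state transitions above, which track whether the input is inside a comment of the form /* ... */, can be sketched in Python as follows. The function name `comment_states` and the choice to return the state reached after each character are assumptions made for illustration; states 3 and 4 are the ones that fall within a comment.

```python
def comment_states(text):
    """Return the automaton state reached after each character."""
    state = 1
    states = []
    for ch in text:
        if state == 1:                 # outside a comment
            state = 2 if ch == "/" else 1
        elif state == 2:               # seen "/", may start a comment
            if ch == "*":
                state = 3
            elif ch == "/":
                state = 2
            else:
                state = 1
        elif state == 3:               # inside a comment
            state = 4 if ch == "*" else 3
        else:                          # state 4: seen "*" inside a comment
            if ch == "/":
                state = 1
            elif ch == "*":
                state = 4
            else:
                state = 3
        states.append(state)
    return states


print(comment_states("a/*b*/c"))  # [1, 2, 3, 3, 4, 1, 1]
```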
3.5.2. Small Languages
Lexical analysis
Parsing
Code Generation
Execution
3.5.2.2. Parsing
Parsing or syntax analysis is the process of identifying
the structures within the input text.
The language can be defined using a grammar definition
which would specify the structure of the language.
One approach is to use an algorithm to convert the
grammar to a finite state automaton for parsing; however,
this may be a complex process.
Another alternative is to use a recursive descent parser,
which is a fast and simple parsing technique that is
relatively easy to implement.
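
As an illustration of the recursive descent approach, the Python sketch below parses and evaluates simple arithmetic expressions, with one method per grammar rule. The grammar used (expression, term, factor), the class and function names, and the decision to evaluate while parsing rather than generate code are all assumptions of this sketch.

```python
import re


def tokenize(text):
    # Numbers, operators and brackets; other characters are skipped.
    return re.findall(r"\d+\.?\d*|[-+*/()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):              # term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):                    # factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):                  # number | "(" expression ")" | "-" factor
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected )")
            return value
        if token == "-":
            return -self.factor()
        if token is None or token in "+*/)":
            raise SyntaxError("expected a value")
        return float(token)


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("unexpected input")
    return value


print(evaluate("2 + 3 * 5"))    # 17.0
print(evaluate("(2 + 3) * 5"))  # 25.0
```

Each method calls the method for the next-higher precedence level, which is how operator precedence is applied without any explicit precedence table.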
Loop
Allocate label position X
Allocate label position Y
Define this position in the code as the position of label Y
Generate code for the loop expression condition
Generate a TEST jump instruction to jump to label X if
stack result is false
Generate the code for the loop statements
Generate a jump instruction to label Y
Define label X as this position in the code
3.5.3. Recursion
3.5.4. Language Grammars
statementlist:
    <statement>
    <statement> <statementlist>
statement:
    IF <expression> THEN <statementlist> END
    WHILE <expression> <statementlist> END
    variable_name = <expression>
expression:
    <addexpression> AND <expression>
    <addexpression> OR <expression>
addexpression:
    <multexpression> + <addexpression>
    <multexpression> - <addexpression>
multexpression:
    <unaryexpression> * <multexpression>
    <unaryexpression> / <multexpression>
unaryexpression:
    constant
    variable_name
    - <expression>
    ( <expression> )
3.5.5.
Initial rule:
addexpression:
    <addexpression> + <multexpression>
Adjusted rule:
addexpression:
    <multexpression> + <addexpression>
3.5.6. Bitmaps
01011011
00001000
00001000
01011011
5B
01011011
00000100
01011111
For example
Value                      01011111
Test Value                 00000100
NOT Test Value             11111011
Value AND NOT Test Value   01011011
3.5.7. Genetic Algorithms
3.6.1.
3.6.2.
3.6.3.
A loop of code would then read the table and call the
menu or screen display function for each entry in the
table.
This approach is also used in circumstances such as
evaluating externally-defined formulas, where the
formula may be parsed and intermediate code may be
stored in an array. Each instruction in the array would
then be executed in sequence using a loop of code.
Table driven code can lead to a large drop in code
volumes and an increase in consistency in circumstances
where it is applicable.
Table entries can be stored in program variables as arrays
of structure types, or in database tables.
Table driven code is a large-data, small-code model,
rather than standard procedural code, which is a
large-code, small-data model.
Small code models, such as using table driven code or
run-time routines to execute expressions, are generally
simpler to write and debug and more consistent in output
than small data models.
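
A minimal Python sketch of table driven code is shown below. The table contents, the handler functions and the name `run_all` are hypothetical and exist only to show the shape of the technique: the behaviour is described by entries in a table, and a single loop processes every entry.

```python
def show_report():
    return "report shown"


def export_data():
    return "data exported"


# Each table entry pairs a menu label with the function to call.
MENU_TABLE = [
    ("Show report", show_report),
    ("Export data", export_data),
]


def run_all(table):
    """Read the table and call the function for each entry."""
    results = []
    for label, handler in table:
        results.append(label + ": " + handler())
    return results


print(run_all(MENU_TABLE))
```

Adding a new menu item then means adding one row to the table, rather than writing another block of procedural code.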
3.6.4.
This is a fundamentally different approach to
function-centred code, which involves defining subroutines
separately to the data items that they may process.
Object orientated code can be used to implement flexible
function-driven code, and is particularly useful for
situations that involve objects within other objects.
However, object orientated code can become extremely
complex and can be difficult to debug and maintain.
For example, many object orientated systems support a
hierarchical system known as inheritance, where objects
can be based on other objects. The object inherits the
data and methods of the original object, as well as any
data and methods that are implemented directly.
In an object model containing many levels, data and
methods may be implemented at each level and
interpreting the structure of the code may be difficult.
Also, an object model based on the application objects,
rather than an independent set of concepts, can be
complex and require extensive changes when major
changes are made to the structure of the application
objects.
This issue also applies to database structures based on
specific details such as specific products or specific
accounts, rather than general concepts.
3.6.5.
3.6.6.
3.6.7.
3.6.8.
3.6.9. Task-Specific Languages
Code Model
Structure
Individual Files
3.7.2.
3.7.3. Databases
3.7.3.2.1. Keys
The primary key of a record is a data item that is used to
locate the record. This can be an individual field, or a
composite key that is derived from several individual
fields combined into a single value.
A foreign key is a field on a record that contains the
primary key of another record. This is used to link
related records.
Key fields are generally short text fields such as an
account number or client code. Composite keys may be
composed of individual key fields and other fields such
as dates.
3.7.3.2.4. Indexes
Databases include data structures known as indexes to
increase the speed of locating records. An index may be
a self-balancing tree structure such as a B-tree.
The index is maintained internally within the database. In
the case of query languages the index is generally
transparent to the caller, and increases access speed but
does not affect the result.
In other cases, indexes can be selected in program code
and a search can be performed directly against a defined
index.
Indexes are generally defined against all primary keys.
         Date     Temperature   Pressure   Humidity
123456   1/1/80   1.23          31.2       234.21
         Date     Type          Value
123456   1/1/80   TEMPERATURE   1.23
123456   1/1/80   PRESSURE      31.2
123456   1/1/80   HUMIDITY      234.21
3.7.3.3.7. Timestamps
A timestamp is a field containing a date, time and
possibly other information such as a user logon and
program name.
Timestamps may be stored as data items in a record to
identify the conditions under which a record was created,
and the details of when it was last changed.
Timestamps can be used for debugging and
administrative purposes, such as reconciling accounts,
tracing data problems, and re-establishing a sequence of
events when information is unavailable or conflicting.
3.8.2.
= 6.254 x 10⁹
= 8.73 x 10⁻⁶
Number range
-32,768 to +32,767
-2,147,483,648 to
Precision          Range
approx 7 digits    approx
approx 15 digits   approx
3.8.3.
3.8.4. Numeric Operators
addition
subtraction
multiplication
division
modulus
Exponentiation, yˣ
3.8.5.
Modulus (integer division)
line-on-page = total-lines Mod lines-per-page
3.8.6. Rounding Error
0.333333333333333 + 0.666666666666666 = 0.999999999999999
3.8.7. Invalid Operations
3.8.8.
Range
2 bytes signed     -32,768 to +32,767
2 bytes unsigned   0 to +65,535
3.8.9.
3.8.12.1. Roman Numerals
In the Roman number system, major numbers are
represented by different symbols. I is 1, V is 5, X is 10, L
is 50, C is 100 and so on.
Numerals are added together to form other numbers. If a
lower value appears to the left of a higher value, it is
subtracted from that value; otherwise it is added.
The first ten Roman numerals are
I       1
II      2
III     3
IV      4
V       5
VI      6
VII     7
VIII    8
IX      9
X       10
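
The conversion from a Roman numeral to an integer can be sketched in Python as follows, using the usual convention that a symbol placed before a larger symbol is subtracted and every other symbol is added. The function name `roman_to_int` is an assumption of this sketch, and the values for D and M are the standard ones, included here beyond the symbols listed above.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50,
                "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral string such as 'IX' to an integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        next_is_larger = (i + 1 < len(numeral)
                          and value < ROMAN_VALUES[numeral[i + 1]])
        if next_is_larger:
            total -= value     # e.g. the I in IV
        else:
            total += value
    return total


first_ten = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
print([roman_to_int(n) for n in first_ten])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```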
For example
31504 = 3 x 10⁴ + 1 x 10³ + 5 x 10² + 0 x 10¹ + 4 x 10⁰
      = 3 x 10000 + 1 x 1000 + 5 x 100 + 0 x 10 + 4 x 1
2 x 3⁴ + 1 x 3³ + 3 x 3² + 0 x 3¹ + 2 x 3⁰
      = 2 x 81₁₀ + 1 x 27₁₀ + 3 x 9₁₀ + 0 x 3₁₀ + 2 x 1
      = 218₁₀
1 x 2⁵ + 0 x 2⁴ + 1 x 2³ + 1 x 2² + 0 x 2¹ + 1 x 2⁰
      = 1 x 32₁₀ + 0 x 16₁₀ + 1 x 8₁₀ + 1 x 4₁₀ + 0 x 2₁₀ + 1 x 1
      = 45₁₀
For example,
8D3F₁₆ = 8 x 16³ + D x 16² + 3 x 16¹ + F x 16⁰
       = 8 x 4096₁₀ + 13₁₀ x 256₁₀ + 3 x 16₁₀ + 15₁₀ x 1
       = 36159₁₀
       = 1000110100111111₂
8    D    3    F
1000 1101 0011 1111
Hexadecimal   Value   Binary     Decimal
01            2⁰      00000001   1
02            2¹      00000010   2
04            2²      00000100   4
08            2³      00001000   8
10            2⁴      00010000   16
20            2⁵      00100000   32
40            2⁶      01000000   64
80            2⁷      10000000   128
0F                    00001111   15
F0                    11110000   240
FF                    11111111   255
3.9.1. Access Control
No access
View-only access
View and Update access
Delete access
Run read-only programs
Run update programs
3.9.2.
3.9.3. Encryption of Data
Output
Character
J
&
D
E
9
)
<
:
3.9.4. Individual Files
3.9.5. External Access
             Direct Index   Binary Search   Full Scan
                            (sorted)        (unsorted)
100          1              5.6             50
1,000        1              9.0             500
10,000       1              12.3            5,000
100,000      1              15.6            50,000
1,000,000    1              18.9            500,000
10,000,000   1              22.3            5,000,000
3.10.1.1.2. Tags
Searching by string values can be avoided in some cases
by assigning temporary numeric tags to data items for
internal processing.
For example, the primary and foreign keys of database
records may be based on string fields, or widely-ranging
numeric codes.
When values are read into arrays for internal
calculations, each entry could be assigned a small
number, such as the array index, as a tag.
Other data structures would then use the tag to refer to
the data, rather than the string key. This would allow that
data to be accessed directly using a direct array index.
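
The tagging approach can be sketched in Python as follows. The client codes, amounts and variable names are hypothetical. Each string key is looked up once to obtain its tag, and all later processing uses the tag as a direct array index.

```python
# String keys as they might be read from database records.
client_codes = ["ZX-9001", "AB-0042", "QQ-7310"]

# Assign each key a small numeric tag: its position in the array.
tag_of = {code: index for index, code in enumerate(client_codes)}
balances = [0.0] * len(client_codes)

transactions = [("AB-0042", 25.0), ("ZX-9001", 10.0), ("AB-0042", 5.0)]

# Convert the string keys to tags once, when the data is loaded.
tagged = [(tag_of[code], amount) for code, amount in transactions]

# Internal processing then uses direct array indexing only.
for tag, amount in tagged:
    balances[tag] += amount

print(balances)  # [10.0, 30.0, 0.0]
```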
3.10.1.1.4. Sublists
In cases where searching lists is unavoidable, the search
time may be reduced if the data is broken into
sub-sections.
This is effectively a partial sort of the data.
3.10.1.1.5. Keys
Where data is located by searching for several different
data items, the data items can be combined into a single
text key.
This key can then be used to locate the data using one of
the string index methods, such as a hash table or a binary
search on a sorted array.
3.10.1.1.7. Caching
Caching is used to store data temporarily for faster
access.
This is used in two contexts. Caching may be used to
store data that has been retrieved from a slower access
storage, such as storing data in memory that has been
retrieved from a database or disk file.
Also, caching can be used to store previously calculated
values that may be used in further processing. This
particularly applies to calculations that require reading
data from disk in order to calculate the result.
A structured set of results can be stored in an array.
Where a large number of different types of data are
stored, a structure such as a hash table can be used.
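
A minimal Python sketch of a cache is shown below, using a dictionary (Python's built-in hash table) keyed by a string. The function `read_from_storage` is a hypothetical stand-in for a slow database or disk read, and the counter exists only to show how many slow reads actually occur.

```python
cache = {}
stats = {"slow_reads": 0}


def read_from_storage(key):
    """Stand-in for a slow read from a database or disk file."""
    stats["slow_reads"] += 1
    return len(key) * 10     # placeholder for the retrieved value


def cached_read(key):
    # Only go to slow storage the first time a key is requested.
    if key not in cache:
        cache[key] = read_from_storage(key)
    return cache[key]


values = [cached_read("rates"), cached_read("rates"), cached_read("fees")]
print(values)               # [50, 50, 40]
print(stats["slow_reads"])  # 2
```

Three values were requested, but only two slow reads were performed because the second request for "rates" was served from the cache.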
3.10.1.2. Avoiding Execution
Execution speed can be increased by avoiding executing
statements unnecessarily.
3.10.1.3. Data types
3.10.1.3.1. Strings
Processing with strings is significantly slower than
processing with numeric variables.
Copying strings from one variable to another, and
performing string operations, may involve memory
allocations and copying individual characters within the
string.
In contrast, setting a numeric value can be done with a
single instruction.
3.10.1.4. Algorithms
3.10.1.5. Run-time processing
In some cases, a structure can be compiled or translated
into another structure that can be processed more
quickly.
For example, a text formula may specify a particular
calculation. Processing this formula may involve
identifying the individual elements within the formula,
3.10.1.6. Complex Code
When a section of code has been subject to a large
number of changes and additions, the processing can
become very complex. This may involve multiple
nested loops, re-scanning of data structures and complex
control flow. These changes may lead to a significant
drop in execution speed.
In these cases, re-writing the code and changing the data
structures may lead to a drastic reduction in complexity
and an increase in execution speed.
3.10.1.7. Numeric Processing
Numeric processing is generally faster when the data is
loaded into program arrays before the calculation
commences. Data may be loaded from databases, data
files, or more complex program data structures.
Execution speed may be increased if calculations are not
repeated multiple times. This situation can also arise
within expressions.
For example, an expression of the form x = a * b * c
may appear within an inner processing loop. However, if
part of the expression does not change within the inner
loop, such as the b * c component, then this calculation
can be moved outside the loop.
This may lead to an expression of d = b * c in an outer
loop, and x = a * d in the inner loop.
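
The change can be sketched in Python as follows. The function names and the sample values are illustrative assumptions; both versions produce the same results, but the second performs the b * c multiplication once per outer iteration rather than once per inner iteration.

```python
def repeated_calculation(a_values, bc_pairs):
    results = []
    for b, c in bc_pairs:            # outer loop
        for a in a_values:           # inner loop
            x = a * b * c            # b * c is recalculated every time
            results.append(x)
    return results


def hoisted_calculation(a_values, bc_pairs):
    results = []
    for b, c in bc_pairs:            # outer loop
        d = b * c                    # calculated once per outer iteration
        for a in a_values:           # inner loop
            x = a * d
            results.append(x)
    return results


a_values = [1, 2, 3]
bc_pairs = [(4, 5), (2, 10)]
print(hoisted_calculation(a_values, bc_pairs))  # [20, 40, 60, 20, 40, 60]
```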
Where a calculation within an inner loop is affected by
the inner loop but not the outer loop, then a separate loop
can be added before the main loops to generate the set of
results once, and store them in a temporary array for use
within the inner loop.
Special cases within the structure of the calculations and
data may enable the number of calculations to be
reduced.
For example, if a matrix is symmetric, so that the data
values are symmetrical around the diagonal, then only
the data on one side of the diagonal needs to be
calculated.
As another example, some parts of the calculation may
be able to be performed using integers rather than
floating point operations. However, the conversion time
between the integer and floating point formats would
also affect the difference in execution times.
3.10.1.9. Database Access
Processes that are structured to avoid re-reading database
records will generally execute more quickly than
processes that read records more than once.
Where a record would be read many times during a
process, the record can be read once and stored in a
record buffer or program variables.
When a one-to-many link is being processed, if the child
records are read in the order of the parent record key, not
the child record key, then each parent record would only
need to be read once.
Processing the child records in the order of the child
record key may require re-reading a parent record for
every child record.
3.10.1.11. Bugs
Performance problems may sometimes be due to bugs. A
bug may not affect the results of a process, but it may
result in a database record being re-read multiple times, a
loop executing a larger number of times than is
necessary, cached data being ignored, or some other
problem.
Using an execution profiler or a debugger may identify
performance problems in unexpected sections of code
due to bugs.
iterates 1000 times, then the code within the inner loop
will be executed one million times.
Indirect nested loops can occur when a subroutine is
called from within a loop, and the subroutine itself
contains a loop.
This would initially appear to be two single-level loops,
however because the subroutine is called from within
one of the loops, the number of times that the code
would be executed would be the product of the number
of loop iterations, not the sum.
In some development environments an execution
profiling program can be used to determine the
proportion of execution time that is spent within each
section of the code.
In some cases a nested loop may be avoided by changing
the order of nesting.
For example, if an array is scanned using a loop and
another array is scanned within that loop, if the first array
can be directly accessed using an array index, then a
nested loop could be avoided by scanning the second
array first, and using the direct index to access the first
array.
Also, sub-lists could be expanded to provide a single
complete set of data combinations, rather than including
a main set of data and multiple sub-lists of related
information.
3.10.2.1.2. Bitmaps
Where several Boolean flags are used, these can be
stored as individual bits using a bitmap method, rather
than using a full integer variable for each individual flag.
In some applications, such as graphics processing,
individual data items may not use a number of bits that is
a multiple of a standard eight-bit byte.
For example, eight separate values can be represented
using a set of three bits. If a large volume of data
consisted of numbers with three-bit patterns, then several
data items could be stored within a particular byte.
This would involve a calculation to determine the
location of a three-bit value within a block of data, and
the use of bitmaps to extract the three-bit code from an
eight-bit data item.
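The location calculation and bit extraction could be sketched in Python as follows. The function names pack and unpack, and the choice of storing the lowest-numbered item in the lowest bits, are assumptions for illustration.

```python
BITS = 3                   # width of each stored value
MASK = (1 << BITS) - 1     # 0b111, used to isolate one value

def pack(values):
    """Pack a sequence of three-bit values (0 to 7) into a bytearray."""
    data = bytearray((len(values) * BITS + 7) // 8)
    for i, value in enumerate(values):
        byte_index, offset = divmod(i * BITS, 8)
        shifted = (value & MASK) << offset     # may spill into the next byte
        data[byte_index] |= shifted & 0xFF
        if shifted > 0xFF:
            data[byte_index + 1] |= shifted >> 8
    return data

def unpack(data, index):
    """Extract the three-bit value stored at position index."""
    byte_index, offset = divmod(index * BITS, 8)
    word = data[byte_index]
    if byte_index + 1 < len(data):
        word |= data[byte_index + 1] << 8      # include bits that spilled over
    return (word >> offset) & MASK

values = [5, 0, 7, 3, 6, 1, 2, 4, 7]
data = pack(values)
```

Nine three-bit values occupy 27 bits, so they fit in four bytes rather than nine.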
Assembler
4.1.2.
4.1.2.1. Fortran
Fortran and Cobol were the first widely-used
programming languages, with Cobol being used for
business data processing and Fortran for numeric
calculations in engineering and scientific applications.
Fortran is an acronym for FORmula TRANslator.
Fortran has strong facilities for calculation and working
with numbers, but is less useful for string and text
processing.
4.1.2.2. Basic
Basic was initially designed as a language for teaching
programming to beginners. Basic is an acronym for
Beginner's All-purpose Symbolic Instruction Code.
Basic has flexible string handling facilities and
reasonable numeric capabilities.
4.1.2.3. C
C was originally developed as a systems-level language
and was associated with the development of the UNIX
operating system.
4.1.2.4. Pascal
Pascal was initially developed for teaching computer
science courses. Pascal is named after the mathematician
Blaise Pascal, who invented the first mechanical
calculating machine of modern times.
Pascal is a highly structured language that has strict type
checking and multiple levels of subroutine and data
variable scope.
4.1.3.
4.1.3.1. Cobol
Cobol operates at a similar level to third-generation
languages, however it is generally grouped separately as
the structure of Cobol is quite different from other
languages.
Cobol is an acronym for COmmon Business-Oriented
Language.
Cobol is designed for data processing, such as
performing calculations and generating reports with large
volumes of data. Vast amounts of Cobol code have been
written and are particularly used in banking, insurance
and large data processing environments.
In Cobol, data fields are defined as a string of individual
characters, and each position in the variable may be
defined as an alphabetic, alphanumeric or numeric
character.
4.1.4.
Object-Orientated Languages
4.1.4.1. C++
C++ is a major OO language. C++ is an extension of C.
Some of the facilities of C++ include the ability to define
objects, known as classes, containing data and related
subroutines, known as methods. These classes may
operate at several levels and sub-classes can be defined
that include the data and operations of other class
objects.
C++ also supports operator overloading, which allows
operators such as addition to be applied to newly-created
data types.
4.1.4.2. Java
Java is an object-orientated language developed for use
in internet applications. It has a syntax that is similar to
C, but is not based on a previous language design.
Java includes dynamic memory allocation for creating
objects, automatic deallocation of objects that are no
longer in use, and a strictly defined set of class libraries
(subroutines) that is intended to be portable across all
operating environments.
4.1.5.
Declarative Languages
4.1.5.1. Prolog
Prolog is a declarative language. A Prolog program
consists of a set of facts and rules, rather than a set of
statements that are executed in order.
Prolog is used in decision-making and goal-seeking
applications.
Once the program has been defined, the Prolog
interpreter then uses the defined facts in an attempt to
solve the problem that is presented.
For example, a Prolog program may include the moves
of a chess game. The Prolog interpreter would then use
the facts that were defined in the program to determine
the moves in a computer chess game.
4.1.5.2. SQL
SQL, Structured Query Language, is a data query
language that is used in relational databases. SQL is a
declarative language and groups of records are defined as
a set, which can be retrieved or updated.
SQL statements may be executed in sequence; however,
the actual selection of records is based on the structure of
the selection statement.
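As a small illustration of set-based operation, the Python sketch below uses the standard sqlite3 module with an in-memory database. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (id INTEGER PRIMARY KEY, state TEXT, premium REAL)"
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [(1, "NSW", 100.0), (2, "VIC", 200.0), (3, "NSW", 300.0)],
)

# One statement updates the whole set of matching records; no loop is written.
conn.execute("UPDATE policy SET premium = premium + 10 WHERE state = 'NSW'")

rows = conn.execute("SELECT id, premium FROM policy ORDER BY id").fetchall()
conn.close()
```

The WHERE clause defines the set of records that is affected, rather than the program visiting each record in turn.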
4.1.6.
Special-Purpose Languages
4.1.6.1. Lisp
Lisp is a highly unusual language that was developed
early in the history of computing, and used in artificial
intelligence research.
All data items in Lisp are stored as lists. All processing
in Lisp involves scanning lists.
The syntax of Lisp is very simple, but involves massive
amounts of brackets as lists are defined within lists
which are within other lists.
The program code itself is also stored in lists, as well as
the data items.
4.1.6.2. APL
APL, short for A Programming Language, is a language
that has been widely used for actuarial calculations in
insurance and finance.
APL is highly mathematical and includes operators for
matrix calculations etc. APL code is very difficult to read
and the APL character set includes a wide range of
4.1.7.
Hosted Languages
4.1.8.
4.1.9.
Report Generators
4.2.2.
Version Control
4.2.3.
Infrastructure
4.3.1.
Conceptual Basis
4.3.2.
Level of Abstraction
4.3.4.
Orthogonality
from 5 to 30.
As another example, if a graphics engine supported three
types of lights and three types of view frames, then in an
orthogonal system any light could be used with any view
frame.
In practice some combinations may not be supported,
either because they would be particularly difficult to
implement, they would take an extremely long time to
execute, or because they specified logical contradictions.
For example, a list can be sorted in ascending or
descending order, but cannot be sorted in both orders
simultaneously.
When selected options conflict, one outcome may be
selected by default.
4.3.5.
Generality
4.3.6.
Batch Processes
4.3.6.3. Performance
Performance may be a significant issue with batch
processes.
Data that will be read multiple times can be stored in
internal program arrays or data record buffers.
The order in which the processing is performed can also
have a significant impact on execution speed.
When a one-to-many database link is being processed, if
the child records are read in the order of the parent key
then each parent record would only need to be read once.
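A minimal Python sketch of this ordering is shown below. The read_parent function stands in for a database read and is hypothetical, as is the record layout.

```python
def process_children(children, read_parent):
    """Process child records in parent-key order, reading each parent once.

    children is a list of (parent_key, value) pairs, and read_parent is a
    hypothetical function that fetches one parent record.
    """
    results = []
    current_key, parent = None, None
    for parent_key, value in sorted(children, key=lambda c: c[0]):
        if parent_key != current_key:       # new parent: read it once
            parent = read_parent(parent_key)
            current_key = parent_key
        results.append((parent["name"], value))
    return results

# Stand-in for a database read that records how often it is called.
reads = []
parents = {1: {"name": "A"}, 2: {"name": "B"}}

def read_parent(key):
    reads.append(key)
    return parents[key]

rows = process_children([(2, 10), (1, 5), (2, 7), (1, 1)], read_parent)
```

With four child records and two parents, only two parent reads are made.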
4.3.6.4. Rewriting
Batch processing code is often very old, as it performs
basic functions that change little over time.
Rewriting code may lead to a reduction in
execution time, particularly if the code has been heavily
modified or a long time has elapsed since it was first
written.
When the code has been heavily modified, the code may
calculate results that are never used, read database
records that are not accessed and re-read records
unnecessarily.
This may occur when code in a late part of the process is
changed to a different method, however the earlier code
that calculated the input figures remains unchanged.
Re-reading records may occur as the control flow within
the module becomes more complex.
Also, re-writing the code may enable structures in the
database and the code to be used that were not available
when the module was originally written.
4.4.1.
Functional Engines
4.4.2.
4.4.3.
Implementation
4.4.4.
4.4.5.
Code Models
4.4.6.
Sequence Dependence
4.4.7.
Orthogonality
4.4.8.
Generality
User Interfaces
4.5.2.
Program Interfaces
4.6.2.
Requirements Analysis
Systems Analysis
Systems Design
Coding
Testing & Debugging
Documentation
4.6.3.
4.6.3.2. Consistency
Consistency in design, use of language features and
coding conventions is important in large systems.
When code is consistently developed, different sections
of the code can be read without the need to adjust to
different conventions.
Also, consistency reduces the chance of problems
occurring within interfaces between modules.
4.6.3.3. Infrastructure
Large developments generally involve the development
of some common functions and general facilities.
4.6.4.
4.6.5.
4.6.6.
4.6.6.1. Advantages
4.6.6.2. Disadvantages
A just-in-time approach could lead to a system that is
structured in a similar way to the processing itself.
This would not generally be an ideal structure, as this
would lead to a rigid code and data design that was
difficult to modify.
An alternative would be to retain a separation between
the system structure and the processing requirements,
and adapt the system structure independently of the
processing changes.
Testing could become an issue with just-in-time
development, as frequent releases of new versions would
increase the amount of testing involved in the
development process.
If structural changes were not made regularly, a just-in-time process could degenerate into a maintenance
process involving patchwork changes to a system.
This could occur when changes were implemented
directly, rather than adapting the system structure to a
new type of process.
If the development process is tied in too closely with the
use of the system, then alternative system designs and
4.6.6.3. Implementation
In general, just-in-time development may be more
effective when the system structure is adapted to suit a
new process, rather than implementing the change
directly.
For example, a change to a calculation could be
implemented by adding a new facility for specifying
calculation options, rather than including the specific
change within the system.
This process would allow the system to retain a flexible
and independent structure, based on a range of general
facilities rather than fixed processes.
When a large number of changes had been made, internal
structures could be changed to better reflect the new
functions and processes that were being performed.
4.6.7.
System Redevelopments
4.6.8.
4.6.9.
Development Process          Description
Infrastructure Development   The development of general functions, common
                             routines and object structures.
Application Development      Development of code to implement
                             application-specific functions and processes.
Project Development          Development of a system from analysis through to
                             completion on a project basis.
Evolutionary Development     Continual development of a system.
Ad-hoc Development           Development of temporary and unstructured
                             systems, such as testing programs and
                             productivity tools.
Just-In-Time Development     Adapting a system to meet changing requirements,
                             and implementing modules as they are developed.
System Redevelopment         Developing a system on a project basis to replace
                             an existing system.
Maintenance
Minor changes to an
Additional functions
added to an existing
system.
4.7.2.
Abstraction
4.7.3.
4.7.4.
Rewriting Code
4.7.5.
4.7.6.
Database Development
4.7.7.
Release Cycles
Code Interactions
4.8.2.
Localisation
4.8.3.
Sequence Dependence
4.8.4.
Direct Mapping
4.8.5.
Defensive Programming
4.8.6.
Segmentation of Functionality
4.8.6.3. Implementation
Separating functions can be done by creating major
sections within the code that contain related functions.
These could be accessed through a defined set of
subroutine calls and operations.
These modules could be extended to create processing
engines. For example, a calculation engine may perform
a range of general calculation and numeric processing
operations. The engine would be accessed through a
clearly defined interface of data and functions, and may
be called directly or may execute independently of the
main program.
Each section only remains independent while it does not
contain processing that relates to another section of the
system.
For example, a calculation section would not contain
input/output processing, such as screen, file or printing
operations.
This approach allows the code section to be used as an
independent unit.
Portability between different systems is also improved
with the approach.
4.8.7.
Subroutines
4.8.7.1. Selection
In general, code is simpler and clearer when a collection
of small subroutines is used, rather than a number of
larger subroutines.
Where a loop contains several statements, these may be
split into a separate subroutine. This results in producing
two subroutines. One subroutine contains looping but no
processing, while the other contains processing and no
looping.
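A minimal Python sketch of this split is shown below. The account record and the interest calculation are hypothetical.

```python
def apply_interest(account, rate):
    """Processing only: update a single account record."""
    account["balance"] = round(account["balance"] * (1 + rate), 2)

def apply_interest_to_all(accounts, rate):
    """Looping only: call the processing subroutine for each account."""
    for account in accounts:
        apply_interest(account, rate)

accounts = [{"balance": 100.0}, {"balance": 250.0}]
apply_interest_to_all(accounts, 0.05)
```

The processing subroutine can also be called directly for a single account, and tested without the loop.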
When a large subroutine contains separate blocks of code
that are not related, each major block could be split into a
separate subroutine.
When code that performs a similar logical operation
exists in several parts of a system, this could be grouped
into a single subroutine.
For example, different sections of code may scan and
update parts of the same data structure, or perform a
similar calculation.
The variations in the process could be handled by
passing flags to the subroutine, or by storing flags with
the data structure itself, so that the subroutine could
automatically determine the processing that was
required.
4.8.7.2. Flags
Subroutines are more general if they are passed flags and
individual options, rather than being passed data and then
testing conditions directly.
For example, a subroutine may be passed a Boolean flag
to specify an option, such as sorting a list in ascending or
descending order.
Other options could also be passed. For example, the key
of a data set could be passed, rather than an entire
structure type that contained the key as one data item.
Using flags and options allows the subroutine to be
called in different ways from different parts of the code.
Also, using flags allows the subroutine to perform
different combinations of processing, in contrast to
checking the data directly, which may result in a fixed
result dependent on the data.
Interactions with different parts of the code may also be
reduced.
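The sort-order example could be sketched in Python as follows. The routine name sort_records and its parameters are assumptions for illustration. The routine is passed only the key name and a flag, rather than inspecting the data to decide what to do.

```python
def sort_records(records, key_name, descending=False):
    """Sort a list of dictionaries by key_name; the flag selects the order."""
    return sorted(records, key=lambda r: r[key_name], reverse=descending)

records = [{"id": 2}, {"id": 3}, {"id": 1}]
ascending_result = sort_records(records, "id")
descending_result = sort_records(records, "id", descending=True)
```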
4.8.8.
Automatic Re-Generation
4.8.9.
Design Issue
Description
Interactions
Localisation
Sequence
Dependence
Pre-requisite
checking
Direct Mapping
Defensive
Programming
Systems composed of a
large number of small
subroutines may be
easier to interpret,
debug and modify that
systems composed of
several large and
complex subroutines.
Several small
subroutines may be
clearer than a single
complex subroutine.
Subroutine
Parameters
Subroutines may be
more general, and
interactions may be
reduced, when
subroutines are passed
flags and options, rather
than data used for
internal conditions.
Data used in internal
conditions causes
interactions with distant
code and results in a
fixed outcome for an
individual data
condition.
4.9. Coding
Coding is the process of writing the program code.
4.9.1.
Development of Code
4.9.2.
Robust Code
4.9.3.
4.9.4.
Layout
4.9.5.
Comments
4.9.6.
Variable Names
4.9.7.
4.9.8.
Variable Usage
4.9.9.
Constants
4.9.11.1. Error Detection
Errors can be detected by using error checking code.
These statements do not alter the results of the process,
but are included to detect errors at an early stage.
This is used to assist in the debugging process, and to
prevent incorrect data being stored and incorrect results
from being produced.
Error detection can involve checking data values at
points in the process for values that are negative, zero, an
empty string, or outside expected ranges.
At the process level, checks could include performing
reverse calculations, scanning output data structures for
invalid values and combinations, and checking the
relationship between various variables to ensure that the
output values are consistent with each other and with the
input values.
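A minimal Python sketch of such checks is shown below, for a hypothetical process that splits a total into parts. The checks do not alter the results; they only report problems.

```python
def check_allocation(total, parts):
    """Return a list of problems found; an empty list means the checks passed.

    total is the input amount and parts are the amounts it was split into
    (a hypothetical process used only for illustration).
    """
    problems = []
    if total <= 0:
        problems.append("total is zero or negative")
    for i, part in enumerate(parts):
        if part < 0:
            problems.append(f"part {i} is negative")
    # Process-level check: the outputs must be consistent with the input.
    if abs(sum(parts) - total) > 1e-9:
        problems.append("parts do not sum to the total")
    return problems
```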
4.9.11.2. Response to Errors
When an error condition is detected, various actions can
be taken.
Errors within general subroutines and functions may be
handled by returning an error status condition to the
calling routine.
Alternatively, in some languages and situations an
exception is generated, and this exception results in
program termination unless it is trapped by an error-handling routine. The error-handling routine is
automatically called when the exception condition arises.
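The two approaches could be sketched in Python as follows. The function names are hypothetical; float and ValueError are standard Python.

```python
def parse_amount_status(text):
    """Return (ok, value): an error status is passed back to the caller."""
    try:
        return True, float(text)
    except ValueError:
        return False, 0.0

def parse_amount(text):
    """Raise an exception instead; it propagates until it is trapped."""
    return float(text)

ok, value = parse_amount_status("abc")   # the caller must test ok

try:
    parse_amount("abc")
    trapped = False
except ValueError:                        # error-handling routine
    trapped = True
```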
When the error occurs within a main processing function,
an error message may be displayed for an interactive
process, or logged to an error log file for a batch process.
Following the error, the process may continue operation
and produce a partial result, or the current process may
terminate.
4.9.11.3. Error Messages
An error number.
A description of the error.
The place in the program where the error
occurred.
The data that caused the error.
Related major data keys that can be used to
locate the item.
A description of the action that will be taken
following the error.
Possible causes of the error.
Recommended actions.
Actions to avoid the error if the data is actually
correct.
A list or description of alternative data that
would be valid.
For example
ERROR P2345: A negative policy contribution
of $-12.34 is recorded for policy number
0076700044 on 12/03/85. Module
PrintStatement, Subroutine CalcBalance
ERROR 934: Model ShellProfile has no
mesh type. This error may occur if the
conversion from version 3.72 has not been
completed. If this model is a continuous-surface
model, the option continuous surface in the
model parameters screen can be selected to
enable correct calculation. Module RenderMesh,
Subroutine MeshCount
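A message of this kind could be assembled from its parts, as in the following Python sketch. The function name, parameter names and layout are assumptions for illustration; the source does not prescribe a particular format.

```python
def format_error(number, description, module, subroutine, cause=None):
    """Build an error message from an error number, a description,
    an optional possible cause, and the location in the program."""
    message = f"ERROR {number}: {description}"
    if cause:
        message += f" {cause}"
    return f"{message} Module {module}, Subroutine {subroutine}"

text = format_error(
    "P2345",
    "A negative policy contribution of $-12.34 is recorded for policy "
    "number 0076700044 on 12/03/85.",
    "PrintStatement",
    "CalcBalance",
)
```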
4.9.11.4. Error Recovery
In some processes a significant number of errors may be
expected.
This occurs, for example, in compilers while compiling
program code, and in data processing while processing
large batches of data.
In these cases, ending the process when the first error is
detected could result in repeated re-running of the
process as each error is corrected.
Error recovery involves taking action to continue
operation following an error condition, to generate other
results or generate partial output, and to detect the full
list of errors.
During program compilation, a compiler will generally
attempt to process the entire program and produce a list
of all compilation errors within the code.
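A minimal Python sketch of error recovery in a batch process is shown below. The record layout is hypothetical. Invalid records are logged to a list and skipped, so that a single run reports every error.

```python
def process_batch(records):
    """Process every record, collecting errors instead of stopping at the first.

    Each record is a hypothetical (key, amount_text) pair.
    """
    results, errors = [], []
    for key, amount_text in records:
        try:
            results.append((key, float(amount_text)))
        except ValueError:
            errors.append(f"record {key}: invalid amount {amount_text!r}")
    return results, errors

results, errors = process_batch(
    [("A1", "10.5"), ("A2", "x"), ("A3", "7"), ("A4", "")]
)
```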
When an error is detected, an error message may be
logged to a file. Depending on the situation, error
recovery may involve ignoring the error and continuing
operation, terminating part of a process and continuing
other parts, using a default value in place of an invalid
4.9.11.5. Self-Compensating Systems
A self-compensating system is a system that
automatically adjusts the system to correct existing
errors.
This can be done by calculating the expected current
position, the actual current position, and creating records
to adjust for the difference.
This is in contrast to a process that simply performs a
fixed process and ignores other data.
For example, a monthly fee process may create monthly
fee transactions. If a previous transaction is incorrect,
this will be ignored by the current month process.
However, if the monthly process calculated the year-to-date amount due, and the year-to-date amount paid, and
then created a transaction to represent the difference,
this approach would adjust for the existing incorrect
transaction.
Separate values could also be created for the expected
amount and the unexpected adjustment.
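The monthly fee example could be sketched in Python as follows. The function name, parameters and figures are hypothetical.

```python
def next_fee_transaction(monthly_fee, months_elapsed, amounts_charged):
    """Return the amount to charge so that the year-to-date total is correct.

    amounts_charged lists the fee transactions already created this year.
    """
    expected_to_date = monthly_fee * months_elapsed
    actual_to_date = sum(amounts_charged)
    return expected_to_date - actual_to_date

# Month 3: January was charged correctly, February was wrongly charged 15.
adjusting_amount = next_fee_transaction(10, 3, [10, 15])
```

In this case the process creates a transaction of 5 rather than the fixed fee of 10, which corrects the earlier over-charge.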
4.9.12. Portability
Portable code is written to avoid using features that are
specific to a particular compiler, language
implementation, or operating environment.
Code that is written in a portable way is easier to convert
to different development environments.
4.9.12.2. External Interfaces
User interface structures, file and database structures and
printing operations may all vary from operating system
to operating system.
In general, conversion to alternative environments may
be simpler when related processes are grouped together.
For example, all the user interface code, including calls
4.9.12.3. Segmentation of Functionality
In many systems the code can be grouped into several
major sections.
One section may be the user interface code, which
displays screens, menus, and handles input from the user.
Other major sections may include a set of calculation and
processing functions, and formatting and printing code.
This would increase portability by enabling one section
to be modified for an alternative environment, while the
other sections remain unchanged.
However, this approach would not apply to an event-driven system that used a complex user interface design,
with small processing steps attached to various functions.
4.9.19.2. Integer Implementations
In some languages, Boolean variables are implemented
as integers within the code, rather than as a separate type.
The True and False values would be numeric constants.
Zero is generally used for False, while True can be 1, -1,
or any non-zero number depending on the language.
In languages where Boolean values are implemented as
numbers, problems can arise if numeric values other than
True and False are used with variables that are intended
to be Boolean.
For example, if False is zero and True is 1, but any non-zero value is accepted as true, the following anomalies
can occur.
Value   Condition   Outcome   Reason
1       is false    no        1 is not equal to 0
1       = True      yes       1 is equal to 1
1       is true     yes       1 is non-zero
2       is false    no        2 is not equal to 0
2       = True      no        2 is not equal to 1
2       is true     yes       2 is non-zero
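Python is one language in which this behaviour can be observed, since its bool type is a subclass of int with True equal to 1 and False equal to 0. The variable names in the sketch below are illustrative.

```python
value = 2

is_nonzero = bool(value)          # any non-zero number is treated as true
equals_true = (value == True)     # compares 2 with 1
equals_false = (value == False)   # compares 2 with 0

# Safer: convert to a real Boolean before comparing, or test truthiness only.
normalised = (bool(value) == True)
```

Here the value 2 is treated as true in a condition, yet it is equal to neither True nor False.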
4.9.20. Consistency
Consistency in the use of language features may reduce
the chance of bugs within the code.
If the wrong loop type was used for the array, one valid
element would be missed, and would be replaced with
another value that could be a random, initial or previous
value.
This may lead to subtle problems in processing, or create
general instability and occasional random problems.
When maintenance changes are made to an existing
system, the chance of errors occurring may be reduced if
the language features are used in a way that is consistent
with the existing code.
value through the loop, the last value plus one, or some
other value.
Most compilers would not detect this problem, as
technically this code is a valid set of statements within
the language.
This problem could be avoided by using the variable
num_words after the loop instead of the loop variable
itself.
Calling subroutines from within expressions can also
result in undefined operations. For example, in some
cases the order in which an expression is evaluated is not
defined, and when an expression contains two subroutine
calls, these calls could occur in either order.
In the case of Boolean expressions, if the first part of an
AND expression is false, then the second part does not
need to be evaluated, as the result of the AND expression
will be false, regardless of the result of the second
expression.
In these cases, languages may define that the second part
will never be executed, that the second part will always
be executed, or the result may be undefined.
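Python is an example of a language that defines the second part of an AND expression as not being evaluated when the first part is false. The sketch below records which operands are evaluated; the check function is hypothetical.

```python
calls = []

def check(name, result):
    """Record that this operand was evaluated, then return its result."""
    calls.append(name)
    return result

outcome = check("first", False) and check("second", True)
```

Only the first call is made, so any side effect of the second subroutine does not occur.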
4.10. Testing
Testing involves creating data and test scenarios, calling
subroutines or using the system, and checking the results.
Testing is done at the level of individual subroutines,
complete modules, and an entire system.
Testing cannot repair a system that is poorly designed or
coded. The reliability of a system is determined during
the design and coding stages, not during the testing stage.
4.10.2.1. Function Testing
Testing individual functions involves testing individual
subroutines and modules as each function is developed.
Debugging is easier and more reliable if testing is done
as each small stage is developed, rather than writing a
large volume of code and then conducting testing.
Function testing is conducted by writing a test program
or test harness.
This may be a separate program, or a section of code
within an existing system. The testing routine generates
data, calls the function being tested, and either logs the
results to a file or checks the figures using an alternative
method.
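A small test harness could be sketched in Python as follows. The function under test, compound, and the alternative method are hypothetical examples.

```python
def compound(principal, rate, years):
    """Function under test (hypothetical): compound interest."""
    return principal * (1 + rate) ** years

def compound_by_loop(principal, rate, years):
    """Alternative method: apply the rate one year at a time."""
    for _ in range(years):
        principal *= 1 + rate
    return principal

def run_tests():
    """Generate data, call the function, and check against the alternative."""
    failures = []
    for principal in (0, 100, 2500):
        for rate in (0.0, 0.05, 0.1):
            for years in (0, 1, 10):
                got = compound(principal, rate, years)
                want = compound_by_loop(principal, rate, years)
                if abs(got - want) > 1e-6 * max(1.0, abs(want)):
                    failures.append((principal, rate, years, got, want))
    return failures
```

An empty list of failures indicates that the two methods agree for every generated case.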
4.10.2.2. System Testing
System testing involves testing an entire system. This is
done by creating test data, running processes and
functions within the system, and checking the results.
System testing generally results in the production of a list
of known problems. The actual error or outcome in each
situation is described, along with the data and steps
required to reproduce the problem.
When a range of problems have been fixed, a new test
version of the system can be released for further testing.
4.10.2.3. Automated Testing
4.10.2.4. Stress Testing
Stress testing involves testing a system with large
volumes of data and high input transaction rates. This is
done to ensure that the system will continue to operate
normally with large data volumes, and to ensure that the
response times would be within acceptable limits.
Stress testing can be done by generating a large volume
of random or sequential data, populating the database
tables, and running the system processes.
Stress testing can also be done with internal data
structures. For example, a testing routine for testing a
general set of binary tree functions, or a sorting module,
could create a large volume of data, insert the data into
the structure, and call the functions that operate on the
data structure.
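A stress test of a sorting module could be sketched in Python as follows. The merge_sort routine is a stand-in for the module under test, and the data volume and seed are arbitrary assumptions.

```python
import random
import time

def merge_sort(items):
    """Module under test (a simple merge sort used as a stand-in)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def stress_test(count, seed=0):
    """Sort count random values; return elapsed seconds and correctness."""
    rng = random.Random(seed)
    data = [rng.randint(0, 1_000_000) for _ in range(count)]
    start = time.perf_counter()
    result = merge_sort(data)
    elapsed = time.perf_counter() - start
    return elapsed, result == sorted(data)
```

The elapsed time can be compared against an acceptable limit as the data volume is increased.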
4.10.2.6. Parallel Testing
In some cases, an alternative system is available that can
perform similar functions to the system being tested.
This may be an existing system in the case of a system
redevelopment, a temporary system, or an alternative
system such as a commercial system that provides
related functions.
Parallel testing can be done by loading the same set of
data into both systems and comparing the results.
The results should also be checked independently, to
avoid carrying over bugs from a previous system into a
new system.
4.10.3.1. Actual Data
Actual data is derived from previous systems, manual
keying, uploading from external systems, and the
existing system itself when changes are being made.
Tests conducted on actual data can be used to generate a
set of known results. These results can then be regenerated and compared to the previous data after
changes have been made. This process can be used to
check whether existing functions have been altered by
changes made to the system.
4.10.3.3. Edge Cases
4.10.3.4. Random Data
Random data generation may produce test cases that are
unusual and unexpected, even though they are valid
system inputs.
Testing with random data can be useful in checking the
full range of functionality of a system or module.
Randomly generated data and function sequences may
test scenarios that were not contemplated when the
system was designed and written.
4.10.4.2. Alternative Algorithms
In some cases an alternative method can be used to check
the results that have been generated. This may involve
writing a testing routine to calculate the same result,
using a method that may be simple, but may execute
slowly and may only apply to that particular situation.
When this is the case, the test program can generate a
large number of test cases, call the module being tested,
and check the results that are produced.
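As a sketch of this approach in Python, the test routine below generates random cases and checks a fast function against a simple but slow calculation of the same result. The function under test (a maximum contiguous sum, written here using Kadane's algorithm) is a hypothetical example chosen for illustration.

```python
import random

def max_subarray(values):
    """Function under test: largest sum of a non-empty contiguous run."""
    best = current = values[0]
    for v in values[1:]:
        current = max(v, current + v)
        best = max(best, current)
    return best

def max_subarray_slow(values):
    """Simple but slow check: try every start and end position."""
    return max(
        sum(values[i:j])
        for i in range(len(values))
        for j in range(i + 1, len(values) + 1)
    )

def random_check(cases=200, seed=1):
    """Return the first failing input, or None if every case agrees."""
    rng = random.Random(seed)
    for _ in range(cases):
        values = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray(values) != max_subarray_slow(values):
            return values
    return None
```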
4.10.4.3. Known Solutions
When testing is conducted on an existing system, a set of
standard results can be produced. This may include a set
of data that is updated within a database, or a set of
calculated figures that are stored in a file for future
comparison.
This process allows a test program to re-generate the data
or figures, and check the results against the known
correct output.
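Storing and re-checking known results could be sketched in Python as follows. The calculate function is a stand-in for the system calculation, and the file name and JSON layout are assumptions.

```python
import json
import os
import tempfile

def calculate(inputs):
    """Stand-in for the system calculation being checked."""
    return [x * x + 1 for x in inputs]

def save_known_results(path, inputs):
    """Store the inputs and the figures that are currently produced."""
    with open(path, "w") as f:
        json.dump({"inputs": inputs, "outputs": calculate(inputs)}, f)

def matches_known_results(path):
    """Re-generate the figures and compare them with the stored output."""
    with open(path) as f:
        known = json.load(f)
    return calculate(known["inputs"]) == known["outputs"]

path = os.path.join(tempfile.mkdtemp(), "known_results.json")
save_known_results(path, [1, 2, 3])
unchanged = matches_known_results(path)
```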
4.10.4.4. Manual Calculations
The inputs to calculations and the calculation results can
be logged to a file for manual checking. This may
include writing a simple program to scan the file and
check the figures, or it may involve using a program such
as a spreadsheet to check the data and re-calculate the
figures.
Checks can be run against the input data that was
supplied to the system, and also against calculations
within the set of output data, such as verifying totals.
4.10.4.5. Consistency of Results
In some cases, the output from a process should have a
certain form, and individual outputs should be related in
certain ways.
In these situations, a test program can be used to check
that the individual results are consistent with each other.
These tests could also be built into the routine itself, and
used to check output as it is produced.
When these tests are fast and simple they can remain in
the code permanently, otherwise they can be activated
for testing and debugging.
For example, a set of output weights should sum to 1, a
sorted output list should be in a sorted order, and the
output of a process that decomposes a value into parts
should equal the original value.
As another example, checking that a sort routine has
successfully sorted a list is a simple process, and
involves scanning the list once and checking that each
item is not smaller than the previous item.
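The checks described above could be sketched in Python as follows. The function names and the tolerance value are assumptions.

```python
def is_sorted(items):
    """One pass: each item must not be smaller than the previous item."""
    return all(a <= b for a, b in zip(items, items[1:]))

def weights_sum_to_one(weights, tolerance=1e-9):
    """A set of output weights should sum to 1."""
    return abs(sum(weights) - 1.0) <= tolerance

def parts_match_total(parts, total, tolerance=1e-9):
    """The parts of a decomposed value should sum to the original value."""
    return abs(sum(parts) - total) <= tolerance
```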
4.10.4.6. Reverse Calculations
Some calculations and processes can be performed in
two directions, with one direction being difficult and the
other straightforward.
For example, the value of y in the equation y = x^2 +
x is determined by calculating the result from the value
of x.
However, if the value of y is already known but the
value of x is required, then the equation cannot be
Testing Methods
Function testing
System Testing
Automated Testing
Stress Testing
Parallel Testing
Test Data
Actual Data
Edge Cases
Random Data
Detecting Errors
System Operation
Alternative Algorithms
Known Solutions
Manual Calculations
Consistent Results
Reverse Calculations
Performing a calculation in
reverse to determine the original
figure, and comparing this result
with the actual input value to
verify that the generated result is
correct.
Declarative Code
Testing Output
Structure Checks
4.11. Debugging
Debugging is the process of locating bugs within a
program and correcting the code. A bug is an error in the
code, or a design flaw, that leads to an incorrect result
being produced.
4.11.1.2. Data Traces
A log file can be written of the value of variables as the
program runs.
This can be used to view the sequence of changes to
various data variables.
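A data trace could be written using Python's standard logging module, as in the following sketch. The logger name, file location and traced variables are hypothetical.

```python
import logging
import os
import tempfile

trace_path = os.path.join(tempfile.mkdtemp(), "trace.log")
trace = logging.getLogger("data_trace")      # hypothetical logger name
trace.setLevel(logging.DEBUG)
handler = logging.FileHandler(trace_path)
trace.addHandler(handler)

balance = 100
for payment in (30, 45, 50):
    balance -= payment
    trace.debug("payment=%s balance=%s", payment, balance)

# Detach and close the handler so the file is complete before it is read.
trace.removeHandler(handler)
handler.close()
with open(trace_path) as f:
    lines = f.read().splitlines()
```

Reading the trace shows the point at which the balance first becomes negative.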
4.11.1.4. Internal Checks
Internal checks are code that is added into the system to
determine a point in the process when the results become
incorrect.
This can be used to determine the approximate location
of the problem. This method is particularly useful for
bugs that randomly occur, where a sequence of steps to
reproduce the problem cannot be determined.
4.11.1.7. Checking Results
Some processes produce several results that can be
checked against each other. For example, in a process
that decomposes a value into component parts, the sum
of the parts should match the original value.
number
number * expression
number + expression
number - expression
number / expression
( expression )
Approach
Description
Reading code
Program traces
Debuggers
Cleaning code
Detecting errors
Reverse calculations
Performing a calculation
in reverse and
recalculating the input
value from the generated
output, to check that the
input values match.
Alternative algorithms
Using an alternative
algorithm to generate the
same output, and check
that the output data
matches.
Consistent output
Data checking
Memory corruptions
Declarative code
Scanning structures
4.12. Documentation
4.12.1. System Documentation
During the development of a system, several documents
may be produced. This typically includes a Functional
Specification, which defines in detail the functions and
calculations that the system should perform.
Other documents, such as design documents may also be
produced.
System documentation can be used during maintenance,
and also during enhancements to a system.
Operator     Name                        Description

Bracketed Expression
( a )                                    a

Arithmetic
a + b        Addition                    a added to b
a - b        Subtraction                 b subtracted from a
a * b        Multiplication              a multiplied by b
a / b        Division                    a divided by b
a MOD b      Modulus                     a - b * int( a / b )
-a           Unary Minus                 The negative of a
a ^ b        Exponentiation              a raised to the power of b

String
a & b        Concatenation               b appended to a

Relational
a < b        Less Than                   True if a is less than b, otherwise false
a <= b       Less Than or Equal To       True if a is less than or equal to b, otherwise false
a > b        Greater Than                True if a is greater than b, otherwise false
a >= b       Greater Than or Equal To    True if a is greater than or equal to b, otherwise false
a = b        Equality                    True if a and b have the same value, otherwise false
a <> b       Inequality                  True if a and b have different values, otherwise false

Logical Boolean
a OR b       Logical Inclusive OR        True if either a or b is true, otherwise false
a XOR b      Logical Exclusive OR        True if a is true and b is false, or a is false and b is true, otherwise false
a AND b      Logical AND                 True if a and b are both true, otherwise false
NOT a        Logical NOT                 True if a is false, false if a is true

Bitwise Boolean
a OR b       Bitwise Inclusive OR        1 if either a or b is 1, otherwise 0
a XOR b      Bitwise Exclusive OR        1 if a is 1 and b is 0, or a is 0 and b is 1, otherwise 0
a AND b      Bitwise AND                 1 if a is 1 and b is 1, otherwise 0
NOT a        Bitwise NOT                 1 if a is 0, 0 if a is 1

Addresses
ref a        Reference                   The address of the variable a
deref a      Dereference                 The data value referred to by pointer a

Array Reference
a[b]                                     The element b within the array a

Subroutine Call
a(b)                                     A call of subroutine a, passing parameter b

Element Reference
a.b                                      Item b within structure a

Assignment
a = b        Assignment                  Set the value of variable a to equal the value of expression b
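The modulus definition in the table, a - b * int( a / b ), can be tried directly in Python, as in the sketch below. The function name mod_truncated is an assumption. Python's own % operator follows floor division, so it is used here only for comparison and gives a different result when one operand is negative.

```python
def mod_truncated(a, b):
    """Modulus as defined in the table: a - b * int(a / b)."""
    return a - b * int(a / b)

# For positive operands this agrees with Python's % operator.
same_for_positive = mod_truncated(17, 5) == 17 % 5

# For a negative operand the results differ, because int() truncates
# towards zero while Python's % follows floor division.
truncated = mod_truncated(-7, 3)   # -7 - 3 * (-2)
floored = -7 % 3
```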