Cambridge University Press
Cambridge International Examinations
Cambridge International
AS and A level
Computer
Science
Coursebook
www.cambridge.org
Information on this title: education.cambridge.org
© Cambridge University Press 2015
A catalogue record for this publication is available from the British Library
ISBN 978-1-107-54673-8 Paperback
(i) where you are abiding by a licence granted to your school or institution by the
Copyright Licensing Agency;
(ii) where no such licence exists, or where you wish to exceed the terms of a licence,
and you have gained the written permission of Cambridge University Press;
(iii) where you are allowed to reproduce without permission under the provisions
of Chapter 3 of the Copyright, Designs and Patents Act 1988, which covers, for
example, the reproduction of short passages within certain types of educational
anthology and reproduction for the purposes of setting examination questions.
The past paper questions on pages 107-108 and 316 are taken from the 9608 Specimen
papers 1 and 3 respectively
and are reproduced with the permission of Cambridge International Examinations.
All other examination-style questions and comments that appear in this book were written by
the authors.
Contents
Introduction
Chapter 3 Hardware
Glossary
Index
Acknowledgements
Introduction
This full-colour, illustrated textbook has been written by experienced authors specifically for
the Cambridge International AS and A Level Computer Science syllabus (9608).
The presentation of the chapters in this book reflects the content of the syllabus:
• The book is divided into four parts, each of which is closely matched to the corresponding
part of the syllabus.
• Each chapter defines a set of learning objectives which closely match the learning
objectives set out in the syllabus.
• The syllabus defines two assessment objectives: AO1 Knowledge with understanding and
AO2 Skills. Papers 1 and 3 have a major focus on AO1 and Papers 2 and 4 have a major
focus on AO2. The chapters in Parts 1 and 3 have been written with emphasis on the
promotion of knowledge and understanding. The chapters in Parts 2 and 4 have been
written with an emphasis on skill development.
The chapters in Parts 1 and 3 have a narrative. We would encourage students to read the
whole chapter first before going back to revisit the individual sections.
The chapters in Parts 2 and 4 contain many more tasks. We would encourage students to
approach these chapters step-by-step. Whenever a task is presented, this should be carried
out before progressing further.
Chapter 1
Information Representation
Learning objectives
Task - exercises
Question:
Construct a partial drawing list for the graphic shown in figure 1.06. You can take
measurements from the image and use the bottom left corner of the box as the origin of a
coordinate system. You can invent your own format for the drawing list.
Discussion Point
What is the two’s complement of the binary value 1000? Are you surprised by this?
Extension Question:
Graphic files can be stored in a number of formats. For example, JPEG, GIF, PNG and TIFF are
just a few of the possibilities. What compression techniques, if any, do these use?
Extension Question - extended questions for
consideration of more advanced aspects or topics
beyond the immediate scope of the Cambridge
International AS and A Level syllabus.
Exam-style Questions
1 A file contains binary coding. The following are two successive bytes in the file:
10010101 00110011
One possibility for the information stored is that the two bytes together represent one
unsigned integer binary number.
i Give the denary number corresponding to this. Show your working. [2]
ii Give the hexadecimal number corresponding to this. Show your working. [2]
Chapter 1
Information Representation
Learning objectives
As children we first encounter numbers when learning to count. Specifically we learn to count
using 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. These are natural numbers expressed in what can be described
as the denary, decimal or base-10 system of numbers. Had we learned to count using 0, 1, 2,
3, 4, 5, 6, 7, 8, 9 we would have more clearly understood that the number system was base-10
because there are 10 individual, distinct symbols or digits available to express a number.
A little later we learn that the representation of a number has the least significant digit at the
right-hand end. For example, writing a denary number as 346 has the meaning:
$3 \times 10^2 + 4 \times 10^1 + 6 \times 10^0$
All computer technology is engineered with components that represent or recognise only
two states. For this reason, familiarity with the binary number system is essential for an
understanding of computing systems. The binary number system is a base-2 system which
uses just two symbols, 0 and 1. These binary digits are usually referred to as ‘bits’.
All data inside a computer system are stored and manipulated using a binary code. However,
if there is ever a need to document some of this binary code outside of the computer system
it is not helpful to use the internal code.
Hexadecimal numbers are in the base-16 system and therefore require 16 individual symbols
to represent a number. The symbols chosen are 0-9 supplemented with A-F. A few examples
of the hexadecimal representation of binary numbers represented by eight bits are shown in
Table 1.01.
Binary      Hexadecimal   Denary
00001000    08            8
00001010    0A            10
00001111    0F            15
11111111    FF            255
Table 1.01 Hexadecimal representations of binary numbers and the denary values
Note that each grouping of four bits is represented by one hexadecimal symbol. Also note
that it is common practice to include leading zeros in a hexadecimal number when used in
this way.
Question 1.01
To convert a binary number to a denary number the straightforward method is to sum the
individual position values knowing that the least significant bit represents $2^0$, the next one
$2^1$ and so on. This is illustrated by conversion of the binary number 11001 as shown in Figure
1.01.
Position values   $2^4 = 16$   $2^3 = 8$   $2^2 = 4$   $2^1 = 2$   $2^0 = 1$
Binary digits     1            1           0           0           1
Figure 1.01 Position values for a binary number
Starting from the least significant bit, the denary equivalent is 1 + 0 + 0 + 8 + 16 = 25.
An alternative method is to use the fact that 1 × 16 is equal to 2 × 8 and so on. To carry out
the conversion you start at the most significant bit and successively multiply by two and add
the result to the next digit:
1 × 2 = 2
add 2 to 1, then 2 × 3 = 6
add 6 to 0, then 2 × 6 = 12
add 12 to 0, then 2 × 12 = 24
add 24 to 1 to give 25.
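The two conversion methods described above can be sketched in Python as follows. The function names are illustrative only and are not part of the syllabus; both functions assume the binary number is supplied as a string of 0 and 1 characters.

```python
def binary_to_denary_by_position(bits):
    """Sum the position values: 2**0 for the right-hand bit, then 2**1 and so on."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


def binary_to_denary_by_doubling(bits):
    """Start at the most significant bit: double the running total, then add the next bit."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total


print(binary_to_denary_by_position("11001"))   # 25
print(binary_to_denary_by_doubling("11001"))   # 25
```

The second function needs no knowledge of the position values, which is why it is convenient when the number of bits is not known in advance.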
When converting a denary number to binary the procedure is successive division by two
with the remainder noted at each stage. The converted number is then given as the set of
remainders in reverse order.
246 ÷ 2 = 123 with remainder 0
123 ÷ 2 = 61 with remainder 1
61 ÷ 2 = 30 with remainder 1
30 ÷ 2 = 15 with remainder 0
15 ÷ 2 = 7 with remainder 1
7 ÷ 2 = 3 with remainder 1
3 ÷ 2 = 1 with remainder 1
1 ÷ 2 = 0 with remainder 1
Thus the binary equivalent of denary 246 is 11110110. As a check that the answer is sensible
you should remember that you are expecting an 8-bit binary number because the largest
denary number that can be represented in seven bits is $2^7 - 1$ which is 127. Eight bits can
represent values from 0 to $2^8 - 1$ which is 255.
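As a minimal sketch, the successive division procedure could be written in Python as shown here. The function name is an illustrative choice, and the code assumes a non-negative integer is supplied.

```python
def denary_to_binary(value):
    """Divide by two repeatedly, noting each remainder, then reverse the remainders."""
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        value, remainder = divmod(value, 2)   # quotient and remainder in one step
        remainders.append(str(remainder))
    return "".join(reversed(remainders))


print(denary_to_binary(246))   # 11110110
```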
To convert a hexadecimal number to binary, each digit is treated separately and converted
into a 4-bit binary equivalent, remembering that F converts to 1111, E converts to 1110 and
so on. Subsequent conversion of the resulting binary to denary can then be done if needed.
To convert a binary number to hexadecimal you start with the four least significant bits
and convert them to one hexadecimal digit. You then proceed upwards towards the most
significant bit, successively taking groupings of four bits and converting each grouping to the
corresponding hexadecimal digit.
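The grouping of bits in fours can be illustrated with a short Python sketch. The names used are hypothetical. Padding the binary string with leading zeros to a multiple of four bits gives the same groupings as working upwards from the least significant bit.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_binary(hex_string):
    """Convert each hexadecimal digit separately into its 4-bit binary equivalent."""
    return "".join(format(HEX_DIGITS.index(digit), "04b") for digit in hex_string.upper())


def binary_to_hex(bits):
    """Take groupings of four bits and convert each grouping to one hexadecimal digit."""
    while len(bits) % 4 != 0:
        bits = "0" + bits   # leading zeros do not change the value
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(hex_to_binary("0F"))         # 00001111
print(binary_to_hex("11111111"))   # FF
```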
TASK 1.01
The discussion here relates only to the coding of integer values. The coding of non-integer
numeric values (real numbers) is considered in Chapter 16 (Section 16.03).
It is convenient at this point to emphasise that the coding used in a computer system is
almost exclusively based on bits being grouped together with eight bits representing a byte.
A byte, or a group of bytes, might represent a binary value but equally might represent a
code. For either case, the right-hand bit is referred to as the least significant and the left-hand
bit as the most significant or top bit. Furthermore, the bits in a byte are numbered right to left
starting at bit 0 and ending at bit 7.
KEY TERMS
Byte: a group of eight bits treated as a single unit
Computers have to store integer values for a number of purposes. Sometimes the
requirement is only for an unsigned integer to be stored. However, in many cases a signed
integer is needed where the coding has to identify whether the number is positive or
negative.
An unsigned integer can be stored simply as a binary number. The only decision to be made
is how many bytes should be used. If the choice is to use two bytes (16 bits) then the range of
values that can be represented is 0 to $2^{16} - 1$ which is 0 to 65535.
If a signed integer is to be represented, the obvious choice is to use one bit to represent
the + or - sign. The remaining bits then represent the value. This is referred to as ‘sign and
magnitude representation’. However, there are a number of disadvantages in using this
format.
The approach generally used is to store signed integers in two’s complement form. Here we
need two definitions. The one’s complement of a binary number is defined as the binary
number obtained if each binary digit is individually subtracted from 1 which, in practice,
means that each 0 is switched to 1 and each 1 switched to 0. The two’s complement is
defined as the binary number obtained if 1 is added to the one’s complement number.
KEY TERMS
One’s complement: the binary number obtained by subtracting each digit in a binary number
from 1
Two’s complement: the one's complement of a binary number plus 1
If you need to convert a binary number to its two’s complement form you can use the
method indicated by the definition but there is a quicker method. For this you start at the
least significant bit and move left ignoring any zeros up to the first 1 which is also ignored.
Any remaining bits are then changed from 0 to 1 or from 1 to 0.
For example, expressing the number 10100100 in two’s complement form leaves the right-hand
100 unchanged then the remaining 10100 changes to 01011 so the result is 01011100.
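Both ways of finding a two's complement can be sketched in Python. The function names are illustrative. The code assumes a fixed-width binary string and discards any carry out of the top bit when 1 is added.

```python
def ones_complement(bits):
    """Switch each 0 to 1 and each 1 to 0."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def twos_complement_by_definition(bits):
    """Add 1 to the one's complement, keeping the same number of bits."""
    width = len(bits)
    value = (int(ones_complement(bits), 2) + 1) % (2 ** width)
    return format(value, "0{}b".format(width))


def twos_complement_shortcut(bits):
    """Keep everything up to and including the right-most 1, then flip the remaining bits."""
    last_one = bits.rfind("1")
    if last_one == -1:
        return bits   # an all-zero code is unchanged
    return ones_complement(bits[:last_one]) + bits[last_one:]


print(twos_complement_by_definition("10100100"))   # 01011100
print(twos_complement_shortcut("10100100"))        # 01011100
```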
The differences between a sign and magnitude representation and a two’s complement
representation are illustrated in Table 1.02. For simplicity we consider only the values that
can be stored in four bits (referred to as a ‘nibble’).
Signed denary number    Sign and magnitude    Two’s complement
to be represented       representation        representation
+7                      0111                  0111
+6                      0110                  0110
+5                      0101                  0101
+4                      0100                  0100
+3                      0011                  0011
+2                      0010                  0010
+1                      0001                  0001
+0                      0000                  0000
-0                      1000                  Not represented
-1                      1001                  1111
-2                      1010                  1110
-3                      1011                  1101
-4                      1100                  1100
-5                      1101                  1011
-6                      1110                  1010
-7                      1111                  1001
-8                      Not represented       1000
There are several points to note here. The first is that sign and magnitude representation has
a positive and a negative zero which could cause a problem if comparing values. The second,
somewhat trivial, point is that there is an extra negative value represented in two’s
complement.
The third and most important point is that the representations in two’s complement are
such that starting from the lowest negative value each successive higher value is obtained by
adding 1 to the binary code. In particular, when all digits are 1 the next step is to roll over to
an all-zero code. This is the same as any digital display would do when each digit has reached
its maximum value.
It can be seen that the codes for positive values in the two’s complement form are the same
as the sign and magnitude codes. However, this fact rather hides the truth that the two’s
complement code is self-complementary. If a negative number is in two’s complement form
then the binary code for the corresponding positive number can be obtained by taking the
two’s complement of the binary code representing the negative number.
TASK 1.02
Take the two’s complement of the binary code for -5 and show that you get the code for +5.
Method 1. Convert to the corresponding positive binary number then find the denary value
Starting from the binary number 10110001, converting to two’s complement leaves unchanged
the 1 in the least significant bit position then changes all of the remaining bits to produce
01001111. Converting 01001111 to denary gives 79, so the original number represents -79.
Method 2. Sum the individual position values but treat the most significant bit as a negative value
From the original binary number 10110001 this produces the following:
$-2^7 + 0 + 2^5 + 2^4 + 0 + 0 + 0 + 1 = -128 + 0 + 32 + 16 + 0 + 0 + 0 + 1 = -79$
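A minimal Python sketch of the two methods is given here, assuming the value is held as a string of bits in two's complement form. The function names are hypothetical.

```python
def to_denary_method_1(bits):
    """Method 1: for a negative code, find the positive value via two's complement."""
    if bits[0] == "0":
        return int(bits, 2)              # a positive code is an ordinary binary number
    mask = 2 ** len(bits) - 1
    positive = ((int(bits, 2) ^ mask) + 1) & mask   # flip every bit, then add 1
    return -positive


def to_denary_method_2(bits):
    """Method 2: sum the position values, treating the most significant bit as negative."""
    top_position = len(bits) - 1
    total = 0
    for index, bit in enumerate(bits):
        position_value = 2 ** (top_position - index)
        if bit == "1":
            total += -position_value if index == 0 else position_value
    return total


print(to_denary_method_1("10110001"))   # -79
print(to_denary_method_2("10110001"))   # -79
```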
Discussion Point:
What is the two's complement of the binary value 1000? Are you surprised by this?
One final point to make here is that the reason for using two’s complement representations
is to simplify the processes for arithmetic calculations. The most important example of this is
that the process used for subtracting one signed integer from another is to convert the number
being subtracted to its two’s complement form and then to add this to the other number.
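The subtraction process just described can be sketched in Python. This is an illustration only: the 8-bit width, the function name and the example values (denary 50 and 20) are assumptions chosen for the sketch, and any carry out of the top bit is simply discarded.

```python
BITS = 8
MASK = 2 ** BITS - 1   # 11111111 for one byte


def subtract_using_twos_complement(a_bits, b_bits):
    """Compute a - b by adding the two's complement of b to a."""
    b_negated = ((int(b_bits, 2) ^ MASK) + 1) & MASK   # two's complement of b
    total = (int(a_bits, 2) + b_negated) & MASK        # add, discarding the carry
    return format(total, "0{}b".format(BITS))


# 50 - 20 = 30
print(subtract_using_twos_complement("00110010", "00010100"))   # 00011110
```

Because only addition and bit inversion are needed, the same circuitry can serve for both addition and subtraction.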
TASK 1.03
Using a byte to represent each value, carry out the subtraction of denary 35 from denary 67
using binary arithmetic with two’s complement representations.
Binary coded decimal (BCD)
One exception to grouping bits in bytes to represent integers is the binary coded decimal
(BCD) scheme. If there is an application where single denary digits are required to be stored
or transmitted, BCD offers an efficient solution. The BCD code uses four bits (a nibble) to
represent a denary digit. A four-bit code can represent 16 different values so there is scope
for a variety of schemes. This discussion only considers the simplest BCD coding which
expresses the value directly as a binary number.
If a denary number with more than one digit is to be converted to BCD there has to be a
group of four bits for each denary digit. There are, however, two options for BCD; the first is
to store one BCD code in one byte leaving four bits unused. The other option is packed BCD
where two 4-bit codes are stored in one byte. Thus, for example, the denary digits 8503 could
be represented by either of the codes shown in Figure 1.02.
One BCD digit per byte:   00001000 00000101 00000000 00000011
Packed BCD:               10000101 00000011
Figure 1.02 Alternative BCD representations of the denary digits 8503
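The two storage options for BCD can be illustrated with a short Python sketch. The function names are hypothetical, and the packed version assumes that a leading zero digit is added when the number of digits is odd.

```python
def to_bcd_bytes(digits):
    """One BCD code per byte: the upper four bits of each byte are unused (zero)."""
    return [format(int(digit), "08b") for digit in digits]


def to_packed_bcd(digits):
    """Packed BCD: two 4-bit codes are stored in each byte."""
    if len(digits) % 2 != 0:
        digits = "0" + digits   # assumption: pad with a leading zero digit
    return [
        format(int(digits[i]), "04b") + format(int(digits[i + 1]), "04b")
        for i in range(0, len(digits), 2)
    ]


print(to_bcd_bytes("8503"))    # ['00001000', '00000101', '00000000', '00000011']
print(to_packed_bcd("8503"))   # ['10000101', '00000011']
```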
There are a number of applications where BCD can be used. The obvious type of application
is where denary digits are to be displayed, for instance on the screen of a calculator or in a
digital time display. A somewhat unexpected application is for the representation of currency
values. When a currency value is written in a format such as $300.25 it is as a fixed-point
decimal number (ignoring the dollar sign). It might be expected that such values would be
stored as real numbers but this cannot be done accurately (this type of problem is discussed
in more detail in Chapter 16 (Section 16.03)). One solution to the problem is to store each
denary digit in a BCD code.
  0.26      0000 0000 . 0010 0110
+ 0.85      0000 0000 . 1000 0101
            0000 0000 . 1010 1011
In the first decimal place position, the 2 has been added to the 8 to get 10 but the BCD
scheme only recognises binary codes for a single-digit denary number so the addition has
failed. The same problem has occurred in the addition for the second decimal place values.
The result shown is ‘point ten eleven’, which is meaningless in denary numbers. The ‘carry’ of
a digit from one decimal place to the next has been ignored.
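  0.26      0000 0000 . 0010 0110
+ 0.85      0000 0000 . 1000 0101
Initial sum:   0000 0000 . 1010 1011
Add 0110 to the second decimal place: 1011 + 0110 = 1 0001, giving 0001 with a carry of 1
Add 0110 and the carry to the first decimal place: 1010 + 0110 + 1 = 1 0001, giving 0001 with a carry of 1
Add the carry to the units digit to give the corrected result:
            0000 0001 . 0001 0001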
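As a hedged illustration, the correction used in BCD addition (adding 0110 whenever a 4-bit sum exceeds 1001) might be sketched in Python as follows. The function name and the representation of a number as a list of denary digits are assumptions made for this sketch. A carry out of the most significant digit is discarded.

```python
def add_bcd(a_digits, b_digits):
    """Add two equal-length lists of denary digits, one 4-bit BCD code at a time."""
    result = []
    carry = 0
    # Work from the least significant digit (right-hand end) towards the left.
    for a, b in zip(reversed(a_digits), reversed(b_digits)):
        nibble = a + b + carry        # plain binary addition of the two codes
        if nibble > 0b1001:           # not a valid code for a single denary digit
            nibble += 0b0110          # the correction: add 0110
        carry = nibble >> 4           # anything beyond four bits is carried
        result.append(nibble & 0b1111)
    result.reverse()
    return result


# 00.26 + 00.85 with the decimal point position implied
total = add_bcd([0, 0, 2, 6], [0, 0, 8, 5])
print([format(digit, "04b") for digit in total])   # ['0000', '0001', '0001', '0001']
```

The printed codes correspond to 01.11, which is the expected denary result of 0.26 + 0.85.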
In Chapter 5 (Section 5.02) there is a brief discussion of how a processor can recognise
problems arising from arithmetic operations using numbers coded as binary values.
ASCII code
The scheme which has been used for the longest time is the ASCII (American Standard Code
for Information Interchange) coding scheme. This is an internationally agreed standard.
There are some variations on ASCII coding schemes but the major one is the 7-bit code. It is
customary to present the codes in a table for which a number of different designs have been
used.
Table 1.03 shows an edited version with just a few of the codes. The first column contains
the binary code which would be stored in one byte, with the most significant bit set to
zero and the remaining bits representing the character code. The second column presents
the hexadecimal equivalent as an illustration of when it can be useful to use such a
representation.
Binary code   Hexadecimal equivalent   Character   Description
00000000      00                       NUL         Null character
00000001      01                       SOH         Start of heading
00000010      02                       STX         Start of text
00100000      20                                   Space
00100001      21                       !           Exclamation mark
00100100      24                       $           Dollar
00101011      2B                       +           Plus
00101111      2F                       /           Forward slash
00110000      30                       0           Zero
00110001      31                       1           One
00110010      32                       2           Two
01000001      41                       A           Uppercase A
01000010      42                       B           Uppercase B
01000011      43                       C           Uppercase C
01100001      61                       a           Lowercase a
01100010      62                       b           Lowercase b
01100011      63                       c           Lowercase c
Table 1.03 Some examples of ASCII codes
The full table shows the $2^7$ (128) different codes available for a 7-bit code. You should not try
to remember any of the individual codes but there are certain aspects of the coding scheme
which you need to understand.
Firstly, you can see that the majority of the codes are for printing or graphic characters.
However, the first few codes represent non-printing or control characters. These were
introduced to assist in data transmission or in entering data at a computer terminal. It is fair
to say that these codes have very limited use in the modern computer world so they need no
further consideration.
Secondly, it can be seen that the obvious types of character that could be expected to be
used in a text based on the English language have been included. Specifically there are
upper- and lower-case letters, punctuation symbols, numerals and arithmetic symbols in the
coding tables.
It is worth emphasising here that these codes for numbers are exclusively for use in the
context of stored, displayed or printed text. All of the other coding schemes for numbers are
for internal use in a computer system and would not be used in a text.
There are some special features that make the coding scheme easy to use in certain
circumstances. The first is that the codes for numbers and for letters are in sequence in each
case so that, for example, if 1 is added to the code for seven the code for eight is produced.
The second is that the codes for the upper-case letters differ from the codes for the
corresponding lower-case letters only in the value of bit 5. This makes conversion of upper
case to lower case, or the reverse, a simple operation.
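These two features can be demonstrated with Python's built-in ord and chr functions, which convert between a character and its code. The helper names are illustrative, and the case conversion shown is only meaningful for the letters A-Z and a-z.

```python
BIT_5 = 0b00100000   # bits are numbered from bit 0 at the right-hand end


def toggle_case(letter):
    """Switch the value of bit 5 to convert between upper case and lower case."""
    return chr(ord(letter) ^ BIT_5)


def next_character(character):
    """Add 1 to the code to get the next numeral or letter in sequence."""
    return chr(ord(character) + 1)


print(format(ord("A"), "08b"))   # 01000001
print(format(ord("a"), "08b"))   # 01100001
print(toggle_case("A"))          # a
print(next_character("7"))       # 8
```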
Unicode
Despite still being widely used, the ASCII codes are far from adequate for many purposes.
For this reason new coding schemes have been developed and continue to be developed
further. The discussion here describes the Unicode schemes but it should be noted that
these have been developed in tandem with the Universal Character Set (UCS) scheme;
the only differences between these schemes are the identifying names given to them. The
aim of Unicode is to be able to represent any possible text in code form. In particular this
includes all languages in the world. However, Unicode is designed so that once a coding set
has been defined it is never changed. In particular, the first 128 characters in Unicode are
the ASCII codes.
Unicode has its own special terminology. For example, a character code is referred to as
a ‘code point’. In any documentation there is a special way of identifying a code point. An
example is U+0041 which is the code point corresponding to the alphabetic character A. The
0041 are hexadecimal characters representing two bytes. The interesting point is that in a
text where the coding has been identified as Unicode it is only necessary to use a one-byte
representation for the 128 codes corresponding to ASCII. To ensure such a code cannot be
misinterpreted, the codes where more than one byte is needed have restrictions applied.
Figure 1.05 shows the format used for a two-byte code.
11?????? 10??????
The most significant bit for an ASCII code is always 0 so neither of the two-byte
representations here can cause confusion.
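The behaviour described above can be observed in Python. This sketch assumes that the encoding in use is UTF-8, which is one of the Unicode encoding forms; the text does not name a specific encoding. The function name is illustrative.

```python
def utf8_bits(character):
    """Return the code point label and the bytes of the UTF-8 encoding as bit strings."""
    code_point = "U+{:04X}".format(ord(character))
    encoded = character.encode("utf-8")
    return code_point, [format(byte, "08b") for byte in encoded]


print(utf8_bits("A"))   # ('U+0041', ['01000001'])
print(utf8_bits("é"))   # ('U+00E9', ['11000011', '10101001'])
```

The ASCII character needs only one byte with the top bit 0, whereas the accented character needs two bytes, the first beginning with 11 and the second beginning with 10.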
1.04 Images
Images can be stored in a computer system for the eventual purpose of displaying the image
on a screen or for presenting it on paper, usually as a component of a document. Such an
image can be created by using an appropriate drawing package. Alternatively, when an image
already exists independently of the computer system, the image can be captured by using
photography or by scanning.
Vector graphics
The most important property of a vector graphic image is that the dimensions of the objects
are not defined explicitly but instead are defined relative to an imaginary drawing canvas. In
other words, the image is scalable. Whenever the image is to be displayed the file is read, the
appropriate calculations are made and the objects are drawn to a suitable scale. If the user
then requests that the image is redrawn at a larger scale the file is read again and another set
of calculations is made before the image is displayed. This process cannot of itself cause
distortion of the image.
TASK 1.04
Construct a partial drawing list for the graphic shown in Figure 1.06. You can take
measurements from the image and use the bottom left corner of the box as the origin of a
coordinate system. You can invent your own format for the drawing list.
A vector graphic file can only be displayed directly on a graph plotter, which is an expensive
specialised piece of hardware. Otherwise the file has to be converted to a bitmap before
presentation.
Bitmaps
The fundamental concept underlying the creation of a bitmap file is that the picture
element (pixel) is the smallest identifiable component of a bitmap image. The image is
stored as a two-dimensional matrix of pixels. The pixel itself is a very simple construct; it has
a position in the matrix and it has a colour.
KEY TERMS
Picture element (pixel): the smallest identifiable component of a bitmap image, defined by
just two properties: its position in the bitmap matrix and its colour
The other decision that has to be made concerns the resolution of the image which can be
represented as the product of the number of pixels per row times the number of rows. When
considering resolution it is important to distinguish between the resolution of a stored image
and the resolution of a monitor screen that might be used to display the image. Both of these
have to be considered if a screen display is being designed.
From the above discussion it can be seen that a bitmap file does not define the physical
size of a pixel or of the whole image. The image is therefore scalable but when the image
is scaled the number of pixels in it does not change. If a well-designed image is presented
on a suitable screen the human eye cannot distinguish the individual pixels. However, if
the image is magnified too far the quality of the display will deteriorate and the individual
pixels will be evident. This is illustrated in Figure 1.07 which shows an original small image, a
magnified version of this small image and a larger image created with a more sensible, higher
resolution.
Figure 1.07 (a) a bitmap logo; (b) an over-magnified version of the image; (c) a sensible
larger version
The above account has considered the two approaches for storing images and when they are
appropriate.
File size is always an issue with an image file. A large file occupies more memory space and
takes longer to display or to be transmitted across a network. A vector graphic file will have a
smaller size than a corresponding bitmap file. A bitmap file has to store the pixel data but the
file must also have a header that defines the resolution of the image and the coding scheme
for the pixel colour.
You can calculate the minimum size (the size not including the header) of a bitmap file
knowing the resolution and the colour depth. As an example, consider that a bitmap file is
needed to fill a laptop screen where the resolution is 1366 by 768. If the colour depth is to be
24 bits then the number of bits needed is:
$1366 \times 768 \times 24 = 25\,178\,112$ bits
The result of this calculation shows the number of bits but a file size is always quoted as a
number of bytes or multiples of bytes. Thus our file size could be quoted as:
$25\,178\,112$ bits $= 3\,147\,264$ bytes $= 3073.5$ kibibytes $\approx 3.0$ mebibytes
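The calculation can be checked with a few lines of Python. The function name is an illustrative choice, and the result excludes the file header, as in the discussion above.

```python
def bitmap_size_in_bits(pixels_per_row, rows, colour_depth):
    """Minimum bitmap size: one colour code of colour_depth bits for every pixel."""
    return pixels_per_row * rows * colour_depth


bits = bitmap_size_in_bits(1366, 768, 24)
size_in_bytes = bits // 8
kibibytes = size_in_bytes / 1024
mebibytes = kibibytes / 1024

print(bits)            # 25178112
print(size_in_bytes)   # 3147264
print(kibibytes)       # 3073.5
print(round(mebibytes, 1))   # 3.0
```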
TIP
For multiples of bytes, the terminology used has recently changed. Traditionally, computer
scientists have used the terminology kilobyte, megabyte, gigabyte etc. in a way that conflicted
with the definition of these prefixes established by the International System of Units (SI).
Following the SI convention, one kilobyte would represent 1000 bytes. Computer scientists
have used one kilobyte to represent 1024 bytes. There have been a number of variations on
how this was written, for example Kbyte, KB or kB but the basic contradiction remained. In
order to resolve this unsatisfactory situation, the International Electrotechnical Commission
(IEC) in 1998 proposed a new set of definitions for such quantities. 1024 bytes is now identified
as one kibibyte where the kibi can be considered as representing kilobinary. This proposal has
been accepted by other international standards bodies.
1.05 Sound
Natural sound consists of variations in pressure which are detected by the human ear. A typical
sound contains a large number of individual waves each with a defined frequency. The result is
a wave form in which the amplitude of the sound varies in a continuous but irregular pattern.
If there is a need to store sound or transmit it electronically the original analogue sound
signal has to be converted to a binary code. A sound encoder has two components. The first
is a band-limiting filter. This is needed to remove high-frequency components. The ear would
not be able to detect these and they could cause problems for the coding if not removed. The
other component in the encoder is an analogue-to-digital converter (ADC).
The method of operation of the ADC is described with reference to Figure 1.08. The amplitude
of the wave (the red line) has to be sampled at regular intervals. The blue vertical lines indicate
the sampling times. The amplitude cannot be measured exactly; instead the amplitude is
approximated by the closest of the defined amplitudes represented by the horizontal lines. In
Figure 1.08, sample values 1 and 4 will be an accurate estimate of the actual amplitude because
the wave is touching an amplitude line. In contrast, samples 5 and 6 will not be accurate because
the actual amplitude is approximately half way between the two closest defined values.
[Figure 1.08: sound amplitude plotted against time]
In practice, for coding sound, two decisions have to be made. The first is the number of
bits to be used to store the amplitude values, which defines the sampling resolution. If only
three bits are used then eight levels can be defined as shown in Figure 1.08. If too few are
used there will be a significant quantisation error. In practice 16 bits will provide reasonable
accuracy for the digitised sound.
The other decision concerns the choice of the sampling rate, which is the number of samples
taken per second. This should be in accordance with Nyquist’s theorem which states that
sampling must be done at a frequency at least twice the highest frequency in the sample.
Once again file size can be an issue. Clearly an increased sampling rate and an increased
sampling resolution will both cause an increase in file size.
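The effect of these two decisions can be sketched in Python. Everything named here is an assumption made for illustration: the function names, the amplitude range of -1.0 to 1.0, and the example figures of 44100 samples per second and a 20000 Hz highest frequency, none of which come from the text.

```python
def quantise(amplitude, sampling_resolution):
    """Approximate an amplitude in the range -1.0 to 1.0 by the closest defined level."""
    levels = 2 ** sampling_resolution          # three bits give eight levels
    return round((amplitude + 1.0) / 2.0 * (levels - 1))


def minimum_sampling_rate(highest_frequency):
    """Nyquist's theorem: sample at a frequency at least twice the highest frequency."""
    return 2 * highest_frequency


def sound_size_in_bytes(sampling_rate, sampling_resolution, seconds):
    """Uncompressed size: one amplitude value of sampling_resolution bits per sample."""
    return sampling_rate * sampling_resolution * seconds // 8


print(quantise(-1.0, 3), quantise(1.0, 3))   # 0 7
print(minimum_sampling_rate(20000))          # 40000
print(sound_size_in_bytes(44100, 16, 10))    # 882000
```

Doubling either the sampling rate or the sampling resolution doubles the size returned by the last function.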
Simply recording sound and storing a digital representation is not enough for many
applications. Once a digital representation of the sound has been stored in a file, it can be
manipulated using sound-editing software. This will typically have features for:
1.06 Video
The emphasis here is on the visual aspect of a video recording and, in particular, how the
image is displayed on a screen. It might be imagined that a video would be stored very
simply as a succession of still images or frames and the only concern would be the frame rate
defined as the number of frames displayed per second. In practice the issues are far more
complex. They have not been made any more simple by the recent changes that have taken
place with regards to screen technology.
The basic principle of operation is that the display of an individual frame is created line by
line. One of the issues is the choice of resolution. The resolution can be defined in terms
of the number of lines per frame and the number of pixels per line. There needs to be
compatibility between the resolution of the stored image and the resolution of the display
screen. However, the technology used has to be chosen with regard to the sensitivity of the
human eye. One constraint is that unless the screen is refreshed at least 50 times per second
the eye will notice the flicker. However, provided that the refresh rate is 25 times per second
the eye cannot see that any motion on the screen is not actually continuous.
The traditional solution to this problem has been to use interlaced encoding. This was used
in television broadcasting and then adapted for video recordings. The image for each frame
is split into two halves, one containing the odd numbered lines and the other the even. The
first half is displayed completely then the second half follows. This produces what appears to
the eye as being a high refresh rate but is halving the transmission bandwidth requirements.
The alternative approach is to use progressive encoding where a full frame is displayed each
time. As improved transmission bandwidths become more generally available it is likely that
multimedia content
Once again the issue of file size will be discussed, this time in the context of starting
with a file that needs to have its size reduced to reduce memory storage requirements and
There are two categories of compression. The first is lossless compression where the file
size is reduced but no information is lost and when necessary the process can be reversed to
re-create the original file. The second is lossy compression where the file size is reduced
with some loss of information and the original file can never be recovered. In many
applications a combination of lossless and lossy methods may be used.
KEY TERMS
Lossy compression: coding techniques that cause some information to be lost so that the exact
original file cannot be recovered in subsequent decoding
If a file contains text then compression must be lossless because it is
not sensible to allow any loss of information. One possible compression
method would be Huffman coding. The procedure used to carry out the
compression is quite detailed but the principle is straightforward. Instead
of having each character coded in one byte an analysis is carried out to find
the most often used characters. These are then given shorter codes. The
original stream of bytes becomes a bit stream. A possible set of codes if a
text contained only eight different letters is shown in Table 1.04.
The important point to note here is the prefix property. None of the codes
begins with the sequence of bits representing a shorter code. Thus there
Lossy compression can be used in circumstances where a sound file or an image file can
have
some of the detailed coding removed or modified when it is likely that the human ear or eye
will hardly notice any difference. One example would be to reduce the colour depth for the
coding of a bitmap.
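As a rough illustration of reducing colour depth, the sketch below maps one colour channel value from 8 bits to 4 bits by discarding the four least significant bits. The function names and the choice of 8-bit and 4-bit depths are assumptions made for this example only.

```python
def reduce_depth(value, from_bits=8, to_bits=4):
    """Keep only the most significant to_bits of a from_bits channel value."""
    return value >> (from_bits - to_bits)


def restore_depth(value, from_bits=8, to_bits=4):
    """Scale a reduced value back up; the discarded bits cannot be recovered."""
    return value << (from_bits - to_bits)


pixel = 0b10110111               # original 8-bit channel value (183)
stored = reduce_depth(pixel)     # 4-bit value that is actually stored
approx = restore_depth(stored)   # value shown when the file is decoded
print(pixel, stored, approx)
```

The decoded value is close to the original but not identical, which is why the method is lossy.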
Code
Character
10
01
111
110
0001
1
0000
0011
0010
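The prefix property can be checked, and a bit stream decoded, with a short sketch such as the one below. It uses the eight codes listed in Table 1.04; the letters a to h paired with them are placeholders chosen for this example, not the characters used in the table.

```python
# Codes from Table 1.04; the letters are placeholders for illustration.
CODES = {"a": "10", "b": "01", "c": "111", "d": "110",
         "e": "0001", "f": "0000", "g": "0011", "h": "0010"}


def has_prefix_property(codes):
    """True if no code begins with the bits of another, different code."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def encode(text):
    """Replace each character by its code to form a bit stream."""
    return "".join(CODES[ch] for ch in text)


def decode(bits):
    """Read the bit stream left to right, emitting a character whenever
    the bits read so far match a complete code."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code")
    return "".join(out)


stream = encode("abba")
print(stream, len(stream), "bits instead of", 8 * len("abba"))
```

Because of the prefix property the decoder never has to look ahead: as soon as the bits read so far match a code, that match is the only possible one.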
Graphic files can be stored in a number of formats. For example, JPEG, GIF, PNG and TIFF
are
just a few of the possibilities. What compression techniques, if any, do these use?
If the image coding for a video is to be compressed, one approach is to tackle the spatial
redundancy in individual frames using techniques applicable to an image file. However, this
is unlikely to be an efficient technique because, in general, one frame is very similar to the
preceding one. It will be more effective to tackle this temporal redundancy by changing the
frame by frame coding to one which mainly records differences between adjacent frames.
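A minimal sketch of the idea of recording differences between adjacent frames is given below. It assumes, purely for illustration, that a frame is a flat list of pixel values and that both frames have the same length; real video codecs use far more elaborate techniques.

```python
def frame_delta(previous, current):
    """Record only the pixels that changed, as (index, new_value) pairs."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the differences."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [0, 0, 5, 5, 0, 0]
frame2 = [0, 0, 5, 7, 0, 0]     # only one pixel differs
delta = frame_delta(frame1, frame2)
print(delta)
```

When consecutive frames are very similar the list of differences is much shorter than the frame itself.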
A video contains images and sound but these do not go to the same part of any receiving
and
displaying system. Clearly the audio and visual parts of a video must be handled
independently
but in a way that guarantees synchronisation. The solution to this is to package the audio
and
visual components in what is known as a multimedia container format. This concept is
currently
being developed by several different organisations or companies. The use is not restricted to
one video file and one sound file. Rather, one multimedia container file will have many audio
and video streams plus other streams, perhaps for subtitles or chapter headings.
Summary
• ASCII and Unicode are standardised coding schemes for text characters.
• An image can be stored either in a vector graphic file or in a bitmap file.
Exam-style Questions
1 A file contains binary coding. The following are two successive bytes in the file:
10010101
00110011
a One possibility for the information stored is that the two bytes together represent one
unsigned integer binary
number.
[2]
[2]
[1]
c Another possibility for the information stored is that the two bytes individually represent two
signed integer
binary numbers in two’s complement form.
i State which byte represents a negative number and explain the reason for your choice.
ii Give the denary number corresponding to each byte. Show your working. [ 3 ]
d Give two advantages from representing signed integers in two’s complement form rather
than using a sign and
magnitude representation. [2]
e Give three different examples of other options for the types of information that could be
represented by two
bytes. For each example, state whether a representation requires two bytes each time, just
one byte or only
part of a byte each time. [ 3 ]
a If the designer has some images stored in files there are two possible formats for the files.
i Describe the approach used if a graphic is stored in a vector graphic file. [2]
iii State which format gives better image quality if the image has to be magnified and explain
why. [2]
i If the resolution is to be 640 x 480 and the colour depth is to be 16, calculate an
approximate size for the
bitmap file. Show your working and express the size using sensible units. [2]
ii Explain one possible approach to lossy compression that could be used. [2]
3 An audio encoder is to be used to create a recording of a song. The encoder has two
components,
ii Two important factors associated with the use of an ADC are the sampling rate and the
sampling
resolution. Explain the two terms. Use a diagram if this will help your explanation. [5]
b The other component of an audio encoder has to be used before the ADC is used.
process. Describe two techniques that the sound-editing software could provide. [3]
Chapter 2
Learning objectives
Cable
The options for a cable are twisted pair, coaxial or fibre-optic. (The first two use copper
for the transmission medium.) In discussing suitability for a given application there are a
number of factors to consider. One is the cost of the cable and connecting devices. Another
is the bandwidth achievable, which governs the possible data transmission rate. There are
then two factors that can cause poor performance: the likelihood of interference affecting
transmitted signals and the extent of attenuation (deterioration of the signal) when high
frequencies are transmitted. These two factors affect the need for repeaters or amplifiers in
transmission lines. Table 2.01 shows some comparisons of the different cable types.
                      Twisted pair      Coaxial          Fibre-optic
Cost                  Lowest            Higher           Highest
Bandwidth             Lowest            Higher           Much higher
Attenuation           Affected          Most affected    Least affected
Interference          Worst affected    Less affected    Least affected
Need for repeaters    More often        More often       Less often
Figure 2.02 (a) Coaxial cable and (b) a bundled fibre-optic cable
Wireless
The alternative to cable is wireless transmission. The three options here are radio,
microwave
or infrared, which are all examples of electromagnetic radiation; the only intrinsic difference
between the three types is the frequency of the waves.
When making a choice of which wireless option to use, all of the factors discussed when
comparing cable media need to be considered again. In addition, the ability for the radiation
to transmit through a solid barrier is an important factor. Also the extent to which the
transmission can be focused in a specific direction needs to be considered. Figure 2.03
shows the approximate frequency ranges for the three types of radiation. The factors listed
on the left increase in the direction of the arrow, so the bandwidth increases through radio
and microwave to infrared but the ability of the waves to penetrate solid objects is greatest
for radio waves. Interference is not consistently affected by the frequency.
                   Radio          Microwave    Infrared
Frequency range    3 kHz-3 GHz    3-300 GHz    300 GHz-400 THz
Figure 2.03 Frequency ranges and frequency dependency of factors affecting wireless
transmission
The increased attenuation for infrared transmission, which has the highest frequency, leads
to it only being suitable for indoor applications. The fact that it will not penetrate through
a wall is then of benefit because the transmission cannot escape and cause unwanted
interference elsewhere. For most applications, microwave transmission is the option of
choice with the improvement in bandwidth being the determining factor.
It is worth noting that cables are often referred to as ‘guided media’ and wireless
as ‘unguided media’. This is slightly misleading because only radio wave transmission
fits this description.
There are a number of points to make when considering the relative advantages of
• Outside these frequencies, no permission is needed to use the air for transmission but
cables can only be laid in the ground with the permission of landowners.
• For global communications, the two competing technologies are transmission through
fibre-optic cables laid underground or on the sea bed and satellite transmission
(discussed in Section 2.02); currently neither of these technologies is dominant.
• Interference is much more significant for wireless transmission and its extent is
dependent on which frequencies are being used for different applications.
• Mobile (cell) phones now dominate Internet use and for these only wireless transmission
is possible.
• For home or small office use, wired or wireless transmission is equally efficient; the lack
of cabling requirement is the one factor that favours wireless connections for a small
network.
Prior to the existence of the Internet there were two major periods of networking
development. The first occurred in the 1970s when what are now referred to as wide area
networks (WANs) were created. The ARPANET in the USA is the one usually mentioned first
in this context. The second period of development was triggered by the arrival of the PC in
the 1980s which led to the creation of the first examples of what are now referred to as local
area networks (LANs). These developments continued into the 1990s (with, along the way,
the addition of metropolitan networks (MANs)) but most importantly with the increasing aim
of connecting up what were originally designed and created as independent, stand-alone
networks. The era of internetworking had arrived and, in particular, the Internet started to
take shape.
It is important to understand that the Internet is not a WAN; it is the biggest internetwork in
existence. Furthermore, it has never been designed as a coherent entity; it has just evolved
to reach its current form and is still evolving to whatever future form it will take. One of the
consequences of the Internet not having been designed is that there is no agreed definition
of its structure. However, there is a hierarchical aspect to the structure particularly with
respect to the role of an Internet Service Provider (ISP). The initial function of the ISP was
to give Internet access to an individual or company. This function is now performed by
what may be described as an ‘access ISP’. Such ISPs might then connect to what might be
called ‘middle tier’ or regional ISPs which in turn are connected to tier 1 ISPs which may
alternatively be termed ‘backbone’ ISPs. An ISP is a network and connections between ISPs
are handled by Internet Exchange Points (IXPs). The other networks which can be
considered
to share the top of the hierarchy with tier 1 ISPs are the major content providers.
Discussion Point:
How many ISPs or major Internet providers are you familiar with?
Communication systems not originally designed for computer networking provide significant
infrastructure support for the Internet. The longest standing example is what is often referred
to as POTS (plain old telephone service) but is more formally described as a PSTN (public
switched telephone network). At the time of the early period of networking the telephone
network carried analogue voice data but digital data could be transmitted provided that
a modem was used to convert the digital data to analogue with a further modem used to
reverse the process at the receiving end. A dial-up network connection was available which
provided modest-speed, shared access when required. However, an organisation could
instead pay for a leased line service which would provide a dedicated link with guaranteed
transmission speed which was permanently connected. Typically, organisations have made
use of leased lines to establish MANs or WANs.
More recently, the PSTNs have upgraded their main communication lines to fibre-optic cable
employing digital technology. This has allowed them to offer improved leased line services to
ISPs but has also given them the opportunity to provide their own ISP services. In this guise
they provide two types of connectivity service. The first is a broadband network connection
for traditional network access. The second is WiFi hotspot technology, in which a public
place or area is equipped with an access point which has a connection to a wired network
that provides Internet access. Mobile devices in the vicinity of the access point can connect
to it wirelessly and from this connection gain Internet access.
The highest altitude satellites are in geostationary Earth orbit (GEO) over the equator and
these
are used to provide long-distance telephone and computer network communication. Only
three GEO satellites are needed for full global coverage. Closer to Earth are a group of
medium-
Earth-orbit (MEO) satellites some of which provide the global positioning system (GPS). Ten
MEO satellites are needed for global coverage. Finally, low-Earth-orbit (LEO) satellites work
in
‘constellations’ to supplement the mobile phone networks. Fifty LEO satellites are needed for
full global coverage but currently there are several hundreds of them up there.
Because of its height above the ground a satellite has the advantage that it can act as a
component in a network and can connect with other components that are separated by
greater distances than would be possible if only ground-based components were used. The
disadvantage is that the greater transmission distance causes transmission delays which
can
cause problems for the underlying technology supporting network operation.
[Figure: satellite orbits around the Earth, with GEO, MEO and LEO satellites shown against an
altitude scale (km) marked at 35786, 15000 and 5000]
It is common practice to talk about ‘using the web’ or ‘using the Internet’ as though these
were just two different ways of saying the same thing. This is not true. The Internet is, as
has been described above, an internetwork. By contrast, the World Wide Web (WWW) is a
distributed application which is available on the Internet.
Specifically, the web consists of an enormous collection of websites each having one or
more
web pages. The special feature of a web page is that it can contain hyperlinks which, when
clicked, give direct and essentially immediate access to other web pages.
Although the Internet has a structure which is in part hierarchical it is at heart a mesh
structure. The device that acts as a node in this mesh is the router. Routers are found in
what can be described as the backbone fabric of the Internet as well as in the ISP networks.
The details of how a router works are discussed in Chapter 17 (Sections 17.03 and 17.04).
At the periphery of the Internet there are different types of network. Whenever networks of a
different underlying technology need to communicate, the device needed is a gateway. Part
of the functionality provided by a gateway can be the same as that provided by a router.
One definition of a server is a specialised type of computer hardware designed to provide
functionality when connected to a network. A server does not contribute to the functioning
of the network itself but, rather, it is a means of providing services via the network. In the
context of the Internet, a server may act as any of the following:
KEY TERMS
File server functionality is very often provided by what is called a ‘server farm’, in which a
very large number of servers work together in a clustered configuration. Tier 1 content providers
use server farms and they are also used in the provision of cloud storage, which an ISP can
offer as part of its service portfolio.
One example of the use of a proxy server is when a web server could become overwhelmed
by web page requests. When a web page is requested for the first time the proxy server
saves
a copy in a cache. Then, whenever a subsequent request arrives, it can provide the web
page without having to search through the filestore of the main server. At the same time a
proxy server can act as a firewall and provide some security against malicious attacks on the
server. Security is discussed further in Chapter 8 (Section 8.02).
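The caching behaviour of a proxy server described above can be sketched as follows. This is a simplified model only: the class name, the `origin_server` function and the page content are all hypothetical, and a real proxy would also have to decide when a cached copy is too old to use.

```python
class CachingProxy:
    """Hypothetical proxy that keeps a copy of every page it has fetched."""

    def __init__(self, fetch_from_server):
        self._fetch = fetch_from_server   # function that asks the main server
        self._cache = {}                  # URL -> saved copy of the page
        self.server_requests = 0          # how often the main server was used

    def get(self, url):
        if url not in self._cache:
            # First request for this page: fetch it and save a copy.
            self.server_requests += 1
            self._cache[url] = self._fetch(url)
        # Later requests are answered from the cache.
        return self._cache[url]


def origin_server(url):
    """Stand-in for the main web server searching its filestore."""
    return f"<html>content of {url}</html>"


proxy = CachingProxy(origin_server)
first = proxy.get("/index.html")
second = proxy.get("/index.html")
print(proxy.server_requests)
```

Only the first request reaches the main server; the second is served from the cache.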
Following the arrival of the PC in the 1980s it was soon realised that the use of stand-alone
PCs was not viable in any large organisation. In order to provide sufficient resource to any
individual PC it had to be connected to a network. Initially servers were used to provide
extra facilities that the PCs shared (such as filestore, software applications or printing). A
further development was the implementation of what came to be known as the ‘client-
server’ architecture. At the time, the traditional architecture of a mainframe computer with
connected terminals was still in common use and the client-server approach was seen
as a competitor in which networked PCs (the clients) had access to one or more powerful
minicomputers acting as servers.
another part. In order for the client and server to cooperate, software called ‘middleware’
has
to be present. This basic concept still holds in present-day client-server applications but the
The server is now a ‘web server’ which is a suite of software that can be installed on virtually
any computer system. A web server provides access to a web application. The client is the
web browser software. The middleware is now the software that supports the transmission of
data across a network together with the provision for scripting (see Section 2.09).
It is worth emphasising that the original uses of the web involved a browser displaying web
pages which contained information. There was provision for downloading of this information
but the web pages were essentially static. For a client-server application, the web page is
‘dynamic’ which means that what is displayed is determined by the request made by the
client. In this context, there is almost no limit to the variety of applications that can be
supported. The only requirement is that the application involves user interaction. The most
obvious examples of a client-server application can be categorised as ‘ecommerce’ where
a customer buys products online from a company. Other examples are: e-business, email,
searching library catalogues, online banking or obtaining travel timetable information. Most
applications require a ‘web-enabled’ database to be installed on the server or accessible
from the server. In contrast, the monthly payroll run typifies the type of application which is
unsuitable for implementation as a dynamic web application and will continue to be handled
by batch processing.
Streaming media are a major component of the use of the Internet for leisure activities like
listening to music or watching a video. Before discussing such applications the use of the
term bit stream needs an explanation. In general, data prior to transmission is stored in
bytes and it is possible to transmit this as a ‘byte stream’. However, streamed media is
always
compressed using techniques discussed in Chapter 1 (Section 1.07). Some compression
techniques involve converting each byte into a representation with fewer bits. Thus, to allow
the decoding process at the receiver end to work properly, the data must be transferred as a
bit stream. So, to summarise, any reference to streaming media would normally imply that
bit
streaming is used.
For one category of streaming media, the source is a website that has the media already
stored. One option in this case is for the user to download a file then listen to it or watch it at
some future convenient time. However, when the user does not wish to wait that long there
is the streaming option. This option is described as viewing or listening on demand. In this
case the delivery of the media and the playing of the media are two separate processes. The
incoming media data are received into a buffer created on the user’s computer. The user’s
machine has media player software that takes the media data from the buffer and plays it.
The other category of streaming media is real-time or live transmission. In this case the
content is being generated as it is being delivered such as when viewing a sporting event. At
the receiver end the technology is the same as before. The major problem is at the delivery
end because a very large number of users may be watching simultaneously. The way
forward
now is to transmit the media initially to a large number of content provider servers which
then transmit onwards to individual users.
A crucial point with media streaming is whether the technology has sufficient power to
provide a satisfactory user experience. When the media is created it is the intention that the
media is to be delivered to the user at precisely the same speed as used for the creation; a
song that lasted four minutes when sung for the recording will sound very peculiar if, when
it is received by a user, it lasts six minutes. More specifically, the process of delivering the
content will be quantified by the bit rate. For example, a relatively poor-quality video can
be delivered at a bit rate of 300 kbps but a reasonably good-quality audio file only requires
delivery at 128 kbps. Figure 2.05 shows a simple schematic diagram of the components
involved in the streaming.
[Figure 2.05: schematic diagram of the components involved in streaming to the user's computer]
The bit rate for delivery to the user from the buffer must match the defined rate for the
specific media in use but the planned transmission rate to the buffer should be higher to
allow for unexpected delays. These rates are controlled by the media player by continuous
monitoring of the extent of filling of the buffer in relation to the defined high- and low-water
marks. It is essential to have a buffer size that is sufficiently large for it never to get filled.
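The interplay between the two rates and the water marks can be explored with a simple simulation such as the sketch below. All of the numbers used (rates in bits per second, buffer levels in bits) are hypothetical values chosen for the example, and the one-second time step is a simplification.

```python
def simulate_buffer(seconds, in_rate, play_rate, low_mark, high_mark, start_level):
    """Return the buffer level (in bits) at the end of each second.

    Transmission into the buffer is switched on when the level falls to the
    low-water mark and switched off when it reaches the high-water mark.
    """
    level = start_level
    receiving = level <= low_mark
    history = []
    for _ in range(seconds):
        if receiving:
            level += in_rate          # data arriving from the network
        level -= play_rate            # data taken by the media player
        if level >= high_mark:
            receiving = False
        elif level <= low_mark:
            receiving = True
        history.append(level)
    return history


# Hypothetical values: 1 Mbps transmission, 300 kbps playback.
levels = simulate_buffer(seconds=11, in_rate=1_000_000, play_rate=300_000,
                         low_mark=1_000_000, high_mark=3_000_000,
                         start_level=1_000_000)
print(levels)
```

With these values the buffer fills for three seconds, then drains for seven seconds before transmission has to resume.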
The rate of transmission to the buffer is limited by the bandwidth of the network
connection. For a connection via a PSTN, a broadband link is essential. For good-quality
movie presentation the broadband requirement is about 2.5 Mbps. Because this will not
be available for all users it is often the practice that an individual video is made available
at different levels of compression. The most highly compressed version will be the poorest
quality but the bit rate may be sufficiently low for a reasonable presentation with a relatively
low bandwidth Internet connection.
TASK 2.01
Consider a bit-streaming scenario for a video where the following values apply:
Assume that the video is playing and that the buffer content has dropped to the low-water
mark. The media player sets the controls for data input to begin again.
Calculate the amount of data that will be input in two seconds to the buffer and the amount
of
data that will be removed from the buffer in the same time period.
From this data, estimate when the buffer will have filled up to the high-water mark.
Assuming that the incoming transmission is halted at this time, calculate how long it will be
before the buffer content has again fallen to the low-water mark level.
2.07 IP addressing
The functioning of the Internet is based on the implementation of the TCP/IP protocol suite
as will be explained in Chapter 17 (Section 17.04). One aspect of this is IP addressing which
is
used to define from where and to where data is being transmitted.
IPv4 addressing
Currently the Internet functions with IP version 4 (IPv4) addressing. The reason for the
strange name is of no consequence but the fact that this was devised in the late 1970s
is of considerable consequence. Had the PC and the mobile phone not been invented,
the scheme would be still sufficient for needs. Unfortunately for this scheme, these
developments did take place and have come to dominate Internet usage.
The IPv4 addressing scheme is based on 32 bits (four bytes) being used to define an IPv4
address. It is worth putting this into context. The 32 bits allow 2^32 different addresses. For
big numbers like this it is worth remembering that 2^10 is approximately 1000 in denary so the
32 bits provide for approximately four billion addresses. The population of the world is about
seven billion and it is estimated that approaching half of the world’s population has Internet
access. From this we can see that if there was a need to supply one IP address per Internet
user the scheme would just about be adequate. However, things are not that simple.
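The estimate of four billion can be checked with a couple of lines of arithmetic. The variable names below are chosen only for this example.

```python
exact = 2 ** 32                    # number of distinct 32-bit addresses
# 2^32 = 2^2 x (2^10)^3, and 2^10 is taken as approximately 1000
estimate = 2 ** 2 * 1000 ** 3
print(exact, estimate)
```

The exact value is a little under 4.3 billion, so the approximation is within about 7% of it.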
KEY TERMS
The original addressing scheme was designed on the basis of a hierarchical address with
a group of bits defining a network (a netID) and another group of bits defining a host on
that network (a hostID). The aim was to assign a unique universally recognised address for
each device on the Internet. The separation into two parts allows the initial transmission
to be routed according to the netID. The hostID only needs to be examined on arrival at
the identified network. Before proceeding, it is important to note that the term ‘host’ is a
little misleading because some devices, particularly routers, have more than one network
interface and each interface requires a different IP address.
The other feature of the original scheme was that allocated addresses were based on the
concept of different classes of networks. There were five classes but only the first three need
concern us here. The structures used for the addresses are shown in Table 2.02.
Class      Class identifier    Number of bits for netID    Number of bits for hostID
Class A    0                   7                           24
Class B    10                  14                          16
Class C    110                 21                          8
It can be seen from Table 2.02 that the most significant bit or bits identify the class. A group
of the next most significant bits define the netID and the remaining, least significant, bits define
the hostID. The rationale was straightforward. The largest organisations would be allocated
to Class A. There could only be 2^7, i.e. 128, of these but there could be 2^24 distinct hosts for
each of them. This compared with 2^21, approximately two million, organisations that could
be allocated to Class C but each of these could only support 2^8, i.e. 256, hosts.
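The way the leading bits identify the class can be sketched in code. The function below is an illustration only: it looks at the first byte of an address, and its name and return format are assumptions made for this example.

```python
def classify(first_byte):
    """Return (class, netID bits, hostID bits) for the first byte of an address."""
    if first_byte >> 7 == 0b0:        # leading bit 0
        return ("A", 7, 24)
    if first_byte >> 6 == 0b10:       # leading bits 10
        return ("B", 14, 16)
    if first_byte >> 5 == 0b110:      # leading bits 110
        return ("C", 21, 8)
    return ("D or E", None, None)     # the two classes not covered here


for byte in (10, 128, 194):
    print(byte, classify(byte))
```

In every case the class identifier, netID and hostID bits add up to 32.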
The problems with this scheme arose once LANs supporting PCs became commonplace.
The number of Class B netIDs available was insufficient but if organisations were allocated
to
Class C the number of hostIDs available was too small. There have been a number of
different
modifications made available to solve this problem.
Before considering some of these, the representation used for an IP address needs to be
introduced. During transmission, the technology is based on the 32-bit binary code for
the address; for documentation purposes, a dotted decimal notation is used. Each byte is
written as the denary equivalent of the binary number represented by the binary code. For
example, the 32 bit code:
10000000 00001100 00000010 00011110
is written in dotted decimal notation as:
128.12.2.30
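The conversion between the two representations is mechanical, as the following sketch shows. The function names are chosen for this example.

```python
def to_dotted_decimal(bits):
    """Convert a 32-bit binary code (spaces allowed) to dotted decimal."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("an IPv4 address needs exactly 32 bits")
    # Each group of eight bits is written as its denary equivalent.
    return ".".join(str(int(bits[i:i + 8], 2)) for i in range(0, 32, 8))


def to_binary(dotted):
    """Convert dotted decimal back to four 8-bit groups."""
    return " ".join(format(int(part), "08b") for part in dotted.split("."))


print(to_dotted_decimal("10000000 00001100 00000010 00011110"))
print(to_binary("128.12.2.30"))
```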
The first approach developed for improving the addressing scheme is called ‘classless
inter-domain routing’ (CIDR). This retains the concept of a netID and a hostID but removes the
rigid structure and allows the split between the netID and the hostID to be varied to suit
individual need. The simple method used to achieve this is to add an 8-bit suffix to the address
that specifies the number of bits for the netID. If, for instance, we define the suffix as 21, that
means that 21 bits are used for the netID and there are 11 bits remaining (of a 32-bit address)
to specify hostIDs allowing 2^11, i.e. 2048, hosts. One example of an IP address using this
scheme is shown in Figure 2.06. The 21 bits representing the netID have been highlighted.
The remaining 11 bits represent the hostID which would therefore have the binary value
1.
[Figure 2.06: binary code of an IP address in CIDR format, with the netID and suffix labelled]
It should be noted that with this scheme there is no longer any need to use the most
significant bit or bits to define the class. However, it does allow already existing Class A, B or
C
addresses to be used with suffixes 8, 16 or 24, respectively.
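The effect of the suffix can be sketched in code. The function below splits a 32-bit address into netID and hostID for a given suffix; the address 128.12.2.30 with a suffix of 21 is an assumed pairing used only for illustration.

```python
def split_cidr(dotted, suffix):
    """Split an IPv4 address into (netID, hostID, number of hostIDs)."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)        # build the 32-bit number
    host_bits = 32 - suffix
    net_id = value >> host_bits                 # the top 'suffix' bits
    host_id = value & ((1 << host_bits) - 1)    # the remaining low bits
    return net_id, host_id, 2 ** host_bits


net_id, host_id, hosts = split_cidr("128.12.2.30", 21)
print(format(net_id, "021b"), format(host_id, "011b"), hosts)
```

Using suffixes of 8, 16 or 24 with the same function reproduces the splits of the original Class A, B and C addresses.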
TASK 2.02
Create an example of the binary code for a Class C address expressed in CIDR format. Give
the
corresponding dotted decimal representation.
Sub-netting
used. The organisation would need seven individual Class C netIDs. Each of these would
point to one of the LAN gateways (which have to function as routers). Each netID would be
associated with 256 hosts so an organisation with just 150 computer workstations would
leave 1642 IP addresses unused and unavailable for use by any other organisation.
Figure 2.07 Connecting LANs using the original classful IPv4 scheme
The sub-netting solution for this organisation would require allocating just one Class C netID.
For example, the IP addresses allocated might be 194.10.9.0 to 194.10.9.255 where the
netID
comprises the first three bytes, represented by the decimal values 194, 10 and 9.
The sub-netting now works by having a defined structure for the 256 codes constituting the
hostID. A sensible solution for this organisation is to use the top three bits as a code for the
individual LANs and the remaining five bits as codes for the individual workstations. Figure
2.08 shows a schematic diagram of this arrangement.
On the Internet, all of the allocated IP addresses have a netID pointing to the router. The
router then has to interpret the hostID to direct the transmission to the appropriate host on
one of the LANs. For example:
• hostID code 00001110 could be the address for workstation 14 on the head office LAN
(LAN 000).
• hostID code 01110000 would be the address for workstation 16 on LAN 3 (LAN 011).
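The interpretation of the hostID carried out by the router can be sketched as below, assuming the split described above of three bits for the LAN and five bits for the workstation. The function name is chosen for this example.

```python
def split_host_id(host_id):
    """Split an 8-bit hostID into (LAN number, workstation number)."""
    lan = host_id >> 5                # top three bits
    workstation = host_id & 0b11111   # remaining five bits
    return lan, workstation


print(split_host_id(0b00001110))
print(split_host_id(0b01110000))
```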
With 150 workstations the organisation hasn’t used all of the 256 allocated IP addresses.
However, there are only 106 unused which is a reasonable number to have available in case
of future expansion.
The solution for dealing with the addressing is to use network address translation (NAT).
Figure 2.09 shows a schematic diagram of how this can be used. The NAT box has one
IP address which is visible over the Internet so can be used as a sending address or as a
receiving address. Internally the IP addresses have to be chosen from one of the three
ranges
of IP addresses shown in Table 2.03 that have been allocated for such networks. (You do not
need to remember these numbers!)
Lower bound     Upper bound
10.0.0.0        10.255.255.255
172.16.0.0      172.31.255.255
192.168.0.0     192.168.255.255
The important point is that each address can be simultaneously used by any number of
different private networks. There is no knowledge of such use on the Internet itself or in
any other private network. The interface in the NAT box has software installed to examine
each incoming or outgoing transmission. There can be a security check before an incoming
transmission is directed to the correct internal address. The diagram shows undefined
arrows from the router connected to the NAT box. These indicate that the network structure
within the organisation could take many different forms.
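Whether an address falls in one of the ranges of Table 2.03 can be tested with Python's standard `ipaddress` module, as in the sketch below. The function name `is_private` and the sample addresses are assumptions for this example; the bounds are those given in the table.

```python
from ipaddress import IPv4Address

# Lower and upper bounds taken from Table 2.03.
PRIVATE_RANGES = [
    (IPv4Address("10.0.0.0"), IPv4Address("10.255.255.255")),
    (IPv4Address("172.16.0.0"), IPv4Address("172.31.255.255")),
    (IPv4Address("192.168.0.0"), IPv4Address("192.168.255.255")),
]


def is_private(dotted):
    """True if the address lies within one of the private ranges."""
    address = IPv4Address(dotted)
    return any(lower <= address <= upper for lower, upper in PRIVATE_RANGES)


for sample in ("192.168.1.20", "172.32.0.0", "194.10.9.14"):
    print(sample, is_private(sample))
```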
Discussion Point:
Can you find out which IP addressing scheme is being used when you are connected to the
Internet?
Today there are combinations of IPv4 approaches in use and these allow the Internet
to continue to function. Respected sources argue that this cannot continue beyond the
current decade. There must soon be a migration to IP version 6 (IPv6), which uses a 128-bit
addressing scheme allowing 2^128 different addresses, a huge number! In practice, this will
allow more complex structuring of addresses. Documenting these addresses is not going to
be fun. The addresses are written in a colon hexadecimal notation. The code is broken into
16-bit parts with each of these represented by four hexadecimal characters. Fortunately,
some abbreviations are allowed. A few examples are given in Table 2.04.
IPv6 address
Comment
68E6:7C48:FFFE:FFFF:3D20:1180:695A:FF01
A full address
72E6::CFFE:3D20:1180:295A:FF01
6C48:23:FFFE:FFFF:3D20:1180:95A:FF01
::192.31.20.46
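The abbreviations can be undone with Python's standard `ipaddress` module, which gives one way of seeing what each short form stands for. The helper name `expand` is an assumption for this example; the addresses are those of Table 2.04.

```python
from ipaddress import IPv6Address


def expand(address):
    """Write an IPv6 address out in full as eight groups of four hex digits."""
    return IPv6Address(address).exploded.upper()


for short in ("72E6::CFFE:3D20:1180:295A:FF01",
              "6C48:23:FFFE:FFFF:3D20:1180:95A:FF01",
              "::192.31.20.46"):
    print(short, "->", expand(short))
```

In the last case the trailing dotted decimal part supplies the final 32 bits of the address.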
If IPv6 addressing is used, how many addresses would be available per square metre of the
Earth’s surface? Do you think there will be enough to go round?
In everyday use of the Internet, a user needs to identify a particular web page or email box.
The user will not wish to have to identify an IP address using its dotted decimal value. To get
round this problem the domain name system (DNS) was invented in 1983. The DNS system
allocates readable domain names for Internet hosts and provides a system for finding the IP
address for an individual domain name.
Domain name system (DNS): a hierarchical distributed database installed on domain name
servers
that is responsible for mapping a domain name to an IP address
The naming system is hierarchical. There are more than 250 top-level domains
which are either generic (e.g. .com, .edu, and .gov) or represent countries (e.g. .uk and .nl).
The domain name is included in a uniform resource locator (URL), which identifies a web
page, or an email address. A domain is named by the path upward from it. For example,
eng.cisco.com refers to the eng subdomain in the cisco domain of the com top-level domain
(which is the reverse of the order used for a pathname of a file).
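The reading order of a domain name can be illustrated with a short sketch. The function name domain_path is hypothetical; it simply lists the labels from the top-level domain downwards, which is the order a file pathname would use.

```python
def domain_path(name):
    """List the labels of a domain name from the top-level domain downwards."""
    labels = name.strip(".").split(".")   # ignore any leading or trailing dot
    return list(reversed(labels))


print(domain_path("eng.cisco.com"))
```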
Looking up a domain name to find an IP address is called ‘name resolution’. For such a
query there are three possible outcomes:
• If the domain is under the jurisdiction of the server to which the query is sent then an
authoritative and correct IP address is returned.
• If the domain is not under the jurisdiction of the server, an IP address can still be returned
if it is stored in a cache of recently requested addresses but it might be out of date.
• If the domain in the query is remote then the query is sent to a root server which can
provide an address for the name server of the appropriate top-level domain which in turn
can provide the address for the name server in the next lower domain. This continues
until the query reaches a name server that can provide an authoritative IP address.
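The three outcomes can be sketched as a small simulation. This is only an illustration under stated assumptions: the NameServer class, the resolve function, the domain names and the IP addresses are all invented for the example, and a real DNS server is considerably more involved.

```python
class NameServer:
    """A toy name server (hypothetical, for illustration only)."""

    def __init__(self, zone=None, referrals=None):
        self.zone = zone or {}            # names this server is authoritative for
        self.referrals = referrals or {}  # domain suffix -> name server lower down
        self.cache = {}                   # recently requested names

    def refer(self, name):
        """Return the server for the longest suffix of name that is known here."""
        labels = name.split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in self.referrals:
                return self.referrals[suffix]
        return None


def resolve(local, root, name):
    if name in local.zone:                 # outcome 1: authoritative answer
        return local.zone[name], "authoritative"
    if name in local.cache:                # outcome 2: cached, possibly out of date
        return local.cache[name], "cached"
    server = root                          # outcome 3: work down from a root server
    while server is not None:
        if name in server.zone:
            local.cache[name] = server.zone[name]
            return server.zone[name], "authoritative via root"
        server = server.refer(name)
    return None, "not found"


# Invented hierarchy: root -> com -> example.com
example_ns = NameServer(zone={"eng.example.com": "192.0.2.10"})
com_ns = NameServer(referrals={"example.com": example_ns})
root_ns = NameServer(referrals={"com": com_ns})
local_ns = NameServer(zone={"www.school.test": "192.0.2.1"})

print(resolve(local_ns, root_ns, "www.school.test"))
print(resolve(local_ns, root_ns, "eng.example.com"))
print(resolve(local_ns, root_ns, "eng.example.com"))
```

The second and third queries are identical, but the third is answered from the cache of the local server.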
web-hosting organisation. Finally, the HTML files have to be uploaded to the server
provided
The following is the simplest sequence of events associated with a user accessing the
application:
2 The user types in the URL of the web application or selects it from the bookmark list.
4 The browser connects to the IP address and sends a request for the web page.
Once the page is displayed the user can activate the application by clicking on a suitable
feature or by entering data as appropriate.
HTML
We now need to consider the framework for creating a file using HTML. This is a text file
constructed using pairs of what are referred to as ‘tags’. The basic overall structure can be
represented as:
In between each pair of opening and closing tags there can be any number of lines of text.
These can be used to display on the browser screen any or all of the following: text, images,
videos, forms, hyperlinks, icons and so on.
The facilities offered by HTML can be supplemented by the inclusion of scripted code,
written in JavaScript or PHP.
JavaScript
JavaScript is written by the application developer into the HTML text but its effect is to allow
the user at the client end to interact with the application and to cause processing to take
place on the client computer. For this to work the browser must have JavaScript enabled. In
the early days of the use of JavaScript it was necessary to ensure this and to include explicit
reference to the use of JavaScript in the HTML file. However, JavaScript is now the default
scripting language so a script runs automatically. The important point is that this has nothing
to do with what is installed on the server.
One way to incorporate JavaScript is to write the code in a separate file which is then called
from within the HTML. Here we only consider the case when JavaScript code is contained
within the HTML itself. This is easily done (and easily recognised in an example HTML file)
by containing the script in script tags:
If the developer wants the script to be accessed immediately when the web page is displayed
the script tags are included in the HTML header section.
<!DOCTYPE html>
You can input a value in Celsius and this will be converted to Fahrenheit.
The question now is when would a developer want to use JavaScript? The answer to this is
‘whenever the developer wants the user to have processing carried out on the client computer
which does not involve the software running on the server’. This might involve running a
program as illustrated by the above simple example. More often the JavaScript is used for
collecting data which is to be used by a program running on the server. In particular, data
validation and verification can be handled using JavaScript (see Chapter 8, Section 8.04).
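The kind of check that might be made before a Celsius value is accepted can be sketched as follows. In a web page this logic would be written in JavaScript and run in the browser; Python is used here only to show the checks themselves. The function name validate_celsius and the messages are assumptions made for this example.

```python
import math


def validate_celsius(text):
    """Return (value, message); value is None when the input is rejected."""
    try:
        value = float(text)                 # type check: is it a number at all?
    except ValueError:
        return None, "not a number"
    if not math.isfinite(value):            # reject 'inf' and 'nan'
        return None, "not a finite number"
    if value < -273.15:                     # range check: absolute zero
        return None, "below absolute zero"
    return value, "accepted"


print(validate_celsius("25"))
print(validate_celsius("abc"))
print(validate_celsius("-300"))
```

Only a value that passes every check would be passed on for conversion or sent to a program running on the server.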
PHP
PHP is also a full-blown computer programming language. The difference is that any PHP
script is processed on the server. As for JavaScript, the PHP can be contained in a separate
file accessed by the HTML. The example considered here will have the script written inside
the file containing the HTML. In this case the HTML file must be named with a .php extension
rather than the usual .html extension. The PHP code is included within special tags:
The JavaScript program shown in the previous section could be converted to PHP to run on
the server in the following way:
<!DOCTYPE html>
This particular example has to be run by supplying the value for $tempc as a parameter to
the URL for the file. This is done when the URL is entered into the address bar of the browser.
To provide the value 25 the format is to append ?value=25 to the URL following the .php file
extension (e.g. index.php?value=25).
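What the server has to do with such a URL can be sketched in Python (not PHP): pick the parameter out of the query string and then carry out the conversion. The host name www.example.com and the function names are assumptions made for this illustration; only the index.php?value=25 part comes from the text.

```python
from urllib.parse import urlparse, parse_qs


def celsius_from_url(url, default=0.0):
    """Extract the 'value' parameter from the query string of a URL."""
    query = parse_qs(urlparse(url).query)   # e.g. {'value': ['25']}
    values = query.get("value")
    return float(values[0]) if values else default


def to_fahrenheit(tempc):
    """Convert a temperature in Celsius to Fahrenheit."""
    return tempc * 9 / 5 + 32


tempc = celsius_from_url("http://www.example.com/index.php?value=25")
print(tempc, "Celsius is", to_fahrenheit(tempc), "Fahrenheit")
```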
As before this simple example shows how to identify some PHP code within HTML and see
what it is doing. It is worth noting that variable names start with $ and are case sensitive.
$_GET, which is used to get the parameter value, is a predefined array provided by PHP
rather than a variable created by the script.
The main question is, again, why would a developer choose to include PHP script in some
HTML? The answer is that an application will not run quickly if it is constantly transmitting
data back and forward between the client computer and the server. For the particular case
of a database application it is imperative that the database remains on the server (or within
the system of which the server is part) and that only the results of queries are displayed on
a browser screen. Also any SQL associated with the use of the database needs to be running
on the server not on the client. An example of this will be considered after SQL has been
introduced in Chapter 10 (Section 10.6).
• The main transmission media are copper (twisted pair, coaxial) cables, fibre-optic cables
and wireless (radio, microwave, infrared).
• Factors to consider are bandwidth, attenuation, interference and the need for repeaters.
Exam-style Questions
1 A new company has been established. It has bought some new premises which consist of a
number of buildings on a single site. It has decided that all of the computer workstations in
the different buildings need to be networked. They are considering ways in which the network
might be set up.
a One option they are considering is to use cabling for the network and to install it themselves.
i Name the three types of cabling that they might consider. [2]
ii Explain two factors, other than cost, that they need to consider when choosing suitable
cabling. [4]
b Another option they are considering is to use wireless technology for at least part of the
network.
i Identify one advantage, other than cost, of using wireless rather than cable networking.
ii Identify one disadvantage (other than cost) of using wireless rather than cable networking.
[2]
[1]
[1]
c The final option they are considering is to use the services of a PSTN.
[1]
[3]
i Name the type of software used by the system and the type of hardware on which the
software is installed. [2]
ii Name two types of application that use the Domain Name System and for each give a brief
description of how it is used. [4]
In the classful IPv4 addressing scheme, the 32-bit binary code for the address has the top
(most significant) bit set to 0 if it is of class A, the top two bits set to 10 if class B or the
top three bits set to 110 if class C. In a document an IPv4 address has been written as
205.124.16.152.
Give the name for this notation for an IP address and explain how it relates to the 32-bit
binary code.
[2]
[2]
[3]
If the CIDR scheme for an IPv4 address is used the IP address 205.124.16.152 would be
written as:
205.124.16.152/24
State the binary code for the hostID in this address with a reason.
[2]
A client-server web application has been developed which uses a file containing the
following code:
<!DOCTYPE html>
We can give you an estimate of how many you will need if you are tiling a floor
with our tiles.
You need to tell us the length and the width of the room
(in metres) .
i Name the role of the person who would create this file. [1]
iii A browser is needed to run the application. State where the browser software is installed.
[1]
i Identify two component parts of the file which involve JavaScript and explain their purpose.
[4]
ii Explain the sequence of events executed by the client computer and the web server when
this
Chapter 3
Hardware
Learning objectives
As a broad generalisation it can be said that there are two main uses of a computer system.
The first is to run programs.
In the discussion of computer system architecture in Chapter 5 (Section 5.01) you will see
that the simplest model consists of a processor with access to a stored program. The history
of computing is one of increasing performance. In the context of increasing performance of
the system in running programs, the first requirement is for the speed of the processor to
increase. However, this potential for improvement can only be realised if the time taken for
the processor to access the stored program decreases to match the increased processor
speed. The reality so far has been that access speeds have improved but they haven’t kept
pace fully with the improvement in processor speeds.
The second main use of a computer system is to store data. Here the major issues with
regard to increasing performance are capacity and cost; access speeds are not so important.
The terminology used to describe components for storing programs and data is not always
consistent. One variation is to distinguish between memory as the component which the
processor can access directly and the (file-) store used for long-term storage. An alternative
is to distinguish between the primary and the secondary storage.
The memory system hierarchy is a useful concept for considering the choice of components
in a memory system. Figure 3.01 uses a simplified version of a memory system hierarchy to
show the trends in the important factors affecting this choice. The factors increase in the
direction of the arrow.
Component           Category
Register            Processor component
Cache memory        Primary storage
Main memory         Primary storage
Hard disk           Secondary storage
Auxiliary storage   Secondary storage
Access time, capacity and size increase down the table; cost increases up the table.
Figure 3.01 Trends in the factors affecting the choice of memory components
The individual entries in the Component column are discussed in Sections 3.02 and 3.03.
Computer users would really like to have a large amount of primary storage that costs little
and allows quick access. This is not possible; the fastest components cost more and have
limited capacity. In practice, the choice made is a compromise.
It could be argued that there is a need for secondary storage because the use of only primary
storage would be far too expensive. However, it is more sensible simply to recognise that
long-term storage of data requires separate dedicated components.
The processor has direct access to three types of storage component. The registers, as
discussed in Chapter 5 (Section 5.02), are contained within the processor. External to the
processor there are cache memory and main memory, which together constitute the primary
storage. Cache memory is used to store data that at any time is the most likely to be needed
again by the processor.
There is another way of categorising memory components. The first category is called
random-access memory (RAM). This is a potentially misleading term because a
programmer does not expect a program to make random decisions about which memory
location should be accessed.
KEY TERMS
Random-access memory (RAM): volatile memory that can be read from or written to any
number of times
Read-only memory (ROM): non-volatile memory that cannot be written to but can be read
from any number of times
The name has been chosen because such memory can be accessed at any location
independently of which previous location was used (it might have been better called
‘direct-access memory’). A better description is read-write memory because RAM can be
repeatedly read from or written to. Another distinguishing characteristic of RAM is that it is
volatile which means that when the computer system is switched off the contents of the
memory are lost.
There are two general types of RAM technology. Dynamic RAM (DRAM) is constructed
from capacitors which leak electricity and therefore need regularly recharging (every few
milliseconds) to maintain the identity of the data stored. Static RAM (SRAM) is constructed
from flip-flops (discussed in Chapter 18 (Section 18.02)) which continue to store data
indefinitely while the computer system is switched on.
SRAM provides shorter access time but unfortunately it compares unfavourably with DRAM
in all other aspects. DRAM is less expensive to make, it can store more bits per chip and
despite the need for recharging it requires less power to operate. So, once more, a
compromise is needed. The norm is for cache memory to be provided by SRAM with the main
memory being constructed from DRAM technology.
The second category of memory component is called read-only memory (ROM). Again
this name does not give a full picture of the characteristics of this type of component. ROM
shares the random-access or direct-access properties of RAM except that it cannot be
written to. The other important characteristic is that the data in ROM is not lost when the
computer system is switched off; the memory is non-volatile.
ROM has specialised uses involving the storage of data or programs that are going to be
used unchanged over and over again. ROM may be programmable (PROM) or erasable
PROM (EPROM) or even electrically erasable PROM (EEPROM). These terms relate to the
manufacture and installation of the ROM and do not impact on its basic use in a computer
system.
Discussion Point:
Can you find out what memory components are in the computer system you are using and
any details about them such as the type and storage capacity?
Before discussing storage devices it is appropriate to discuss some terminology that can
confuse. For any hardware device, whether an integral part of the computer system or a
connected peripheral, its operation requires appropriate software to be installed. This
software is referred to as the ‘device driver’. This should not be confused with the term
‘drive’ associated specifically with a storage device. Furthermore, the term ‘drive’ was
initially introduced to refer to the hardware that housed a storage medium item and provided
the physical mechanism for transferring data to it or reading data from it. However, as so
often happens, such distinctions are often ignored. As a result, for example, references to a
‘hard disk’, a ‘hard disk drive’ and to a ‘hard drive’ have the same meaning.
Magnetic media
Magnetic media have been the mainstay of filestore technology for a very long time. The
invention of magnetic tape for sound recording pre-dates the invention of the computer by
many years so, not unexpectedly, this technology was the first to be utilised as a storage
device. In contrast the hard disk was invented as a technology specifically for computer
storage, arriving a few years later than the first use of magnetic tape.
For either type of magnetic media the interaction with it is controlled by a read head and
a write head. A read head uses the basic law of physics that a state of magnetisation will
affect an electrical property; a write head uses the reverse law. Although they are separate
devices the two heads are combined in a read-write head. The two alternative states of
magnetisation are interpreted as a 1 or 0.
A schematic diagram of a hard disk is shown in Figure 3.02. Points to note about the physical
construction are that there is more than one platter (disk) and that each platter has a
read-write head for each side. The platters spin in unison. The read-write heads are attached
to actuator arms which allow the heads to move over the surfaces of the platters. The motion
of each actuator head is synchronised with the motion of the other heads. A cushion of air
ensures that a head does not touch a platter surface.
The logical construction is that data is stored in concentric tracks. Each track consists of a
sequence of bits but these are formatted into sectors where each sector contains a defined
number of bytes. The sector becomes the smallest unit of storage. To store a file, a sufficient
number of sectors have to be allocated but these may or may not be adjacent to each other.
As files are created and subsequently deleted or edited the use of the sectors becomes
increasingly fragmented which degrades the performance of the disk. A defragmentation
program can reorganise the allocation of sectors to files to restore performance. This is
discussed in Chapter 7 (Section 7.03).
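The way fragmentation builds up can be sketched with a toy model in which the disk is a list of sectors and each file is given the first free sectors available. This is a simplified illustration, not how any particular file system works; the function names and the eight-sector disk are assumptions made for the example.

```python
def allocate(disk, file_id, count):
    """Give the file the first `count` free sectors, adjacent or not."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < count:
        raise ValueError("not enough free sectors")
    for i in free[:count]:
        disk[i] = file_id


def delete(disk, file_id):
    """Free every sector that belongs to the file."""
    for i, owner in enumerate(disk):
        if owner == file_id:
            disk[i] = None


def fragments(disk, file_id):
    """Count the separate runs of adjacent sectors that hold the file."""
    runs, previous = 0, None
    for owner in disk:
        if owner == file_id and previous != file_id:
            runs += 1
        previous = owner
    return runs


def defragment(disk):
    """Return a new layout in which each file occupies one run of sectors."""
    order = []
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)
    packed = [f for f in order for _ in range(disk.count(f))]
    return packed + [None] * (len(disk) - len(packed))


disk = [None] * 8
allocate(disk, "A", 3)
allocate(disk, "B", 2)
delete(disk, "A")
allocate(disk, "C", 5)          # C has to use the gap left by A plus later sectors
print(disk, fragments(disk, "C"))
disk = defragment(disk)
print(disk, fragments(disk, "C"))
```

After the deletion, file C is split into two runs of sectors; the defragmented layout stores it in one.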
A hard drive is considered to be a direct-access read-write device because any sector can be
chosen for reading or writing. However, the data in a sector has to be read sequentially.
The above account only gives a simplified version of hard drive technology. One particular
omission is consideration of how manufacturers can effectively deal with the fact that the
physical length of a track increases from the innermost track to the outermost track. If this
fact is ignored the data storage capacity must be less than it potentially could be. The other
omission is the simple fact that the storage capacity of disk drives has continued to improve
and sizes have continued to shrink. There is every reason to believe that this performance
improvement is due to continue for some time.
There has always been a need for a storage device that can be removed from the computer
system. For large installations an organisation’s requirement is normally driven by security
concerns and the need for suitable back-up procedures. For individuals the need may be the
storage of personal data or personally owned programs or simple transfer of data between
computers or between a computer and, for example, a camera. The first technology to
dominate the use by individuals was the floppy disk but this was superseded by optical storage.
Optical media
As with the magnetic tape medium, optical storage was developed from existing technology
not associated with computing systems. The compact disc (CD) evolved into CD digital audio
(CD-DA) and this became the technology used in the CD-ROM. This was extensively used for
distributing software but was of no value as a replacement for the floppy disk. The read-write
version (CD-RW) which came later provided the needed write functionality. However, the
CD has now given way to the DVD (originally ‘digital video disc’ but later renamed as ‘digital
versatile disc’). The latest and most powerful technology is the Blu-ray disc (BD).
A schematic diagram of an optical disc drive is shown in Figure 3.03. The disc spins and the
laser beam is reflected from a surface which is sandwiched between a substrate and a
protective outer coating. For a CD-ROM, the reflective surface is manufactured with
indentations, called ‘pits’, separated by what are referred to as ‘lands’. When the disc is being
read, the travel of the laser beam to a pit causes a difference in phase compared to reflection
from a land. This phase difference is recognised by the photodiode detector and attached
circuitry and interpreted as a 1 or 0. For CD-RW and DVD-RW technologies, the reflective
surface is a special alloy material. When data is being written to the disc (the ‘burn’ process)
the heat generated by the absorption of the laser light changes the material to liquid form.
Depending on the intensity of the laser light the material reverts to either a crystalline or an
amorphous solid form when it cools. When the disc is read, the laser light is reflected from
the crystalline solid but not from the amorphous solid allowing the coding of a 1 or 0.
While the disc is spinning the optical head that directs the laser beam is made to move so
that the point of contact of the laser beam with the disc follows a single spiral path from the
centre of the disc to the periphery. Despite there only being this one path the formatting of
the data into sectors allows the disc to be used as a direct-access device just as is the case
for a magnetic hard disk.
Another similarity with magnetic disk technology is that the storage capacity is dependent on
how close together individual physical representations of a binary digit can get. There are two
aspects governing this for an optical disc. The first is that if the disc is spinning at constant
revolutions per second the outer part of the disc travels faster than the inner part. Early
technology counteracted this by spinning at a constantly changing speed keeping the bit
density constant along the spiral path. The second is that the wavelength of the light controls
how well the light can be focused; the shorter the wavelength the better the focus. The
original infrared diode laser used in a CD-ROM has a much longer wavelength than the red
laser light used in a DVD. The more recently used blue laser light has an even shorter
wavelength. This change in wavelength is one of the reasons for the improvements in the
storage capacity of the modern technology.
Solid-state media
memory cells connected in series. The special feature is that blocks of memory cells can
have their contents erased all at once ‘in a flash’. Furthermore, before data can be written to
a block of cells in the memory the data in the block first has to be erased. When data is read,
a whole block of data has to be read in one operation.
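The erase-before-write rule and the block-at-a-time read can be modelled in a few lines. This is a toy sketch only: the class FlashBlock, the four-cell block size and the use of None for an erased cell are assumptions made for the example, not a description of real flash hardware.

```python
class FlashBlock:
    """A toy model of one block of flash memory cells."""

    def __init__(self, size=4):
        self.cells = [None] * size      # None marks an erased cell

    def erase(self):
        """Erase every cell in the block in one operation."""
        self.cells = [None] * len(self.cells)

    def write(self, data):
        """Write a whole block; refuse unless the block has been erased."""
        if any(cell is not None for cell in self.cells):
            raise RuntimeError("block must be erased before writing")
        if len(data) != len(self.cells):
            raise ValueError("data must fill the whole block")
        self.cells = list(data)

    def read(self):
        """Read the whole block in one operation."""
        return list(self.cells)


block = FlashBlock()
block.write([1, 0, 1, 1])
print(block.read())
block.erase()                           # required before the next write
block.write([0, 0, 0, 1])
print(block.read())
```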
The technology can be used for ‘solid-state’ drives, which can replace hard disk drives. The
more frequent use is either in a memory card or in a USB flash drive. In the latter case the
flash memory is incorporated in a device with the memory chip connected to a standard USB
connector. This is currently the technology of choice for removable data storage but how
long this will remain so is very uncertain with alternative technologies such as phase-change
random access memory (PRAM) already under development.
Carry out some research into the technologies currently available for storage.
Consider first the options available for the storage device inside a laptop computer. Create a
table showing cost, storage capacity and access speed for typical examples. Then consider
the options available for peripheral storage devices. Create a similar table for these.
Can you identify which technologies remain viable and which ones are becoming
uncompetitive? Are there any new technologies likely to come into common use?
tones but a printer at any position on a page could only print black or nothing. The solution
to this was halftoning. This technique approximated a grey tone by printing an array of black
dots; varying the size of the dots changed the tone displayed. The technique, of course,
relies on the limitations of the human eye which does not register the individual dots if they
are sufficiently small.
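The idea of varying the dot size can be sketched by treating each dot as a cluster of black cells inside a small grid, with the cluster growing outwards from the centre as the tone gets darker. This is a simplified illustration; the function name halftone_cell and the 4 x 4 grid are assumptions, and real halftoning screens are more sophisticated.

```python
def halftone_cell(grey, size=4):
    """Return a size x size grid of 0s and 1s approximating a grey tone.

    grey runs from 0.0 (white) to 1.0 (black); 1 means a black cell.
    """
    centre = (size - 1) / 2
    # Cells nearest the centre are blackened first, so the dot grows outwards
    positions = sorted(
        ((r, c) for r in range(size) for c in range(size)),
        key=lambda p: (p[0] - centre) ** 2 + (p[1] - centre) ** 2,
    )
    black = round(grey * size * size)
    cell = [[0] * size for _ in range(size)]
    for r, c in positions[:black]:
        cell[r][c] = 1
    return cell


for tone in (0.25, 0.75):
    for row in halftone_cell(tone):
        print("".join("#" if bit else "." for bit in row))
    print()
```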
It is now standard practice for grey-scale images or colour images to be presented using a
halftoning technology. This requires a raster image processor, which can be a combination
of hardware and software, to control the conversion of data stored in a graphics file to the
physical screen display or printed page.
3.05 Screens and associated technologies
Screen technology associated with computer systems has a long evolutionary history. For
many years the only example was the visual display unit (VDU) which was used as a computer
monitor or terminal. The VDU employed the cathode ray tube (CRT) technology used in a
television set but the functionality offered by the device was limited to recording keyboard
input and displaying text output.
Computer mouse
A significant step forward came with the introduction of graphical user interfaces (GUIs) as
standard features for microcomputer systems in the 1980s. The screen technology remained
the same but the functionality was completely transformed by the arrival of screen windows
and icons. To use the GUI effectively, the user needed a pointing device. The computer
mouse was introduced for this purpose. The screen became not just an output device but
also an input device activated by a mouse click.
There are two aspects to computer mouse technology. The first is the behaviour instigated
by a button click which needs no further discussion; the second is the operation of the
mouse in controlling a screen cursor. The important point to emphasise here is that a mouse
has no knowledge of an absolute position; all it can do is allow a relative movement to be
recorded so that it can influence the screen cursor position.
More recently the tracker ball mouse was phased out and the optical mouse was introduced.
This technology dispenses with the mechanical aspects associated with the movement of a
rubber ball. The mouse shines a light beam from a light emitting diode down onto the surface
the mouse is resting on. This light is reflected back on to a sensor fitted to the underside of
the mouse. As the mouse is moved along the surface the sensor acts like a camera taking
successive images of the surface. Image processing software then interprets these images to
establish the movement that has taken place and this data is transmitted to the computer as
before.
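Both ideas can be sketched in a few lines: estimating a movement by comparing two successive images of the surface, and using a relative movement to update a cursor whose absolute position is known only to the computer. The example is a deliberate simplification under stated assumptions: the images are one-dimensional lists of brightness values, and the function names, the screen size and the sample data are invented for illustration.

```python
def estimate_shift(before, after, max_shift=3):
    """Find the shift that best lines up two successive 1-D surface images."""
    best, best_error = 0, None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (before[i], after[i + shift])
            for i in range(len(before))
            if 0 <= i + shift < len(after)
        ]
        error = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if best_error is None or error < best_error:
            best, best_error = shift, error
    return best


def move_cursor(position, delta, screen=(1920, 1080)):
    """Apply a relative movement to the cursor, keeping it on the screen."""
    x = min(max(position[0] + delta[0], 0), screen[0] - 1)
    y = min(max(position[1] + delta[1], 0), screen[1] - 1)
    return (x, y)


before = [0, 0, 9, 5, 0, 0, 0, 0]
after = [0, 0, 0, 0, 9, 5, 0, 0]        # the same surface feature, two cells along
print(estimate_shift(before, after))
print(move_cursor((100, 100), (5, -3)))
```

The mouse itself only ever supplies the relative movement; the cursor position is kept by the computer.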
Screen display
We can now consider the technology associated with the creation of a screen display.
Chapter 1 (Section 1.04) described how an image could be stored as a bitmap built up from
pixels. Screen displays are also based on the pixel concept but with one major difference: a
screen pixel consists of three sub-pixels typically one each for red, green and blue. Varying
the level of light emitted from the individual sub-pixels allows a full range of colours to be
displayed.
There have been a number of very different technologies used to create a pixel. In the
original cathode ray tube (CRT) technology, there is no individual component for a pixel. The
inner surface of the screen is covered with phosphor, which is a material that emits light when
irradiated. An individual pixel is created by controlling the direction of the electron beam
irradiating the phosphor. This is modified for colour displays where individual red, green and
blue phosphors are arranged so as to create an array of pixels.
Phosphors are also used in one of the major flat-screen technologies, the plasma screen.
There is now a construction based on individual cells constituting a matrix of pixels. Each
cell contains plasma and a phosphor. When an electrical charge is applied to the plasma
it releases radiation that hits the phosphor and causes light emission. Each pixel or, more
accurately, each sub-pixel is a light source. The sub-pixel emits one of red, green or blue light.
KEY TERMS
Liquid-crystal display (LCD): a screen back-lit by light-emitting diodes and with liquid crystal
cells sandwiched between polarisers
In the flat-screen technology that is most used at present, the pixel is not a light source.
The liquid-crystal display (LCD) screen has individual cells containing a liquid crystal to
create the pixel matrix but these do not emit light. The pixel matrix is illuminated by
back-lighting and each pixel can affect the transmission of this light to cause the on-screen
display. A typical arrangement is shown in Figure 3.05.
Figure 3.05 (labels: Polarizer, Colour Filter, Colour Filter glass, TFT Glass, Polarizer, Backlight)
technologies but the principle of their functioning is the same and colour displays use red,
green and blue combinations as before.
More recently, a different technology has been introduced. This is based on the use of an
organic light-emitting diode (OLED) to create the pixel. The OLED is used directly as a light
source so this technology requires no back-lighting.
Touch screens
As well as providing improved display capability, flat-screen technology has allowed a new
mechanism for interaction with the display. Touch-screen technology is now a major feature
of a whole range of computer-based products.
Consider the different possibilities for interacting with a screen display. Create a table
showing the advantages and disadvantages for each technique.
The modern version of a touch-sensitive screen has the layers of technology providing the
display with extra layers of technology added immediately beneath the surface of the screen.
There have been two approaches used. The first is the resistive touch screen. This type
has two layers separated by a thin space beneath the screen surface. The screen is not rigid
so when a finger presses on to the screen the pressure moves the topmost of these two
separated layers so that it makes contact with the lower layer. The point of contact creates a
voltage divider in the horizontal and vertical directions. These allow the position of the point
of contact to be transmitted to the processor.
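The voltage divider idea can be sketched as a simple proportion: the voltage measured in each direction, as a fraction of the reference voltage, gives the position of the touch as a fraction of the screen dimension. The function name, the reference voltage and the screen size are assumptions made for this illustration; a real controller would also need calibration.

```python
def touch_position(v_x, v_y, v_ref=3.3, width=800, height=480):
    """Convert the two measured divider voltages to pixel coordinates."""
    x = round(v_x / v_ref * (width - 1))    # fraction of the way across
    y = round(v_y / v_ref * (height - 1))   # fraction of the way down
    return x, y


print(touch_position(0.0, 0.0))             # one corner of the screen
print(touch_position(3.3, 3.3))             # the opposite corner
```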
The second technology is the capacitive touch screen. This does not require a soft
screen but instead makes use of the fact that a finger touching a glass screen can cause
a capacitance change in a circuit component immediately below the screen. The most
effective technology is projected capacitive touch (PCT) with mutual capacitance. This has
a circuit beneath the screen which contains an array of capacitors. This enables multi-touch
technology, which allows more functionality than just pointing at one location on a screen.
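Multi-touch detection with an array of capacitors can be sketched by comparing each reading with its untouched baseline value and reporting every position where the change is large enough. The function name, the threshold and the sample readings are invented for this illustration and carry no real units.

```python
def find_touches(baseline, reading, threshold=5):
    """Return the (row, column) of every capacitor whose value has changed."""
    return [
        (r, c)
        for r, row in enumerate(reading)
        for c, value in enumerate(row)
        if abs(value - baseline[r][c]) >= threshold
    ]


baseline = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
reading = [[50, 41, 50], [50, 50, 50], [42, 50, 49]]   # two fingers on the screen
print(find_touches(baseline, reading))
```

Because every capacitor is checked independently, more than one touch can be reported at the same time.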
KEY TERMS
Resistive touch screen: a flexible surface which causes contact between electrically resistive
layers beneath when touched
Capacitive touch screen: a rigid surface above a conductive layer which undergoes a
change in electrical state when a finger touches the screen
Discussion Point:
Investigate which flat-screen technologies are used in any computer, laptop, tablet or
mobile/cell phone that you use. Discuss the benefits and drawbacks associated with their use.
The standard method of inputting significant amounts of text data into a computer system
has always been to use a QWERTY keyboard (named after the top left row of alphabetic
characters). The central part of the keyboard layout matches that of a standard typewriter,
allowing skilled typists to continue to function effectively. When numbers only need to be
input a skilled operator will use a numeric keypad. What might be described as a traditional
mobile phone has a different type of keypad which can be used to input text data. The
technology underpinning all of these devices is the same assuming that there are actual
physical keys to be used.
When the keyboard is being used to input text it appears as though a key press immediately
transfers the appropriate character to the computer screen but this is an illusion. The key
press has to be converted to a character code which is transmitted to the processor. The
processor, under the control of the operating system, ensures that the text character is
displayed on the screen. The same process takes place if the keyboard is used to initiate
some action, perhaps by using a shortcut key combination, except that the processor has to
respond by taking the requested action.
To achieve this functionality the keyboard has electrical circuitry together with its own
microprocessor and a ROM chip. The keys are positioned above a key matrix which consists
of a set of rows of wires and another set of columns of wires. Pressing a key causes contact
at a specific intersection. The microprocessor continuously tests to see if any electrical
circuit involving a row wire and a column wire has become closed. When the microprocessor
recognises that a circuit has become closed, it can identify the particular intersection that
is causing this. It then uses data stored in the ROM to create the appropriate character code
relating to the key associated with that intersection and sends this code to the processor.
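The scanning process can be sketched as a loop over every row and column intersection. In this illustration a small dictionary stands in for the data held in the ROM and a set of (row, column) pairs stands in for the circuits that are currently closed; the names KEY_MAP and scan and the 2 x 2 layout are assumptions made for the example.

```python
# Stand-in for the ROM: which key sits at each row/column intersection
KEY_MAP = {(0, 0): "Q", (0, 1): "W", (1, 0): "A", (1, 1): "S"}


def scan(closed_circuits, rows=2, cols=2):
    """Test every intersection and return the character codes of pressed keys."""
    codes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in closed_circuits:          # this circuit has become closed
                codes.append(ord(KEY_MAP[(r, c)]))  # look up the character code
    return codes


print(scan({(0, 1)}))            # the key at row 0, column 1 is pressed
print(scan({(1, 1), (0, 0)}))    # two keys pressed at once
```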
Inkjet printer
Two technologies have come to dominate the printing of documents from data stored in a
computer system. The technologies can be used irrespective of whether text or an image is
being printed. The technology that is cheapest to buy is the inkjet printer but the purchase
price is soon dwarfed by the cost of replacement ink. A genuine advantage of an inkjet printer
is its relatively small size.
The working principle of an inkjet printer is very simply explained: a sheet of paper is fed in;
the printhead moves across the sheet depositing ink on to the paper; the paper is moved
forward a fraction and the printhead carries out another traversal and so on until the sheet
has been fully printed. The precision of the mechanical operations involved is one of the
factors governing the quality of the printing. The other factor is the accuracy of the process
of applying the ink to the paper. The printhead consists of nozzles that spray droplets on to
the paper. The number of nozzles in a printhead is truly amazing, running into the thousands.
This is only possible because the manufacturing process can produce an individual nozzle
with a diameter considerably less than that of a human hair. There are two alternative
technologies for causing the ejection of the ink droplet (thermal bubble or piezoelectric) but
neither has significant advantages or disadvantages.
Ink is supplied to the printhead from one or more ink cartridges. Often the printhead is part
of the cartridge. For black and white printing only one cartridge is required but for colour
printing more are needed. The simplest technology for colour printing uses three colour
cartridges (one for each of the subtractive primaries: cyan, magenta and yellow) in addition
to the black cartridge. Suitable positioning of combinations of overlapping droplets in
principle allows any colour to be created. Good quality printing requires a printing resolution
of several hundred dots per inch which is achievable because of the large number and small
size of the nozzles. The number of dots per inch is defined by the printhead geometry and
cannot be changed but the number of dots per pixel can be dictated by the controlling
software. Increasing the number gives better colour definition for the pixel but the pixel size
is increased, giving poorer resolution for the image. Better resolution can only be achieved with
poorer colour definition.
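The trade-off can be illustrated with some simple arithmetic. The sketch below assumes, purely for illustration, that each pixel is printed as a square block of n × n dots of a single ink; real printer software may group dots differently. The function names are hypothetical.

```python
def pixels_per_inch(dots_per_inch, dots_per_side):
    """Image resolution when each pixel is a square block of dots
    (assumption: dots_per_side dots along each edge of the pixel)."""
    return dots_per_inch / dots_per_side


def coverage_levels(dots_per_side):
    """Number of different amounts of one ink a pixel can show:
    anything from 0 dots printed up to all n * n dots printed."""
    return dots_per_side * dots_per_side + 1


# A hypothetical 600 dots-per-inch printhead.
for n in (1, 2, 4):
    print(n, pixels_per_inch(600, n), coverage_levels(n))
```

As n increases the number of levels available for each pixel rises but the number of pixels per inch falls.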
Laser printer
The alternative technology is the laser printer. Laser printers have always been more
expensive to buy and used to offer much higher-quality printing but the comparison is no
longer so clear cut.
A schematic diagram of the workings of a laser printer is shown in Figure 3.06. The operation
can be summarised as follows:
9 The drum is discharged before the process starts again for the next page.
The above sequence represents black and white printing. For colour printing, separate toners
are required for the colours and the process has to take place for each colour. Although the
technology is completely different the logical aspect of the printing is the same as that for
inkjet printing. Colours are created from cyan, magenta, yellow and black. The technology
produces dots; quality depends on the number of dots per inch and software can control the
number of dots per pixel.
It will have the capability to act as a flatbed scanner with the option for this also to provide
a photocopying facility. Effectively, a scanner reverses the printing process in that it takes
an image and creates from it a digital representation rather than the digital representation
being used to create an image on paper. The principles of the operation of a typical scanner
are straightforward. The sheet of paper is held in a fixed position and a light source covering
the width of the paper moves from one end of the sheet to the other. The reflected light is
directed by a system of mirrors and lenses on to a charge-coupled device (CCD). The finer
details of how a CCD works are not important but the three aspects to note are:
• It produces for each cell an electrical response proportional to the light intensity.
In Chapter 1 (Section 1.04) the difference between a bitmap and a vector graphic was
discussed. If a vector graphic file has been created the image can be displayed on a screen
or printed by first converting the file to a bitmap version. However, specialised technical
applications often require a more accurate representation to be created on paper. This
requires the use of a graphics plotter. A plotter uses pens to write, usually, on a large sheet
of paper constrained by sprockets along one pair of sides. The sprockets can move the
paper forwards or backwards and pens can either be parked or in use at any given time. The
controlling circuitry and software can create the drawing directly from the original vector
graphic file.
Engineers and designers working in manufacturing are potential users of graph plotters. They
are also potential users of the 3D printer. The name could be said to be a little misleading but
its meaning is generally understood. It is a device that offers an alternative technology for
computer-aided manufacture (CAM).
The original concept was that the starting point is a 3D design created in a suitable
computer-aided design (CAD) package. The design is split into layers. The data for the first layer is
transmitted to the 3D printer. Rather than using ink to draw the
layer, the 3D printer uses a nozzle to squirt material on to the
printer bed to create a physical layer to match the design. This
process is repeated for successive layers. When the whole object
has been formed it has to be cured in some way to ensure that
the layers are, in effect, welded together and the material has
been converted to the form required for the finished product.
IP telephony and video conferencing are the two obvious technologies requiring voice input
to a computer system and voice output from a computer system. Voice recognition is an
alternative technique for data input to a computer.
For input, a microphone is needed. This is a device that has a diaphragm, a flexible material
which is caused to vibrate by an incoming sound. If the diaphragm is connected to suitable
circuitry the vibration can cause a change in an electrical signal. A condenser microphone
uses capacitance change as the mechanism; an alternative is to use a piezoelectric crystal.
The electrical signal has to be converted to a digital signal by an analogue-to-digital
converter before it can be processed by a sound (audio) card inside the computer.
For output, a loudspeaker or speaker is needed. This is involved in what is effectively the
reverse process to that for input. The computer sound card produces a digital signal which
is converted to analogue by a digital-to-analogue converter. The analogue signal is fed to
the speaker. In the traditional technology the current flows through a coil suspended within
the magnetic field provided by a permanent magnet in the speaker. As the direction of the
current keeps reversing, the coil moves backwards and forwards. This movement controls
the movement of a diaphragm which causes sound to be created.
Summary
• Primary storage is main memory, consisting of RAM (DRAM or SRAM) and ROM.
• Output devices include screens (CRT, plasma, LCD, OLED), printers (inkjet, laser and 3D),
plotters and speakers.
• Touch screens (capacitive or resistive) are used for both input and output.
Exam-style Questions
iii RAM may be either DRAM or SRAM. Explain the difference between these. [2]
i For each type of storage identify one feature of the basic internal operation which
ii For two of the three types of storage identify two similarities in the basic internal operation.
[2]
i Identify four aspects of the basic internal operation of a keyboard that makes this happen.
[4]
ii Describe an alternative method for a user to enter some text into a computer system. [2]
i Describe two differences between how an inkjet printer works and how a laser printer
works. [4]
ii Identify two similarities in the logical approach used in these two types of printer. [2]
Learning objectives
• construct the truth table for each of the logic gates above
• a problem statement
• a logic expression
• a logic circuit
• a logic expression
In everyday language the answer will be either yes or no. (‘Yes’, in fact.) However, the question
could be rephrased to make use of the language of Boolean logic:
can be described as an example of a logic assertion or a logic proposition that can have
only one of the two alternative Boolean logic values TRUE or FALSE.
KEY TERMS
• You should take an umbrella if it is raining or if the weather forecast is for rain later
• The air-conditioning system is set to come on in an office only during working hours but
also only if the temperature rises to above 25°C.
Each of these statements contains two logic propositions which are highlighted. In each
statement these logic propositions are combined in some way. Finally, each statement has
the addition of an outcome which is dependent on the combination of the two propositions.
Each of these is, therefore, an individual example of a problem statement.
KEY TERMS
The problem statements identified above can be more formally expressed in a form that is
suitable for handling with Boolean logic. To do this it is necessary to use Boolean operators.
The three basic Boolean operators are AND, OR and NOT.
Here, both A and B represent any logic proposition or assertion that has a value TRUE or
FALSE.
Each original problem statement has now been rephrased as a form of logic expression
with a defined outcome. The format of each expression here does not follow any formally
defined convention but the structure does allow the underlying logic to be understood. In
general, a logic expression consists of logic propositions combined using Boolean operators
and the expression optionally may be stated with a defined output.
KEY TERMS
Logic expression: logic propositions combined using Boolean operators, which may be written with a
defined outcome
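The two problem statements given earlier (the umbrella and the air-conditioning system) can be written as logic expressions in a programming language. The Python sketch below is one possible way of doing this; the function and parameter names are chosen for the example and are not part of any formal convention.

```python
def take_umbrella(raining, rain_forecast):
    # Outcome is TRUE if either proposition is TRUE: an OR combination.
    return raining or rain_forecast


def air_conditioning_on(working_hours, above_25_degrees):
    # Outcome is TRUE only if both propositions are TRUE: an AND combination.
    return working_hours and above_25_degrees


print(take_umbrella(False, True))        # True
print(air_conditioning_on(True, False))  # False
```

Each function returns the defined outcome for one combination of the two propositions.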
TASK 4.01
A document can only be copied if it is not covered by copyright or if there is copyright and
permission has been obtained.
Any logic expression can be constructed using only the Boolean operators AND, OR and NOT
but it is often convenient to use other operators. Here are the definitions for the six operators
with which you need to be familiar:
• NOT A is TRUE if A is FALSE
• A OR B is TRUE if A is TRUE or B is TRUE
The truth table is a simple but powerful technique for representing any logic expression or for
describing the possible outputs from a logic circuit.
A truth table is presented by making use of the convention that TRUE can be
represented as 1 and FALSE can be represented as 0. The simplest use of a truth table is
to represent the logic associated with a Boolean operator.
As an example let us consider the AND operator. The labelling of the truth table follows
the convention that the initially defined values are represented by A and B and the value
obtained from the simple expression using the AND operator is represented by X. In other
words we write the truth table for X = A AND B. Remembering that AND only returns true if
both A and B are true we expect a truth table with only one instance of X having the value 1.
The truth table has four rows corresponding to the four combinations of the truth values
for A and B. Three of these lead to a 0 in the X column as expected.
A  B  X
0  0  0
0  1  0
1  0  0
1  1  1
Table 4.01 The truth table for the AND operator
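A truth table of this kind can also be generated by a short program. The following Python sketch lists the four input combinations in increasing binary order and evaluates the operator for each; the helper name `truth_table` is an assumption made for this example.

```python
from itertools import product


def truth_table(operator):
    """Return rows (A, B, X) for a two-input operator, with the inputs
    in increasing binary order: 00, 01, 10, 11."""
    return [(a, b, int(operator(a, b))) for a, b in product((0, 1), repeat=2)]


and_rows = truth_table(lambda a, b: a and b)
print("A B X")
for row in and_rows:
    print(*row)
```

Only the final row has X equal to 1, as expected for the AND operator.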
TASK 4.02
Without looking further on in the chapter, construct the truth table for the OR operator.
The digital circuits that constitute the inner workings of a computer system all operate on the
basis that at any one time an individual part of the circuit is either in an ‘on’ state, which can
be represented by a 1, or in an ‘off’ state, represented by a 0. The physical circuitry consists of
integrated circuits constructed from transistors. There can be billions of transistors in a single
integrated circuit.
We will view a logic circuit as comprising component parts called logic gates. Each different
logic gate has an operation that matches a Boolean operator.
KEY TERMS
Logic gate: a component of a logic circuit that has an operation matching that of a Boolean
operator
Discussion Point:
There will be no further discussion of integrated circuits in this book but you might wish
to do some research and have a look at the structure of a small-scale integration chip.
When drawing a circuit, standard symbols are used for the logic gates. As an example,
the symbol shown in Figure 4.01 represents an AND gate.
The first point to note here is that the shape defines the type of gate. The second point
is that the inputs are on the left-hand side and the output is on the right-hand side. In
general, the number of inputs is not limited to two but the discussion in this book will only
consider circuits where the number of inputs does not exceed two.
Figure 4.02 shows the logic gate symbols and the associated truth tables for each of the six
Boolean operators introduced in Section 4.02.
Figure 4.02 Logic gate symbols and their associated truth tables (NOT, AND, OR, NAND, NOR and XOR)
There are two other points to note here. The NOT gate is a special case having only one
input. The NAND and NOR gates are each a combination of a gate and the NOT gate so they
produce complementary output to that produced by the AND and OR gates.
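The behaviour of the six gates can be modelled with small functions operating on the values 0 and 1. The sketch below is one way of writing them in Python, using the standard definitions of the gates; NAND and NOR are deliberately built from NOT so that the complementary relationship is visible in the code.

```python
def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NAND(a, b):
    return NOT(AND(a, b))  # complement of AND


def NOR(a, b):
    return NOT(OR(a, b))   # complement of OR


def XOR(a, b):
    return a ^ b           # 1 only when the inputs differ


PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]
for gate in (AND, OR, NAND, NOR, XOR):
    print(gate.__name__, [gate(a, b) for a, b in PAIRS])
```

Comparing the printed lists for AND and NAND, or for OR and NOR, shows each output bit inverted.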
TASK 4.03
Draw a circuit where A and B are input to an AND gate from which the output is carried to a
NOT gate from which there is an output X. Show that this has the same outcome as having one
NAND gate.
Could the same outcome be produced by positioning a NOT gate before the AND gate?
You need to remember the symbol for each of these gates. A good start here is to remember
that AND has the proper D symbol and OR has the curvy one. You also need to remember the
definitions for the gates so that you can construct the corresponding truth table for each gate.
Question 4.01
Can you recall from memory the symbols and definitions of the six logic gates introduced in
this chapter?
You need to be able to construct a logic circuit from either a problem statement or from
a logic expression. If you are given a problem statement the best approach is to first
convert it to a logic expression and then to identify the individual Boolean operations in
the logic expression. This approach will be illustrated here.
Consider the following problem statement: A bank offers a special lending rate to
customers subject to certain conditions. To qualify, a customer must satisfy certain criteria:
• The customer has been with the bank for two years.
To convert this statement to a logic expression you need to represent each condition by
a symbol (in the same way that a problem might be tackled in normal algebra):
• Let A represent an account held for two years.
Note the use of brackets to ensure that the meaning is clear. You may think that not all of
the brackets are needed. In this example, an extra pair has been included to guide the
construction of the circuit where only two inputs are allowed for any of the gates.
It can be seen, therefore, that the logic circuit corresponding to this logic expression
derived from the original problem statement could be constructed using four AND gates
and two OR gates as shown in Figure 4.03.
You also need to be able to construct a truth table from either a logic expression or a
logic circuit. We might have continued with the problem in Worked Example 4.01 but
four inputs will lead to 16 rows in the truth table. Instead, we consider a slightly simpler
problem with only three inputs and therefore only eight rows in the truth table. We will
start with the circuit shown in Figure 4.04.
Figure 4.04 A circuit with three inputs for conversion to a truth table
Table 4.02 shows how the truth table needs to be set up initially. There are several points
to note here. The first is that you must take care to include all of the eight different possible
combinations of the input values. Therefore, you present the values in increasing binary
number value from 000 to 111. The second point is that for such a circuit it is not sensible to
try to work out the outputs directly from the input values. Instead a systematic approach
should be used. This involves identifying intermediate points in the circuit and recording
the values at each of them in the columns headed ‘Workspace’ in Table 4.02.
Table 4.02 (columns: Inputs, Workspace, Output)
Figure 4.05 shows the same circuit but with four intermediate points labelled M, N, P and
Q identified. Each one has been inserted on the output side of a logic gate.
Figure 4.05 The circuit in Figure 4.04 with intermediate points identified
Now you need to work systematically through the intermediate points. You start by
filling in the columns for M and N. Then you fill in the columns for P and Q which feed
into the final AND gate. The final truth table is shown as Table 4.03. The circuit has two
combinations of inputs that lead to a TRUE output from the circuit.
The columns containing the intermediate values (the workspace) could be deleted at this
stage.
Table 4.03 The truth table for the circuit shown in Figure 4.05 (columns: Inputs, Workspace, Output)
One final point to make here is that you may be able to check part of your final solution
by looking at just part of the circuit. For this example, if you look at the circuit you
will see that the path from input C to the output passes through two AND gates. It
follows, therefore, that for all combinations with C having value 0 the output must be 0.
Therefore, in order to check your final solution you only need to examine the other four
combinations of input values where C has value 1.
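The systematic approach can be mirrored in a program. The Python sketch below uses a hypothetical three-input circuit, which is not claimed to be the circuit in Figure 4.04: the gate arrangement chosen for the intermediate points M, N, P and Q is an assumption made only so that the method can be demonstrated.

```python
from itertools import product


def circuit_rows():
    """Build the full truth table, workspace columns included, for a
    hypothetical circuit with inputs A, B, C and output X."""
    rows = []
    for a, b, c in product((0, 1), repeat=3):  # 000 up to 111
        m = a & b      # M: output of an AND gate fed by A and B
        n = 1 - m      # N: output of a NOT gate fed by M
        p = n & c      # P: output of an AND gate fed by N and C
        q = a | b      # Q: output of an OR gate fed by A and B
        x = p & q      # X: output of the final AND gate fed by P and Q
        rows.append((a, b, c, m, n, p, q, x))
    return rows


rows = circuit_rows()
print("A B C | M N P Q | X")
for a, b, c, m, n, p, q, x in rows:
    print(a, b, c, "|", m, n, p, q, "|", x)

# Partial check: C only reaches the output through AND gates,
# so every row with C = 0 must give X = 0.
assert all(row[-1] == 0 for row in rows if row[2] == 0)
```

The workspace values are calculated in the same order as you would fill in the columns by hand, and the final check is the same shortcut described above for input C.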
TASK 4.04
An oven has a number of components which should all be working properly. For each
component there is a signalling mechanism that informs a management system if all is well
or if there is a problem when the oven is being used. Table 4.04 summarises the signal values
that record the status for each component.
Table 4.04 (columns: Signal, Value, Component condition; surviving entry: value 0, fan not working)
If the thermometer reading is in range but either or both the fan and light are not working, the
management system has to output a signal to activate a warning light on the control panel.
Draw a logic circuit for this fault condition.
For any given logic problem there will be different circuits that deliver the same output values
from a given set of inputs. In some cases it will be possible to simplify an initial circuit design
by reducing the number of logic gates. As a trivial example you may have noticed that the
circuit in Figure 4.04 includes an AND gate immediately followed by a NOT gate. These could
have been combined as a NAND gate. For more complex examples, there are techniques
available which will be discussed in Chapter 18 (Sections 18.03 and 18.04).
Create the truth table for the circuit shown in Figure 4.06 and show that it is the same as that
for an OR gate.
• The outcome of a logic expression or a logic circuit can be expressed as a truth table.
Exam-style Questions
1 a The following are the symbols for three different logic gates.
(Symbols for three logic gates)
i Construct the truth table for the circuit using the following template:
(Template with columns: Inputs, Workspace, Output X)
[3]
[2]
ii There is an element of redundancy in this diagram. Explain what the problem is.
[8]
[2]
In a competition, two teams play two matches against each other. One of the teams is declared the winner
if one of the following results occurs:
• The team wins one match and loses the other but has the highest total score.
ii By assigning the symbols A, B and C to these three propositions express the outcome of
the competition
3 A domestic heating system has a hot water tank and a number of radiators. There is a computerised management
system which receives signals dependent on whether or not the conditions for components are as they should be.
(Table columns: Signal, Value, Component condition; surviving entry: signal A, value 0)
a Consider the following fault condition. The water level in the hot water tank is too low and the temperature
in the hot water tank is too high. The management system must output a signal to switch off the system.
i Construct a truth table for this fault condition including the A, B and C signals. [4]
ii Construct the circuit diagram for this fault condition to match this truth table. [5]
b Consider the fault condition where the hot water tank temperature is within limits but the water flow in the
radiators is too low and the water level in the hot water tank is too low. Construct the circuit diagram for this fault
condition which requires the management system to output a signal to increase water pressure. [5]
Learning objectives
The simplest form of what might be described as a computer system model or computer
system architecture is usually attributed to John von Neumann. This recognises the fact that
he was the first to describe the basic principles in a publication.
• The memory contains a ‘stored program’ (which can be replaced by another at any time)
and the data required by the program.
Figure 5.01 A schematic diagram of the architecture of a simple CPU
implies, the ALU is responsible for any arithmetic or logic processing that might be needed when a
program is running. The functions of the control
unit are more diverse. One aspect is controlling the flow of data throughout the processor
and, indeed, throughout the whole computer system. Another is ensuring that program
instructions are handled correctly. A vital part of the control unit is a clock which is used
by the unit to synchronise processes. Strictly speaking there are two clocks. The first is an
internal clock which controls the cycles of activity within the processor. The other is the
system clock which controls activities outside the processor. The CPU will have a defined
frequency for its clock cycle, which is usually referred to as the clock speed. The frequency
defines the minimum period of time that separates successive activities within the system.
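The relationship between clock frequency and this minimum period is simply period = 1 / frequency. The short Python sketch below applies it to a hypothetical 2.0 GHz clock; the figure is chosen for the example and does not refer to any particular system.

```python
def clock_period_ns(frequency_hz):
    """Return the time for one clock cycle in nanoseconds.
    One second is 1e9 nanoseconds, so the period is 1e9 / frequency."""
    return 1e9 / frequency_hz


# A hypothetical 2.0 GHz clock, i.e. 2.0e9 cycles per second.
print(clock_period_ns(2.0e9))  # 0.5
```

A higher clock frequency therefore gives a shorter period between successive activities.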
In an advertisement for a laptop computer, the system is described as 4GB, 1TB, 1.7 GHz.
2 Have the values quoted been presented correctly? To answer this you need to refer back
to the discussion in Chapter 1 (Section 1.04) about terminology.
3 Calculate the minimum time period that could separate successive activities on this
system.
Registers
The other components of the CPU are the registers. These are storage components which,
because of their proximity to the ALU, allow very short access times. Each register has limited
storage capacity, typically 16, 32 or 64 bits. A register is either general purpose or special
store a single value at any one time. A value is stored in the Accumulator that is to be used
by the ALU for the execution of an instruction. The ALU can then store a different value in the
KEY TERMS
Accumulator: a general-purpose register that stores a value before and after the execution of an
instruction by the ALU
Figure 5.01 shows some of the special-purpose registers as individual components. The box
labelled ‘Other registers’ can be considered to comprise the Accumulator plus the special-
purpose registers not identified individually. The full names of the special-purpose registers
included in the simple CPU which we are going to discuss are given in Table 5.01 with a brief
description of their function.
Register name                  Abbreviation   Register’s function
Current instruction register   CIR            Stores the current instruction while it is being decoded and executed
Index register                 IX
Memory address register        MAR
Memory data register           MDR (MBR)
Program counter                PC
Status register                SR
Two points are worth making at this point. The first is that the alternative name for the MDR
emphasises that this particular register must act as a buffer because transfers of data within
the processor take place much more quickly than transfers outside the processor. This
A further point to note here is that the index register (IX) can be abbreviated as IR but in some
sources the current instruction register (CIR) is abbreviated as ‘IR’, which is an unnecessary
potential cause of confusion. In this book, the index register is always IX and the current
instruction register is CIR. Finally, there is also possible confusion if the abbreviation PC is
used. This will only be used in this book when register transfer notation is being used as you
will see later in the chapter. Everywhere else, a PC is a computer.
The SR is used when an instruction requires arithmetic or logic processing. Each individual
bit in the SR operates as a flag. The bit is set to 1 if a condition is detected. As an example, the
use of the following three flags will be illustrated:
1 Consider the addition of two positive values where the sum of the two produces an
answer that is too large to be correctly identified with the limited number of bits used to
represent the values. For example, if an eight-bit binary integer representation is being
used and an attempt is made to add denary 66 to denary 68 the following happens:
  0100 0010
+ 0100 0100
  1000 0110
Flags: N V C
       1 1 0
The value produced as an answer is denary -122. Two positive numbers have been
added to get a negative number. This impossibility is detected by the combination of
the negative flag and the overflow flag being set to 1. The processor has identified the
problem and can therefore send out an appropriate message.
2 Consider using the same eight-bit binary integer representation but this time two
negative numbers (-66 and -68 in denary) are added:
    1011 1110
  + 1011 1100
(1) 0111 1010
Flags: N V C
       0 1 1
This time we get the answer +122. This impossibility is detected by the combination of
the negative flag not being set and both the overflow and the carry flag being set to 1.
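The two additions above can be reproduced with a small simulation. The Python sketch below is a minimal model of eight-bit two's complement addition; the function names `add_8bit` and `to_denary` are assumptions made for this example, and a real processor sets its flags in hardware rather than by code like this.

```python
def add_8bit(x, y):
    """Add two 8-bit patterns (each 0..255) and return the 8-bit result
    together with the (N, V, C) flags."""
    raw = x + y
    result = raw & 0xFF          # keep only the lowest eight bits
    carry = raw >> 8             # C: a ninth bit was produced
    negative = result >> 7       # N: the sign bit of the result
    sign_x, sign_y = x >> 7, y >> 7
    # V: both operands have the same sign but the result's sign differs
    overflow = int(sign_x == sign_y and sign_x != negative)
    return result, (negative, overflow, carry)


def to_denary(pattern):
    """Interpret an 8-bit pattern as a two's complement integer."""
    return pattern - 256 if pattern & 0x80 else pattern


result, flags = add_8bit(0b01000010, 0b01000100)   # 66 + 68
print(to_denary(result), flags)                    # -122 (1, 1, 0)

result, flags = add_8bit(0b10111110, 0b10111100)   # -66 + -68
print(to_denary(result), flags)                    # 122 (0, 1, 1)
```

The same function can be used to check the flags for any other pair of eight-bit values.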
Carry out a comparable calculation for the addition in binary of -66 to +68. What do you
think the processor should do with the carry bit?
A bus is a parallel transmission component with each separate wire carrying a single bit. It
is important not to describe a bus as a storage device. A bus does not hold data. Instead it
is a mechanism for data to be transferred from one system
component to another.
The system bus allows data flow between the CPU, the
memory, and input or output (I/O) devices as shown in the
schematic diagram in Figure 5.02.
Figure 5.02 (schematic: the CPU, memory, and input and output devices connected by the control bus, address bus and data bus)
KEY TERMS
Address bus: a component that carries an address to the memory controller to identify a location in
memory which is to be read from or written to
The crucial aspect of the address bus is the ‘bus width’, which is the number of separate wires
in the bus. The number of wires defines the number of bits in the address’s binary code. In
the simple computer system considered here we will assume that the bus width is 16 bits
allowing 65 536 memory locations to be directly addressed. Such a memory size would, of
course, be totally inadequate for a modern computer system. Even doubling the address bus
width to 32 bits would only allow the direct addressing of a little over four billion addresses. If
the memory size is too large special techniques have to be used.
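The figures quoted follow from the fact that n wires can carry 2 to the power n different binary codes. A short Python sketch of the calculation is given below; the function name is chosen for this example.

```python
def addressable_locations(bus_width_bits):
    """Each wire carries one bit, so n wires give 2 ** n distinct addresses."""
    return 2 ** bus_width_bits


print(addressable_locations(16))  # 65536
print(addressable_locations(32))  # 4294967296
```

Doubling the width from 16 to 32 bits squares the number of addresses rather than doubling it.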
The function of the data bus is to carry data. This might be an instruction, an address or a
value. As can be seen from Figure 5.02, the data bus might be carrying the data from CPU to
memory or from memory to CPU.
However, another option is to carry data to or from an I/O device. The diagram does not
make clear whether, for instance, data coming from an input device is carried first to the CPU
or directly to the memory. There is a good reason for this. Some computer systems will only
allow input to the CPU before the data can be stored in memory. Other systems will allow
direct transfer to memory.
Bus width is again an important factor in considering how the data bus is used. Before
discussing this, it is useful to introduce the concept of a word. A word consists of a number
of bytes and for any system the word length is defined. The significance of the word length is
that it defines a grouping that the system will handle as one unit. The word length might be
stated as a number of bytes or as a number of bits. Typical word lengths are 16, 32 or 64 bits,
that is, 2, 4 or 8 bytes respectively. For a given computer system, the bus width is ideally the
same as the word length. If this is not possible the bus width can be half the word length so
that a full word can be transmitted by two consecutive data transfers. For our simple system
we assume a data bus width of 16 bits and a word length of two bytes to match this.
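The number of consecutive transfers needed to move one word can be worked out from the word length and the bus width. The Python sketch below shows the calculation; the function name is an assumption made for this example.

```python
def transfers_per_word(word_length_bits, bus_width_bits):
    """Number of consecutive data transfers needed to move one word.
    Ceiling division: a partly filled final transfer still counts."""
    return -(-word_length_bits // bus_width_bits)


print(transfers_per_word(16, 16))  # 1: bus width matches the word length
print(transfers_per_word(32, 16))  # 2: bus width is half the word length
```

For the simple system described here, with a 16-bit data bus and a two-byte word, a single transfer is enough.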
KEY TERMS
Data bus: a component that carries data to and from the processor
Word: a small number of bytes handled as a unit by the computer system
Can you find out the bus widths used in the computer system you are using?
The control bus is another bidirectional bus which transmits a signal from the control unit to
any other system component or transmits a signal to the control unit. There is no need for
extended width so the control bus typically has just eight wires. A major use of the control
bus is to carry timing signals. As described in Section 5.02, the system clock in the control
unit defines the clock cycle for the computer system. The control bus carries timing signals
at time intervals dictated by the clock cycle. This ensures that the time that one component
transmits data is synchronised with the time that another component reads it.
The clock speed is the most important factor governing the processing speed of the system.
However, it is not the only factor. The performance will be limited if the bus widths are
insufficient for the whole of a data value to be transferred in one clock cycle. For optimum
performance it is also particularly important that memory access is as efficient as possible.
The schematic diagram in Figure 5.02 slightly misrepresents the situation because it looks as
if the CPU, the memory and the I/O devices have similar access to the data and control buses.
The reality is different. Each I/O device is connected to an interface called a port. Each port
is connected to the I/O or device controller. This controller handles the interaction between
the CPU and an I/O device. A port is described as ‘internal’ if the connected I/O device is an
integral part of the computer system. An external port allows the computer user to connect a
peripheral I/O device.
In the early days of the PC, the process of connecting a peripheral was time-consuming and
required technical expertise. The aim of the plug-and-play concept was to remove the need
for technical knowledge so that any computer user could connect a peripheral and start
using it straight away. The plug-and-play concept was only fully realised by the creation of the
USB (Universal Serial Bus) standard. Nowadays anyone buying a new peripheral device will
expect it to connect to a USB port. There is an alternative technology known as FireWire but
this is not so commonly used in computer systems.
• The computer is at the root of this hierarchy and can handle 127 attached devices.
• Devices can be attached while the computer is switched on and are automatically
configured for use.
• The standard has evolved, with USB 3.0 being the latest version.
Discussion Point:
Carry out an investigation into storage devices that could be connected as a peripheral to a
PC using the USB port.
For two representative devices find out which specific USB technology is being used and
what the potential data transfer speed is. How do these speeds compare with the speed of
access of a hard drive installed inside the computer?
The full name for this is the fetch, decode and execute cycle. This is illustrated by the
flowchart in Figure 5.03.
Figure 5.03 Flowchart for the fetch, decode and execute cycle
If we assume that a program is already running then the program counter already holds the
address of an instruction. In the fetch stage, the following steps happen:
1 This address in the program counter is transferred within the CPU to the MAR.
• the instruction held in the address pointed to by the MAR is fetched into the MDR
For our simple system the program counter will be incremented by 1. However, it should be
noted that the instruction just loaded might be a jump instruction. In this case, the program
counter contents will have to be updated in accordance with the jump condition. This can
only happen after the instruction has been decoded.
In the decode stage, the instruction stored in the CIR is received as input by the circuitry
within the control unit. Depending on the type of instruction, the control unit will send
signals to the appropriate components so that the execute stage can begin. At this stage, the
ALU will be activated if the instruction requires arithmetic or logic processing.
The description of the execute stage is postponed until Chapter 6, in which a simple
instruction set is introduced and discussed.
Operations involving registers can be described by register transfer notation. The simplest
form of this can be illustrated by the following representation of the fetch stage of the fetch-
execute cycle:
MAR <- [PC]
PC <- [PC] + 1; MDR <- [[MAR]]
CIR <- [MDR]
The basic format for an individual data transfer is similar to that for variable assignment. The
first item is the destination of the data. Here the appropriate abbreviation is used to identify
the particular register. To the right of the arrow showing the transmission of data is the
definition of this data. In this definition, the square brackets around a register abbreviation
show that the content of the register is being moved possibly with some arithmetic operation
being applied. When two data operations are placed on the same line separated by a
semi-colon this means that the two transfers take place simultaneously. The double pair of
brackets around MAR on the second line needs careful interpretation. The content of the MAR
is an address; it is the content of that address which is being transferred to the MDR.
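The register transfers of the fetch stage can be traced in a few lines of Python. This is only a sketch: the register names follow the text, but the dictionary representation and the memory contents are assumptions made for illustration, and the two simultaneous transfers are modelled one after the other.

```python
# Minimal model of the fetch stage; the memory contents are made-up values.
memory = {100: "LDD 201", 101: "INC ACC"}
registers = {"PC": 100, "MAR": None, "MDR": None, "CIR": None}


def fetch(registers, memory):
    """Carry out the register transfers of the fetch stage."""
    # MAR <- [PC]
    registers["MAR"] = registers["PC"]
    # PC <- [PC] + 1; MDR <- [[MAR]]  (modelled sequentially here)
    registers["PC"] = registers["PC"] + 1
    registers["MDR"] = memory[registers["MAR"]]
    # CIR <- [MDR]
    registers["CIR"] = registers["MDR"]
    return registers


fetch(registers, memory)
print(registers)
```

Note that the MDR receives the content of the memory location whose address is in the MAR, not the content of the MAR itself.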
There are many different reasons for an interrupt to be generated. Some examples are:
• a hardware fault
• a need for I/O processing to begin
• user interaction
• a timer signal.
There are a number of different approaches possible for the detailed mechanisms used to
handle interrupts but the overriding principles are clearly defined. Each different interrupt
needs to be handled appropriately and different interrupts might possibly have different
priorities. Therefore, the processor must have a means of identifying the type of interrupt.
One way is to have an interrupt register in the CPU that works like the status register, with
each individual bit operating as a flag for a specific type of interrupt.
As the flowchart in Figure 5.03 shows, the existence of an interrupt is only detected at the
end of a fetch-execute cycle. This allows the current program to be interrupted and left in a
defined state which can be returned to later. The first step in handling the interrupt is to store
the contents of the program counter and any other registers somewhere safe in memory.
Following this, the appropriate interrupt handler or interrupt service routine (ISR) program
is initiated by loading its start address into the program counter. When the ISR program
has been executed there needs to be an immediate check to see if further interrupts need
handling. If there are none, the safely stored contents of the registers are restored to the
CPU and the originally running program is resumed.
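The sequence just described can be sketched in Python. The bit positions in the interrupt register, the priority order and the ISR start addresses are all hypothetical; a real processor defines its own.

```python
# Hypothetical bit positions for an interrupt register.
HARDWARE_FAULT = 0b0001
IO_REQUEST = 0b0010
USER_INTERACTION = 0b0100
TIMER = 0b1000

# Hypothetical ISR start addresses, listed in priority order (highest first).
ISR_ADDRESS = {HARDWARE_FAULT: 900, IO_REQUEST: 920,
               USER_INTERACTION: 940, TIMER: 960}


def handle_interrupts(cpu, stack):
    """Run at the end of a fetch-execute cycle; returns the ISRs visited."""
    visited = []
    if cpu["interrupt_register"] == 0:
        return visited                      # nothing pending
    stack.append((cpu["PC"], cpu["ACC"]))   # store register contents safely
    while cpu["interrupt_register"] != 0:   # check again after each ISR
        for flag, address in ISR_ADDRESS.items():
            if cpu["interrupt_register"] & flag:
                cpu["PC"] = address                  # load ISR start address
                visited.append(address)              # (ISR body would run here)
                cpu["interrupt_register"] &= ~flag   # clear the handled flag
                break
    cpu["PC"], cpu["ACC"] = stack.pop()     # restore and resume the program
    return visited


cpu = {"PC": 104, "ACC": 5, "interrupt_register": TIMER | IO_REQUEST}
order = handle_interrupts(cpu, [])
print(order, cpu["PC"], cpu["ACC"])
```

Each flag is tested with a bitwise AND, which is how a single register can record several pending interrupts at once.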
• The von Neumann architecture for a computer system is based on the stored program
concept.
• The CPU contains a control unit, an arithmetic and logic unit, and registers.
• The system bus contains the data, address and control buses.
• A universal serial bus (USB) port can be used to attach peripheral devices.
1 a A processor has just one general-purpose register. Give the name of this register.
[1]
i its function
iii the register that supplies this data at the start of the fetch stage of the fetch-execute cycle.
[3]
its function
ii the register that supplies this data at the end of the fetch stage of the fetch-execute cycle.
[3]
d Explain three differences between the memory address register and the memory data
register.
[5]
2 The system bus comprises three individual buses: the data bus, the address bus and the
control bus.
a For each bus give a brief explanation of its use.
b Each bus has a defined bus width.
[1]
[2]
[3]
ii Explain the effect of changing the address bus from a 32-bit bus to a 64-bit bus.
3 The fetch stage of the fetch-decode-execute cycle can be represented by the following
statements using register transfer notation:
MAR <- [PC]
a Explain the meaning of each statement. The explanation must include definitions of the
following items:
b Explain the use of the address bus and the data bus for two of the statements.
[ 10 ]
[4]
• data movement
• arithmetic operations
• compare instructions
• modes of addressing
The only language that the CPU recognises is machine code. Therefore, when a program is
running and an instruction is fetched from memory this has to be in the format of a binary
code that matches the specific machine code that the CPU uses.
Different processors have different instruction sets associated with them. Even if two
different processors have the same instruction, the machine codes for them will be different
but the structure of the code for an instruction will be similar for different processors.
For a particular processor, the following components are defined for an individual machine
code instruction:
• whether the opcode occupies the most significant or the least significant bits.
In general, there can be anything up to three operands for an instruction. However, following
on from the approach in Chapter 5, we consider a simple system where there is either one
or zero operands.
KEY TERMS
Machine code instruction: a binary code with a defined number of bits that comprises an
opcode and, most often, one operand
The number of bits needed for the opcode depends on the number of different opcodes in
the instruction set for the processor. The opcode is structured with the first few bits defining
the operation and the remaining bits associated with addressing. A sensible instruction
format for our simple processor is shown in Figure 6.01.
Opcode                                                  Operand
Operation    Address mode    Register addressing
4 bits       2 bits          2 bits                     16 bits
CU <- [CIR(23:16)]
Indicating that only bits 16 to 23 from the contents of the CIR have been transferred to the
control unit; bits 0 to 15 are not needed in this first step.
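The way the fields of a 24-bit instruction might be separated can be sketched with shifts and masks. The order of the fields within the opcode (operation first, then address mode, then register addressing) is an assumption based on the order in which Figure 6.01 lists them, and the example instruction is made up.

```python
def decode(instruction):
    """Split a 24-bit instruction into the fields of Figure 6.01.

    Assumed layout (most significant bits first): 4-bit operation,
    2-bit address mode, 2-bit register addressing, 16-bit operand.
    """
    opcode = (instruction >> 16) & 0xFF        # CIR(23:16)
    return {
        "opcode": opcode,
        "operation": (opcode >> 4) & 0xF,      # top 4 bits of the opcode
        "address_mode": (opcode >> 2) & 0x3,   # next 2 bits
        "register": opcode & 0x3,              # last 2 bits
        "operand": instruction & 0xFFFF,       # CIR(15:0)
    }


# Made-up instruction: operation 0001, mode 01, register 00, operand 201.
example = (0b0001_01_00 << 16) | 201
fields = decode(example)
print(fields)
```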
A programmer might wish to write a program where the actions taken by the processor
are directly controlled. It is argued that this can produce optimum efficiency in a program.
However, writing a program as a sequence of machine code instructions would be a very
time-consuming and error-prone process. The solution for this type of programming is to
use assembly language. As well as having a uniquely defined machine code language each
processor has its own assembly language.
The essence of assembly language is that for each machine code instruction there is an
equivalent assembly language instruction which comprises:
If a program has been written in assembly language it has to be translated into machine
code before it can be executed by the processor. The translation program is called an
‘assembler’, of which some details will be discussed in Chapter 7 (Section 7.05). The fact
that an assembler is to be used allows a programmer to include some special features in an
assembly language program. Examples of these are:
comments
The first three items on this list are there to directly assist the programmer in writing the
program. Of these, comments are removed by the assembler and symbolic names and labels
require a conversion to binary code by the assembler. A macro or a subroutine contains a
sequence of instructions that is to be used more than once in a program.
Directives and system calls are instructions to the assembler as to how it should construct
the final executable machine code. They can involve directing how memory should be used
or defining files or procedures that will be used. They do not have to be converted into binary
code.
KEY TERMS
When an instruction requires a value to be loaded into a register there are different ways
of identifying the value. These different ways are described as the ‘addressing modes’. In
Section 6.01, it was stated that, for our simple processor, two bits of the opcode in a
machine code instruction would be used to define the addressing mode. This allows four
different modes which are described in Table 6.01.
Addressing mode   Operand
Immediate
Direct
Indirect
Indexed           An address to which must be added what is currently in the
                  index register (IX) to get the address which holds the value
                  in the instruction
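A short Python sketch can show how the same operand leads to a different value under each of the four modes. The interpretations of immediate, direct and indirect addressing used here are the conventional ones; the function name and the memory contents are made up for illustration.

```python
def operand_value(mode, operand, memory, ix=0):
    """Return the value an instruction would use under each addressing mode."""
    if mode == "immediate":
        return operand                      # the operand is the value itself
    if mode == "direct":
        return memory[operand]              # operand is the address of the value
    if mode == "indirect":
        return memory[memory[operand]]      # that address holds another address
    if mode == "indexed":
        return memory[operand + ix]         # operand plus [IX] gives the address
    raise ValueError(f"unknown addressing mode: {mode}")


# Made-up memory contents for illustration.
memory = {100: 102, 101: 7, 102: 55, 103: 9}
for mode in ("immediate", "direct", "indirect", "indexed"):
    print(mode, operand_value(mode, 100, memory, ix=3))
```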
The examples described here do not correspond directly to those found in the assembly
language for any specific processor. Individual instructions will have a match in more than
one real-life set. The important point is that these examples are representative. In particular,
there are examples of the most common categories of instruction.
Data movement
These types of instruction can involve loading data into a register or storing data in memory.
Table 6.02 contains a few examples of the format of the instructions with explanations.
Instruction opcode   Instruction operand   Explanation
LDM                  #n
LDR                  #n
LDD
LDI                                        Indirect addressing, loading to ACC
LDX
STO
The important point to notice is that the mnemonic defines the instruction type including
which register is involved and, where appropriate, the addressing mode. It is important to
read the mnemonic carefully! The explanations for LDD, LDI and LDX need reference back
to Table 6.01.
It is possible to use register transfer notation to describe the execution of an instruction. For
example, the LDD instruction is described by:
ACC <- [[CIR(15:0)]]
The instruction is in the CIR and only the 16-bit address needs to be examined to identify
the location of the data in memory. The contents of that location are transferred into the
accumulator.
TASK 6.01
Arithmetic operations
Table 6.03 contains a few examples of instruction formats used for arithmetic operations.
Instruction opcode   Instruction operand   Explanation
ADD
DEC
Question 6.01
What would you need to do if, for example, you wanted to add 5 to the content in the
accumulator?
A program might require an unconditional jump or might only need a jump if a condition
is met. In the latter case, a compare instruction is executed first and the result of the
comparison is recorded by a flag in the status register. The execution of the conditional jump
instruction begins by checking whether or not the flag bit has been set. Table 6.04 shows the
format for these types of instruction.
Instruction opcode   Instruction operand   Explanation
JMP
CMP
CMP                  #n
JPE
JPN
Note that the two compare instructions have the same opcode. For the second one, the
immediate addressing is identified by the # symbol preceding the number. In the absence of
the # the operand is interpreted as an address. Note also that the comparison is restricted to
asking if two values are equal.
The other point to note is that a jump instruction does not cause an actual immediate
jump. Rather, it causes a new value to be supplied to the program counter so that the next
instruction is fetched from this newly specified address. The incrementing of the program
counter that took place automatically when the instruction was fetched is overwritten.
The two examples here are instructions for a single character to be input or output. In each
case the instruction has only an opcode; there is no operand:
• The instruction with opcode IN is used to store in the ACC the ASCII value of a character
typed at the keyboard.
• The instruction with opcode OUT is used to display on the screen the character for which
the ASCII code is stored in the ACC.
Consider a case where some program instructions are contained in memory locations from 100
onwards and some eight-bit binary data values are contained in memory locations from 200
onwards. For illustrative purposes the instructions are shown in assembly language form. At
the start of a part of the program, the memory contents are as shown in Figure 6.02.
Address   Contents    Address   Contents
100       LDD 201     200       0000 0000
101       INC ACC     201       0000 0001
102       ADD 203     202       0000 0010
103       CMP 205     203       0000 0011
104       JPE 106     204       0000 0100
105       DEC ACC     205       0000 0101
106       INC ACC     206       0000 0111
107       STO 206     207       0000 0000
Figure 6.02 The contents of memory addresses before execution of the program begins
The values stored in the program counter and in the accumulator as the program
instructions are executed are shown in Figure 6.03.
Program counter   Accumulator
100               0000 0000
101               0000 0001
102               0000 0010
103               0000 0101
104               0000 0101
106               0000 0101
107               0000 0110
108               0000 0110
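The trace can be checked with a small simulator. This is a sketch under stated assumptions rather than a model of any real processor: only the accumulator is modelled, so INC and DEC always act on it, the comparison flag is a single Boolean, and the program stops when the program counter leaves the stored instructions.

```python
def run(program, memory, start):
    """Execute the simple instruction set; return a (PC, ACC) trace."""
    pc, acc, equal = start, 0, False
    trace = [(pc, acc)]
    while pc in program:
        opcode, operand = program[pc]
        pc += 1                                  # incremented during fetch
        if opcode == "LDD":
            acc = memory[operand]
        elif opcode == "STO":
            memory[operand] = acc
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "INC":
            acc += 1
        elif opcode == "DEC":
            acc -= 1
        elif opcode == "CMP":
            equal = acc == memory[operand]       # sets the comparison flag
        elif opcode == "JPE" and equal:
            pc = operand                         # overwrite the incremented PC
        trace.append((pc, acc))
    return trace


program = {100: ("LDD", 201), 101: ("INC", None), 102: ("ADD", 203),
           103: ("CMP", 205), 104: ("JPE", 106), 105: ("DEC", None),
           106: ("INC", None), 107: ("STO", 206)}
memory = {200: 0, 201: 1, 202: 2, 203: 3, 204: 4, 205: 5, 206: 7, 207: 0}

for pc, acc in run(program, memory, 100):
    print(pc, format(acc, "08b"))
```

The jump at address 104 is taken because the accumulator and location 205 both hold 5, so the instruction at 105 is never executed.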
Question 6.02
Can you follow through the changes in the values in the two registers in Worked Example
6.01? Are there any changes to the contents of memory locations 100 to 107 or 200 to 207
while the program is executing?
Summary
Exam-style Questions
1 Three instructions for a processor with an accumulator as the single general purpose
register are:
LDD   for direct addressing
LDI   for indirect addressing
LDX   for indexed addressing
In the diagrams below, the instruction operands, the register content, memory addresses and
the memory contents are all shown as denary values.
i Draw arrows on a copy of the diagram below to explain execution of the instruction. [2]
[Diagram: Accumulator; Index register; memory addresses 100 to 107 with memory content]
ii Show the contents of the accumulator as a denary value after execution of the instruction. [1]
i Draw arrows on a copy of the diagram below to explain execution of the instruction. [3]
[Diagram: Accumulator; Index register; memory addresses 100 to 107 with memory content]
ii Show the contents of the accumulator as a denary value after execution of the instruction. [1]
c i Draw arrows on a copy of the diagram below to explain the execution of the instruction
LDX 103. [3]
[Diagram: Accumulator; Index register; memory addresses 100 to 107 with memory content]
ii Show the contents of the accumulator as a denary value after the execution. [1]
a Name three types of component of an assembly language program that are not intended
to be directly transformed into machine code by the assembler. For one component, state
its purpose.
b Trace the following assembly language program using a copy of the trace table provided.
Note that the
100   LDD 201
101   INC ACC
102   STO 202
103   LDI 203
104   DEC ACC
105   STO 201
106   ADD 204
107   STO 201
108   END
201
10
202
203
204
204
[4]
[6]
Memory addresses
201
202
203
204
10
204
Chapter 7
System Software
Learning objectives
In the 1960s, the likely scenario for using a computer would be something like this:
1 Enter machine room with deck of punched cards and a punched paper tape reel.
2 Switch on computer.
3 Put deck of cards into card reader and press button.
5 Press button to run the program, entered into memory from the punched cards, which
uses the data entered into memory from the paper tape.
8 Leave machine room with deck of cards, paper tape and line-printer output.
What happened is that the user controlled the computer hardware by pressing buttons. Just
try to imagine how many buttons would be needed if you had to control a computer in the
same way today.
The missing component from the 1960s computer was, of course, an operating system; in
other words some software to control the hardware. An operating system is an example of
a type of software called ‘system software’. This distinguishes it from application software
which is created to perform a specific task for a computer user rather than just helping to run
the system.
Operating system: a software platform that provides facilities for programs to be run which
are of benefit to a user
Operating systems are extremely complex and it is not possible to give an all-embracing
description of what an operating system is. However, what an operating system does can be
generalised by saying that it provides an environment within which programs can be run that
are of benefit to a user.
The activities of an operating system can be sub-divided into different categories. There is
overlap between many of these but the classification is worthwhile. The following account
provides a very brief explanation of each of the various tasks carried out by the operating
system. Details of how some of them are carried out are discussed in Chapter 20 (Sections
20.01, 20.02 and 20.03).
User-system interface
A user interface is needed to allow the user to get the software and hardware to do
something useful. An operating system should provide at least the following for user input
and output:
• a command-line interface
• a graphical user interface (GUI).
Discussion Point:
Program-hardware interface
Programmers write software and users run this software. The software uses the hardware.
The operating system has to ensure that the hardware does what the software wants it to do.
Program development tools associated with a programming language allow a programmer
to write a program without needing to know the details of how the hardware, particularly the
processor, actually works. The operating system then has to provide the mechanism for the
execution of the developed program.
Resource management
KEY TERMS
The resource management provided by the operating system aims to achieve optimum
efficiency in computer system use. The two most important aspects of this are:
• scheduling of processes
Memory management
• Memory protection ensures that one program does not try to use the same memory
locations as another program.
• Memory usage optimisation involves decisions about which processes should be in main
memory at any one time and where they are stored in this memory.
Device management
Every computer system has a variety of components that are categorised as ‘devices’.
Examples include the monitor screen, the keyboard, the printer and the webcam. The
management of these requires:
File management
Security management
Chapters 8 (Section 8.02) and 21 (Section 21.04) discuss details of security issues. There
are several aspects of security management which include:
• prevention of intrusion
Errors can arise in the execution of a program either because it was badly written or
because it has been supplied with inappropriate data. Other errors are associated with
devices not
working correctly. Whatever the cause of an error, the operating system should have the
capability to interrupt a running process and provide error diagnostics where appropriate.
In extreme cases, the operating system needs to be able to shut down the system in an
organised fashion without loss of data.
For each of the above categories of operating system task, the individual points mentioned
could often be mentioned in a different category. Make an abbreviated list of these
categories and add arrows to indicate alternative places where items could be placed.
Question 7.01
It is useful to describe the management tasks carried out by an operating system as being
primarily one of the following types:
Considering the management tasks that have already been categorised, can you identify
them as belonging to one or other of the above types? Are there any problems in doing this?
A utility program is one that might be provided by the operating system but it might also be
one that is installed as a separate entity. It is a program that is not executed as part of the
normal routine of operating system utilisation. Rather it is a program that the user can decide
to run when needed or possibly a program that the operating system might decide to run in
certain circumstances. Some utility programs are associated with hard disk usage.
• removing existing data from a disk that has been used previously
• setting up the file system on the disk, based on a table of contents that allows a file
recognised by the operating system to be associated with a specific physical part of the
disk
Another utility program, which might be a component of a disk formatter, performs disk
contents analysis and, if possible, disk repair when needed. The program first checks for
errors on the disk. Some errors arise from a physical defect resulting in what is called a ‘bad
sector’. There are a number of possible causes of bad sectors. However, they usually arise
either during manufacture or from mishandling of the system. An example is moving the
computer without ensuring that the disk heads are secured away from the disk surface.
Other errors arise from some abnormal event such as a loss of power or an error causing
sudden system shutdown. As a result some of the files stored on the disk might no longer be
in an identifiable state. A disk repair utility program can mark bad sectors as such and
ensure that the file system no longer tries to use them. When the integrity of files has been
affected, the utility might be able to recover some of the data but otherwise it has to delete
the files from the file system.
A disk defragmenter utility could possibly be part of a disk repair utility program but it is
not primarily concerned with errors. A perfectly functioning disk will, while in use, gradually
become less efficient because the constant creation, editing and deletion of files leaves
them in a fragmented state. The cause of this is the logical arrangement of data in sectors as
discussed in Chapter 3 (Section 3.03), which does not allow a file to be stored as a
contiguous entity.
A simple illustration of the problem is shown in Figure 7.01. Initially file A occupies three
sectors fully and part of a fourth one. File B is small so occupies only part of a sector.
File C occupies two sectors fully and part of a third. When File B is deleted, the sector
remains unfilled because it would require too much system overhead to rearrange the file
organisation every time there is a change. When File A is extended it completely fills the first
four sectors and the remainder of the extended file is stored in all of Sector 8 and part of
Sector 9. Sector 4 will only be used again if a small file is created or if the disk fills up, when
it might store the first part of a longer file.
                    Sectors 0-3   Sector 4   Sectors 5-7   Sectors 8-9
Initial position    File A        File B     File C
File B is deleted   File A                   File C
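The sequence of events in this illustration can be reproduced with a toy model of a ten-sector disk. The allocation rule used here (take the first contiguous run of free sectors that is long enough, otherwise scatter the file over whatever is free) is an assumption chosen to match the behaviour described; real file systems use more elaborate strategies.

```python
def allocate(disk, name, count):
    """Give `count` sectors to file `name`, preferring one contiguous free run."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < count:
        raise ValueError("disk full")
    for start in free:
        run = list(range(start, start + count))
        if all(i < len(disk) and disk[i] is None for i in run):
            chosen = run                     # first free run that is long enough
            break
    else:
        chosen = free[:count]                # no such run: scatter the sectors
    for i in chosen:
        disk[i] = name
    return chosen


def delete(disk, name):
    """Free every sector owned by `name` without moving any other file."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


disk = [None] * 10                           # ten sectors, numbered 0 to 9
allocate(disk, "A", 4)
allocate(disk, "B", 1)
allocate(disk, "C", 3)
delete(disk, "B")
extension = allocate(disk, "A", 2)           # File A is extended by two sectors
print(extension, disk)
```

After the extension File A is split into two separate runs of sectors, which is the fragmentation that a defragmenter utility would later remove.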
TASK 7.02
If you have never used a disk defragmenter or disk repair utility program can you get access
to a system where you can use one? If so, note the changes that are carried out and recorded
by the utility program.
Backup software
It is quite likely that you perform a manual backup every now and then using a flash memory
stick. However, a safer and more reliable approach is to have a backup utility program do
this for you. You can still use the memory stick to store the backed-up data but the utility
program will control the process. In particular it can do two things:
• only create a new backup file when there has been a change.
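One way a backup utility might decide whether a file has changed is to compare a digest of its current contents with the digest recorded at the last backup. The sketch below assumes this approach; the function name and the dictionary standing in for the backup medium are hypothetical.

```python
import hashlib


def backup_if_changed(name, data, store):
    """Copy `data` into `store` only when it differs from the last backup.

    `store` maps a file name to (digest, data); it stands in for the
    backup medium, for example a memory stick.
    """
    digest = hashlib.sha256(data).hexdigest()
    if name in store and store[name][0] == digest:
        return False                         # unchanged: no new backup
    store[name] = (digest, data)
    return True


store = {}
print(backup_if_changed("notes.txt", b"draft 1", store))
print(backup_if_changed("notes.txt", b"draft 1", store))
print(backup_if_changed("notes.txt", b"draft 2", store))
```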
File compression
A file compression utility program can be used as a matter of routine by an operating system
to minimise hard disk storage requirements. If the operating system does not do this, a user
can still choose to implement a suitable program. However, as was discussed in Chapter 1
(Section 1.07), file compression is most important when transmitting data. In particular, it
makes sense to compress (or‘zip’) a file before attaching it to an email.
Virus checker
A library program can be defined as a program contained in a program library but both
‘library program’ and ‘program library’ are misleading terms. There may be programs in
a program library but more often they are subroutines that programmers can use in their
programs.
The most obvious examples of library routines are the built-in functions available for use
(Section 13.08). Another example is the collection of over 1600 procedures for
mathematical and statistics processing available from the Numerical Algorithms Group (NAG)
library. This organisation has been creating routines since 1971 and they are universally
accepted as being as reliable as software ever can be.
In Section 7.05, the methods available for translation of source code are discussed. For
the purpose of the discussion here you just need an overview of what happens. The
source code is written in a programming language of choice. If a compiler is used for the
translation and no errors are found, the compiler produces object code (machine code).
This code cannot be executed by itself. Instead it has to be linked with the code for any
subroutines used by it. It is possible to carry out the linking before loading the composite
code into memory and running it.
By contrast, dynamic linking has the routines from a dynamic link library (DLL) already in
memory. While the code is running, it links to the DLL routine that it needs. A DLL is created
so that its routines can be shared. More than one process can dynamically link to a DLL file
at any one time.
As with much of this chapter, the discussion will contain few details of how translators work
because they are dealt with in Chapter 20 (Section 20.05). The need for a language
translator is easy to explain and, indeed, is explained in Chapter 6 (Section 6.02). Writing a
program directly in machine code would take a very long time and undoubtedly would lead
to a multitude of errors.
Assemblers
• removal of comments
• creation of a symbol table containing the binary codes for symbolic names and labels
• creation of a literal table if the programmer has used constants in the program
• expansion of macros
If errors are not found, the second pass of the assembler generates the object code. This
involves replacing symbolic addresses with absolute addresses.
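The division of work between the two passes can be sketched in Python. The source syntax assumed here (a label ends with a colon, a comment starts with a semi-colon, one instruction per address) is made up for illustration, and the sketch stops at resolving addresses rather than generating binary object code.

```python
def assemble(lines, start=100):
    """Tiny two-pass sketch: labels end with ':' and comments start with ';'."""
    # First pass: strip comments and record the address of every label.
    symbols, cleaned, address = {}, [], start
    for line in lines:
        line = line.split(";")[0].strip()
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
            symbols[label] = address
        if line:
            cleaned.append(line)
            address += 1

    # Second pass: replace symbolic addresses with absolute addresses.
    output = []
    for line in cleaned:
        parts = line.split()
        if len(parts) == 2 and parts[1] in symbols:
            parts[1] = str(symbols[parts[1]])
        output.append(" ".join(parts))
    return symbols, output


source = [
    "START: LDD 201   ; load the counter",
    "       INC ACC",
    "       CMP 205",
    "       JPN START ; repeat until equal",
    "       STO 206",
]
symbols, code = assemble(source)
print(symbols)
print(code)
```

Two passes are needed because an instruction may refer to a label that is only defined further down the program; the symbol table has to be complete before any address can be substituted.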
As noted above, object code is not an executable code. The creation of executable code
requires a linker to be used to ensure that the object code for the program and the object
codes for associated procedures are transferable into memory with mutually consistent
memory locations. The actual transfer into memory is carried out by a loader or the loader
element of a link-loader. This carries out any final adjustment of memory addresses that
might be necessary.
The starting point for using either a compiler or an interpreter is a file containing source
code, which is a program written in a high-level language.
1 The interpreter program, the source code file and the data to be used by the source code
program are all made available.
5 If an error is found this is reported and the interpreter program halts execution.
7 The interpreter program uses this intermediate code to execute the required action.
8 The next line of source code is read and Steps 4-8 are repeated.
1 The compiler program and the source code file are made available but no data is needed.
7 The next line of source code is read and Steps 4-7 are repeated.
8 When the whole of the source code has been dealt with one of the following happens:
   • If no error is found in the whole source code the complete intermediate code is
     converted into object code.
   • If any errors are found a list of these is output and no object code is produced.
Execution of the program can only begin when the compilation has shown no errors. This
can take place automatically under the control of the compiler program if data for the
program is available. Alternatively the object code is stored and the program is executed
later with no involvement of the compiler.
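The difference in how errors are reported can be sketched with a toy language. Everything here is hypothetical: the language has only the statements ADD n and SUB n, 'execution' keeps a running total, and the 'object code' is simply the checked list of statements.

```python
VALID = {"ADD", "SUB"}   # hypothetical two-statement toy language


def check(line):
    """Return an error message for a bad line, or None if it is acceptable."""
    parts = line.split()
    if len(parts) != 2 or parts[0] not in VALID or not parts[1].isdigit():
        return f"syntax error: {line!r}"
    return None


def interpret(source):
    """Check and execute one line at a time, halting at the first error."""
    total, output = 0, []
    for line in source:
        error = check(line)
        if error:
            return output, [error]           # report and halt execution
        keyword, value = line.split()
        total += int(value) if keyword == "ADD" else -int(value)
        output.append(total)
    return output, []


def compile_all(source):
    """Check every line first; produce 'object code' only if none fail."""
    errors = [e for e in map(check, source) if e]
    return (None, errors) if errors else (list(source), [])


source = ["ADD 2", "ADD 3", "AD 5", "SUB x"]
print(interpret(source))
print(compile_all(source))
```

The interpreter has already produced some output when it meets the first faulty line, whereas the compiler runs nothing but lists both faulty lines together.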
Discussion Point:
What type of facility for language translation are you being provided with? Does your
experience of using it match what has been described here?
For a programmer, the following statements can be made about the advantages and
disadvantages of creating interpreted or compiled programs:
• An interpreter has advantages when a program is being developed because errors can be
identified as they occur and corrected immediately without having to wait for the whole
of the source code to be read and analysed.
• An interpreter has a disadvantage when a program is error free and is distributed to users
because the source code has to be sent to each user.
• A compiler has the advantage that an executable file can be distributed to users so the
users have no access to the source code.
For a user, the following statements can be made about the advantages and disadvantages
of using interpreted or compiled programs:
• For an interpreted program, the interpreter and the source code have to be available each
time that an error-free program is run.
• For a compiled program, only the object code has to be available each time that an error-
free program is run.
• Compiled object code will provide faster execution than is possible for an interpreted
program.
Java
When the programming language Java was created, a different philosophy was applied to
how it should be used. Each different type of computer has to have a Java Virtual Machine
created for it. Then when a programmer writes a Java program this is compiled first of all to
create what is called Java Byte Code. When the program is run, this code is interpreted by
the Java Virtual Machine. The Java Byte Code can be transferred to any computer that has a
Java Virtual Machine installed.
Summary
• Operating system tasks can be categorised in more than one way, for example, some are
for helping the user, others are for running the system.
• Utility programs for a PC include hard disk utilities, backup programs, virus checkers and
file compression utilities.
• Library programs, including Dynamic Link Library (DLL) files, are available to be
incorporated into programs; they are usually subroutines and are very reliable.
• For a two-pass assembler, typical activities in the first pass are creation of a symbol table
and expansion of macros; object code is generated in the second pass.
• A Java compiler produces Java Byte Code which is interpreted by a Java Virtual Machine.
Exam-style Questions
1 a One of the reasons for having an operating system is to provide a user interface to a
computer system.
i Name two different types of interface that an operating system should provide. [2]
ii Identify for each type of interface a device that could be used to enter data. [2]
b Identify and explain briefly three other management tasks carried out by an operating
system. [6]
2 a A PC operating system will make available to a user a number of utility programs.
i Identify two utility programs that might be used to deal with a hard disk problem. [2]
ii For each of these utility programs explain why it might be needed and explain
ii A ‘two-pass’ assembler is usually used. Give two examples of what will be done in the first
pass. [2]
i State three differences between how an interpreter works and how a compiler works. [3]
iii If a programmer chooses Java, a special approach is used. Identify one feature of
Chapter 8
Learning objectives
It is easy to define integrity of data but far less easy to ensure it. Only accurate and up-to-
date data has data integrity. Any person or organisation that stores data needs it to have
integrity. Methods that can be used to give the best chance of achieving data integrity are
discussed in this chapter and also in Chapter 10 (Section 10.01).
KEY TERMS
Data privacy is about keeping data private rather than allowing it to be available in the
public domain. The term ‘data privacy’ may be applied to a person or an organisation. Each
individual has an almost limitless amount of data associated with their existence. Assuming
that an individual is not engaged in criminal or subversive activities, he or she should be in
control of which data about himself or herself is made public and which data remains private.
An organisation can have data that is private to the organisation, such as the minutes of
management meetings, but this will not be discussed further here.
For an individual there is little chance of data privacy if there is not a legal framework in
place to penalise offenders who breach this privacy. Such laws are referred to as data protection
laws. The major aspects of data protection laws relate to personal, therefore private, data
that an individual supplies to an organisation. The data is supplied to allow the organisation
to use it but only for purposes understood and agreed by the individual. Data protection laws
oblige organisations to ensure the privacy and the integrity of this data. Unfortunately having
laws does not guarantee adherence to them but they do act as a deterrent if wrong-doers
can be subject to legal proceedings.
KEY TERMS
Discussion Point:
What data protection laws are in place in your country? Are you familiar with any details of
these laws?
Data protection normally applies to data stored in computer systems with the consent of the
individual. Should these laws be extended to cover storage of data obtained from telephone
calls or search engine usage?
Data can be said to be ‘secure’ if it is available for use when needed and the data made
available is the data that was stored originally. The security of data has been breached if the
data has been lost or corrupted.
It should be clear that data security is a prerequisite for ensuring data integrity and data
privacy. However, by itself it cannot guarantee either.
One of the requirements for protection of data is the security of the system used to store the
data. However, system security is not needed just to protect data. There are two primary
aims of system security measures. The first is to ensure system functionality. The second is
to ensure that only authorised users have access to the system.
The threats to the security of a system can be categorised as being one of the following
types:
• internal mismanagement
• natural disasters
Continuity of operation is vital for large computer installations that are an integral part of the
day-to-day operations of an organisation. Measures are needed to ensure that the system
remains functional whatever event occurs or, if there has to be a system shut-down, at the
very least to guarantee resumption of service within a very short time. Such measures come
under the general heading of disaster recovery contingency planning. The contingency plan
should be based on a risk assessment. The plan will have provision for an alternative system
to be brought into action. If an organisation has a full system always ready to replace the
normally operational one, it is referred to as a ‘hot site’. By definition such a system has to be
remote from the original system to allow recovery from natural disasters such as earthquake
or flood.
A special case of system vulnerability arises when there is a major update of hardware and/
or software. Traditionally, organisations had the luxury of installing and testing a new system
over a weekend when no service was being provided. In the modern era, globally available
systems are the norm: a company is never closed for business. As a result, organisations
may need to have the original system and its replacement running in parallel for a period to
ensure continuity of service.
Discussion Point:
Major failings of large computer systems are well documented. You could carry out research
to find some examples. Find an example of where the crisis was caused by technology failure
and a different example where some natural disaster was the cause.
Even if a PC is used by only one person there should be a user account set up. User accounts
are, of course, essential for a multi-user (timesharing) system. The main security feature
of a user account is the authentication of the user. The normal method is to associate a
password with each account. In order for this to be effective the password needs a large
number of characters including a variety of those provided in the ASCII scheme.
KEY TERMS
TASK 8.01
1 Create an example of a secure password using eight characters (but not one you are going
to use).
2 Assuming that each character is taken from the ASCII set of graphic characters, how many
different possible passwords could be defined by eight characters?
3 Do you think this is a sufficient number of characters to assume that the password would
not be encountered by someone trying all possible passwords in turn to access the system?
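The arithmetic behind parts 2 and 3 can be sketched in Python. This is only a sketch under stated assumptions: the function names are illustrative, the alphabet size of 95 assumes that the space character is counted among the ASCII graphic characters (use 94 if it is not), and the guess rate is an invented figure rather than a measured one.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def password_space(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def years_to_try_all(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time needed to try every password once at a constant guess rate."""
    return password_space(alphabet_size, length) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 95 characters and one billion guesses per second are assumptions.
    print(f"{password_space(95, 8):,} possible passwords")
    print(f"{years_to_try_all(95, 8, 1e9):.2f} years to try them all")
```

Changing the length argument shows how quickly the number of possibilities grows with each extra character.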
General good practice that helps to keep a personal computer secure includes not leaving
the computer switched on when unattended, not allowing someone else to observe you
accessing the computer and not writing down details of how you access it.
A computer system is not only accessed by users logging in. One potential problem arises
from users attaching portable storage devices which can contain a virus. The safest practice
is for an organisation to have a policy banning the use of such devices. Unfortunately this is
not possible if normal business processes require portability of data.
The threat that is virtually unavoidable arises because of the connection of an organisation’s
systems to the Internet. The major potential problem is that transmissions into the system
from the Internet may contain malicious software. However, a further consequence of
Internet connection is that sensitive data from the system might be exported out to some
other system.
The primary defence to such problems is to install a firewall. Ideally a firewall will be a
hardware device that acts like a security gate at an international airport. Nothing is allowed
through without it being inspected. Alternatively, a firewall might be implemented as
software. The transmission must then enter the system but it can be inspected immediately.
The action of a firewall might be to concentrate solely on the addresses identified in any
transmission. However, in addition, a firewall might examine the data within the transmission
to check for anything inappropriate.
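The two kinds of inspection described above can be sketched as follows. This is an illustration only, not how any particular firewall product works: the blocked network, the blocked byte pattern and all function names are hypothetical.

```python
import ipaddress

# Hypothetical rule set: networks whose transmissions are refused.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Hypothetical byte patterns treated as inappropriate content.
BLOCKED_PATTERNS = [b"malware-signature"]


def address_allowed(source: str) -> bool:
    """Inspect only the source address of a transmission."""
    address = ipaddress.ip_address(source)
    return not any(address in network for network in BLOCKED_NETWORKS)


def content_allowed(payload: bytes) -> bool:
    """Inspect the data carried within the transmission."""
    return not any(pattern in payload for pattern in BLOCKED_PATTERNS)


def firewall_accepts(source: str, payload: bytes) -> bool:
    """Accept a transmission only if both inspections pass."""
    return address_allowed(source) and content_allowed(payload)
```

A firewall that concentrates solely on addresses would use only the first check; the second check corresponds to examining the data itself.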
Security measures restricting access to a system do not guarantee success in removing all
threats. It is therefore necessary to have, in addition, programs running on a system to check
for problems. Options for this are:
• a virus checker which carries out regular system scans to detect any viruses and remove
them or deactivate them
• an intrusion detection system that will take as input an audit record of system use and
look for anomalous use.
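As a rough illustration of the second option, the sketch below scans an audit record for one simple kind of anomalous use: repeated failed logins. The record format (a list of user and event pairs), the event name and the threshold are all assumptions made for this example; a real intrusion detection system would look at far more than this.

```python
from collections import Counter


def suspicious_users(audit_records, max_failures=3):
    """Return users with more failed logins than `max_failures`.

    `audit_records` is assumed to be an iterable of (user, event) pairs.
    """
    failures = Counter(
        user for user, event in audit_records if event == "login_failed"
    )
    return sorted(user for user, count in failures.items() if count > max_failures)
```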
It hardly needs saying that individuals intent on causing damage to systems are using
methods that are becoming ever more sophisticated. The defence methods have to be
improved continually to counter these threats.
There are a number of scenarios which require security methods for protecting data. The
three discussed here are data loss, access to data and protection of data content.
In addition to problems arising from malicious activity there are a variety of reasons for
accidental loss of data:
For maximum security the backup disks or tapes are stored away from the system in a
fire-proof and flood-proof location.
This worked well when an incremental backup was done overnight with the full backup
handled at the weekend. With systems running 24/7 and therefore with data potentially
changing at any time, such a simple approach to backup will leave data in an inconsistent
state. One solution is to have a backup program that effectively freezes the file store
while data is being copied but also records elsewhere within the system changes that are
happening due to ongoing system use. The changes can then be made to the system files
when the backup copy has been stored.
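The idea of freezing the file store while recording changes elsewhere can be sketched with a small in-memory model. This is a simplified illustration: the class and method names are hypothetical, files are modelled as dictionary entries, and ongoing system use is simulated by a list of changes passed to the backup method.

```python
class FileStore:
    """Toy file store that stays consistent while a backup is taken."""

    def __init__(self, files):
        self.files = dict(files)
        self.frozen = False
        self.pending = []  # changes recorded while the store is frozen

    def write(self, name, data):
        if self.frozen:
            self.pending.append((name, data))  # record now, apply later
        else:
            self.files[name] = data

    def backup(self, changes_during_copy=()):
        self.frozen = True
        snapshot = dict(self.files)  # the consistent backup copy
        for name, data in changes_during_copy:  # simulated ongoing use
            self.write(name, data)
        self.frozen = False
        for name, data in self.pending:  # apply the recorded changes
            self.files[name] = data
        self.pending.clear()
        return snapshot
```

The snapshot reflects the state at the moment the backup started, while the live store ends up with every change that arrived during copying.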
If a user has logged in they have been authorised to use the computer system but not
necessarily all of it. In particular, the system administrator may recognise different categories
of user with different needs with respect to the data they are allowed to see and use. The
typical trivial example usually quoted is that one employee should be able to use the system
to look up another employee’s internal phone number. This should not allow the employee at
the same time to check the salary paid to the other employee.
The solution is to have an authorisation policy which in general gives different access
rights to different files for different individuals. For a particular file, a particular individual
might have no access at all or possibly read access but not write access. In another case, an
individual might have read and append access but not unrestricted write access.
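An authorisation policy of this kind can be sketched as a table of access rights. The user names, file names and the `is_permitted` function are hypothetical, chosen to echo the phone number and salary example; real operating systems and database systems implement access rights in their own ways.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    APPEND = 2
    WRITE = 4


# Hypothetical policy: (user, file) -> rights granted.
POLICY = {
    ("clerk", "phone_directory"): Access.READ,
    ("clerk", "timesheet_log"): Access.READ | Access.APPEND,
    ("payroll_officer", "salaries"): Access.READ | Access.WRITE,
}


def is_permitted(user: str, file: str, requested: Access) -> bool:
    """True only if every requested right has been granted."""
    granted = POLICY.get((user, file), Access.NONE)
    return (granted & requested) == requested
```

Any user and file pair that is missing from the table gets no access at all, which is the safer default.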
KEY TERMS
Even with appropriate security measures in place it can happen that there is unauthorised
access to a system or interception of data transmission. This can be made a futile activity
for the perpetrator if the data cannot be read. Data can be encrypted to ensure this. Some
details of encryption methods are discussed in Chapter 21 (Section 21.01).
Data integrity can never be absolutely guaranteed but the chances are improved if
appropriate measures are taken when data originally enters a system or when it is
transmitted from one system to another.
The term validation is a somewhat misleading one. It seems to imply that data is accurate if
it has been validated. This is far from the truth. If entry of a name is expected but the wrong
name is entered, it will be recognised as a name and therefore accepted as valid. Validation
can only prevent incorrect data if there is an attempt to input data that is of the wrong type,
in the wrong format or out of range.
Data validation is implemented by software associated with a data entry interface. There are
a number of different types of check that can be made. Typical examples are:
• a presence check to ensure that an entry field is not left blank
• a range check, for example the month in a date must not exceed 12
• a type check, for example only a numeric value for the month in a date.
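The three checks listed above might be combined for a month field as in the following sketch. The function name and the error messages are illustrative assumptions. Note that an entry of 3 passes every check even if the correct month was 4, which is exactly the limitation of validation described earlier.

```python
def validate_month(entry: str) -> list:
    """Return a list of validation errors; an empty list means 'valid'."""
    text = entry.strip()
    if text == "":
        return ["presence check failed: the field is blank"]
    if not text.isdecimal():
        return ["type check failed: the month must be numeric"]
    month = int(text)
    if not 1 <= month <= 12:
        return ["range check failed: the month must be from 1 to 12"]
    return []
```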
Verification of data means confirming what has been entered. The most common example
is when a user is asked to supply a new password. There will always be a request for the
password to be re-entered. Clearly, if the user entered a password but did not enter it as
intended, subsequent attempts at access would fail. Verification is usually an effective
process but in general it does not ensure data accuracy because the wrong data could be
entered initially and in the re-entry.
Validation: a check that data entered is of the correct type and format; it does not guarantee
that data is accurate
It is possible for data to be corrupted during transmission. Typically this applies at the bit
level with an individual bit being flipped from 1 to 0 or vice versa. Verification techniques
need to check on some property associated with the bit pattern.
The simplest approach is to use a simple one-bit parity check. This is particularly easy to
implement if data is transferred in bytes using a seven-bit code. Either even or odd parity can
be implemented in the eighth bit of the byte. Assuming even parity, the procedure is:
1 At the transmitting end, the number of 1s in the seven-bit code is counted.
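A minimal sketch of a one-bit even parity check is given below. It assumes the usual arrangement in which the transmitter sets the eighth bit so that the whole byte holds an even number of 1s and the receiver recounts the 1s; the function names are illustrative.

```python
def add_even_parity(code: int) -> int:
    """Transmitting end: place an even-parity bit in bit 7 of a seven-bit code."""
    if not 0 <= code < 128:
        raise ValueError("expected a seven-bit code")
    ones = bin(code).count("1")
    parity_bit = ones % 2  # 1 only when the count of 1s is odd
    return (parity_bit << 7) | code


def has_even_parity(byte: int) -> bool:
    """Receiving end: the byte is accepted only if its count of 1s is even."""
    return bin(byte).count("1") % 2 == 0
```

If two bits are flipped in the same byte the count of 1s is even again, so a one-bit parity check cannot detect that case.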
An alternative approach is to use the checksum method. In this case at the transmitting end
a block is defined as a number of bytes. Then, irrespective of what the bytes represent, the
bits in each byte are interpreted as a binary number. The sum of these binary numbers in a
block is calculated and supplied as a checksum value in the transmission. This is repeated
for each block. The receiver does the same calculation and checks the summation value with
the checksum value transmitted for each block in turn. Once again an error can be detected
but its position in the transmission cannot be determined.
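The checksum method described above can be sketched as follows. The block size of four bytes and the function names are assumptions for illustration. Practical schemes usually reduce the sum to a fixed number of bits, which this sketch does not do.

```python
def checksum(block: bytes) -> int:
    """Sum of the bytes in a block, each interpreted as a binary number."""
    return sum(block)


def transmit(data: bytes, block_size: int = 4):
    """Transmitting end: pair each block with its checksum value."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(block, checksum(block)) for block in blocks]


def blocks_with_errors(received):
    """Receiving end: indexes of blocks whose recalculated sum disagrees."""
    return [
        index
        for index, (block, value) in enumerate(received)
        if checksum(block) != value
    ]
```

The receiver learns which block is faulty but not which bit within it, so the usual response is to request retransmission of that block.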
For a method to detect the exact position of an error and therefore be able to correct an
error it has to be considerably more complex. A simple approach to this is the parity block
check method. Like the checksum method this is a longitudinal parity check; it is used to
check a serial sequence of binary digits contained in a number of bytes.
At the transmitting end, a program reads a group of seven bytes as illustrated in Figure
8.01. The data is represented by seven bits for each byte. The most significant bit in each
byte, bit 7, is undefined so we have left it blank.
[Figure 8.01: seven bytes containing seven-bit codes, with bit 7 left blank]
The parity bit is set for each of the bytes, as in Figure 8.02. The most significant bit is set to
achieve even parity.
[Figure 8.02: the seven bytes with a parity bit added to each seven-bit code]
An additional byte is then created and each bit is set as a parity bit for the bits at that bit
position. This includes counting the parity bits in the seven bytes containing data. This is
illustrated in Figure 8.03.
[Figure 8.03: the seven bytes with parity bits, followed by the parity byte]
At the receiving end, a program takes the eight bytes as input and checks the parity sums
for the individual bytes and for the bit positions.
Note that the method is handling a serial transmission so it includes longitudinal checking
but the actual checking algorithm is working on a matrix of bit values. If there is just one
error in the seven bytes this method will allow the program at the receiving end to identify
the position of the error. It can therefore correct the error so the transmission can be
accepted.
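The parity block check can be sketched in Python as shown below. Even parity is assumed throughout, the function names are illustrative, and the seven codes used to try it out should be invented values rather than those in the figures.

```python
def parity_of(value: int) -> int:
    """1 if the value holds an odd number of 1s, otherwise 0."""
    return bin(value).count("1") % 2


def build_block(codes):
    """Transmitting end: add a parity bit to each code, then a parity byte."""
    data = [(parity_of(code) << 7) | code for code in codes]
    parity_byte = 0
    for byte in data:
        parity_byte ^= byte  # XOR gives even parity at every bit position
    return data + [parity_byte]


def locate_single_error(block):
    """Receiving end: return (byte index, bit position) of one flipped bit.

    Returns None when every parity sum is even.
    """
    bad_rows = [index for index, byte in enumerate(block) if parity_of(byte)]
    column_sums = 0
    for byte in block:
        column_sums ^= byte
    bad_columns = [bit for bit in range(8) if (column_sums >> bit) & 1]
    if not bad_rows and not bad_columns:
        return None
    if len(bad_rows) == 1 and len(bad_columns) == 1:
        return bad_rows[0], bad_columns[0]
    raise ValueError("more than one error: the block cannot be corrected")


def correct(block):
    """Flip the faulty bit back if exactly one error is found."""
    position = locate_single_error(block)
    fixed = list(block)
    if position is not None:
        row, bit = position
        fixed[row] ^= 1 << bit
    return fixed
```

The faulty bit sits where the one byte with odd parity crosses the one bit position with odd parity, which is why a single error can be corrected rather than merely detected.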
Question 8.01
1 Assume that the seven bytes shown in Figure 8.04 contain data. The most significant bit
is undefined because a seven-bit ASCII code is being used to represent character data.
Choose a parity and create the appropriate parity bit for each byte, then create the eighth
byte that would be used for transmission in a parity block check method.
01001000
01000101
01110010
01100011
00101100
01010101
00110010
2 The eight bytes shown in Figure 8.05 have been received in a transmission using the parity
block method. The first seven bytes contain the data and the last byte contains the parity
check bits.
01001000
11000101
11110001
01100011
01001010
01010101
01110010
01110010
• Important considerations for the storage of data are: data integrity, data privacy and data
security.
• Data protection laws relate to data privacy.
• Security methods for data include backup procedures, user authorisation and access
control.
• Data entry to a system should be subject to data validation and data verification.
• Verification for data transmission may be carried out using: a parity check, a checksum or a
parity block check method.
Exam-style Questions
i Identify the missing word in the sentence ‘Concerns about the integrity of data are concerns
about its ______.’
ii Validation and verification are techniques that help to ensure data integrity when data is
entered into a system.
Explain the difference between validation and verification.
iv Even after validation has been correctly applied data may lack integrity when it comes to
be used. Explain why that might happen.
i Identify three reasons why data might not be available when a user needs it.
Describe two measures that could be taken to ensure security of the system.
3 a When data is transmitted measures need to be applied to check whether the data
i If data consists of seven-bit codes transmitted in bytes, describe how a simple parity check
system would be used. Your account should include a description of what happens at the
transmitting end and what happens at the receiving end.
b For either of these two methods there are limitations as to what can be achieved by them.
The following diagram represents eight bytes received where the parity block method has
been applied at the transmitting end. The first seven bytes contain the data and the last byte
contains parity bits.
[Diagram: eight received bytes, labelled Byte 1 to Byte 8; the bit values are not recoverable]
Identify the problem with this received data and what would be done with it by the program
used by the receiver.
Chapter 9
Ethics and Ownership
Learning objectives
9.01 Ethics
You can find a number of definitions of what we might mean when we talk about ‘ethics’. The
following three sentences are representative:
• Ethics are the rules of conduct recognised in a particular profession or area of human life.
For present purposes we can ignore the first of these definitions. The third definition is the
focus of this chapter. However, the rules of conduct must inevitably reflect, at least in part,
the moral principles that are the foundation of the second definition. The following are some
observations that come to mind when considering moral principles.
Moral principles concern right or wrong. The concept of virtue is often linked to what
is considered to be right. What is right and wrong might be considered from one of the
following viewpoints: philosophical, religious, legal or pragmatic.
Philosophical debate has been going on for well over 2000 years. Early thinkers frequently
quoted in this context are Aristotle and Confucius but there are many more. Religions have
sometimes incorporated philosophies already existing or have introduced their own. Laws
should reflect what is right and wrong. Pragmatism could be defined as applying common
sense.
This chapter is not an appropriate place to discuss religious beliefs other than to make
the obvious statement that religious beliefs do have to be considered in the working
environment. Legal issues clearly impact on working practices but they are rarely the primary
focus in rules of conduct. What remains as the foundation for rules of conduct are the
philosophical views of right and wrong and the pragmatic views of what is common sense.
These will constitute a frame of reference for what follows in this chapter.
The Association for Computing Machinery (ACM) and the Institute of Electrical and
Electronics Engineers (IEEE) are both based in the USA but have a global perspective and
global influence. It is therefore appropriate to consider the code of ethics that they have
proposed but this does not signify that codes of practice published in other countries are not
important.
In presenting the code, the authors make it clear that the code in no way represents
a look-up table that will prescribe an action to be taken given a defined circumstance.
They stress that the public interest is the central focus for the code. The code presents a
set of fundamental principles. They advocate that a professional should make an ethical
judgement based on thoughtful consideration of these fundamental principles.
The code defines eight principles. For each principle there is a one-sentence definition in the
preamble. In the full version of the code, each principle is expanded into clauses. Each clause
refers to a specific aspect that should be considered in the context of that principle. This is a
form of checklist that gives a framework for an ethical judgement.
1 PUBLIC - Software engineers shall act consistently with the public interest.
2 CLIENT AND EMPLOYER - Software engineers shall act in a manner that is in the best
interests of their client and employer consistent with the public interest.
3 PRODUCT - Software engineers shall ensure that their products and related modifications
meet the highest professional standards possible.
6 PROFESSION - Software engineers shall advance the integrity and reputation of the
profession consistent with the public interest.
8 SELF - Software engineers shall participate in lifelong learning regarding the practice of
their profession and shall promote an ethical approach to the practice of the profession.
In total there are 80 clauses for these eight principles (numbered from 1.01 through to 8.09).
There is little to be gained from including all of them in this book. However, you should have
a copy readily available when you are studying this chapter (see https://www.acm.org/about/se-code).
Examination of some of the clauses soon makes it clear that many do not contain specific
reference to software engineering but rather, relate to proper behaviour for any group of
professionals. This can be illustrated by the following examples:
2.03 Use the property of a client or employer only in ways properly authorized, and with the
client’s or employer’s knowledge and consent.
5.04 Assign work only after taking into account appropriate contributions of education and
experience tempered with a desire to further that education and experience.
5.05 Ensure realistic quantitative estimates of cost, scheduling, personnel, quality and
outcomes on any project on which they work or propose to work, and provide an
uncertainty assessment of these estimates.
6.06 Obey all laws governing their work, unless, in exceptional circumstances, such
compliance is inconsistent with the public interest.
Clauses 5.04 and 6.06 illustrate a general tendency for the clauses to be more wordy than
they might have been because many of them have a qualifier. The same qualifier appears
more than once. Clause 5.05 is somewhat unusual with regard to the amount of detail. You
would expect a mention of realistic quantitative estimates but probably not the insistence on
an uncertainty assessment.
Discussion Point:
Clause 6.06 advocates law-breaking to serve the public interest. Can you think of
circumstances when you could agree that such action would be ethical? You might wish to
consider ‘whistle-blowing’.
In a real-life scenario there might be many individual clauses that should be considered
when a judgement is to be made. For example, let’s consider the following scenario.
You are working on a software engineering project. One day the project manager states
that the project is running behind schedule. As a result, the time allocated for testing of
the software will be limited to one week rather than the one month that was stated in the
project plan.
1 You would rule out any immediate need to consider public interest.
2 You would identify the primary cause of concern as being directly addressed by
clause 3.10: Ensure adequate testing, debugging, and review of software and related
documents on which they work.
3 You would identify the secondary cause of concern as being one of poor management
with clauses 5.01 and 5.11 being the most relevant: Ensure good management for any
project on which they work, including effective procedures for promotion of quality and
reduction of risk. Not ask a software engineer to do anything inconsistent with this Code.
4 You would now consider what action to take and would refer to clauses 6.11, 6.12 and
6.13: Recognize that violations of this Code are inconsistent with being a professional
software engineer. Express concerns to the people involved when significant violations
of this Code are detected unless this is impossible, counter-productive, or dangerous.
Report significant violations of this Code to appropriate authorities when it is clear that
consultation with people involved in these significant violations is impossible,
counter-productive or dangerous.
Question 9.01
There are several other clauses that might be considered as relevant. Have a look at clauses
3.02, 3.05 and 7.01. Do you consider that any of these offer anything new in helping to judge
what should be done?
Discussion Point:
Search the clauses for all eight principles and identify the ones that mention documentation.
Why is documentation mentioned so many times?
What has been considered so far relates directly to professional working practices and
therefore revolves around the third definition of ethics presented in Section 9.01. When the
question of public good arises, consideration has to relate to the second definition as well. In
different parts of the code there is reference to:
• public concern.
There is no further indication of how these should be interpreted. It will be helpful to consider
some individual cases to illustrate what might be considered.
Fortunately, there are very few examples which have involved loss of life and certainly
none where large numbers of deaths were caused. However, there have been a number of
incidents where extremely large sums of money were wasted because of rather simplistic
errors.
The first example that could be mentioned is the Ariane 5 rocket which exploded 40 seconds
after blast-off in 1996. To the detriment of the public good, approximately 500 million dollars
were spent for no benefit at all. The problem was caused by a line of code that tried to
convert a 64-bit floating point number into a 16-bit integer. The resulting overflow crashed
the program and as a result also the rocket.
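The kind of conversion failure described here can be reproduced in miniature. The sketch below is not the flight software, which was not written in Python: it simply uses the standard `struct` module to force a value into a 16-bit signed integer, and the function names and the fallback value are illustrative assumptions.

```python
import struct


def float64_to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer; fails if the value does not fit."""
    packed = struct.pack("<h", int(value))  # raises struct.error when out of range
    return struct.unpack("<h", packed)[0]


def guarded_float64_to_int16(value: float, fallback: int) -> int:
    """Same conversion, but the out-of-range case is handled explicitly."""
    try:
        return float64_to_int16(value)
    except struct.error:
        return fallback


if __name__ == "__main__":
    print(float64_to_int16(1234.9))
    print(guarded_float64_to_int16(40000.0, fallback=32767))
```

Whether a fallback value is acceptable depends entirely on what the number is used for; the point is only that the out-of-range case has to be anticipated and tested.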
The second example also relates to space exploration. The NASA Mars Climate Orbiter
project centred on a space probe that was due to orbit Mars to study the climate. The probe
got to Mars but unfortunately failed to get into orbit. The cause of the problem was that all
of the software was supposed to use the SI system of units for all calculations. One group of
software engineers used the Imperial system of units. This mismatch only caused a problem
at the stage when the calculations concerned with achieving orbit around Mars were
executed. This time the loss to the public purse was a mere 125 million dollars.
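One defensive technique against this kind of mismatch is to make the unit part of the data type so that every value is converted to SI as it enters a calculation. The sketch below is an assumption-laden illustration: the text does not say which quantity was affected, so an impulse value (force multiplied by time) is used as the example, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Standard conversion factor: newtons in one pound-force.
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse that is always stored in SI units (newton seconds)."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * NEWTONS_PER_POUND_FORCE)


def total_impulse(readings) -> Impulse:
    """Add readings safely, because each one is already held in SI units."""
    return Impulse(sum(reading.newton_seconds for reading in readings))
```

A bare number passed between two teams carries no record of its unit; a typed value like this forces the supplier to state it.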
These examples can be said to illustrate the public interest in successful software
engineering. There is a strong argument that the correct application of the code of ethics
with respect to specification and testing of software could have saved a lot of money.
A different type of disaster is the system that never gets built. In 2011 the UK government
scrapped the National Programme for IT in the NHS (National Health Service), which had
been commissioned in 2002. The project failed to produce a workable system. The estimated
amount spent on the program was 12 billion pounds. The initial estimated cost was less than
three billion pounds. In examples like this the software engineers are not to blame, but if
correctly applied, the part of the code of ethics specifically targeted at project management
would not have allowed this type of fiasco to occur.
In the three examples outlined above the public concern was solely related to the costs
associated with a failed project. There was no public concern relating to the ethics of the
endeavour itself. In contrast there are many areas associated with computer-based systems
where there is public concern about the nature of the endeavour or at least about what it has
led to. The following examples can be considered in this context:
• companies providing systems that do not guarantee security against unauthorised access
• organisations that try to conceal information about a security breach that has occurred in
their systems
• private data transmitted by individuals to other individuals being stored and made
available to security services
• search engines providing search results with no concern about the quality of the content.
There is by no means a consistent public attitude to concerns like this. This makes it difficult
for an individual software engineer to make a judgement with respect to public good. Even if
the judgement is that a company is not acting in the public good it will always be difficult for
an individual to exert any influence. There are recent examples where individuals have taken
action which has resulted in their life being severely affected.
Discussion Point:
This section has deliberately been presented in generalisations. You should carry out a
search for some individual examples and then consider actions that could be taken and
justified as being for the public good.
An organisation can claim copyright for a published work if it is created by one or more
individuals that work for the organisation. Copyright cannot apply to an idea and it cannot
apply to a component of a published work.
KEY TERMS
• a literary work
• a musical composition
• a film
• a music recording
• a radio or TV broadcast
• a work of art
• a computer program.
The justification for the existence of copyright has two components. The first is that the
creation takes time and effort and requires original thinking. There should, therefore, be
opportunity for the copyright holder to be rewarded financially for this endeavour. The
second is that it is unfair for some other individual or organisation to reproduce the work and
to make money from it without any compensation to the original creator.
As with the case of data protection discussed in Chapter 8 (Section 8.01), there is a need
for legislation to try to deter abuses of copyright. The similarity continues in that legislation
cannot ensure that no abuses occur. Different countries have different details in their
legislation but there is an international agreement that copyright laws cannot be evaded by
reproducing the work in a different country from where the work was created.
• an agreed method for indicating the copyright, for example the use of the © symbol.
When copyright is in place there will be implications for how the work can be used. The
copyright owner can include a statement concerning how the work might be used. For
instance, the ACM has the following statement relating to the code of ethics discussed in
Section 9.02:
This Code may be published without permission as long as it is not changed in any way and it
carries the copyright notice. Copyright © 1999 by the Association for Computing Machinery,
Inc. and the Institute for Electrical and Electronics Engineers, Inc.
This is one of several possible variations referring to permissions that are granted when the
work has not been sold. If someone has bought a copy of a copyrighted product there is no
restriction on copies being made provided that these are solely for the use of the individual.
A general regulation relates to books in a library, where a library user can photocopy part of a
book.
Before the Internet came to be a dominant feature of people’s lives, breaches of copyright
were routinely happening in two ways. Individuals with a music system that included a tape
cassette recorder could record a radio broadcast. It also allowed a copy to be made of a
friend’s vinyl record. Individuals also often had unrestricted access to a photocopier in their
place of work and could copy printed material.
In the modern world, the cinema, broadcast and music industries are attempting to sell their
products as CDs, DVDs or Blu-ray discs. Illegal copying (known as ‘piracy’) now takes place
through using the Internet to download or stream data that was originally released for sale
on one of these optical media. As well as the change in approach, there is the significant
difference that illegal copying is now happening on a major scale and thus seriously affecting
the profitability of the creators.
The producers of the original product can use digital rights management (DRM) to attempt
to counter such activity. Originally DRM was simply used to make a CD playable on a CD
player but to prevent it being played on a computer system. Now DRM has to be used to
prevent ripping. This might involve encryption or deliberate inclusion of damaged sectors.
Unfortunately these techniques do not guarantee the prevention of piracy.
The major mechanism for piracy of media content is the widespread use of peer-to-peer file
sharing, a technology discussed in Chapter 17 (Section 17.07). As a result, there are moves
afoot to force ISPs to monitor the usage of this technology and to report usage to interested
parties. Naturally enough there is considerable resistance to such action in that it amounts to
a breach of privacy.
Commercial software
Commercial software almost always has to be paid for but there are a number of different
options that might be available:
• A company might have the option of buying a site licence which allows a defined number
of copies to be running at any one time.
For open licensing there are two major operations under way. Both are global non-profit
organisations.
The Open Source Initiative makes open source software, including the source code,
available for free. The aim is for collaborative development of software to take place. The user
of the software is free to use it, modify it, copy it or distribute it according to need.
The Free Software Foundation has similar objectives but has also incorporated what it has
called ‘copyleft’. This is the condition that if the software is modified the source code for the
modified version must be made available under the same conditions of usage.
The two organisations are not in competition but there are some subtle differences in their
philosophy. There is a different raft of products made available by each of them.
Both these organisations offer free products. Another form of free software is termed
freeware. This is software that is distributed for free but without the source code.
Discussion Point:
How often do you think that open licence software is being used? Should it be used more
often?
TASK 9.01
Carry out a search to investigate some of the software available under an open licence.
KEY TERMS
Open source software: software free with unlimited use allowed and access to source code
Shareware: software free for use for a limited period but no source code provided
Freeware: software free with unlimited use allowed but no source code provided
Shareware licensing
• There is a history of software disasters that might have been prevented if sound software
engineering practice had been employed.
• Commercial software has to be paid for; alternatives are open licence or shareware which
are free.
Exam-style Questions
1 The ACM and IEEE set out eight principles for ethics and professional practice. The
categories, with a short explanation, are shown in this diagram.
Statement 1: Team leaders should subscribe to and promote an ethical approach to the
management of software development and maintenance.
Statement 2: Software engineers shall participate in lifelong learning regarding the practice
of the profession.
Statement 3: Software and related modifications meet the highest possible standards.
a These three statements need to be added to the diagram. Circle the correct numbers on
the diagram to indicate the positions for Statement 1, Statement 2 and Statement 3.
[2]
b For each of these three workplace scenarios, unethical behaviour is demonstrated. Explain
the principle(s) which
i Workplace scenario 1
A large project is devolved to project teams, each led by a project leader. One project leader
fails to inform his manager that he has major concerns that:
• their team’s software contribution is taking much longer to write and test than anticipated
• they are consequently at risk of spending over their allocated budget. [3]
ii Workplace scenario 2
The software house is about to train a number of programmers in a new programming
language. Two employees are refusing to attend the training. [2]
iii The company is developing some monitoring software which requires sensors placed in a
nature reserve.
One employee considers the sensors will be a danger to some of the wildlife, but is told by
his manager that
Cambridge International AS and A Level Computer Science 9608 Specimen Paper 1 Q6
2 a Copyright is an important consideration when something is created.
ii When copyright is registered, some data will be recorded. Identify two examples of the
type of data that would be recorded.
[2]
iii Copyright legislation defines two conditions that will apply to the copyrighted work. Identify
one of these. [1]
iv When copyright has been established there are options for how usage will be controlled.
Give two alternatives for the instructions that could be included in the copyright statement
for the created item.
[2]
b When software is obtained there will be an associated licence defining how it can be used.
i For commercial software, describe two different ways in which the licence might be applied
and explain the benefits to the customer of one of these.
[4]
Learning objectives
Name, contact details, banking details, band name, band agent name,
band agent contact details
The intention is that this file could be used if the agency needed to contact the band member
directly or through the band’s agent. It could also be used after a gig when the band member
has to be paid. Ignoring what would constitute contact details or banking details, we can
look at a snapshot of some of the data that might be stored for the member’s given name,
the member’s family name and the band name. The file might have a thousand or more lines
of text. The following is a selection of some of the data that might be contained in various
lines in the file:
Xiangfei    Jha         ComputerKidz
Mahesh      Ravuru      ITWizz
Dylan       Stoddart
Graham      Vandana     ITWizz
Vandana     Graham      ITWizz
Mahesh      Ravuru      ITWizz
Precious    Olsen       ComputerKidz
Precious    Olsen       ITWizz
It is clear that there are problems with this data. It would appear that when the data for
Vandana Graham was first entered her names were inserted in the wrong order. A later
correct entry was made without deletion of the original incorrect data. This type of problem
is not unique to a file-based system. There is no validation technique that could detect the
original error. By contrast, validation should have led to the correction of the missing band
name for Dylan Stoddart. The Precious Olsen data are examples of duplication of data and
inconsistent data.
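The checks described above can be sketched in Python. This is only an illustration: the list
records and the function check_records are hypothetical names, and the data is the snapshot
shown earlier. Note that the sketch, like any validation routine, has no way of detecting that
the names for Vandana Graham were entered in the wrong order.

```python
# Hypothetical snapshot of (given name, family name, band name) records
# as they might be read from the agency's text file.
records = [
    ("Xiangfei", "Jha", "ComputerKidz"),
    ("Mahesh", "Ravuru", "ITWizz"),
    ("Dylan", "Stoddart", ""),          # band name missing
    ("Graham", "Vandana", "ITWizz"),    # names in the wrong order
    ("Vandana", "Graham", "ITWizz"),
    ("Mahesh", "Ravuru", "ITWizz"),     # exact duplicate
    ("Precious", "Olsen", "ComputerKidz"),
    ("Precious", "Olsen", "ITWizz"),    # same person, different band
]


def check_records(rows):
    """Return records with missing data, duplicated records and inconsistent names."""
    missing = [row for row in rows if not all(field.strip() for field in row)]

    seen = set()
    duplicates = []
    for row in rows:
        if row in seen:
            duplicates.append(row)
        seen.add(row)

    # Collect every band name recorded against each member name.
    bands_by_member = {}
    for given, family, band in rows:
        if band:
            bands_by_member.setdefault((given, family), set()).add(band)
    inconsistent = sorted(
        name for name, bands in bands_by_member.items() if len(bands) > 1
    )
    return missing, duplicates, inconsistent


missing, duplicates, inconsistent = check_records(records)
print("Missing data:", missing)
print("Duplicates:", duplicates)
print("Inconsistent band:", inconsistent)
```

In a file-based system every program that writes to the file would have to include checks of
this kind; nothing in the file itself enforces them.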
There is also possibly an error that is not evident from looking at the file contents. A band
name could be entered here when that band doesn’t exist. This shows how a file-based
approach can lead to data integrity problems in an individual file. The reason is the lack of
in-built control when data is entered. The database approach can prevent such problems or,
at least, minimise the chances of them happening.
A different problem is a lack of data privacy. The file above was designed so that the finance
section could find the banking details and the recruitment section could find contact details.
The problem is that there cannot be any control of access to part of a file so staff in the
recruitment section would be able to access the banking details of band members. Data
privacy would be properly handled by a database system.
Mindful of this privacy problem the agency decides to store data in different files for different
departments of the organisation. Table 10.01 summarises the main data to be stored in each
department’s file.
Department     Main data stored
Contract       Member names, Band name, Gig details
Finance        Member names, Bank details, Gig details
Publicity      Band name, Gig details
Recruitment    Member names, Band name, Agent details
There is now data duplication across the files. This is commonly referred to as data
redundancy, which doesn’t mean that the data is no longer of use but rather that once data
has been stored there is no need for it to be stored again. This can lead to data inconsistency
because of errors in the original entry or errors in subsequent editing. This is a different cause
of data lacking integrity. One of the primary aims of the database approach is the elimination
of data redundancy.
KEY TERMS
The above account has focussed on the problems associated with data storage in files. We
now need to consider the problems that might occur when programs access the files.
Traditionally a programmer wrote a program and at the same time defined the data files that
the program would need. For the agency each department would have its own programs
which would access the department’s data files. When a programmer creates a program for a
department the programmer has to know how the data is organised in these files, for example,
that the fourth item on a line in the file is a band name. This is an example of ‘data
dependency’.
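A minimal Python sketch can show why this matters. The comma-separated layout, the sample
values and the function band_name_by_position are assumptions made for illustration; the only
detail taken from the text is that the band name is the fourth item on a line.

```python
# One line of a hypothetical department file; the field order is an assumption.
line = "Xiangfei,Jha,xj@example.com,ComputerKidz"


def band_name_by_position(text_line):
    """Relies on the band name being the fourth item on the line."""
    return text_line.split(",")[3]


print(band_name_by_position(line))

# If the file is later re-written with an extra item before the band name,
# the unchanged program reads the wrong item without reporting any error.
new_line = "Xiangfei,Jha,xj@example.com,guitar,ComputerKidz"
print(band_name_by_position(new_line))
```

The program contains knowledge of the file layout, so any change to the layout forces a change
to the program.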
It is very likely that the files used by one department might have some data which is the same
as the data in the files of other departments. However, in the scenario presented above there
is no plan for file sharing. A further issue is that the agency might decide that there is a need
for a change in the data stored. For instance, they might see an increasing trend for bands
to perform with additional session musicians. Their data will need to be entered into some
files. This will require the existing files to be re-written. In turn, this will require the programs
to be re-written so that the new files are read correctly. In a database scenario the existing
programs could still be run even though additional data was added. The only programming
change needed would be the writing of additional programs which used this additional data.
The other aspect of data dependency is that when file structures have been defined to suit
specific programs they will not be suited to supporting new applications. The agency might feel
the need for an information system to analyse the success or otherwise of the gigs they have
organised over a number of years. Extracting the data for this from the sort of file-based system
described here would be a complex task which would take considerable time to complete.
The architecture is illustrated in Figure 10.01 in the context of a database to be set up for our
theatrical agency.
storage (the internal schema) are known only at the internal level, the lowest level in the ANSI
architecture. This is controlled by the database management system (DBMS) software.
The programmers who wrote this software are the only ones who know the structure for
the storage of the data on disk. The software will accommodate any changes that might be
needed in the storage medium.
At the next level, the conceptual level, there is a single universal view of the database. This is
controlled by the database administrator (DBA) who has access to the DBMS. In the ANSI
architecture the conceptual level has a conceptual schema describing the organisation of the
data as perceived by a user or programmer. However, this is often described as a logical schema.
At the external level there are individual user and programmer views. Each view has an
external schema describing which parts of the database are accessible. A view can support a
number of user programs. The DBA is responsible for setting up these views and for defining
the appropriate, specific access rights. The DBMS provides facilities for a programmer to
develop a user interface for a program. It also provides a query processor. The query is the
mechanism for extracting and manipulating data from the database. A programmer will
incorporate access to queries in a user interface. The other feature provided by the DBMS is
the capability for creating a report to present formatted output.
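As a rough illustration of an external view, the following Python sketch uses the sqlite3
module from the standard library. The view name RecruitmentView, the attribute names
ContactDetails and BankDetails and the sample values are all hypothetical. SQLite has no
user accounts, so the sketch only shows how a view limits what is visible; in a multi-user
DBMS the DBA would also grant access rights on the view rather than on the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Member ("
    " MemberID TEXT PRIMARY KEY,"
    " MemberGivenName TEXT,"
    " ContactDetails TEXT,"
    " BankDetails TEXT)"
)
conn.execute(
    "INSERT INTO Member VALUES ('0005', 'Xiangfei', 'xj@example.com', 'ACC-0001')"
)

# External view for the recruitment section: the banking details are left out.
conn.execute(
    "CREATE VIEW RecruitmentView AS"
    " SELECT MemberID, MemberGivenName, ContactDetails FROM Member"
)

cursor = conn.execute("SELECT * FROM RecruitmentView")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```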
KEY TERMS
Database management system (DBMS): software that controls access to data in a database
Database administrator (DBA): a person who uses the DBMS to customise the database to
suit user and programmer requirements
Discussion Point:
How many of the above concepts are recognisable in your experience of using a database?
In the relational database model each item of data is stored in a relation which is a special
type of table. The strange choice of name has its origin in a mathematical theory. A relational
database is a collection of relational tables.
When a table is created in a relational database it is first given a name and then the attributes
are named. In a database design, a table would be given a name with the attribute names
listed in brackets after the table name. For example, a database for the theatrical agency may
contain the following tables:
Member(MemberID, MemberGivenName, MemberFamilyName, BandName, ...)
Band(BandName, AgentID, ...)
The logical view of the data in these tables is given in Table 10.02 and Table 10.03. Each
attribute is associated with one column in the table and is in effect a column header. The
column itself contains attribute values.
MemberID   MemberGivenName   MemberFamilyName   BandName       ...
0005       Xiangfei          Jha                ComputerKidz
0009       Mahesh            Ravuru             ITWizz
0001       Dylan             Stoddart           ComputerKidz
0025       Vandana           Graham             ITWizz

BandName       AgentID   ...
ComputerKidz   01
ITWizz         07
Although some database products do allow a direct view of a table this is not the norm, hence
the use of the term ‘logical view’ here. If a user wishes to inspect all of the data in a table a
query should be used.
KEY TERMS
A row in a relation should be referred to as a tuple but this strict nomenclature is not
always used. Often a row is called a ‘record’ and the attribute values ‘fields’. The tuple is the
collection of data stored for one ‘instance’ of the relation. In Table 10.02, each tuple relates to
one individual band member. A fundamental principle of a relational database is that a tuple
is a set of atomic values; each attribute has one value or no value.
The most important feature of the relational database concept is the primary key. A primary
key may be a single attribute or a combination of attributes. Every table must have a primary
key and each tuple in the table must have a value for the primary key and that value must
be unique. Once a table and its attributes have been defined the next task is to choose the
primary key. In some cases there may be more than one attribute for which unique values are
guaranteed. In this case, each one is a candidate key and one will be selected as the primary
key. More often there is no candidate key and so a primary key has to be created. Table 10.02
shows an example of this with the introduction of the attribute MemberID as the primary key
(the primary key is underlined in the logical view).
The primary key ensures ‘entity integrity’. The DBMS will not allow an attempt to insert a
value for a primary key when that value already exists. Therefore each tuple must be unique.
This is one of the features of the relational model that helps to ensure data integrity. The
primary key also provides a unique reference to any attribute value that a query is selecting.
Although it is possible for a database to contain stand-alone tables it is usually true that each
table will have some relationship with another table. This relationship is implemented by
using a foreign key.
KEY TERMS
Primary key: an attribute or a combination of attributes for which there is a value in each
tuple and that value is unique
Foreign key: an attribute in one table that refers to the primary key in another table
The use of a foreign key can be discussed on the basis of the two database tables
represented in Table 10.02 and Table 10.03. When the database is being created, the Band
table is created first. BandName is chosen as the primary key because unique names for
bands can be guaranteed. Then the Member table is created. MemberID is defined as the
primary key and the attribute BandName is identified as a foreign key referencing the primary
key in the Band table. Once this relationship between primary and foreign keys has been
established, the DBMS will prevent any entry for BandName in the Member table being made
if the corresponding value does not exist in the Band table. This provides referential integrity
which is another reason why the relational database model helps to ensure data integrity.
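The two kinds of integrity can be tried out with a short Python sketch using the sqlite3
module from the standard library. This is an illustration only: the helper function
try_insert and the band name NoSuchBand are hypothetical, and the PRAGMA statement is
specific to SQLite, which only enforces foreign keys when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: switch on foreign key checks

# The Band table is created first, then the Member table that refers to it.
conn.execute("CREATE TABLE Band (BandName TEXT PRIMARY KEY, AgentID TEXT)")
conn.execute(
    "CREATE TABLE Member ("
    " MemberID TEXT PRIMARY KEY,"
    " MemberGivenName TEXT,"
    " MemberFamilyName TEXT,"
    " BandName TEXT REFERENCES Band(BandName))"
)
conn.execute("INSERT INTO Band VALUES ('ComputerKidz', '01')")
conn.execute("INSERT INTO Member VALUES ('0005', 'Xiangfei', 'Jha', 'ComputerKidz')")


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


# Entity integrity: the MemberID value 0005 already exists.
duplicate_key = try_insert(
    "INSERT INTO Member VALUES ('0005', 'Dylan', 'Stoddart', 'ComputerKidz')"
)
# Referential integrity: there is no band called NoSuchBand in the Band table.
unknown_band = try_insert(
    "INSERT INTO Member VALUES ('0001', 'Dylan', 'Stoddart', 'NoSuchBand')"
)
print(duplicate_key)
print(unknown_band)
```

Both attempts are refused by the DBMS itself, so no application program has to repeat the
checks.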
Question 10.01
BandName is a primary key for the Band table. Does this mean that as a foreign key in the
Member table it must have unique values? Explain your reasoning.
The top-down, stepwise refinement (see Chapter 12, Section 12.01) approach to database
design uses an entity-relationship (ER) diagram. This might be initially created and used by
a systems analyst before being passed on to the database designer. Otherwise the designer
has to create it. The term ‘relationship’ (not to be confused with a relation!) was introduced
earlier in connection with the use of a foreign key. An entity (strictly speaking an entity
type) could be a thing, a type of person, an event, a transaction or an organisation. Most
importantly, there must be a number of ‘instances’ of the entity. An entity is something that
will become a table in a relational database.
Part 1
Let’s consider a scenario for the theatrical agency which will be sufficient to model a
part of the final database they would need. The starting point for a top-down design is a
statement of the requirement:
The agency needs a database to handle bookings for bands. Each band has a number of
members. Each booking is for a venue. Each booking might be for one or more bands.
You look for the nouns. You ignore ‘agency’ because there is only the one. You choose
Booking, Band, Member and Venue. For each of these there will be more than one
instance. You are aware that each booking is for a gig at a venue but you ignore this
because you think that the Booking entity will be sufficient to hold the required data
about a gig.
This requires experience but the aim is not to define too many. You choose the following
three:
You ignore the fact that there will be, for example, a relationship between Member and
Venue because you think that this will be handled through the other relationships that
indirectly link them. You can now draw a preliminary ER diagram as shown in Figure 10.02.
[Diagram: the entities Member, Band, Booking and Venue, with a single line connecting each
related pair]
Figure 10.02 A preliminary entity-relationship diagram
Now comes the crucial stage of deciding on what are known as the ‘cardinalities’ of the
relationships. At present we have a single line connecting each pair of entities. This line
actually defines two relationships which might be described as the ‘forward’ one and
the ‘backward’ one on the diagram as drawn. However, this only becomes apparent at
the final stage of drawing the relationship. First we have to choose one of the following
descriptions for the cardinality of each relationship:
• one-to-one or 1:1
• one-to-many or 1:M
• many-to-one or M:1
• many-to-many or M:M.
This can be illustrated by considering the relationship between Member and Band. We
argue that one Member is a member of only one Band. (This needs to be confirmed as a
fact by the agency.) We then argue that one Band has more than one Member so it has
many. Therefore the relationship between Member and Band is M:1. In its simplest form,
this relationship can be drawn as shown in Figure 10.03.
[Figure 10.03: Member and Band joined by a line showing the M:1 relationship]
This can be given more detail by including the fact that a member must belong to a Band
and a Band must have more than one Member. To reflect this, the relationship can be
drawn as shown in Figure 10.04.
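[Figure 10.04: the relationship between Member and Band with minimum and maximum
cardinality symbols at each end]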
At each end of the relationship there are two symbols. One of the symbols shows the
minimum cardinality and the other the maximum cardinality. In this particular case the
minimum and maximum values just happen to be the same. However, using the diagram
to document that a Member must belong to a Band is important. It indicates that when
the database is created it must not be possible to create a new entry in the Member table
unless there is a valid entry for BandName in that table.
For the relationship between Booking and Venue we argue that one Booking is for one
Venue (there must be a venue and there cannot be more than one) and that one Venue
can be used for many Bookings so the relationship between Booking and Venue is M:1.
However, a Venue might exist that has so far never had a booking so the relationship can
be drawn as shown in Figure 10.05.
[Figure 10.05: the M:1 relationship between Booking and Venue, showing that a Venue may
have no Booking]
Finally for the relationship between Band and Booking we argue that one Booking can
be for many Bands and that one Band has many Bookings (hopefully!) so the relationship
is M:M. However, a new band might not yet have a booking. Also there might be only one
Band for a booking so the relationship can be drawn as shown in Figure 10.06.
[Figure 10.06: the M:M relationship between Band and Booking, showing that a Band may
have no Booking]
At this stage we should name each relationship. The full ER diagram for the limited
scenario that has been considered is as shown in Figure 10.07.
[Diagram: Member ‘belongs to’ Band and Band ‘has’ Member; Band ‘is booked for’ Booking and
Booking ‘is for’ Band; Booking ‘is made at’ Venue and Venue ‘is booked for’ Booking]
Figure 10.07 The ER diagram for the theatrical agency’s booking database
To illustrate how the information should be read from such a diagram we can look at the
part shown in Figure 10.08. Despite the fact that there is a many-to-many relationship,
a reading of a relationship always considers just one entity to begin the sentence. So,
reading forwards and then backwards, we say that:
[Figure 10.08: Band ‘is booked for’ Booking; Booking ‘is for’ Band]
A fully annotated ER diagram of the type developed in Section 10.04 holds all of the
information about the relationships that exist for the data that is to be stored in a system.
It can be defined as a conceptual model because it does not relate to any specific way of
implementing a system. If the system is to be implemented as a relational database the ER
diagram has to be converted to a logical model. To do this we can start with a simplified ER
diagram that just identifies cardinalities.
If a relationship is 1:M, no further refinement is needed. The relationship shows that the
entity at the many end needs to have a foreign key referencing the primary key of the entity
at the one end.
If there were a 1:1 relationship there are options for implementation. However, such
relationships are extremely rare and will not be considered further.
The problem relationship is the M:M, where a foreign key cannot be used. A foreign key
attribute can only have a single value so it cannot handle the many references required. The
solution for the M:M relationship is to create a link entity. For Band and Booking, the logical
entity model will contain the link entity shown in Figure 10.09.
[Figure 10.09: the link entity Band-Booking placed between Band and Booking]
With the link entity in the model it is now possible to have two foreign keys in the link entity;
one referencing the primary key of Band and one referencing the primary key of Booking.
Each entity in the logical ER diagram will become a table in the relational database. It is
therefore possible to choose primary keys and foreign keys for the tables. These can be
summarised in a key table. Table 10.04 shows sensible choices for the theatrical agency’s
booking database.
Table name     Primary key           Foreign key
Member         MemberID              BandName
Band           BandName
Band-Booking   BandName, BookingID   BandName, BookingID
Booking        BookingID             VenueName
Venue          VenueName
Table 10.04 A key table for the agency booking database
The decisions about the primary keys are determined by the uniqueness requirement. The
link entity cannot use either BandName or BookingID alone but the combination of the two
in a compound primary key will work.
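The behaviour of the link entity can be checked with a small Python sketch using the sqlite3
module. The booking number 2016/024 is a made-up value and the tables are cut down to their
key attributes. The table name is written in double quotes only because it contains a hyphen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: switch on foreign key checks
conn.executescript(
    """
    CREATE TABLE Band (BandName TEXT PRIMARY KEY);
    CREATE TABLE Booking (BookingID TEXT PRIMARY KEY);
    -- Link entity: two foreign keys that together form the compound primary key.
    CREATE TABLE "Band-Booking" (
        BandName  TEXT REFERENCES Band(BandName),
        BookingID TEXT REFERENCES Booking(BookingID),
        PRIMARY KEY (BandName, BookingID)
    );
    INSERT INTO Band VALUES ('ComputerKidz'), ('ITWizz');
    INSERT INTO Booking VALUES ('2016/023'), ('2016/024');
    -- One booking with two bands, and one band with two bookings.
    INSERT INTO "Band-Booking" VALUES
        ('ComputerKidz', '2016/023'),
        ('ITWizz', '2016/023'),
        ('ITWizz', '2016/024');
    """
)

link_rows = conn.execute('SELECT COUNT(*) FROM "Band-Booking"').fetchone()[0]

try:
    # The same band cannot be linked to the same booking twice.
    conn.execute("INSERT INTO \"Band-Booking\" VALUES ('ITWizz', '2016/023')")
    repeated_pair = "accepted"
except sqlite3.IntegrityError:
    repeated_pair = "rejected"

print(link_rows, repeated_pair)
```

Neither BandName nor BookingID is unique in the link table, but each combination of the two
appears only once.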
TASK 10.01
Consider the following scenario. An organisation books cruises for passengers. Each cruise
visits a number of ports. Create a conceptual ER diagram and convert it to a logical ER
diagram. Create a key table for the database that could be implemented from the design.
10.06 Normalisation
Normalisation is a design technique for constructing a set of table designs from a list of data
items. It can also be used to improve on existing table designs.
To illustrate the technique let’s consider the document shown in Figure 10.10. This is a
booking data sheet that the theatrical company might use.
[Figure 10.10: a booking data sheet. Recoverable entries: Venue: Camside, CA1; a Headlining
column; the bands ComputerKidz and ITWizz; the value 3]
The data items on this sheet (ignoring headings) can be listed as a set of attributes:
The list is put inside brackets because we are starting a process of table design. The
extra set of brackets around BandName, NumberOfMembers, Headlining is because they
represent a repeating group. If there is a repeating group, the attributes cannot sensibly
be put into one relational table. A table must have single rows and atomic attribute
values so the only possibility would be to include tuples such as those shown in Table
10.05. There is now data redundancy here with the duplication of the bookingID, venue
data and the date.
BookingID | VenueName | VenueAddress1 | VenueAddress2 | Date | BandName | NumberOfMembers | Headlining
2016/023 | CA1 | 23.06.2016 | ComputerKidz
2016/023 | Camside | CA1 | 23.06.2016 | ITWizz
The conversion to first normal form (1NF) requires splitting the data into two groups. At this
stage we represent the data as table definitions. Therefore we have to choose table names
and identify a primary key for each table. One table contains the non-repeating group
attributes, the other the repeating group attributes. For the first table a sensible design is:
The table with the repeating group is not so straightforward. It needs a compound
primary key and a foreign key to give a reference to the first table. The sensible design is:
The Booking table is automatically in 2NF; only tables with repeating group attributes
have to be converted. For conversion to second normal form (2NF), the process is
to examine each non-key attribute and ask if it is dependent on both parts of the
compound key. Any attributes that are dependent on only one of the attributes in the
compound key must be moved out into a new table. In this case, NumberOfMembers is
only dependent on BandName. In 2NF there are now three table definitions:
Note that the Booking table is unchanged from 1NF. The Band-Booking table now has
two foreign keys to provide reference to data in the other two tables. The characteristic
of a table in 2NF is that it either has a single-attribute primary key or it has a compound
primary key with every non-key attribute dependent on both components.
For conversion to third normal form (3NF) each table has to be examined to see if there
are any non-key dependencies; that means we must look for any non-key attribute that is
dependent on another non-key attribute. If there is, a new table must be defined.
Note that once again a new foreign key has been identified to keep a reference to data in the
newly created table. These four table definitions match four of the entities in the logical ER
model for which the keys were identified in Table 10.04. This will not always happen. A logical
ER diagram will describe a 2NF set of entities but not necessarily a 3NF set.
KEY TERMS
Repeating group: a set of attributes that have more than one set of values when the other
attributes each have a single value
To summarise, if a set of tables are in 3NF it can be said that each non-key attribute is
dependent on the key, the whole key and nothing but the key.
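The test applied in the conversion to 2NF, whether a non-key attribute depends on the whole
compound key or on only part of it, can be sketched in Python. The rows and the function
depends_only_on are hypothetical; the values for NumberOfMembers and the booking number
2016/024 are made up for illustration.

```python
# Hypothetical rows with the compound key (BookingID, BandName).
rows = [
    {"BookingID": "2016/023", "BandName": "ComputerKidz", "NumberOfMembers": 5},
    {"BookingID": "2016/023", "BandName": "ITWizz", "NumberOfMembers": 3},
    {"BookingID": "2016/024", "BandName": "ITWizz", "NumberOfMembers": 3},
]


def depends_only_on(table_rows, determinant, attribute):
    """True if each value of determinant always appears with the same value of attribute."""
    seen = {}
    for row in table_rows:
        key = row[determinant]
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True


# NumberOfMembers is fixed once BandName is known ...
by_band = depends_only_on(rows, "BandName", "NumberOfMembers")
# ... but it is not fixed by BookingID, the other part of the compound key.
by_booking = depends_only_on(rows, "BookingID", "NumberOfMembers")
print(by_band, by_booking)
```

A check on sample data can only show that a dependency is absent; deciding that one really
exists needs knowledge of what the data means.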
Question 10.02
In Step 2 of Worked Example 10.02, why is the Headlining attribute not placed in the Band
table?
TASK 10.02
07845 25-06-2016
Product no   Description         Quantity   Price / unit   Total
327          Inkjet cartridges   24         $30            $720
563          Laser toner                    $25            $125
                                            Total Price    $835
SQL is the programming language provided by a DBMS to support all of the operations
associated with a relational database. Even when a database package offers high-level
facilities for user interaction, they use SQL.
Data Definition Language (DDL) is the part of SQL provided for creating or altering tables.
These commands only create the structure. They do not put any data into the database.
The following are some examples of DDL that could be used in creating the database for the
theatrical agency:
These examples show that once the database has been created the tables can be created
and the attributes defined. It is possible to define a primary key and a foreign key within the
create table command but the alter table command can be used as shown (it can
also be used to add extra attributes).
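The effect of DDL commands can be tried out from Python with the sqlite3 module. This is a
sketch under stated assumptions: the data types and the attribute AgentID are illustrative
choices, the in-memory connection stands in for creating the database, and the primary key
is defined inside the create table command because SQLite does not allow one to be added
afterwards with alter table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for creating the database

# DDL: create the structure only; no data is stored yet.
conn.execute(
    "CREATE TABLE Band ("
    " BandName VARCHAR(25) PRIMARY KEY,"
    " NumberOfMembers INTEGER)"
)
# DDL: add an extra attribute to the existing table.
conn.execute("ALTER TABLE Band ADD COLUMN AgentID VARCHAR(5)")

# PRAGMA table_info is SQLite-specific; the second item in each row is the attribute name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Band)")]
row_count = conn.execute("SELECT COUNT(*) FROM Band").fetchone()[0]
print(columns, row_count)
```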
TASK 10.03
For the database defined in Worked Example 10.02, complete the DDL for creating the four
tables. Use varchar2(5) for BookingID, number(1) for NumberOfMembers, date for Date,
Data Manipulation Language (DML) is used when a database is first created, to populate the
tables with data. It can then be used for ongoing maintenance. The following code shows a
selection of the use of the commands:
UPDATE Band
The above code shows the two methods of inserting data. The first, simpler version can be
used if the order of the attributes is known. The second is the safer method: the attributes
are defined then the values are listed. The next two statements show a change of data and
the removal of data.
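The four kinds of statement just described can be run from Python using the sqlite3 module.
This is a minimal sketch, not the listing referred to above; the cut-down Band table and the
values for NumberOfMembers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Band (BandName TEXT PRIMARY KEY, NumberOfMembers INTEGER)")

# Simpler form: the values must follow the order of the attributes in the table.
conn.execute("INSERT INTO Band VALUES ('ComputerKidz', 5)")
# Safer form: the attributes are named, then the values are listed.
conn.execute("INSERT INTO Band (BandName, NumberOfMembers) VALUES ('ITWizz', 3)")

# A change of data, then the removal of data.
conn.execute("UPDATE Band SET NumberOfMembers = 4 WHERE BandName = 'ComputerKidz'")
conn.execute("DELETE FROM Band WHERE BandName = 'ITWizz'")

remaining = conn.execute("SELECT BandName, NumberOfMembers FROM Band").fetchall()
print(remaining)
```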
The main use of DML is to obtain data from a database using a query. A query always starts
with SELECT:
SELECT BandName
FROM Band
ORDER BY BandName;

SELECT BandName
FROM Band-Booking
GROUP BY BandName;
Both of these examples select data from a single table. The first produces an ordered list of
all the bands. The second produces a list of bands that have headlined a gig. The GROUP BY
restriction ensures that the band names are not repeated.
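Both kinds of query can be run against a small test database from Python using the sqlite3
module. The sample rows, the booking number 2016/024, the coding of Headlining as 'Y' or 'N'
and the WHERE condition used to pick out headlining bands are all assumptions made for this
sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Band (BandName TEXT PRIMARY KEY);
    CREATE TABLE "Band-Booking" (
        BandName TEXT, BookingID TEXT, Headlining TEXT,
        PRIMARY KEY (BandName, BookingID)
    );
    INSERT INTO Band VALUES ('ITWizz'), ('ComputerKidz');
    INSERT INTO "Band-Booking" VALUES
        ('ComputerKidz', '2016/023', 'Y'),
        ('ITWizz', '2016/023', 'N'),
        ('ComputerKidz', '2016/024', 'Y');
    """
)

# An ordered list of all the bands.
ordered = conn.execute("SELECT BandName FROM Band ORDER BY BandName").fetchall()

# Bands that have headlined; GROUP BY stops ComputerKidz being listed twice.
headliners = conn.execute(
    'SELECT BandName FROM "Band-Booking"'
    " WHERE Headlining = 'Y'"
    " GROUP BY BandName"
).fetchall()

print(ordered)
print(headliners)
```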
A query can be based on a ‘join condition’ between data in two tables. The most frequently
Note the use of the full names of attributes, which include the table name. This query will find
the venue and date of bookings for the band ComputerKidz.
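A query of this kind can be embedded in a Python program. The sketch below uses the sqlite3
module and an INNER JOIN; the cut-down tables, the venue name Riverside, the booking number
2016/024 and its date are assumptions, and the query is not necessarily written the same way
as the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Booking (BookingID TEXT PRIMARY KEY, VenueName TEXT, Date TEXT);
    CREATE TABLE "Band-Booking" (
        BandName TEXT, BookingID TEXT,
        PRIMARY KEY (BandName, BookingID)
    );
    INSERT INTO Booking VALUES ('2016/023', 'Camside', '23.06.2016');
    INSERT INTO Booking VALUES ('2016/024', 'Riverside', '30.06.2016');
    INSERT INTO "Band-Booking" VALUES ('ComputerKidz', '2016/023');
    INSERT INTO "Band-Booking" VALUES ('ITWizz', '2016/024');
    """
)

# The join condition matches the BookingID values held in the two tables.
query = (
    "SELECT Booking.VenueName, Booking.Date"
    ' FROM Booking INNER JOIN "Band-Booking"'
    ' ON Booking.BookingID = "Band-Booking".BookingID'
    ' WHERE "Band-Booking".BandName = ?'
)
# The band name is passed as a parameter instead of being pasted into the SQL text.
results = conn.execute(query, ("ComputerKidz",)).fetchall()
print(results)
```

Passing the band name as a parameter keeps the SQL text fixed, which is the usual practice
when a value comes from user input.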
Although a database can be accessed directly using SQL there is often a need to control
access to a database using a different language. This makes sense because a program can
access data in a file so why not in a database? Programming languages therefore have a
mechanism for embedding an SQL command into a program.
This code assumes that you have created a MySQL database on a server located on your own
computer.
10.08 DBMS features
There are a few important features of a DBMS which have not been mentioned. The first and
most important is the data dictionary which is part of the database that is hidden from view
from everyone except the DBA. It contains metadata about the data. This includes details
of all the definitions of tables, attributes and so on but also of how the physical storage is
organised.
There are a number of features to improve performance. Of special note is the capability
to create an index for a table. This is needed if the table contains a lot of data. An index is a
secondary table which is associated with an attribute that has unique values. The index table
contains the attribute values and pointers to the corresponding tuple in the original table.
The index can be on the primary key or on a secondary key which was a candidate key when
the choice of primary key was made. Searching an index table is much quicker than searching
the full table.
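The idea of an index can be modelled in a few lines of Python. This is a simplified picture
rather than a description of how any particular DBMS stores its indexes: the table is a list,
the index is a dictionary, and a ‘pointer’ is just a position in the list. The names
member_table, member_id_index, find_by_scan and find_by_index are hypothetical.

```python
# The "table": a list of tuples in no particular order.
member_table = [
    ("0005", "Xiangfei", "Jha"),
    ("0009", "Mahesh", "Ravuru"),
    ("0001", "Dylan", "Stoddart"),
    ("0025", "Vandana", "Graham"),
]

# The index: each unique attribute value points to the position of its tuple.
member_id_index = {row[0]: position for position, row in enumerate(member_table)}


def find_by_scan(member_id):
    """Search the full table, tuple by tuple."""
    for row in member_table:
        if row[0] == member_id:
            return row
    return None


def find_by_index(member_id):
    """Look up the pointer in the index, then go straight to the tuple."""
    position = member_id_index.get(member_id)
    return None if position is None else member_table[position]


print(find_by_index("0025"))
```

With four tuples the difference is negligible, but the scan takes longer as the table grows
while the index lookup stays roughly constant. The price is that the index has to be updated
whenever a tuple is added or removed.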
Summary
• A database offers improved methods for ensuring data integrity compared to a file-based
approach.
• A database architecture provides, for the user, a conceptual level interface to the stored
data.
• A relational database comprises tables of a special type; each table has a primary key and
may contain foreign keys.
• Normalisation is a database design method which starts with a collection of attributes and
converts them into first normal form then into second normal form and, finally, into third
normal form.
• Structured Query Language (SQL) includes data definition language (DDL) commands for
establishing a database and data manipulation language (DML) commands for creating
queries.
Exam-style Questions
1 a A relational database has been created to store data about subjects that students are
studying. The following is a selection of some data stored in one of the tables. The data
represents the student’s name, the personal tutor group, the personal tutor, the subject
studied, the level of study and the subject teacher but there is some data missing:
Xiangfei   MUB   Computing          DER
Xiangfei   MUB   Maths              BNN
Xiangfei   MUB   Physics      AS    DAB
Mahesh     BAR   History      AS    IJM
Mahesh     BAR   Geography    AS    CAB
i Define the terms used to describe the components in a relational database table using
examples from
ii If this represented all of the data, it would have been impossible to create this table.
What is it that has not been shown here and must have been defined to allow the creation as
a relational database table? Explain your answer and suggest examples of the missing data. [4]
iii Is this table in first normal form (1NF)? Explain your reason. [2]
b It has been suggested that the database design could be improved. The design suggested
contains the following two tables:
StudentSubject(StudentName, Subject, Level, SubjectTeacher)
i Identify features of this design which are characteristic of a relational database. [3]
iii Explain why the Student table is not in third normal form (3NF). [2]
A company provides catering services for clients who need special-occasion, celebratory
dinners. For each dinner, a number of dishes are to be offered. The dinner will be held at a
venue. The company will provide staff to serve the meals at the venue.
The company needs a database to store data related to this business activity.
b Identify pairs of entities where there is a direct relationship between them. [4]
c For each pair of entities, draw the relationship and justify the choice of cardinality
illustrated by the
representation. [6]
124
Colwyn Bay
North Wales
Date
Room type
Number of
rooms
Room rate
23/06/2016
Front-facing double
$80
23/06/2016
Rear-facing double
$65
24/06/2016
Front-facing double
$80
a Create an unnormalised list of attributes using the data shown in this form. Make sure that
you distinguish
b Convert the data to first normal form (1NF). Present this as designs for two tables with
keys identified. P]
c Choose the appropriate table and convert it to two tables in second normal form (2NF).
Explain your choice
of table to modify. Explain your identification of the keys for these two new tables. [5]
d Identify which part of your design is not in Third Normal Form (3NF). [2]
Chapter 11
Learning objectives
KEYTERMS
We use algorithms in everyday life. If you need to change a wheel on a car, you might need to
follow instructions (the algorithm) from a manual:
This might sound all very straightforward. However, if the instructions are not followed in the
correct logical sequence, the process might become much more difficult or even impossible.
For example, if you tried to do Step 1 after Step 3, the wheel may spin and you can’t loosen
the wheel nuts. You can’t do Step 4 before Step 3.
1 Measure the following ingredients: 200g sugar, 200g butter, 4 eggs, 200g flour, 2
teaspoons
baking powder and 2 tablespoons of milk.
2 Mix the ingredients together in a large bowl, until the consistency of the mixture is
smooth.
The recipe is an algorithm. The ingredients are the input and the cake is the output. The
process is mixing the ingredients and cooking the mixture in the oven.
Sometimes a step might need breaking down into smaller steps. For example Step 2 can be
more detailed:
2.3 Sieve the flour and baking powder and stir slowly into the egg mixture.
Sometimes there might be different steps depending on some other conditions. For
example,
consider how to get from one place to another using the map of the London Underground
system in Figure 11.01.
[Figure 11.01: map of part of the London Underground system, showing the Piccadilly, Victoria and Jubilee lines]
To travel from King’s Cross St. Pancras to Westminster, we consider two routes:
• Route A: Take the Victoria Line to Green Park (4 stations); then take the Jubilee Line to
Westminster (1 station).
• Route B: Take the Piccadilly Line to Green Park (6 stations); then take the Jubilee Line to
Westminster (1 station).
Route A looks like the best route. If there are engineering works on the Victoria Line and
trains are delayed, Route B might turn out to be the quicker route.
The directions on how to get from King’s Cross St. Pancras to Westminster can be written
as:
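IF there are engineering works on the Victoria Line
   THEN
      Take the Piccadilly Line to Green Park (6 stations)
      Take the Jubilee Line to Westminster (1 station)
   ELSE
      Take the Victoria Line to Green Park (4 stations)
      Take the Jubilee Line to Westminster (1 station)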
TASK 11.01
Many problems have more than one solution. Sometimes it is a personal preference which
solution to choose. Sometimes one solution will be better than another.
TIP
Computer scientists are interested in finding good solutions. A good solution gives the correct
results, takes up as little computer memory as possible and executes as fast as possible. The
solution should be concise, elegant and easy to understand.
KEY TERMS
Structured English: a subset of the English language that consists of command statements used to
describe an algorithm
An algorithm consists of a sequence of steps. Under certain conditions we may wish not to
perform some steps. We may wish to repeat a number of steps. In computer science, when
writing algorithms, we use four basic types of construct:
• Assignment: a value is given a name (identifier) or the value associated with a given
identifier is changed.
• Sequence: a number of steps are performed, one after the other.
• Selection: under certain conditions some steps are performed, otherwise different (or no)
steps are performed.
• Repetition: a sequence of steps is performed a number of times.
Many problems we try to solve with a computer involve data. The solution involves inputting
data to the computer, processing the data and outputting results (as shown in Figure 11.02).
We need to know the constructs so we know how detailed our design has to be.
These constructs are represented in each of the three notations as shown in Table 11.01.
In this book, algorithms and program code are typed using the courier font.
11.03 Variables
When we input data for a process, individual values need to be stored in memory. We need
to be able to refer to a specific memory location so that we can write statements of what to
do with the value stored there. We refer to these named memory locations as variables. You
can imagine these variables like boxes with name labels on them. When a value is input, it is
stored in the box with the specified name (identifier) on it.
KEY TERMS
For example, the variable used to store a count of how many guesses have been made might
be given the identifier NumberOfGuesses and the player’s name might be stored in a variable
called ThisPlayer, as shown in Figure 11.03.
Variable identifiers should not contain spaces, only letters, digits and _ (the underscore
symbol). To make algorithms easier to understand, the naming of a variable should reflect
the variable’s use. This means often that more than one word is used as an identifier. The
formatting convention used here is known as CamelCaps. It makes an identifier easier to
read.
11.04 Assignments
Assigning a value
INPUT Number
NumberOfGuesses <- 1
» Part 2
Updating a value
The following pseudocode takes the value stored in NumberOfGuesses (see Figure 11.05 (a)),
adds 1 to that value and then stores the new value back into the variable NumberOfGuesses
(see Figure 11.05 (b)).
NumberOfGuesses <- NumberOfGuesses + 1
(a) (b)
Figure 11.05 Updating the value of a variable
Copying a value
The following pseudocode takes the value stored in Value1 and copies it to Value2:
Value2 <- Value1
The value in Value1 remains the same until it is assigned a different value.
If we want to swap the contents of two variables, we need to store one of the values in
another variable temporarily. Otherwise the second value to be moved will be overwritten by
the first value to be moved.
Temp <- Value1
Value1 <- Value2
Value2 <- Temp
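The swap can be tried out in Python. The following is a minimal sketch; the function names are illustrative and not part of the original text. The second function shows what goes wrong when no temporary variable is used.

```python
def swap_with_temp(value1, value2):
    """Swap two values using a temporary variable, as in the pseudocode."""
    temp = value1      # keep a copy of the first value
    value1 = value2    # overwrite the first value with the second
    value2 = temp      # move the saved copy into the second variable
    return value1, value2


def swap_without_temp(value1, value2):
    """Faulty version: the first value is overwritten before it is moved."""
    value1 = value2
    value2 = value1    # value1 already holds the old value2
    return value1, value2
```

Calling swap_without_temp gives back two copies of the second value, which is why the temporary variable is needed.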
The problem to be solved: Convert a distance in miles and output the equivalent distance in
km.
Step 1: Write the problem as a series of structured English statements:
We need a variable to store the original distance in miles and a variable to store the result
of multiplying the number of miles by 1.61. It is helpful to construct an identifier table to
list the variables.
Identifier   Explanation
Miles        The original distance in miles
Km           The result of multiplying Miles by 1.61
The detail given in a flowchart should be the same as the detail given in pseudocode. It
should use the basic constructs listed in Table 11.01.
Figure 11.08 represents our algorithm using a flowchart and the equivalent pseudocode.
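For comparison, the same conversion might be written in Python as follows. This is only a sketch: the constant name KM_PER_MILE and the function name are assumptions made for illustration, while the factor 1.61 is the one used in the text.

```python
KM_PER_MILE = 1.61  # conversion factor used in the worked example


def miles_to_km(miles):
    """Return the distance in km equivalent to the given distance in miles."""
    km = miles * KM_PER_MILE
    return km
```

The function takes the value that would be input (Miles) as a parameter and returns the value that would be output (Km).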
KEY TERMS
Identifier table: a table listing the variable identifiers required for the solution, with explanations
TASK 11.02
In Section 11.01, we looked at an algorithm with different steps depending on some other
condition:
ELSE
The selection construct in Table 11.01 uses a condition to follow either the first group of
steps
or the second group of steps (see Figure 11.09).
Simple condition
IF A < B
   THEN
      <statement(s)>
   ELSE
      <statement(s)>
ENDIF
A condition consists of at least one logic proposition (see Chapter 4, Section 4.01). Logic
propositions use the relational (comparison) operators shown in Table 11.03.
Operator   Comparison
=          Is equal to
<          Is less than
>          Is greater than
<=         Is less than or equal to
>=         Is greater than or equal to
<>         Is not equal to
Conditions are either TRUE or FALSE. In pseudocode, we distinguish between the relational
operator = (which tests for equality) and the assignment symbol <-.
A person is classed as a child if they are under 13 and as an adult if they are over 19. If they
are between 13 and 19 inclusive they are classed as teenagers. We can write these statements as
logic statements:
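One way to check such logic statements is to code them and test the boundary ages. The Python sketch below is an illustration only; the function name classify and the returned strings are assumptions, not taken from the text.

```python
def classify(age):
    """Classify a person by age using the three conditions described above."""
    if age < 13:
        return "child"
    elif age > 19:
        return "adult"
    else:
        # age is between 13 and 19 inclusive
        return "teenager"
```

Testing with the ages 12, 13, 19 and 20 exercises each side of both boundaries.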
TASK 11.03
A town has a bus service where passengers under the age of 12 and over the age of 60 do
not
need to pay a fare. Write the logic statements for free fares.
• If the number input was smaller than the secret number, output message “secret number
is greater”.
IF Guess = SecretNumber
THEN
ELSE
ELSE
ENDIF
ENDIF
More complex conditions can be formed by using the logical operators AND, OR and NOT.
For
example, the number-guessing game might allow the player multiple guesses; if the player
has not guessed the secret number after 10 guesses, a different message is output.
IF Guess = SecretNumber
THEN
ELSE
Complex condition
ELSE
ENDIF
ENDIF
ENDIF
The problem to be solved: Take three numbers as input and output the largest number.
There are several different methods (algorithms) to solve this problem. Here is one method:
2 Store each of the input values in a separate variable (the identifiers are shown in Table
11.04).
3 Compare the first number with the second number and then compare the bigger one
of these with the third number.
Identifier
Explanation
Number1
Number2
Number3
INPUT Number1
INPUT Number2
INPUT Number3
IF Number1 > Number2
   THEN
      IF Number1 > Number3
         THEN
            OUTPUT Number1
         ELSE
            OUTPUT Number3
      ENDIF
   ELSE
      IF Number2 > Number3
         THEN
            OUTPUT Number2
         ELSE
            OUTPUT Number3
      ENDIF
ENDIF
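The nested selection used in this method can be sketched in Python as below. The function name is an assumption for illustration, and the three numbers are passed as parameters rather than input.

```python
def largest_of_three(number1, number2, number3):
    """Return the largest of three numbers using nested selection."""
    if number1 > number2:
        # number2 cannot be the largest, so compare number1 with number3
        if number1 > number3:
            return number1
        else:
            return number3
    else:
        # number1 is not larger than number2, so compare number2 with number3
        if number2 > number3:
            return number2
        else:
            return number3
```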
Question: 11.01
The problem to be solved: Take three numbers as input and output the largest number.
This is an alternative method to Worked Example 11.02.
2 Input the second number and compare it with the value in BiggestSoFar.
4 Input the third number and compare it with the value in BiggestSoFar.
5 If the third number is bigger, assign its value to BiggestSoFar.
The identifiers required for this solution are shown in Table 11.05.
Identifier
Explanation
BiggestSoFar
NextNumber
Table 11.05 Identifier table for the alternative solution to the biggest number problem
INPUT BiggestSoFar
INPUT NextNumber
IF NextNumber > BiggestSoFar
   THEN
      BiggestSoFar <- NextNumber
ENDIF
INPUT NextNumber
IF NextNumber > BiggestSoFar
   THEN
      BiggestSoFar <- NextNumber
ENDIF
OUTPUT BiggestSoFar
Note that when we input the third number in this method the second number gets
overwritten as it is no longer needed.
There are several advantages of using the method in Worked Example 11.03 compared to
the
method in Worked Example 11.02:
• This algorithm can be adapted more easily if further numbers are to be compared (see
Worked Example 11.04).
The disadvantage of the method in Worked Example 11.03 compared to the method in
Worked Example 11.02 is that there is more work involved with this algorithm. If the second
number is bigger than the first number, the value of BiggestSoFar has to be changed. If
the third number is bigger than the value in BiggestSoFar then the value of BiggestSoFar
has to be changed again. Depending on the input values, this could result in two extra
assignment instructions being carried out.
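The effect of the input order on the number of extra assignments can be observed with a small Python sketch. The function name and the counter extra_assignments are assumptions added for illustration; they do not appear in the worked example.

```python
def biggest_with_count(numbers):
    """Return the biggest number and how many times BiggestSoFar was replaced."""
    values = iter(numbers)
    biggest_so_far = next(values)   # the first number is stored directly
    extra_assignments = 0           # counts replacements of biggest_so_far
    for next_number in values:
        if next_number > biggest_so_far:
            biggest_so_far = next_number
            extra_assignments += 1
    return biggest_so_far, extra_assignments
```

With the numbers in ascending order every comparison leads to an assignment; in descending order none does.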
11.06 Loops
Look at the pseudocode algorithm in Worked Example 11.03. The two IF statements are
identical. To compare 10 numbers we would need to write this statement nine times.
Moreover, if the problem changed to having to compare, for example, 100 numbers, our
algorithm would become very tedious. If we use a repetition construct (a loop) we can avoid
writing the same lines of pseudocode over and over again.
Question: 11.02
What changes do you need to make to the algorithm in Worked Example 11.04:
There is another loop construct that does the counting for us: the FOR...ENDFOR loop.
The problem to be solved: Take 10 numbers as input and output the largest number.
We can use the same identifiers as in Worked Example 11.04. Note that the purpose of
Counter has changed.
Identifier
Explanation
BiggestSoFar
NextNumber
Counter
Table 11.07 Identifier table for biggest number problem using a FOR loop
INPUT BiggestSoFar
FOR Counter <- 2 TO 10
   INPUT NextNumber
   IF NextNumber > BiggestSoFar
      THEN
         BiggestSoFar <- NextNumber
   ENDIF
ENDFOR
OUTPUT BiggestSoFar
The first time round the loop, Counter is set to 2. The next time round the loop,
Counter has automatically increased to 3, and so on. The last time round the loop,
Counter has the value 10.
A rogue value is a value used to terminate a sequence of values. The rogue value is of the
same data type but outside the range of normal expected values.
KEY TERMS
Note: In this example the rogue value chosen is 0. It is very important to choose a rogue value
that is of the same data type but outside the range of normal expected values. For example, if
the input might normally include 0 then a negative value, such as -1, might be chosen.
Look at Worked Example 11.05. Instead of counting the numbers input, we need to check
whether the number input is 0 to terminate the loop. The identifiers are shown in Table
11.08.
Identifier
Explanation
BiggestSoFar
NextNumber
Table 11.08 Identifier table for biggest number problem using a rogue value
INPUT BiggestSoFar
REPEAT
   INPUT NextNumber
   IF NextNumber > BiggestSoFar
      THEN
         BiggestSoFar <- NextNumber
   ENDIF
UNTIL NextNumber = 0
OUTPUT BiggestSoFar
This algorithm works even if the sequence consists of only one non-zero input. However,
it will not work if the only input is 0. In that case, we don’t want to perform the statements
within the loop at all. We can use an alternative construct, the WHILE...ENDWHILE loop.
INPUT NextNumber
BiggestSoFar <- NextNumber
WHILE NextNumber <> 0 // sequence terminator not encountered
   INPUT NextNumber
   IF NextNumber > BiggestSoFar
      THEN
         BiggestSoFar <- NextNumber
   ENDIF
ENDWHILE
OUTPUT BiggestSoFar
Before we enter the loop we check whether we have a non-zero number. To make this
work for the first number, we store it in NextNumber and also in BiggestSoFar. If this
first number is zero we don’t follow the instructions within the loop. For a non-zero first
number this algorithm has the same effect as the algorithm using REPEAT...UNTIL.
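The pre-condition loop can be sketched in Python. The sketch below assumes the inputs are supplied as a list that ends with the rogue value; the function name and the rogue parameter are illustrative and not part of the original text.

```python
def biggest_before_rogue(inputs, rogue=0):
    """Return the biggest value read before the rogue value terminates input.

    Assumes `inputs` contains the rogue value as its final item.
    """
    values = iter(inputs)
    next_number = next(values)          # INPUT NextNumber
    biggest_so_far = next_number        # BiggestSoFar <- NextNumber
    while next_number != rogue:         # sequence terminator not encountered
        next_number = next(values)      # INPUT NextNumber
        if next_number > biggest_so_far:
            biggest_so_far = next_number
    return biggest_so_far
```

If the very first input is the rogue value, the loop body is never executed and the rogue value itself is returned.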
Consider the number guessing game again, this time allowing repeated guesses:
1 The player repeatedly inputs a number to guess the secret number stored.
2 If the guess is correct, the number of guesses made is output and the game stops.
3 If the number input is larger than the secret number, the player is given the message to
input a smaller number.
4 If the number input is smaller than the secret number, the player is given the message
to input a larger number.
Identifier
Explanation
SecretNumber
NumberOfGuesses
Guess
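SecretNumber <- Random
NumberOfGuesses <- 0
REPEAT
   INPUT Guess
   NumberOfGuesses <- NumberOfGuesses + 1
   IF Guess > SecretNumber
      THEN
         // the player is given the message to input a smaller number
   ENDIF
   IF Guess < SecretNumber
      THEN
         // the player is given the message to input a larger number
   ENDIF
UNTIL Guess = SecretNumber
OUTPUT NumberOfGuesses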
The above solution uses a post-condition (REPEAT...UNTIL) loop. An alternative solution
uses a pre-condition (WHILE...ENDWHILE) loop:
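SecretNumber <- Random
INPUT Guess
NumberOfGuesses <- 1
WHILE Guess <> SecretNumber
   IF Guess > SecretNumber
      THEN
         // the player is given the message to input a smaller number
   ENDIF
   IF Guess < SecretNumber
      THEN
         // the player is given the message to input a larger number
   ENDIF
   INPUT Guess
   NumberOfGuesses <- NumberOfGuesses + 1
ENDWHILE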
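The pre-condition version of the game can be sketched in Python. To keep the sketch testable, the secret number and the sequence of guesses are passed in as parameters instead of being generated and input; the function name and the message strings are assumptions made for illustration.

```python
def play_guessing_game(secret_number, guesses):
    """Replay a game and return (number_of_guesses, messages_given).

    Assumes `guesses` eventually contains the secret number.
    """
    remaining = iter(guesses)
    messages = []
    guess = next(remaining)             # INPUT Guess
    number_of_guesses = 1
    while guess != secret_number:
        if guess > secret_number:
            messages.append("input a smaller number")
        else:
            messages.append("input a larger number")
        guess = next(remaining)         # INPUT Guess
        number_of_guesses += 1
    return number_of_guesses, messages
```

A correct first guess means the loop body is never entered and no message is given.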
The problem to be solved: Take 10 numbers as input and output the sum of these
numbers and the average.
Identifier
Explanation
RunningTotal
Counter
How many numbers have been input
NextNumber
Average
Table 11.10 Identifier table for running total and average algorithm
RunningTotal <- 0
FOR Counter <- 1 TO 10
   INPUT NextNumber
   RunningTotal <- RunningTotal + NextNumber
ENDFOR
OUTPUT RunningTotal
Average <- RunningTotal / 10
OUTPUT Average
It is very important that the value stored in RunningTotal is initialised to zero before we
start adding the numbers being input.
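A running total can be sketched in Python as follows. Unlike the worked example, which always reads 10 numbers, this sketch counts however many numbers it is given; the function name is an assumption for illustration.

```python
def total_and_average(numbers):
    """Return the sum and the average of a non-empty sequence of numbers."""
    running_total = 0                   # must start at zero before adding
    counter = 0                         # how many numbers have been added
    for next_number in numbers:
        running_total = running_total + next_number
        counter = counter + 1
    average = running_total / counter
    return running_total, average
```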
TASK 11.04
Change the algorithm in Worked Example 11.08 so that the sequence of numbers is
terminated by a rogue value of 0.
The problem to be solved: Take as input two numbers and a symbol. Output a grid made
up entirely of the chosen symbol, with the number of rows matching the first number
input and the number of columns matching the second number input.
For example the three input values 3, 7 and & result in the output:
&&&&&&&
&&&&&&&
&&&&&&&
We need two variables to store the number of rows and the number of columns. We also
need a
variable to store the symbol. We need a counter for the rows and a counter for the columns.
Identifier
Explanation
NumberOfRows
NumberOfColumns
Symbol
RowCounter
ColumnCounter
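INPUT NumberOfRows
INPUT NumberOfColumns
INPUT Symbol
FOR RowCounter <- 1 TO NumberOfRows
   FOR ColumnCounter <- 1 TO NumberOfColumns
      OUTPUT Symbol // without moving to the next line
   ENDFOR
   OUTPUT Newline // move to the next line
ENDFOR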
Each time round the outer loop (counting the number of rows) we complete the inner loop,
outputting a symbol for each count of the number of columns. This type of construct is
called a nested loop.
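A nested loop of this kind might look as follows in Python. The sketch builds each row as a string and returns the rows as a list, rather than outputting them directly; the function name is an assumption for illustration.

```python
def build_grid(number_of_rows, number_of_columns, symbol):
    """Return the grid as a list of strings, one string per row."""
    rows = []
    for row_counter in range(number_of_rows):            # outer loop: rows
        line = ""
        for column_counter in range(number_of_columns):  # inner loop: columns
            line = line + symbol
        rows.append(line)                                 # row is complete
    return rows
```

Printing each string in the returned list on its own line reproduces the grid.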
KEYTERMS
The problem to be solved: Take seven numbers as input and store them for later use.
We could use seven separate variables. However, if we wanted our algorithm to work with
70 numbers, for example, then this would become very tedious. We can make use of a
data structure, known as a ‘linear list’ or a one-dimensional (1D) array.
This array is given an identifier, for example MyList, and each element within the array
is referred to using this identifier and its position (index) within the array. For example,
MyList [4] refers to the fourth element in the MyList array.
We can use a loop to access each array element in turn. If the numbers input to the
pseudocode algorithm below are 25,34,98,7,41,19 and 5 then the algorithm will
produce the result in Figure 11.10.
FOR Index <- 1 TO 7
   INPUT MyList[Index]
ENDFOR
Index   MyList
[1]     25
[2]     34
[3]     98
[4]     7
[5]     41
[6]     19
[7]     5
TASK 11.05
Set up two arrays, one for your friends’ names and one for their ages as shown in Figure
11.11.
       Name     Age
[1]    Matt     15
[2]    Fred     16
[3]    Anna     14
...
[20]   Xenios   17
Searching a 1D array
The problem to be solved: Take a number as input. Search for this number in an existing
1D array of seven numbers (see Worked Example 11.10).
Start at the first element of the array and check each element in turn until the search
value is found or the end of the array is reached. This method is called a linear search.
Identifier
Explanation
MyList
MaxIndex
SearchValue
Found
Index
MaxIndex <- 7
INPUT SearchValue
Found <- FALSE
Index <- 0
REPEAT
   Index <- Index + 1
   IF MyList[Index] = SearchValue
      THEN
         Found <- TRUE
   ENDIF
UNTIL Found = TRUE OR Index >= MaxIndex
IF Found = TRUE
THEN
ENDIF
The complex condition to the REPEAT...UNTIL loop allows us to exit the loop when
the search value is found. Using the variable Found makes the algorithm easier to
understand. Found is initialised to FALSE before entering the loop and set to TRUE if the
value is found.
If the value is not in the array, the loop terminates when Index is greater than or equal to
MaxIndex. That means we have come to the end of the array. Note that using MaxIndex
in the logic statement to terminate the loop makes it much easier to adapt the algorithm
when the array consists of a different number of elements. The algorithm only needs to
be changed in the first line, where MaxIndex is given a value.
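The linear search can be sketched in Python. Python lists are indexed from 0, so the sketch converts to the 1-based positions used in the pseudocode; the function name, the guard for an empty list and the choice of None for "not found" are assumptions made for illustration.

```python
def linear_search(my_list, search_value):
    """Return the 1-based position of search_value, or None if it is absent."""
    max_index = len(my_list)
    if max_index == 0:
        return None                     # nothing to search
    found = False
    index = 0
    while True:                         # REPEAT ... UNTIL
        index = index + 1
        if my_list[index - 1] == search_value:
            found = True
        if found or index >= max_index:
            break
    return index if found else None
```

Because max_index is taken from the length of the list, the same code works for arrays of any size.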
KEY TERMS
Linear search: checking each element of an array in turn for a required value
TASK 11.06
Use the algorithm in Worked Example 11.11 as a design pattern to search for a friend’s
name
and output their age.
The simplest way to sort an unordered list of values is the following method:
1 Compare the first and second values. If the first value is larger than the second value,
swap them.
2 Compare the second and third values. If the second value is larger than the third value,
swap them.
3 Compare the third and fourth values. If the third value is larger than the fourth value,
swap them.
4 Keep on comparing adjacent values, swapping them if necessary, until the last two
values in the list have been processed.
Figure 11.12 shows what happens to the values as we work down the array, following this
algorithm.
[Figure 11.12: the array values after each comparison of adjacent pairs (1st pair to 6th pair) during the first pass]
When we have completed the first pass through the entire array, the largest value is in the
correct position at the end of the array. The other values may or may not be in the correct
order.
We need to work through the array again and again. After each pass through the array the
next largest value will be in its correct position, as shown in Figure 11.13.
[Figure 11.13: the original list and the array contents after each of passes 1 to 6]
In effect we perform a loop within a loop, a nested loop. This method is known as a
bubble sort. The name comes from the fact that smaller values slowly rise to the top, like
bubbles in a liquid.
The identifiers needed for the algorithm are listed in Table 11.13.
Identifier
Explanation
MaxIndex
i
Counter for outer loop
Temp
n <- MaxIndex - 1
FOR i <- 1 TO MaxIndex - 1
   FOR j <- 1 TO n
      IF MyList[j] > MyList[j + 1]
         THEN
            Temp <- MyList[j]
            MyList[j] <- MyList[j + 1]
            MyList[j + 1] <- Temp
      ENDIF
   ENDFOR
   n <- n - 1 // this means the next time round the inner loop, we don't
              // look at the values already in the correct positions.
ENDFOR
The values to be sorted may already be in the correct order before the outer loop has
been through all its iterations. Look at the list of values in Figure 11.14. It is only slightly
different from the first list we sorted.
[Figure 11.14: a slightly different original list and the array contents after each of passes 1 to 6]
After the third pass the values are all in the correct order but our algorithm will carry on
with three further passes through the array. This means we are making comparisons when
no further comparisons need to be made.
If we have gone through the whole of the inner loop (one pass) without swapping any
values, we know that the array elements must be in the correct order. We can therefore
replace the outer loop with a conditional loop.
We can use a variable NoMoreSwaps to store whether or not a swap has taken place
during the current pass. We initialise the variable NoMoreSwaps to TRUE. When we swap a
pair of values we set NoMoreSwaps to FALSE. At the end of the pass through the array we
can check whether a swap has taken place.
The identifier table for this improved algorithm is shown in Table 11.14.
Identifier
Explanation
MyList[1..7]
MaxIndex
Temp
n <- MaxIndex - 1
REPEAT
   NoMoreSwaps <- TRUE
   FOR j <- 1 TO n
      IF MyList[j] > MyList[j + 1]
         THEN
            Temp <- MyList[j]
            MyList[j] <- MyList[j + 1]
            MyList[j + 1] <- Temp
            NoMoreSwaps <- FALSE
      ENDIF
   ENDFOR
   n <- n - 1
UNTIL NoMoreSwaps = TRUE
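The improved bubble sort, which stops as soon as a pass makes no swaps, can be sketched in Python. The function name and the extra passes counter are assumptions added for illustration; the sketch works on a copy so that the original list is left unchanged.

```python
def bubble_sort(values):
    """Return (sorted_copy, number_of_passes) using an early-exit bubble sort."""
    my_list = list(values)              # work on a copy
    n = len(my_list) - 1                # number of pairs to compare in a pass
    passes = 0
    while True:                         # REPEAT ... UNTIL NoMoreSwaps
        no_more_swaps = True
        for j in range(n):
            if my_list[j] > my_list[j + 1]:
                temp = my_list[j]
                my_list[j] = my_list[j + 1]
                my_list[j + 1] = temp
                no_more_swaps = False
        n = n - 1                       # the last value is now in position
        passes = passes + 1
        if no_more_swaps:
            break
    return my_list, passes
```

When the input is already in order, only one pass is made, which relates to the discussion point below.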
KEY TERMS
Bubble sort: a sort method where adjacent pairs of values are compared and swapped
Discussion Point:
What happens if the array elements are already in the correct order?
TASK 11.07
Rewrite the algorithm in Worked Example 11.12 to sort the array elements into descending
order.
A 1D array is like a linear list. The nth element within the array MyList is referred to as
MyList[n].
A two-dimensional (2D) array is like a table or matrix. The element in row x and column y
of ThisTable is referred to as ThisTable[x, y].
For example to store the value 5 in the element in the fourth row and second column, we
write:
ThisTable[4, 2] <- 5
When we want to access each element of a 1D array, we use a loop to access each element
in turn. When working with a 2D array, we need a loop to access each row. Within each row
we need to access each column. This means we use a loop within a loop (nested loops).
Using pseudocode, the algorithm to set each element of array ThisTable to zero is:
FOR Row <- 1 TO MaxRows
   FOR Column <- 1 TO MaxColumns
      ThisTable[Row, Column] <- 0
   ENDFOR
ENDFOR
When we want to output the contents of a 2D array, we again need nested loops. We want
to output all the values in one row of the array on the same line. At the end of the row, we
want to output a new line.
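FOR Row <- 1 TO MaxRows
   FOR Column <- 1 TO MaxColumns
      OUTPUT ThisTable[Row, Column] // stay on the same line
   ENDFOR
   OUTPUT Newline // move to the next line for the next row
ENDFOR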
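Both 2D array operations can be sketched in Python using a list of lists. This is an illustration only: the function names are assumptions, Python indexes from 0 rather than 1, and the table is formatted as a string instead of being output directly.

```python
def make_table(max_rows, max_columns):
    """Create a 2D table (list of rows) with every element set to zero."""
    table = []
    for row in range(max_rows):
        new_row = []
        for column in range(max_columns):
            new_row.append(0)
        table.append(new_row)
    return table


def format_table(table):
    """Return the table as text, one line per row, values separated by spaces."""
    lines = []
    for row in table:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines)
```

With 0-based indexing, the element in the fourth row and second column is table[3][1].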
Identifier
Explanation
ThisTable[1..4, 1..6]
MaxRows
MaxColumns
Row
Column
Summary
• Algorithms are expressed using the four basic constructs of assignment, sequence,
selection and repetition.
• Logic statements use the relational operators =, <, >, <>, <= and >= and the logic
operators AND, OR and NOT.
• Selection constructs and conditional loops use conditions to determine the steps to be
followed.
Exam-style Questions
1 The Modulo-11 method of calculating a check digit for a sequence of nine digits is as
follows:
Each digit in the sequence is given a weight depending on its position in the sequence. The
leftmost digit has a weight of 10.
The next digit to the right has a weight of 9, the next one 8 and so on. Values are calculated
by multiplying each digit by its
weight. These values are added together and the sum is divided by 11. The remainder from
this division is subtracted from 11
and this value is the check digit. If this value is 10, then the check digit is X. Note that x mod
y gives the remainder from the
division of x by y.
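The calculation described above can be sketched in Python. The function name is an assumption for illustration. The description does not say what happens when the remainder is 0 (so that 11 minus the remainder is 11); the sketch assumes the ISBN-10 convention of using 0 in that case.

```python
def modulo11_check_digit(digits):
    """Return the Modulo-11 check digit for a string of nine digits."""
    if len(digits) != 9 or not all(c in "0123456789" for c in digits):
        raise ValueError("exactly nine digits are required")
    total = 0
    weight = 10                         # the leftmost digit has weight 10
    for character in digits:
        total = total + int(character) * weight
        weight = weight - 1             # next digit to the right weighs one less
    value = 11 - (total % 11)
    if value == 10:
        return "X"
    if value == 11:
        return "0"                      # assumption: ISBN-10 convention
    return str(value)
```

For the nine digits 030640615 the weighted sum is 130, the remainder after dividing by 11 is 9, and the check digit is therefore 2.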
[9]
[7]
[8]
4 Alan uses two 1D arrays, UserList and PasswordList. For twenty users, he stores each user
ID in UserList and the corresponding password in PasswordList. For example, the person
with user ID Fred12 has password rzt456.
       UserList   PasswordList
[1]    Matt05     pqklmn4
[2]    Fred12     rzt456
[3]    Anna9      jedd321
...
[20]   Xenios4    wkl@tmp6
Alan wants to write an algorithm to check whether a user ID and password, entered by a
user,
are correct. He designs the algorithm to search UserList for the user ID. If the user ID is
found, the password stored in PasswordList is to be compared to the entered password. If
the passwords match, the login is successful. In all other cases, login is unsuccessful.
Identifier
Explanation
UserList [1..20]
1D array to store user IDs
MaxIndex
MyUserID
MyPassword
UserIdFound
UserList
TRUE if .
LoginOK
FALSE if .
TRUE if .
Index
[4]
MaxIndex <- 20
INPUT MyUserID
INPUT MyPassword
UserIdFound <- FALSE
LoginOK <- .
Index <- 0
REPEAT
   Index <- .
   IF UserList[.] = .
      THEN
         UserIdFound <- TRUE
   ENDIF
UNTIL . OR .
IF UserIdFound = TRUE
   THEN
      IF PasswordList[.] = .
         THEN
            LoginOK <- TRUE
      ENDIF
IF .
THEN
OUTPUT "
ii
ENDIF
[ 10 ]
Learning objectives
Many problems that we want to solve are bigger than the ones we met in Chapter 11. To
make
it easier to solve a bigger problem, we break the problem down into smaller steps. These
might need breaking down further until the steps are small enough to solve easily.
For a solution to a problem to be programmable, we need to break down the steps of the
solution into the basic constructs of sequence, assignment, selection, repetition, input and
output.
We can use a method called stepwise refinement to break down the steps of our outline
solution into smaller steps until it is detailed enough. In Section 11.01 we looked at a recipe
for a cake. The step of mixing together all the ingredients was broken down into more
detailed steps.
KEY TERMS
Stepwise refinement: breaking down the steps of an outline solution into smaller and smaller
steps
The problem to be solved: Take as input a chosen symbol and an odd number. Output a
pyramid shape made up entirely of the chosen symbol, with the number of symbols in the
final row matching the number input.
For example the two input values A and 9 result in the following output:
    A
   AAA
  AAAAA
 AAAAAAA
AAAAAAAAA
This problem is similar to Worked Example 11.09 in Chapter 11, but the number of symbols
in each row starts with one and increases by two with each row. Each row starts with a
decreasing number of spaces, to create the slope effect.
Our first attempt at solving this problem using structured English is:
06 UNTIL the required number of symbols have been output in one row
This is not enough detail to write a program in a high-level programming language. Exactly
what values do we need to set?
We need as input:
• the number of symbols in the final row (for the pyramid to look symmetrical, this needs
to be an odd number).
We need to calculate how many spaces we need in the first row. So that the slope of the
pyramid is symmetrical, this number should be half of the final row’s symbols. We need
to set the number of symbols to be output in the first row to 1. We therefore need the
identifiers listed in Table 12.01.
Identifier
Explanation
Symbol
The character symbol to form the pyramid
MaxNumberOfSymbols
NumberOfSpaces
NumberOfSymbols
Using pseudocode, we now refine the steps of our first attempt. To show which step
we are refining, a numbering system is used as shown.
Remember we need an odd number for MaxNumberOfSymbols. We need to make sure
the input is an odd number. So we further refine Step 01.2:
01.2.1 REPEAT
03.3 ENDFOR
04 // Output number of symbols expands into:
04.3 ENDFOR
In Step 05 we need to decrease the number of spaces by 1 and increase the number of
symbols by 2:
Step 06 essentially checks whether the number of symbols for the next row is now
greater than the value input at the beginning.
We can put together all the steps and end up with a solution.
01 // Set Values
01.2.1 REPEAT
01.4 NumberOfSymbols <- 1
02 REPEAT
03.3 ENDFOR
04.3 ENDFOR
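The refined pyramid solution can be sketched in Python. The sketch returns the rows as a list of strings instead of outputting them, and it rejects an invalid number rather than asking for the input again; the function name and these two choices are assumptions made for illustration.

```python
def pyramid_rows(symbol, max_number_of_symbols):
    """Return the rows of a pyramid whose final row has the given odd width."""
    if max_number_of_symbols < 1 or max_number_of_symbols % 2 != 1:
        raise ValueError("the number of symbols must be a positive odd number")
    number_of_spaces = max_number_of_symbols // 2   # spaces before first row
    number_of_symbols = 1                           # first row has one symbol
    rows = []
    while True:                                     # REPEAT ... UNTIL
        rows.append(" " * number_of_spaces + symbol * number_of_symbols)
        number_of_spaces = number_of_spaces - 1     # adjust values for next row
        number_of_symbols = number_of_symbols + 2
        if number_of_symbols > max_number_of_symbols:
            break
    return rows
```

Printing each returned string on its own line gives the pyramid shape.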
TASK 12.01
Use stepwise refinement to output a hollow triangle. For example the two input values A and
9 result in the following output:
    A
   A A
  A   A
 A     A
AAAAAAAAA
06 UNTIL the required number of symbols have been output in one row
Another method of developing a solution is to decompose the problem into sub-tasks. Each
sub-task can be considered as a ‘module’ that is refined separately. Modules are procedures
and functions.
A procedure groups together a number of steps and gives them a name (an identifier). We
can use this identifier when we want to refer to this group of steps. When we want to perform
the steps in a procedure we call the procedure by its name.
KEY TERMS
Procedure: a sequence of steps that is given an identifier and can be called to perform a
sub-task
CALL ProcedureXYZ
(a)
The rules for module identifiers are the same as for variable identifiers (see Section 11.03)
PROCEDURE InputMaxNumberOfSymbols
   REPEAT
      INPUT MaxNumberOfSymbols
   UNTIL MaxNumberOfSymbols MOD 2 = 1
ENDPROCEDURE
PROCEDURE OutputSpaces
PROCEDURE OutputSymbols
PROCEDURE AdjustValuesForNextRow
TASK 12.02
Connect 4 is a game played by two players. In the commercial version shown in Figure
12.01, one player uses red tokens and the other uses black. Each player has 21 tokens.
The game board is a vertical grid of six rows and seven columns.
Columns get filled with tokens from the bottom. The players take it in turns to choose
a column that is not full and drop a token into this column. The token will occupy the
lowest empty position in the chosen column. The winner is the player who is the first
to connect four of their own tokens in a horizontal, vertical or diagonal line. If all tokens
have been used and neither player has connected four tokens, the game ends in a draw.
If we want to write a program to play this game on a computer, we need to work out the
steps required to ‘solve the problem’, that means to let players take their turn in placing
tokens and checking for a winner. We will designate our players (and their tokens) by
‘O’ and ‘X’. The game board will be represented by a 2D array. To simplify the problem,
the winner is the player who is the first to connect four of their tokens horizontally or
vertically.
Initialise board
Set up game
Display board
While game not finished
01 CALL InitialiseBoard
02 CALL SetUpGame
03 CALL OutputBoard
04 WHILE GameFinished = FALSE
05    CALL ThisPlayerMakesMove
06    CALL OutputBoard
07    CALL CheckIfThisPlayerHasWon
08    IF GameFinished = FALSE
09       THEN
10          CALL SwapThisPlayer
11    ENDIF
12 ENDWHILE
Note that Steps 03 and 06 are the same. This means that we can save ourselves some
effort. We only need to define this module once, but can call it from more than one place.
This is one of the advantages of using modules.
Identifier
Explanation
InitialiseBoard
GameFinished
ThisPlayer
OutputBoard
ThisPlayerMakesMove
CheckIfThisPlayerHasWon
SwapThisPlayer
Now we can refine each procedure (module). This is likely to add some more identifiers to
our identifier table. The additional entries required are shown after each procedure.
PROCEDURE InitialiseBoard
   FOR Row <- 1 TO 6
      FOR Column <- 1 TO 7
         Board[Row, Column] <- BLANK // use a suitable value for blank
      ENDFOR
   ENDFOR
ENDPROCEDURE
Identifier
Explanation
Row
Column
BLANK
PROCEDURE SetUpGame
PROCEDURE OutputBoard
FOR Row <- 1 TO 6
FOR Column <- 1 TO 7
PROCEDURE ThisPlayerMakesMove
an identifier. But the steps of a function are to work out a single value that is returned
from the function. This value is used in an expression.
Identifier
Explanation
ValidColumn
ThisPlayerChoosesColumn
ValidRow
FindNextFreePositionInColumn
REPEAT
INPUT ColumnNumber
Identifier
Explanation
ColumnNumber
The column number chosen by the current player
ColumnNumberValid
Valid <- TRUE
ENDIF
ENDIF
RETURN Valid
ENDFUNCTION
Identifier
Explanation
Valid
WHILE Board[ThisRow, ValidColumn] <> BLANK // find first empty cell
   ThisRow <- ThisRow + 1
ENDWHILE
RETURN ThisRow
ENDFUNCTION
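The search for the lowest empty cell in a column can be sketched in Python as shown below. This is an illustration under stated assumptions: the function name find_next_free_row is hypothetical, row 0 is taken to be the bottom of the board, a space is assumed as the blank value, and the chosen column is assumed not to be full (the column check is done before this function is called).

```python
BLANK = " "  # assumed value for an empty cell


def find_next_free_row(board, column):
    """Return the index of the lowest empty row in the given column.

    Assumes row 0 is the bottom row and that the column is not full.
    """
    this_row = 0
    while board[this_row][column] != BLANK:  # find first empty cell
        this_row = this_row + 1
    return this_row


board = [[BLANK] * 7 for _ in range(6)]
board[0][3] = "O"
board[1][3] = "X"
print(find_next_free_row(board, 3))  # prints 2
```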
Identifier
Explanation
ThisRow
PROCEDURE CheckIfThisPlayerHasWon
   WinnerFound <- FALSE
   CALL CheckHorizontalLineInValidRow
   IF WinnerFound = FALSE
      THEN
         CALL CheckVerticalLineInValidColumn
   ENDIF
   IF WinnerFound = TRUE
      THEN
         GameFinished <- TRUE
         OUTPUT ThisPlayer " is the winner"
      ELSE
         CALL CheckForFullBoard
   ENDIF
ENDPROCEDURE
Note that the CheckIfThisPlayerHasWon procedure uses three further procedures that
we need to define.
Identifier
Explanation
WinnerFound
CheckHorizontalLineInValidRow
Procedure to check if there is a winning horizontal
line in the row the last token was placed in
CheckVerticalLineInValidColumn
CheckForFullBoard
PROCEDURE CheckHorizontalLineInValidRow
FOR i <- 1 TO 4
THEN
WinnerFound <- TRUE
ENDIF
ENDFOR
ENDPROCEDURE
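One way to check a row for four adjacent tokens is sketched below in Python. It is an illustration only: the function name has_horizontal_line and its parameters are hypothetical, and the board is assumed to be a list of rows indexed from 0. In a row of seven cells a line of four can start in only four positions, which is why the pseudocode loop above runs from 1 to 4.

```python
def has_horizontal_line(board, row, player):
    """Return True if player has four adjacent tokens in the given row."""
    columns = len(board[row])
    # With 7 columns there are 4 possible starting positions: 0, 1, 2, 3.
    for start in range(columns - 3):
        if all(board[row][start + offset] == player for offset in range(4)):
            return True
    return False


board = [[" "] * 7 for _ in range(6)]
board[0][2:6] = ["X", "X", "X", "X"]
print(has_horizontal_line(board, 0, "X"))  # prints True
print(has_horizontal_line(board, 0, "O"))  # prints False
```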
PROCEDURE CheckVerticalLineInValidColumn
WinnerFound <- TRUE
ENDIF
ENDIF
ENDPROCEDURE
PROCEDURE CheckForFullBoard
BlankFound <- FALSE
ThisRow <- 0
REPEAT
ThisColumn <- 0
ThisRow <- ThisRow + 1
REPEAT
ThisColumn <- ThisColumn + 1
IF Board[ThisRow, ThisColumn] = BLANK
THEN
ENDIF
GameFinished <- TRUE
ENDIF
ENDPROCEDURE
Identifier
Explanation
BlankFound
ThisRow
PROCEDURE SwapThisPlayer
   IF ThisPlayer = 'O'
      THEN
         ThisPlayer <- 'X'
      ELSE
         ThisPlayer <- 'O'
   ENDIF
ENDPROCEDURE
The complete identifier table for the Connect 4 program is shown in Table 12.10.
Identifier
Explanation
Board[1..6, 1..7]
InitialiseBoard
SetUpGame
GameFinished
ThisPlayer
ThisPlayerMakesMove
CheckIfThisPlayerHasWon
SwapThisPlayer
Row
Column
BLANK
ValidColumn
ThisPlayerChoosesColumn
ValidRow
FindNextFreePositionInColumn
ColumnNumberValid
Valid
ThisRow
WinnerFound
CheckHorizontalLineInValidRow
CheckVerticalLineInValidColumn
CheckForFullBoard
BlankFound
ThisRow
ThisColumn
Function: a sequence of steps that is given an identifier and returns a single value; a function
call is part of an expression
Note that some of the identifiers in Table 12.10 are for variables that are used only within a
single module. We call such a variable a local variable (see Chapter 14, Section 14.03). In
Table 12.10, the local variables are highlighted. The other variables in Table 12.10 are used by
several sub-tasks. Variables available to all modules are known as global variables (see Chapter 14,
Section 14.03).
KEY TERMS
Local variable: a variable that is only accessible within the module in which it is declared
Global variable: a variable that is accessible from all modules
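The difference between the two kinds of variable can be illustrated with a short Python sketch. The names used here (game_finished, count_blanks, check_for_full_board, blank_count) are hypothetical and chosen only for this illustration; the sketch is not the book's Connect 4 solution.

```python
game_finished = False   # global: available to every function in this file


def count_blanks(board):
    blank_count = 0      # local: exists only while count_blanks is running
    for row in board:
        for cell in row:
            if cell == " ":
                blank_count += 1
    return blank_count


def check_for_full_board(board):
    global game_finished          # required in Python to assign to a global
    if count_blanks(board) == 0:
        game_finished = True


check_for_full_board([["X", "O"], ["O", "X"]])
print(game_finished)   # prints True
```

Because blank_count is local, no other module can change it by accident, which is the reason given later in this chapter for preferring local variables and parameters.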
An alternative approach to modular design is to choose the sub-tasks and then construct a
structure chart to show the interrelations between the modules. Each box of the structure
chart represents a module. Each level is a refinement of the level above.
A structure chart also shows the interface between modules, that is, the variables passed between them. These variables
are referred to as ‘parameters’. A parameter supplying a value to a lower-level module is
shown as a downwards pointing arrow. A parameter supplying a new value to the module at
the next higher level is shown as an upward pointing arrow.
KEY TERMS
Figure 12.02 shows a structure chart for a module that calculates the average of two
numbers. The top-level box is the name of the module, which is refined into the three
sub-tasks of Level 1. The input numbers (parameters Number1 and Number2) are passed into
the ‘Calculate Average’ sub-task and then the Average parameter is passed into the
‘OUTPUT Average’ sub-task. The arrows show how the parameters are passed between the modules.
This parameter passing is known as the ‘interface’.
Figure 12.02 Structure chart for a module that calculates the average of two numbers
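The parameter passing shown in the structure chart can be mirrored in program code. The Python sketch below is one possible reading of the chart, not a definitive implementation: the function names are hypothetical, and the input sub-task returns fixed values (10 and 4) instead of reading from the keyboard so that the sketch runs unattended.

```python
def input_numbers():
    # Stand-in for the input sub-task; fixed values are an assumption.
    return 10, 4


def calculate_average(number1, number2):
    # Two parameters are passed down; the result is passed back up.
    return (number1 + number2) / 2


def output_average(average):
    # One parameter is passed down; nothing is passed back up.
    print("Average:", average)


def calculate_and_output_average():
    # Top-level module: calls the three Level 1 sub-tasks in sequence.
    number1, number2 = input_numbers()
    average = calculate_average(number1, number2)
    output_average(average)
    return average


result = calculate_and_output_average()
```

Each downward arrow in the chart corresponds to an argument in a call, and each upward arrow corresponds to a returned value.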
TASK 12.03
Draw a structure chart for the following module: Input a number of km, output the equivalent
number of miles.
Structure charts can also show control information: selection and repetition.
The simple number-guessing game that was introduced in Chapter 11 (Section 11.05) could
be modularised and presented as a structure chart, as shown in Figure 12.03.
Figure 12.03 Structure chart for number-guessing game with only one guess allowed
TASK 12.04
Amend the structure chart for the number-guessing game (Figure 12.03) to include repeated
guesses until the player guesses the secret number. The output should include the number of
guesses made.
TASK 12.05
Draw a structure chart for the following problem: A user attempts to log on with a user ID.
User IDs and passwords are stored in two 1D arrays (lists). The algorithm searches the list of user
IDs and looks up the password in the password list. The user is given three chances to input
the correct password. If the correct password is entered, a suitable message is output. If the
third attempt is incorrect, a warning message is output.
Structure charts help programmers to visualise how modules are interrelated and how
they interface with each other. When looking at a larger problem this becomes even more
important. Figure 12.05 shows a structure chart for the Connect 4 program. It uses the
following symbols:
• An arrow with a solid round end •-► shows that the value transferred is a flag (a
Boolean value).
• A double-headed arrow -o-► shows that the variable value is updated within
the module.
Let’s look at the pyramid problem again (Figure 12.04). In Worked Example 12.02, a modular
solution was created without using a structure chart and all variables were global. Now
we are going to use local variables and parameters. The reason for using local variables and
parameters is that modules are then self-contained and any changes to variables do not have
accidental effects on a variable value elsewhere.
The top-level module, Pyramid, calls four modules. When a module is called, we supply
the parameters in parentheses after the module identifier. This gives the following
pseudocode:
MODULE Pyramid
CALL InputMaxNumberOfSymbols
REPEAT
INPUT MaxNumberOfSymbols
UNTIL MaxNumberOfSymbols MOD 2=1
ENDPROCEDURE
Discussion Point:
The full rules of Connect 4 are that a diagonal of four tokens also is a winning line. Where
in Figure 12.05 should the module to check for a diagonal be added? What parameters
are required for this module? Does this additional module require further stepwise
refinement?
Summary
• Stepwise refinement involves breaking down the steps of an outline solution into smaller and
smaller steps (sub-tasks).
• Stepwise refinement is used to produce a solution that can be stated in terms of the four basic
constructs of sequence, assignment, selection and repetition.
• A procedure is a sequence of steps that is given an identifier. A procedure can be called
whenever this sequence of steps should be followed.
• A function is a sequence of steps that is given an identifier. This sequence of steps
results in a single value that is returned from the function. A function call is part of an
expression or assignment.
• Local variables are variables that are used within a single module.
• Global variables are variables that are used throughout the solution.
• A structure chart shows the interface between modules: parameters passed between the calling
module and the module being called.
• Structure charts show selection, where a module is called only under certain conditions.
Exam-style Questions
1 A random number generator is to be tested to see whether all numbers within the range 1 to
20 are generated equally frequently. The structured English version of the algorithm is
frequency)
Identifier
Explanation
Tally[1..20]
RandomNumber
NumberOfTests
ExpectedFrequency
• The cards are placed face down in random order as an 8 x 8 grid pattern.
• When it is a player’s turn, the player chooses two cards and turns them face up so the
pictures show.
• If the pictures are the same, the player takes the pair of cards, gains a point and has
another go.
• If the pictures are not the same, the cards are turned face down again.
• All players can see the up-turned pictures and memorise their grid positions.
• after the input of two sets of co-ordinates shows the chosen cards for a short time
• outputs the number of points for both players when there are no more cards (the game has
finished).
For the purpose of this algorithm:
Identifier
Explanation
Grid[1..8, 1..8]
ThisPlayer
GameEnd
x1, y1
The co-ordinate pairs of the two cards chosen by the current player
x2, y2
[5]
[ 12 ]
Top-level algorithm:
01 CALL SetUpEmptyGrid
02 CALL RandomlyDistributeCards
03 CALL SetUpPlayers
06 CALL GetPlayersCoordinates
07 CALL DisplayGrid
08 CALL TestForMatch
09 CALL TestForEndGame
11 CALL OutputResults
a What is the name given to the method of breaking the above steps down into smaller
steps?
PROCEDURE SetUpEmptyGrid
FOR i <- 1 TO 8
FOR j <- 1 TO 8
ENDFOR
ENDFOR
ENDPROCEDURE
PROCEDURE RandomlyDistributeCards
FOR Number <- 1 TO 32
CALL GetEmptyGridPosition
CALL GetEmptyGridPosition
ENDFOR
ENDPROCEDURE
PROCEDURE GetEmptyGridPosition
REPEAT
ENDPROCEDURE
PROCEDURE SetUpPlayers
ThisPlayer <- 1
ENDPROCEDURE
[i]
PROCEDURE GetPlayersCoordinates
REPEAT
INPUT x1, y1
CALL DisplayGrid
REPEAT
INPUT x2, y2
// check grid position has a card and is not in the same position as first card
PROCEDURE DisplayGrid
FOR i <- 1 TO 8
FOR j <- 1 TO 8
OUTPUT .
ELSE
THEN
OUTPUT .
OUTPUT .
ENDIF
ENDIF
ENDFOR
ENDFOR
ENDPROCEDURE
PROCEDURE TestForMatch
THEN
Grid[x1, y1] .
// increment points
Points[ThisPlayer] <-
ELSE
CALL SwapPlayers
ENDIF
ENDPROCEDURE
PROCEDURE SwapPlayers
ENDPROCEDURE
PROCEDURE TestForEndGame
GameEnd .
ENDIF
ENDPROCEDURE
PROCEDURE OutputResults
ENDPROCEDURE [18]
Chapter 13
Learning objectives
Chapters 11 and 12 introduced the concept of solving a problem and representing a solution
using a flowchart, pseudocode or a structure chart. We expressed our solutions using the
basic constructs: assignment, sequence, selection, iteration, input and output.
To write a computer program, we need to know the syntax of these basic constructs in our
chosen programming language. This chapter introduces syntax for Python, Visual Basic
console mode and Pascal/Delphi console mode.
You only need to learn to program in one of the three languages covered in this book. However,
you should be able to recognise the basic control structures in a high-level language other
than the one chosen to be studied in depth. So do read the sections covering the other two
programming languages.
Python
Python was conceived by Guido van Rossum in the late 1980s. Python 2.0 was released in
2000 and Python 3.0 in 2008. Python is a multi-paradigm programming language. It fully
supports both object-oriented programming and structured programming. Many other
paradigms, including logic programming, are supported using extensions. These paradigms
are covered in Chapters 26, 27 and 29.
The Python programs in this book have been prepared using Python 3 (see www.python.org
for a free download) and Python’s Integrated Development Environment (IDLE).
• Python is case sensitive: the identifier Number1 is seen as different from number1 or
NUMBER1.
• Code makes extensive use of a concept called ‘slicing’ (see Section 13.08).
• Programs are interpreted (see Chapter 7, Section 7.05 for information on interpreted and
compiled programs).
You can type a statement into the Python Shell and the Python interpreter will run it
immediately (see Figure 13.01).
You can also type program code into a Python editor (such as IDLE), save it with a .py
extension and then run the
program code from the Run menu in the editor window (see Figure 13.02).
Figure 13.02 (a) A saved program in the Python editor window and (b) running in the Python
shell
The Visual Basic programs in this book have been prepared using Microsoft Visual Basic
2010 Express Console Application. (Free download available from
www.visualstudio.com/products/visual-studio-express-vs)
• Every statement should be on a separate line. Statements can be typed on the same line
with a colon (:) as a separator. However, this is not recommended.
• VB.NET is not case sensitive. Modern VB.NET editors will automatically copy the case
from the first definition of an identifier.
• The convention is to use CamelCaps (also known as PascalCaps) for identifiers and
keywords.
• Programs need to be compiled (see Chapter 7, Section 7.05 for information on interpreted
and compiled programs).
You type your program code into the Integrated Development Environment (IDE) as shown in
Figure 13.03 (a), save the program code and then click on the Run button. This invokes
the compiler. If there are no syntax errors the compiled program will then run. Output will be
shown in a separate console window (see Figure 13.03 (b)).
Figure 13.03 (a) A saved program in the VB.NET editor and (b) running in the program
execution (console) window
Note that the console window shuts when the program has finished execution. To keep it
open, so you can see the output, the last statement of your program should be
Console.ReadLine() (see Figure 13.03(a)).
Designed by Niklaus Wirth as a small and efficient language, Pascal was intended to
encourage good programming practice using structured programming. Pascal was
published in 1970. A derivative known as Object Pascal for object-oriented programming was
developed in 1985. Delphi was originally developed by Borland and uses Object Pascal. Since
2008, Delphi has been owned by Embarcadero Technologies.
The Pascal programs in this book have been prepared using Borland Delphi 7 Console
Application. Other Pascal/Delphi IDEs will work in a similar way (for example, the free version
from www.lazarus.freepascal.org).
• Every statement ends with a semicolon (;). More than one statement can go on a single
line, but this is not recommended.
• The convention is to use CamelCaps (also known as PascalCaps) for identifiers and lower
case for keywords.
• Whenever Pascal syntax requires a statement, a compound statement can be used. For
an example see Table 13.28.
• Programs need to be compiled (see Chapter 7, Section 7.05 for information on interpreted
and compiled programs).
You type your program statements into the Integrated Development Environment (IDE) as
shown in Figure 13.04 (a), save the program code and then click on the Run button.
This invokes the compiler. If there are no syntax errors the compiled program code will
then run. Output will be shown in a separate (console) window (see Figure 13.04 (b)).
program Projects;
{$APPTYPE CONSOLE}
uses
SysUtils;
begin
end.
Figure 13.04 (a) A Pascal program in the Delphi editor and (b) running in the program
execution (console) window
Note that the console window shuts when the program has finished execution. To keep it
open, so you can see the output, the last statement of your program should be ReadLn; (see
Figure 13.04(a)).
Declaration of variables
Most programming languages require you to declare the type of data to be stored in a
variable, so the correct amount of memory space can be reserved by the compiler. A variable
declared to store a whole number (integer) cannot then be used to store alphanumeric
characters (strings) or vice versa. Pascal and VB.NET require variables to be declared before
they are used.
Python handles variables differently to most programming languages. It tags values. This is
why Python does not have variable declarations. However, it is good programming practice
to include a comment about the variables you are planning to use and the type of data you
will store in them.
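A short Python sketch may help to show what ‘tagging values’ means in practice. The identifiers Number1 and YourName are used purely for illustration. The type belongs to the value, not to the name, so the same name can later refer to a value of a different type (which is legal, but usually best avoided).

```python
# Number1  : whole number (integer)
# YourName : alphanumeric characters (string)
Number1 = 34
YourName = "Fred"
print(type(Number1).__name__)    # prints int
print(type(YourName).__name__)   # prints str

# Rebinding the name to a string is allowed: the new value carries its own type.
Number1 = "thirty-four"
print(type(Number1).__name__)    # prints str
```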
DECLARE <identifier> : <dataType>
// alphanumeric characters
Syntax definitions
Python
VB.NET
Dim <identifier>[, <identifier>] As <dataType>
Pascal
var <identifier>[, <identifier>] : <dataType>;
Code examples
Python
VB.NET
Pascal
Sometimes we use a value in a solution that never changes, for example, the value of the
mathematical constant pi (π). Instead of using the actual value in program statements, it is
good practice and helps readability, if we give a constant value a name and declare it at the
beginning of the program.
CONSTANT <identifier> = <value>
For example:
CONSTANT Pi = 3.14
Syntax definitions
Python
VB.NET
Const <identifier> = <value>
Pascal
Const <identifier> = <value>;
Code examples
Python
PI = 3.14
Python convention is to write constant
identifiers using all capital letters. The values
can be changed, although you should treat
constants as not changeable.
VB.NET
Const Pi = 3.14
Pascal
Const Pi = 3.14;
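The benefit of naming a constant can be seen in a short Python sketch. The function circle_area is a hypothetical example added for illustration. Note that Python does not enforce the constant: PI is protected only by the naming convention.

```python
PI = 3.14   # constant by convention: capital letters signal "do not change"


def circle_area(radius):
    # The named constant makes the formula easier to read than a bare 3.14.
    return PI * radius * radius


print(circle_area(2))
```

If a more precise value were needed later, only the single declaration at the top of the program would have to change.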
Assignment of variables
Once we have declared a variable, we can assign a value to it (See Chapter 11, Section
11.04).
In pseudocode, assignment statements are written as:
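<identifier> <- <expression>
Python
<identifier> = <expression>
A = 34
B = B + 1
VB.NET
<identifier> = <expression>
A = 34
B = B + 1
Pascal
<identifier> := <expression>;
A := 34;
B := B + 1;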
VB.NET allows you to initialise a variable as part of the declaration statement, for example:
Arithmetic operators
Assignments don’t just give initial values to variables. We also use an assignment when we
need to store the result of a calculation. The arithmetic operators used for calculations are
shown in Table 13.01.
Operation          Pseudocode   Python   VB.NET   Pascal
Addition           +            +        +        +
Subtraction        -            -        -        -
Multiplication     *            *        *        *
Division           /            /        /        /
Exponent           ^            **       ^        Not available
Integer division   DIV          //       \        Div
Modulus            MOD          %        Mod      Mod
When more than one operator appears in an expression, the order of evaluation depends
on the mathematical rules of precedence: parentheses, exponentiation, multiplication,
division, addition, subtraction.
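The effect of these rules can be checked with a few Python expressions. The numbers are chosen only for illustration; remember that Python writes exponentiation as ** and integer division as //.

```python
results = [
    2 + 3 * 4,      # multiplication before addition
    (2 + 3) * 4,    # parentheses first
    2 * 3 ** 2,     # exponentiation before multiplication
    (2 * 3) ** 2,   # parentheses first
    7 // 2,         # integer division
    7 % 2,          # modulus (remainder)
]
print(results)  # prints [14, 20, 18, 36, 3, 1]
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.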
KEY TERMS
Question 13.01
4 * 3 - 3 ^ 2
(4 * 3 - 3) ^ 2
4 * (3 - 3) ^ 2
4 * (3 - 3 ^ 2)
OUTPUT <string>
OUTPUT <identifier(s)>
When outputting text and data to the console screen, we can list a mixture of output strings
and variable values in the print list.
Syntax definitions
Python
print(<printlist>)
print(<printlist>, end = '')
VB.NET
Console.WriteLine(<printlist>)
Console.Write(<printlist>)
Pascal
WriteLn(<printlist>);
Write(<printlist>);
Code examples
In the examples below, the print list consists of four separate items:
YourName and Number1 are variables, for which we print the value.
In pseudocode, we can indicate whether a new line should be output at the end by a
comment at the end of the statement.
Python
Console.Write ("Hello")
Pascal
In the code examples above you can see how output statements can be spread over more than
one line when they are very long. You must break the line between two print list items. You
cannot break in the middle of a string, unless you make the string into two separate strings.
In Python and VB.NET you can also use the placeholder method for output: the variables
to be printed are represented by sequential numbers in {} in the message string and the
variables are listed in the correct order after the string, separated by commas:
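For instance, a minimal Python sketch of the placeholder method might look like this. The message text and the values given to YourName and Number1 are invented for illustration; str.format is the standard Python string method that fills the numbered placeholders.

```python
YourName = "Fred"
Number1 = 34

# {0} is replaced by the first argument of format, {1} by the second.
message = "Hello {0}. Your number is {1}".format(YourName, Number1)
print(message)   # prints Hello Fred. Your number is 34
```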
Python
VB.NET
When coding an input statement, it is good practice to prompt the user as to what they are
meant to enter. For example, consider the pseudocode statement:
Note the space between the colon and the closing quote. This is significant. It gives a space
before the user types their input.
VB.NET
A = Console.ReadLine()
Pascal
ReadLn(A);
Comments
Python
# this is a comment
# this is another comment
VB.NET
' this is a comment
' this is another comment
Pascal
// this is a comment
TASK 13.01
Use the IDE of your chosen programming language (in future just referred to as ‘your
language’). Type the program statements equivalent to the following pseudocode (you may
need to declare the variable YourName first):
Save your program as Example1 and then run it. Is the output as you expected?
Every programming language has built-in data types. Table 13.02 gives a subset of those
available. The number of bytes of memory allocated to a variable of the given type is given in
brackets for VB.NET and Pascal.
Description of data
Pseudocode
Python
VB.NET
Pascal
int
Integer (4 bytes)
Integer (4 bytes)
REAL
float
Single (4 bytes)
Double (8 bytes)
Real (8 bytes)
A single alphanumeric
character
CHAR
Not available
char (2 bytes -
Unicode)
Char (1 byte-ASCII)
A sequence of
alphanumeric
characters (a string)
STRING
Logical values:
True (represented as 1)
and
False (represented as 0)
BOOLEAN
bool
Boolean (2 bytes)
Boolean (1 byte)
See Chapter 1 (Sections 1.02 and 1.03) on how integers and characters are represented
inside the computer. Chapter 16 (Section 16.03) covers the internal representation of real
(single, double, float) numbers.
The string data type is known as a structured type because it is essentially a sequence
of characters. A special case is the empty string: a value of data type string, but with no
characters stored in it. In VB.NET, each character in a string requires two bytes of memory
and each character is represented in memory as Unicode (in which, the values from 1 to
127 correspond to ASCII).
In Pascal, a string occupies as many bytes as its maximum length plus one. The first byte
contains the current length of the string and the following bytes contain the characters of the
string (stored as ASCII). Because the largest unsigned integer that can be stored in a byte is
255, the maximum length of a string is 255 characters.
Date and currency have various internal representations but are output in conventional
format (except in Pascal where you have to do a string conversion for dates).
Description of data   Pseudocode   Python                                  VB.NET               Pascal
Date value            DATE         Not available as a built-in data type   Date (8 bytes)       TDateTime (8 bytes)
Monetary value        CURRENCY     Not available                           Decimal (16 bytes)   Currency (8 bytes)
In Python, date and currency are not available as built-in data types. A date is stored as the
number of days after 1/1/0001, using the datetime class (see Section 13.08). For currency,
use float.
VB.NET stores dates and times from 1.1.0001 (0 hours) to 31.12.9999 (23:59:59 hours) with a
resolution of 100 nanoseconds (this unit is called a ‘tick’). Floating-point (decimal) numbers
are stored in binary-coded decimal format (see Section 1.02).
Pascal stores dates and times internally as a real number: the whole number part represents
the days since 30/12/1899 and the fractional part represents the part of the day that has
elapsed (time). Currency values are stored internally as a scaled and signed 64-bit integer
with the least significant four digits implicitly representing four decimal places.
There are many more data types. Programmers can also design and declare their own data
types (see Chapter 16 (Section 16.01) and Chapter 26 (Section 26.02)).
TASK 13.02
1 Look at the identifier tables in Chapter 11 (Tables 11.02 and 11.04 to 11.12). Decide which
data type from your language is appropriate for each variable listed.
2 Write program code to implement the pseudocode from Worked Example 11.01 in Chapter
11.
In Chapter 11 (Section 11.05), we covered logic statements. These were statements that
included a condition. Conditions are also known as Boolean expressions and evaluate to
either True or False. True and False are known as Boolean values.
Simple Boolean expressions involve comparison operators (Table 13.04). Complex Boolean
expressions also involve Boolean operators (Table 13.05).
Operation                  Pseudocode   Python   VB.NET   Pascal
equal                      =            ==       =        =
not equal                  <>           !=       <>       <>
greater than               >            >        >        >
less than                  <            <        <        <
greater than or equal to   >=           >=       >=       >=
less than or equal to      <=           <=       <=       <=
Operation                Pseudocode   Python   VB.NET   Pascal
AND                      AND          and      And      AND
OR (logical inclusion)   OR           or       Or       OR
NOT                      NOT          not      Not      NOT
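A few Python lines show simple and complex Boolean expressions being evaluated. The variable names and the value of x are chosen only for this illustration.

```python
x = 7

is_positive = x > 0                 # simple: one comparison operator
in_range = x >= 1 and x <= 10       # complex: two comparisons joined by and
outside = not in_range or x != 7    # not is applied before or

print(is_positive, in_range, outside)   # prints True True False
```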
13.05 Selection
IF...THEN statements
IF <Boolean expression>
   THEN
      <statement(s)>
ENDIF
Syntax definitions
Python
if <Boolean expression>:
    <statement(s)>
VB.NET
If <Boolean expression> Then
    <statement(s)>
End If
Pascal
if <Boolean expression>
   then
      <statement>;
Code examples
Pseudocode example:
IF x < 0
   THEN
      OUTPUT "Negative"
ENDIF
Python
if x < 0:
    print("Negative")
VB.NET
If x < 0 Then
    Console.WriteLine("Negative")
End If
Pascal
if x < 0
   then
      WriteLn('Negative');
TASK 13.03
Write program code to implement the pseudocode from Worked Example 11.03 in Chapter
11.
IF...THEN...ELSE statements
IF <Boolean expression>
   THEN
      <statement(s)>
   ELSE
      <statement(s)>
ENDIF
Syntax definitions
Python
if <Boolean expression>:
    <statement(s)>
else:
    <statement(s)>
VB.NET
If <Boolean expression> Then
    <statement(s)>
Else
    <statement(s)>
End If
Pascal
if <Boolean expression>
   then
      <statement>
   else
      <statement>;
the else.
Code examples
Pseudocode example:
IF x < 0
   THEN
      OUTPUT "Negative"
   ELSE
      OUTPUT "Positive"
ENDIF
Python
if x < 0:
    print("Negative")
else:
    print("Positive")
VB.NET
If x < 0 Then
    Console.WriteLine("Negative")
Else
    Console.WriteLine("Positive")
End If
Pascal
if x < 0
   then
      WriteLn('Negative')
   else
      WriteLn('Positive');
Nested IF statements
In pseudocode, the nested IF statement is written as:
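IF <Boolean expression>
   THEN
      <statement(s)>
   ELSE
      IF <Boolean expression>
         THEN
            <statement(s)>
         ELSE
            <statement(s)>
      ENDIF
ENDIF
Syntax definitions
Python
if <Boolean expression>:
    <statement(s)>
elif <Boolean expression>:
    <statement(s)>
else:
    <statement(s)>
VB.NET
If <Boolean expression> Then
    <statement(s)>
ElseIf <Boolean expression> Then
    <statement(s)>
Else
    <statement(s)>
End If
Pascal
if <Boolean expression>
   then
      <statement>
   else
      if <Boolean expression>
         then
            <statement>
         else
            <statement>;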
Code examples
Pseudocode example:
IF x < 0
   THEN
      OUTPUT "Negative"
   ELSE
      IF x = 0
         THEN
            OUTPUT "Zero"
         ELSE
            OUTPUT "Positive"
      ENDIF
ENDIF
Python
if x < 0:
    print("Negative")
elif x == 0:
    print("Zero")
else:
    print("Positive")
VB.NET
If x < 0 Then
    Console.WriteLine("Negative")
ElseIf x = 0 Then
    Console.WriteLine("Zero")
Else
    Console.WriteLine("Positive")
End If
Pascal
if x < 0
   then
      WriteLn('Negative')
   else
      if x = 0
         then
            WriteLn('Zero')
         else
            WriteLn('Positive');
TASK 13.04
Write program code to implement the pseudocode from Worked Example 11.02 in Chapter
11.
CASE statements
An alternative selection construct is the CASE statement. Each considered CASE condition
can be:
• a single value
• a range.
CASE OF <identifier>
   <value1> : <statement(s)>
   <value2>, <value3> : <statement(s)>
   <value4> TO <value5> : <statement(s)>
   OTHERWISE <statement(s)>
ENDCASE
The value of <identifier> determines which statements are executed. There can be as many
separate cases as required. The OTHERWISE clause is optional and useful for error trapping.
Syntax definitions
Python
Python does not have a CASE statement. You need to use nested If
statements instead.
VB.NET
Select Case <expression>
    Case value1
        <statement(s)>
    Case value2, value3
        <statement(s)>
    Case value4 To value5
        <statement(s)>
    Case Else
        <statement(s)>
End Select
Pascal
case <expression> of
   value1: <statement>;
   value2, value3: <statement>;
   value4..value5: <statement>;
else
   <statement>;
end;
Code examples
In pseudocode, an example CASE statement is:
CASE OF Grade
OTHERWISE
ENDCASE
Python
if Grade == "A":
else:
VB.NET
Case "A"
Case "F", "U"
Case Else
Pascal
case Grade of
'F', 'U' : WriteLn('Fail');
'B'..'E' : WriteLn('Pass');
else
WriteLn('Invalid grade');
end;
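Because Python has no CASE statement, the grade example has to be written with if, elif and else. The sketch below is one possible version: the function name grade_message is hypothetical, and the message for grade "A" is an assumption; the other messages follow the Pascal example above.

```python
def grade_message(grade):
    """Return the message for a single-letter grade."""
    if grade == "A":                        # a single value
        return "Top grade"                  # assumed message for this sketch
    elif grade in ("F", "U"):               # a list of values
        return "Fail"
    elif grade in ("B", "C", "D", "E"):     # the range "B" to "E"
        return "Pass"
    else:                                   # plays the role of OTHERWISE
        return "Invalid grade"


for grade in ["A", "C", "U", "Z"]:
    print(grade, grade_message(grade))
```

The final else branch does the job of the OTHERWISE clause and catches any input that is not a valid grade.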
TASK 13.05
The problem to be solved: the user enters the number of the month and year. The output
is the number of days in that month. The program has to check if the year is a leap year for
February.
INPUT MonthNumber
INPUT Year
Days <- 0
CASE OF MonthNumber
ENDIF
ENDCASE
OUTPUT Days
13.06 Iteration
ENDFOR
The control variable starts with value s, increments by value i each time round the loop and
finishes when the control variable reaches the value e.
Syntax definitions
Python
VB.NET
Next
<statement>;
Code examples
Python
Output: 0 1 2 3 4
Output: 2 5 8 11
Output: 5 4 3 2
Output: abc
VB.NET
For x = 1 To 5
Next
Output: 1 2 3 4 5
For x = 2 To 14 Step 3
Next
Output: 2 5 8 11 14
For x = 5 To 1 Step -1
Next
Output: 5 4 3 2 1
Next
Output:
1.5
2.5
Console.Write (x)
Next
Pascal
for x := 1 to 5 do
write (x);
Output: 1 2 3 4 5
for x := 5 downto 1 do
write (x);
Output: 5 4 3 2 1
write (x);
Output: abc
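For comparison, count-controlled loops in Python are usually written with range. The sketch below is an illustration rather than the book's own listing; it produces the four Python outputs quoted earlier in this section. Note that range stops before its end value, which is why the second loop does not reach 14 and the third does not reach 1.

```python
for x in range(5):            # start defaults to 0, stops before 5
    print(x, end=" ")         # 0 1 2 3 4
print()

for x in range(2, 14, 3):     # start 2, step 3, stops before 14
    print(x, end=" ")         # 2 5 8 11
print()

for x in range(5, 1, -1):     # counts down, stops before 1
    print(x, end=" ")         # 5 4 3 2
print()

for x in ["a", "b", "c"]:     # a for loop can also step through a list
    print(x, end="")          # abc
print()
```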
TASK 13.06
1 Write program code to implement the pseudocode from Worked Example 11.05 in Chapter
11.
2 Write program code to implement the pseudocode from Worked Example 11.08 in Chapter
11.
3 Write program code to implement the pseudocode from Worked Example 11.09 in Chapter
11.
Post-condition loops
A post-condition loop, as the name suggests, executes the statements within the loop at
least once. When the condition is encountered, it is evaluated. As long as the condition
evaluates to False the statements within the loop are executed again. When the condition
evaluates to True, execution will go to the next statement after the loop.
When coding a post-condition loop, you must ensure that there is a statement within the
loop that will at some point change the end condition to True. Otherwise the loop will
execute forever.
REPEAT
   <statement(s)>
UNTIL <condition>
Syntax definitions
Python
VB.NET
Do
    <statement(s)>
Loop Until <condition>
Pascal
repeat
   <statement(s)>;
until <condition>;
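Python has no dedicated post-condition loop, but the same behaviour can be obtained with while True and a break placed at the end of the loop body. The sketch below is one common way of doing this, not the only one. To keep it runnable without a keyboard, the user's answers are taken from a prepared list (an assumption made for this illustration); in a real program the marked line would call input instead.

```python
answers = iter(["N", "maybe", "Y"])   # stand-in for keyboard input

attempts = 0
while True:
    answer = next(answers)    # real program: answer = input("Enter Y or N: ")
    attempts += 1
    if answer == "Y":         # the UNTIL condition, tested after the body
        break

print(attempts)   # prints 3
```

The loop body always runs at least once because the condition is only tested after the statements have been executed.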
Code examples
Pseudocode example:
REPEAT
VB.NET
Do
Pascal
repeat
ReadLn (Answer) ;
TASK 13.07
1 Write program code to implement the pseudocode from Worked Example 11.04 in Chapter
11.
2 Write program code to implement the first algorithm from Worked Example 11.06 in
Chapter 11.
Pre-condition loops
Pre-condition loops, as the name suggests, evaluate the condition before the statements
within the loop are executed. Pre-condition loops will execute the statements within the loop
as long as the condition evaluates to True. When the condition evaluates to False, execution
will go to the next statement after the loop. Note that any variable used in the condition must
not be undefined when the loop structure is first encountered.
When coding a pre-condition loop, you must ensure that there is a statement within the loop
that will at some point change the value of the controlling condition. Otherwise the loop will
execute forever.
WHILE <condition>
   <statement(s)>
ENDWHILE
Syntax definitions
Python
while <condition>:
    <statement(s)>
VB.NET
Do While <condition>
    <statement(s)>
Loop
Do Until <condition>
    <statement(s)>
Loop
Pascal
while <condition> do
   <statement>;
Code examples
Pseudocode example:
Answer <- ""
WHILE Answer <> "Y"
Python
Answer = ''
Loop
Answer = ""
Loop
Pascal
Answer := '';
begin
ReadLn (Answer) ;
end;
while x < 10 do
x := x + 1;
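The same counting loop in Python shows the defining property of a pre-condition loop: if the condition is already False on entry, the body is not executed at all. The function count_up is a hypothetical wrapper added so that two starting values can be compared.

```python
def count_up(x):
    """Increase x until it reaches 10; return the final x and the pass count."""
    passes = 0
    while x < 10:       # condition tested before each pass
        x = x + 1       # this statement eventually makes the condition False
        passes += 1
    return x, passes


print(count_up(7))    # prints (10, 3)
print(count_up(12))   # prints (12, 0): the loop body never runs
```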
TASK 13.08
Write program code to implement the second algorithm from Worked Example 11.06 in
Chapter 11.
Which loop structure to use?
If you know how many times around the loop you need to go when the program execution
gets to the loop statements, use a count-controlled loop. If the termination of the loop
depends on some condition determined by what happens within the loop, then use a
conditional loop. A pre-condition loop has the added benefit that the loop may not be
entered at all, if the condition does not require it.
13.07 Arrays
Traditionally, an array is a static data structure. This means the array is declared with a
specified number of elements of one specified data type and this does not change after
compilation. However, many programming languages now allow an array to be dynamic. This
means the array can grow in size if required.
Creating 1D arrays
When we write a list on a piece of paper and number the individual items, we would normally
start the numbering with 1. You can view a 1D array like a numbered list of items. VB.NET
and Python number array elements from 0 (the lower bound). Depending on the problem to
be solved, it might make sense to ignore element 0. Pascal allows you to choose your lower
bound to be any integer. The upper bound is the largest number used for numbering the
elements of an array.
Syntax definitions
Python
VB.NET
Dim () As *
Pascal
Pseudocode example:
DECLARE List1 : ARRAY[1:3] OF STRING // 3 elements
DECLARE List2 : ARRAY[0:5] OF INTEGER // 6 elements
DECLARE
DECLARE
Python
List1 = []
List2 = [0, 0, 0, 0, 0, 0]
AList = [""] * 26
VB.NET
Pascal
Accessing 1D arrays
Code examples
Pseudocode example:
Python
NList[24] = 0
VB.NET
NList (25) = 0
Pascal
NList [24] := 0
AList['D'] := 'D'
In Python, you can print the whole contents of a list using print (List). In VB.NET and
Pascal, you need to use a loop to print one element of an array at a time.
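Both ways of outputting a list are shown in the Python sketch below. The list name NList follows the examples above; its size of five elements and the stored value are chosen only to keep the illustration short.

```python
NList = [0] * 5        # a list of five integers, indexed 0 to 4
NList[2] = 7           # access a single element by its index

print(NList)           # prints [0, 0, 7, 0, 0]

# Element by element, as would be needed in VB.NET or Pascal
for index in range(len(NList)):
    print(index, NList[index])
```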
TASK 13.09
1 Write program code to implement the pseudocode from Worked Example 11.10 in Chapter
11.
2 Write program code to implement the pseudocode from Worked Example 11.11 in Chapter
11.
3 Write program code to implement the improved algorithm from Worked Example 11.12 in
Chapter 11.
Creating 2D arrays
When we write a table of data (a matrix) on a piece of paper and want to refer to individual
elements of the table, the convention is to give the row number first and then the column
number. When declaring a 2D array, the number of rows is given first, then the number of
columns. Again we have lower and upper bounds for each dimension. VB.NET and Python
number all elements from 0.
DECLARE : ARRAY [: /
:] OF
Syntax definitions
Python
VB.NET
Pascal
Code examples
To declare a 2D array to represent a game board of six rows and seven columns, the
pseudocode statement is:
Python
VB.NET
Pascal
Board = [[0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0]]
Board = [[0] * 7 for Row in range(6)]
Accessing 2D arrays
A specific element in a table is accessed using an index pair. In pseudocode this is written
as:
[x, y]
Code examples
Pseudocode example:
The following code examples demonstrate how to access elements in each of the three
languages.
Python
VB.NET
Board(3, 4) = 0
Pascal
Board[3, 4] := 0
When the array was
declared, the elements were
numbered from 1.
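The following Python sketch builds the six-by-seven game board and accesses one element using a row index followed by a column index. The second list, Shared, is a hypothetical name used here only to illustrate a common pitfall: repeating the outer list with * does not create six separate rows.

```python
# Build six separate rows, each holding seven zeros
Board = [[0] * 7 for Row in range(6)]

# Row index first, then column index (both count from 0 in Python)
Board[2][3] = 1

# Pitfall: this creates six references to the SAME row
Shared = [[0] * 7] * 6
Shared[2][3] = 1       # the change shows up in every row

for Row in Board:
    print(Row)
```

Only the third row of Board changes, whereas every row of Shared shows the new value.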
TASK 13.10
Write program code to implement the pseudocode from Worked Example 11.13; first
initialise
the table and then output its contents.
Programming environments provide many built-in functions. Some of them are always
available to use; some need to be imported from specialist module libraries.
Discussion Point:
Investigate your own programming environment and research other library routines.
Description | Pseudocode | Python | VB.NET | Pascal
Access a single character using its position P in a string | ThisString[P] (counts from 1) | ThisString[P] (counts from 0) | ThisString(P) (counts from 0) | ThisString[P] (counts from 1)
Return the character associated with the specified character code | CHAR(i) | chr(i) | Chr(i) | Chr(i)
Return an integer value representing the character code of the specified character | ASCII(ch) | ord(ch) | Asc(ch) | Ord(ch)
Return an integer that contains the number of characters in string S | LENGTH(S) | len(S) | Len(S) | Length(S)
Return a substring of length L from the left of string S | LEFT(S, L) | S[0:L] (see the next section, on slicing) | Left(S, L) | LeftStr(S, L)
Return a substring of length L from the right of string S | RIGHT(S, L) | S[-L:] (see the next section, on slicing) | Right(S, L) | RightStr(S, L)
Return a substring of length L from position P in string S | MID(S, P, L) | S[P : P + L] (see the next section, on slicing) | Mid(S, P, L) | MidStr(S, P, L)
Join strings | S = S1 + S2 | S = S1 + S2 | S = S1 & S2 | S := S1 + S2;
Slicing in Python
In Python a subsequence of any sequence type (e.g. lists and strings) can be created using
‘slicing’.
A slice of a string is a substring. For example, to get a substring of length L from position P in
string S we write S[P : P + L].
Figure 13.05 ThisString with its elements numbered from [0] to [6]
If you imagine the numbering of each element to start at the left-hand end (as shown in
Figure 13.05), then it is easier to see how the left element (the lower bound) is included, but
the right element (the upper bound) is excluded. Table 13.07 shows some other useful slices
in Python.
Expression | Result | Explanation
ThisString[2:] | CDEFG |
ThisString[:2] | AB |
ThisString[-2:] | FG |
ThisString[:-2] | ABCDE |
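The slices above can be tried out with a short Python sketch. The value "ABCDEFG" for ThisString is an assumption inferred from the results in the table; P and L are example values, and P counts from 0.

```python
ThisString = "ABCDEFG"   # assumed value, consistent with the table results
P = 2                    # start position, counting from 0
L = 3                    # length of the substring wanted

Left = ThisString[0:L]       # the first L characters
Right = ThisString[-L:]      # the last L characters
Mid = ThisString[P:P + L]    # L characters starting at position P

print(Left, Right, Mid)
print(ThisString[2:], ThisString[:2], ThisString[-2:], ThisString[:-2])
```

In each slice the element at the lower bound is included and the element at the upper bound is excluded.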
Rounding numbers
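Sometimes we need to round numbers after a calculation involving real numbers. Rounding is
done away from zero. This means that 0.5 is rounded to 1 and -0.5 is rounded to -1. Note that
some built-in rounding functions treat a value exactly halfway between two integers differently:
for example, round(x) in Python 3 rounds such a value to the nearest even integer, so
round(0.5) returns 0.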
Python
VB.NET
Pascal
Round (x)
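The following Python sketch compares the built-in round() function with rounding away from zero. RoundHalfAwayFromZero is a hypothetical helper written for this illustration; it is not a library function.

```python
import math

def RoundHalfAwayFromZero(x):
    # Hypothetical helper: a value ending in .5 moves away from zero
    if x >= 0:
        return math.floor(x + 0.5)
    return math.ceil(x - 0.5)

# Built-in round() in Python 3 sends halfway values to the nearest even integer
print(round(0.5), round(1.5), round(2.5))

# The helper rounds the same kind of value away from zero
print(RoundHalfAwayFromZero(0.5), RoundHalfAwayFromZero(-0.5), RoundHalfAwayFromZero(2.5))
```

The difference only shows for values exactly halfway between two integers; for other values both approaches give the same result.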
Truncating numbers
Instead of rounding, sometimes we just want the whole number part of a real number.
This is known as ‘truncation’.
Python
int (x)
VB.NET
Pascal
Trunc(x)
Sometimes a whole number may be held as a string. To use such a number in a calculation,
we first need to convert it to an integer. For example, these functions return the integer value
5 from the string "5":
Python
int(S)
VB.NET
CInt(S)
Pascal
StrToInt(S)
Sometimes a number with a decimal point may be held as a string. To use such a number in a
calculation, we first need to convert it to a real (float). For example, these functions return the
real number 75.43 from the string "75.43":
Python
float(S)
The returned value is a floating-point
number.
VB.NET
CDbl(S)
Pascal
StrToFloat (S)
When we want to present output in a tabulated way, we need to format the output statement.
Python
VB.NET
Pascal
Write (:W:D) ;
WriteLn(Pi:5:2);
Random numbers are often required for simulations. Most programming languages have
various random number generators available. As the random numbers are generated
through a program, they are referred to as ‘pseudo-random’ numbers.
Python
VB.NET
Dim x As Integer
x = RandomNumber.Next(1, 6)
Pascal
Random (6)
The simplest option returns a random
number between 0 (inclusive) and
6 (exclusive).
Randomize;
RandomRange(1, 6)
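As a sketch of how pseudo-random numbers can be generated in Python, the following code uses the standard random module. The identifiers DiceThrows, Fraction and Index are chosen for this illustration only.

```python
import random

# Optional: seeding makes the sequence repeatable, which helps when testing
random.seed(1)

# randint includes BOTH bounds, so this simulates throwing a die
DiceThrows = [random.randint(1, 6) for Count in range(10)]

# random() returns a real number from 0.0 (inclusive) up to 1.0 (exclusive)
Fraction = random.random()

# randrange(6) returns an integer from 0 (inclusive) up to 6 (exclusive)
Index = random.randrange(6)

print(DiceThrows, Fraction, Index)
```

Check carefully whether the upper bound is included: this differs between functions and between languages.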
TASK 13.11
2 Write program code to implement the pseudocode using a pre-condition loop from
Worked Example 11.07 in Chapter 11.
Sometimes we want to work with the current time and date. The system clock can provide
this. There are many functions available to manipulate dates and times. Most are beyond the
scope of this book. Here are just a few basic functions.
Python
print (SomeDate)
print (Today)
SomeDate = SomeDate + timedelta(1)
1 day.
VB.NET
SomeDate = #3/15/2015#
Today = Now()
SomeDate =
number of days.
Pascal
TDateTime;
StrToDate(DateString);
Today := Date();
DateString :=
DateToStr (SomeDate) ;
WriteLn(DateString) ;
DateString := DateToStr(Today);
WriteLn (DateString);
SomeDate := SomeDate + 1;
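A minimal Python sketch of these date operations, using the standard datetime module, is shown below. The date 15 March 2015 is taken from the VB.NET example; the identifiers NextDay and Tomorrow and the output format are assumptions made for this illustration.

```python
from datetime import date, timedelta

SomeDate = date(2015, 3, 15)        # year, month, day
Today = date.today()                # read from the system clock

NextDay = SomeDate + timedelta(1)   # timedelta(1) represents 1 day
Tomorrow = Today + timedelta(1)

# strftime formats a date as a string; %d/%m/%Y gives day/month/year
print(SomeDate.strftime("%d/%m/%Y"))
print(NextDay.strftime("%d/%m/%Y"))
print(Today, Tomorrow)
```

Adding a timedelta takes care of month and year boundaries, so the day after 31 December is 1 January of the following year.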
TASK 13.12
Write program code to get today’s date from the system clock and output it with a suitable
message. Also output tomorrow’s date with a suitable message. Will your program give the
correct information, regardless of which day it is executed?
Discussion Point:
What other useful functions can you find? Which module libraries have you searched?
A text file consists of a sequence of characters formatted into lines. Each line is terminated by
an end-of-line marker. The text file is terminated by an end-of-file marker.
Note: you can check the contents of a text file (or even create a text file required by a
program) by using a text editor such as NotePad.
The following code examples demonstrate how to open, write to and close a file called
SampleFile.TXT in each of the three languages. If the file already exists, it is overwritten as
soon as the file handle is assigned by the ‘open file’ command.
Python
FileHandle.close()
VB.NET
FileHandle = New
IO.StreamWriter ("SampleFile.TXT")
FileHandle.WriteLine(LineOfText)
FileHandle.Close()
Pascal
WriteLn(FileHandle, LineOfText) ;
CloseFile (FileHandle);
An existing file can be read by a program. The following pseudocode statements provide
facilities for reading from a file:
The following code examples demonstrate how to open, read from and close a file called
SampleFile.TXT in each of the three languages.
Python
LineOfText = FileHandle.readline()
FileHandle.close()
VB.NET
IO.StreamReader
IO.StreamReader("SampleFile.TXT")
LineOfText = FileHandle.ReadLine()
FileHandle.Close()
Pascal
ReadLn(FileHandle, LineOfText);
CloseFile (FileHandle);
Sometimes we may wish to add data to an existing file rather than creating a new file. This
can be done in Append mode. It adds the new data to the end of the existing file.
Python
FileHandle.close()
VB.NET
FileHandle = New
IO.StreamWriter("SampleFile.TXT", True)
FileHandle.Close ()
Pascal
AssignFile(FileHandle, 'SampleFile.TXT') ;
Append (FileHandle) ;
WriteLn(FileHandle, LineOfText) ;
CloseFile (FileHandle);
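The three file modes can be brought together in one Python sketch. It assumes the file is created in a temporary folder (so that no existing SampleFile.TXT is overwritten); the identifiers FileName and Remaining and the text written are chosen for this illustration.

```python
import os
import tempfile

# Assumption of this sketch: work in a temporary folder
FileName = os.path.join(tempfile.mkdtemp(), "SampleFile.TXT")

FileHandle = open(FileName, "w")     # "w": create the file, or overwrite it
FileHandle.write("First line\n")     # write() does not add the end-of-line marker
FileHandle.close()

FileHandle = open(FileName, "a")     # "a": append to the end of the existing file
FileHandle.write("Second line\n")
FileHandle.close()

FileHandle = open(FileName, "r")     # "r": read from the start of the file
LineOfText = FileHandle.readline()   # reads one line, including its end-of-line marker
Remaining = FileHandle.read()        # reads whatever is left in the file
FileHandle.close()

print(LineOfText, end="")
print(Remaining, end="")
```

Note that close is a method, so it needs brackets: FileHandle.close() closes the file, whereas FileHandle.close on its own does nothing.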
If we want to read a file from beginning to end we can use a conditional loop. Text files
contain a special marker at the end of the file that we can test for. Testing for this special
end-of-file marker is a standard function in programming languages. Every time this function is
called it will test for this marker. The function will return FALSE if the end of the file is not yet
reached and will return TRUE if the end-of-file marker has been reached.
In pseudocode we call this function EOF(). We can use the construct REPEAT...UNTIL EOF().
If it is possible that the file contains no data, it is better to use the construct WHILE NOT EOF().
For example, the following pseudocode statements read a text file and output its contents:
CLOSEFILE "Test.txt"
The following code examples demonstrate how to output the contents of a file in each of the
three languages.
Python
LineOfText = FileHandle.readline()
while len(LineOfText) > 0:
    print(LineOfText, end="")   # the line already ends with an end-of-line marker
    LineOfText = FileHandle.readline()
FileHandle.close()
VB.NET
Dim LineOfText As String
FileHandle = New
Do Until FileHandle.EndOfStream
Loop
FileHandle. Close ()
Pascal
Reset (FileHandle) ;
while not EOF(FileHandle) do
begin
ReadLn(FileHandle, LineOfText);
WriteLn (LineOfText) ;
end;
CloseFile (FileHandle);
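Python has no separate end-of-file function; instead, readline() returns an empty string once the end of the file has been reached. The following self-contained sketch first creates a small test file in a temporary folder (an assumption, so that the example can run anywhere) and then reads it from beginning to end. The list Lines is a hypothetical addition that collects what was read.

```python
import os
import tempfile

# Assumption of this sketch: create a three-line test file in a temporary folder
FileName = os.path.join(tempfile.mkdtemp(), "Test.txt")
FileHandle = open(FileName, "w")
FileHandle.write("Line 1\nLine 2\nLine 3\n")
FileHandle.close()

Lines = []
FileHandle = open(FileName, "r")
LineOfText = FileHandle.readline()
while len(LineOfText) > 0:                 # an empty string means end of file
    Lines.append(LineOfText.rstrip("\n"))  # store the line without its end-of-line marker
    print(LineOfText, end="")
    LineOfText = FileHandle.readline()     # read the next line at the END of the loop body
FileHandle.close()
```

A blank line in the file is read as "\n", which has length 1, so it does not end the loop early. Because the test comes before the loop body, this structure also works for a file that contains no data.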
Exam-style Questions
1 Matt wants a program to output a conversion table for ounces to grams (1 ounce is 28.35
grams). He writes an algorithm:
Write program code to implement the algorithm. Include formatting, so that the output is
tabulated. [7]
2 Write program code to accept an input string UserID. The program is to test the UserID
format. A valid format UserID consists of three uppercase letters and four digits. The program
is to output a message whether UserID is valid or not. [5]
Fred surveys the students at his college to find out their favourite hobby. He wants to present
the data as a tally chart.
Fred plans to enter the data into the computer as he surveys the students. After data entry
is complete, he wants to output the total for each hobby.
1 Reading books
3 Sport
4 Programming
5 Watching TV
ENDFOR
a Write program code to declare and initialise the array Tally [1:5] of integer. [5]
c Write program code to declare an array to store the hobby titles and rewrite the for loop of
your program in
part (b) so that the hobby title is output before each tally. [4]
d Write program code to save the array data in a text file. [5]
e Write program code to read the data from the text file back into the initialised array. [5]
Learning objectives
14.01 Terminology
Different programming languages use different terminology for their subroutines, as listed
in Table 14.01.
Pseudocode
PROCEDURE
FUNCTION
Python
void function
fruitful function
VB
Subroutine
Function
Pascal
procedure
function
14.02 Procedures
CALL <procedureIdentifier>()
Syntax definitions
Python
def ():
<statement(s)>
VB.NET
Sub ()
End Sub
Pascal
procedure ;
begin
end;
When programming a procedure, note where the definition is written and how the procedure
is called from the main program.
Code examples
PROCEDURE InputOddNumber()
REPEAT
CALL InputOddNumber()
Python
The Python editor colour-codes the different parts of a statement. This helps
when you are typing your own code. The indentation shows which statements
are part of the loop.
The Visual Basic Express editor colour-codes different parts of the statement,
so it is easy to see if syntax errors are made. The editor also auto-indents and
capitalises keywords.
Variables need to be declared before they are used. The editor will follow the
capitalisation of the variable declaration when you type an identifier without
following your original capitalisation.
The editor is predictive: pop-up lists will show when you type the first part of a
statement.
Pascal
When you execute the main program, ReadLn keeps the run-time window
open. The main program finishes with end. (note the full stop).
TASK 14.01
Write program code to implement the pseudocode from Worked Example 12.02 in Chapter
12.
14.03 Functions
In Chapter 13 (Section 13.08), we used built-in functions. These are useful subroutines written
by other programmers and made available in module libraries. The most-used ones are
usually in the system library, so are available without having to explicitly import them.
You can write your own functions. Any function you have written can be used in another
program if you build up your own module library.
A function is used as part of an expression. When program execution gets to the statement
that includes a function call as part of the expression, the function is executed. The value
returned from this function call is then used in the expression.
When writing your own function, ensure you always return a value as part of the statements
that make up the function (the function body). You can have more than one RETURN
statement if there are different paths through the function body.
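As an illustration of a function with more than one path through its body, here is a minimal Python sketch. The function Larger and the values used are hypothetical and are not taken from the worked examples.

```python
def Larger(First, Second):
    # Hypothetical function: every path through the body returns a value
    if First > Second:
        return First
    return Second

# The function calls are used as part of an expression
Total = Larger(3, 8) + Larger(10, 4)
print(Total)
```

Each call is executed when the expression is evaluated, and the returned values take the place of the calls in the expression.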
ENDPROCEDURE
Syntax definitions
Python
def ():
<statement(s)>
return
VB.NET
Function () As
<statement(s)>
= 'Return
End Function
Pascal
function () : ;
begin
<statement(s)>;
result := ; // := ;
end;
When programming a function, the definition is written in the same place as a procedure. The
function is called from within an expression in the main program, or in a procedure.
Code example
We can write the example procedure from Section 14.02 as a function. In pseudocode, this
is:
Python
Figure 14.04 The Python editor with a function and local variable
The variable Number in Figure 14.04 is not accessible in the main program. Python's variables
are local unless declared to be global.
VB.NET
Pascal
Figure 14.05 The VB.NET editor with (a) global variables and (b) a local variable
The variable Number in Figure 14.05(a) is declared
as a global variable at the start of the module. This
is not good programming practice.
Figure 14.06 The Pascal editor with (a) global variable and (b) local variable
A global variable is available in any part of the program code. It is good programming
practice to declare a variable that is only used within a subroutine as a local variable.
In Python, every variable is local, unless it is overridden with a global declaration. In VB.NET
and Pascal, you need to write the declaration statement for a local variable within the
subroutine.
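The difference between local and global variables in Python can be sketched as follows. The identifiers Counter, Increment and LocalOnly are hypothetical, chosen only for this illustration.

```python
Counter = 0                  # global variable, declared in the main program

def Increment():
    global Counter           # without this line, Counter would be a new local variable
    Counter = Counter + 1

def LocalOnly():
    Number = 99              # local variable: exists only while the function runs
    return Number

Increment()
Increment()
Value = LocalOnly()
print(Counter, Value)
```

After the calls, Counter has been updated by the subroutine, whereas Number cannot be accessed from the main program at all.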
TASK 14.02
Write program code to implement the pseudocode from Worked Example 12.03 in Chapter
12.
The global and local variables are listed in Table 12.11.
14.04 Passing parameters to subroutines
When a subroutine requires one or more values from the main program, we supply these as
arguments to the subroutine at call time. This is how we use built-in functions. We don’t need
to know the identifiers used within the function when we call a built-in function.
When we define a subroutine that requires values to be passed to the subroutine body, we
use a parameter list in the subroutine header. When the subroutine is called, we supply
the arguments in brackets. The arguments supplied are assigned to the corresponding
parameter of the subroutine (note the order of the parameters in the parameter list must be
the same as the order in the list of arguments). This is known as the subroutine interface.
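A possible Python sketch of a subroutine interface is shown below. The function name SumRange and the call SumRange(1, 5) appear in this chapter's figures; the parameter names FirstValue and LastValue and the loop body are assumptions made for this illustration.

```python
def SumRange(FirstValue, LastValue):
    # Parameters receive the arguments in the order they are supplied
    Sum = 0
    for ThisValue in range(FirstValue, LastValue + 1):
        Sum = Sum + ThisValue
    return Sum

# 1 is assigned to FirstValue and 5 to LastValue
NewNumber = SumRange(1, 5)
print(NewNumber)
```

Swapping the arguments, as in SumRange(5, 1), would give a different result, which is why the order of the arguments must match the order of the parameters.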
FUNCTION () RETURNS
Python
Sum = 0
return Sum
VB.NET
Module Module1
Next
SumRange = Sum
End Function
Sub Main()
NewNumber = SumRange(1, 5)
Console.WriteLine(NewNumber)
Console.ReadLine()
End Sub
Pascal
program Project2;
{$APPTYPE CONSOLE}
uses
SysUtils;
Sum := 0;
Result := Sum;
end;
WriteLn(NewNumber);
ReadLn;
end.
TASK 14.03
Product ← Product * n
ENDFOR
RETURN Product
ENDFUNCTION
If a parameter is passed by value, at call time the argument can be an actual value (as we
showed in Section 14.04). If the argument is a variable, then a copy of the current value of the
variable is passed into the subroutine. The value of the variable in the calling program is not
affected by what happens in the subroutine.
For procedures, a parameter can be passed by reference. At call time, the argument must be
a variable. A pointer to the memory location of that variable is passed into the procedure. Any
changes that are applied to the variable’s contents will be effective outside the procedure in
the calling program/module.
KEYTERMS
Note that neither of these methods of parameter passing applies to Python. In Python, the
method is called pass by object reference. This is basically an object-oriented way of passing
parameters and is beyond the scope of this chapter (objects are dealt with in Chapter 27).
The important point is to understand how to program in Python to get the desired effect.
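The practical effect in Python can be seen in a short sketch. All identifiers here (Rebind, AppendItem, Increment, Value, Data) are hypothetical and chosen for this illustration.

```python
def Rebind(Number):
    # Assignment gives the LOCAL name a new value; the caller's variable is unchanged
    Number = Number + 1

def AppendItem(Items):
    # A method call changes the list object itself, which the caller also refers to
    Items.append(99)

def Increment(Number):
    # Returning the new value is the usual way to update the caller's variable
    return Number + 1

Value = 5
Rebind(Value)             # Value is still 5
Data = [1, 2]
AppendItem(Data)          # Data is now [1, 2, 99]
NewValue = Increment(Value)
print(Value, Data, NewValue)
```

For simple values such as integers and strings, a parameter therefore behaves as though it was passed by value.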
The full procedure header is written in pseudocode, in a very similar fashion to that for
function headers, as:
PROCEDURE ()
The parameter list needs more information for a procedure definition. In pseudocode, a
parameter in the list is represented in one of the following formats:
BYREF :
BYVAL <identifier2> :
The pseudocode for the pyramid example in Chapter 12 (Section 12.04) includes a procedure
definition that uses two parameters passed by value. We can now make that explicit:
PROCEDURE OutputSymbols(BYVAL NumberOfSymbols : INTEGER, BYVAL Symbol : CHAR)
DECLARE Count : INTEGER
OUTPUT NewLine
ENDPROCEDURE
In Python (Figure 14.10), all parameters behave like local variables and their effect is as
though they are passed by value.
In VB.NET (Figure 14.11), parameters default to passing by value. The keyword ByVal is
automatically inserted by the editor.
In Pascal (Figure 14.12), there is no keyword for passing by value. This is the default
method.
The pseudocode for the pyramid example generated in Chapter 12 (Section 12.04) includes a
procedure definition that uses two parameters passed by reference. We can now make that
explicit:
Spaces ← Spaces - 1
Symbols ← Symbols + 2
ENDPROCEDURE
Python does not have a facility to pass parameters by reference. Instead the subroutine
behaves as a function and returns multiple values (see Figure 14.13). Note the order of the
variables as they receive these values in the main part of the program.
NumberOfSpaces = int(input())
NumberOfSymbols = int(input())
This way of treating multiple values as a unit is called a ‘tuple’. This concept is beyond the
scope of this book. You can find out more by reading the Python help files.
In VB.NET (Figure 14.14), the ByRef keyword is placed in front of each parameter to be passed
by reference.
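A minimal Python sketch of a function that returns two values is shown below. The two assignment statements come from the pyramid example; the function name AdjustValues and the starting values are assumptions made for this illustration.

```python
def AdjustValues(Spaces, Symbols):
    # Hypothetical name: works out the values for the next row of the pyramid
    Spaces = Spaces - 1
    Symbols = Symbols + 2
    return Spaces, Symbols     # the two values are returned together as a tuple

NumberOfSpaces = 4
NumberOfSymbols = 1

# The variables on the left receive the returned values in the same order
NumberOfSpaces, NumberOfSymbols = AdjustValues(NumberOfSpaces, NumberOfSymbols)
print(NumberOfSpaces, NumberOfSymbols)
```

If the variables on the left were written in the opposite order, each would receive the wrong value, so the order matters.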
In Pascal (Figure 14.15), the keyword var is placed in front of the declaration of parameters
to be passed by reference.
TASK 14.04
1 Write program code to implement the structure chart from Figure 12.02 in Chapter 12
(for the average of two numbers).
2 Write program code to implement the structure chart from Figure 12.03 in Chapter 12
(for the number-guessing game).
3 Amend your program code from Task 14.02 to implement the interface shown in the
structure chart from Figure 12.05 in Chapter 12.
The programs in this section are full solutions to the pyramid-drawing program developed in
Chapter 12 (Section 12.04).
The parameters of the subroutines have different identifiers from the variables in the main
program. This is done deliberately, so that it is quite clear that the parameters and local
variables within a subroutine are separate from those in the calling program or module.
If a parameter is passed by reference to a procedure, the parameter identifier within the
procedure references the same memory location as the variable identifier passed to the
procedure as argument.
Python
Number = 0
while Number % 2 == 0:
return Number
def SetValues():
MaxSymbols = InputMaxNumberOfSymbols()
Spaces = (MaxSymbols + 1) // 2
Symbols = 1
Spaces = Spaces - 1
Symbols = Symbols + 2
return Spaces, Symbols
def main():
ThisSymbol, MaxNumberOfSymbols, NumberOfSpaces, NumberOfSymbols = SetValues()
while NumberOfSymbols <= MaxNumberOfSymbols:
main()
VB.NET
Module Module1
Do
Number = Console.ReadLine()
End Sub
Sub SetValues (ByRef Symbol, ByRef MaxSymbols, ByRef Spaces, ByRef Symbols)
Spaces = (MaxSymbols + 1) \ 2
Symbols = 1
End Sub
Next
End Sub
Spaces = Spaces - 1
Symbols = Symbols + 2
End Sub
Sub Main()
Do
Console.ReadLine()
End Sub
End Module
Pascal
program Project2;
{$APPTYPE CONSOLE}
uses
SysUtils;
repeat
ReadLn (Number) ;
until Number MOD 2 = 1;
end;
procedure SetValues (var Symbol : char; var MaxSymbols, Spaces, Symbols : integer);
begin
ReadLn (Symbol) ;
Symbols := 1;
end;
begin
Spaces := Spaces - 1;
Symbols := Symbols + 2;
end;
begin
repeat
end.
Discussion Point:
Can you see how the two procedures OutputSpaces and OutputSymbols have been
replaced by a single procedure OutputChars without changing the effect of the program?
• Declaration of subroutines (functions and procedures) is done before the main program
body.
• VB.NET and Pascal pass parameters by value, as a default, but can return one or more
values via parameters if they are declared as reference parameters.
• In Python, parameters can only pass values into a subroutine. The only way to update a
value of a variable in the calling program is to return one or more values from a function.
• When a subroutine is defined, parameters are the ‘placeholders’ for values passed into a
subroutine.
Exam-style Questions
1 Write program code for a procedure OutputTimesTable that takes one integer parameter,
n, and outputs the times table for n. For example the procedure call OutputTimesTable(5)
should produce:
1 x 5 = 5
2 x 5 = 10
3 x 5 = 15
4 x 5 = 20
5 x 5 = 25
6 x 5 = 30
7 x 5 = 35
8 x 5 = 40
9 x 5 = 45
10 x 5 = 50 [6]
2 Write program code for a function IsDivisible() that takes two integer parameters, x and y.
The function is to
return the value True or False to indicate whether x is exactly divisible by y. For example,
IsDivisible(24, 6)
3 A poultry farm packs eggs into egg boxes. Each box takes six eggs. Boxes must not
contain fewer than six eggs.
Write program code for a procedure EggsIntoBoxes that takes an integer parameter,
NumberOfEggs. The procedure is to calculate how many egg boxes can be filled with the
given number of eggs and how many eggs will be left over. The procedure is to return two
values as parameters, NumberOfBoxes and EggsLeftOver. [9]
Chapter 15 Software Development
Learning objectives
Problem solving
The first step in solving a problem is to define it clearly. This is usually done in structured
English (see Chapter 11, Section 11.02) and is known as a ‘specification’.
The next step is planning a solution. Sometimes there is more than one solution. You need to
decide which is the most appropriate.
Design
You have a solution in mind. How do you design the solution in detail? Chapter 11 (Section
11.04) showed that an identifier table is a good starting point. This leads you to thinking
about data structures: do you need a 1D array or a 2D array to store data while it is
processed? Do you need a file to store data long-term?
Coding
When you have designed your solution you may need to choose a suitable high-level
programming language. If you know more than one programming language, you have to
weigh up the pros and cons of each one. Looking at Chapter 13, you need to decide which
programming language would best suit the problem you are trying to solve and which
language you are most familiar with.
You implement your algorithm by converting your pseudocode into program code.
Depending on your editor you may have some helpful facilities (for features to expect see
Section 15.02).
Some syntax errors may be flagged up by your editor, so you can correct these as you go
along. A syntax error is a ‘grammatical’ error, in which a program statement does not follow
the rules of the high-level language constructs.
KEY TERMS
Syntax error: an error in which a program statement does not follow the rules of the
language
Translation
Some syntax errors may only become apparent when you are using an interpreter or
compiler to translate your program. Interpreters and compilers work differently (see Chapter
7, Section 7.05, and Chapter 20, Section 20.05). When a program compiles successfully, you
know there will be no syntax errors remaining.
This is not the case with interpreted programs. Only statements that are about to be
executed will be syntax checked. So, if your program has not been thoroughly tested, it may
still contain syntax errors.
Figure 15.01 gives an example of how a compiler flags a syntax error. The compiler stops
when it first notices a syntax error. The error is often on the previous line. The compiler can’t
tell until it gets to the next line of code and finds an unexpected keyword.
Figure 15.01 A compiler flagging a syntax error
Execution
When you start writing programs you may find it takes several attempts before the program
compiles. When it finally does, you can execute it. It may ‘crash’, meaning that it stops
working. In this case, you need to debug the code. The program may run and give you some
output. This is the Eureka moment: ‘it works!!!!’. But does the program do what it was meant
to do?
Testing
Only thorough testing can ensure the program really works under all circumstances (see
Sections 15.03 to 15.05).
Discussion Point:
Prettyprinting
Prettyprint refers to the presentation of the program code typed into an editor. It includes
indentation, colour-coding of keywords and comments.
Python
IDLE (see Figure 15.02) automatically colour-codes keywords, built-in function calls,
comments, strings and the identifier in a function header. Indentation is automatic. When
you need to unindent after a block of statements, delete the spaces provided.
VB.NET
The editor provided by Visual Studio (see Figure 15.03) automatically colour-codes keywords,
object references (such as Console), comments and strings. The editor automatically indents
blocks of code correctly.
Pascal
This Delphi editor (see Figure 15.04) emboldens keywords and colour-codes strings,
comments and system directives (such as {$APPTYPE CONSOLE}). When the programmer
indents a line of code, the next line is automatically indented by the same amount.
Context-sensitive prompts
This feature displays hints or a choice of keywords and available identifiers appropriate at the
current insertion point of the program code.
Figure 15.05 shows an example of the Visual Studio editor responding to text typed in by the
programmer.
In Figure 15.06, the Python editor, IDLE, shows the required parameters after a function
identifier has been typed in by the programmer.
When a line has been typed, some editors perform syntax checks and alert the programmer
to errors.
Figure 15.07 shows an example of the Visual Studio editor responding to a syntax error.
When working on program code consisting of many lines of code, it saves excessive scrolling
if you can collapse blocks of statements.
Figure 15.08 shows the Visual Studio editor window with the procedures collapsed, so the
programmer can see the global variable declarations and the main program body. The procedure
headings are still visible to help the programmer supply the correct arguments when calling one of
these procedures from the main program.
TASK 15.01
Investigate the facilities in the editors you have available. If you have a choice of editors, you
may like to use the editor with the most helpful facilities.
Finding syntax errors is easy. The compiler/interpreter will find them for you and usually gives
you a hint as to what is wrong.
Much more difficult to find are logic errors and run-time errors. A run-time error occurs when
program execution comes to an unexpected halt or ‘crash’ or it goes into an infinite loop and
‘freezes’.
KEY TERMS
Logic error: an error in the logic of the solution that causes it not to behave as intended
Run-time error: an error that causes program execution to crash or freeze
Both of these types of error can only be found by careful testing. The danger of such errors is that
they may only manifest themselves under certain circumstances. If a program crashes every time it
is executed, it is obvious there is an error. If the program is used frequently and appears to work until
a certain set of data causes a malfunction, that is much more difficult to discover without perhaps
serious consequences.
Stub testing
When you develop a user interface, you may wish to test it before you have implemented all
the facilities. You can write a ‘stub’ for each procedure (see Figure 15.09). The procedure body
only contains an output statement to acknowledge that the call was made. Each option the
user chooses in the main program will call the relevant procedure.
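A minimal Python sketch of stub testing is shown below. The menu options and the identifiers AddRecord, FindRecord and ProcessChoice are hypothetical; they are not taken from Figure 15.09.

```python
def AddRecord():
    # Stub: only acknowledges that the call was made
    print("AddRecord called")

def FindRecord():
    # Stub: the real code will be written later
    print("FindRecord called")

def ProcessChoice(Choice):
    # Calls the procedure for a menu choice; returns False for an invalid choice
    if Choice == 1:
        AddRecord()
    elif Choice == 2:
        FindRecord()
    else:
        print("Invalid choice")
        return False
    return True

# Try each option, including one that is not on the menu
for Choice in [1, 2, 9]:
    ProcessChoice(Choice)
```

Once the menu logic has been shown to call the right procedure for each option, the stubs can be replaced one at a time by the full procedure bodies.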
Black-box testing
As the programmer, you can see your program code and your testing will involve knowledge
of the code (see the next section, about white-box testing).
As part of thorough testing, a program should also be tested by other people, who do not see
the program code and don’t know how the solution was coded.
Such program testers will look at the program specification to see what the program is
meant to do, devise test data and work out expected results. Test data usually consists of
normal data values, boundary data values and erroneous data values.
The tester then runs the program with the test data and records their results. This method of
testing is called black-box testing because the tester can’t see inside the program code: the
program is a ‘black box’.
Where the actual results don’t match the expected results, a problem exists. This needs
further investigation by the programmer to find the reason for this discrepancy and correct
the program (see Section 15.06). Once black-box testing has established that there is an
error, other methods (see Sections 15.04 and 15.05) have to be employed to find the lines of
code that need correcting.
KEY TERMS
Black-box testing: comparing expected results with actual results when a program is run
White-box testing
How can we check that code works correctly? We choose suitable test data that checks every
path through the code.
KEY TERMS
INPUT Number1
INPUT Number2
INPUT Number3
IF Number1 > Number2
  THEN // Number1 is bigger
    IF Number1 > Number3
      THEN
        OUTPUT Number1
      ELSE
        OUTPUT Number3
    ENDIF
  ELSE // Number2 is bigger
    IF Number2 > Number3
      THEN
        OUTPUT Number2
      ELSE
        OUTPUT Number3
    ENDIF
ENDIF
To test it, we need four sets of numbers with the following characteristics:
• The first number is larger than the second number; the third number is the largest.
• The second number is larger than the first number; the third number is the largest.
Note that it does not matter what exact values are chosen as test data. The important point
is that the values differ in such a way that each part of the nested if statement is checked.
Table 15.01 lists four sets of test data and the results from them. The parts of the algorithm
not entered for a particular set of data are greyed out. This makes it easier to see that each
part has been checked after all four tests have been done.
Line of algorithm
INPUT Number1
Test 1
15
INPUT Number2
12
INPUT Number3
TRUE
THEN
TRUE
THEN
OUTPUT Number1
Output 15
ELSE
OUTPUT Number3
ENDIF
ELSE
THEN
OUTPUT Number2
ELSE
OUTPUT Number3
ENDIF
ENDIF
For more white-box testing methods see Sections 15.04 and 15.05.
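White-box testing of a nested selection can be sketched in Python as follows. The function Largest is a hypothetical rewrite of the algorithm as a function so that it can be called with test data; the four sets of test values are assumptions, chosen so that each one follows a different path through the code.

```python
def Largest(Number1, Number2, Number3):
    # Hypothetical function form of the nested IF algorithm
    if Number1 > Number2:
        if Number1 > Number3:
            return Number1      # path 1
        else:
            return Number3      # path 2
    else:
        if Number2 > Number3:
            return Number2      # path 3
        else:
            return Number3      # path 4

# One set of test data per path, each with its expected result
TestData = [
    ((15, 12, 7), 15),    # first > second, first is the largest
    ((12, 10, 15), 15),   # first > second, third is the largest
    ((12, 15, 7), 15),    # second > first, second is the largest
    ((10, 12, 15), 15),   # second > first, third is the largest
]

for Inputs, Expected in TestData:
    Actual = Largest(*Inputs)
    print(Inputs, Expected, Actual, Actual == Expected)
```

The exact values do not matter; what matters is that after all four tests every return statement has been executed at least once.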
KEYTERMS
Python
To debug using IDLE, from the Python Shell (see Figure 15.10), choose Debugger from the
Debug menu.
Open the source program from the File menu. To set a breakpoint, right-click on the line you
want to set the breakpoint on.
Start running the program by clicking the Go button in the Debug Control window. The
program stops at the breakpoint (Figure 15.11(a)). Then click the Step button to execute one
instruction at a time.
The Debug Control window (Figure 15.11(b)) shows which line number is about to be
executed (line 4 in the example). The contents of all variables are also displayed in the
Debug Control window.
Figure 15.11 (a) Python program showing a breakpoint and (b) the Debug Control window
VB.NET
In Visual Studio (see Figure 15.12(a)), you can set breakpoints by clicking in the left margin
of the editor.
Click on Run to run the program and enter data (see Figure 15.12(b)). When your program
reaches the breakpoint, use the ‘Step Into’ button to single-step through your program.
To set up a variable watch window, select Windows from the Debug menu and choose
Watch. A table is displayed at the bottom of the editor (see Figure 15.12(a)) and you can
type in the variable names you want to inspect.
Figure 15.12 VB.NET (a) program with breakpoint and (b) run window with input
Pascal
In the Delphi editor, you need to switch the debugger on before compiling your program:
in the Tools menu, select Debugger Options and ensure the Integrated Debugger option is
ticked.
You can now set breakpoints by clicking in the left margin of the editor (see Figure 15.13(a)).
Click on Run. When your program reaches the breakpoint, use the ‘Trace into’ button to
single-step through your program.
To set up a variable watch window, from the Run menu, choose ‘Add Watch...’. Type one
variable name at a time into the Expression box and click OK. To see the watch window, from
the View menu, choose ‘Debug windows’ and ‘Watches’ (see Figure 15.13(c)).
Figure 15.13 Pascal (a) program with breakpoint, (b) run window with input and (c) watch
window
A good way of checking that an algorithm works as intended is to dry-run the algorithm
using a trace table and different test data.
The idea is to write down the current contents of all variables and conditional values at each
step of the algorithm.
KEY TERMS
Trace table: a table with a column for each variable that records their changing values
WORKED EXAMPLE 15.02
Tracing an algorithm
SecretNumber ← 34
NumberOfGuesses ← 1
REPEAT
IF Guess = SecretNumber
THEN
ELSE
ENDIF
NumberOfGuesses ← NumberOfGuesses + 1
ENDIF
To test the algorithm, construct a trace table (Table 15.02) with one column for each
variable used in the algorithm and also for the condition Guess > SecretNumber
Now carefully look at each step of the algorithm and record what happens. Note that
we do not tend to write down values that don’t change. Here SecretNumber does not
change after the initial assignment, so the column is left blank in subsequent rows.
SecretNumber  Guess  NumberOfGuesses  Guess > SecretNumber  Message
34                                    FALSE                 ...larger...
              55                      TRUE                  ...smaller...
              30                      FALSE                 ...larger...
              42                      TRUE                  ...smaller...
              36                      TRUE                  ...smaller...
              33                      FALSE                 ...larger...
              34                                            ... 7 guesses
We only make an entry in a cell when an assignment occurs. Values remain in variables
until they are overwritten. So a blank cell means that the value from the previous entry
remains.
It is important to start filling in a new row in the trace table for each iteration (each time
round the loop).
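A dry run of this kind can also be mimicked in code. The following Python sketch records one row per iteration for a guessing loop of the same shape. The function name, the exact message wording and the first guess of 20 are assumptions made for illustration; only the secret number 34 and the later guesses come from the worked example.

```python
def trace_guessing_game(secret_number, guesses):
    """Dry-run a guessing loop, returning one trace-table row per iteration.

    Each row is (Guess, NumberOfGuesses, Guess > SecretNumber, Message).
    """
    rows = []
    number_of_guesses = 0
    for guess in guesses:
        number_of_guesses += 1
        if guess == secret_number:
            # The comparison Guess > SecretNumber is not evaluated on a correct guess
            rows.append((guess, number_of_guesses, "", f"{number_of_guesses} guesses"))
            break
        too_big = guess > secret_number
        message = "smaller" if too_big else "larger"
        rows.append((guess, number_of_guesses, too_big, message))
    return rows


# 20 is an assumed first guess; the remaining guesses follow the worked example.
rows = trace_guessing_game(34, [20, 55, 30, 42, 36, 33, 34])

print(f"{'Guess':>5} {'NumberOfGuesses':>15} {'Guess > SecretNumber':>20}  Message")
for guess, count, condition, message in rows:
    print(f"{guess:>5} {count:>15} {str(condition):>20}  {message}")
```

Printing a row on every pass through the loop mirrors the rule of starting a new row in the trace table for each iteration.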
Tracing an algorithm
To test the improved algorithm of Worked Example 11.12 (bubble sort), dry-run the
algorithm by completing the trace table (Table 15.03).
MaxIndex ← 7
n ← MaxIndex - 1
REPEAT
  NoMoreSwaps ← TRUE
  FOR j ← 1 TO n
    IF MyList[j] > MyList[j + 1]
      THEN
        Temp ← MyList[j]
        MyList[j] ← MyList[j + 1]
        MyList[j + 1] ← Temp
        NoMoreSwaps ← FALSE
    ENDIF
  ENDFOR
  n ← n - 1
UNTIL NoMoreSwaps = TRUE
Max
NoMoreSwaps
Temp
MyList
Index
MyList [j + 1]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
34
98
41
19
25
TRUE
FALSE
FALSE
FALSE
TRUE
98
98
TRUE
98
41
98
TRUE
98
19
98
TRUE
98
25
98
5
TRUE
FALSE
FALSE
TRUE
34
34
FALSE
TRUE
41
19
41
TRUE
41
25
41
TRUE
FALSE
2
FALSE
FALSE
TRUE
34
19
34
TRUE
34
25
34
TRUE
FALSE
FALSE
FALSE
TASK 15.02
Design a trace table for the following algorithm:
HexLength ← Length(HexString)
CASE OF HexDigit
'A': HexValue ← 10
'B': HexValue ← 11
'C': HexValue ← 12
'D': HexValue ← 13
'E': HexValue ← 14
'F': HexValue ← 15
ENDCASE
RETURN ValueSoFar
ENDFUNCTION
Maintaining programs is not like maintaining a mechanical device. It doesn't need lubricating
and parts don't wear out. Corrective maintenance of a program refers to the work required
when a program is not working correctly due to a logic error or because of a run-time error.
Sometimes program errors don’t become apparent for a long time because it is only under
very rare circumstances that there is an unexpected result or the program crashes. These
circumstances might arise because part of the program is not used often or because the
data on an occasion includes extreme values.
When a problem is reported, the programmer needs to find out what is causing the bug. To
find a bug, a programmer either uses the features found in an IDE (see Section 15.04) or a
trace table (see Section 15.05).
TASK 15.03
INPUT Binarystring
FOR i ← 1 TO StringLength
Bit ← Binarystring[i]
1 Dry-run the algorithm using '101' as the input. Complete the trace table.
2 The result should be 5. Can you find the error in the code and correct it?
Programs often get changed to make them perform functions they were not originally
designed to do.
For example, the Connect 4 game introduced in Chapter 12 (Worked Example 12.03) allows
two players, O and X, to play against each other. An amended version would be for one
player to be the computer. This would mean a single player could try to win against the
computer.
TASK 15.04
Design the algorithm to simulate the computer playing the part of Player X in Connect 4.
Chapter 15: Software Development
Exam-style Questions
ENDFUNCTION
a Dry-run the function call Binary(11) by completing the given trace table.
Number
Binarystring
PlaceValue
11
11
b i Now dry-run the function call Binary(10) by completing the given trace table.
Number
Binarystring
PlaceValue
10
11
[5]
ii The algorithm is supposed to convert a denary integer into the equivalent binary number,
stored as a string of 0s and 1s. Explain the result of each dry-run and what needs changing
in the given algorithm. [3]
Chapter 16
Data Representation
Learning objectives
This chapter must start with a clarification. It is generally accepted that a programmer writes
a program which is to be used by a ‘user’ in the same way that an operating system provides
a ‘user’ interface. However, in the activity of programming the programmer now becomes the
‘user’ of the programming language. The term ‘user-defined data type’ applies to this latter
type of user.
A non-composite data type has a definition which does not involve a reference to another
type. The simple built-in types such as integer or real are obvious examples. When a
programmer uses a simple built-in type the only requirement is for an identifier to be named
with a defined type. A user-defined type has to be explicitly defined before an identifier can
be created. Two examples are discussed here.
An enumerated data type defines a list of possible values. The following pseudocode
shows two examples of type definitions:
TYPE
The values defined in an enumerated data type are ordinal. This means that they have an
implied order of values. This makes the second example much more useful because the
ordering can be put to many uses in a program. For example, a comparison statement can
be used with the values and variables of the enumerated data type:
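As a rough Python counterpart, an enumerated type with ordered values can be sketched using the standard enum module. The type name Day and its members are hypothetical, chosen only to show how the implied ordering supports comparisons.

```python
from enum import IntEnum


class Day(IntEnum):
    """Hypothetical enumerated type; members are ordered by their integer value."""
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


today = Day.WEDNESDAY

# Because the values are ordinal, a comparison statement can be used directly.
is_weekend = today > Day.FRIDAY
print(today.name, "is at the weekend:", is_weekend)
```

IntEnum is used here rather than plain Enum because plain Enum members do not support the > and < operators.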
A pointer data type is used to reference a memory location. It may be used to construct
dynamically varying data structures.
The pointer definition has to relate to the type of the variable that is being pointed to. The
pseudocode for the definition of a pointer is illustrated by:
TYPE
TMyPointer = ^<Type name>
Declaration of a variable of pointer type does not require the caret symbol (^) to be used:
A special use of a pointer variable is to access the value stored at the address pointed to.
The pointer variable is said to be ‘dereferenced’:
ValuePointedTo ← MyPointer^
A composite user-defined data type has a definition with reference to at least one other type.
Three examples are considered here.
A record data type is the most useful and therefore most widely used. It allows the
programmer to collect together values with different data types when these form a coherent
whole.
KEY TERMS
Record data type: a data type that contains a fixed number of components, which can be of
different types
As an example, a record could be used for a program using employee data. Pseudocode for
defining the type could be:
TYPE
TEmployeeRecord
Employee1.DateEmployed ← #16/05/2017#
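In Python, a comparable record can be sketched with a dataclass. All field names other than the employment date, and the sample values, are assumptions made for illustration rather than the fields of the pseudocode definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EmployeeRecord:
    """Hypothetical record type grouping values of different data types."""
    first_name: str
    family_name: str
    date_employed: date
    salary: float


# Create a record variable and then assign to one field using dot notation.
employee1 = EmployeeRecord("Sam", "Lee", date(2017, 1, 1), 30000.0)
employee1.date_employed = date(2017, 5, 16)

print(employee1)
```

The dot notation for reaching an individual field matches the pseudocode assignment to Employee1.DateEmployed.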
A particular use of a record is for the implementation of a data structure where one or
possibly two of the variables defined are pointer variables.
A set data type allows a program to create sets and to apply the mathematical operations
defined in set theory. The following is a representative list of the operations to be expected:
• union
• difference
• intersection
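Python has a built-in set type that offers these operations, so they can be tried out directly. The two sets used here are arbitrary examples.

```python
vowels = {"a", "e", "i", "o", "u"}
word_letters = set("binary")          # {'b', 'i', 'n', 'a', 'r', 'y'}

union = vowels | word_letters         # members of either set
difference = vowels - word_letters    # vowels that do not appear in the word
intersection = vowels & word_letters  # members of both sets

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))
```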
When object-oriented programming is not being used a programmer may choose not to use
any user-defined data types. However, for any reasonably large program it is likely that their
use will make a program more understandable and less error-prone. Once the programmer
has decided because of this advantage to use a data type that is not one of the built-in types
then user-definition is inevitable. The use of, for instance, an integer variable is the same
for any program. However, there cannot be a built-in record type because each different
problem will need an individual definition of a record.
In everyday computer usage, a wide variety of file types is encountered. Examples are
graphic files, word-processing files, spreadsheet files and so on. Whatever the file type, the
content is stored using a defined binary code that allows the file to be used in the way
intended.
For the very specific task of storing data to be used by a computer program, there are only
two defined file types. A file is either a text file or a binary file. A text file, as discussed in
Chapter 13 (Section 13.09), contains data stored according to a defined character code as
defined in Chapter 1 (Section 1.03). It is possible, by using a text editor, to create a text file.
A binary file stores data in its internal representation, for example an integer value might be
stored in two bytes in two’s complement representation. This type of file will be created using
a specific program.
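The difference between the two file types can be seen by storing the same integer both ways. This Python sketch is illustrative only: the value -300, the file names and the choice of big-endian byte order are assumptions.

```python
import os
import tempfile

value = -300

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "value.txt")
    binary_path = os.path.join(folder, "value.bin")

    # Text file: the value is stored as character codes, one per digit or sign.
    with open(text_path, "w", encoding="ascii") as text_file:
        text_file.write(str(value))

    # Binary file: the value is stored in two bytes in two's complement form.
    with open(binary_path, "wb") as binary_file:
        binary_file.write(value.to_bytes(2, byteorder="big", signed=True))

    # Read both files back as raw bytes to compare what was actually stored.
    with open(text_path, "rb") as f:
        text_bytes = f.read()
    with open(binary_path, "rb") as f:
        binary_bytes = f.read()

restored = int.from_bytes(binary_bytes, byteorder="big", signed=True)
print("text file bytes:  ", text_bytes)
print("binary file bytes:", binary_bytes.hex())
print("value read back:  ", restored)
```

The text file needs four bytes for the characters of -300, whereas the binary file always needs exactly two bytes for any value in the 16-bit range.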
The organisation of a binary file is based on the concept of a record. A file contains records
and each record contains fields. Each field consists of a value.
KEY TERMS
Binary file: a file designed for storing data to be used by a computer program
Record: a collection of fields containing data values
Discussion Point:
A record is a user-defined data type. It is also a component of a file. Can there be or should
there be any relationship between these two concepts?
Serial files
A serial file contains records which have no defined order. A typical use of a serial file would
be for a bank to record transactions involving customer accounts. A program would be
running. Each time there was a withdrawal or a deposit the program would receive the
details as data input and would record these in a transaction file. The records would enter
the file in chronological order but otherwise the file would have no ordering of the records.
A text file can be considered to be a type of serial file but it is different because the file has
repeating lines which are defined by an end-of-line character or characters. There is no end-
of-record character. A record in a serial file must have a defined format to allow data to be
input and output correctly.
Sequential files
A sequential file has records that are ordered. It is the type of file suited to long-term
storage of data. As such it should be the type of file that is considered as an alternative to a
database. The discussion in Chapter 10 (Section 10.01) compared a text file with a database
but the arguments for using a database remain the same if a sequential file is used for the
comparison. In the banking scenario, a sequential file could be used as a master file for an
individual customer account. Periodically, the transaction file would be read and all affected
customer account master files would be updated.
In order to allow the sequential file to be ordered there has to be a key field for which the
values are unique and sequential but not necessarily consecutive. It is worth emphasising
the difference between key fields and primary keys in a database table, where the values
are required to be unique but not to be sequential. In a sequential file, a particular record is
found by sequentially reading the value of the key field until the required value is found.
Direct-access files
Direct-access files are sometimes referred to as ‘random-access’ files but, as with random-
access memory, the randomness is only that the access is not defined by a sequential
reading of the file. For large files, direct access is attractive because of the time that would
be taken to search through a sequential file. In an ideal scenario, data in a direct-access file
would be stored in an identifiable record which could be located immediately when required.
Unfortunately, this is not possible. Instead, data is stored in an identifiable record but finding
it may involve an initial direct access to a nearby record followed by a limited serial search.
The choice of the position chosen for a record must be calculated using data in the record
so that the same calculation can be carried out when subsequently there is a search for
the data. The normal method is to use a hashing algorithm. This takes as input the value
for the key field and outputs a value for the position of the record relative to the start of the
file. The hashing algorithm must take into account the potential maximum length of the file,
that is, the number of records the file will store. A simple example of a hashing algorithm, if
the key field has a numeric value, is to divide the value by a suitably large number and use
the remainder from the division to define the position. This method will not create unique
positions. If a hash position is calculated that duplicates one already calculated by a different
key, the next position in the file is used. This is why a search will involve a direct access
possibly followed by a limited serial search.
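The scheme described above can be sketched in Python using a list to stand in for the file. The file size of 11 records, the function names and the wrap-around from the last position back to the first are assumptions made to keep the example small.

```python
FILE_SIZE = 11  # assumed maximum number of records the file can hold


def hash_position(key):
    """Hashing algorithm: the remainder after division gives the position."""
    return key % FILE_SIZE


def insert_record(file_slots, key, data):
    """Store a record at its hash position, or at the next free position."""
    position = hash_position(key)
    for _ in range(FILE_SIZE):
        if file_slots[position] is None:
            file_slots[position] = (key, data)
            return position
        position = (position + 1) % FILE_SIZE   # collision: try the next position
    raise RuntimeError("file is full")


def find_record(file_slots, key):
    """Direct access to the hash position, then a limited serial search."""
    position = hash_position(key)
    for _ in range(FILE_SIZE):
        slot = file_slots[position]
        if slot is None:
            return None                         # an empty position ends the search
        if slot[0] == key:
            return slot[1]
        position = (position + 1) % FILE_SIZE
    return None


file_slots = [None] * FILE_SIZE
first = insert_record(file_slots, 25, "record A")    # 25 mod 11 = 3
second = insert_record(file_slots, 36, "record B")   # 36 mod 11 = 3, so position 4 is used
print(first, second, find_record(file_slots, 36))
```

Keys 25 and 36 hash to the same position, so the second record is placed in the next position and found again by the short serial search.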
File access
Once a file organisation has been chosen and the data has been entered into a file, the
question now to be considered is how this data is to be used. If an individual data item is to
be read then the access method for a serial file is to successively read record by record until
the required data is found. If the data is stored in a sequential file the process is similar but
only the value in the key field has to be read. For a direct-access file, the value in the key
field is submitted to the hashing algorithm which then provides the same value for the
position in the file that was provided when the algorithm was used at the time of data input.
File access might also be needed to delete or edit data. The normal approach with a
sequential file is to create a new version of the file. Data is copied from the old file to the new
file until the record is reached which needs deleting or editing. If deletion is needed, reading
and copying of the old file continues from the next record. If a record has changed, an edited
version of the record is written to the new file and then the remaining records are copied to
the new file. For a direct-access file there is no need to create a new file (unless the file has
become full). A deleted record can have a flag set so that in a subsequent reading process
the record is skipped over.
Serial file organisation is well suited to batch processing or for backing up data on magnetic
tape. However, if a program needs a file in which individual data items might be read,
updated or deleted then direct-access file organisation is the most suitable and serial file
organisation the least suitable.
A real number is one with a fractional part. When we write down a value for a real number in
the denary system we have a choice. We can use a simple representation or we can use an
exponential notation (sometimes referred to as scientific notation). In this latter case we
have options. For example, the number 25.3 might alternatively be written as:
A binary code must be used for storing a real number in a computer system. One possibility
is to use a fixed-point representation. In this option, an overall number of bits is chosen with
a defined number of bits for the whole number part and the remainder for the fractional part.
The alternative is a floating-point representation. The format for a floating-point number
can be generalised as:
±M × R^E
In this option a defined number of bits are used for what is called the significand or mantissa,
±M. The remaining bits are used for the exponent or exrad, E. The radix, R, is not stored in
the representation; it has an implied value of 2.
Floating-point representation: a representation of real numbers that stores a value for the
mantissa and a value for the exponent
To illustrate the differences between the two representations a very simple example can be
used. Let’s consider that a real number is to be stored in eight bits.
For the fixed-point option, a possible choice would be to use the most significant bit as a
sign bit and the next five bits for the whole number part leaving two bits for the fractional
part. Some important non-zero values in this representation are shown in Table 16.01. (The
bits are shown with a gap to indicate the implied position of the binary point.)
Description                         Binary code   Denary equivalent
Largest positive value              011111 11     31.75
Smallest positive value             000000 01     0.25
Smallest magnitude negative value   100000 01     -0.25
Largest magnitude negative value    111111 11     -31.75
For a floating-point representation, a possible choice would be four bits for the mantissa and
four bits for the exponent with each using two’s complement representation. The exponent
is stored as a signed integer. The mantissa has to be stored as a fixed-point real value. The
question now is where the binary point should be.
Two of the options for the mantissa being expressed in four bits are shown in Table 16.02(a)
and Table 16.02(b). In each case, the denary equivalent is shown and the position of the
implied binary point is shown by a gap. Table 16.02(c) shows the three largest magnitude
positive and negative values for integer coding that will be used for the exponent.
(a)
First bit pattern for a real value    Real value in denary
011 1                                 3.5
011 0                                 3.0
010 1                                 2.5
101 0                                 -3.0
100 1                                 -3.5
100 0                                 -4.0
(b)
Second bit pattern for a real value   Real value in denary
0 111                                 .875
0 110                                 .75
0 101                                 .625
1 010                                 -.75
1 001                                 -.875
1 000                                 -1.0
(c)
Integer bit pattern                   Integer value in denary
0111                                  7
0110                                  6
0101                                  5
1010                                  -6
1001                                  -7
1000                                  -8
Table 16.02 Coding a fixed-point real value in eight bits (four for the mantissa and
four for the exponent)
It can be seen that having the mantissa with the implied binary point immediately following
the sign bit produces smaller spacing between the values that can be represented. This is
the preferred option for a floating-point representation. Using this option, the most important
non-zero values for the floating-point representation are shown in Table 16.03. (The implied
binary point and the mantissa exponent separation are shown by a gap.)
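Description                         Binary code   Denary equivalent
Largest positive value              0 111 0111    .875 × 2^7 = 112
Smallest positive value             0 001 1000    .125 × 2^-8 = 1/2048
Smallest magnitude negative value   1 111 1000    -.125 × 2^-8 = -1/2048
Largest magnitude negative value    1 000 0111    -1 × 2^7 = -128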
The comparison between the values in Tables 16.01 and 16.03 illustrates the greater range of
positive and negative values available if floating-point representation is used.
1 Using the methods suggested in Chapter 1 (Section 1.01) can you confirm for yourself
that the denary equivalents of the binary codes shown in Tables 16.02 and 16.03 are as
indicated?
2 Can you also confirm that conversion from positive to negative or vice versa for a
fixed-format real value still follows the rules defined in Chapter 1 (Section 1.02) for two’s
complement representation?
, both with regard to the total number of bits to be used and the split between those
representing the mantissa and those representing the exponent. In practice, a choice for
the total number of bits to be used will be available as an option when the program is
written. However, the split between the two parts of the representation will have been
determined by the floating-point processor. If you did have a choice you would base a
decision on the fact that increasing the number of bits for the mantissa would give better
precision for a value stored but would leave fewer bits for the exponent so reducing the
range of possible values.
means using the largest possible magnitude for the value represented by the mantissa.
To illustrate this we can consider the eight-bit representation used in Table 16.03. Table
16.04 shows alternative representations of denary 2.
Denary representation   Floating-point binary representation
0.125 × 2^4             0 001 0100
0.25 × 2^3              0 010 0011
0.5 × 2^2               0 100 0010
Table 16.04 Alternative representations of denary 2 using four bits each for mantissa and
exponent.
For a negative number we can consider representations for -4 as shown in Table 16.05.
Denary representation   Floating-point binary representation
-0.25 × 2^4             1 110 0100
-0.5 × 2^3              1 100 0011
-1.0 × 2^2              1 000 0010
Table 16.05 Alternative representations of denary -4 using four bits each for mantissa and
exponent.
It can be seen that when the number is represented with the highest magnitude for the
mantissa, the two most significant bits are different. This fact can be used to recognise that
a number is in a normalised representation. The values in these tables also show how a
number could be normalised. For a positive number, the bits in the mantissa are shifted left
until the most significant bits are 0 followed by 1. For each shift left the value of the exponent
is reduced by 1.
The same process of shifting is used for a negative number until the most significant bits are
1 followed by 0. In this case, no attention is paid to the fact that bits are falling off the most
significant end of the mantissa.
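The shifting process can be sketched in a few lines of Python. The function name normalise and the use of a string of bits for the mantissa are assumptions made for clarity; the sketch also does not check that the reduced exponent still fits in its allotted bits.

```python
def normalise(mantissa, exponent):
    """Shift the mantissa left until its two most significant bits differ.

    mantissa: string of bits in two's complement, binary point after the sign bit.
    exponent: the exponent value as an ordinary integer.
    """
    if "1" not in mantissa:
        return mantissa, exponent          # zero has no normalised form
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"      # the leading bit falls off the end
        exponent -= 1                      # each left shift reduces the exponent by 1
    return mantissa, exponent


print(normalise("0001", 4))   # 0.125 x 2^4, a representation of denary 2
print(normalise("1110", 4))   # -0.25 x 2^4, a representation of denary -4
```

Both calls end with an exponent of 2 and mantissas 0100 and 1000, which correspond to the last rows of Tables 16.04 and 16.05.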
Conversion of representations
In Chapter 1 (Section 1.01), a number of methods for converting numbers into different
representations were discussed. The ideas presented there now need a little expansion.
Let’s start by considering the conversion of a simple real number, such as 4.75, into a simple
fixed-point binary representation. This looks easy because 4 converts to 100 in binary and
.75 converts to .11 in binary so the binary version of 4.75 should be:
100.11
However, we now remember that a positive number should start with 0. Can we just add a
sign bit? For a positive number we can. Denary 4.75 can be represented as 0100.11 in
binary.
For negative numbers we still want to use two’s complement form. So, to find the
representation of -4.75 we can start with the representation for 4.75 then convert it to two’s
complement as follows:
0100.11 converts to 1011.00 in one’s complement form
1011.00 converts to 1011.01 in two’s complement form (adding 1 to the least significant bit)
To check the result, we can apply Method 2 from Worked Example 1.01 in Chapter 1. 1011 is
the code for -8 + 3 and .01 is the code for .25; -8 + 3 + .25 = -4.75.
We can now consider the conversion of a denary value expressed as a real number into a
floating-point binary representation. The first thing to realise is that most fractional parts do
not convert to a precise representation. This is because the binary fractional parts represent
a half, a quarter, an eighth, a sixteenth and so on. Unless a denary fraction is a sum of a
collection of these values, there cannot be an accurate conversion. In particular, of the
values from .1 through to .9 only .5 converts accurately. This was mentioned in Chapter 1
(Section 1.02) in the discussion about storing currency values.
1 Convert the whole-number part using the method described in Chapter 1 (Section 1.01).
3 Convert the fractional part using the method described in Worked Example 16.01.
5 Adjust the position of the binary point and change the exponent accordingly to achieve a
normalised form.
Example 1
4 Shifting the binary point gives 0.100011 which has exponent value denary 4.
5 The next stage depends on the number of bits defined for the mantissa and the
exponent; if ten bits are allocated for the mantissa and four bits are allocated for the
exponent the final representation becomes 0100011000 for the mantissa and 0100 for
the exponent.
Example 2
Let’s consider the conversion of 8.63. The first step is the same but now the .63 has to be
converted by the ‘multiply by two and record whole number parts’ method. This works as
follows:
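.63 × 2 = 1.26 so 1 is stored to give the fraction .1
.26 × 2 = 0.52 so 0 is stored to give the fraction .10
.52 × 2 = 1.04 so 1 is stored to give the fraction .101
.04 × 2 = 0.08 so 0 is stored to give the fraction .1010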
At this stage it can be seen that multiplying .08 by 2 successively is going to give a lot of
zeros in the binary fraction before another 1 is added so the process can be stopped.
What has happened is that .63 has been approximated as .625. So, following Steps 3-5 in
Example 1, the final representation becomes 0100010100 for the mantissa and 0100 for the
exponent.
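The two examples can be checked with a short Python sketch of the conversion. The function names are assumptions, the sketch only handles positive values of 1 or more, and surplus fraction bits are simply truncated rather than rounded. The Fraction type is used so that the repeated doubling is carried out exactly.

```python
from fractions import Fraction


def fraction_to_binary(fraction, bit_count):
    """Multiply by two repeatedly, recording the whole-number part each time."""
    bits = ""
    for _ in range(bit_count):
        fraction *= 2
        if fraction >= 1:
            bits += "1"
            fraction -= 1
        else:
            bits += "0"
    return bits


def to_floating_point(denary_text, mantissa_bits=10, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive value of 1 or more."""
    value = Fraction(denary_text)
    whole = int(value)
    if whole < 1:
        raise ValueError("this sketch only handles positive values of 1 or more")
    whole_bits = format(whole, "b")
    # Moving the binary point to just after the sign bit needs this many shifts.
    exponent = len(whole_bits)
    fraction_bit_count = mantissa_bits - 1 - len(whole_bits)
    if fraction_bit_count < 0 or exponent >= 2 ** (exponent_bits - 1):
        raise ValueError("value is too large for this format")
    fraction_bits = fraction_to_binary(value - whole, fraction_bit_count)
    mantissa = "0" + whole_bits + fraction_bits      # the leading 0 is the sign bit
    return mantissa, format(exponent, f"0{exponent_bits}b")


print(fraction_to_binary(Fraction("0.63"), 3))
print(to_floating_point("8.63"))
print(to_floating_point("8.75"))
```

For 8.63 the mantissa has room for only five fraction bits, so .63 is stored as .10100, which is the .625 approximation described above.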
TASK 16.01
Convert the denary value -7.75 to a floating-point binary representation with ten bits for the
mantissa and four bits for the exponent. Start by converting 7.75 to binary (make sure you
add the sign bit!). Then convert to two's complement form. Finally, choose the correct value
for the exponent to leave the implied position of the binary point after the sign bit. Convert
back to denary to check the result.
The other potential problem relates to the range of numbers that can be stored. Referring
back to the simple eight-bit representation illustrated in Table 16.03, the highest value
represented is denary 112. A calculation can easily produce a value higher than this. As
Chapter 5 (Section 5.02) illustrated, this produces an overflow error condition. However, for
floating-point values there is also a possibility that if a very small number is divided by a
number greater than 1 the result is a value smaller than the smallest that can be stored. This
is an underflow error condition. Depending on the circumstances, it may be possible for a
program to continue running by converting this very small number to zero but clearly this
must involve risk.
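Both conditions can be observed with Python's own floating-point numbers. This sketch assumes the usual 64-bit representation used by standard Python implementations, in which 5e-324 is the smallest positive value that can be stored; Python reports the overflow as an infinite value and silently converts the underflow to zero.

```python
import sys

largest = sys.float_info.max          # largest value that can be represented
overflowed = largest * 10             # too large to store: becomes infinity

smallest_positive = 5e-324            # assumed smallest positive 64-bit value
underflowed = smallest_positive / 2   # too small to store: becomes zero

print("overflow result: ", overflowed)
print("underflow result:", underflowed)
```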
Exam-style Questions
1 A programmer may choose to use a user-defined data type when writing a program.
a Give an example of a non-composite user-defined data type and explain why its use by a
programmer is
b A program is to be written to handle data relating to the animals kept in a zoo. The
programmer chooses to
use a record user-defined data type.
iii Write pseudocode for the definition of a record type which is to be used to store: animal
name, animal age,
i What are the terms used to describe the components of such a file. [2]
ii Explain the difference between a binary file and a text file. [3]
i Explain the difference between the three types of file organisation. [4]
ii Give an example of file use for which a serial file organisation would be suitable. Justify
your choice. [3]
iii Give an example of file use when direct access would be advantageous. Justify your
choice. [3]
3 A file contains binary coding. The following are four successive bytes in the file:
10010101
00110011
11001000
00010001
a The four bytes represent two numbers in floating-point representation. The first byte in
each case represents
the mantissa. Each byte is stored in two’s complement representation.
i Give the name for what the second byte represents in each case. [1]
ii State whether the representations are for two positive numbers or two negative numbers
and explain why. [2]
iii One of the numbers is in a normalised representation. State which one it is and give the
reason why. [2]
iv State where the implied binary point is in a normalised representation and explain why a
normalised
v If two bytes were still to be used but the number of bits for each component was going to
be changed by
allocating more to the mantissa, what effect would this have on the numbers that could be
represented?
b Using the representation described in part (a), show the representation of denary 12.43 as
a floating-point
Learning objectives
There are five requirements for a data communications system: a sender, a receiver, a
transmission medium, a message and a protocol. Here, a ‘message’ is a general term to
describe any type of transmitted data. A message can only be transmitted if there is an
agreed protocol. Protocols are discussed in later sections of this chapter.
A data communications system may consist of an isolated network. There are several
possible topologies for an isolated network. The simplest possible network is where two
end-systems are connected by a network link as shown in Figure 17.01. This is an example
of a point-to-point connection for which there is a dedicated link.
A ring topology is shown in Figure 17.03(a). In this configuration, each end-system has a
point-to-point connection to the two adjacent end-systems.
In the star topology, the end-systems may again be user workstations or servers but the
central device is different. The star topology is nowadays the dominant configuration for an
individual network. There are several reasons for this. The most important is that the central
device can be used to connect the network to other networks and, in particular, to the
Internet. A specialised application is to use the star topology to function logically as a ring.
With the appropriate software installed, each end-system can function as though it has just
two directly connected neighbours.
Discussion Point:
Which network topologies have you used? You may wish to defer this discussion until you
have read about network devices later in this chapter.
There are several concepts relating to the use of a network for communication and data
transmission.
The data flow along an individual link is simplex, half duplex or full duplex. In simplex mode
the flow is one-way. In a duplex mode flow is both ways but only occurs simultaneously in
full-duplex mode.
Message types
Transmission modes
For communication over an internetwork there are two possible approaches: circuit
switching or packet switching. Circuit switching is the method used in the traditional
telephone system. Because the Public Switched Telephone Networks (PSTNs) have now
largely converted to digital technology, the same method can be provided for data transfer
rather than voice communication. Typically this is provided in a leased line service. The
concept is illustrated in Figure 17.05, which shows end-systems connected to local
exchanges which have a switching function and which are connected via a number of
intermediate nodes with a switching function.
2 The system checks whether or not the receiver is ready to accept data.
3 If the receiver is available, a sequence of links is established across the network.
It is not necessary for this discussion to define what could constitute a node in a circuit-
switched network. The links that are provided between the nodes are dedicated channels in
shared transmission media that guarantee unimpeded transmission. When a telephone call
is made there is a definite end of the call with removal of the links. However, for a leased-line
data connection there might be a permanent circuit established.
The packet-switching method allows data transmission without a circuit being established.
Data cannot be sent in a continuous stream. Instead data is packaged in portions inside
packets. A packet consists of a header which contains instructions for delivery plus the data
body. The method is similar to that used by the postal service but rather more complex! The
network schematic shown in Figure 17.05 is still appropriate to describe packet switching
except that the links used are not defined at the time a packet is transmitted by the sender.
Packet-switching services
When packet switching is used there are two ways that the network can provide a service:
connectionless service or connection-oriented service.
An end-system on an Ethernet LAN needs a network interface card (NIC). Each NIC has a
unique ‘physical’ address. This is sometimes referred to as the MAC address as explained
in Section 17.06. The end-system itself has no identification on the network. If the NIC is
removed and inserted into a different end-system, it takes the address with it.
The simplest device that can be used at the ‘centre’ of a star topology LAN is a hub. A hub
ensures that any incoming communication is broadcast to all connected end-systems.
However, the use of a hub is not restricted to supporting an isolated network. One possibility
is to have a hierarchical configuration with one hub connected to other hubs, which support
individual LANs. Another possibility is for a hub to have a built-in broadband modem. This
allows all of the end-user systems on the LAN to have an Internet connection when this
modem is connected to a telephone line.
A switch can function as a hub but it is a more intelligent device and, in particular, can keep
track of the addresses of connected devices. This allows a switch to send an incoming
transmission to a specific end-system as a unicast. This facility obviously reduces the
amount
of network traffic compared to that generated by a hub.
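The difference in traffic between the two devices can be sketched in a few lines of Python. This is a simplified model rather than a description of real device firmware: the class names `Hub` and `Switch`, the port numbers and the `forward` method are all assumptions made for illustration, and the idea that the switch learns an address from each incoming transmission is an assumption about how the address tracking is done.

```python
class Hub:
    """Repeats every incoming transmission on all other ports."""
    def __init__(self, ports):
        self.ports = list(ports)

    def forward(self, in_port, src, dst):
        # A hub ignores addresses: everything is broadcast.
        return [p for p in self.ports if p != in_port]


class Switch:
    """Keeps track of which address is reachable through which port."""
    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}                    # address -> port

    def forward(self, in_port, src, dst):
        self.table[src] = in_port          # remember where the sender is
        if dst in self.table:
            return [self.table[dst]]       # unicast to one end-system
        # Destination not yet known: behave like a hub.
        return [p for p in self.ports if p != in_port]
```

Once the switch has seen a transmission from each end-system, every later transmission leaves on one port only, whereas the hub always uses all of them.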
A router is the most intelligent of the connecting devices. It is in effect a small computer.
It can function as a switch but the router can make a decision about which device it will
transmit a received transmission to. As was mentioned in Chapter 2 (Section 2.04), the main
use of routers is in the backbone fabric of the Internet. Nearer to the end-systems, a router
may function as a gateway, as a network address translation box (described in Chapter 2
(Section 2.07)) or be combined with a firewall. There is further discussion of routers in
Section
17.04.
Protocols are essential for successful transmission of data over a network. Each protocol
defines a set of rules that must be agreed between sender and receiver. At the simplest
level,
a protocol could define that a positive voltage represents a bit with value 1. At the other
extreme, a protocol could define the format of the first 40 bytes in a packet.
KEY TERMS
Protocol: a set of rules for data transmission which are agreed by sender and receiver
The complexity of networking requires a very large number of protocols. A protocol suite is a
collection of related protocols. TCP/IP is the dominant protocol suite for Internet usage.
TCP/IP can be explained on the basis of the network model shown in Figure 17.06.
• Each layer except the physical layer represents software installed on an end-system or on
a router.
• The software for each layer must provide the capability to receive and to transmit data in
full-duplex mode to an adjacent layer.
As a result, an application run on one end-system can behave as though there was a direct
connection with an application running on a different end-system. To achieve this, the
application layer protocol on the sender end-system sends a ‘message’ to the transport layer
protocol on the same system. The transport layer protocol then initiates a process which
results in the identical ‘message’ being delivered to the receiver end-system. Finally, on the
receiver end-system, the transport layer protocol delivers the ‘message’ to the application
layer protocol.
The TCP/IP protocol suite only operates at the top three layers. The lower layers operate
with
a different protocol suite, such as Ethernet. A router has no awareness of the two highest
layers.
The selection has been chosen to illustrate that the TCP/IP suite encompasses a very wide
range of protocols which is still evolving. Some of the listed protocols will not be considered
further.
As well as needing to ensure safe delivery, TCP also has to ensure that any response is
directed back to the application protocol. Thus one item in the header is the port number
which identifies the application layer protocol. For example, for HTTP the port number is
80. The packet must also include the port number for the application layer protocol at the
receiving end-system. However, TCP is not concerned with the address of the receiving
end-system. If the packet is one of a sequence, a sequence number is included to ensure
eventual correct reassembly of the user data.
The TCP protocol is connection-oriented. As described in Section 17.02, initially just one
packet of a sequence is sent to the network layer. Once the connection has been
established,
TCP sends the other packets and receives response packets containing acknowledgements.
This allows missing packets to be identified and re-sent.
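The role of the port numbers and the sequence number can be illustrated with a small Python sketch. It is a deliberate simplification: the names `Segment`, `split`, `missing` and `reassemble` are hypothetical, the sequence number here simply counts packets (real TCP numbers the bytes), and the connection set-up and acknowledgement packets are not modelled.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    src_port: int   # temporary port chosen for the sending application
    dst_port: int   # identifies the application layer protocol, e.g. 80 for HTTP
    seq: int        # position of this packet in the sequence
    data: bytes

def split(message, size, src_port, dst_port):
    """Package a message into numbered packets of at most `size` bytes."""
    return [Segment(src_port, dst_port, i, message[start:start + size])
            for i, start in enumerate(range(0, len(message), size))]

def missing(received, total):
    """Sequence numbers that have not arrived and would need to be re-sent."""
    seen = {s.seq for s in received}
    return [n for n in range(total) if n not in seen]

def reassemble(received):
    """Rebuild the user data whatever order the packets arrived in."""
    return b"".join(s.data for s in sorted(received, key=lambda s: s.seq))
```

For example, `split(b"hello world!", 5, 49152, 80)` gives three packets. If only the first and third arrive, `missing` reports that packet 1 has to be sent again.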
IP (Internet Protocol)
The function of the network layer, and in particular of the IP protocol, is to ensure correct
routing over the Internet. To do this it takes the packet received from the transport layer and
adds a further header. This header contains the IP addresses of both the sender and the
receiver. To find the IP address of the receiver, it is very likely to use the DNS system to find
the address corresponding to the URL supplied in the user data. This aspect was discussed
in
some detail in Chapter 2 (Section 2.08).
The IP packet, which is usually called a ‘datagram’, is sent to the data-link layer and
therefore
to a different protocol suite. The data-link layer assembles datagrams into ‘frames’. At this
stage, transmission can begin. Once the IP packet has been sent to the data-link layer, IP
has no further duty. IP functions as a connectionless service. If IP receives a packet which
contains an acknowledgement of a previously sent packet, it will simply pass the packet on
to TCP with no awareness of the content.
The router
As Figure 17.06 shows, the frame sent by the data-link layer will arrive at a router during
transmission (more likely at several routers!). At this stage, the datagram content of the
frame
is given back to IP. It is now the function of the router software to choose the next target host
in the transmission. The software has access to a routing table appropriate to that router.
The size and complexity of the Internet prohibit a router from having a global routing table. IP
then passes the datagram back to the data-link layer at the router.
The distinction between a switch and a router was discussed earlier. A further point to
note here is that when a frame arrives at a switch, it is transmitted on without any routing
decision. A switch operates in the data-link layer, not in the network layer.
17.05 Application-layer protocols associated with TCP/IP
There are very many application-layer protocols. This discussion will consider only a few of
the protocols that were introduced early in the use of TCP/IP.
Because HTTP (HyperText Transfer Protocol) underpins the World Wide Web it has to be
considered to be the most important application-layer protocol. Every time a user accesses a
website using a browser, HTTP is used but its functionality is hidden from view.
CRLF
where CR and LF are the ASCII carriage return and linefeed characters. The request line
usually has GET as the method. However, there are several alternatives to the GET method
which makes HTTP potentially a more widely applicable protocol than just being used for
webpage access. The version has to be specified because HTTP has evolved so there is
more
than one version in use.
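As an illustration, a request line can be assembled in Python. This sketch assumes the usual layout of method, target and version separated by single spaces and terminated by CRLF. The helper `request_line`, the target `/index.html` and the version string `HTTP/1.1` are example values chosen here, not taken from the text.

```python
CRLF = "\r\n"  # ASCII carriage return (13) followed by linefeed (10)

def request_line(method, target, version="HTTP/1.1"):
    """Build the first line of an HTTP request."""
    return f"{method} {target} {version}{CRLF}"

line = request_line("GET", "/index.html")
print(repr(line))
```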
In Chapter 2 (Section 2.09), a sequence of events was described for when a browser
accesses
a webpage. This can now be presented as a sequence of protocol actions. The following is
an
abbreviated version:
• TCP creates one or more packets and sends the first one to IP using port 80 for the
destination port and a temporary port number for the sending port.
• IP uses the URL in the message to get an IP address using DNS and sends a datagram.
• When a connection has been established, TCP sends the remaining packets, if any, to IP
which then forwards them through the server IP and TCP to the server application layer.
• HTTP transmits a response message which is transmitted via TCP, IP, IP and TCP to the
client browser application.
All of this can happen with just one click on a bookmark item in a browser!
Email protocols
The traditional method of sending and receiving emails is schematically illustrated in Figure
17.07. It can be seen that three individual client-server interactions are involved. The client
has a connection to a mail server which then has to function as a client in the transmission
to
the mail server used by the receiver.
[Figure 17.07: sender client, SMTP, server, SMTP, server, POP3, receiver client]
POP3 is a ‘pull’ protocol. There is a more recent alternative to POP3, which is IMAP (Internet
Message Access Protocol). IMAP offers the same facilities as POP3 but also a lot more.
This approach has been largely superseded by the use of web-based mail. A browser is
used
to access the email application, so HTTP is now the protocol used. However, SMTP remains
in
use for transfer between the mail servers.
For routine transfers of files from one user to another the most likely method is to attach the
file to an email. However, this is not always a suitable method. FTP (File Transfer Protocol)
is
the application-layer protocol that can handle any file transfer between two end-systems.
File transfer can be less than straightforward if the end-systems have different operating
systems with different file systems. FTP handles this by separating the control process from
the data-transfer process.
Ethernet is the other dominant protocol in the modern networked world. It is primarily
focused on LANs. Although Ethernet was first devised in the 1970s independently of any
organisation, it was later adopted for standardisation by the Institute of Electrical and
Electronics Engineers (IEEE). In particular it was their 802 committee (obviously one of
many!) that took responsibility for the development of the protocol. The standard for a wired
network is denoted as IEEE 802.3 which can be considered to be a synonym for Ethernet.
The standard has evolved through five generations: standard or traditional, fast, gigabit, 10
gigabit and 100 gigabit. The gigabit part of the name indicates the transfer speed capability.
Ethernet transmits data in frames. Each frame contains a source address and a destination
address. The address is the physical or MAC address, which uniquely defines one NIC, as
described in Section 17.03. The reason that a unique address can be guaranteed is that 48
bits
are used for the definition. The address is usually written in hexadecimal notation, for
example:
4A:30:12:24:1A:10
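The 48-bit size of the address can be checked with a short Python sketch. The function name `parse_mac` is hypothetical. It simply joins the six two-digit hexadecimal groups and converts them to an integer.

```python
def parse_mac(text):
    """Convert 'XX:XX:XX:XX:XX:XX' to a 48-bit integer."""
    parts = text.split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError("expected six two-digit hexadecimal groups")
    return int("".join(parts), 16)

mac = parse_mac("4A:30:12:24:1A:10")
print(f"{mac:048b}")   # the address written out as 48 bits
print(2 ** 48)         # number of different 48-bit addresses
```

Six groups of two hexadecimal digits give 6 × 8 = 48 bits, which allows 2 to the power 48 different addresses.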
In a star topology LAN with a hub as the central device, why must a transmission be
broadcast?
Because of the broadcast transmission, there was a need for the access to the shared
medium by end-systems to be controlled. If there were no control, two messages sent at
the same time would ‘collide’ and each message would be corrupted. The method adopted
was CSMA/CD (carrier sense multiple access with collision detection). This relied on the fact
that if a frame was being transmitted there was a voltage level on the Ethernet cable which
could be detected by an end-system. If this was the case, the protocol defined a time that
the end-system had to wait before it tried again. However, because two end-systems could have
waited and then both decided to transmit at the same time, collisions could still happen. Thus
there was also a need to incorporate a means for an end-system to detect a collision and to
discontinue transmission if a collision occurred.
Although there might be some legacy standard Ethernet LANs still operating, the modern
implementation of Ethernet is switched. The star configuration has a switch as the central
device. The switch controls transmission to specific end-systems. Each end-system is
connected to the switch by a full-duplex link so no collision is possible along that link. Since
collisions are now impossible, CSMA/CD is no longer needed.
Ethernet is the most likely protocol to be operating in the data-link layer defined in the
TCP/IP protocol stack. Referring back to Figure 17.06, the diagram shows IP in the network
layer sending a datagram to the data-link layer. When the data-link layer uses Ethernet, the
protocol defines two sub-layers. The upper of these is the logical link-control layer, which
handles flow control, error control and part of the framing process. The lower is the media
access control (MAC) sublayer which completes the framing process and defines the access
method. The MAC layer transmits the frames that contain the physical addresses for sender
and receiver. This is the reason that these addresses are often referred to as MAC
addresses.
The BitTorrent protocol is the most used protocol because it allows fast sharing of files.
There
are three basic problems to solve if end-systems are to be confident in using BitTorrent:
• How does a peer find others that have the wanted content?
• How do peers encourage other peers to provide content rather than just using the protocol to
download for themselves?
The answer provided by BitTorrent to the first question is to get every content provider to
provide a content description, called a torrent, which is a file that contains the name of the
tracker (a server that leads peers to the content) and a list of the chunks that make up the
content. The torrent file is at least three orders of magnitude smaller than the content so can
be transferred quickly. The tracker is a server that maintains a list of all the other peers (the
‘swarm’) actively downloading and uploading the content.
The answer to the second question involves peers simultaneously downloading and
uploading
chunks but peers have to exchange lists of chunks and aim to download rare chunks for
preference. Each time a rare chunk is downloaded it automatically becomes less rare!
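The preference for rare chunks can be sketched as follows. This is only an illustration of the idea, not the BitTorrent algorithm itself: the function `rarest_first`, the peer names and the chunk numbers are hypothetical, and ties are broken by chunk number purely to make the result predictable.

```python
from collections import Counter

def rarest_first(my_chunks, peer_chunks):
    """Order the chunks still needed, rarest in the swarm first.

    my_chunks:   set of chunk numbers already held
    peer_chunks: dict mapping a peer name to the set of chunks it holds
    """
    counts = Counter(c for chunks in peer_chunks.values() for c in chunks)
    wanted = [c for c in counts if c not in my_chunks]
    return sorted(wanted, key=lambda c: (counts[c], c))

swarm = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0, 3}}
print(rarest_first({0}, swarm))
```

Chunks 2 and 3 are each held by only one peer, so they are requested before chunk 1, which two peers hold.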
The answer to the third question requires dealing with the free-riders or ‘leechers’ who
only download. The solution is for a peer to initially randomly try other peers but then to
only continue to upload to those peers that provide regular downloads. If a peer is not
downloading or only downloading slowly, it will eventually be isolated or ‘choked’.
It is worth noting that the language of BitTorrent is somewhat esoteric and there are other
terms used which have not been mentioned. Fortunately the principles are straightforward.
All of the previous discussion in this chapter has related to transmission using a cable
medium. In today’s world, this is no longer the dominant technology. The following brief
discussion considers four important examples of wireless technology discussed in order of
increasing scale of operation.
Bluetooth
WiFi
WiFi (WLAN in some countries) is a term used by the public to describe what is sometimes
called wireless Ethernet but is formally IEEE 802.11. This is a wireless LAN protocol which
uses radio frequency transmission. Most often a WiFi LAN is centred on a wireless access
point in an ‘infrastructure’ network (i.e. not an ad hoc network). The wireless access point
communicates wirelessly with any end-systems that have connected to the device. It also
has
a wired connection to the Internet.
WiMAX
WiMAX (Worldwide Interoperability for Microwave Access) or IEEE 802.16 is a protocol for a
MAN or WAN. It is designed for use by PSTNs to provide broadband access to the Internet
without having to lay underground cables. Local subscribers connect to the antenna of a
local base station using a microwave signal.
Cellular networks
The technology available in a mobile phone has progressed dramatically through what are
described as generations:
• 2G went digital.
• 3G introduced multimedia and serious Internet connection capability.
Part 3
Exam-style Questions
1 a There are five requirements for a data communication system. State the five requirements.
b An isolated wired network is to be used as a data communication system.
ii For each topology, explain why there are or are not direct point-to-point connections between
the end-systems.
c Each end-system is fitted with a network interface card (NIC).
ii Explain what would happen if the NIC in an end-system was replaced by a newer version.
2 One end-system with an Internet connection has a file. A user on another end-system
connected to the Internet
needs a copy of the file. There are different methods that might be used to enable the user to
obtain a copy of the file.
b Identify the application-layer protocols that each method will use with a brief explanation
for each one.
i Describe, with the aid of a diagram, a network topology that could be used with standard
Ethernet.
i Draw a diagram to illustrate how the combination of Ethernet and the TCP/IP suite provides
support for data
communication.
[2]
[4]
[3]
[2]
[6]
[8]
[3]
[4]
[5]
[3]
Chapter 18
Learning objectives
Whenever a form of algebra is used it is vital that there is an understanding of its meaning.
As
a simple example we can consider the following four interpretations of the meaning of 1 + 1:
1+1=2
1 + 1 = 10
1+1=0
1+1=1
The first shows denary arithmetic, the second binary arithmetic and the third bit arithmetic.
The last one applies if Boolean algebra is being used. This is because in Boolean algebra 1
represents TRUE, 0 represents FALSE, and + represents OR. Therefore the fourth
statement represents the logic statement:
TRUE OR TRUE is TRUE
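The four interpretations of 1 + 1 can be reproduced in Python as a quick check. Treating ‘bit arithmetic’ as the exclusive OR of the two bits (the sum bit with the carry discarded) is an assumption made for this sketch.

```python
print(1 + 1)               # denary arithmetic: 2
print(bin(1 + 1)[2:])      # the same sum written in binary: 10
print(1 ^ 1)               # bit arithmetic, carry discarded: 0
print(int(True or True))   # Boolean algebra, + read as OR: 1
```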
There are options for the representation of Boolean algebra. For example, the symbols for
AND and OR are sometimes represented as ∧ and ∨. There is the option of writing A.B or AB
for AND. The dot notation is used in this book. Finally, there are options for how NOT A (the
inverse of A) can be represented. $\overline{A}$ (A with a bar over it) is used here.
Having established the notation for Boolean algebra we have to consider the rules that
apply.
These can formally be described as ‘laws’ or ‘identities’. Table 18.01 contains a full listing.
Identity/Law         AND form                                            OR form
Identity             1.A = A                                             0+A = A
Null                 0.A = 0                                             1+A = 1
Idempotent           A.A = A                                             A+A = A
Inverse              A.$\overline{A}$ = 0                                A+$\overline{A}$ = 1
Commutative          A.B = B.A                                           A+B = B+A
Associative          (A.B).C = A.(B.C)                                   (A+B)+C = A+(B+C)
Distributive         A+B.C = (A+B).(A+C)                                 A.(B+C) = A.B+A.C
Absorption           A.(A+B) = A                                         A+A.B = A
De Morgan’s          $\overline{A.B}$ = $\overline{A}$+$\overline{B}$    $\overline{A+B}$ = $\overline{A}$.$\overline{B}$
Double Complement    $\overline{\overline{A}}$ = A
Some of the names used for the identities may be unfamiliar to you. This is not a concern.
You should note that for all but one of the identities there is an AND form and an OR form.
Furthermore, it is important to note that an identity written in one form can be transformed
into
the other by interchanging each 0 and 1 and each AND and OR. For example, 0.A = 0, which
reads FALSE AND A is FALSE, transforms into TRUE OR A is TRUE, written in the algebra as
1+A = 1.
It can also be seen that some of the identities look like those applying in normal algebra with
AND functioning as multiplication and OR functioning as addition. Thus it is allowed for the
terms ‘product’ and ‘sum’ to be used in the context of Boolean algebra.
TASK 18.01
It is vital that you can interpret a Boolean expression correctly. Go through Table 18.01 item
by
item and in each case read out the full meaning. For example:
1+A = 1 can be read as ‘one plus A equals 1’
but must be understood as ‘TRUE OR A is TRUE’
Although De Morgan’s laws look complicated at first glance, they can be rationalised easily.
The inverse of a Boolean product becomes the sum of the inverses of the individual values
in
the product. The inverse of a Boolean sum is the product of the individual inverses.
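Because each input can only be TRUE or FALSE, any of the identities can be confirmed by trying every combination of inputs. The following Python sketch does this for De Morgan’s laws and, for comparison, for the absorption and distributive laws. The helper name `holds` and the lambda names are choices made here for illustration.

```python
from itertools import product

def holds(law, n):
    """True if `law` is satisfied for every combination of n Boolean inputs."""
    return all(law(*values) for values in product([False, True], repeat=n))

# De Morgan's laws: inverse of a product, inverse of a sum
and_form = lambda a, b: (not (a and b)) == ((not a) or (not b))
or_form = lambda a, b: (not (a or b)) == ((not a) and (not b))

# Two further identities, for comparison
absorption = lambda a, b: (a or (a and b)) == a
distributive = lambda a, b, c: (a or (b and c)) == ((a or b) and (a or c))

print(holds(and_form, 2), holds(or_form, 2),
      holds(absorption, 2), holds(distributive, 3))
```

A check of this kind confirms that an identity is true but does not help with deciding which identity to apply when simplifying.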
Unfortunately, using the algebra to simplify expressions is not something which can be learnt
as a
routine. It almost inevitably requires a little lateral thinking as Worked Example 18.01 will
show.
In order to simplify the expression we have to first make it more complicated! This is
where the lateral thinking comes in. The OR form of the absorption identity is A+A.B = A.
This can be used in reverse to replace A by A+A.B to produce the following:
A+A.B+$\overline{A}$.B
Applying the AND form of the commutative law and the OR form of the distributive law in
reverse we can see that:
A.B+$\overline{A}$.B = B.(A+$\overline{A}$)
This allows us to use the OR form of the inverse identity which converts A+$\overline{A}$ to 1. As a
result the expression has become:
A+B.1
When the OR form of the commutative law and the AND form of the identity law are
applied to the B.1 term, it then becomes A+B.
Chapter 4 introduced the symbols for logic gates that are used in logic circuits and
discussed
the relationships between logic circuits, truth tables and logic expressions. This chapter
introduces some specific circuits that are used to construct components that provide
functionality in computer hardware.
A fundamental operation in computing is binary addition. The result of adding two bits is
either 1 or 0. However, when 1 is added to 1 the result is 0 but there is a carry bit equal to 1.
This cannot be ignored if two numbers with several bits in each
are being added.
The simplest circuit that can be used for binary addition is the
half adder. This can be represented by the diagram in Figure
18.01. The circuit takes two input bits and outputs a sum bit (S)
and a carry bit (C).
[Figure 18.01: block diagram of the half adder circuitry]
This is only one of several circuits that would provide the functionality. As was explained in
Chapter 4 (Section 4.05), circuit manufacturers prefer to use either NAND or NOR gates.
The
circuit shown in Figure 18.02 consisting only of NAND gates has the correct logic to produce
the C and S outputs and is a likely choice for implementation.
Question 18.01
In Figure 18.02, can you identify the individual circuits that represent the AND operator and
the XOR operator?
TASK 18.02
Use the intermediate points labelled W, X and Y to construct a truth table for the circuit
shown
in Figure 18.02. Check that this reproduces the truth table shown as Table 18.02.
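The behaviour of the half adder can also be checked in Python. The first function uses an XOR and an AND operation directly. The second is a sketch of a NAND-only version using intermediate values named W, X and Y. The exact wiring of Figure 18.02 is not reproduced here, so the connections in `half_adder_nand` are an assumption, although they do produce the required outputs.

```python
def nand(a, b):
    return 1 - (a & b)

def half_adder(a, b):
    """Sum and carry from an XOR operation and an AND operation."""
    return a ^ b, a & b              # (S, C)

def half_adder_nand(a, b):
    """The same outputs from NAND gates only (assumed wiring)."""
    w = nand(a, b)
    x = nand(a, w)
    y = nand(b, w)
    s = nand(x, y)
    c = nand(w, w)                   # a NAND gate with both inputs tied acts as NOT
    return s, c

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b), half_adder_nand(a, b))
```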
If two numbers expressed in binary with several bits are to be added, the
addition must start with the two least significant bits and then proceed to the
most significant bits. At each stage the carry from the previous addition has
to be incorporated into the current addition. If a half adder is used each time,
there has to be separate circuitry to handle the carry bit because the half adder
only takes two inputs.
The full adder is a circuit that has three inputs including the previous carry bit.
The truth table is shown as Table 18.03.
One possible circuit for implementation contains two half adder circuits and an
OR gate as shown in Figure 18.03.
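The construction from two half adders and an OR gate can be sketched in Python as follows. The function names are chosen here for illustration. The loop prints the complete truth table for the three inputs.

```python
def half_adder(a, b):
    return a ^ b, a & b                  # (sum, carry)

def full_adder(a, b, c_in):
    """Two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)            # add the two input bits
    s, c2 = half_adder(s1, c_in)         # add the previous carry
    return s, c1 | c2                    # (S, Cout)

print("A B Cin | S Cout")
for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            s, c_out = full_adder(a, b, c_in)
            print(a, b, c_in, "|", s, c_out)
```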
[Table 18.03 and Figure 18.03: full adder with carry input Cin and carry output Cout]
As before, it is possible to construct the circuit entirely from NAND gates as shown in Figure
18.04.
Can you see how full adders could be combined to handle addition of, for example, four-bit
binary numbers? What happens to the carry input for the first addition?
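One way of combining full adders is to pass the carry output of each stage to the carry input of the next. The Python sketch below assumes this arrangement, with the carry input for the first addition set to 0. The function `ripple_add` and the choice of listing bits with the least significant bit first are assumptions made for illustration.

```python
def full_adder(a, b, c_in):
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def ripple_add(x_bits, y_bits):
    """Add two equal-length lists of bits, least significant bit first."""
    carry = 0                            # no carry into the first addition
    result = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry                 # sum bits and the final carry out

# 0110 (6) + 0111 (7), written least significant bit first
print(ripple_add([0, 1, 1, 0], [1, 1, 1, 0]))
```

The result is the bit list for 1101 (13) with a final carry of 0. Adding 1111 and 0001 in the same way gives 0000 with a final carry of 1.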
The SR flip-flop
All of the circuits so far encountered in this book have been combinational circuits. For
such a circuit the output is dependent only on the input values. An alternative type of circuit
is a sequential circuit where the output depends on the input and on the previous output.
KEY TERMS
Combinational circuit: a circuit in which the output is dependent only on the input values
Sequential circuit: a circuit in which the output depends on the input values and the previous
output
It can be constructed with two NAND gates or two NOR gates. Figure
18.05 shows the version with two NOR gates. The flip-flop is a two-
state device. Either it has Q set to 1 and Q' set to 0 or it has the reverse.
The truth table for the circuit can be presented as shown in Table
18.04. The two lines of the truth table where both S and R are
input as 0 produce no change in the values set for Q or Q'. This is
the condition when no signal is input to the flip-flop. Input of S = 1
and R = 0 always produces Q = 1 and Q' = 0. Input of S = 0 and R = 1
always produces the reverse.
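The dependence on the previous output can be seen in a small simulation of two cross-coupled NOR gates. This is a sketch under stated assumptions: the wiring Q = NOR(R, Q') and Q' = NOR(S, Q) is the usual arrangement but is not taken from Figure 18.05, the name `q_bar` stands for Q', and the input S = 1 with R = 1 is rejected as not valid.

```python
def nor(a, b):
    return 1 - (a | b)

def sr_flip_flop(s, r, q, q_bar):
    """Return the settled (Q, Q') for inputs S and R and the current state."""
    if s == 1 and r == 1:
        raise ValueError("S = 1 with R = 1 is not a valid input")
    for _ in range(4):                   # let the feedback loop settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar
```

Calling `sr_flip_flop(1, 0, 0, 1)` sets Q to 1, and a following call `sr_flip_flop(0, 0, 1, 0)` leaves the state unchanged, which is the memory property of the device.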
The JK flip-flop
Figure 18.06 (a) A symbol for a JK flip-flop and (b) a possible circuit
The workings of the circuit are viewed in terms of the value of
the Q output immediately after the circuit detects a clock pulse.
One approach to creating a Boolean algebra expression for a particular problem is to start
with the truth table and apply the sum of products method. This establishes a minterm for
each row of the table that results in a 1 for the output.
[Table 18.05 Part of the truth table for a JK flip-flop; surviving entries: Q unchanged, Q toggles]
This can be illustrated using the truth table for the half adder circuit shown in Figure 18.02.
The only row of the table creating a 1 output for C has a 1 input for A and for B. The product
becomes A.B and the sum has only this one term so we have:
C = A.B
For the S output, there are two rows that produce a 1 output so there is a sum of two
minterms:
S = $\overline{A}$.B + A.$\overline{B}$
Note that the 0 in a row is represented by the inverse of the input symbol.
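The sum-of-products method is mechanical enough to be written as a short Python function. In this sketch the function name `sum_of_products` is hypothetical and, because plain text cannot show a bar, the inverse of an input X is written as ~X.

```python
from itertools import product

def sum_of_products(names, output):
    """Build a sum-of-products string; ~X stands for the inverse of X."""
    minterms = []
    for values in product((0, 1), repeat=len(names)):
        if output(*values):              # only rows with a 1 output contribute
            term = ".".join(n if v else "~" + n for n, v in zip(names, values))
            minterms.append(term)
    return " + ".join(minterms)

print("C =", sum_of_products("AB", lambda a, b: a & b))
print("S =", sum_of_products("AB", lambda a, b: a ^ b))
```

For the half adder this gives `A.B` for C and `~A.B + A.~B` for S.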
This approach can also be used as part of the process of creating a Boolean algebra logic
expression from a circuit diagram. The truth tables for the individual logic gates are used and
then some algebraic simplification is applied.
For convenience Figure 18.02 is reproduced here as Figure 18.07. Examination of the figure
shows inputs A and B to a NAND gate with output W.
The first three rows of the NAND truth table produce a 1 output so the sum of products
has three minterms:
W = $\overline{A}$.$\overline{B}$ + $\overline{A}$.B + A.$\overline{B}$
We can now consider the input of W to a NAND gate with A as the other input to produce
the X output. The NAND gate operates as an AND gate followed by a NOT gate. The result
of the AND operation is the product of the inputs so:
X = A.($\overline{A}$.$\overline{B}$ + $\overline{A}$.B + A.$\overline{B}$)
Because A.$\overline{A}$ = 0 and A.A = A, this product reduces to A.$\overline{B}$.
We have to take the inverse of this to complete the NAND operation. This is where we need
the AND version of De Morgan’s law, which transforms $\overline{A.\overline{B}}$ into $\overline{A}$+B.
The same laws applied to the output Y from the other intermediate NAND gate give
Y = A+$\overline{B}$.
Finally, we need to consider $\overline{A}$+B and A+$\overline{B}$ being input to the final NAND gate. Again we can
consider the AND operation first as the product of the inputs:
S = ($\overline{A}$+B).(A+$\overline{B}$)
If we pause to think we will not multiply this out but instead we will apply De Morgan’s law
directly to this to perform the inverse operation to complete the NAND operation. This
gives:
S = $\overline{A}$.B + A.$\overline{B}$
This is the value obtained directly from the truth table so the algebra has been used
correctly.
Worked Example 18.02 did not show that the circuit produced the correct output for C. Also
a
shortcut was used to reach the final form of S. Can you use Boolean algebra to find the form
of C from the circuit and can you convert the expression for S if you start by using the
distributive law before applying De Morgan’s law?
X = $\overline{A}$.B + A.$\overline{B}$ + A.B
This is not instantly recognisable as A+B but, with a little effort, using
Boolean algebra laws it could be shown to be the same.
                  $\overline{B}$    B
$\overline{A}$    0                 1
A                 1                 1
• Within each group, the only input values retained are those which retain a constant value
throughout the group.
These rules define a column and a row group as indicated by the blue outlines. In the
column
group, B remains unchanged but A changes so B is retained. In the row group, it is A that
remains unchanged. The Boolean algebra expression is then just the sum of these retained
values:
X = A+B
Thus the Karnaugh map has found the OR expression without using any algebra.
Before starting any application of a method it is always worth looking to see if there are
any trends. In this case you can see that whenever B = 1 the output for X is 1. This means
that the final algebra should have B + something.
There are options for how the K-map is presented. We will choose to combine input values
in the columns. Figure 18.09 shows the result. This follows the convention of having the
rows corresponding to values of A and the columns to combinations of values for B and C.
Figure 18.09 A K-map representation of the truth table shown in Table 18.07
It is important to note that the labelling of the columns does not follow a binary value
pattern. Instead it follows the Gray coding sequence, where only one bit changes value
each time.
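The Gray coding sequence can be generated rather than memorised. The sketch below uses the common construction in which the code for position i is i combined with i shifted right by one place using exclusive OR. The function name `gray_code` is chosen here for illustration.

```python
def gray_code(bits):
    """Return the Gray code sequence for the given number of bits."""
    return [format(i ^ (i >> 1), f"0{bits}b") for i in range(2 ** bits)]

print(gray_code(2))   # column labels for two inputs such as B and C
```

For two bits the sequence is 00, 01, 11, 10. Only one bit changes between neighbouring labels, including between the last label and the first, which is why a K-map can wrap round.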
Following the rules stated above, the first group to identify is the square of four cells with
a value 1 as identified by the blue rectangle in the diagram. For these it can be seen that
A has different values, B has a constant value but C changes values. So, only B is retained.
Note this was anticipated from the initial inspection of the truth table.
This apparently only leaves the top left cell. It looks like an isolated cell but it is not
because K-maps wrap round. The cell is defined by BC = 00. This has two adjacent cells
under Gray coding rules. One is immediately obvious - BC = 01 but this contains 0 so
can be ignored. The other adjacent cell is the BC = 10 combination. Thus, there is a row
group containing BC = 00 and BC = 10, indicated by the dotted line partial group outlines.
Note that we cannot include the 11 cell in the same row because a group cannot contain
three members. For this row, the value of A remains unchanged, B changes but C remains
unchanged so the product $\overline{A}$.$\overline{C}$ results. So by adding this to the B for the other group the
final expression becomes:
$\overline{A}$.$\overline{C}$ + B
This is much simpler than the expression with five minterms derived directly from the truth
table.
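The equivalence of the two expressions can be confirmed by checking all eight input combinations. The sketch below rests on assumptions, because the truth table itself is not reproduced here: it takes the output to be 1 whenever B = 1 and also when A = 0, B = 0 and C = 0, which gives five minterms, and it reads the row group as the product of the inverse of A and the inverse of C. The function names are hypothetical.

```python
from itertools import product

def five_minterms(a, b, c):
    """Sum of products read row by row from the assumed truth table."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return ((na & nb & nc) | (na & b & nc) | (na & b & c)
            | (a & b & nc) | (a & b & c))

def simplified(a, b, c):
    """Expression read from the two K-map groups."""
    return ((1 - a) & (1 - c)) | b

same = all(five_minterms(*v) == simplified(*v)
           for v in product((0, 1), repeat=3))
print(same)
```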
Consider the Karnaugh map shown in Figure 18.10. This corresponds to a problem with four
inputs. It wraps round horizontally and vertically. Use the map to create a Boolean algebra
expression.
Exam-style Questions
Inputs
Working space
Outputs
b For the circuit shown in part (a), identify the type of circuit and what the outputs represent.
2 a Consider the following truth table:
0
1
[2]
[5]
[3]
i Using the sum-of-products approach, create a Boolean expression that matches the logic.
[3]
3a
ii For the rows that have A = 1, the output for X is 1. Explain how this would be reflected in a
simplified form
of Boolean expression matching the truth table.
[2]
i Using your knowledge of the truth table for an AND gate, create a Boolean algebra
expression for the output from the first AND gate.
ii Carry out the same exercise for the OR gate in the circuit.
iii Using De Morgan’s law, create the logic expression for the output from the NOT gate.
Consider the following truth table:
[2]
[3]
[4]
A
B
1-
ii Use the Karnaugh map to create a Boolean algebra expression for this logic.
[4]
[3]
i Use the sum-of-products method to create a Boolean algebra expression from the truth
table. [3]
ii Use Boolean algebra to show that this expression can be simplified to give the same
expression created
from the Karnaugh map. (Hint: you might wish to use the fact that A.B=A.B +A.B).
[4]
Learning objectives
One method is for the control unit to be constructed as a logic circuit. This is called the
hard-wired solution. The machine-code instructions are handled directly by hardware.
The alternative is for the control unit to use microprogramming. In this approach, the control
unit contains a ROM component in which is stored the microinstructions or microcode for
microprogramming. This is often referred to as firmware. The choice of which method is
used
is largely dependent on the type of processor.
The ‘architecture’ of a processor can be defined in a number of ways. From the point of view
of a sophisticated programmer, the architecture involves the following:
The choice of the instruction set is the main factor in deciding on a suitable architecture.
One view is that the instruction set should be chosen so that it can be clearly applied to
important problems, that only simple equipment is required and that important problems
are handled speedily. An opposing view is that it should be chosen to suit the needs of high-
level languages.
RISC: fewer instructions; simpler instructions; fixed-length instructions; pipelining easier.
CISC: more instructions; multi-cycle instructions; variable-length instructions; fewer registers.
In contrast, the specialised instructions that can be part of a CISC architecture often require
repeated memory access. The complexity of some of the instructions makes hard-wiring
extremely difficult so microprogramming is the norm. However, the increased complexity
of instructions for CISC is often because they more closely match high-level language
constructs. This means that compiler writing becomes much easier for a CISC processor.
Can you find out whether the processors in any systems you are using are described as RISC or CISC?
One of the major driving forces for creating RISC processors was the opportunity they would
provide for efficient pipelining. Pipelining is a form of parallelism applied specifically to
instruction execution. Other forms of parallelism are discussed in Section 19.03.
KEY TERMS
Pipelining: instruction-level parallelism
It can be seen that once under way the pipeline is handling five stages of five individual
instructions. In particular, at each clock cycle the complete processing of one instruction has
finished. Without the pipeline the processing time would be five times longer.
Clock cycles  1     2     3     4     5     6     7
IF            1.1   2.1   3.1   4.1   5.1   6.1   7.1
ID            -     1.2   2.2   3.2   4.2   5.2   6.2
OF            -     -     1.3   2.3   3.3   4.3   5.3
IE            -     -     -     1.4   2.4   3.4   4.4
WB            -     -     -     -     1.5   2.5   3.5
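The pipeline timing described above can be checked with a short calculation. The sketch below uses Python and assumes an ideal five-stage pipeline in which every stage takes one clock cycle and nothing stalls; the function names are illustrative and do not come from the text.

```python
def cycles_without_pipeline(instructions: int, stages: int = 5) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_with_pipeline(instructions: int, stages: int = 5) -> int:
    """The first instruction needs `stages` cycles; each later one finishes a cycle after."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


if __name__ == "__main__":
    for n in (3, 7, 1000):
        print(n, cycles_without_pipeline(n), cycles_with_pipeline(n))
```

For three instructions the pipelined count is 7 cycles against 15 without a pipeline. The factor of five is only approached for long instruction sequences, because the pipeline has to fill before one instruction completes per cycle.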
One issue that has to be dealt with regarding a pipelined processor is interrupt handling. The discussion in Chapter 5 (Section 5.06) referred to a processor with instructions handled sequentially. In the pipelined system described above there will be five instructions in the pipeline when an interrupt occurs. One option for handling the interrupt is to erase the pipeline contents for the latest four instructions to have entered. Then the normal interrupt-handling routine can be applied to the remaining instruction. The other option is to construct the individual units in the processor with individual program counter registers. This allows current data to be stored for all of the instructions in the pipeline while the interrupt is handled.
Discussion Point:
These are typical three-register instructions favoured for RISC. The first adds the contents of registers R2 and R3 and stores the result in R1. The next instruction is similar but uses the value stored in R1. In a pipelined structure, the second instruction will be reading the contents of R1 before the previous instruction has placed the value there. How could this potential problem be overcome?
SISD (Single Instruction Single Data stream) is the typical arrangement found in early personal computers. There is a single processor so no processor parallelism. The single data stream just means one memory.
SIMD (Single Instruction Multiple Data stream) describes how an array or vector processor
works. The multiple processors each have their own memory. One instruction is input and
each processor executes this instruction using data available in its dedicated memory.
MISD (Multiple Instruction Single Data stream) isn’t implemented in commercial products.
MIMD (Multiple Instruction Multiple Data stream) has examples in modern personal
computers which are of the symmetric multiprocessor type using identical processors. In this
case, each processor executes a different individual instruction. The multiple data stream
can be provided by a single memory suitably partitioned. Each processor might have a
dedicated cache memory.
Examples of one type of multicomputer system are called massively parallel computers. These are the systems used by large organisations for computations involving highly complex mathematical processing. They are the latest in an evolution of what have traditionally been called ‘supercomputers’. The major difference in architecture is that instead of having a bus structure to support multiple processors there is a network infrastructure to support multiple computer units. The programs running on the different computers can communicate by passing messages using the network.
An alternative type of multicomputer system is cluster computing, where a very large number
of PCs are networked.
i Explain what this term means and which specific part of the processor will be hard-wired.
[3]
ii State what the alternative to hard-wiring is and what hardware component is needed to be
part of the
Learning objectives
Before considering the purposes of an operating system (OS), we need to present the context in which it runs. A computer system needs a program that begins to run when the system is first switched on. At this stage, the operating system programs are still stored on disk, so no operating system is running. However, the computer has stored in ROM a basic input output system (BIOS) which starts a bootstrap program. It is this bootstrap program that loads the operating system into memory and sets it running.
An operating system can provide facilities to have more than one program stored in memory. Only one program can access the CPU at any given time but others are ready when the opportunity arises. This is described as multi-programming, and it can apply even when there is only a single user. Some systems are designed to have many users simultaneously logged in. This is a time-sharing system.
The purposes of an operating system can usefully be considered from two viewpoints:
an internal viewpoint and an external viewpoint. The internal viewpoint concerns how
the activities of the operating system are organised to best use the resources available.
The external viewpoint concerns the facilities made available for system usage. Chapter 7
(Section 7.02) contained a categorised summary of the various activities that an operating
system engages in. This chapter discusses some of them in more detail.
Resource management
• the CPU
• the memory
Resource management relating to the CPU concerns scheduling to ensure efficient usage.
The methods used are described in Section 20.03. These methods consider the CPU as a
single unit; specific issues relating to a multiprocessor system are not considered. Resource
management relating to the memory concerns optimum usage of main memory.
The I/O system does not just relate to input and output that directly involves a computer
user. It also includes input and output to storage devices while a program is running. Figure
20.01 shows a schematic diagram that illustrates the structure of the I/O system.
The bus structure in Figure 20.01 shows that there can be an option for the transfer of data between an I/O device and memory. The operating system can ensure that I/O passes via the CPU but for large quantities of data the operating system can ensure direct transfer between
Device      Data rate   Time to transfer 1 byte
Keyboard    10 Bps      0.1 s
Screen      50 MBps     2 × 10^-8 s
Disk        5 MBps      2 × 10^-7 s
The user interface may be made available as a command line, a graphical display or a voice
recognition system but the function is always to allow the user to interact with running
programs. When a program involves use of a device, the operating system provides the
device driver: the user just expects the device to work. (You might, however, wish to argue
that printers do not always quite fit this description.)
The operating system will provide a file system for a user to store data and programs. The user has to choose filenames and organise a directory (folder) structure but the user does not have to organise the physical data storage on a disk. If the user is a programmer, the operating system supports the provision of a programming environment. This allows a program to be created and run without the programmer being familiar with how the processor functions.
When a program is running it can be considered to be a type of user. The operating system provides a set of system calls that provide an interface to the services it offers. For instance, if a program specifies that it needs to read data from a file, the request for the file is converted into a system call that causes the operating system to take charge, find the file and make it available to the program. An extension of this concept is when an operating system provides an application programming interface (API). Each API call fulfils a specific function such as creating a screen icon. The API might use one or more system calls. The API concept aims to provide portability for a program.
In this model, application programs or utility programs could make system calls to the
kernel. However, to work properly each higher layer needs to be fully serviced by a lower
layer (as in a network protocol stack).
This is hard to achieve in practice. A more flexible approach uses a modular structure, illustrated in Figure 20.03. The structure works by the kernel calling on the individual services when required. It could possibly be associated with a micro-kernel structure where the functionality in the kernel is reduced to the absolute minimum.
Programs that are available to be run on a computer system are initially stored on disk. In
a time-sharing system a user could submit a program as a ‘job’ which would include the
program and some instructions about how it should be run. Figure 20.04 shows an overview
of the components involved when a program is run.
Process states
In Chapter 7 (Section 7.02), it was stated that a process can be defined as ‘a program being
executed’. This definition is perhaps better slightly modified to include the state when the
program first arrives in memory. At this stage a process control block (PCB) can be created
in memory ready to receive data when the process is executed. Once in memory the state of
the process can change.
The transitions between the states shown in Figure 20.05 can be described as follows:
• A new process arrives in memory and a PCB is created; it changes to the ready state.
• A process in the ready state is given access to the CPU by the dispatcher; it changes to the
running state.
• A process in the running state is halted by an interrupt; it returns to the ready state.
• A process in the running state cannot progress until some event has occurred (I/O perhaps); it changes to the waiting state (sometimes called the ‘suspended’ or ‘blocked’ state).
• A process in the waiting state is notified that an event is completed; it returns to the ready
state.
• A process in the running state completes execution; it changes to the terminated state.
Figure 20.05 The five states defined for a process being executed
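The transitions listed above can be written down as a small lookup table. The following Python sketch is one possible encoding; the event names (`admit`, `dispatch` and so on) are hypothetical labels chosen for illustration and are not terms used in the text.

```python
# Allowed transitions, taken from the list above: (state, event) -> next state.
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("new", "admit"): "ready",
    ("ready", "dispatch"): "running",
    ("running", "interrupt"): "ready",
    ("running", "wait_for_event"): "waiting",
    ("waiting", "event_completed"): "ready",
    ("running", "exit"): "terminated",
}


def next_state(state: str, event: str) -> str:
    """Return the state that follows `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events: list[str], start: str = "new") -> list[str]:
    """Apply the events in order and return every state visited."""
    states = [start]
    for event in events:
        states.append(next_state(states[-1], event))
    return states
```

Calling `run(["admit", "dispatch", "wait_for_event", "event_completed", "dispatch", "exit"])` follows a process that waits once for I/O before completing. A request such as moving directly from waiting to running is rejected, because a waiting process must return to the ready state first.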
It is possible for a process to be separated into different parts for execution. The separate parts are called threads. If this has happened, each thread is handled as though it were a process.
KEY TERMS
Process control block (PCB): a complex data structure containing all data relevant to the running of a process
Interrupts
Some interrupts are caused by errors that prematurely terminate a running process.
Otherwise there are two reasons for interrupts:
• Processes consist of alternating periods of CPU usage and I/O usage. I/O takes far too long for the CPU to remain idle waiting for it to complete. The interrupt mechanism is used when a process in the running state makes a system call requiring an I/O operation and has to change to the waiting state.
• The scheduler decides to halt the process for one of several reasons, discussed in the next
section (‘Scheduling algorithms’).
Whatever the reason for an interrupt, the OS kernel must invoke an interrupt-handling
routine. This may have to decide on the priority of an interrupt. One required action is
that the current values stored in registers must be recorded in the process control block.
This allows the process to continue execution when it eventually returns to the running
state.
Discussion Point:
What would happen if an interrupt was received while the interrupt-handling routine was
being executed by the CPU? Does this require a priority being set for each interrupt?
Scheduling algorithms
Although the long-term or high-level scheduler will have decisions to make when choosing
which program should be loaded into memory, we concentrate here on the options for the
short-term or low-level scheduler.
The simplest possible algorithm is first come first served (FCFS). This is a non-preemptive
algorithm and can be implemented by placing the processes in a first-in first-out (FIFO)
queue. It will be very inefficient if it is the only algorithm employed but it can be used as part
of a more complex algorithm.
A round-robin algorithm allocates a time slice to each process and is therefore preemptive,
because a process will be halted when its time slice has run out. It can be implemented as a
FIFO queue. It normally does not involve prioritising processes. However, if separate queues
are created for processes of different priorities then each queue could be scheduled using a
round-robin algorithm.
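A round-robin scheduler built on a FIFO queue can be sketched in a few lines of Python. This is a simplified model, not a description of any real operating system: it assumes that every process is already in the ready queue at time 0, that no process waits for I/O, and that switching between processes takes no time. The names `round_robin`, `burst_times` and `time_slice` are invented for the example.

```python
from collections import deque


def round_robin(burst_times: dict[str, int], time_slice: int) -> list[tuple[str, int]]:
    """Return (process, finish_time) pairs in order of completion.

    burst_times maps a process name to the CPU time it still needs.
    All processes are assumed to be in the ready queue at time 0.
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    queue = deque(burst_times.items())  # FIFO ready queue
    clock = 0
    finished = []
    while queue:
        name, remaining = queue.popleft()
        used = min(time_slice, remaining)
        clock += used
        remaining -= used
        if remaining > 0:
            queue.append((name, remaining))  # time slice ran out: back of the queue
        else:
            finished.append((name, clock))
    return finished
```

With `round_robin({"P1": 5, "P2": 3, "P3": 1}, time_slice=2)` the shortest process finishes first at time 5, followed by P2 at time 8 and P1 at time 9, even though P1 was first in the queue. Under first come first served, P3 would have had to wait for both of the others.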
A priority-based scheduling algorithm is more complicated. One reason for this is that every time a new process enters the ready queue or when a running process is halted, the priorities for the processes may have to be re-evaluated. The other reason is that whatever scheme is used to judge priority level it will require some computation. Possible criteria are:
• estimated time of process execution
More than one of these criteria might be considered. Clearly, estimating a time for execution
may not be easy. Some processes require extensive I/O, for instance printing wage slips for
employees. There is very little CPU usage for such a process so it makes sense to allocate it
a high priority so that the small amount of CPU usage can take place. The process will then
change to the waiting state while the printing takes place.
The term memory management embraces a number of aspects. One aspect concerns the provision of protected memory space for the OS kernel. Another is that the loading of a program into memory requires defining the memory addresses for the program itself, for associated procedures and for the data required by the program. In a multiprogramming system, this might not be straightforward. The storage of processes in main memory can get fragmented in the same way as happens for files stored on a hard disk. There may be a need for the medium-term scheduler to move a process out of main memory to ease the problem.
The most flexible approach to memory management is to use virtual memory based on
paging but with no requirement for all pages to be in memory at the same time. In a virtual
memory system, the address space that the CPU uses is larger than the physical main
memory space. This requires the CPU to transfer address values to a memory management
unit that allocates a corresponding address on a page.
KEY TERMS
Virtual memory: a paging mechanism that allows a program to use more memory addresses than are available in main memory
The starting situation is that the set of pages comprising the process are stored on disk. One or more of these pages is loaded into memory when the process is changing to the ready state. When the process is dispatched to the running state, the process starts executing. At some stage, it will need access to pages still stored on disk which means that a page needs to be taken out of memory first. This is when a page replacement algorithm is needed. A simple algorithm would use a first-in first-out method. A more sensible method would be to replace the least-recently-used page but this requires statistics of page use to be recorded.
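The difference between the two page replacement methods can be seen by counting page faults for the same sequence of page requests. The Python sketch below is a simplified model: pages are plain integers, the number of available page frames is fixed and at least one, and the function names are illustrative.

```python
from collections import OrderedDict, deque


def fifo_faults(references: list[int], frames: int) -> int:
    """Count page faults when the page loaded longest ago is replaced."""
    loaded = deque()
    faults = 0
    for page in references:
        if page not in loaded:
            faults += 1
            if len(loaded) == frames:
                loaded.popleft()  # evict the oldest page
            loaded.append(page)
    return faults


def lru_faults(references: list[int], frames: int) -> int:
    """Count page faults when the least-recently-used page is replaced."""
    loaded = OrderedDict()  # order of keys records recency of use
    faults = 0
    for page in references:
        if page in loaded:
            loaded.move_to_end(page)  # record the use
        else:
            faults += 1
            if len(loaded) == frames:
                loaded.popitem(last=False)  # evict the least recently used
            loaded[page] = True
    return faults
```

For the request sequence `[1, 2, 3, 1, 4, 1, 2]` with three frames, the first-in first-out method gives 6 page faults and the least-recently-used method gives 5. The saving arises because page 1 is in frequent use, so the second method keeps it in memory. The `move_to_end` call is the record of page use that the text says this method requires.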
One of the advantages of the virtual memory approach is that a very large program can be run when an equally large amount of memory is unavailable. Another advantage is that only part of a program needs to be in memory at any one time. For example, the index tables for a database could be permanently in memory but the full tables could be brought in only when required.
The system overhead in running virtual memory can be a disadvantage. The worst problem is ‘disk thrashing’, when part of a process on one page requires another page which is on disk. When that page is loaded it almost immediately requires the original page again. This can lead to almost perpetual loading and unloading of pages. Algorithms have been developed to guard against this but the problem can still occur, fortunately only rarely.
Although virtual memory could be used in a system running a virtual machine, the two are
completely different concepts that must not be confused. Also note that the Java virtual
machine discussed in Chapter 7 (Section 7.05) is based on a different underlying concept.
The principle of a virtual machine is that a process interacts directly with a software interface
provided by the operating system. The kernel of the operating system handles all of the
interactions with the actual hardware of the host system. The software interface provided for
the virtual machine provides an exact copy of the hardware. The logical structure is shown in
Figure 20.06.
The advantage of the virtual machine approach is that more than one different operating
system can be made available on one computer system. This is particularly valuable if an
organisation has legacy systems and wishes to continue to use the software but does not
wish to keep the aged hardware. Alternatively, the same operating system can be made
available many times. This is done by companies with large mainframe computers that offer
server consolidation facilities. Different companies can be offered their own virtual machine
running as a server.
One drawback to using a virtual machine is the time and effort required for implementation.
Another is the fact that the implementation will not offer the same level of performance that
would be obtained on a normal system.
A compiler can be described as having a ‘front end’ and a ‘back end’. The front-end program
performs analysis of the source code and produces an intermediate code that expresses
completely the semantics (the meaning) of the source code. The back-end program then
takes this intermediate code as input and performs synthesis of object code. This analysis-
synthesis model is represented in Figure 20.07.
For simplicity, Figure 20.07 assumes no error in the source code. There is a repetitive process in which the source code is read line-by-line. For each line, the compiler creates matching intermediate code. Figure 20.07 also shows how an interpreter program would have the same analysis front end. In this case, however, once a line of source code has been converted to intermediate code, it is executed.
• lexical analysis
• syntax analysis
• semantic analysis
In lexical analysis each line of source code is separated into tokens. This is a pattern-matching exercise. It requires the analyser to have knowledge of the components that can be found in a program written in the particular programming language.
The analyser must categorise each token. For instance, in the first example, var and integer
must be recognised as keywords. The non-alphanumeric characters such as [ or * must be
categorised. The := is a special case; the analyser must recognise that this is one operator
with two characters that must not be separated.
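The categorisation described above can be sketched with a regular expression. The Python example below is a minimal illustration for a small Pascal-like language: the token classes, the keyword set and the function name `tokenise` are assumptions made for the example and do not describe how any particular compiler is written.

```python
import re

# Hypothetical token classes for a small Pascal-like language.
KEYWORDS = {"var", "integer"}
TOKEN_PATTERN = re.compile(
    r"""
    (?P<number>\d+)
  | (?P<word>[A-Za-z][A-Za-z0-9]*)
  | (?P<operator>:=|[+\-*/])      # ':=' is tried first so it is never split
  | (?P<symbol>[\[\]();:,])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)


def tokenise(line: str) -> list[tuple[str, str]]:
    """Split one line of source code into (category, text) tokens."""
    tokens = []
    position = 0
    while position < len(line):
        match = TOKEN_PATTERN.match(line, position)
        if match is None:
            raise SyntaxError(f"unexpected character {line[position]!r}")
        position = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue  # white space separates tokens but is not a token
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```

For the line `count := count + 1` the result is an identifier, the single operator `:=`, an identifier, the operator `+` and a number. A full analyser would also enter each identifier in the symbol table at this point.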
Finally, all identifiers such as count and PercentMark must be recognised as such and
an entry for each must be made in the symbol table (which could have been called the
identifier table). The symbol table contains identifier attributes such as the data type,
where it is declared and where it is assigned a value. The symbol table is an important
data structure for a compiler. Although Figure 20.08 shows it only being used by the syntax
analysis program, it is also used by later stages of compilation.
KEY TERMS
Symbol table: a data structure in which each record contains the name and attributes of an
identifier
Figure 20.09 shows the parse tree for the following assignment statement:
y := 2 * x + 4
Note that the hierarchical structure of the tree, if correctly interpreted, ensures that the
multiplication of 2 by x is carried out before the addition of 4.
Semantic analysis is about establishing the full meaning of the code. An annotated
abstract syntax tree is constructed to record this information. For the identifiers in this
tree an associated set of attributes is established including, for example, the data type. These attributes are also recorded in the symbol table.
An often-used intermediate code created by the last stage of front-end analysis is a three-address code. As an example the following assignment statement has six identifiers requiring six addresses:
y := a + (b * c - d) / e
This could be converted into the following four statements, each requiring at most three
addresses:
temp := b * c
temp := temp - d
temp := temp / e
y := a + temp
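That the four three-address statements compute the same value as the original assignment can be checked directly. The Python sketch below simply transcribes both forms; the function names are invented for the example.

```python
def direct(a, b, c, d, e):
    """The original assignment: y := a + (b * c - d) / e"""
    return a + (b * c - d) / e


def three_address(a, b, c, d, e):
    """The same calculation, one operator per statement."""
    temp = b * c
    temp = temp - d
    temp = temp / e
    y = a + temp
    return y


print(direct(1, 2, 3, 4, 2), three_address(1, 2, 3, 4, 2))
```

Each statement in the second function names at most three addresses: two operands and one result. The single variable `temp` is reused because each intermediate value is needed only by the statement that follows it.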
::= ||
::= 0|1|2|3|4|5|6|7|8|9
::= |
::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
The use of | is to separate individual options. The ::= characters can be read as ‘is defined
as’. Note the recursive definition of in this particular version of BNF. Without the
use of recursion the definition would need to be more complicated to include all possible
combinations following the initial .
A syntax diagram is only used in the context of a language. It has limited use because it
cannot be incorporated into a compiler program as an algorithm. By contrast, BNF is a
general approach which can be used to describe any assembly of data. Furthermore, it can
be used as the basis for an algorithm.
If the front-end analysis has established that there are syntax errors, the only back-end process is the presentation of a list of these errors. For each error, there will be an explanation and the location within the program source code.
In the absence of errors, the main back-end stage is machine code generation from the intermediate code. This may involve optimisation of the code. The aim of optimisation is to create an efficient program; the methods that can be used are diverse. One type of optimisation focuses on features that were inherent in the original source code and have been propagated into the intermediate code. As a simple example, consider these successive assignment statements:
x := (a + b) * (a - b)
y := (a + 2 * b) * (a - b)
These can be rewritten more efficiently as:
temp := (a - b)
x := (a + b) * temp
y := x + temp * b
Question 20.01
Check the maths for the efficient code defined above.
Another example is when a statement inside a loop, which is therefore executed for each
repetition of the loop, does the same thing each time. Optimisation would place the
statement immediately before the loop.
The other type of optimisation is instigated when the machine code has been created. This
type of optimisation may involve efficient use of registers or of memory.
Evaluation of expressions
An assignment statement often has an algebraic expression defining a new value for an identifier. The expression can be evaluated by firstly converting the infix representation in the code to Reverse Polish Notation (RPN). RPN is a postfix representation which never requires brackets and has no rules of precedence.
a+b*c
The conversion to RPN has to take into account operator precedence so the first step is to
convert b * c to get the intermediate form:
a+bc*
We then convert the two terms to give the final RPN form:
abc*+
If the original expression had been (a + b) * c (where the brackets were essential) then
the conversion to RPN would have given:
ab+c*
x 2 * y 3 * + 6 /
The process is as follows. The RPN is scanned until two identifiers are followed by an
operator. This combination is converted to give an intermediate form (brackets are used for
clarification):
(x * 2) y 3 * + 6 /
(x * 2) (y * 3) + 6 /
((x * 2) + (y * 3)) 6 /
((x * 2) + (y * 3)) / 6
Because of the precedence rules, some of the brackets are unnecessary; the final version
could be written as:
(x * 2 + y * 3) / 6
In the syntax analysis stage, an expression is represented as a syntax tree. The expression
a + b * c would be presented as shown in Figure 20.11.
Figure 20.11 Syntax tree for an infix expression
To create this tree, the lowest precedence operator (+) is positioned at the root. If there are
several with the same precedence, the first one is used. The RPN form of the expression
can now be extracted by a post-order traversal. This starts at the lowest leaf to the left of
the root and then uses left-right-root ordering which ensures, in this case, that the RPN
representation is:
abc*+
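The post-order traversal can be sketched in Python as follows. The `Node` class is a hypothetical minimal tree node written for this example; a real compiler's syntax tree would carry more information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def post_order(node: Optional[Node]) -> list[str]:
    """Visit the left subtree, then the right subtree, then the node itself."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


# Syntax tree for a + b * c: '+' at the root, '*' as its right child.
tree = Node("+", Node("a"), Node("*", Node("b"), Node("c")))
print("".join(post_order(tree)))  # prints abc*+
```

The operator precedence is carried entirely by the shape of the tree, which is why the traversal itself needs no precedence rules.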
To convert an infix expression to RPN using a stack, the shunting-yard algorithm is used
(Figure 20.12).
The rules of the algorithm are to consider the string of tokens representing the infix
expression. These represent the railroad waggons that are to be shunted from the infix line
to the RPN line. The tokens are examined one by one. For each one, the rules are:
o If the stack line is empty or contains a lower precedence operator, the operator is
diverted into the stack line.
o If the stack line contains an equal or higher precedence operator, then that operator
is popped from the stack into the RPN expression line and the new operator takes
its place on the stack line.
• When all tokens have left the infix line, the operators remaining on the stack line are
popped one by one from the stack line onto the RPN expression line.
Consider the infix expression a + b * c. Table 20.02 traces the conversion process. The first operator to enter the stack line is the + so when the higher precedence * comes later it too enters the stack line. At the end the * is popped followed by the +.
Infix line    Stack line    RPN line
a+b*c         -             -
+b*c          -             a
b*c           +             a
*c            +             ab
c             +*            ab
-             +*            abc
-             +             abc*
-             -             abc*+
Had the infix expression been a * b + c then * would have been first to enter the stack line but it would have been popped from the stack before + could enter.
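The rules can be sketched in Python for expressions without brackets. Two assumptions are made in this sketch: tokens are single strings already separated by lexical analysis, and operators of equal or higher precedence are popped repeatedly until none remains at the top of the stack, as in the usual statement of the shunting-yard algorithm. The names `PRECEDENCE` and `infix_to_rpn` are illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_rpn(tokens: list[str]) -> list[str]:
    """Convert a bracket-free infix token list to RPN."""
    rpn = []    # the RPN line
    stack = []  # the stack line; the top is the end of the list
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing.
            while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                rpn.append(stack.pop())
            stack.append(token)
        else:
            rpn.append(token)  # identifiers and numbers go straight across
    while stack:
        rpn.append(stack.pop())
    return rpn


print("".join(infix_to_rpn(list("a+b*c"))))  # prints abc*+
print("".join(infix_to_rpn(list("a*b+c"))))  # prints ab*c+
```

Handling brackets needs two further rules (push an opening bracket; on a closing bracket, pop back to the matching opening bracket), which are omitted here to stay close to the rules given above.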
A stack can be used to evaluate an RPN expression. Let’s consider the execution of the
following RPN expression when x has the value 3 and y has the value 4:
x 2 * y 3 * + 6 /
The rules followed here are that the values are added to the stack in turn. The process is
interrupted if the next item in the RPN expression is an operator. This causes the top two
items to be popped from the stack. Then the operator is used to create a new value from
these two and the new value is added to the stack. The process then continues. Figure
20.13 shows the successive contents of the stack with an indication of when an operator
has been used. The intermediate states of the stack when two values have been popped
are not shown.
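The stack method can be sketched in Python. The example takes the expression to be x 2 * y 3 * + 6 / with the operators written out in full, and it assumes the tokens are separated by spaces; the names `OPERATIONS` and `evaluate_rpn` are invented for the sketch.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(tokens: list[str], values: dict[str, float]) -> float:
    """Evaluate an RPN token list, looking up identifiers in `values`."""
    stack = []
    for token in tokens:
        if token in OPERATIONS:
            right = stack.pop()  # the top item is the right-hand operand
            left = stack.pop()
            stack.append(OPERATIONS[token](left, right))
        elif token in values:
            stack.append(values[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(evaluate_rpn("x 2 * y 3 * + 6 /".split(), {"x": 3, "y": 4}))  # prints 3.0
```

During this run the stack holds 6 after the first multiplication, 12 is pushed on top of it by the second, the addition leaves 18, and the final division leaves 3. The order in which the two items are popped matters for subtraction and division.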
[Figure 20.13: successive contents of the stack during evaluation]
1 Convert the following infix expressions into RPN using the methods described in
Worked Examples 20.01, 20.02 and 20.03:
(x - y) / 4
3 * (2 + x / 7)
2 Convert the following RPN expressions into the corresponding infix expressions:
4 a b + c + d + e + *
y2^z3^ + /
Note that the caret (^) symbol represents ‘to the power of’.
3 Using simple values for each variable in part 2, use the infix version to evaluate the
expression. Then use the stack method to evaluate the RPN expression and check
that you get the same result.
It needs to be understood that the use of RPN would be of little value if the simple processor
with a limited instruction set discussed in Chapter 6 (Section 6.04) was being used. Modern
processors will have instructions in the instruction set that handle stack operations, so a
compiler can convert expressions into RPN knowing that conversion to machine code can
utilise these and allow stack processing in program execution.
Summary
• For the user, the operating system provides an interface, a file system and application
programming interfaces.
• There are five states for a process: new, ready, running, waiting and terminated.
• A process may be interrupted by an error, a need for an I/O activity or the scheduling
algorithm.
• In a virtual machine, a process interacts with a software interface provided by the operating
system.
• Compiler operation has a front-end program providing analysis and a back-end program
providing synthesis.
Exam-style Questions
ii In one model for the execution of a program, there are five defined process states. Identify three of them and explain the meaning of each.
i Identify two scheduling algorithms and for each classify its type.
ii A scheduling algorithm might be chosen to use prioritisation. Identify two criteria that could be used to assign a priority to a process.
i Identify which of the techniques in part (a) is used to create virtual memory.
iii Explain one problem that can occur in a virtual memory system.
i A compiler is modelled as containing a front end and a back end. State the overall aim of
the front end
ii Identify two processes which are part of the front end. [2]
iii Identify two processes which are part of the back end. [2]
::=.
::=.
::=.
::=. [4]
d Convert the Reverse Polish Notation expression 2 a3b6c- + into infix notation. [2]
Encryption can be used as a routine procedure when storing data within a computing system. However, the focus in this chapter is on the use of encryption when transmitting data over a network.
The use of encryption is illustrated in Figure 21.01. The process starts with original data
referred to as plaintext, whatever form it takes. This is encrypted by an encryption algorithm
which makes use of a key. The product of the encryption is ciphertext, which is transmitted
to the recipient. When the transmission is received it is decrypted using a decryption
algorithm and a key to produce the original plaintext.
KEY TERMS
Security concerns
• Confidentiality: Only the intended recipient should be able to decrypt the ciphertext.
• Non-repudiation: Neither sender nor receiver should be able to deny involvement in the
transmission.
• Availability: Nothing should happen to prevent the receiver from receiving the transmission.
This chapter will consider only confidentiality, authenticity and integrity.
Encryption methods
The fundamental principle of encryption is that the encryption algorithm must not be
a secret: it must be in the public domain. In contrast, an encryption key must be secret.
However, this is not quite the full story. There are two alternative approaches. One is
symmetric key encryption; the other is asymmetric key encryption.
In symmetric key encryption there is just one key which is used to encrypt and then
to decrypt. This key is a secret shared by the sender and the receiver of a message. In
asymmetric key encryption two different keys are used, one for encryption and a different
one for decryption. Only one of these is a secret.
So, how does this work? What happens at the sending end is straightforward. The sender has a key which is used to encrypt some plaintext and the ciphertext produced is transmitted to the receiver. The question is, how does the receiver get to have the key needed for decryption? If symmetric key encryption is used, there needs to be a secure method for the sender and receiver to be provided with the secret key.
Using asymmetric key encryption, the process actually starts with the receiver. The receiver
must be in possession of two keys. One is a public key which is not secret. The other is a
private key which is secret and known only to the receiver. The receiver can send the public
key to a sender, who uses the public key for encryption and sends the ciphertext to the
receiver. The receiver is the only person who can decrypt the message because the private
and public keys are a matched pair. The public key can be provided to any number of
different people allowing the receiver to receive a private message from any of them. Note,
however, that if two individuals require two-way communication, both communicators need
a private key and must send the matching public key to the other person.
There are two requirements to ensure confidentiality should the transmission be intercepted
and the message extracted: the encryption algorithm must be complex and the number of
bits used to define the key must be large.
The details of encryption algorithms are beyond the scope of this book. However, you
might wish to investigate the type of approach used in established examples, such as DES
or RSA. Also, you might wish to consider the number of different combinations for a 64-bit
or 128-bit key.
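As a quick way to explore the last suggestion, the number of possible keys for a key of b bits is 2 to the power b. The short Python sketch below (the function name is an assumption made for this illustration) prints the counts for 64-bit and 128-bit keys.

```python
def key_count(bits):
    """Number of different keys that can be formed from the given number of bits."""
    return 2 ** bits

for bits in (64, 128):
    print(bits, "bit key:", key_count(bits), "possible keys")
```

Each extra bit doubles the number of keys, so a 128-bit key space is the 64-bit key space multiplied by itself.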
The above account does not completely answer the question of how encryption works. The
missing factor is an organisation to provide keys and to ensure their safe delivery to
individuals using them.
Figure 21.02 Sender using a one-way hash function to send a digital signature (the message is
passed through a cryptographic hash function to give a digest, which is encrypted with the
sender’s private key to produce the digital signature)
Let’s consider a would-be receiver who has a public-private key pair. This individual wishes
to be able to receive secure messages from other individuals. The public key must be made
available in a way that ensures authentication. The steps taken by the would-be receiver to
obtain a digital certificate to allow safe public key delivery are illustrated in Figure 21.04. The
process can be summarised as follows:
Figure 21.04 has person A placing the digital certificate on that person’s website but another
option is to post it on a website designed specifically for keeping digital certificate data.
Alternatively, a digital certificate might be used solely for authenticating emails as was
suggested in Chapter 8 (Section 8.02).
Once a signed digital certificate has been posted on a website, any other person wishing to
use person A’s public key downloads the signed digital certificate from the website and uses
the CA’s public key to extract person A’s public key from the digital certificate.
For this overall process to work there is a need for standards to be defined. As ever, the name
for the standard, X.509, is not very memorable.
Secure Socket Layer (SSL) and Transport Layer Security (TLS) are two closely related protocols
providing security in using the Internet. TLS is a slightly modified version of SSL. We concentrate
on SSL here. The main context for the use of SSL is a client-server application. As described in
Chapter 17 (Section 17.04), the interface between an application and TCP uses a port number.
In the absence of a security protocol, TCP services an application using the port number. The
combination of an IP address and a port number is called a ‘socket’. When the Secure Socket
Layer protocol is implemented it functions as an additional layer between TCP in the transport
layer and the application layer. When the SSL protocol is in place, the application protocol HTTP
becomes HTTPS. Note that although SSL is referred to as a protocol, it is in fact a protocol suite.
The starting point for SSL implementation is a connection between the client and the server
being established by TCP. The Handshake Protocol from the SSL suite is used to create
a session to allow the client and the server to communicate. Once the session has been
established, the client and server can agree which encryption algorithms are to be used and
can define the values for the session keys that are to be used. This interchange may involve
checking digital certificates. For the transmission, SSL provides encryption, compression of
the data and integrity checking. When the transmission is complete the session is closed and
all records of the encryption disappear.
An application running HTTPS can guarantee secure communication allowing users to send
confidential information such as credit card details in an ecommerce transaction. The user
is completely unaware of the processes involved in ensuring confidential transmission with
data integrity assured.
Discussion Point:
Chapter 8 (Section 8.01) discussed security and privacy issues. The use of encryption has
always been a controversial subject. There are two important aspects to this. The first is
whether powerful, unbreakable encryption algorithms should be made available to the
public. The second relates to the key escrow scheme, which allows governments access to all
secret keys. You may wish to revisit your Chapter 8 discussions.
21.04 Malware
Types of malware
Malware is the colloquial name for malicious software. Malicious software is software that is
introduced into a system for a harmful purpose. One category of malware is where program
code is introduced to a system. The various types of malware-containing program code are:
System vulnerabilities
Many system vulnerabilities are associated directly with the activities of legitimate users of a
system. Malware can be introduced inadvertently by the user in a number of ways:
• accessing a website
Other vulnerabilities are indirectly associated with the activities of legitimate users. By far
the most significant is the use of weak passwords and particularly those which have a direct
connection to the user. A poor choice of password gives the would-be hacker a strong chance
of being able to gain unauthorised access. Other examples include a legitimate user not
recognising a phishing or pharming attack and, as a result, disclosing sensitive information.
Systems inherently lack optimum security. Operating systems are notorious for lacking good
security. There is a tendency for operating systems to increase in complexity which tends to
offer the potential for more insecurity. The regular updates are often required because of a
newly discovered security vulnerability. In the past, commonly used application packages
have allowed macro viruses to flourish but this particular problem is largely under control.
Chapter 8 (Section 8.02) has a discussion of the standard security measures for computer
systems such as firewalls and anti-virus software.
1 a When transmitting data across a network three concerns relate to: confidentiality,
authenticity and integrity.
Explain each of these terms.
[4]
b Encryption and decryption can be carried out using a symmetric or an asymmetric key
method.
Explain how keys are used in each of these methods. You are not required to describe the
algorithms used.
Your account must include reference to a public key, a private key and a secret key. [6]
a Give the names of two types of malware which involve some malicious code being input
into a system. [2]
c Identify and explain two approaches for preventing malicious code from entering a
computer system. [4]
e Identify one possible policy for reducing the threat from phishing or pharming. [2]
310
Chapter 22
Learning objectives
By the end of this chapter you should be able to:
22.01 Logistics
Monitoring can be used to describe a very wide range of activities but all are characterised
by the measurement of some physical property. Typical examples of the physical property
could be temperature, pressure, light intensity, flow rate or movement.
Let’s consider temperature as an example. If this was being monitored under human control,
the measurement could be made with a standard mercury thermometer. However, in this
chapter we are interested in systems where a computer or microprocessor is being used.
In this scenario, monitoring requires a measuring device that records a value which can be
transmitted to the computer. Such a measuring device is called a sensor. For monitoring
temperature, a sensor could contain a thermocouple which outputs a temperature-
dependent voltage.
• to check whether or not the monitored value is within acceptable limits; in a safety
system, if the measured property has reached a dangerous level, some immediate action
is required.
• to ensure routinely and continuously that the monitored property is as required; if the
value measured indicates that a change has occurred, then the control part of the system
may have to take measures to reverse this change.
The control element of a monitoring and control system needs a device, called an actuator
Figure 22.01 shows a schematic diagram of a computer-controlled environment.
KEY TERMS
Sensor: a hardware device that measures a property and transmits a value to a controlling
computer
Actuator: a hardware device that receives a signal from a computer and adjusts the setting of a
controlling device
A closed-loop feedback control system is a special type of monitoring and control system
where the feedback directly controls the operation. Figure 22.02 shows a schematic diagram
of such a system. A microprocessor functions as the controller. This compares the value for
the actual output, as read by the sensor, with the desired output. It then transmits a value to
the actuator which depends on the difference calculated.
programming
but reading sensor values every clock cycle of a processor is unnecessarily frequent. The
program must control the timing of the repetitions. This might be done by creating a timed
sequence for reading values or possibly by including a time delay inside a loop.
Research the capabilities for controlling the timing sequence for continuous running in
your chosen programming language. Which ones would be best suited to a monitoring and
control program?
EndReadingSensor <- FALSE
ReadingOutOfRange <- FALSE
REPEAT
CALL SensorRead(SensorValue)
ELSE
ENDIF
ENDIF
IF ReadingOutOfRange
THEN
CALL WarningDisplay(Reading)
ENDIF
ReadingOutOfRange <- FALSE
FOR TimeFiller <- 1 TO 999999
ENDFOR
UNTIL EndReadingSensor
• The loop finishes with another loop that does nothing other than create a delay before
the outer loop repeats.
• When the sensor reading indicates a problem, the loop calls a procedure to handle
whatever notification method is to be used.
• Following this call, the loop continues so the Boolean variable has to be reset to prevent
the warning procedure being repetitively called.
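The structure described in these points can be sketched in Python. This is a minimal sketch under stated assumptions: the sensor is simulated by a list of readings, the limits and the function name `monitor` are invented for the illustration, and `time.sleep` is used for the delay in place of an empty counting loop.

```python
import time

def monitor(readings, low, high, delay_seconds=0.0):
    """Check each reading against the limits and collect a warning when out of range."""
    warnings = []
    for value in readings:                       # stands in for repeated SensorRead calls
        reading_out_of_range = value < low or value > high
        if reading_out_of_range:
            # stands in for the WarningDisplay procedure
            warnings.append(f"Warning: reading {value} is outside {low} to {high}")
        # the flag is recalculated on every pass, so a warning is not repeated
        # unless the next reading is also out of range
        time.sleep(delay_seconds)                # timed delay before the next reading
    return warnings

print(monitor([20.5, 21.0, 35.2, 19.8], low=15.0, high=30.0))
```

A timed sleep leaves the processor free for other work, whereas an empty loop keeps it busy for a period that depends on the processor speed.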
CALL SensorRead(SensorValue)
ENDIF
IF SensorDifference > 0
THEN
ActuatorAdjustmentFactor <- SensorDifference/DesiredOutputLevel
AdjustmentDirection <- 'up'
ENDIF
IF SensorDifference < 0
THEN
ENDIF
UNTIL EndReadingSensor
• A procedure is called to activate the actuator only if the sensor reading shows a significant
change.
• The code will only work properly if it can be guaranteed that the activation of the actuator
has caused a change in the property before the sensor reading in the next iteration of the
loop.
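One pass of such a control loop might look like the following Python sketch. Several details are assumptions made for the illustration and are not taken from the pseudocode: the difference is taken as desired level minus sensor value, a `tolerance` parameter decides what counts as a significant change, and the function returns the adjustment instead of driving a real actuator.

```python
def control_step(sensor_value, desired_level, tolerance=0.0):
    """Return (adjustment factor, direction) for the actuator, or None if no action is needed."""
    difference = desired_level - sensor_value      # assumed sign convention
    if abs(difference) <= tolerance:
        return None                                # no significant change: leave the actuator alone
    factor = abs(difference) / desired_level       # size of the adjustment relative to the target
    direction = "up" if difference > 0 else "down"
    return factor, direction

print(control_step(18.0, 20.0))    # reading below the desired level
print(control_step(25.0, 20.0))    # reading above the desired level
print(control_step(20.0, 20.0))    # reading on target
```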
The two fragments of code in Section 22.02 have a direct call to a procedure to take some
action. A slightly different approach would be to set values for Boolean variables subject
to what the sensors detect. For instance if a controlled environment had two properties to
be monitored and controlled, four Boolean variables could be used. Values could be set by
assignment statements such as:
IF SensorDifference1 > 0
IF SensorDifference1 < 0
IF SensorDifference2 > 0
IF SensorDifference2 < 0
Another part of the monitoring and control program would be checking whether any of the
four flags were set. The machine code for running such a program could use individual bits to
represent each flag. The way that flags could be set and read is illustrated by the following
assembly language code fragments in which the three least significant bits (positions 0, 1 and
LDD 0034
AND #B00000000
STO 0034
  Uses a bitwise AND operation of the contents of the accumulator with the
  operand to convert each bit to 0.
LDD 0034
XOR #B00000001
STO 0034
  Uses a bitwise XOR operation of the contents of the accumulator with the
  operand to toggle the value of the bit stored in position 0. This changes the
  value of the flag it represents.
LDD 0034
AND #B00000010
STO 0034
  Uses a bitwise AND operation of the contents of the accumulator with the
  operand to leave the value in position 1 unchanged but to convert every other
  bit to 0. A subsequent instruction can now compare the value of the byte with
  denary 2 to see if the flag represented by this bit position is set.
LDD 0034
OR #B00000100
STO 0034
  Uses a bitwise OR operation of the contents of the accumulator with the
  operand to set the flag represented by the bit in position 2. All other bit
  positions remain unchanged.
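The same four bitwise operations can be tried out in Python, which uses `&`, `^` and `|` for AND, XOR and OR. The function names below are assumptions made for this sketch; each function takes the byte holding the flags and returns the result, where the assembly code would load from and store to location 0034.

```python
def clear_all_flags(flags):
    """AND with 00000000 converts every bit to 0."""
    return flags & 0b00000000

def toggle_flag_0(flags):
    """XOR with 00000001 toggles the bit in position 0."""
    return flags ^ 0b00000001

def flag_1_is_set(flags):
    """AND with 00000010 keeps only position 1; compare the result with denary 2."""
    return (flags & 0b00000010) == 2

def set_flag_2(flags):
    """OR with 00000100 sets the bit in position 2 and leaves the others unchanged."""
    return flags | 0b00000100

flags = 0b00000010
print(format(toggle_flag_0(flags), "08b"))
print(flag_1_is_set(flags))
print(format(set_flag_2(flags), "08b"))
print(format(clear_all_flags(flags), "08b"))
```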
Exam-style Questions
1 A zoo reptile house has sixteen tanks which accommodate its reptiles. Each tank has to have
its own microclimate where the appropriate levels of heat and humidity are crucial. The zoo
implements a computer system which supplies the conditions in each of the tanks to a terminal
in a central area. Warning messages are flashed up on the screen if any condition arises which
requires the intervention of a zoo-keeper.
b State two items of hardware which need to be present in the tanks for this system to
function correctly.
c This is the polling routine which is used to run the system indefinitely:
01 REPEAT
02 FOR i <- 1 TO .
03
04 Conditionl Extreme [i, 2]
05 THEN
06 in Tank ",
07 ENDIF
08 Conditions Extreme [i, 4]
09 THEN
10 in Tank ",
11 ENDIF
12 ENDFOR
13
14
15 ENDFOR
16 UNTIL .
ii
iii
iv
[1]
[2]
[2]
[2]
[3]
[1]
d The zoo decides that the computer system needs to be updated. The computer system will
now make use of actuators. These actuators will operate devices which adjust the microclimate.
Actuators can be in two states, on or off. Whether an actuator is on or off is determined by a
single bit value (0 means off, 1 means on) in a specific 8-bit memory location.
The actuators to control the climate in Tank 4 use memory location 0804. Bit 5 of this
memory location controls the heater.
bit number
value
Use some of the assembly language instructions to write the instructions that will ensure bit
5 of location
Instruction
Explanation
Op Code Operand
LDM #n
LDD
STO
OUT
Output to the screen the character whose ASCII value is stored in ACC
AND #n
AND
XOR #n
OR #n
Computational Thinking and Problem-Solving
Learning objectives
We have already been thinking computationally in Chapters 11 to 15. Here is the formal
definition:
Abstraction
Abstraction involves filtering out information that is not necessary to solving the problem.
There are many examples in everyday life where abstraction is used. In Chapter 11 (Section
11.01), we saw part of the underground map of London, UK. The purpose of this map is
to help people plan their journey within London. The map does not show a geographical
representation of the tracks of the underground train network, nor does it show the streets
above ground. It shows the stations and which train lines connect the stations. In other
words, the information that is not necessary when planning how to get from King’s Cross St.
Pancras to Westminster is filtered out. The essential information we need to be able to plan
our route is clearly represented.
Decomposition
Decomposition means breaking tasks down into smaller parts in order to explain a process
more clearly. Decomposition is another word for step-wise refinement (covered in Chapter
12, Section 12.01). This led us to structured programming using procedures and functions
with parameters, covered in Chapter 14 (Section 14.03 to 14.05).
Data modelling
Data modelling involves analysing and organising data. In Chapter 13 we met simple data
types such as integer, character and Boolean. The string data type is a composite type:
a sequence of characters. When we have groups of data items we use one-dimensional
(1D) arrays to represent linear lists and two-dimensional (2D) arrays to represent tables
or matrices. We stored data in text files. In Chapter 10, we used data modelling to design
database tables.
We can set up abstract data types to model real-world concepts, such as records, queues
or stacks. When a programming language does not have such data types built-in, we can
define our own by building them from existing data types (see Section 23.03). There are more
ways to build data models. In Chapter 27 we cover object-oriented programming where we
build data models by defining classes. In Chapter 29 we model data using facts and rules. In
Chapter 26 we cover random files.
Pattern recognition
Pattern recognition means looking for patterns or common solutions to common problems
and exploiting these to complete tasks in a more efficient and effective way. There are many
standard algorithms to solve standard problems, such as insertion sort or binary search (see
Section 23.02).
Algorithm design
Bubble sort
In Chapter 11, we developed the algorithm for a bubble sort (Worked Example 11.12).
Discussion Point:
TASK 23.01
Write program code for the most efficient bubble sort algorithm. Assume that the items to be
sorted are stored in a 1D array with n elements.
Insertion sort
Imagine you have a number of cards with a different value printed on each card. How would
you sort these cards into order of increasing value?
You can consider the pile of cards as consisting of a sorted part and an unsorted part. Place
the unsorted cards in a pile on the table. Hold the sorted cards as a pack in your hand. To
start with only the first (top) card is sorted. The card on the top of the pile on the table is the
next card to be inserted. The last (bottom) card in your hand is your current card.
Figure 23.01 shows the sorted cards in your hand as blue and the pile of unsorted cards as
white. The next card to be inserted is shown in red. Each column shows the state of the pile
as the cards are sorted.
Position   State of the cards at each stage of the sort (left to right)
number
1          47   47   47   17   17   17
2          54   54   54   47   47   28
3          17   17   17   54   54   47
4          93   93   93   93   93   54
5          28   28   28   28   28   93
Key: Sorted, Next card, Unsorted (the colours are not reproduced here)
1 Repeat until the card to be inserted has been placed in its correct position.
1.2 If the card to be inserted is greater than the current card, insert it below the current card.
1.3 Otherwise, if there is a card above the current card in your hand, make this your new
current card.
1.4 If there is no new current card, place the card to be inserted at the top of the sorted pile.
What happens when you work through the sorted cards to find the correct position for the
card to be inserted? In effect, as you consider the cards in your hand, you move the current
card down a position. If the value of the card to be inserted is smaller than the last card you
considered, then the card is inserted at the top of the pile (position 1).
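The card-sorting process can be sketched in Python as follows. This is one possible rendering, not the book's own listing: the names are assumptions, the list is indexed from 0 as is usual in Python, and the function returns a sorted copy so that the original list is left untouched.

```python
def insertion_sort(items):
    """Return a new list with the items in increasing order."""
    result = list(items)
    for pointer in range(1, len(result)):
        item_to_insert = result[pointer]          # the next card from the unsorted pile
        current = pointer - 1                     # the last card of the sorted part
        # move larger cards down one position until the correct place is found
        while current >= 0 and result[current] > item_to_insert:
            result[current + 1] = result[current]
            current -= 1
        result[current + 1] = item_to_insert      # insert below the current card
    return result

print(insertion_sort([47, 54, 17, 93, 28]))
```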
We can write this algorithm using pseudocode. Assume the values to be sorted are stored in
a 1D array, List:
TASK 23.02
1 Dry-run the insertion sort algorithm using a trace table. Assume the list consists of the
following six items in the order given: 53, 21, 60, 18, 42, 19.
2 Write program code for the insertion sort algorithm. Assume that the items to be sorted are
stored in a 1D array with n elements.
Investigate the performances of the insertion sort and the bubble sort by:
Binary search
In Chapter 11 we developed the algorithm for a linear search (Worked Example 11.11). This is
the only way we can systematically search an unordered list. However, if the list is ordered,
then we can use a different technique.
If you want to look up a word in a dictionary, you are unlikely to start searching for the word
from the beginning of the dictionary. Suppose you are looking for the word ‘quicksort’. You
look at the middle entry of the dictionary (approximately) and find the word ‘magnetic’.
‘Quicksort’ comes after ‘magnetic’, so you look in the second half of the dictionary. Again you
look at the entry in the middle of this second half of the dictionary (approximately) and find
the word ‘report’. ‘Quicksort’ comes before ‘report’, so you look in the third quarter. You can
keep looking at the middle entry of the part which must contain your word, until you find the
word. If the word does not exist in the dictionary, you will have no entries in the dictionary left
to find the middle of.
KEY TERMS
Binary search: repeated checking of the middle item in an ordered search list and discarding
the half of the list which does not contain the search item
We can write this algorithm using pseudocode. Assume the values are sorted in ascending
order and stored in a 1D array, List of size Maxitems.
Found <- FALSE
SearchFailed <- FALSE
First <- 1
Last <- Maxitems // set boundaries of search area
WHILE NOT Found AND NOT SearchFailed
Found <- TRUE
ELSE
ENDIF
ENDIF
ENDIF
ENDWHILE
IF Found = TRUE
THEN
ENDIF
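For comparison, here is a compact Python sketch of the same technique. It is an illustration, not a transcription of the pseudocode: the names are assumptions, the list is indexed from 0, and the function returns -1 when the search fails where the pseudocode uses the flags Found and SearchFailed.

```python
def binary_search(items, search_item):
    """Return the index of search_item in the ascending list items, or -1 if absent."""
    first = 0
    last = len(items) - 1                 # set boundaries of the search area
    while first <= last:                  # an empty search area means the search has failed
        middle = (first + last) // 2
        if items[middle] == search_item:
            return middle
        if items[middle] > search_item:
            last = middle - 1             # discard the upper half
        else:
            first = middle + 1            # discard the lower half
    return -1

words = ["binary", "magnetic", "quicksort", "report", "stack"]
print(binary_search(words, "quicksort"))
print(binary_search(words, "queue"))
```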
TASK 23.03
Dry-run the binary search algorithm using a trace table. Assume the list consists of the
following 20 items in the order given: 7, 12, 19, 23, 27, 33, 37, 41, 45, 56, 59, 60, 62, 71, 75,
80, 84, 88, 92, 99.
Search for the value 60. How many times did you have to execute the while loop?
Dry-run the algorithm again, this time searching for the value 34. How many times did you
have to execute the while loop?
Discussion Point:
Compare the binary-search algorithm with the linear-search algorithm. If the array contains
n items, how many times on average do you need to test a value when using a binary search
and how many times on average do you need to test a value when using a linear search? Can
you describe how the search time varies with increasing n?
An abstract data type is a collection of data. When we want to use an abstract data type,
we need a set of basic operations:
KEY TERMS
The remainder of this chapter describes the following ADTs: stack, queue, linked list, binary
tree, hash table and dictionary. It also demonstrates how they can be implemented from
appropriate built-in types or other ADTs.
23.04 Stacks
What are the features of a stack in the real world? To make a stack, we pile
items on top of each other. The item that is accessible is the one on top of the
stack. If we try to find an item in the stack and take it out, we are likely to cause
the pile of items to collapse.
Figure 23.02 shows how we can represent a stack when we have added four
items in this order: A, B, C, D. Note that the slots are shown numbered from the
bottom as this is more intuitive.
The BaseOfStackPointer will always point to the first slot in the stack. The
TopOfStackPointer will point to the last element pushed onto the stack.
When an element is removed from the stack, the TopOfStackPointer will
decrease to point to the element now at the top of the stack.
[Figure 23.02: a stack holding A, B, C and D from the bottom up. BaseOfStackPointer points
to A and TopOfStackPointer points to D.]
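A stack with these two pointers can be sketched in Python using a fixed-size list. The class and method names, the use of index 0 for the first slot and the value -1 for an empty stack are all assumptions made for this illustration.

```python
class Stack:
    """A fixed-size stack stored in a list, with base and top pointers."""

    def __init__(self, max_size):
        self.slots = [None] * max_size
        self.base_of_stack_pointer = 0       # always the first slot
        self.top_of_stack_pointer = -1       # -1 means the stack is empty (assumption)

    def push(self, item):
        if self.top_of_stack_pointer + 1 == len(self.slots):
            raise OverflowError("stack is full")
        self.top_of_stack_pointer += 1       # move up to the next free slot
        self.slots[self.top_of_stack_pointer] = item

    def pop(self):
        if self.top_of_stack_pointer < self.base_of_stack_pointer:
            raise IndexError("stack is empty")
        item = self.slots[self.top_of_stack_pointer]
        self.top_of_stack_pointer -= 1       # now points to the new top element
        return item

stack = Stack(8)
for letter in "ABCD":
    stack.push(letter)
print(stack.pop())                           # the item on top is the last one pushed
```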
23.05 Queues
What are the features of a queue in the real world? When people form a queue,
they join the queue at the end. People leave the queue from the front of the
queue. If it is an orderly queue, no-one pushes in between and people don’t
leave the queue from any other position.
Figure 23.03 shows how we can represent a queue when five items have joined
the queue in this order: A, B, C, D, E.
[Figure 23.03: a queue holding A, B, C, D and E. FrontOfQueuePointer points to A and
EndOfQueuePointer points to E.]
When the item at the front of the queue leaves, we need to move all the
other items one slot forward. This would involve a lot of moving of data.
A more efficient way to make use of the slots is the concept of a ‘circular’
queue. Pointers show where the front and end of the queue are. Eventually
the queue will ‘wrap around’ to the beginning. Figure 23.04 shows a
circular queue after 11 items have joined and five items have left the
queue.
[Figure 23.04: a circular queue, showing the positions of EndOfQueuePointer and
FrontOfQueuePointer.]
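The wrap-around behaviour can be sketched in Python with the remainder operator. The class name, the method names, the item counter and the choice of eight slots in the usage lines are assumptions made for this illustration; the size used in the figure is not stated in the text.

```python
class CircularQueue:
    """A fixed-size queue whose pointers wrap around to the start of the list."""

    def __init__(self, max_size):
        self.slots = [None] * max_size
        self.front_of_queue_pointer = 0
        self.end_of_queue_pointer = -1       # nothing has joined yet (assumption)
        self.number_in_queue = 0

    def join(self, item):
        if self.number_in_queue == len(self.slots):
            raise OverflowError("queue is full")
        # move the end pointer on one slot, wrapping round to slot 0 if needed
        self.end_of_queue_pointer = (self.end_of_queue_pointer + 1) % len(self.slots)
        self.slots[self.end_of_queue_pointer] = item
        self.number_in_queue += 1

    def leave(self):
        if self.number_in_queue == 0:
            raise IndexError("queue is empty")
        item = self.slots[self.front_of_queue_pointer]
        # no data is moved: only the front pointer changes
        self.front_of_queue_pointer = (self.front_of_queue_pointer + 1) % len(self.slots)
        self.number_in_queue -= 1
        return item

queue = CircularQueue(8)
for number in range(6):
    queue.join(number)
for _ in range(5):
    queue.leave()
for number in range(6, 11):
    queue.join(number)                       # 11 items have joined, 5 have left
print(queue.front_of_queue_pointer, queue.end_of_queue_pointer)
```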
In Chapter 11 we used an array as a linear list. In a linear list, the list items are stored in
consecutive locations. This is not always appropriate. Another method is to store an
individual list item in whatever location is available and link the individual item into an
ordered sequence using pointers.
An element of a list is called a node. A node can consist of several data items and a pointer,
which is a variable that stores the address of the node it points to.
A pointer that does not point at anything is called a null pointer. It is usually represented
by 0. A variable that stores the address of the first element is called a start pointer.
KEY TERMS
Start pointer: a variable that stores the address of the first element of a linked list
In Figure 23.05, the data value in the node box represents the key field of that node. There are
likely to be many data items associated with each node. The arrows represent the pointers.
It does not show at which address a node is stored, so the diagram does not give the value of
the pointer, only where it conceptually links to.
A new node, A, is inserted at the beginning of the list. The content of StartPointer is copied
into the new node’s pointer field and StartPointer is set to point to the new node, A.
Figure 23.06 Conceptual diagram of adding a new node to the beginning of a linked list
In Figure 23.07, a new node, P, is inserted at the end of the list. The pointer field of node L
points to the new node, P. The pointer field of the new node, P, contains the null pointer.
To delete the first node in the list (Figure 23.08), we copy the pointer field of the node to be
deleted into StartPointer.
Figure 23.09 Conceptual diagram of deleting the last node of a linked list
Sometimes the nodes are linked together in order of key field value to produce an ordered
linked list. This means a new node may need to be inserted or deleted from between two
existing nodes.
To insert a new node, C, between existing nodes, B and D (Figure 23.10), we copy the pointer
field of node B into the pointer field of the new node, C. We change the pointer field of node B
to point to the new node, C.
Figure 23.10 Conceptual diagram of adding a new node into a linked list
To delete a node, D, within the list (Figure 23.11), we copy the pointer field of the node to be
deleted, D, into the pointer field of node B.
Remember that, in real applications, the data would consist of much more than a key field
and one data item. This is why linked lists are preferable to linear lists. When list elements
need reordering, only pointers need changing in a linked list. In a linear list, all data items
would need to be moved.
Using linked lists saves time; however, we need more storage space for the pointer fields.
In Chapter 16 we looked at composite data types, in particular the user-defined record type.
We grouped together related data items into record data structures. To use a record variable,
we first define a record type. Then we declare variables of that record type.
We can store the linked list in an array of records. One record represents a node and consists
of the data and a pointer. When a node is inserted or deleted, only the pointers need to
change. A pointer value is the array index of the node pointed to.
Unused nodes need to be easy to find. A suitable technique is to link the unused nodes to
form another linked list: the free list. Figure 23.12 shows our linked list and its free list.
When an array of nodes is first initialised to work as a linked list, the linked list will be empty.
So the start pointer will be the null pointer. All nodes need to be linked to form the free
list. Figure 23.13 shows an example of an implementation of a linked list before any data is
inserted into it.
[Figure 23.13: the array List of records (fields Data and Pointer, indices [1] to [7]) together
with StartPointer and FreeListPtr, before any data is inserted.]
We now code the basic operations discussed using the conceptual diagrams in Figures 23.05
to 23.12.
PROCEDURE InitialiseList
PROCEDURE InsertNode(NewItem)
IF PreviousNodePtr = StartPointer
ELSE // insert new node between previous node and this node
List[NewNodePtr].Pointer <- List[PreviousNodePtr].Pointer
List[PreviousNodePtr].Pointer <- NewNodePtr
ENDIF
ENDIF
ENDPROCEDURE
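The array-of-records idea, including the free list, can be sketched in Python as shown below. This is an illustration under stated assumptions and not the book's own listing: two parallel lists stand in for the Data and Pointer fields, index 0 is left unused so that 0 can serve as the null pointer, and the class and method names are invented for the sketch. Nodes are kept in ascending order of their data.

```python
NULL_POINTER = 0

class LinkedList:
    def __init__(self, size=7):
        self.data = [None] * (size + 1)              # index 0 is not used
        self.pointer = [NULL_POINTER] * (size + 1)
        self.start_pointer = NULL_POINTER            # the list starts empty
        self.free_list_ptr = 1
        for index in range(1, size):                 # link all nodes into the free list
            self.pointer[index] = index + 1          # the last node keeps the null pointer

    def insert_node(self, new_item):
        if self.free_list_ptr == NULL_POINTER:
            raise OverflowError("no free nodes")
        new_node = self.free_list_ptr                # take a node from the free list
        self.free_list_ptr = self.pointer[new_node]
        self.data[new_node] = new_item
        previous, this = NULL_POINTER, self.start_pointer
        while this != NULL_POINTER and self.data[this] < new_item:
            previous, this = this, self.pointer[this]    # follow the pointers
        if previous == NULL_POINTER:                 # insert at the start of the list
            self.pointer[new_node] = self.start_pointer
            self.start_pointer = new_node
        else:                                        # insert between previous and this node
            self.pointer[new_node] = self.pointer[previous]
            self.pointer[previous] = new_node

    def delete_node(self, data_item):
        previous, this = NULL_POINTER, self.start_pointer
        while this != NULL_POINTER and self.data[this] != data_item:
            previous, this = this, self.pointer[this]
        if this == NULL_POINTER:
            return False                             # item not found
        if previous == NULL_POINTER:
            self.start_pointer = self.pointer[this]  # delete the first node
        else:
            self.pointer[previous] = self.pointer[this]
        self.pointer[this] = self.free_list_ptr      # return the node to the free list
        self.free_list_ptr = this
        return True

    def items(self):
        result, this = [], self.start_pointer
        while this != NULL_POINTER:
            result.append(self.data[this])
            this = self.pointer[this]
        return result

letters = LinkedList()
for letter in ["B", "D", "L"]:
    letters.insert_node(letter)
print(letters.items(), letters.start_pointer, letters.free_list_ptr)
```

Note that inserting or deleting a node changes only pointer values; no data item is moved within the array.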
After three data items have been added to the linked list, the array contents are as shown in
Figure 23.14.
[Figure 23.14 Linked list of three nodes and free list of four nodes: StartPointer = 1,
FreeListPtr = 4; array List with fields Data and Pointer, indices [1] to [7].]
PROCEDURE DeleteNode(DataItem)
AND List[ThisNodePtr].Data <> DataItem // and item not found
PreviousNodePtr <- ThisNodePtr // remember this node
ENDIF
ENDIF
ENDPROCEDURE
TASK 23.04
Convert the pseudocode for the linked-list handling subroutines to program code.
Incorporate the subroutines into a program and test them.
Note that a stack ADT and a queue ADT can be treated as special cases of linked lists. The
linked list stack only needs to add and remove nodes from the front of the linked list. The
linked list queue only needs to add nodes to the end of the linked list and remove nodes from
the front of the linked list.
TASK 23.05
Write program code to implement a stack as a linked list. Note that the adding and removing
of nodes is much simpler than for an ordered linked list.
TASK 23.06
Write program code to implement a queue as a linked list. You may find it helpful to introduce
another pointer that always points to the end of the queue. You will need to update it when
you add a new node to the queue.
In the real world, we draw tree structures to represent hierarchies. For example, we can draw
a family tree showing ancestors and their children. A binary tree is different to a family tree
because each node can have at most two ‘children’.
In computer science binary trees are used for different purposes. In Chapter 20 (Section
20.05), you saw the use of a binary tree as a syntax tree. In this chapter, you will use an
ordered binary tree ADT (such as the one shown in Figure 23.15) as a binary search tree.
Root node
Figure 23.15 Conceptual diagram of an ordered binary tree
Repeat
If the data value is greater than the current node’s data value, follow the right branch.
If the data value is smaller than the current node’s data value, follow the left branch.
Figure 23.16 Conceptual diagram of adding a node to an
This type of tree has a special use as a search tree. Just like the binary search applied to an
ordered linear list, the binary search tree gives the benefit of a faster search than a linear
search or searching a linked list.
The ordered binary tree also has a benefit when adding a new node: other nodes do not
need to be moved, only a left or right pointer needs to be added to link the new node into the
existing tree.
We can store the binary tree in an array of records (see Figure 23.17). One record represents a
node and consists of the data and a left pointer and a right pointer. Unused nodes are linked
together to form a free list.
[Figure 23.17: the array Tree of records (fields LeftPointer, Data and RightPointer, indices [1]
to [7]) together with RootPointer and FreePtr.]
// take node from free list, store data item and set null pointers
IF RootPointer = NullPointer
ENDIF
ENDWHILE
IF TurnedLeft = TRUE
THEN
ELSE
ENDIF
ENDIF
ENDIF
ENDPROCEDURE
AND Tree[ThisNodePtr].Data <> SearchItem // and search item not found
IF Tree[ThisNodePtr].Data > SearchItem
THEN // follow left pointer
ENDIF
ENDWHILE
RETURN ThisNodePtr // will return null pointer if search item not found
ENDFUNCTION
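The insert and find operations on a binary tree held in an array can be sketched in Python as follows. As before this is an illustration under stated assumptions: parallel lists stand in for the record fields, index 0 is unused so that 0 acts as the null pointer, the free list is linked through the left pointers, and the names are invented for the sketch.

```python
NULL_POINTER = 0

class BinaryTree:
    def __init__(self, size=7):
        self.data = [None] * (size + 1)              # index 0 is not used
        self.left = [NULL_POINTER] * (size + 1)
        self.right = [NULL_POINTER] * (size + 1)
        self.root_pointer = NULL_POINTER
        self.free_ptr = 1
        for index in range(1, size):                 # free list uses the left pointers (assumption)
            self.left[index] = index + 1

    def insert_node(self, new_item):
        if self.free_ptr == NULL_POINTER:
            raise OverflowError("no free nodes")
        # take node from free list, store data item and set null pointers
        new_node = self.free_ptr
        self.free_ptr = self.left[new_node]
        self.data[new_node] = new_item
        self.left[new_node] = NULL_POINTER
        self.right[new_node] = NULL_POINTER
        if self.root_pointer == NULL_POINTER:        # the tree was empty
            self.root_pointer = new_node
            return
        this = self.root_pointer
        while this != NULL_POINTER:                  # find the insert position
            previous = this
            turned_left = new_item < self.data[this]
            this = self.left[this] if turned_left else self.right[this]
        if turned_left:
            self.left[previous] = new_node
        else:
            self.right[previous] = new_node

    def find_node(self, search_item):
        this = self.root_pointer
        while this != NULL_POINTER and self.data[this] != search_item:
            if self.data[this] > search_item:
                this = self.left[this]               # follow left pointer
            else:
                this = self.right[this]              # follow right pointer
        return this                                  # null pointer if not found

tree = BinaryTree()
for value in [50, 30, 70, 60]:
    tree.insert_node(value)
print(tree.find_node(60), tree.find_node(99))
```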
TASK 23.07
If we want to store records in an array and have direct access to records, we can use the
concept of a hash table.
The idea behind a hash table is that we calculate an address (the array index) from the
key value of the record and store the record at this address. When we search for a record,
we calculate the address from the key and go to the calculated address to find the record.
Calculating an address from a key is called ‘hashing’.
Finding a hashing function that will give a unique address from a unique key value is very
difficult. If two different key values hash to the same address this is called a ‘collision’. There
are different ways to handle collisions:
• chaining: create a linked list for collisions with start pointer at the hashed address
• using overflow areas: all collisions are stored in a separate overflow area, known as
‘closed hashing’
• using neighbouring slots: perform a linear search from the hashed address to find an
empty slot, known as ‘open hashing’.
RETURN Address
ENDFUNCTION
[Figure 23.18: a hash table with slots [0] to [9] after the first three records (keys 32390,
95312 and 45876) have been stored.]
The fourth record key (64636) also hashes to index 6. This slot is already taken; we have a
collision. If we store our record here, we lose the previous record. To resolve the collision,
we can choose to store our record in the next available space, as shown in Figure 23.19.
[Figure 23.19: the hash table with slots [0] to [9] after the fourth record (key 64636) has been
stored in the next available space after index 6.]
The fifth record key (23467) hashes to index 7. This slot has been taken up by the previous
record, so again we need to use the next available space (Figure 23.20).
[Figure: the hash table with slots [0] to [9] holding record keys 32390, 95312, 45876, 64636 and 23467]
Figure 23.20 A hash table with two collisions resolved by open hashing
When searching for a record, we need to allow for these out-of-place records. We know if
the record we are searching for does not exist in the hash table when we come across an
unoccupied slot.
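The insertion and search steps described above can be sketched in Python. This is a minimal sketch rather than the book's own algorithm: the hashing function `key % TABLE_SIZE` is an assumption, chosen because it reproduces the slot indexes quoted in the text (64636 hashes to 6, 23467 hashes to 7). The names `hash_key`, `insert` and `search` are illustrative, and the table is assumed never to be completely full when inserting.

```python
TABLE_SIZE = 10   # assumed size, matching slots [0] to [9]
EMPTY = None      # marker for an unoccupied slot

def hash_key(key):
    # Assumed hashing function: remainder after division by the table size.
    return key % TABLE_SIZE

def insert(table, key):
    # Start at the hashed address and step forward (wrapping round)
    # until an unoccupied slot is found. Assumes the table is not full.
    index = hash_key(key)
    while table[index] is not EMPTY:
        index = (index + 1) % TABLE_SIZE
    table[index] = key
    return index

def search(table, key):
    # Follow the same path as insert; an unoccupied slot means the key is absent.
    index = hash_key(key)
    for _ in range(TABLE_SIZE):          # visit each slot at most once
        if table[index] is EMPTY:
            return -1
        if table[index] == key:
            return index
        index = (index + 1) % TABLE_SIZE
    return -1

table = [EMPTY] * TABLE_SIZE
for key in (32390, 95312, 45876, 64636, 23467):
    insert(table, key)
print(table)
```

With these five keys, the two colliding records end up in the slots immediately after their hashed addresses, and `search` finds them by following the same path.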
We will now develop algorithms to insert a record into a hash table and to search for a
record in the hash table using its record key.
• The records stored in the hash table have a unique key stored in field Key.
ENDIF
ENDWHILE
ENDIF
ENDWHILE
ENDIF
ENDFUNCTION
23.09 Dictionaries
A real-world dictionary is a collection of key-value pairs. The key is the term you use to look
up the required value. For example, if you use an English-French dictionary to look up the
English word ‘book’, you will find the French equivalent word ‘livre’. A real-world dictionary is
organised in alphabetical order of keys.
An ADT dictionary in computer science is implemented using a hash table, so that a value
can be looked up using a direct-access method.
Python has a built-in ADT dictionary. The hashing function is determined by Python. For VB
and Pascal, we need to implement our own.
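As a brief illustration of the built-in type, the English-French example might look like this in Python. The extra word pairs are illustrative additions, not taken from the text.

```python
# English-French lookups held as key-value pairs.
english_french = {"book": "livre", "cat": "chat"}

english_french["dog"] = "chien"        # add a new key-value pair
print(english_french["book"])          # direct lookup by key
print("house" in english_french)       # test whether a key is present
```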
There are many built-in functions for Python dictionaries. These are beyond the scope of this
book. However, we need to understand how dictionaries are implemented. The following
pseudocode shows how to create a new dictionary.
TYPE DictionaryEntry
TASK 23.08
Exam-style Questions
Index <- Middle
ELSE
IF .
THEN
Last <- Middle + 1
ELSE
ENDIF
ENDIF
ENDWHILE
ENDFUNCTION
b The binary search does not work if the data in the array being searched is.
2 A queue Abstract Data Type (ADT) is to be implemented as a linked list of nodes. Each
node is a record, consisting of a data field and a pointer field. The queue ADT also has a
FrontOfQueue pointer and an EndOfQueue pointer associated with it. The possible queue
operations are: JoinQueue and LeaveQueue.
a i Add labels to the diagram to show the state of the queue after three data items have been
added to the
queue in the given order: Apple, Pear, Banana.
[3]
[1]
[2]
[5]
ii
Add labels to the diagram to show how the unused nodes are linked to form a list of free
nodes. This list has a StartOfFreeList pointer associated with it.
[2]
ii Write program code to create an array Queue with 50 records of type Node. Your solution
should link all nodes and initialise the pointers FrontOfQueue, EndOfQueue and StartOfFreeList.
c The pseudocode algorithm for the queue operation JoinQueue is written as a procedure
with the header:
where NewItem is the new value to be added to the queue. The procedure uses the
variables shown in the following identifier table:
Identifier
Data type
Description
NullPointer
INTEGER
Constant set to -1
STRING
Value to be added
ii Complete the pseudocode using the identifiers from the table in part (i).
THEN
Report Error
ELSE
ENDIF
Queue[.].Pointer <-
EndOfQueue .
ENDIF
ENDPROCEDURE
[3]
[7]
[7]
[6]
Learning objectives
KEY TERMS
Example
Students in a particular college take an end-of-year test. Any student with 90 marks or more
gets a distinction. Students with fewer than 20 marks fail. All other students get a pass.
We set up a decision table by allowing one row for each condition and one row for each
possible action. We need one column for every possible combination of conditions. Two
conditions require four columns; three conditions require eight columns. Table 24.01 shows
the decision table for awarding grades.
Conditions
Condition
alternatives
Actions
Action entries
Conditions
≥ 90 marks
< 20 marks
Actions
Distinction
Pass
Fail
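The grading rules behind this decision table can be sketched as a short Python function. The function name `grade` is illustrative. The two conditions cannot both be true at once, so the order of the tests does not affect the result.

```python
def grade(marks):
    # Condition 1: 90 marks or more.
    if marks >= 90:
        return "Distinction"
    # Condition 2: fewer than 20 marks.
    if marks < 20:
        return "Fail"
    # Neither condition is true.
    return "Pass"

for marks in (95, 50, 10):
    print(marks, grade(marks))
```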
The real power of decision tables becomes apparent when the conditions and resulting
actions are more complex. Inspection of the action entries sometimes shows redundancies
and the decision table can be simplified. This means the program code to be written will also
be simplified.
Consider an online order company that charges $5 for delivery of packages. If the order
value is over $50, the package is small and the customer has a promotion code, the
delivery is free. If the order value is over $50 and the package is small, the delivery charge
is $1. If the order value is over $50 and the customer has a promotion code, the delivery
charge is $1.
We complete the conditions in a decision table for the order form in the systematic
manner shown. Table 24.02 shows the delivery charge conditions.
Conditions
order value over $50    Y  Y  Y  Y  N  N  N  N
small package           Y  Y  N  N  Y  Y  N  N
promotion code          Y  N  Y  N  Y  N  Y  N
Next, we look at each combination of conditions in turn and decide which action needs
to be taken and mark those with X (see Table 24.03).
Conditions
order value over $50    Y  Y  Y  Y  N  N  N  N
small package           Y  Y  N  N  Y  Y  N  N
promotion code          Y  N  Y  N  Y  N  Y  N
Actions
free delivery           X
$1 charge                  X  X
$5 charge                        X  X  X  X  X
To find redundancies, we look at each action and then check whether the conditions are
required:
Free delivery only applies if all 3 conditions are true. There are no redundancies here.
The $1 charge applies if condition 1 is true and either condition 2 or condition 3 is true.
There are no redundancies here either.
The $5 charge applies in all cases where condition 1 is false. The redundant conditions
are shown by the shaded cells. We can therefore simplify the table (see Table 24.04). We
put a dash in the cells where the condition can be true or false; the action will be the
same. The dash is sometimes referred to as the ‘don’t care’ symbol.
Conditions
order value over $50    Y  Y  Y  Y  N
small package           Y  Y  N  N  -
promotion code          Y  N  Y  N  -
Actions
free delivery           X
$1 charge                  X  X
$5 charge                        X  X
Decision tables can also be used to define outputs dependent on inputs so can be a
basis for testing a program.
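The simplified delivery-charge rules can be sketched in Python, with the full set of condition combinations used as test data. The function and parameter names are illustrative; the charges are those given in the description of the online order company.

```python
def delivery_charge(over_50, small_package, promo_code):
    if not over_50:
        return 5          # conditions 2 and 3 are "don't care"
    if small_package and promo_code:
        return 0          # all three conditions true: free delivery
    if small_package or promo_code:
        return 1          # exactly one of conditions 2 and 3 is true
    return 5              # standard charge

# Every combination of the three conditions, as in the decision table.
for over_50 in (True, False):
    for small in (True, False):
        for promo in (True, False):
            print(over_50, small, promo, delivery_charge(over_50, small, promo))
```

Because the first test deals with every column in which the order value is not over $50, the code never needs to look at the other two conditions in those cases.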
A structure can consist of elementary components (they have no parts) and composite
components (sequence, selection or iteration). A sequence has two or more components.
Selection consists of two or more parts, only one of which is selected. Iteration consists of
one part that repeats zero or more times.
WORKED EXAMPLE 24.02
Customer Name:
Customer Address:
Product ID
Description
Quantity
Unit Price
Price
The first stage of designing a program to process the data in this order form is to draw a
data structure diagram of the data.
Using the top-down approach, at the top level the order form consists of these
components: the header, the order body, the totals and the payment method.
• The header is a sequence composite component containing customer name and address.
• Repetition is shown by an asterisk (*) in the corner of components that are repeated.
• Selection is shown by a circle in the corner of components where only one is chosen.
Chapter 24: Algorithm Design Methods
TASK 24.01
Write pseudocode from the Jackson program structure diagram in Figure 24.04.
In more complicated systems, the output data can be subjected to the same analysis,
possibly leading to conflicts to be resolved.
A computer system can be seen as a finite state machine (FSM). An FSM has a start state.
An input to the FSM produces a transformation from one state to another state.
The information about the states of an FSM can be presented in a state-transition table.
KEY TERMS
Finite state machine (FSM): a machine that consists of a fixed set of possible states with a
set of inputs that change the state and a set of possible outputs
State-transition table: a table that gives information about the states of an FSM
current state
S1
S2
input
S1
S1
b
S2
S2
Table 24.05 State-transition table
A state-transition diagram can be used to describe the behaviour of an FSM.
If the FSM has a final state (also known as the halting state), this is
shown by a double-circled state (S1 in the example).
KEY TERMS
If an input causes an output this is shown by a vertical bar (as in Figure 24.06). For example,
if the current state is S1, an input of b produces output c and transforms the FSM to state S2.
b|c
a|d
Description of the system: The system has a battery power supply. The system is
activated when the start button is pressed. Pressing the start button when the system
is active has no effect. To de-activate the system, the operator must enter a PIN. The
system goes into alert mode when a sensor is activated. The system will stay in alert
mode for two minutes. If the system has not been de-activated within two minutes an
alarm bell will ring.
We can complete a state-transition table (Table 24.06) using the information from the
system description.
Current state
Event
Next state
System inactive
System active
System active
Enter PIN
System inactive
System active
Activate sensor
Alert mode
System active
System active
Alert mode
Enter PIN
System inactive
Alert mode
2 minutes pass
Alert mode
Alert mode
Enter PIN
System inactive
Alarm bell ringing
The start state is ‘System inactive’. We can draw a state-transition diagram (Figure 24.07)
from the information in Table 24.06.
A finite state machine has been designed that will take as input a positive binary integer,
one bit at a time, starting with the least significant bit. The FSM converts the binary
integer into the two’s complement negative equivalent. The method to be used is:
Current state    S1   S1   S2   S2
Input bit        0    1    0    1
Next state       S1   S2   S2   S2
Output bit       0    1    1    0
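A table-driven simulation of this FSM might look like the following Python sketch. The next states follow the state-transition table. The output bits are an assumption based on the usual rule for two's complement (copy the bits up to and including the first 1, then invert the remaining bits), and the names `TRANSITIONS` and `run_fsm` are illustrative.

```python
# (current state, input bit) -> (next state, output bit)
TRANSITIONS = {
    ("S1", 0): ("S1", 0),   # no 1 seen yet: copy the 0
    ("S1", 1): ("S2", 1),   # first 1: copy it and change state
    ("S2", 0): ("S2", 1),   # after the first 1: invert each bit
    ("S2", 1): ("S2", 0),
}

def run_fsm(bits_lsb_first):
    state = "S1"                       # start state
    output = []
    for bit in bits_lsb_first:
        state, out = TRANSITIONS[(state, bit)]
        output.append(out)
    return output

# 6 is 0110 in binary; fed least significant bit first it is 0, 1, 1, 0.
print(run_fsm([0, 1, 1, 0]))
```

The output bits are also produced least significant bit first, so they need to be read in reverse to give the result in the usual order.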
TASK 24.02
Write a program that simulates the intruder alarm system in Worked Example 24.03.
Question 24.01
What is the output from the FSM represented by the state-transition diagram in Figure 24.08,
when the input is 0101 ?
Does the FSM in Figure 24.08 work for converting a negative binary number into its positive
equivalent?
Exam-style Questions
1 A toll road is a road on which motor vehicle drivers have to pay to drive. The payment is
calculated as follows: Motor vehicles pay a standard charge. If passenger vehicles (cars and
buses) use the road during off-peak times (not within 06:00 hrs to 19:00 hrs), the charge is
reduced. Passenger vehicles with more than three occupants do not get charged
[6]
Conditions
passenger vehicle
19:00
Actions
standard charge
reduced charge
Free
[3]
2 A bank uses a data file to print a monthly statement for a bank account. The file consists of
a header (account number
and name of account holder), followed by a statement body (repeated transactions detailing
date of payment, recipient
and amount), followed by a trailer (final balance and message if overdrawn).
3 A car park has a barrier at the exit. The starting position of the barrier is lowered. When a
car wants to exit the car park,
the driver has to insert a coin into a coin slot at the barrier. The barrier raises and allows the
car to drive out of the car park.
After the car has passed through the barrier, the barrier lowers. In case of emergency, a
member of staff can open the
barrier using a remote control. The barrier will remain open until the remote control is used
again to lower the barrier.
The barrier has three states: lowered, raised and open. The transition from one state to
another is as shown in the
state-transition table:
Current state
Event
Next state
Barrier lowered
Coin inserted
Barrier raised
Barrier lowered
Open remotely
Barrier open
Barrier open
Close remotely
Barrier lowered
Barrier raised
Barrier lowered
Chapter 25
Recursion
Learning objectives
The classic mathematical example is the factorial function, n!, which is defined in Figure
25.01. This definition holds for all positive whole numbers.
Figure 25.02 shows expressions of the factorial function for the first four numbers.
4! = 4 × (4 − 1)! = 4 × 3!
3! = 3 × (3 − 1)! = 3 × 2!
2! = 2 × (2 − 1)! = 2 × 1!
1! = 1 × (1 − 1)! = 1 × 0!
Because 0! = 1:
4! = 4 × 3 × 2 × 1 × 1 = 24
Recursive solutions have a base case and a general case. The base case gives a result
without involving the general case. The general case is defined in terms of the definition
itself. It is very important that the general case must come closer to the base case with each
recursion, for any starting point.
KEY TERMS
Result <- Result * i
ENDFOR
RETURN Result
ENDFUNCTION
ENDIF
RETURN Result
ENDFUNCTION
The recursive pseudocode resembles the original mathematical definition of the factorial
function. The dry run in Table 25.02 (Section 25.03) shows how this works.
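For comparison, the two approaches might be written in Python as follows. This is a sketch that assumes n is a non-negative whole number and that the base case is n = 0; the function names are illustrative.

```python
def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):     # multiply by 1, 2, ..., n
        result = result * i
    return result

def factorial_recursive(n):
    if n == 0:                    # base case
        result = 1
    else:                         # general case: n! = n * (n - 1)!
        result = n * factorial_recursive(n - 1)
    return result

print(factorial_iterative(4), factorial_recursive(4))
```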
Discussion Point:
Carefully examine the two solutions to the factorial function. What happens if the iterative
function is called with parameter 0? What happens if the recursive function is called with
parameter 0? What changes would need to be made so the mathematical definition holds
for all values of n?
When writing a recursive subroutine, there are three rules you must observe. A recursive
subroutine must:
TASK 25.01
Question 25.01
What happens when the function is called with Factorial(-2)? Which rule is not satisfied?
Consider a procedure to count down from a given integer. We can write the solution as an
iterative algorithm:
ENDPROCEDURE
We can also write the solution as a recursive algorithm. Consider what happens after the
first value has been output. The remaining numbers follow the same pattern of counting
down from the next smaller value. The base case is when n reaches 0. 0 will be output but
no further numbers. The general case is outputting n and then counting down from (n-1).
This can be written using pseudocode:
ENDIF
ENDPROCEDURE
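The recursive count-down described above might be written in Python like this. It is a sketch that assumes n is a non-negative integer; the name `count_down_from` is illustrative.

```python
def count_down_from(n):
    print(n)                      # output the current value first
    if n > 0:                     # general case: count down from n - 1
        count_down_from(n - 1)

count_down_from(3)
```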
Dry-running the recursive procedure from Worked Example 25.02, we can complete a trace
table as shown in Table 25.01.
Call number   Procedure call      OUTPUT   n > 0
1             CountDownFrom(3)    3        TRUE
2             CountDownFrom(2)    2        TRUE
3             CountDownFrom(1)    1        TRUE
4             CountDownFrom(0)    0        FALSE
It is more complex to trace a subroutine that contains statements to execute after the
recursive call. Look at the slightly modified algorithm:
CALL CountUpTo(n-1)
ENDIF
OUTPUT n
ENDPROCEDURE
Note that the statements after CALL CountUpTo(n-1) are not executed until control returns
to this statement as the recursive calls unwind.
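A Python version of the modified procedure makes the order of events easy to try out. This is a sketch assuming n is a non-negative integer; the name `count_up_to` is illustrative.

```python
def count_up_to(n):
    if n > 0:
        count_up_to(n - 1)        # the recursive call comes first
    print(n)                      # runs only after the inner call has returned

count_up_to(3)
```

Each call waits at the recursive call until the inner call has finished, so the smallest value is output first.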
What is the effect of moving the output statement to the end of the procedure? Figure 25.04
traces the execution of CALL CountUpTo(3).
Call number   Procedure call   n > 0   OUTPUT
1             CountUpTo(3)     TRUE
2             CountUpTo(2)     TRUE
3             CountUpTo(1)     TRUE
4             CountUpTo(0)     FALSE   0
(3)           CountUpTo(1)     TRUE    1
(2)           CountUpTo(2)     TRUE    2
(1)           CountUpTo(3)     TRUE    3
When the base case is reached, the fourth call of the procedure is complete and the
procedure is exited. Control then passes back to the third call and so on. Note how we show
the trace as the recursive calls unwind. Don’t go back up the table and fill in the OUTPUT
column as this will not make it clear enough when the output occurred.
A recursive function has a statement after the recursive call to itself: the return statement.
Again we show what happens when the recursive calls unwind by filling in more rows in the
trace table. Let’s consider the factorial function again.
Result <- 1
ELSE
ENDIF
RETURN Result
ENDFUNCTION
Call number   Function call   n = 0   Result             Return value
1             Factorial(4)    FALSE   4 * Factorial(3)
2             Factorial(3)    FALSE   3 * Factorial(2)
3             Factorial(2)    FALSE   2 * Factorial(1)
4             Factorial(1)    FALSE   1 * Factorial(0)
5             Factorial(0)    TRUE    1                  1
(4)           Factorial(1)    FALSE   1 * 1              1
(3)           Factorial(2)    FALSE   2 * 1              2
(2)           Factorial(3)    FALSE   3 * 2              6
(1)           Factorial(4)    FALSE   4 * 6              24
Another way to illustrate how the function calls unwind is by framing each call with a box
(see Figure 25.06). When the innermost box is completed, the result is fed to the next outer
one, and so on until the outermost box has been completed.
[Figure 25.06: nested boxes for the calls Factorial(4), Factorial(3), Factorial(2), Factorial(1) and Factorial(0); the calls return 1, 1, 2, 6 and 24 in turn]
TASK 25.02
IF (n = 0) OR (n = 1)
THEN
OUTPUT n
ELSE
OUTPUT (n MOD 2)
ENDIF
ENDPROCEDURE
Dry-run the procedure call X(19) by completing a trace table. What is the purpose of
this algorithm?
Recursive subroutines can only be executed if the compiler produces object code that uses
a stack to push return addresses and local variables when calling a subroutine repeatedly.
110 ENDFUNCTION
120
140
190 ENDPROGRAM
The first program statement to be executed is line 160. The actual parameter n has the
value 3. The function call causes the return address to be put on the stack, as shown in
Figure 25.07. Program execution jumps to line 30.
When line 80 is reached, the function call causes the return address to be stored on the
stack, together with the current contents of the local variables. The locations used to store
these values are referred to as a stack frame (represented by the blue borders in Figure 25.07).
Each recursive call will add another stack frame to the stack until the base case is reached.
When the base case is reached, the result of the function call Factorial(0) is returned by
pushing it onto the stack. The result is popped off the stack by the previous invocation of
the function. With each return from a function call, the corresponding stack frame is taken
off and the values of the local variables are restored. Eventually, control is returned to line
160 with the result of the function call on the top of the stack. The value of Answer is output
in line 170.
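The winding and unwinding of the stack can be imitated with an explicit list in Python. This is a simplified model, not what a compiler actually generates: each "frame" here holds only the local value of n, whereas a real stack frame also holds the return address. The name `factorial_with_stack` is illustrative.

```python
def factorial_with_stack(n):
    stack = []
    # Winding: each pending call pushes its own value of n.
    while n > 0:
        stack.append(n)
        n = n - 1
    result = 1                    # base case: Factorial(0)
    # Unwinding: pop one frame per return and finish its multiplication.
    while stack:
        result = stack.pop() * result
    return result

print(factorial_with_stack(3))
```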
[Figure 25.07: contents of the stack during the call Factorial(3), showing the return addresses 160 and 080 being added with each call and removed as control returns to call 2 and then to call 1]
TASK 25.03
Use your program code from Task 25.01 and add the main program as shown in Worked
Example 25.03.
Amend your code in the following ways (line numbers are relative to the pseudocode in
Worked Example 25.03):
Recursive solutions are often more elegant and use less program code than iterative
solutions. However, repeated recursive calls can carry large overheads in terms of
memory usage and processor time (see Section 25.04). For example, the procedure call
CountDownFrom(100) will require 100 stack frames before it completes.
Exam-style Questions
2 The following is a recursively defined function which calculates the result of Base^Exponent.
For example, 2^3 is 8.
Result <- 1
ELSE
ENDIF
RETURN Result
ENDFUNCTION
b Trace the execution of the function call Power(2,4) showing for each re-entry into the
Power function, the
c Explain the role of the stack in the execution of the Power function.
f i Give one reason why a non-recursive Power function may be preferred to a recursive one.
ii Give one reason why a recursive Power function may be preferred to a non-recursive one.
3 The following is a recursively defined function which calculates the nth integer in the
sequence of Fibonacci numbers.
03 THEN
04 Result <- 1
05 ELSE
07 ENDIF
08 RETURN Result
09 ENDFUNCTION
[2]
[2]
[1]
[6]
[3]
[1]
[1]
[1]
[1]
Chapter 26
Further Programming
Learning objectives
The features of low-level programming languages give us the ability to manipulate the
contents of memory addresses and registers directly and exploit the architecture of a given
processor. We solve problems in a very different way when we use the low-level programming
paradigm than if we use a high-level paradigm. See Chapter 6 and Chapter 28 for low-level
programming examples. Note that each different type of processor has its own programming
language. There are ‘families’ of processors that are designed with similar architectures
and therefore use similar programming languages. For example, the Intel processor family
(present in many PC-type computers) uses the x86 instruction set.
Imperative programming involves writing a program as a sequence of explicit steps that are
executed by the processor. Most of the programs in this book use imperative programming
(Chapters 11 to 15 and Chapters 23 to 26). An imperative program tells the computer how to
get a desired result, in contrast to declarative programming where a program describes what
the desired result should be. Note that the procedural programming paradigm belongs to
the imperative programming paradigm. There are many imperative programming languages,
Pascal, C and Basic to name just a few.
The object-oriented paradigm is based on objects interacting with one another. These
objects are data structures with associated methods (see Chapter 27). Many programming
languages that were originally imperative have been developed further to support the
object-oriented paradigm. Examples include Pascal (under the name Delphi or Object Pascal)
and Visual Basic (the .NET version being the first fully object-oriented version). Newer
languages, such as Python and Java, were designed to be object-oriented from the beginning.
Declarative programs are expressed as formal logic and computations are deductions from
the formal logic statements (see Chapter 29). Declarative programming languages include
SQL (see Chapter 10, Section 10.07) and Prolog (Chapter 29).
26.02 Records
We used records in Chapter 23 (Section 23.06 onwards) to declare nodes. Records are
user-defined types (discussed in Chapter 16, Section 16.01).
Using records
A car manufacturer and seller wants to store details about cars. These details can
be stored in a record structure:
TYPE CarRecord
    DECLARE VehicleID : STRING
    DECLARE Registration : STRING
    DECLARE DateOfRegistration : DATE
    DECLARE EngineSize : INTEGER
    DECLARE PurchasePrice : CURRENCY
ENDTYPE
Note that we can declare arrays of records. If we want to store the details of 100 cars,
we declare an array of type CarRecord.
Python
Python does not have a record type. However, we can use a class definition
(see Chapter 27 for more about classes).
The pseudocode example of a car record described in Worked Example 26.01 can be
programmed as follows:
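class CarRecord:                 # a class with no methods other than the constructor
    def __init__(self):
        self.VehicleID = ""
        self.Registration = ""
        self.DateOfRegistration = None
        self.EngineSize = 0
Car = [CarRecord() for i in range(100)]   # a list of 100 car records
Car[1].EngineSize = 2500 # assigning value to a field of the 2nd car in list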
VB.NET
Structure CarRecord
End Structure
Car(2).EngineSize = 2500 ' assign value to a field of 2nd car in array
Pascal
type
CarRecord = record
VehicleID : string[20];
Registration : string[10];
DateOfRegistration : TDateTime;
EngineSize : integer;
PurchasePrice : currency;
end;
var Car : array [1..100] of CarRecord; // declare an array of CarRecord type
In Chapter 13 (Section 13.09) we used text files to store and read lines of text. Text files only
allow us to write strings in a serial or sequential manner. We can append strings to the end
of the file.
When we want to store records in a file, we create a binary file (see Chapter 16, Section
16.02). We can store records serially or sequentially. We can also store records using direct
access to a random file. Table 26.01 lists the operations we use for processing files.
KEY TERMS
Random file: a file that stores records at specific addresses that can be accessed directly
Structured English
Pseudocode
Close a file
CLOSEFILE
GETRECORD , cidentifier>
SEEK /
EOF ()
If we have an array of records, we may want to store the records on disk before the program
quits, so that we don’t lose the data. We can open a binary file and write one record after
another to the file. We can then read the records back into the array when the program is
run again.
Table 26.02 shows the pseudocode for storing the car records from Worked Example 26.01
in a sequential file and accessing them.
FOR i <- 1 TO MaxRecords
ENDFOR
CLOSEFILE "CarFile"
ENDFOR
CLOSEFILE "CarFile"
Python
ThisCar = CarRecord()
pickle.dump(Car[i], CarFile) # write a whole record to the binary file
CarFile.close()
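A complete, self-contained version of the sequential approach might look like the sketch below. To keep it short, each record is held as a dictionary rather than a CarRecord object, and the file is created in a temporary folder. The function names `save_cars` and `load_cars` and the sample data are illustrative. The end of the file is detected by catching the `EOFError` that `pickle.load` raises when there is nothing left to read.

```python
import os
import pickle
import tempfile

def save_cars(cars, filename):
    with open(filename, "wb") as car_file:       # open for binary write
        for car in cars:
            pickle.dump(car, car_file)           # write one record after another

def load_cars(filename):
    cars = []
    with open(filename, "rb") as car_file:       # open for binary read
        while True:
            try:
                cars.append(pickle.load(car_file))
            except EOFError:                     # no more records in the file
                break
    return cars

cars = [
    {"VehicleID": "V001", "EngineSize": 2500},
    {"VehicleID": "V002", "EngineSize": 1600},
]
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "CarFile.dat")
    save_cars(cars, filename)
    restored = load_cars(filename)
print(restored)
```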
VB.NET
Option Explicit On
Imports System.IO
CarFileWriter.Write(Car(i).VehicleID) ' write a field to the binary file
Next
CarFile.Close()
CarFile.Close()
Pascal
var
i := 1;
begin
end;
TASK 26.01
2 Write another program to read the file and display the contents on screen.
Table 26.03 shows the pseudocode for storing a car record from Worked Example 26.01
in a random-access file and accessing it.
Saving a record
Retrieving a record
Address <- Hash(ThisCar.VehicleID)
CLOSEFILE "CarFile"
Address <- Hash(ThisCar.VehicleID)
CLOSEFILE "CarFile"
SEEK moves a pointer to the given record address. The PUTRECORD and GETRECORD
commands access the record at that address. After the command has been executed the
pointer points to the next record in the file.
Python
import pickle # this library is required to create binary files
ThisCar = CarRecord()
Address = hash(ThisCar.VehicleID)
CarFile.seek(Address)
Address = hash(VehicleID)
CarFile.seek(Address)
CarFile.close()
In Python, the hash function needs to allow for the record size in bytes. For example, if the
record size is 58 bytes, then the second record slot starts at position 59. The nth record slot
starts at position (n − 1) × 58 + 1.
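One way to get fixed-size record slots in Python is to pack each record with the `struct` module instead of `pickle`. The sketch below is an illustration under several assumptions: the record layout (20 bytes, 10 bytes and a 4-byte integer, 34 bytes in total), the number of slots, the hash function `slot_for` and the sample data are all hypothetical, and collisions are not handled. Python's `seek` counts byte offsets from 0, so the sketch places slot number s (counting from 0) at offset s × record size.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<20s10si")   # VehicleID, Registration, EngineSize
NUM_SLOTS = 10                       # assumed number of record slots

def slot_for(vehicle_id):
    # Illustrative hash: sum of character codes, wrapped to the slot range.
    return sum(ord(ch) for ch in vehicle_id) % NUM_SLOTS

def save_car(car_file, vehicle_id, registration, engine_size):
    car_file.seek(slot_for(vehicle_id) * RECORD.size)     # byte offset of the slot
    car_file.write(RECORD.pack(vehicle_id.encode(), registration.encode(), engine_size))

def load_car(car_file, vehicle_id):
    car_file.seek(slot_for(vehicle_id) * RECORD.size)
    raw_id, raw_reg, engine_size = RECORD.unpack(car_file.read(RECORD.size))
    # Strip the zero bytes used to pad the text fields to their fixed width.
    return raw_id.rstrip(b"\0").decode(), raw_reg.rstrip(b"\0").decode(), engine_size

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "CarFile.dat")
    with open(filename, "wb") as car_file:
        car_file.write(bytes(NUM_SLOTS * RECORD.size))    # create empty slots
    with open(filename, "r+b") as car_file:               # read and write in place
        save_car(car_file, "V001", "AB12CDE", 2500)
        found = load_car(car_file, "V001")
print(found)
```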
VB.NET
CarFileWriter.Write(ThisCar.VehicleID)
CarFileWriter.Write(ThisCar.Registration)
CarFile.Close()
MyCar.VehicleID = CarFileReader.ReadString()
MyCar.EngineSize = CarFileReader.ReadInt32()
CarFile.Close()
In VB.NET, the hash function needs to allow for the record size in bytes. For example, if the
record size is 58 bytes, then the second record slot starts at position 59. The nth record slot
starts at position (n − 1) × 58 + 1.
Pascal
var
Address := Hash(ThisCar.VehicleID);
Seek(CarFile, Address);
Address := Hash(ThisCar.VehicleID);
Seek(CarFile, Address);
In Pascal, the file is of the given record type and the addresses for the records are slot
addresses where each slot has the required number of bytes to accommodate the record.
TASK 26.02
Write a complete program to save several car records to a random-access file. Write another
program to find a record in the random-access file using the record key. Display the record
data on screen.
Run-time errors can occur for many reasons. Some examples are division by zero, invalid
array index or trying to open a non-existent file. Run-time errors are called ‘exceptions’. They
can be handled (resolved) with an error subroutine (known as an ‘exception handler’), rather
than let the program crash.
TRY
EXCEPT
ENDTRY
Any run-time error that occurs during the execution of the statements in the TRY block is
caught and handled by executing the statements in the EXCEPT block. There can be more than
one except block, each handling a different type of exception. Sometimes a finally block
follows the exception handlers. The statements in this block will be executed regardless of
whether there was an exception or not.
VB.NET and Delphi are designed to treat exceptions as abnormal and unpredictable
erroneous situations. Python is designed to use exception handling as flow-control
structures. You may find you need to include exception handling in the code for Worked
Example 26.02. Otherwise the end of file is encountered and the program crashes.
• ZeroDivisionError.
Here is a simple example of exception handling. Asking the user to key in an integer could
result in a non-integer input. This should not crash the program.
Python
n = int(NumberString)
print (n)
except:
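A complete version of this idea might look like the following sketch. It catches `ValueError`, which is the exception `int()` raises for text that is not an integer, rather than catching every exception. The function name `read_integer` and the message text are illustrative, and the input string is passed in as a parameter so that the example runs without keyboard input.

```python
def read_integer(number_string):
    try:
        n = int(number_string)
    except ValueError:             # raised by int() for text such as "abc"
        print("This was not an integer")
        return None
    return n

print(read_integer("42"))
print(read_integer("forty-two"))
```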
VB.NET
Try
n = Int(NumberString)
Catch
End Try
Pascal
The integrated debugger must be switched off for exception handling to work. In
the Tools menu, select Debugger Options and ensure the Integrated Debugger
option is not ticked.
ReadLn(NumberString);
try
n := StrToInt(NumberString);
except
end;
TASK 26.03
Add exception-handling code to your programs for Task 26.01 or Task 26.02. Test that your
code handles exceptions without the program crashing.
Programming environments for Python, VB.NET and Pascal were introduced in Chapter 15.
Section 15.02 covered the features found in a typical integrated development environment
(IDE). Section 15.04 described the use of a debugger.
Chapter 7 (Section 7.05) discussed the operation of compilers and interpreters and their
relative merits. In theory, the ideal situation would be to use an interpreter while developing
a program, because partial programs can be tested and no time is wasted waiting for
compilation. When the program is finished the compiled object code could be distributed
without having to divulge the source code. Compiled code also runs faster than a program
executed using an interpreter. Compiled code will not contain any syntax errors. Unless
every line of an interpreted program has been executed, it is possible that there are syntax
errors still present in the source code.
In practice, this choice rarely exists. Pascal programs can only be executed once compiled.
Similarly, VB.NET has to be compiled before it can be executed. Python programs, on the
other hand, run under an interpreter. Internally, Python source code is always translated
into a bytecode representation and this bytecode is then executed by the Python virtual
machine.
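The bytecode can be inspected from within a program. The sketch below assumes the standard CPython implementation and uses the standard-library `dis` module; the function `add_one` is illustrative, and the exact instructions listed vary between Python versions.

```python
import dis

def add_one(n):
    return n + 1

# The function's compiled bytecode is stored on its code object.
print(type(add_one.__code__.co_code))
# Human-readable listing of the bytecode instructions.
dis.dis(add_one)
```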
Summary
• Records can be stored in files in a serial, sequential or random (direct access) manner.
• Pascal and VB.NET programs must be compiled before they can be executed.
• Python programs are executed using an interpreter (the Python virtual machine).
Exam-style Questions
• The key field of a customer record is the customer ID (a number between 100001 and
999999).
a i Declare the record data type CustomerRecord required to store the data. Write program
code. [6]
iii Write a function FindRecord(CustomerID : integer) that returns the index of the hash table
slot where the record for the customer with CustomerID is stored.
c Before the program stops, the hash table records must be stored in a sequential file, so
that the records can be restored to the array when the program is re-entered.
Write program code to store the records of the array CustomerData sequentially into a
binary file CustomerData.DAT
d Instead of using a hash table, the company decide they want to store customer records
in a direct-access binary file.
A program allows a user to enter a filename for accessing a data file. If the user types in a
filename that does
not exist, the program crashes. Write program code that includes exception handling to
replace the following
pseudocode:
INPUT FileName
Learning objectives
By the end of this chapter you should be able to:
■ solve a problem by designing appropriate classes
■ write code that demonstrates the use of classes,
Chapters 14 and 26 covered programming using the procedural aspect of our programming
languages. Procedural programming groups related programming statements into
subroutines. Related data items are grouped together into record data structures. To use a
record variable, we first define a record type. Then we declare variables of that record type.
OOP goes one step further and groups together the record data structure and the
subroutines that operate on the data items in this data structure. Such a group is called
an ‘object’. The feature of data being combined with the subroutines acting on the data is
known as encapsulation.
To use an object, we first define an object type. An object type is called a class.
KEY TERMS
Class: a type that combines a record with the methods that operate on the properties in the
record
A car manufacturer and seller wants to store details about cars. These details can be stored
in a record structure (see Chapter 16, Section 16.01 and Chapter 26, Section 26.02):
TYPE CarRecord
ENDTYPE
We can write program code to access and assign values to the fields of this record.
For example:
BYVAL NewRegistration)
ThisCar.Registration <- NewRegistration
ENDPROCEDURE
We can call this procedure from anywhere in our program. This seems a well-regulated way
of operating on the data record. However, we can also access the record fields directly from
anywhere within the scope of ThisCar:
Classes in OOP
The idea behind classes in OOP is that attributes can only be accessed through methods
written as part of the class definition and validation can be part of these methods. The
direct path to the data is unavailable. Attributes are referred to as ‘private’. The methods to
access the data are made available to programmers, so these are ‘public’.
Classes are templates for objects. When a class type has been defined it can be used to
create one or more objects of this class type.
KEY TERMS
The first stage of writing an object-oriented program to solve a problem is to design the
classes. This is part of object-oriented design. From this design, a program can be written
using an object-oriented programming (OOP) language.
The programming languages the syllabus prescribes can be used for OOP: Python 3, VB.NET
and Delphi/ObjectPascal.
When designing a class, we need to think about the attributes we want to store. We also
need to think about the methods we need to access the data and assign values to the data
of an object. A data type is a blueprint when declaring a variable of that data type. A class
definition is a blueprint when declaring an object of that class. Creating a new object is
known as ‘instantiation’.
Any data that is held about an object must be accessible, otherwise there is no point in
storing it. We therefore need methods to access each one of these attributes. These methods
are usually referred to as getters. They get an attribute of the object.
Any properties that might be updated after instantiation will need subroutines to update
their values. These are referred to as setters. Some properties get set only at instantiation.
These don’t need setters. This makes an object more robust, because you cannot change
properties that were not designed to be changed.
KEY TERMS
Constructor: a special type of method that is called to create a new object and initialise its
attributes
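The design rules above (private attributes, getters, setters only where a value may change,
and validation inside the setter) can be sketched in Python. This is a minimal illustration
only: the class Product, its attributes and the rule that a price cannot be negative are
hypothetical and are not part of the car example.

```python
class Product:
    """Hypothetical class used only to illustrate getters and setters."""

    def __init__(self, product_id, price):
        # Constructor: ProductID is known at instantiation and never changes
        self._ProductID = product_id
        self._Price = 0.0
        self.SetPrice(price)  # reuse the setter so the value is validated

    def GetProductID(self):
        # Getter only: there is no setter, so the ID cannot be updated later
        return self._ProductID

    def GetPrice(self):
        return self._Price

    def SetPrice(self, new_price):
        # Setter with validation: reject a value that makes no sense
        if new_price < 0:
            raise ValueError("price cannot be negative")
        self._Price = new_price


item = Product("P001", 9.99)  # instantiation calls the constructor
item.SetPrice(12.50)
print(item.GetProductID(), item.GetPrice())
```

Because ProductID has a getter but no setter, code outside the class has no designed way of
changing it after instantiation.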
Creating a class
When a car is manufactured it is given a unique vehicle ID that will remain the same
throughout the car’s existence. The engine size of the car is fixed at the time of
manufacture. The registration ID will be given to the car when the car is sold.
In our program, when a car is manufactured, we want to create a new car object. We need
to instantiate it using the constructor. Any attributes that are already known at the time
of instantiation can be set with the constructor. In our example, VehicleID and EngineSize
can be set by the constructor. The other attributes are assigned values at the time of
purchase and registration. So we need setters for them. The identifier table for the car
class is shown in Table 27.01.
Identifier
Description
Car
Class identifier
VehicleID
Registration
DateOfRegistration
Date of registration
EngineSize
PurchasePrice
Constructor ()
SetPurchasePrice ()
SetRegistration()
SetDateOfRegistration()
GetRegistration()
GetDateOfRegistration()
GetEngineSize()
GetPurchasePrice()
Declaring a class
Attributes should always be declared as ‘Private’. This means they can only be accessed
through the class methods. So that the methods can be called from the main program, they
have to be declared as ‘Public’. There are other modifiers (such as ‘Protected’), but they are
beyond the scope of this book.
The syntax for declaring classes is quite different for the different programming languages.
We will look at the three chosen languages. You are expected to write programs in one of these.
Python and VB.NET include the method body within the class declaration.
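class Car:
    def __init__(self, n, e):  # constructor: sets the values known at manufacture
        self._VehicleID = n
        self._Registration = ""
        self._DateOfRegistration = None
        self._EngineSize = e
        self._PurchasePrice = 0.00
    def SetPurchasePrice(self, p):  # setter
        self._PurchasePrice = p
    def SetRegistration(self, r):  # setter
        self._Registration = r
    def SetDateOfRegistration(self, d):  # setter
        self._DateOfRegistration = d
    def GetVehicleID(self):  # getter
        return self._VehicleID
    def GetRegistration(self):
        return self._Registration
    def GetDateOfRegistration(self):
        return self._DateOfRegistration
    def GetEngineSize(self):
        return self._EngineSize
    def GetPurchasePrice(self):
        return self._PurchasePrice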
The code below shows how properties and the constructor are declared in VB.NET.
VB.NET
Class Car
End Function
End Function
End Function
End Function
End Function
End Class
Pascal includes only the headers of the functions and procedures within the class definition.
The full method code follows the class definition. Note that the class name is included when
the method is coded after the class declaration.
The code below shows how properties and the constructor are declared in Pascal.
DateOfRegistration := d;
end;
function Car.GetVehicleID : string;
begin
GetVehicleID := VehicleID;
end;
GetRegistration := Registration;
end;
GetDateOfRegistration := DateOfRegistration;
end;
GetEngineSize := EngineSize;
end;
GetPurchasePrice := PurchasePrice;
end;
Instantiating a class
To use an object of a class type in a program the object must first be instantiated. This
means the memory space must be reserved to store the attributes.
Python
VB.NET
Pascal
var ThisCar : Car;
Using a method
To call a method in program code, the object identifier is followed by the method identifier
and the parameter list.
The following code sets the purchase price for an object ThisCar of class Car.
Python
VB.NET
ThisCar.SetPurchasePrice (12000)
Pascal
The following code gets and prints the vehicle ID for an object ThisCar of class Car.
Python
VB.NET
Pascal
WriteLn(ThisCar.GetVehicleID);
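For Python, instantiating an object and calling its methods might look like the sketch
below. A cut-down version of the Car class is repeated so that the example runs on its own.
The vehicle ID "ABC1234" and the engine size 2500 are made-up values.

```python
class Car:
    def __init__(self, n, e):
        # constructor: sets the attributes known at instantiation
        self._VehicleID = n
        self._EngineSize = e
        self._PurchasePrice = 0.00

    def SetPurchasePrice(self, p):
        self._PurchasePrice = p

    def GetVehicleID(self):
        return self._VehicleID

    def GetPurchasePrice(self):
        return self._PurchasePrice


ThisCar = Car("ABC1234", 2500)    # instantiation: the constructor is called
ThisCar.SetPurchasePrice(12000)   # object identifier, method identifier, parameter list
print(ThisCar.GetVehicleID())
```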
TASK 27.01
1 Copy the Car class definition into your program editor and write a simple program to test
that each method works.
2 A business wants to store data about companies they supply. The data to be stored
includes: company name, email address, date of last contact.
(b) Write program code to declare the class. Company name and email address are to
be set by the constructor and will never be changed.
(c) Instantiate one object of this class and test your class code works.
27.04 Inheritance
The advantage of OOP is that we can design a class (a base class or a superclass) and then
derive further classes (subclasses) from this base class. This means that we write the code
for the base class only once and the subclasses make use of the attributes and methods of
the base class, as well as having their own attributes and methods. This is known as
inheritance and can be represented by an inheritance diagram (Figure 27.02).
(a) (b)
Inheritance: all attributes and methods of the base class are copied to the subclass
Library item
Book                      CD
Title of book*            Title of CD*
Author of book*           Artist of CD*
Whether it is on loan*    Whether it is on loan*
The information to be stored about books and CDs needs further analysis. Note that we could
have a variable Title, which stores the book title or the CD title, depending on which type of
library item we are working with. There are further similarities (marked * in Table 27.02).
There are some items of data that are different for books and CDs. Books can be
requested by a borrower. For CDs, the genre is to be stored.
We can define a class LibraryItem and derive a Book class and a CD class from it. We can
draw the inheritance diagrams for the LibraryItem, Book and CD classes as in Figure 27.03.
Figure 27.03 Inheritance diagram for Library Item, Book and CD classes
Analysing the attributes and methods required for all library items and those only
required for books and only for CDs, we arrive at the class diagram in Figure 27.04.
[Figure 27.04 Class diagram for the LibraryItem, Book and CD classes. The Book class has the
attribute IsRequested : BOOLEAN and the methods Constructor(), GetIsRequested() and
SetIsRequested(). The CD class has its own attribute and methods.]
A base class that is never used to create objects directly is known as an abstract class.
LibraryItem is an abstract class.
KEY TERMS
Abstract class: a base class that is never used to create objects directly
The code below shows how a base class and its subclasses are declared in Python.
Python
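import datetime
class LibraryItem:  # the base class definition
    def __init__(self, t, a, i):  # initialiser method
        self._Title = t
        self._Author_Artist = a
        self._ItemID = i
        self._OnLoan = False
        self._DueDate = datetime.date.today()
    def GetTitle(self):
        return self._Title
    def Borrowing(self):
        self._OnLoan = True
    def Returning(self):
        self._OnLoan = False
    def PrintDetails(self):
        print(self._Title, self._Author_Artist, self._ItemID, self._OnLoan, self._DueDate)
class Book(LibraryItem):  # a subclass definition
    def __init__(self, t, a, i):  # initialiser method
        LibraryItem.__init__(self, t, a, i)  # call the base class initialiser
        self._IsRequested = False
        self._RequestedBy = 0
    def GetIsRequested(self):
        return self._IsRequested
    def SetIsRequested(self):
        self._IsRequested = True
class CD(LibraryItem):  # a subclass definition
    def __init__(self, t, a, i):  # initialiser method
        LibraryItem.__init__(self, t, a, i)  # call the base class initialiser
        self._Genre = ""
    def GetGenre(self):
        return self._Genre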
The code below shows how a base class and its subclasses are declared in VB.NET.
VB.NET
Class LibraryItem ' the base class definition
Private Title As
Return (Title)
End Function
End Sub
Sub PrintDetails()
Console.WriteLine(Title &
& OnLoan &
& DueDate)
End Sub
End Class
' a subclass definition
Class Book
Inherits LibraryItem
End Function
Class CD
Inherits LibraryItem
Private Genre As String = ""
Public Function GetGenre() As String
Return (Genre)
End Function
The code below shows how a base class and its subclasses are declared in Pascal.
Pascal
Base class
definition
type
LibraryItem = class
private
Title : STRING;
Author_Artist : STRING;
ItemID : INTEGER;
OnLoan : BOOLEAN;
DueDate : TDATETIME;
public
private
IsRequested : BOOLEAN;
public
end;
INTEGER); override;
CD = class (LibraryItem)
private
Genre : STRING;
public
// implementation of methods
Title := t;
Author_Artist := a;
ItemID := i;
OnLoan := FALSE;
DueDate := 0;
end;
GetTitle := Title;
end;
procedure LibraryItem.Borrowing;
begin
OnLoan := TRUE;
end;
procedure LibraryItem.Returning;
begin
OnLoan := FALSE;
end;
procedure LibraryItem.PrintDetails;
begin
end;
begin
procedure Book.SetIsRequested;
begin
IsRequested := TRUE;
end;
begin
GetIsRequested := IsRequested;
end;
end;
begin
Genre := g;
end;
GetGenre := Genre;
end;
Genre := g;
end;
Instantiating a subclass
Creating an object of a subclass is done in the same way as with any class (see Section
27.03).
Python
VB.NET
Pascal
Using a method
Using an object created from a subclass is exactly the same as an object created from any
class.
TASK 27.02
Copy the class definitions for LibraryItem, Book and CD into your program editor. Write
the additional get methods. Write a simple program to test that each method works.
TASK 27.03
Write code to define a Borrower class as shown in the class diagram in Figure 27.05.
Borrower
BorrowerName : STRING
EmailAddress : STRING
BorrowerID : INTEGER
ItemsOnLoan : INTEGER
Constructor()
GetBorrowerName()
GetEmailAddress()
GetBorrowerID()
GetItemsOnLoan()
UpdateItemsOnLoan()
PrintDetails()
27.05 Polymorphism
Look at Worked Example 27.02 and the code that implements it in Section 27.04. The
constructor method of the base class is redefined in the subclasses. The constructor
for the Book class calls the constructor of the LibraryItem class and also initialises
the IsRequested attribute. The constructor for the CD class calls the constructor of the
LibraryItem class and also initialises the Genre attribute.
The PrintDetails method is currently only defined in the base class. This means we
can only get information on the attributes that are part of the base class. To include the
additional attributes from the subclass, we need to declare the method again. Although the
method in the subclass will have the same identifier as in the base class, the method will
actually behave differently. This is known as polymorphism.
KEY TERMS
Polymorphism: the method behaves differently for different classes in the hierarchy
The code shown here includes a call to the base class method with the same name. You can
completely re-write the method if required.
Python
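# define the PrintDetails method for Book
def PrintDetails(self):
    print("Book Details")
    LibraryItem.PrintDetails(self)  # call the base class method
    print(self._IsRequested)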
VB.NET
End Sub
Pascal
// define the method
procedure Book.PrintDetails;
begin
WriteLn('Book Details');
inherited;
WriteLn(IsRequested)
end;
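The effect of polymorphism can be seen when the same method call is made on objects of
different classes. The sketch below is a cut-down, self-contained illustration and not the
full library code: the classes keep only a title, the CD constructor takes the genre as a
parameter, and PrintDetails returns a string instead of printing so that the result can be
checked. These simplifications, and the sample titles, are assumptions made for the example.

```python
class LibraryItem:
    def __init__(self, t):
        self._Title = t

    def PrintDetails(self):
        return "Item: " + self._Title


class Book(LibraryItem):
    def __init__(self, t):
        LibraryItem.__init__(self, t)
        self._IsRequested = False

    def PrintDetails(self):
        # re-declared method: calls the base class version, then adds to it
        return LibraryItem.PrintDetails(self) + ", requested: " + str(self._IsRequested)


class CD(LibraryItem):
    def __init__(self, t, g):
        LibraryItem.__init__(self, t)
        self._Genre = g

    def PrintDetails(self):
        return LibraryItem.PrintDetails(self) + ", genre: " + self._Genre


items = [Book("Computing"), CD("Greatest Hits", "Pop")]
# The same method identifier behaves differently depending on the object's class
details = [item.PrintDetails() for item in items]
for line in details:
    print(line)
```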
TASK 27.04
Use your program code from Task 27.02. Re-define the PrintDetails method for the Book
class and the CD class. Test your code.
TASK 27.05
Use your program code from Task 27.03. Add another attribute, BorrowerID, to the
LibraryItem class so that the item being loaned can have the borrower recorded.
TASK 27.06
Use your code from Task 27.02 or Task 27.04. Add another attribute, RequestedBy, to the
Book class so that the borrower making the request can be recorded.
TASK 27.07
Use your code from Task 27.06 to write the complete program to implement a simplified
library system.
Write code to provide the user with a menu to choose an option. An example of a menu that
would be suitable is shown in Figure 27.06.
3 - Add a new CD
4 - Borrow a book
5 - Return a book
6 - Borrow a CD
7 - Return a CD
8 - Request book
When objects are created they occupy memory. When they are no longer needed, they should
be made to release that memory, so it can be re-used. If objects do not let go of memory,
we eventually end up with no free memory when we try to run a program. This is known as
‘memory leakage’.
Python
VB.NET
When an object is no longer required, the programmer can use the method
.Free. For example, ThisBook.Free.
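A minimal sketch of memory being released in Python, assuming the standard behaviour of
reclaiming an object once no references to it remain. The class name Book and the weak
reference used to observe the object are for illustration only. A weak reference does not
keep an object alive, so it can be used to check whether the object still exists.

```python
import gc
import weakref


class Book:
    pass


ThisBook = Book()
watcher = weakref.ref(ThisBook)   # observes the object without keeping it alive

alive_before = watcher() is not None
del ThisBook                      # remove the only reference to the object
gc.collect()                      # ask the garbage collector to run now
alive_after = watcher() is not None

print(alive_before, alive_after)
```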
In Section 27.04 we covered how a subclass inherits from a base class. This can be seen
as generalisation and specialisation. The base class is the most general class, subclasses
derived from this base class are more specialised.
We have other kinds of relationships between classes. Containment means that one class
contains other classes. For example, a car is made up of different parts and each part will be
an object based on a class. The wheels are objects of a different class to the engine object.
The engine is also made up of different parts. Together, all these parts make up one big
object.
Containment: a relationship in which one class has a component that is of another class type
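The car example can be sketched in Python as follows. This is an illustration under
assumptions: the class names Engine and Wheel, their attributes and the wheel positions are
hypothetical. The point is only that a Car object holds objects of other classes as its
components.

```python
class Engine:
    def __init__(self, size):
        self._Size = size

    def GetSize(self):
        return self._Size


class Wheel:
    def __init__(self, position):
        self._Position = position

    def GetPosition(self):
        return self._Position


class Car:
    def __init__(self, engine_size):
        # Containment: the Car object has components of other class types
        self._CarEngine = Engine(engine_size)
        self._CarWheels = [Wheel(p) for p in
                           ("front left", "front right", "rear left", "rear right")]

    def GetEngineSize(self):
        return self._CarEngine.GetSize()

    def GetNumberOfWheels(self):
        return len(self._CarWheels)


ThisCar = Car(2500)
print(ThisCar.GetEngineSize(), ThisCar.GetNumberOfWheels())
```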
Car
Wheel
Assuming that all attributes for the Lesson and Assessment classes are set by values
passed as parameters to the constructor, the code for declaring the Lesson and
Assessment classes is straightforward.
class Course:
    def __init__(self, t, m):  # initialiser method
        self._CourseTitle = t
        self._MaxStudents = m
        self._NumberOfLessons = 0
        self._CourseLesson = []
        self._CourseAssessment = Assessment
    def AddLesson(self, t, d, r):
        self._NumberOfLessons = self._NumberOfLessons + 1
        self._CourseLesson.append(Lesson(t, d, r))
    def AddAssessment(self, t, m):
        self._CourseAssessment = Assessment(t, m)
Class Course
CourseTitle = t
MaxStudents = m
End Sub
End Sub
Public Sub AddAssessment(ByVal t As String, ByVal m As Integer)
CourseAssessment = New Assessment
CourseAssessment.Create(t, m)
End Sub
For i = 1 To NumberOfLessons
CourseLesson(i) .OutputLessonDetails ()
Next
End Sub
End Class
Course = class
private
CourseTitle : string;
MaxStudents : integer;
NumberOfLessons : integer;
public
end;
MaxStudents := m;
end;
NumberOfLessons := NumberOfLessons + 1;
end;
procedure Course.OutputCourseDetails;
var i : integer;
begin
for i := 1 to NumberOfLessons do
WriteLn(CourseLesson[i].LessonTitle) ;
end;
def Main():
    # add 3 lessons
    MyCourse.AddLesson("Problem Solving", 60, False)
Console.ReadLine()
begin
ReadLn;
MyCourse.Free; // free memory
end.
TASK 27.08
Write the code required for the Lesson and Assessment classes. Add the code for the
Course class and test your program with the appropriate simple program from Worked
Example 27.03.
• A class has attributes (declared as private) and methods (declared as public) that operate
on the attributes. This is known as encapsulation.
• A class and its attributes and methods can be represented by a class diagram.
• Classes (subclasses) can inherit from another class (the base class or superclass). This
relationship between a base class and its subclasses can be represented using an
inheritance diagram.
• A subclass has all the attributes and methods of its base class. It also has additional
attributes and/or methods.
• Polymorphism describes the different behaviour of a subclass method with the same name
as the base class method.
• Containment is a relationship between two classes where one class has a component that is
of the other class type. This can be represented using a containment diagram.
Exam-style Questions
• SavingsAccount: the account holder must maintain a positive balance and gets paid
interest on the balance at an agreed interest rate.
• attributes:
• AccountHolderName
• methods
• CreateNewAccount
• SetAccountHolderName
• GetAccountHolderName
• GetIBAN
b Write program code for the class definition of the superclass BankAccount. [5]
c i State the attributes and/or methods required for the subclass PersonalAccount. [4]
ii State the attributes and/or methods required for the subclass SavingsAccount. [4]
iii Name the feature of object-oriented program design that combines the attributes and
methods into a class. [1]
2 A bus company in a town has two types of season ticket for their regular customers:
pay-as-you-go and contract.
All season ticket holders have their name and email address recorded.
A pay-as-you-go ticket holder pays a chosen amount into their account. Each time the ticket
holder makes a journey on the bus, the price of the fare is deducted from the amount held in
the account. They can top up the amount at any time.
A contract ticket holder pays a fixed fee per month. They can then make unlimited journeys
on the bus.
The bus company wants a program to process the season ticket data. The program will be
written using an object-oriented programming language.
a Complete the class diagram showing the appropriate attributes and methods.
Pay-As-You-Go-TicketHolder
ContractTicketHolder
[7]
[2]
[2]
• Identifier: NewCustomer
• name: A. Smith
NodeClass
Data : STRING
Head : INTEGER
Pointer : INTEGER
Tail : INTEGER
Constructor()
Constructor()
SetData(d : STRING)
JoinQueue(NewItem : NodeClass)
SetPointer(x : INTEGER)
LeaveQueue() : STRING
GetData() : STRING
GetPointer() : INTEGER
Write program code to define NodeClass, including the get and set methods.
d The JoinQueue method is to
• create a new object, Node, of NodeClass
[3]
[1]
[10]
[3]
[5]
Low-level Programming
Learning objectives
For the purposes of this chapter, the instruction set used is given in Table 28.01.
Instruction
Explanation
Label
Op code
Operand
LDM
#n
to ACC
LDD
LDI
LDX
LDR
#n
to IX
STO
STX
+ the contents of the index register. Copy the contents from acc to this calculated address
STI
ADD
INC
DEC
JMP
CMP
Contents of
CMP
#n
JPE
JPN
Input/output instructions
IN
OUT
AND
#n
AND
XOR
#n
XOR
OR
#n
OR
LSL
#n
LSR
#n
Other
END
Labels an instruction
The data movement, arithmetic operation, comparison and jump instructions were
introduced in Chapter 6. The bit-wise manipulation instructions were introduced in
Chapter 22.
To write useful programs, we need instructions for input and output. IN and OUT are
provided here to input and output single characters, represented internally by their
ASCII codes.
A label is a symbolic name for the memory location that it represents. You can treat it like a
variable name. When writing low-level programs, we can give absolute addresses of memory
locations. This is very restrictive, especially if we want to change the program by adding
extra instructions. Writing low-level instructions using symbolic addresses (labels) allows
us to think at a higher level. The assembler will allocate absolute addresses during the
assembly process (see Chapter 7, Section 7.05).
When writing a solution to a problem using low-level programming, we need to break down
the problem into simple steps that can be programmed using the instruction set available.
One approach is to think in terms of the basic constructs we discussed for high-level
languages. You can use the following examples as design patterns.
Assignment
Table 28.02 shows some examples of assembly language assignments that match the
pseudocode.
Pseudocode example    Assembly code example
A <- 34               LDM #34
                      STO A
B <- B + 1            LDD B
                      INC ACC
                      STO B
B <- B + A            LDD B
                      ADD A
                      STO B
A <- -A               LDD A
                      XOR #&FF
                      INC ACC
                      STO A
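The last assignment, A <- -A, works because inverting every bit and then adding 1 gives the
two's complement of a number. The Python sketch below simulates those steps. It assumes an
8-bit accumulator, which is what the mask &FF suggests; the function names are hypothetical.

```python
def negate_8bit(a):
    """Simulate LDD A / XOR #&FF / INC ACC on an assumed 8-bit accumulator."""
    acc = a & 0xFF            # LDD A: load the value
    acc = acc ^ 0xFF          # XOR #&FF: invert every bit
    acc = (acc + 1) & 0xFF    # INC ACC: add 1, wrapping round at 8 bits
    return acc                # STO A would store this value


def to_signed(byte):
    """Interpret an 8-bit pattern as a two's complement number."""
    return byte - 256 if byte >= 128 else byte


result = negate_8bit(5)
print(format(result, "08b"), to_signed(result))
```

Applying the same steps to the result gives back the original value, as expected for
negation.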
TASK 28.01
Selection
Table 28.03 shows some examples of assembly language selections that match the
pseudocode.
Pseudocode
examples
Assembly code
examples
Explanation
IF A = 0
THEN
B <- B + 1
ENDIF
LDD A
CMP #0
JPN ENDIF
THEN: LDD B
INC ACC
STO B
ENDIF: ...
THEN
OUTPUT "Y"
ELSE
OUTPUT "N"
ENDIF
LDD A
XOR #&FF
INC ACC
ADD B
CMP #0
JPN ELSE
OUT
JMP ENDIF
OUT
ENDIF: ...
Add B.
TASK 28.02
THEN
BA
ELSE
BB-1
Repetition
Table 28.04 shows an example of repetition in assembly language that matches the
pseudocode.
Pseudocode
examples
Assembly code
examples
Explanation
A=0
REPEAT
OUTPUT
AA+ 1
UNTIL A = 5
LDM #0
STO A
OUT
LDD A
INC ACC
STO A
CMP #5
JPN LOOP
TASK 28.03
Number 1
Total 0
Max <e- 5
REPEAT
Input/output
Table 28.05 shows some examples of input and output in assembly language that match the
pseudocode.
Pseudocode
examples
Assembly code
examples
Explanation
INPUT A
IN
STO A
OUTPUT B
LDR #-l
LOOP: INC IX
LDX B
OUT
CMP #13
JPN LOOP
INPUT A
LDR #-l
LOOP: INC IX
IN
STX A
CMP #13
JPN LOOP
TASK 28.04
Count <- 0
REPEAT
TASK 28.05
Modify your assembly code instructions from Task 28.04 to implement this sequence of
pseudocode statements:
Count <- o
REPEAT
Count Count + 1
INPUT Character
UNTIL Character = "N"
OUTPUT Count
An absolute address is the numeric address of a memory location. A program using absolute
addresses cannot be loaded anywhere else in memory. Some assemblers produce relative
addresses, so that the program can be loaded anywhere in memory.
Relative addresses are addresses relative to a base address, for example the first
instruction of the program. When the program is loaded into memory the base address is
stored in a base register BR. Instructions that refer to addresses then use the value in the
base register, modified by the offset. For example, STO [BR] + 10 will store the contents of
the accumulator at the address calculated from (contents of the base register) + 10.
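The address calculation can be sketched in Python. This is a simulation under assumptions:
memory is modelled as a list, and the function name, the memory size and the base addresses
100 and 200 are chosen only for illustration.

```python
def store_relative(memory, base_register, offset, acc):
    """Simulate STO [BR] + offset: store ACC at base address + offset."""
    address = base_register + offset
    memory[address] = acc
    return address


memory = [0] * 256

# The same instruction, STO [BR] + 10, with the program loaded at two different places
first = store_relative(memory, 100, 10, 42)    # base address 100
second = store_relative(memory, 200, 10, 42)   # base address 200

print(first, second)
```

The instruction itself does not change. Only the value held in the base register differs,
which is why the program can be loaded anywhere in memory.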
Table 28.06 shows an example of instructions using symbolic, relative and absolute
addressing.
Symbolic addressing
Offset from
base (start)
Absolute addressing
START: LDM #0
LDM #0
100 LDM #0
STO A
STO [BR] + 10
LDM #42
OUT
3
OUT
203 OUT
LDD A
LDD [BR] + 10
INC ACC
INC ACC
STO A
STO [BR] + 10
CMP #5
CMP #5
207 CMP #5
JPN LOOP
JPN [BR] + 2
END
END
209 END
A: 0
10
210 0
It is very important that, at the end of the program, control is passed back to the operating
system. Otherwise the binary pattern held in the next memory location will be interpreted as
an instruction. If the content of that memory location does not correspond to a valid
instruction, the processor will crash. The instruction END signals the end of the program
instructions.
At the top level, we can write the problem using structured English:
Table 28.07 shows an example of instructions that implement the above queue-processing
algorithms.
Label      Op code   Operand    Explanation
JOINQ:     STI       TAILPTR
           LDD       TAILPTR
           INC       ACC
           STO       TAILPTR
           JMP       ENDQ
LEAVEQ:    LDI       HEADPTR
           OUT
           LDD       HEADPTR
           INC       ACC
           STO       HEADPTR
           JMP       ENDQ
ENDQ:
HEADPTR:   QSTART
TAILPTR:   QSTART
QSTART:    ""                   currently empty
Note that the value shown in Table 28.07 at the memory locations labelled HEADPTR and
TAILPTR is the address of the start of the memory locations reserved for the queue. As
values are added to the queue, the TAILPTR value will increase to point to the memory
location at the end of the queue data. When a value is taken from the queue, the HEADPTR
value will increase to point to the memory location at the head of the queue data.
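The way the two pointers move can be simulated in Python. This is a sketch under
assumptions: memory is modelled as a list, the queue area is assumed to start at address 50,
and, like the instructions in the table, the sketch does not check for an empty or full
queue. The class and method names are hypothetical.

```python
class QueueMemory:
    def __init__(self, qstart, size):
        self.memory = [None] * size
        self.headptr = qstart   # HEADPTR initially holds the address QSTART
        self.tailptr = qstart   # TAILPTR initially holds the address QSTART

    def join_queue(self, value):
        self.memory[self.tailptr] = value   # STI TAILPTR: store via the pointer
        self.tailptr = self.tailptr + 1     # LDD / INC ACC / STO TAILPTR

    def leave_queue(self):
        value = self.memory[self.headptr]   # LDI HEADPTR: load via the pointer
        self.headptr = self.headptr + 1     # LDD / INC ACC / STO HEADPTR
        return value


q = QueueMemory(qstart=50, size=100)
q.join_queue("A")
q.join_queue("B")
first_out = q.leave_queue()
print(first_out, q.headptr, q.tailptr)
```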
TASK 28.06
Write instructions to reverse a word entered at the keyboard. This requires access to an
area of memory treated as a stack.
Summary
• A problem to be solved must be broken down into simple steps that can be programmed
using the processor’s given instruction set.
• Processing includes:
o loaded from a memory location using direct, relative, indirect or indexed addressing.
Exam-style Questions
1 The instruction set of a processor with one general-purpose register, the accumulator,
includes the following instructions.
Instruction
Explanation
Label
Op code
Operand
LDD
STO
Store the contents of acc at the given
address
ADD
ACC
IN
ACC
AND
LSL
#n
END
b The ASCII code for ‘0’ is the binary value 00110000. The ASCII code for ‘1’ is the binary
value 00110001.
Write an AND instruction to convert any numeric digit stored in ACC in the form of an ASCII
code to its eight-bit
c Write the assembly code instructions to convert a two-digit number keyed in at the
keyboard to its BCD
representation. Store the result in the memory location labelled Result. [7]
Instruction
Explanation
Label
Op code
Operand
Store result
End of program
Mask:
Result:
&00
2 A given processor has one general-purpose register, the accumulator ACC, and one index
register, IX. Part of the
Instruction
Explanation
Label
Op code
Operand
LDM
#n
LDD
LDX
+ the contents of the index register. Copy the contents of this calculated address to
ACC
LDR
#n
STO
STX
ADD
ACC
INC
JMP
CMP
CMP
#n
JPE
JPN
IN
ACC
END
:
cop code>
Labels an instruction
Write an assembly language program that takes a sequence of characters as input from the
keyboard and stores each character in successive locations, starting at the location labelled:
STRING. Input ends when the input character is ‘!’ (ASCII code 33).
Instruction
Explanation
Label
Op code
Operand
input character
End of program
STRING:
Chapter 29
Declarative Programming
Learning objectives
■ solve a problem by writing appropriate facts and rules
■ write code that can satisfy a goal using facts and rules
Declarative languages include database query languages (such as SQL, see Chapter 10,
Prolog is a logic programming language widely used for artificial intelligence and expert
systems.
The Prolog programs in this chapter have been prepared using the SWI-Prolog environment
shown in Figure 29.01 (see www.swi-prolog.org for a free download).
There are three basic constructs in Prolog: facts, rules and queries.
The program logic is expressed using clauses (facts and rules). Problems are solved by
running a query (goal).
A collection of clauses is called a ‘knowledge base’. Writing a Prolog program means writing
a knowledge base as a collection of clauses. We use the program by writing queries.
Head :- Body.
Prolog has a single data type, called a ‘term’. A term can be:
• an atom, a general-purpose name with no inherent meaning that always starts with a
lower case letter
• a variable, denoted by an identifier that starts with a capital letter or an underscore (_)
The arguments themselves can be compound terms. A predicate has an arity (that is, the
number of arguments in parentheses).
01 capitalCity(paris).
02 capitalCity(berlin).
03 capitalCity(cairo).
The meaning of clause 01 is: Paris is a capital city.
capitalCity(paris) is a compound term. Both capitalCity and paris are atoms.
capitalCity is called a predicate and paris is the argument.
capitalCity has arity 1, as it has just one argument. This can be written as capitalCity/1,
the /1 showing that it takes one argument.
TASK 29.01
Launch the editor (File, New...) from the SWI-Prolog environment. Enter the three clauses
above, as shown in Figure 29.02. Then save the file (File, Save buffer) as Exl.
Clauses 01 to 03 are a knowledge base. We can run a query on this knowledge base.
To ask the question whether Paris is a capital city, we write:
capitalCity(paris).
capitalCity(london).
This is because the fact that London is a capital city has not been included in our knowledge
base.
TASK 29.02
Run your own queries. You first need to consult the knowledge base (File, Consult...) from
within the Prolog environment. Note that SWI-Prolog uses the prompt ?- (see Figure 29.03).
1 ?-
1 ?- capitalCity(paris).
true.
2 ?- capitalCity(london).
false.
If your query does not get a response, check that you have:
Let’s add some more facts to our knowledge base. Comments in Prolog are enclosed in /*
and */.
05 cityInCountry(berlin, germany).
07 cityInCountry(munich, germany).
To find out which country Berlin is in, we can run the query (see Figure 29.04):
To find out which cities are in Germany, we can run the query (see Figure 29.04):
cityInCountry(City, germany).
1 ?- cityInCountry(berlin,Country).
Country = germany.
2 ?- cityInCountry(City, germany).
City = berlin ;
City = munich.
3 ?-
Note how Prolog responds when running a query that includes a variable. When there is more
than one answer, you need to type a semicolon after the first answer and Prolog will give the
second answer. The semicolon has the meaning OR. First City is instantiated to berlin and
then City is instantiated to munich.
02 vegetable(potato).
03 vegetable(tomato).
05 meat(beef).
06 meat(lamb).
Amount = 250 ;
Ingredient = tomato.
Amount = 100.
8 ?-
Consider the knowledge base from Worked Example 29.01. If we are not interested in
the amount of each ingredient, we can use the anonymous variable (represented by the
underscore character). The query then becomes
A rule’s body consists of calls to predicates, which are called the rule’s goals. A predicate is
either true or false, based on its terms. If the body of the rule is true, then the head of the
rule
is true too.
01 parent(fred,
02 parent(fred, alia).
03 parent(fred, paul).
04 parent(dave, fred).
However, in Prolog the IF is replaced by :- and the AND is replaced by a comma:
A person has a sibling (brother or sister) if they have the same parent. We can write this as
the Prolog rule:
sibling(A, B) :-
    parent(P, A),
    parent(P, B).
If we run the query sibling(jack, X)., we get the answers we expect, but we also get the
answer that Jack is his own sibling. To avoid this we modify the rule to ensure that A is
not equal to B:
sibling(A, B) :-
    parent(P, A),
    parent(P, B),
    not(A=B).
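The logic of the sibling rule can be imitated in Python to show why the extra condition is
needed. This is only a sketch: Prolog searches the knowledge base itself, whereas here the
search is written out by hand. The facts listed are assumed for the example, in the style of
the family knowledge base, and the function name is hypothetical.

```python
# Assumed facts, each written as (parent, child)
PARENT_FACTS = {
    ("fred", "jack"),
    ("fred", "alia"),
    ("fred", "paul"),
    ("dave", "fred"),
}


def sibling(a, allow_self=False):
    """Return everyone who shares a parent with a."""
    parents_of_a = {p for (p, child) in PARENT_FACTS if child == a}
    result = {child for (p, child) in PARENT_FACTS if p in parents_of_a}
    if not allow_self:
        result.discard(a)   # plays the role of not(A=B) in the second rule
    return sorted(result)


print(sibling("jack", allow_self=True))   # first version of the rule
print(sibling("jack"))                    # version with not(A=B)
```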
Question 29.01
What answer do you expect to get from Prolog to the following query:
sibling(dave, X).
TASK 29.03
Write a knowledge base for your own family. You can include more predicates, for example:
Predicate
Meaning
male(fred).
Fred is male
female(alia).
Alia is female
Using the knowledge base from Worked Example 29.01, we want to know which dishes
contain meat. We are not interested how much meat, so we don’t need to know the value
of the third argument of the predicate ingredient/3. We can write the rule:
containsMeat(X) :-
    ingredient(X, Meat, _),
    meat(Meat).
Prolog responds to a query with an answer, such as the one in Worked Example 29.03:
X = stew.
The = sign is not an assignment as in imperative programs. The = sign shows instantiation.
How does Prolog use the knowledge base to arrive at the answers? One way to see exactly
what Prolog is doing is to use the graphical debugger.
Use the knowledge base from Worked Example 29.03. After consulting the knowledge
base, start the debugger (Debug, Graphical debugger) from the Prolog environment.
Then type: trace, and then the goal as shown in Figure 29.06.
1 ?-
1 ?- trace.
true.
[trace] 1 ?- containsMeat(X).
containsMeat/1
vegetable(aubergine).
vegetable(potato).
vegetable(tomato).
containsMeat(X) :-
    ingredient(X, Meat, _),
    meat(Meat).
Using the space bar you can step through the program. When Prolog gives an answer in
the Prolog Environment window, remember to input a semicolon, so that Prolog will go
and check for another possible answer.
If you don’t use the graphical debugger but type trace, you can see the trace in the SWI-
Prolog window, as shown in Figure 29.08.
[Figure 29.08 Trace of the query containsMeat(X) in the SWI-Prolog window. After typing
trace. the goal is entered and each step is shown as Call, Exit, Fail or Redo at depth (6) or
(7), for example Call: (6) containsMeat(_G464) ? creep. The trace gives the answer X = stew
and, after a semicolon is typed, ends with false.]
• Redo indicates that the predicate is backed into for another answer.
Recursion for imperative languages is covered in Chapter 25. Recursion for declarative
languages is where a rule is defined by itself, or more precisely, a rule uses itself as a sub-
goal.
Let us expand the Family knowledge base from Worked Example 29.02.
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, X), ancestor(X, B).
Note that recursion in declarative programming must follow the equivalent rules that
imperative programming must follow. A recursive rule must:
TASK 29.04
Add the ancestor rules to the Family knowledge base and check that the following query
gives the correct results:
ancestor(A, jack).
In Chapter 25, Worked Example 25.01, we programmed the factorial function using
recursion with imperative programming. We can also program this function using
recursion in Prolog.
M is N - 1, /* assign N-1 to M */
TASK 29.05
Enter the code from Worked Example 29.05 into the Prolog editor. Save it and consult it.
Then pose the following query:
factorial(5, Answer).
29.09 Lists
A list is an ordered collection of terms. It is denoted by square brackets with the terms
separated by commas or in the case of the empty list, []. For example [1,2,3] or [red, green,
blue]. An element can be any type of Prolog object. Different types can be mixed within one
list. Lists are used in Prolog where arrays may be used in procedural languages.
Any non-empty list can be thought of as consisting of two parts: the head and the tail. The
head is the first item in the list; the tail is the list that remains when we take the first element
away. This means that the tail of a list is always a list.
Lists are manipulated by separating the head from the tail. The separator used is a vertical
line (a bar): |
If Prolog tries to match [H|T] to [car, lorry, boat, ship], it will instantiate H to car and
T to [lorry, boat, ship].
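For comparison with an imperative language, the same head-and-tail split can be sketched in Python. This is only an illustration and not Prolog: the function name head_and_tail is made up for this example, and Python's unpacking stands in for the [H|T] pattern.

```python
def head_and_tail(items):
    """Split a non-empty list into its first element and the list that remains."""
    if not items:
        # Like [H|T] in Prolog, the split only makes sense for a non-empty list.
        raise ValueError("an empty list has no head or tail")
    head, *tail = items  # tail is always a list, possibly empty
    return head, tail


head, tail = head_and_tail(["car", "lorry", "boat", "ship"])
print(head)  # car
print(tail)  # ['lorry', 'boat', 'ship']
```

Note that the tail of a one-element list is the empty list, which matches the statement that the tail of a list is always a list.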
The clause definition showHeadAndTail([H|T], H, T). can be used to pose the query:
showHeadAndTail([fred, jack, emma], Head, Tail).
Head = fred,
Tail = [jack, emma].
The clause definition myList([1,2,3]). can be used to pose the query:
myList([H|T]).
H = 1,
T = [2, 3].
The clause definition emptyList(A) :- A = []. can be used to pose the query:
emptyList([1]).
The built-in predicate append(A, B, C) joins list A and list B and produces list C.
produces:
true.
X = a ;
X = b ;
X = c ;
X = d.
write(X). outputs the value currently instantiated to the variable X.
The built-in predicate read(A) reads a value from the keyboard into variable A.
read(Name). waits for an atom to be input from the keyboard and instantiates the variable
Name with that value.
Note that the input must start with a lower case letter and not have spaces or be enclosed in
quotes.
We can write user-friendly programs using the read and write predicates.
Note how the interface with the user in the code below is written as a rule with the
separate steps separated by commas (representing AND).
assert/1 adds the clause given as the argument to the knowledge base.
retractall/1 takes the given clause out of the knowledge base, so the next time the
program is run, the new facts will be added and used in the goal.
/* interface */
go:-
assert(sky(Sky)),
weather(Weather),
TASK 29.06
Test the recursively defined rule writeList/1 to output the elements of a list.
writeList([]).
1 A logic programming language is used to represent, as a set of facts and rules, details of
cities of the world. The set of facts and rules is shown below in clauses labelled 01 to 17.
01 capital(vienna).
02 capital(london).
03 capital(santiago).
04 capital(caracas).
05 capital(tokyo).
06 cityIn(vienna, austria).
07 cityIn(santiago, chile).
08 cityIn(salzburg, austria).
09 cityIn(maracaibo, venezuela).
12 continent(uk, europe).
13 continent(argentina, southAmerica).
14 iVisited(vienna).
15 iVisited(tokyo).
17 europeanCity(City)
Clause   Meaning
01       Vienna is a capital.
06       Vienna is in Austria.
10
14
17
a Write down the extra clauses needed to express the following facts:
b The clause cityIn(City, austria) would return the result: vienna, salzburg.
• have a licence: there is a minimum age at which a person can be issued with a licence
and it is different for cars and trucks
• pass a theory test: it is the same test for cars and trucks
06 age(jhon, 20).
07 age(emma, 22).
08 age(sheena, 19).
09 hasLicence(fred).
10 hasLicence(jack).
11 hasLicence(mike).
12 hasLicence(jhon).
13 hasLicence(emma).
14 hasLicence(sheena).
15 allowedToDrive(X, V)
AND age(X, A)
AND A >= L.
16 passedTheoryTest(jack).
17 passedTheoryTest(emma).
18 passedTheoryTest(jhon).
19 passedTheoryTest(fred).
21 passedDrivingTest(fred, car).
22 passedDrivingTest(jack, car).
25 qualifiedDriver(X, V)
IF allowedToDrive(X, V)
AND passedTheoryTest(X)
Clause   Meaning
01
03
09
15       Person X is able to drive vehicle V if person X has a licence and the age A of
         person X is greater than or equal to the minimum age L to drive vehicle V.
ii all drivers who have passed the theory test but not a driving test. [3]
d To produce the output from a clause, the inference engine uses a process called
backtracking.
allowedToDrive(mike, car).
List the order in which clauses are used to produce the output. For each clause, describe the
result that it returns. [5]
Learning objectives
Chapter 30: Software development
The first computers had to be programmed using machine code. This is a very tedious
method for writing programs. Assemblers were invented to generate a computer program
in machine code from assembly code instructions. Later interpreters and compilers were
invented to generate low-level code from high-level programs written by people. So you can
see that program generators have been around for a very long time.
Development is ongoing to invent program generators that will take ever more abstract
models and translate them into executable code. An integrated development environment
(IDE) for a modern high-level language provides facilities for software development, such as a
source code editor with intelligent code completion, build automation and a debugger. Some
IDEs have more advanced forms of code generation. For example, programmers can design
GUIs interactively or generate code from a wizard or template. Computer-aided software
engineering (CASE) tools are also used to generate code.
In Chapter 13 (Section 13.08) we covered built-in functions. These are part of a program
library.
KEY TERMS
Program generator: a computer program that can be used to create other computer
programs
Program library: a collection of pre-compiled routines or modules that a program can use
• The user interface is poorly designed and the user makes mistakes.
How are errors found? The end user may report an error. This is not good for the reputation
of the software developer. Testing software before it is released for general use is essential.
Research has shown that the earlier an error can be found, the cheaper it is to fix it. It is very
important that software is tested throughout its development.
The purpose of testing is to discover errors. Edsger Dijkstra, a famous Dutch computer
scientist, said ‘Program testing can be used to show the presence of bugs, but never to show
their absence!’.
We covered logic errors and run-time errors in Chapter 15. In Section 15.03 we discussed
black-box and white-box testing. In Section 15.04 we used debugging facilities in an IDE. In
Section 15.05 we worked through program code by dry-running it and recording the steps in
a trace table. Dry-running program code is also sometimes referred to as a ‘walkthrough’.
These testing methods are used early on in software development, for example when
individual modules are written. Sometimes programmers themselves use these testing
Discussion Point:
Do you think that a program tester will find errors the programmer did not know about? You
can try out the idea by letting your friends test a program that you think works perfectly.
Each individual module may have passed all the tests, but when modules are joined together
into one program, it is vital that the whole program is tested. This is known as integration
testing. Integration testing is usually done incrementally. This means that one module at a
time is added and further testing is carried out before the next module is added.
Software will be tested in-house by software testers before being released to customers.
This type of testing is called alpha testing.
Bespoke software (written for a specific customer) will then be released to the customer.
The customer will check that it meets their requirements and works as expected. This
stage is referred to as acceptance testing. It is generally part of the hand-over process. On
successful acceptance testing, the customer will sign off the software.
When software is not bespoke but produced for general sale, there is no specific customer
to perform acceptance testing and sign off the software. So, after alpha testing, a version is
released to a limited audience of potential users, known as ‘beta testers’. These beta testers
will use the software and test it in their own environments. This early release version is
called a beta version and the chosen users perform beta testing. During beta testing, the users
will feed back to the software house any problems they have found, so that the software house
can correct any reported faults.
KEY TERMS
Integration testing: individually tested modules are joined into one program and tested to
ensure the modules interact correctly
Beta testing: testing of software by a limited number of chosen users before general release
During the design stage of a software project, a suitable testing strategy must be worked
out to ensure rigorous testing of the software from the very beginning. Consideration should
be given to which testing methods are appropriate for the project in question. A carefully
designed test plan has to be produced.
• flow of control: does the user get appropriate choices and does the chosen option go to
the correct module?
• validation of input: has all data been entered into the system correctly?
This outline test plan needs to be made into a detailed test plan.
How can we carry out these tests? We need to select data that will allow us to see whether
it is handled correctly. This type of data is called ‘test data’. It differs from real, live data
because it is specifically chosen with a view to testing different possibilities. We distinguish
between different types of test data, listed in Table 30.01.
Type of test data       Explanation
Normal (valid)
Abnormal (erroneous)
Boundary (extreme)      Data values that are at a boundary or an extreme end of the range of
                        normal data; test data should include values just within the boundary
                        (that is, valid data) and just outside the boundary (that is, invalid data)
Look at the Pyramid Problem (code shown in Section 14.07). This is a simple program,
but we can use it to illustrate how to choose test data. There are just two user inputs: the
number of symbols that make up the base and the symbol that is to be used to construct
the pyramid. Let’s consider just the test data for the number of symbols (Table 30.02).
Type of test data: Normal (valid)
Example test values:
Explanation: Any odd positive integer would be suitable as test data. However, it should be
bigger than 1 to check that the pyramid is correctly formed. More than one different value to
test would be a good idea.

Type of test data: Abnormal (erroneous)
Example test values: -7, 7.5, ‘*’
Explanation: Any number that is not a positive odd integer. This will require several tests to
ensure that the following types of data are not accepted:
• negative integer (for example, -7)
• even integer
• real number (for example, 7.5)
• non-numeric input (for example, ‘*’)
You should not take shortcuts and choose one negative even integer or one negative
real number and think you can test two things at the same time. You will not know
whether the test fails for just one reason or both.

Type of test data: Boundary (extreme)
Example test values: 81
Explanation: What is a boundary value? The smallest possible pyramid is a single symbol. So the
Sometimes choosing test data throws up some interesting questions that need to be
considered when designing the solution:
The output would not look like a pyramid if there is a wrap-around. So the program
really should check how many symbols fit onto one line and not allow the user to input
a number greater than this. If the number of characters across the screen is 80, then 79
would be just within the boundary but 81 would be outside the boundary, and should
not be accepted.
Note that by testing with values within the boundary you are also testing normal data,
albeit at the extreme ends of the normal range.
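The test data choices above can be sketched as a small Python test run. This is a minimal sketch under stated assumptions: is_valid_base is a hypothetical validation function (it is not the Pyramid Problem code from Section 14.07), the input arrives as text typed by the user, and a line is assumed to hold 80 characters as in the text.

```python
SCREEN_WIDTH = 80  # assumed number of characters per line


def is_valid_base(text, screen_width=SCREEN_WIDTH):
    """Return True if text is an odd positive integer that fits on one line."""
    try:
        n = int(text)
    except ValueError:
        return False  # real numbers such as "7.5" and non-numeric input
    return n > 0 and n % 2 == 1 and n <= screen_width


# Each entry: (type of test data, input value, expected result)
test_data = [
    ("normal", "7", True),
    ("normal", "21", True),
    ("abnormal: negative integer", "-7", False),
    ("abnormal: even integer", "8", False),
    ("abnormal: real number", "7.5", False),
    ("abnormal: non-numeric input", "*", False),
    ("boundary: smallest pyramid", "1", True),
    ("boundary: just below the smallest pyramid", "-1", False),
    ("boundary: just within the line width", "79", True),
    ("boundary: just beyond the line width", "81", False),
]

for description, value, expected in test_data:
    actual = is_valid_base(value)
    outcome = "pass" if actual == expected else "FAIL"
    print(f"{outcome}: {description} ({value!r})")
```

Each abnormal value breaks exactly one rule, so a failing test points to a single cause.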
The best way to write a program that works correctly is to prevent errors in the first place.
How can we minimise the errors in a program? A major cause of errors is poor requirements
analysis. When designing a solution it is very important that we understand the problem and
what the user of the system wants or needs. We should use:
If you embark on writing a large program, you may wish to map out the stages and a schedule
for when you should achieve certain milestones. This is especially important if you are working
to a deadline.
Commercial software consists of very large programs that require many people to work on
them. Usually there are programmers to write code designed by senior program designers.
There will be software testers and document writers. If new hardware is required, there will
be engineers and installers. To manage people and resources and schedule activities, a
project manager is usually appointed.
Discussion Point:
The first task of a project management team is to break down the project into individual
activities that need to be completed to produce the final product. These activities will take a
certain amount of time and will need to be done in a certain order. Some activities can only
start when other activities have been completed. This is where scheduling becomes very
important.
Project managers can use various methods to help them. They can make use of the Program
Evaluation and Review Technique (PERT) to establish the critical path. Then they can use a
Gantt chart to schedule activities.
PERT charts
PERT was developed for the US Navy to simplify the planning and scheduling of a large and
complex project. It is a method to analyse the activities required to complete a project,
especially the time required to complete each activity. It also helps to identify the minimum
time needed to complete the project (critical path analysis).
An activity may result in a document, a report or some other building block of the project.
Such a building block is called a deliverable.
KEY TERMS
A software developer is to produce software for a customer. The activities, deliverables and
milestones in Table 30.03 have been identified.
Activity
Description
Weeks to complete
Deliverables
Milestone
Start
Identify requirements
Requirement specification
Produce design
Program design
Test modules
E
Integration testing
Install software
Acceptance testing
Write documentation
Technical documentation
User documentation
Train users
3
Users trained
10
Go live
Finish
11
The project manager produces the PERT chart shown in Figure 30.01 for this project.
• Activities are represented by arrows linking the milestones. The arrows are labelled with
the activity code below the arrow and the duration above the arrow.
• Nodes 1, 2, 3, 5, 6, 7, 8, 10 and 11 are joined by solid arrows. These activities must be
completed in sequence; they are called ‘dependent activities’.
• Activities that must be completed in sequence but that don’t require resources or
completion time are represented by dotted lines and are called ‘dummy activities’. The
dotted line between milestones 3 and 4 indicates that the program modules must be
tested before software installation can begin, but the time required to do the testing is
on another path (path D).
Critical path
The critical path is the longest possible continuous pathway from Start to Finish. It
determines the shortest time required to complete the project. Any time delays along the
critical path will delay the final milestone.
KEY TERMS
Critical path: the longest possible continuous pathway from Start to Finish
In Worked Example 30.02, the critical path is A, B, C, E, F, G, K, L. This means the shortest
possible time to complete the project is 20 weeks.
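A minimal Python sketch of how a critical path could be worked out follows. The activity network below is invented for illustration (it is not the network of Worked Example 30.02), and critical_path is a hypothetical helper that assumes the network has no loops. Each activity is an arrow from one milestone to another with a duration in weeks; a dummy activity would simply have a duration of 0.

```python
# Hypothetical network: (from_milestone, to_milestone, activity_code, weeks)
ACTIVITIES = [
    (1, 2, "A", 3),
    (2, 3, "B", 4),
    (2, 4, "C", 2),
    (3, 5, "D", 5),
    (4, 5, "E", 1),
    (5, 6, "F", 2),
]


def critical_path(activities, start, finish):
    """Return (total_weeks, activity_codes) for the longest path from start to finish."""
    outgoing = {}
    for source, target, code, weeks in activities:
        outgoing.setdefault(source, []).append((target, code, weeks))

    cache = {}

    def longest_from(node):
        # Longest remaining duration from node to finish, with the activities used.
        if node == finish:
            return 0, []
        if node not in cache:
            best = None
            for target, code, weeks in outgoing.get(node, []):
                rest = longest_from(target)
                if rest is None:
                    continue  # this arrow never reaches the finish
                candidate = (weeks + rest[0], [code] + rest[1])
                if best is None or candidate[0] > best[0]:
                    best = candidate
            cache[node] = best
        return cache[node]

    return longest_from(start)


weeks, path = critical_path(ACTIVITIES, 1, 6)
print(path, weeks)  # ['A', 'B', 'D', 'F'] 14
```

In this made-up network the path A, C, E, F takes only 8 weeks, so those activities have spare time, while any delay to A, B, D or F delays the finish.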
Question 30.01
What would be the effect of Activity H taking six weeks instead of the original four weeks?
Explain.
Gantt charts
Following on from Worked Example 30.02, the Gantt chart for the project is shown in
Figure 30.02.
[Figure 30.02: Gantt chart with one row per activity (Identify requirements, Produce design, Test modules, Integration testing, Install software, Acceptance testing, Write documentation, Train users, Go live) plotted against a horizontal axis labelled Week Number]
• The horizontal axis represents time. In this example the schedule is worked out in
weekly steps. This could be done on a daily or monthly basis, depending on the overall
length of the project.
• Individual activities are shown as horizontal bars, one activity per row.
• Activities can overlap. In this example, module testing can begin before all the program
code has been written. The documentation can be started before all the testing has
been completed.
• Some activities can only begin when others have been completed. In this example,
integration testing can only start after all modules have been successfully tested.
Software can only be installed after integration testing has been successfully completed.
If any activities take longer than planned, the chart may need to be modified to represent the
revised schedule. For example, if serious problems are encountered during acceptance testing,
further design, program coding, module testing and integration testing may be required.
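A Gantt chart can also be sketched as text, which makes it easy to redraw after a schedule change. The schedule below is hypothetical (it is not the schedule of Figure 30.02) and gantt_rows is a made-up helper; each activity is given as a name, a starting week (counting from 1) and a duration in weeks.

```python
def gantt_rows(schedule):
    """Return one text row per activity; '#' marks a week in which it is active."""
    total_weeks = max(start + duration - 1 for _, start, duration in schedule)
    label_width = max(len(name) for name, _, _ in schedule)
    rows = []
    for name, start, duration in schedule:
        before = "." * (start - 1)
        active = "#" * duration
        after = "." * (total_weeks - (start - 1) - duration)
        rows.append(f"{name:<{label_width}} |{before}{active}{after}|")
    return rows


# Hypothetical schedule: (activity, starting week, duration in weeks)
schedule = [
    ("Write code", 1, 4),
    ("Test modules", 3, 3),
    ("Integration testing", 6, 2),
]

for row in gantt_rows(schedule):
    print(row)
```

In this sketch module testing overlaps with coding, while integration testing only starts in the week after module testing ends. Changing a start week or duration and running the code again gives the revised chart.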
Question 30.02
Redraw the Gantt chart from Figure 30.02 to show the position if the acceptance testing
failed and an extra week of design, two extra weeks of coding, one week of module testing
and a further week of integration testing are required before the software can be re-installed.
Exam-style Questions
1 A procedure to output a row in a tally chart has been written using pseudocode:
OUTPUT('/')
ENDIF
ENDFOR
ENDIF
Design suitable test data that will test the procedure adequately. Justify your choices in each
case. [9]
2 A business approaches a software house for a bespoke system. The systems analyst for this
project has drawn up the following outline activities:
Activity
Description
Weeks to complete
dependent on activity
Identify requirements
Programming
10
Order hardware
Module testing
Technical documentation
Install hardware
E
P
Integration testing
Alpha testing
Acceptance testing
User documentation
Train users
Go live
1
K, L, P
a i Complete the PERT chart, showing all activities and durations. [8]
ii Use the PERT chart to work out the critical path. [3]
b i Draw a Gantt chart from the information in the above table. [8]
ii Using the information from the Gantt chart, calculate the time required for the project from
start to completion. [3]
Glossary
Variable: a storage location for a data value that has an identifier
Word: a small number of bytes handled as a unit by the computer system
Index
abstraction, 318
access restrictions, 92
accumulators, 61
ACM (Association for Computing
Machinery), 100-101
actuator heads, 39
actuators, 311
adaptive maintenance, 244
address bus, 63
addressing modes, 71-72
ADTs (abstract data types), 322-334
aggregation (containment), 386-390
algebraic expressions, 297-300
algorithms, 126-130,319-322
design methods, 338-344
dry-running, 240-43
scheduling, 292
alpha testing, 422
ALU (arithmetic and logic unit), 60
American Standard Code for
Information Interchange (ASCII) code,
9-10
algorithms, 128-29,130-33
assembly languages, 397
constants and variables, 181-82
Association for Computing Machinery
(ACM), 100-101
broadcasts, 260
buffers, 25
cables, 19,20-21
integers, 5-8
text, 9-10
coding stage, 229
collapsed code blocks, 233-34
collisions, 332
control bus, 64
copyright, 104-5
CSMA/CD, 266
databases, 111-122
dates, 187,205-6
detection, 81,421
prevention, 424
Ethernet protocol, 266-67
ethics, 100-107
exception handling, 364-65
executable code, 84
execution stage, 230
expanded code blocks, 233-34
exponent, 251-53
expressions, algebraic, 297-300
exrad (exponent), 251-53
bitmaps, 12-13
compression, 15-16,83
sound, 14
'flash' memory, 41
flip-flop circuits, 274-75
floating-point representation, 251-52,
254-56
flowcharts, 128
for loops, 139,194-96
foreign keys, 114
formatting utility programs, 82
fragmentation, disk, 40
freeware, 106
frequency, sound, 13-14
front-end analysis, 294-96
FSMs (finite state machines), 341
FTP (File Transfer Protocol), 265-66
full adder circuits, 274
functions, 162-63,167,215-18
passing parameters to, 218-19
geostationary-Earth-orbit (GEO)
satellites, 22
getters, 370
gibi prefix, 13
guided media, 20
Huffman coding, 15
HyperText Markup Language (HTML),
31-32
algorithms, 128-29,132-33
IPv6 addressing, 30
logic propositions, 50
loops
algorithms, 128-29,138-144
assembly languages, 398
high-level languages, 194-98
lossless compression, 15
lossy compression, 15
loudspeakers, 47-48
low-Earth-orbit (LEO) satellites, 22
low-level programming languages,
357, 395-401
maintenance, 243-44
malware, 307-8
mantissa, 251-54
files, 15-16,25
streaming, 24-25
multi-programming, 288
Nyquist's theorem, 14
object code, 84
object-oriented design, 370
object-oriented programming (OOP),
357,369-390
objects, 370-71
output statements
algorithms, 128-29,132-33
overflow flag, 62
ownership, 104-5
P2P (peer-to-peer) file sharing, 105,267
packet switching, 261
parallel processing, 285
parameters, 167,218-223
parity bits, 94
parity block checks, 94-95
Pascal/Delphi Console Mode, 179-180
passwords, 90,308
pattern recognition, 319
PCBs (process control blocks), 290-91
PCT (projective capacitive touch)
screens, 44
pharming, 308
phishing, 308
phosphors, 43
PHP, 33,122
precision, 253-54
pre-condition loops, 197-98
predicates (Prolog), 407,410
prettyprinting, 230-32
primary keys, 114
printers, 45-47
Prolog, 406-16
Python, 177-78
radio transmission, 20
read heads, 39
read-write heads, 39
records, 113,249,357-59,369
recursion
routers, 22-23,262,264
run-length encoding, 15
satellites, 22
scanners, 46
algorithms, 128-29,136-37
assembly languages, 398
high-level languages, 188-194
semantic analysis, 295-96
sensors, 311
sequences, 128-29,132-33
sequential circuits, 274-75
sequential file processing, 359-361
sequential files, 250
serial files, 249-250
servers, 23
slices, 203
swapping values, 131-32
SWI-Prolog, 406
switches, 262
threads, 291
unguided media, 20
unicasts, 260
Unicode, 10
validation, data, 93
value, by, 220-21
values, 130-32
variables
in Prolog, 407,408-9
programming, 180-81,182
storage locations, 130-32
VB.NET (Visual Basic Console Mode),
178-79
vector graphics, 11
verification, data, 93
video, representation, 14-15
web servers, 24
white-box testing, 236-37
wide area networks (WANs), 21
WiFi hotspots, 22
WiFi networks, 267-68
WiMAX protocol, 268
wireless networks, 267-68
wireless transmission, 20-21
words (data), 64