Data Processing and Information

Uploaded by

ocom.aturia

1.01 Data processing and information

Thursday 23 January 2025


Data
• Data is raw numbers, letters, symbols, sounds or
images with no meaning.
• The data P952BR could have
several meanings. It could possibly be:
– a product code
– a postal/ZIP code
– a car registration number
• As it is not known what the data means, it is meaningless.
Information
• When data items are given context and meaning,
they become information.
• A person reading the information will then know
what it means.
• Data is given context by identifying what sort of
data it is.
• This still does not make it information but it is a
step on the way to it becoming information.
• Information is data with context and meaning
• For the data to become information, it needs to be given meaning.
• Information is useful because it means something.
Knowledge
• Knowledge: information to which human experience has been applied.
• Knowledge is basically what a person knows. This is known as their knowledge
base.
• A knowledge base gets larger over time as a person gains experience or
learning.
• Knowledge requires a person to understand what information is, based on
their experience and knowledge base.
• Crown Prince Salman was appointed Crown Prince of Saudi Arabia on 18 June
2012. This is information.
• Knowing that he had been Crown Prince for 2 years on 1 August 2014 is
knowledge. Knowledge allows data to be interpreted.
• In computing terms, knowledge is also what a machine knows through the
use of a knowledge base consisting of rules and facts, often found in
knowledge-based systems, modelling and simulation software.
1.02 Sources of data
Tuesday, August 17, 2021
Key Terms

• Static data: data that does not normally change


• Dynamic data: data that changes automatically without user
intervention
• Direct data source: data that is collected for the purpose for
which it will be used
• Indirect data source: data that was collected for a different
purpose (secondary source)
Static data
• Static means ‘still’.
• It is data that does not normally change.
• Static data is either fixed or has to be
changed manually by editing a document.
Dynamic data

• Dynamic means ‘moving’.


• It is data that updates as a result of the source data
changing.
• Dynamic data is updated automatically without user
intervention.
Static information sources compared with dynamic information sources

Static information source | Dynamic information source
The information does not change on a regular basis. | Information is updated automatically when the original data changes.
The information can go out of date quickly because it is not designed to be changed on a regular basis. | It is most likely to be up to date as it changes automatically based on the source data.
The information can be viewed offline because live data is not required. | An internet or network connection to the source data is required, which can be costly and can also be slow in remote areas.
It is more likely to be accurate because time will have been taken to check the information being published, as it will be available for a long period of time. | The data may have been produced very quickly and so may contain errors.
Direct data source
• Data collected from a direct data source (primary
source) must be used for the same purpose for
which it was collected.
• It is often the case that the data will have been
collected or requested by the person who intends
to use the data.
• The data must not already exist for another
purpose though. When collecting the data, the
person collecting should know for what purpose
they intend to use the data.
Indirect data source
• Data collected from an indirect data source
(secondary source) already existed for
another purpose.
• Although it can still be collected by the
person who intends to use it, it was often
collected by a different person or
organisation.
Indirect data source
Which of the following are direct data sources and which
are indirect data sources?
Advantages and disadvantages of gathering data from direct and indirect data sources

Direct data source | Indirect data source
The data will be relevant because what is needed has been collected. | Additional data that is not required will exist that may take time to sort through, and some data that is required may not exist.
The original source is known and so can be trusted. | The original source may not be known and so it can’t be assumed that it is reliable.
It can take a long time to gather original data rather than use data that already exists. | The data is immediately available.
A large sample of statistical data can be difficult to collect for one-off purposes. | If statistical analysis is required, then there are more likely to be large samples available.
The data is likely to be up to date because it has been collected recently. | Data may be out of date because it was collected at a different time.
Bias can be eliminated by asking specific questions. | Original data may be biased due to its source.
The data can be collected and presented in the format required. | The data is unlikely to be in the format required, which may make extracting the data difficult.
1.03 Quality of information
• The quality of information is determined by a number of
attributes.
Accuracy
• Information that is inaccurate is clearly not good
enough.
• Data must be accurate in order to be considered
of good quality.
• Imagine being told that you need to check in at
the airport 45 minutes before the flight leaves, so
you turn up at 18:10 for a 19:05 flight only to find
that you were actually supposed to check in one
hour early.
Relevance
• Information must be relevant to its
purpose.
• Having additional information that is not
required means that the user has to search
through the data to find what is actually
required.
Age
• Information must be up to date in order to
be useful.
• Old information is likely to be out of date
and therefore no longer useful.
• When using indirect data sources, always
check when the information was produced.
Level of detail
• There needs to be the right amount of
information for it to be good quality.
• It’s possible to have either too little or too
much information provided.
• If there is too much information, then it
can be difficult to find the exact
information required.
• If there is not enough information, then it
is not possible to use it correctly.
Completeness
• All information that is required must be
provided in order for it to be of good
quality.
• Not having all the information required
means it cannot be used properly.
Task

• Look at the invitation below.
• Describe how accuracy, relevance, level of detail and completeness affect
the quality of information in the invitation.
Encryption
• One specific type of encoding is encryption.
• This is when data is scrambled so that it cannot be
understood.
• The purpose of encryption is to make the data difficult or
impossible to read if it is accessed by an unauthorised user.
• Data can be encrypted when it is stored on disks or other
storage media, or it can be encrypted when it is sent across
a network such as a local area network or the internet.
• Accessing encrypted data legitimately is known as
decryption.
Caesar cipher
• A cipher is a secret way of writing.
• In other words it is a code.
• Ciphers are used to convert a message into an encrypted message.
• It is a special type of algorithm which defines the set of rules to follow to
encrypt a message.
• The Caesar cipher is named after the Roman general and statesman Julius Caesar,
who is said to have used it to communicate in secret with his generals.
• The Caesar cipher is sometimes known as a shift cipher because it selects
replacement letters by shifting along the alphabet.
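The shifting described above can be sketched in a few lines of Python. This is only an illustration and not a secure cipher; the function name caesar_shift and the shift of 3 are assumptions chosen for the example.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Replace each letter with the one `shift` places along the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            # The modulo wraps around the end of the 26-letter alphabet.
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # spaces, digits and punctuation are left unchanged
    return "".join(result)


encrypted = caesar_shift("HELLO", 3)      # "KHOOR"
decrypted = caesar_shift(encrypted, -3)   # shifting back recovers "HELLO"
```

Shifting by the negative of the original shift reverses the encryption, so the shift value acts as the key.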
Symmetric encryption
• This is the oldest method of encryption.
• It requires both the sender and recipient to possess the
secret encryption and decryption key.
• With symmetric encryption, the secret key needs to be
sent to the recipient.
• This could be done at a separate time, but it still has to be
transmitted whether by post or over the internet and it
could be intercepted.
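The idea that one shared secret key both encrypts and decrypts can be shown with a toy example. The sketch below uses a simple repeating XOR, which is not secure and is not a method named in these notes; the key and message are made up for illustration.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key (toy example only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


shared_key = b"secret"   # hypothetical key that sender and recipient both hold
ciphertext = xor_cipher(b"Meet at noon", shared_key)   # sender encrypts
plaintext = xor_cipher(ciphertext, shared_key)         # recipient decrypts with the same key
```

Anybody who intercepts shared_key can run the same function, which is the weakness described above.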
Asymmetric encryption
• Asymmetric encryption is also known as public-key cryptography.
• Asymmetric encryption overcomes the problem of symmetric encryption keys
being intercepted by using a pair of keys.
• This will include a public key which is available to anybody wanting to send
data, and a private key that is known only to the recipient.
• The key is the secret value that the algorithm uses to encrypt and decrypt the data.
• The process works like this:
• In this example, Tomasz sends
a message to Helene.
• Tomasz encrypts the message
using Helene’s public key.
• Helene receives the encrypted
message and decrypts it
using her private key.
• This method requires a lot
more processing than
symmetric encryption and so it
takes longer to decrypt the
data.
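The Tomasz and Helene exchange can be imitated with a toy key pair. The sketch below assumes an RSA-style scheme with very small numbers (the notes do not name a specific algorithm), and the message is a single integer; real keys are far larger and real systems use tested libraries.

```python
# Toy key pair built from two small primes (illustrative values only).
P, Q = 61, 53
N = P * Q                    # 3233, part of both keys
PHI = (P - 1) * (Q - 1)      # 3120
E = 17                       # public exponent
D = 2753                     # private exponent, chosen so that (E * D) % PHI == 1

helene_public_key = (E, N)   # available to anybody wanting to send data
helene_private_key = (D, N)  # known only to Helene


def encrypt(message: int, public_key: tuple) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


sent = encrypt(65, helene_public_key)         # Tomasz uses Helene's public key
received = decrypt(sent, helene_private_key)  # Helene uses her private key
```

Knowing the public key is not enough to reverse the encryption; only the matching private exponent recovers the message.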
Digital Certificates
• In order to find a public key, digital certificates are required which identify the user or
server and provide the public key.
• A digital certificate is unique to each user or server.
• A digital certificate usually includes:
– organisation name
– organisation that issued the certificate
– user’s email address
– user’s country
– user’s public key.
• When encrypted data is required by a recipient, the computer will request the digital
certificate from the sender.
• The public key can be found within the digital certificate.
Asymmetric encryption cont’d
• Asymmetric encryption is used for Secure Sockets
Layer (SSL) which is the security method used for
secure websites.
• Transport Layer Security (TLS) has superseded
SSL but they are both often referred to as SSL.
• Once SSL has established an authenticated
session, the client and server will create
symmetric keys for faster secure communication.
Hard disk
• Disk encryption will encrypt every single bit of data stored on a disk.
• This is different to encrypting single files.
• In order to access any file on the disk, the encryption key will be required.
• This type of encryption is not limited to disks and can be used on other storage media
such as backup tapes and Universal Serial Bus (USB) flash memory.
• It is particularly important that USB flash memory and backup tapes are encrypted
because these are portable storage media and so are susceptible to being lost or stolen.
• If the whole medium is encrypted, then anybody trying to access the data will not be
able to understand it.
• The data is usually accessed by entering a password or using a fingerprint to unlock the
encryption.
HTTPS
• Normal web pages that are not encrypted are fetched and
transmitted using Hypertext Transfer Protocol (HTTP).
• Anybody who intercepts web pages or data being sent
over HTTP would be able to read the contents of the web
page or the data.
• This is particularly a problem when sending sensitive data,
such as credit card information or usernames and
passwords.
HTTPS cont’d
• Hypertext Transfer Protocol Secure (HTTPS) is the encryption standard used
for secure web pages.
• It uses Secure Sockets Layer (SSL) or Transport Layer Security (TLS) to encrypt
and decrypt pages and information sent and received by web users.
• This is the encryption method that is used by banks when a user logs onto
online banking.
• A secure web page can be spotted by its address beginning with https:// and in
addition some browsers display a small padlock.
HTTPS cont’d
• When a browser requests a secure page, it will check the
digital certificate to ensure that it is trusted, valid and that
the certificate is related to the site from which it is coming.
• The browser then uses the public key to encrypt a new
symmetric key that is sent to the web server.
• The browser and web server can then communicate using
a symmetric encryption key, which is much faster than
asymmetric encryption.
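The key exchange above can be sketched by combining a toy public/private key pair with a toy symmetric cipher. Everything here is a simplified assumption for illustration (small RSA-style numbers, a repeating XOR, a made-up request); it is not how TLS is actually implemented.

```python
import secrets

# Toy server key pair (illustrative small numbers, not real key sizes).
N, E, D = 3233, 17, 2753
server_public_key = (E, N)    # would be published in the digital certificate
server_private_key = (D, N)   # kept secret by the web server


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same key encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# 1. The browser creates a new symmetric key and encrypts it with the public key.
session_key = secrets.randbelow(N - 2) + 2          # random integer from 2 to N - 1
encrypted_session_key = pow(session_key, E, N)

# 2. The web server recovers the symmetric key with its private key.
recovered_key = pow(encrypted_session_key, D, N)

# 3. Both sides now use the same symmetric key for the rest of the session.
request = xor_cipher(b"GET /account", session_key.to_bytes(2, "big"))
received = xor_cipher(request, recovered_key.to_bytes(2, "big"))
```

The slower asymmetric step is used once to share the key, and the faster symmetric cipher handles everything after that.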
Email
• Email encryption uses asymmetric encryption.
• This means that recipients of emails must have the private
key that matches the public key used to encrypt the
original email.
• In order for this to work, both the sender and recipient
need to send each other a digitally signed message that
will add the person’s digital certificate to the contact for
that person.
• Encrypting an email will also encrypt any attachments.
How encryption protects data

• Encryption only scrambles the data so that if it is found,
it cannot be understood.
• It does not stop the data from being intercepted, stolen
or lost.
• However, with strong 256-bit AES encryption it is
virtually impossible for somebody to decrypt the data
and so it is effectively protected from prying eyes.
1.05 Checking the accuracy of data
Validation
• Validation takes place when data is input into a computer system.
• The purpose is to ensure the data is sensible and conforms to defined rules.
• A railway season ticket will have an expiry date.
• The season ticket is valid until it expires.
• Once it expires it is invalid.
• The rule here is that the date the season ticket is used must be before its
expiry date.
• When data is validated, if it conforms to the rules then it will be accepted.
• If it does not conform to the rules, then it will be rejected and an error
message will be presented.
• Validation does not ensure that data is correct.
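The season ticket rule can be written as a single comparison. This is a minimal sketch; the function name and the dates are hypothetical.

```python
from datetime import date


def ticket_is_valid(date_used: date, expiry_date: date) -> bool:
    """Rule from the text: the date of use must be before the expiry date."""
    return date_used < expiry_date


expiry = date(2025, 6, 30)                        # hypothetical expiry date
accepted = ticket_is_valid(date(2025, 6, 1), expiry)
rejected = ticket_is_valid(date(2025, 7, 1), expiry)
```

The check only confirms that the rule is met. It cannot tell whether the date entered is the date the ticket was really used.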
Presence check
• A presence check is used to ensure that data is entered.
• If data is entered, then it is accepted.
• If data is not entered, then the user will be presented with
an error message asking them to enter data.
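A presence check might look like the sketch below. Treating an entry made only of spaces as missing is an assumption, and the field name and error wording are invented for the example.

```python
def presence_check(value: str) -> bool:
    """Return True if something was entered (spaces alone do not count)."""
    return value.strip() != ""


def validate_surname(value: str) -> str:
    if presence_check(value):
        return "Accepted"
    return "Error: please enter a surname"
```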
Range check
• A range check ensures that data is within a defined range.
• A limit check has a single boundary.
• This could be the highest possible value or the lowest possible value.
• A range check includes two boundaries, which would be the lower boundary and the
upper boundary.
• The following symbols are used when comparing with a boundary:
> greater than
< less than
>= greater than or equal to
<= less than or equal to
Range check cont’d
• Data that is within the boundaries is
valid.
• Data that is outside the boundaries is
invalid.
• Data that is valid and within the
boundaries is not necessarily correct.
• A grade of C could be entered when a
grade A should have been entered.
• C is valid but incorrect.
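The difference between a limit check (one boundary) and a range check (two boundaries) can be sketched as follows. The exam mark boundaries of 0 and 100 are assumed for illustration.

```python
def limit_check(value: float, upper: float) -> bool:
    """Single boundary: the value must be less than or equal to the upper limit."""
    return value <= upper


def range_check(value: float, lower: float, upper: float) -> bool:
    """Two boundaries: the value must lie between lower and upper, inclusive."""
    return lower <= value <= upper


mark_ok = range_check(65, 0, 100)        # within the boundaries, so valid
mark_too_high = range_check(105, 0, 100) # outside the boundaries, so invalid
```

A mark of 65 passes even if the student actually scored 85, which is the same point as the grade C example: valid is not the same as correct.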
Type Check
• A type check ensures that data must be of a defined data type.
• Data that is of the correct data type is valid.
• Data that is valid and of the correct data type is not necessarily
correct.
• A date of birth of 28/12/2087 could be entered.
• The date is valid because it is a date data type, but it is clearly incorrect.
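One way to sketch a type check for dates is to try converting the input and see whether it succeeds. The DD/MM/YYYY format is assumed from the example above.

```python
from datetime import datetime


def is_date(text: str) -> bool:
    """Return True if the text can be read as a DD/MM/YYYY date."""
    try:
        datetime.strptime(text, "%d/%m/%Y")
        return True
    except ValueError:
        return False


future_birthday = is_date("28/12/2087")   # a real date, so it passes the type check
not_a_date = is_date("hello")             # cannot be converted, so it fails
```

The type check accepts 28/12/2087 because it is a date. Spotting that it cannot be a date of birth would need a further check, such as a range check.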
Length check
• A length check ensures data is of a defined length or
within a range of lengths.
• Data that is of the allowed length is not necessarily correct.
• For example, a valid date might require six digits.
• A date of 2ndFeb would be a valid length because it contains
six characters, but it would not be correct because it does
not follow the required format.
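A length check only counts characters, as this short sketch shows. The function name and the six-character rule follow the example above and are otherwise assumptions.

```python
def length_check(value: str, min_length: int, max_length: int) -> bool:
    """Return True if the number of characters is within the allowed range."""
    return min_length <= len(value) <= max_length


six_digits = length_check("020225", 6, 6)   # six characters, valid
wrong_format = length_check("2ndFeb", 6, 6) # also six characters, so also valid
too_short = length_check("2225", 6, 6)      # four characters, invalid
```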
Format check
• A format check ensures data matches a
defined format.
• It is sometimes known as a picture check
and the data has to follow a pattern.
• Data that matches the pattern is valid.
• Data that is valid and of the defined
format is not necessarily correct.
• An email address of fdc@jb meets the
rules above but is clearly incorrect.
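A format check is often written as a pattern. The sketch below assumes a very loose email rule (some characters, an @ symbol, then some more characters), which is a guess at the kind of rule intended and not the rule from the original slide.

```python
import re

# Assumed pattern: one or more characters, "@", one or more characters.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+")


def format_check(value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


matches_pattern = format_check("fdc@jb")   # follows the pattern, so it is valid
no_at_symbol = format_check("fdcjb")       # does not follow the pattern
```

Under this pattern fdc@jb is accepted even though it is not a usable address, which illustrates that matching a format does not make data correct.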
Lookup check
• A lookup check tests to see if data exists in a list.
• It is similar to referential integrity in Chapter 9, but uses a
list defined within the validation rule.
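A lookup check can be sketched as a membership test against a list held in the rule itself. The list of grades is a hypothetical example.

```python
ALLOWED_GRADES = {"A", "B", "C", "D", "E"}   # hypothetical list defined in the rule


def lookup_check(value: str) -> bool:
    """Return True if the value exists in the allowed list."""
    return value in ALLOWED_GRADES
```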
Consistency check
• A consistency check compares data in one field with data in another field that
already exists within a record, to see whether both are consistent with each
other.
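As an illustration, a consistency check on a booking record might compare two date fields. The record layout and field names below are assumptions made for the example.

```python
from datetime import date


def consistency_check(record: dict) -> bool:
    """The check-out date must be later than the check-in date in the same record."""
    return record["check_in"] < record["check_out"]


consistent = {"check_in": date(2025, 3, 1), "check_out": date(2025, 3, 4)}
inconsistent = {"check_in": date(2025, 3, 4), "check_out": date(2025, 3, 1)}
```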
Check digit
• A check digit is a number (or letter) that is added to the
end of an identification number being input.
• It is a form of redundancy check because the check digit is
redundant (not needed for the identification number, but
just used for validation).
• When the identification number is first created, an
algorithm (a series of calculations) is performed on it to
generate a check digit.
• When the identification number is input, the same
algorithm is performed on it.
• The result of the algorithm should match the check digit.
• If it matches, then the data is valid.
• If it does not match then the data is invalid.
Check digit cont’d
• There are a variety of calculations that can be
performed to determine what the check digit
should be.
• The important thing is that the same calculation
used to create the check digit in the first place
should be used to confirm the check digit when
the identification number is input.
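One widely used calculation is the ISBN-13 check digit, sketched below as an example of the general idea. It is not necessarily the calculation shown on the original slides. The first twelve digits are weighted 1 and 3 alternately, and the check digit brings the total up to a multiple of 10.

```python
def isbn13_check_digit(first_twelve: str) -> int:
    """Calculate the check digit from the first twelve digits of an ISBN-13."""
    total = sum(
        int(digit) * (1 if position % 2 == 0 else 3)
        for position, digit in enumerate(first_twelve)
    )
    return (10 - total % 10) % 10


def isbn13_is_valid(isbn: str) -> bool:
    """Repeat the same calculation on input and compare with the final digit."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    return isbn13_check_digit(isbn[:12]) == int(isbn[12])


check_digit = isbn13_check_digit("978030640615")   # 7
```

The same function is used when the number is created and when it is input, which is the important point made above.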
Verification

• Verification is the process of checking that the
data entered into the computer system matches
the original source.
Visual checking
• A method of verification can be for the user to visually
check that the data entered matches the original source.
• This can be done by reading the data displayed on screen and
comparing it with the original data.
• If the data matches, then it has passed the verification process.
• If it does not match, then it has failed the verification process and
needs to be re-entered.
• Visual checking does not ensure that the data entered is correct.
• If the original data is wrong, then the verification process may still
pass.
• For example, if the intended data is ABCD but ABC is on
the source document, then ABC will be entered into the
computer and verified, but it should have been ABCD in the
first place.
Double data entry
• Another method of verification is to input data into the
computer system twice.
• The two items of data are compared by the computer
system and if they match, then they are verified.
• If there are any differences, then one of the inputs
must have been incorrect.
• It is still possible to pass double entry verification and
for the data to be incorrect.
• If the data is entered incorrectly twice, then the two
values may match.
• For example, if the Caps Lock key is left on by mistake, then
both entries would match.
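Double data entry comes down to comparing the two inputs, as in this minimal sketch. The function name and sample values are assumptions.

```python
def double_entry_verify(first_entry: str, second_entry: str) -> bool:
    """Return True if the two entries are identical."""
    return first_entry == second_entry


typo_caught = double_entry_verify("abcd", "abdc")    # entries differ, so not verified
caps_missed = double_entry_verify("ABCD", "ABCD")    # same mistake twice still matches
```

The second case passes even if the intended value was abcd, because the comparison only shows that both entries are the same.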
The need for both validation and verification

• As you will have seen in the two sections above, it
is possible to enter valid data that is still incorrect.
• It is also possible to verify incorrect data.
• By using both validation and verification, the
chances of entering incorrect data are reduced.
• If data that is incorrect passes a validation check,
then the verification check is likely to spot the
error.
Proofreading
• Proofreading is the process of checking information.
• For example, when this book was written it was checked for
spelling errors, grammar errors, formatting and accuracy.
• Proofreading can take place for a document or when data is
input.
• When proofreading a document, it is best to have a proofreader
who is different from the original author of the document, as they
will be able to check the work objectively and identify errors.
• The original author can also proofread their own document,
but they may not notice some of their own errors.
• When data is input, it is usually proofread by the person
inputting the data.
Summary
• Information has context and meaning so a person knows what it means.
• The quality of information can be affected by the accuracy, relevance, age, level of
detail and completeness of the information.
• Proofreading is the process of checking information.
• Data are raw numbers, letters, symbols, sounds or images without meaning.
• Knowledge allows data to be interpreted and is based on rules and facts.
• Static data does not normally change.
• Dynamic data updates as a result of the source data changing.
• Data collected from a direct data source (primary source) must be used for the same
purpose for which it was collected.
• Data collected from an indirect source (secondary source) already existed for another
purpose.
Summary
• Coding is the process of representing data by assigning
a code to it for classification or identification.
• Encoding is the process of storing data in a specific format.
• Encryption is when data is scrambled so that it cannot be
understood.
• Validation ensures that data is sensible and allowed.
• Validation checks include a presence check, range check,
type check, length check, format check and check digit.
• Verification is the process of checking data has been
transferred correctly.
• Verification can be done visually or by double data entry.
END
