A.3 Further Aspects of Database Management
Contents
A.3.1 Explain the role of a database administrator.
A.3.2 Explain how end-users can interact with a database.
A.3.3 Describe different methods of database recovery.
A.3.4 Outline how integrated database systems function.
A.3.5 Outline the use of databases in areas such as stock control, police records, health records, employee data.
A.3.6 Suggest methods to ensure the privacy of the personal data and the responsibility of those holding personal data not to sell or divulge it in any way.
A.3.7 Discuss the need for some databases to be open to interrogation by other parties (police, government, etc.).
A.3.8 Explain the difference between data matching and data mining.
A.3.1 Explain the role of a database administrator.
Typically there are three types of users for a DBMS. They are:
1. The End User, who uses the application. Ultimately, this is the user who actually puts the
data in the system to use in business. This user need not know anything about the
organization of data at the physical level. She also need not be aware of the complete data in
the system; she needs access to and knowledge of only the data she is using.
2. The Application Programmer, who develops the application programs. She has more
knowledge about the data and its structure, since she has to manipulate the data using her
programs. She, too, need not have access to and knowledge of the complete data in the system.
3. The Database Administrator (DBA), who is like the super-user of the system.
Responsibilities / Functions of the Database Administrator:
The Database Administrator (DBA) is the person or group responsible for
implementing and managing the DBMS in an organization.
The DBA's job requires a high degree of technical expertise and the
ability to understand and interpret management requirements at a senior level.
In practice, the DBA may be a team of people rather than a single person.
The role of the DBA is very important and is defined by the
following functions.
● Defining the Schema
The DBA defines the schema, which contains the structure of the data in the
application. The DBA determines what data needs to be present in the system
and how this data is to be represented and organized.
• Makes decisions concerning the content of the database:
It is the DBA's job to decide exactly what information is to be held in the database;
in other words, to identify the entities of interest to the enterprise and the
information to be recorded about those entities.
● Liaising with Users
The DBA needs to interact continuously with the users to understand the data in
the system and its use.
• Provides support to users:
It is the responsibility of the DBA to provide support to the users, to ensure that the data
they require is available, and to write the necessary external schemas (using the
appropriate external data definition language).
In addition, the mapping between any given external schema and the conceptual schema
must also be specified.
● Defining Security & Integrity Checks
The DBA determines the access restrictions to be enforced and defines security
checks accordingly. Data integrity checks are also defined by the DBA.
The DBA is responsible for providing authorization and authentication checks so that
no malicious user can access the database, and it must remain protected. The DBA must also
ensure the integrity of the database.
● Defining Backup / Recovery Procedures
The DBA also defines procedures for backup and recovery. Defining backup
procedures includes specifying what data is backed up, the periodicity of taking
backups and also the medium and storage place for the backup data.
In the event of damage to any portion of the database (caused by human error, say, or a
failure in the hardware or supporting operating system), it is essential to be able to repair
the data concerned with a minimum of delay and with as little effect as possible on the
rest of the system.
The DBA must define and implement an appropriate recovery strategy to recover the
database from all types of failures.
• Plans storage structures and access strategies:
The DBA must also decide how the data is to be represented in the database, and must
specify the representation by writing the storage structure definition (using the internal
data definition language).
In addition, the associated mapping between the storage structure definition and the
conceptual schema must also be specified.
● Monitoring Performance
The DBA has to continuously monitor the performance of the queries and take
measures to optimize all the queries in the application.
The DBA is responsible for organizing the system to deliver the performance that is
"best for the enterprise," and for making the appropriate adjustments as requirements
change.
A.3.2 Explain how end-users can interact with a database.
(i) Structured Query Language (SQL)
Understanding SQL
Syntax and Semantics: SQL commands have a defined syntax and
semantic rules that must be followed for the database management
system to understand and execute the commands.
Key SQL Operations
Creating Data Structures: SQL is used to define data structures,
including the creation of tables and the establishment of relationships
between them using `CREATE TABLE` and `ALTER
TABLE` statements.
CRUD Operations: This acronym refers to the four basic types of
SQL operations: Create (`INSERT`), Read (`SELECT`), Update
(`UPDATE`), and Delete (`DELETE`).
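As a minimal sketch of the four CRUD operations (the `students` table and its columns are hypothetical, invented for illustration):

```sql
-- Create: add a new row
INSERT INTO students (student_id, name, year_group)
VALUES (1001, 'Asha Patel', 12);

-- Read: retrieve only the rows and columns requested
SELECT student_id, name
FROM students
WHERE year_group = 12;

-- Update: modify existing rows that match a condition
UPDATE students
SET year_group = 13
WHERE student_id = 1001;

-- Delete: remove rows that match a condition
DELETE FROM students
WHERE student_id = 1001;
```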
SQL in Practice
Querying Data: Crafting `SELECT` statements is fundamental. It may
involve specifying columns, conditions (`WHERE` clause), and joining
tables to combine records.
Data Modification: This involves changing existing data, adding new data,
or removing data using `INSERT`, `UPDATE`, and `DELETE`.
Transaction Control: SQL includes commands like `BEGIN
TRANSACTION`, `COMMIT`, and `ROLLBACK` to manage transaction
processing, ensuring data integrity.
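For instance, a sketch of a funds transfer executed as a single atomic unit (the `accounts` table is hypothetical, and exact transaction syntax varies slightly between database systems):

```sql
BEGIN TRANSACTION;

-- Both updates must succeed or fail together.
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';

COMMIT;      -- make the changes permanent
-- ROLLBACK; -- would instead undo everything since BEGIN TRANSACTION
```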
Advanced SQL Concepts
Subqueries and Joins: Complex queries may involve nested queries, known
as subqueries, or combining rows from two or more tables based on a
related column between them.
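A brief illustration of both ideas, assuming hypothetical `customers` and `orders` tables related by a `customer_id` column:

```sql
-- Join: combine rows from two tables via their related column.
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;

-- Subquery: find orders larger than the average order value.
SELECT order_id, total
FROM orders
WHERE total > (SELECT AVG(total) FROM orders);
```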
Stored Procedures and Triggers: These are SQL codes that can be saved and
executed as needed. Stored procedures are manually triggered, while triggers
are automatic responses to certain events in the database.
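As an illustrative sketch (MySQL-style syntax; trigger syntax differs between database systems, and the tables here are hypothetical), a trigger that automatically records salary changes:

```sql
CREATE TRIGGER log_salary_change
AFTER UPDATE ON employees
FOR EACH ROW
-- Runs automatically on every UPDATE; a stored procedure, by contrast,
-- would run only when called explicitly, e.g. CALL give_raise(1001).
INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_at)
VALUES (OLD.employee_id, OLD.salary, NEW.salary, NOW());
```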
(ii) Query By Example (QBE)
Query By Example (QBE) is a method of querying databases where the
user provides an example of the data they are looking for, as opposed to
writing a structured query.
Principles of QBE
Template-Filled Queries: The user fills in a grid or form, and the system
generates a query based on this "template".
Graphical Interface: Typically, QBE interfaces are graphical, making them
more approachable than command-line tools.
Using QBE
Fields as Query Parameters: Users can enter conditions into fields, which
correspond to query conditions.
Logical Connectives: Logical operators like 'AND', 'OR', and 'NOT'
can be represented graphically, aiding users in constructing complex
queries.
QBE in Complex Queries
Joining Tables: QBE can handle joins by allowing users to fill out forms for
multiple tables simultaneously.
Aggregation and Grouping: QBE tools may also offer a way to perform
operations like 'SUM', 'COUNT', etc., through a graphical interface.
(iii) Visual Queries
Visual queries provide an intuitive interface for database querying,
ideal for users who may not be familiar with query languages.
Characteristics of Visual Queries
Direct Manipulation Interfaces: These systems often involve dragging and
dropping elements, linking objects, or using menus to construct queries.
Visualisation of Data Structures: They can also visualise tables, fields,
and relationships, helping users understand the database schema.
Creating Visual Queries
Building Blocks: Visual queries are often constructed by combining
"blocks" that represent tables, attributes, or operations.
Immediate Feedback: Many visual query systems provide immediate
visual feedback on the data that matches the current query state, aiding
in iterative query design.
(iv) Natural Language Interfaces
Natural language interfaces (NLIs) allow users to write queries in their
natural language, which the system then translates into executable queries.
Functionality of NLIs
Natural Language Processing (NLP): NLIs use NLP techniques to parse and
understand user input.
Query Translation: The interface translates the natural language input
into a formal query language like SQL.
Advantages and Challenges of NLIs
User-Friendly: NLIs are extremely user-friendly, allowing non-technical
users to query databases without knowing a query language.
Interpretation Issues: The main challenge is accurately interpreting the
user's intent, especially when the input is ambiguous or vague.
Examples and Practical Use
Conversational Queries: Users can ask questions like "Which employees
were hired last year?" and the NLI will process this into the corresponding
database query.
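For example, given "Which employees were hired last year?", an NLI might generate SQL along these lines (the table and column names are hypothetical, and the exact query depends on how the system resolves "last year"):

```sql
-- Assuming "last year" resolves to the calendar year 2024.
SELECT employee_id, name, hire_date
FROM employees
WHERE hire_date >= DATE '2024-01-01'
  AND hire_date <  DATE '2025-01-01';
```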
Clarification and Learning: Good NLIs will ask for clarification when
needed and may learn from user interactions to improve over time.
Practical Activities
Hands-on activities help solidify the theoretical aspects of database
interaction methods.
Activities for SQL
Query Crafting: Building increasingly complex `SELECT` statements with
various clauses (`WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`) to
handle real-world data retrieval scenarios.
Data Manipulation Exercises: Practicing `INSERT`, `UPDATE`,
and `DELETE` operations to understand the impact of these commands on
database data.
Transactional Control Tasks: Implementing transactions to understand
how multiple operations can be executed as a single unit, and how
rollback mechanisms work to maintain data integrity.
Activities for QBE
Form Designing: Creating forms to represent different queries,
understanding how fields correspond to database columns.
Logical Operations Application: Applying logical operators within the QBE
forms to understand their effect on the outcome of the queries.
Multi-table Querying: Linking forms from different tables to perform
joint queries, gaining insights into how QBE handles relationships
between tables.
Activities for Visual Queries
Interface Navigation: Becoming familiar with the tools and features of a
visual query system through exploration.
Query Visualisation: Designing queries by dragging and dropping tables and
fields, and visually setting the criteria for data selection.
Result Interpretation: Observing how changes in the visual query
interface affect the results, promoting an understanding of query logic.
Activities for Natural Language Interfaces
Query Phrasing: Experimenting with different ways of asking the same
question and observing how the NLI interprets and translates them.
Ambiguity Resolution: Learning how to phrase queries clearly and
unambiguously to get the best results from an NLI.
Understanding Limitations: Identifying the limitations of NLIs by
trying out queries that are difficult for the system to interpret and
understanding why they fail.
A.3.3 Describe different methods of database recovery.
A.3.3 Data recovery
Data recovery is "the process of salvaging and handling data from damaged, failed,
corrupted, or inaccessible secondary storage media when it cannot be accessed normally".
Often the data are salvaged from storage media such as internal or external hard disk
drives, solid-state drives (SSDs), USB flash drives, storage tapes, CDs, DVDs, RAID
(Redundant Array of Inexpensive/Independent Disks) arrays, and other electronics.
Data recovery scenario
1. The most common data recovery scenario involves an operating system failure or
accidental damage (typically on a single-disk, single-partition, single-OS system),
in which case the goal is simply to copy all wanted files to another disk.
This can be easily accomplished using a Live CD, many of which provide a means
to mount the system drive and backup disks or removable media, and to move the
files from the system disk to the backup media with a file manager or optical disc
authoring software.
Such cases can often be mitigated by disk partitioning and consistently storing
valuable data files (or copies of them) on a different partition from the replaceable OS
system files.
2. Another scenario involves a disk-level failure, such as a compromised file system or
disk partition, or a hard disk failure. In any of these cases, the data cannot be easily
read.
Depending on the situation, solutions involve repairing the file system, partition table
or master boot record, or hard disk recovery techniques ranging from software-based
recovery of corrupted data, hardware-software based recovery of damaged service
areas (also known as the hard drive's "firmware"), to hardware replacement on a
physically damaged disk. If hard disk recovery is necessary, the disk itself has
typically failed permanently, and the focus is rather on a one-time recovery, salvaging
whatever data can be read.
3. In a third scenario, files have been "deleted" from a storage medium. Typically, the
contents of deleted files are not removed immediately from the drive; instead,
references to them in the directory structure are removed, and the space they occupy
is made available for later overwriting. In the meantime, the original file contents
remain, often in a number of disconnected fragments, and may be recoverable.
The term "data recovery" is also used in the context of forensic applications
or espionage, where data which have been encrypted or hidden, rather than damaged,
are recovered.
Recovery may be required due to
physical damage to the storage device or logical damage to the file system that prevents it
from being mounted by the host operating system (OS).
Physical damage
A wide variety of failures can cause physical damage to storage media.
CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can
suffer any of several mechanical failures, such as head crashes and failed
motors; tapes can simply break.
Physical damage always causes at least some data loss, and in many cases the logical
structures of the file system are damaged as well. Any logical damage must be dealt with
before files can be salvaged from the failed media.
Most physical damage cannot be repaired by end users. For example, opening a hard
disk drive in a normal environment can allow airborne dust to settle on the platter and
become caught between the platter and the read/write head, causing new head crashes
that further damage the platter and thus compromise the recovery process. Furthermore,
end users generally do not have the hardware or technical expertise required to make
these repairs.
Consequently, data recovery companies are often employed to salvage important data
with the more reputable ones using class 100 dust- & static-free cleanrooms.
Recovery techniques
Recovering data from physically damaged hardware can involve multiple techniques.
Some damage can be repaired by replacing parts in the hard disk. This alone may make
the disk usable, but there may still be logical damage. A specialized disk-imaging
procedure is used to recover every readable bit from the surface. Once this image is
acquired and saved on a reliable medium, the image can be safely analyzed for logical
damage and will possibly allow much of the original file system to be reconstructed.
Hardware repair
Media that has suffered a catastrophic electronic failure requires data recovery in order to
salvage its contents.
A common misconception is that a damaged printed circuit board (PCB) may be replaced
during recovery procedures by an identical PCB from a healthy drive. While this may
work in rare circumstances on hard drives manufactured before 2003, it will not work on
newer hard drives. Each hard drive has what is called a System Area. This portion of the
drive, which is not accessible to the end user, contains adaptive data that helps the drive
operate within normal parameters. One function of the System Area is to log defective
sectors within the drive; essentially telling the hard drive where it can and cannot write
data. The sector lists are also stored on various chips attached to the PCB, and they are
unique to each hard drive. If the data on the PCB do not match what is stored on the
platter, then the drive will not calibrate properly.
Logical damage
The term "logical damage" refers to situations in which the error is not a problem in the
hardware and requires software-level solutions.
Corrupt partitions and file systems, media errors
In some cases, data on a hard drive can be unreadable due to damage to the partition
table or file system, or to (intermittent) media errors.
Overwritten data
When data have been physically overwritten on a hard disk drive it is generally assumed
that the previous data are no longer possible to recover.
To guard against this type of data recovery, Gutmann and Colin Plumb designed a
method of irreversibly scrubbing data, known as the Gutmann method and used by
several disk-scrubbing software packages.
Solid-state drives (SSDs) overwrite data differently from hard disk drives (HDDs), which
makes at least some of their data easier to recover. Most SSDs use flash memory to store
data in pages and blocks, referenced by logical block addresses (LBA) which are
managed by the flash translation layer (FTL). When the FTL modifies a sector it writes
the new data to another location and updates the map so the new data appear at the target
LBA. This leaves the pre-modification data in place, with possibly many generations, and
recoverable by data recovery software.[11]
A.3.4 Outline how integrated database systems function.
The ability to easily access data in a single location saves end-users time and effort
and enables cross-departmental collaboration.
Data easily accessed, analyzed, and implemented saves time, money, and effort.
All end users, departments, and locations can be sure they view the same data and
make business decisions according to the same information.
Accessible data is also powerful data. Moving at the speed of business means
making intelligent decisions based on relevant, current information. Database
continuous integration ensures that as data is updated in various locations, the
integration location – the data set accessed company-wide – keeps pace.
A.3.5 Outline the use of databases in areas such as stock control, police records, health records, employee data.
Use of databases in stock control
Every business needs to keep track of the items that it manufactures or sells (the stock).
The system that monitors the items in stock is called the stock control system.
E.g. in a store, the stock includes all of the items on the shelves and out the back in the
storeroom.
It is important that a business does not keep too much stock, nor too little.
● Too much stock costs money as you have to store it all somewhere
● Too much perishable stock (e.g. food) means that it may go bad before it is sold
● Too little stock means that you might run out of stock before the next delivery arrives
The stock control system typically stores a record for each item, with fields such as:
● Item code (a unique identifier, often a barcode number)
● Description
● Item price
● Stock level (the number of items held in stock)
● Minimum stock level (when stock falls below this, it needs to be reordered)
● Reorder quantity (how many items we should order each time)
Adding Stock
When items are added into stock (because a delivery has arrived), this is recorded in the stock
control system.
The code of the new items is input to the system (usually using a barcode scanner, or similar
technology). The record for the item is found in the stock database, or a new record is created,
and the stock level is increased.
Selling / Delivering Stock
When items are taken from stock (because they have been sold, or delivered somewhere) this is
recorded in the stock control system.
The code of the item being sold/delivered is input to the system (usually using
a barcode scanner, or similar technology). The record for the item is found in the stock database,
and the stock level is decreased.
In many stores, the POS system is directly linked to the stock control system, so that stock levels
are adjusted as soon as an item is sold.
Reordering Stock
Stock control systems make it very easy for stock levels to be monitored, and for stock to be
reordered when it is running low.
The stock control system regularly goes through all the records in the stock database and checks
if the stock level is less than the minimum stock level.
If the stock is too low, it is reordered from the supplier. The quantity that is ordered is read from
the stock database (larger amounts for more popular items).
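In a relational stock database, these operations reduce to simple SQL; a sketch, assuming a hypothetical `stock` table with the fields listed earlier:

```sql
-- Record a sale: decrease the stock level of the scanned item.
UPDATE stock
SET stock_level = stock_level - 1
WHERE item_code = '5012345678900';

-- Reorder check: list every item that has fallen below its minimum
-- stock level, along with how many units to order from the supplier.
SELECT item_code, description, reorder_quantity
FROM stock
WHERE stock_level < minimum_stock_level;
```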
Use of databases in health records
An electronic health record (EHR), or electronic medical record (EMR), is a systematic
collection of electronic health information about an individual patient or population. It is a record
in digital format that is theoretically capable of being shared across different health care settings.
In some cases this sharing can occur by way of network-connected, enterprise-wide information
systems and other information networks or exchanges.
EHRs may include a range of data, including demographics, medical history, medication and
allergies, immunization status, laboratory test results, radiology images, vital signs, personal
statistics like age and weight, and billing information.
The system is designed to represent data that accurately captures the state of the patient at all
times. It allows for an entire patient history to be viewed without the need to track down the
patient’s previous medical record volume and assists in ensuring data is accurate, appropriate and
legible. It reduces the chances of data replication as there is only one modifiable file, which
means the file is constantly up to date when viewed at a later date and eliminates the issue of lost
forms or paperwork. Due to all the information being in a single file, it makes it much more
effective when extracting medical data for the examination of possible trends and long term
changes in the patient.
A.3.6 Suggest methods to ensure the privacy of the personal data and the responsibility of those holding personal data not to sell or divulge it in any way.
Legal frameworks provide the foundation for data privacy practices. They are designed to protect
personal data and lay out the responsibilities of data holders.
Data protection legislation (for example, the EU's GDPR and the UK Data Protection Act 2018)
is based on key principles that guide the processing and handling of personal data:
Lawfulness, fairness, and transparency: Data must be processed lawfully, fairly, and
transparently in relation to the data subject.
Purpose limitation: Collected for specific, explicit, and legitimate purposes and not
further processed in a manner that is incompatible with those purposes.
Data minimization: Adequate, relevant, and limited to what is necessary in relation to
the purposes for which they are processed.
Accuracy: Kept accurate and, where necessary, kept up to date.
Storage limitation: Kept in a form which permits identification of data subjects for no
longer than is necessary.
Integrity and confidentiality: Processed in a manner that ensures appropriate security of
the personal data.
Legislation such as the UK Computer Misuse Act outlines offences related to unauthorized
access to computer material. Beyond legal compliance, organisations implement technical
safeguards.
Implementing Access Controls
Role-Based Access Control (RBAC): Assigning permissions based on roles within the
organisation, ensuring individuals have access only to what is necessary for their job
functions.
Least Privilege Principle: Users should be given the minimum levels of access, or
permissions, needed to perform their job functions (both principles are sketched in SQL below).
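A minimal SQL sketch of both ideas (role, table, and user names are hypothetical; exact role syntax varies by DBMS):

```sql
-- Create a role that carries only the privileges the job function requires.
CREATE ROLE hr_clerk;
GRANT SELECT, UPDATE ON employees TO hr_clerk;  -- no DELETE, no DROP

-- Assign the role to individual accounts: least privilege by default.
GRANT hr_clerk TO alice;
GRANT hr_clerk TO bob;
```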
Deploying Encryption Techniques
End-to-End Encryption (E2EE): Ensuring that data is encrypted on the sender's system
or device and only the recipient is able to decrypt it.
Public Key Infrastructure (PKI): Using a pair of keys to encrypt and decrypt data,
which ensures that only the intended recipient can read the information.
Anonymisation: Processing personal data in such a manner that the data subject is not or
no longer identifiable.
Pseudonymisation: Replacing private identifiers with fake identifiers or pseudonyms.
This helps to reduce risks to data subjects and helps entities comply with their data
protection obligations (a minimal SQL sketch follows below).
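As a minimal sketch of pseudonymisation (MySQL-style `SHA2` function; the table and column names are hypothetical, and a production scheme would normally use a keyed mapping stored separately rather than a bare hash):

```sql
-- Expose data for analysis through a view in which the direct
-- identifier is replaced by a pseudonym.
CREATE VIEW research_patients AS
SELECT SHA2(national_id, 256) AS patient_pseudonym,  -- not the real ID
       year_of_birth,
       diagnosis_code
FROM patients;
```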
Automated Tools for Monitoring: Implementing software solutions that can detect
unusual patterns of access or transactions that could indicate a breach.
Data Lifecycle Management: Establishing policies for the timely and secure deletion of
data that is no longer required for the purpose it was collected for.
Regular Security Training: Organising training sessions for employees to ensure they
are aware of the latest data protection practices and threats.
Organisations must navigate a complex landscape of challenges while ensuring data privacy:
Data Protection Officers (DPOs): Many organisations are required to appoint a DPO to
oversee compliance with data protection laws.
Privacy by Design: Integrating core privacy considerations into all stages of the
development process of new products, services, or technologies.
International Considerations
Cross-Border Data Transfers: Managing the complexities of data privacy across
jurisdictions is a significant challenge, especially with varying international data
protection laws.
By thoroughly understanding and implementing the above frameworks and strategies,
organisations can aim to protect personal data to the highest standards. Data privacy is a dynamic
field, and staying informed and prepared is key to maintaining the integrity and confidentiality of
personal data. Through these measures, data holders can ensure they are fulfilling their ethical
and legal responsibilities, fostering trust, and mitigating the risks associated with data breaches
and misuse.
Data integrity
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its
entire life-cycle, and is a critical aspect of the design, implementation and usage of any system
which stores, processes, or retrieves data.
The term data integrity is broad in scope and may have widely different meanings depending on
the specific context, even under the same general umbrella of computing. This section provides
only a broad overview of some of the different types and concerns of data integrity.
Data integrity is the opposite of data corruption, which is a form of data loss.
The overall intent of any data integrity technique is the same:
ensure data is recorded exactly as intended (such as a database correctly rejecting
mutually exclusive possibilities), and upon later retrieval,
ensure the data is the same as it was when it was originally recorded.
In short, data integrity aims to prevent unintentional changes to information.
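For example, a database can be made to reject impossible values at the point of entry with a `CHECK` constraint (the table and column names here are hypothetical):

```sql
CREATE TABLE bookings (
    booking_id INT PRIMARY KEY,
    check_in   DATE NOT NULL,
    check_out  DATE NOT NULL,
    CHECK (check_out > check_in)  -- a stay cannot end before it begins
);
```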
Data integrity is not to be confused with data security, the discipline of protecting data
from unauthorized parties.
Any unintended change to data as the result of a storage, retrieval or processing operation,
whether caused by malicious intent, unexpected hardware failure, or human error, is a failure
of data integrity.
If the changes are the result of unauthorized access, it may also be a failure of data security.
Depending on the data involved, this could manifest itself in ways ranging from something as
benign as a single pixel in an image appearing a different color than was originally recorded,
to the loss of vacation pictures or a business-critical database, to even catastrophic loss of
human life in a life-critical system.
Data Security Challenges
This section presents an overview of data security requirements, and examines the full spectrum
of data security risks that must be countered. It then provides a matrix relating security risks to
the kinds of technology now available to protect your data. It covers the following topics:
● Understanding the Many Dimensions of System Security
● Fundamental Data Security Requirements
● Security Requirements in the Internet Environment
● A World of Data Security Risks
● A Matrix of Security Risks and Solutions
● The System Security Team
In an Internet environment, the risks to valuable and sensitive data are greater than ever
before, and your data security plan must encompass a complex computing environment.
You must protect databases and the servers on which they reside; you must administer and
protect the rights of internal database users; and you must guarantee the confidentiality of
ecommerce customers as they access your database. With the Internet continually growing, the
threat to data traveling over the network increases exponentially.
To protect all the elements of complex computing systems, you must address security issues in
many dimensions, as outlined in Table 1-1:
Table 1-1 Dimensions of Data Security
Physical: Your computers must be physically inaccessible to unauthorized users. This means
that you must keep them in a secure physical environment.
Personnel: The people responsible for system administration and data security at your site
must be reliable. You may need to perform background checks on DBAs before making hiring
decisions.
Procedural: The procedures used in the operation of your system must assure reliable data.
For example, one person might be responsible for database backups; her only role is to be sure
the database is up and running. Another person might be responsible for generating application
reports involving payroll or sales data; his role is to examine the data and verify its integrity.
It may be wise to separate out users' functional roles in data management.
Technical: Storage, access, manipulation, and transmission of data must be safeguarded by
technology that enforces your particular information control policies.
Think carefully about the specific security risks to your data, and make sure the solutions you
adopt actually fit the problems. In some instances, a technical solution may be inappropriate. For
example, employees must occasionally leave their desks. A technical solution cannot solve this
physical problem: the work environment must be secure.
The following sections describe the basic security standards which technology must ensure:
● Confidentiality
● Integrity
● Availability
Confidentiality
A secure system ensures the confidentiality of data. This means that it allows individuals to see
only the data which they are supposed to see. Confidentiality has several different aspects,
discussed in these sections:
● Privacy of Communications
● Secure Storage of Sensitive Data
● Authenticated Users
● Granular Access Control
Privacy of Communications
How can you ensure the privacy of data communications? Privacy is a very broad concept. For
the individual, it involves the ability to control the spread of confidential information such as
health, employment, and credit records. In the business world, privacy may involve trade secrets,
proprietary information about products and processes, competitive analyses, as well as marketing
and sales plans. For governments, privacy involves such issues as the ability to collect and
analyze demographic information, while protecting the confidentiality of millions of individual
citizens. It also involves the ability to keep secrets that affect the country's interests.
Secure Storage of Sensitive Data
How can you ensure that data remains private, once it has been collected? Once confidential data
has been entered, its integrity and privacy must be protected on the databases and servers where
it resides.
Authenticated Users
How can you designate the persons and organizations who have the right to see data?
Authentication is a way of implementing decisions about whom to trust. Authentication methods
seek to guarantee the identity of system users: that a person is who he says he is, and not an
impostor.
Granular Access Control
How much data should a particular user see? Access control is the ability to cordon off portions
of the database, so that access to the data does not become an all-or-nothing proposition. A clerk
in the Human Relations department might need some access to the emp table--but he should not
be permitted to access salary information for the entire company. The granularity of access
control is the degree to which data access can be differentiated for particular tables, views, rows,
and columns of a database.
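One common way to achieve column-level granularity is to grant access to a view rather than to the underlying table; a sketch using the emp table mentioned above (the view and role names are hypothetical):

```sql
-- The view exposes everything except the salary column.
CREATE VIEW emp_public AS
SELECT employee_id, name, department
FROM emp;

GRANT SELECT ON emp_public TO clerk_role;
-- No privileges are granted on emp itself, so salaries stay hidden.
```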
Note the distinction between authentication, authorization, and access control. Authentication is
the process by which a user's identity is checked. When a user is authenticated, he is verified as
an authorized user of an application. Authorization is the process by which the user's privileges
are ascertained. Access control is the process by which the user's access to physical data in the
application is limited, based on his privileges. These are critical issues in distributed systems. For
example, if JAUSTEN is trying to access the database, authentication would identify her as a
valid user. Authorization would verify her right to connect to the database with Product Manager
privileges. Access control would enforce the Product Manager privileges upon her user session.
Integrity
A secure system ensures that the data it contains is valid. Data integrity means that data is
protected from deletion and corruption, both while it resides within the database, and while it is
being transmitted over the network. Integrity has several aspects:
● System and object privileges control access to application tables and system commands,
so that only authorized users can change data.
● Referential integrity is the ability to maintain valid relationships between values in the
database, according to rules that have been defined (see the sketch after this list).
● A database must be protected against viruses designed to corrupt the data.
● The network traffic must be protected from deletion, corruption, and eavesdropping.
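A short sketch of referential integrity in standard SQL (hypothetical tables):

```sql
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    -- Every order must refer to an existing customer.
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- The DBMS now rejects an order with an unknown customer_id, and
-- rejects deleting a customer who still has orders.
```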
Availability
A secure system makes data available to authorized users, without delay. Denial-of-service
attacks are attempts to block authorized users' ability to access and use the system when needed.
System availability has a number of aspects.
Security Requirements in the Internet Environment
The Internet environment expands the realm of data security in several ways, as discussed in
these sections:
● Promises and Problems of the Internet
● Increased Data Access
● Much More Valuable Data
● Larger User Communities
● Hosted Systems and Exchanges
Promises and Problems of the Internet
Information is the cornerstone of e-business. The Internet allows businesses to use information
more effectively, by allowing customers, suppliers, employees, and partners to get access to the
business information they need, when they need it. Customers can use the Web to place orders
which can be fulfilled more quickly and with less error, suppliers and fulfillment houses can be
engaged as orders are placed, reducing or eliminating the need for inventory, and employees can
obtain timely information about business operations. The Internet also makes possible new,
innovative pricing mechanisms, such as online competitive bidding for suppliers, and online
auctions for customers. These Internet-enabled services all translate to reduced cost: there is less
overhead, greater economies of scale, and increased efficiency. The greatest promise of e-
business is more timely, more valuable information accessible to more people, at reduced cost of
information access.
The promise of e-business is offset by the security challenges associated with the
disintermediation of data access. Cutting out the middleman--removing the distributors,
wholesalers and retailers from the trading chain--too often cuts out the information security the
middleman provides. Likewise, the user community expands from a small group of known,
reliable users accessing data from the intranet, to thousands of users accessing data from the
Internet. Application hosting providers and exchanges offer especially stringent--and sometimes
contradictory--requirements of security by user and by customer, while allowing secure data
sharing among communities of interest.
While putting business systems on the Internet offers potentially unlimited opportunities for
increasing efficiency and reducing cost, it also offers potentially unlimited risk. The Internet
provides much greater access to data, and to more valuable data, not only to legitimate users, but
also to hackers, disgruntled employees, criminals, and corporate spies.
Increased Data Access
One of the chief e-business benefits of the Internet is disintermediation. The intermediate
information processing steps which employees typically perform in traditional businesses, such
as typing in an order received over the phone or by mail, are removed from the e-business
process. Users who are not employees and are thus outside the traditional corporate boundary
(including customers, suppliers, and partners) can have direct and immediate online access to
business information which pertains to them.
Making business information accessible by means of the Internet vastly increases the number of
users who may be able to access that information. When business is moved to the Internet, the
environment is drastically changed. Companies may know little or nothing about the users
(including, in many cases, employees) who are accessing their systems. Even if they know who
their users are, it may be very difficult for companies to deter users from accessing information
contrary to company policy. It is therefore important that companies manage access to sensitive
information, and prevent unauthorized access to that information before it occurs.
Much More Valuable Data
E-business relies not only on making business information accessible outside the traditional
company, it also depends on making the best, most up-to-date information available to users
when they need it. For example, companies can streamline their operations and reduce overhead
by allowing suppliers to have direct access to consolidated order information. This allows
companies to reduce inventory by obtaining exactly what they need from suppliers when they
need it. Companies can also take advantage of new pricing technology, such as online
competitive bidding by means of exchanges, to obtain the best price from suppliers, or offer the
best price to consumers.
Streamlining information flow through the business system allows users to obtain better
information from the system. In the past, data from external partners, suppliers, or customers was
often entered into the system through inefficient mechanisms that were prone to error and delay.
For example, many companies accepted the bulk of their orders by phone, letter, or fax, and this
information was typed in by clerks or sales people. Even when electronic data interchange
mechanisms existed, they were typically proprietary and difficult to integrate with companies'
internal data infrastructure. Now, businesses that allow other businesses and consumers to submit
and receive business information directly through the Internet can expect to get more timely,
accurate, and valuable information, at less expense than if traditional data channels were used.
Formerly, when information was entered into a business system, it was often compartmentalized.
Information maintained by each internal department, such as sales, manufacturing, distribution,
and finance, was kept separate, and was often processed by physically separate and incompatible
databases and applications--so-called "islands of information". This prevented businesses from
taking full advantage of the information they already had, since it was difficult for different
departments to exchange information when it was needed, or for executives to determine the
latest and most accurate status of the business. Companies have found that linking islands of
information and consolidating them where possible, allows users to obtain better information,
and to get more benefit from that information. This makes the information more valuable.
Improving the value of data available to legitimate users generally improves its value to intruders
as well. This increases the potential rewards to be gained from unauthorized access to that data,
and the potential damage that can be done to the business if the data were corrupted. In other
words, the more effective an e-business system is, the greater the need to protect it against
unauthorized access.
Larger User Communities
The sheer size of the user communities which can access business systems by way of the Internet
not only increases the risk to those systems, but also constrains the solutions which can be
deployed to address that risk. The Internet creates challenges in terms of scalability of security
mechanisms, management of those mechanisms, and the need to make them standard and
interoperable.
Scalability
Security mechanisms for Internet-enabled systems must support much larger communities of
users than systems which are not Internet-enabled. Whereas the largest traditional enterprise
systems typically supported thousands of users, many Internet-enabled systems have millions of
users.
Manageability
Traditional mechanisms for identifying users and managing their access, such as granting each
user an account and password on each system she accesses, may not be practical in an Internet
environment. It rapidly becomes too difficult and expensive for system administrators to manage
separate accounts for each user on every system.
Interoperability
Unlike traditional enterprise systems, where a company owns and controls all components of the
system, Internet-enabled e-business systems must exchange data with systems owned and
controlled by others: by customers, suppliers, partners, and so on. Security mechanisms deployed
in e-business systems must therefore be standards-based, flexible, and interoperable, to ensure
that they work with others' systems. They must support thin clients, and work in multitier
architectures.
Hosted Systems and Exchanges
The principal security challenge of hosting is keeping data from different hosted user
communities separate. The simplest way of doing this is to create physically separate systems for
each hosted community. The disadvantage of this approach is that it requires a separate
computer, with separately installed, managed, and configured software, for each hosted user
community. This provides little in the way of economies of scale to a hosting company.
Several factors can greatly reduce costs to hosting service providers. These factors include
mechanisms which allow multiple user communities to share a single hardware and software
instance; mechanisms which separate data for different user communities; and ways to provide a
single administrative interface for the hosting provider.
Exchanges have requirements for both data separation and data sharing. For example, an
exchange may ensure that a supplier's bid remains unviewable by other suppliers, yet allow all
bids to be evaluated by the entity requesting the bid. Furthermore, exchanges may also support
communities of interest in which groups of organizations can share data selectively, or work
together to provide such things as joint bids.
A World of Data Security Risks
The integrity and privacy of data are at risk from unauthorized users, external sources listening in
on the network, and internal users giving away the store. This section explains the risky
situations and potential attacks that could compromise your data.
● Data Tampering
● Eavesdropping and Data Theft
● Falsifying User Identities
● Password-Related Threats
● Unauthorized Access to Tables and Columns
● Unauthorized Access to Data Rows
● Lack of Accountability
● Complex User Management Requirements
Data Tampering
Privacy of communications is essential to ensure that data cannot be modified or viewed in
transit. Distributed environments bring with them the possibility that a malicious third party can
perpetrate a computer crime by tampering with data as it moves between sites.
In a data modification attack, an unauthorized party on the network intercepts data in transit and
changes parts of that data before retransmitting it. An example of this is changing the dollar
amount of a banking transaction from $100 to $10,000.
In a replay attack, an entire set of valid data is repeatedly interjected onto the network. An
example would be to repeat, one thousand times, a valid $100 bank account transfer transaction.
Eavesdropping and Data Theft
Data must be stored and transmitted securely, so that information such as credit card numbers
cannot be stolen.
Over the Internet and in Wide Area Network (WAN) environments, both public carriers and
private network owners often route portions of their network through insecure land lines,
extremely vulnerable microwave and satellite links, or a number of servers. This situation leaves
valuable data open to view by any interested party. In Local Area Network (LAN) environments
within a building or campus, insiders with access to the physical wiring can potentially view data
not intended for them. Network sniffers can easily be installed to eavesdrop on network traffic.
Packet sniffers can be designed to find and steal user names and passwords.
Falsifying User Identities
You need to know your users. In a distributed environment, it becomes more feasible for a user
to falsify an identity to gain access to sensitive and important information. How can you be sure
that user Pat connecting to Server A from Client B really is user Pat?
In addition, malefactors can hijack connections. How can you be sure that Client B and Server A
are what they claim to be? A transaction that should go from the Personnel system on Server A
to the Payroll system on Server B could be intercepted in transit and routed instead to a terminal
masquerading as Server B.
Identity theft is becoming one of the greatest threats to individuals in the Internet environment.
Criminals attempt to steal users' credit card numbers, and then make purchases against the
accounts. Or they steal other personal data, such as checking account numbers and driver's
license numbers, and set up bogus credit accounts in someone else's name.
Nonrepudiation is another identity concern: how can a person's digital signature be protected? If
hackers steal someone's digital signature, that person may be held responsible for any actions
performed using their private signing key.
Password-Related Threats
In large systems, users must remember multiple passwords for the different applications and
services that they use. For example, a developer can have access to a development application on
a workstation, a PC for sending e-mail, and several computers or intranet sites for testing,
reporting bugs, and managing configurations.
Users typically respond to the problem of managing multiple passwords in several ways:
● They may select easy-to-guess passwords--such as a name, fictional character, or a word
found in a dictionary. All of these passwords are vulnerable to dictionary attacks.
● They may also choose to standardize passwords so that they are the same on all machines
or Web sites. This results in a potentially large exposure in the event of a compromised
password. They can also use passwords with slight variations that can be easily derived
from known passwords.
● Users with complex passwords may write them down where an attacker can easily find
them, or they may just forget them--requiring costly administration and support efforts.
All of these strategies compromise password secrecy and service availability. Moreover,
administration of multiple user accounts and passwords is complex, time-consuming, and
expensive.
Unauthorized Access to Tables and Columns
The database may contain confidential tables, or confidential columns in a table, which should
not be available indiscriminately to all users authorized to access the database. It should be
possible to protect data on a column level.
Unauthorized Access to Data Rows
Certain data rows may contain confidential information which should not be available
indiscriminately to users authorized to access the table.
You need granular access control--a way to enforce confidentiality on the data itself. For
example, in a shared environment businesses should only have access to their own data;
customers should only be able to see their own orders. If the necessary compartmentalization is
enforced upon the data, rather than added by the application, then it cannot be bypassed by users.
Systems must therefore be flexible: able to support different security policies depending on
whether you are dealing with customers or employees. For example, you may require stronger
authentication for employees (who can see more data) than you do for customers. Or, you may
allow employees to see all customer records, while customers can only see their own records.
Lack of Accountability
If the system administrator is unable to track users' activities, then users cannot be held
responsible for their actions. There must be some reliable way to monitor who is performing
what operations on the data.
Complex User Management Requirements
Systems must often support thousands of users, or hundreds of thousands of users: thus they
must be scalable. In such large-scale environments, the burden of managing user accounts and
passwords makes your system vulnerable to error and attack. You need to know who the user
really is--across all tiers of the application--to have reliable security.
Multitier Systems
This problem becomes particularly complex in multitier systems. Here, and in most packaged
applications, the typical security model is that of One Big Application User. The user connects to
the application, and the application (or application server) logs on and provides complete access
for everyone, with no auditing and unlimited privileges. This model places your data at risk--
especially in the Internet, where your Web server or application server depends upon a firewall.
Firewalls are commonly vulnerable to break-ins.
To meet the challenges of scale in security administration, you should be able to centrally
manage users and privileges across multiple applications and databases by using a directory
based on industry standards. This can reduce system management costs and increase business
efficiency.
Further, creating and building separate databases for multiple application subscribers is not a
cost-efficient model for an application service provider. While technically possible, the separate
database model would quickly become unmanageable. To be successful, a single application
installation should be able to host multiple companies--and be administered centrally.
A.3.8 Explain the difference between data matching and data mining.
The primary differences between data mining and data matching are the system designs, the
methodology used, and the purpose.
Data mining is the use of pattern recognition to identify trends within a sample of data.
Data matching is the comparison of records from different sources to identify records that
relate to the same entity or matter of interest.
Data-matching is the large scale comparison of records or files collected or held for different
purposes, with a view to identifying matters of interest. Data-matching can be conducted for a
number of purposes, including detecting errors and illegal behaviour, locating individuals,
ascertaining whether a particular individual is eligible to receive a benefit, and facilitating
debt collection.
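In SQL terms, a data-matching run is essentially a large join across files held for different purposes; a sketch with table and column names invented for illustration:

```sql
-- Flag people who appear both in a benefits register and on a payroll,
-- matching on a shared identifier.
SELECT b.claim_id, b.surname, b.date_of_birth
FROM benefit_claims b
JOIN payroll_records p
  ON b.national_insurance_no = p.national_insurance_no;
```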
Data-mining has been defined as "a set of automated techniques used to extract buried or
previously unknown pieces of information from large databases". Data-mining can be used in
different contexts to achieve different goals. For example, it is increasingly used by organisations
to enable them to design effective sales campaigns and precision-targeted marketing plans, and
to develop products to increase sales and profitability.
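Real data-mining systems use automated statistical and machine-learning techniques, but the flavour of pattern discovery can be suggested in plain SQL; a simplistic sketch over a hypothetical `order_items` table:

```sql
-- Which pairs of products are most often bought together?
SELECT a.product_id AS product_1,
       b.product_id AS product_2,
       COUNT(*)     AS times_bought_together
FROM order_items a
JOIN order_items b
  ON  a.order_id   = b.order_id
  AND a.product_id < b.product_id    -- avoid duplicates and self-pairs
GROUP BY a.product_id, b.product_id
ORDER BY times_bought_together DESC;
```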
Data-mining can also be used by law enforcement agencies to investigate criminal activities. For
example, in 2006 it was reported that the National Security Agency in the United States was
collecting telephone records of millions of Americans to analyse calling patterns in an effort to
detect terrorist activities.
In a typical data-mining exercise, the data are first prepared (or "scrubbed") before the
data-mining process is applied.
Data-matching and data-mining practices that involve personal information raise a number of
privacy concerns. A major concern is that the practices can reveal large amounts of previously
unknown personal information about individuals. This concern is exacerbated by the fact that
data-matching or data-mining can occur without the knowledge or consent of the data subject,
thereby limiting the ability of the data subject to seek access to information derived from a data-
matching or data-mining program.
Another concern relates to the accuracy of the data derived from a data-matching or data-mining
process. Data-matching and data-mining involve using information collected for different
purposes and in different contexts. If information is incorrect or incomplete at the time of
collection, or ceases to be accurate some time after collection, the information generated by the
data-matching or data-mining process will be inaccurate. In the case of data-mining, an
additional concern is that it is often difficult to inform the data subject of the exact purpose for
which his or her personal information is to be collected or used. This is because data-mining
activities aim to discover previously unknown information. Further, there is concern about the
storage of large amounts of personal information gathered for the purpose of data-matching or
data-mining.