Ch6 Ais6 Reviewer

Chapter 6 discusses the challenges of managing data in traditional file environments, including data redundancy, program-data dependence, lack of flexibility, poor security, and limited data sharing. It introduces Database Management Systems (DBMS) as a solution to centralize data, reduce redundancy, and improve data access and management. The chapter also covers the capabilities of relational DBMS, the importance of normalization, and the emergence of non-relational databases and cloud database services for handling large and diverse data sets.

CHAPTER 6: FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

6-1 What are the problems of managing data resources in a traditional file environment?

File Organization Terms and Concepts
Bit
o represents the smallest unit of data a computer can handle.
o represents either a 0 or a 1.
o can be grouped to form a byte to represent one character, number, or symbol.
Byte
o a group of bits that represents a single character, which can be a letter, a number, or another symbol.
o can be grouped to form a field.
Field
o a grouping of characters into a word, a group of words, or a complete number (such as a person’s name or age).
o related fields can be grouped to form a record.
Record
o a group of related fields, such as a student’s name, the course taken, the date, and the grade.
o describes an entity.
File
o a group of records of the same type.
o related files can be organized into a database.
Entity
o a person, place, thing, or event on which we store and maintain information.
Attribute
o each characteristic or quality describing a particular entity.

Problems with the Traditional File Environment
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications.
Each application requires a unique data file (its own files) that is likely to be a subset of the master file and its own computer program to operate. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.

A. Data Redundancy and Inconsistency
Data Redundancy
o the presence of duplicate data in multiple data files so that the same data are stored in more than one place or location.
o occurs when different groups in an organization independently collect the same piece of data and store it independently of each other.
o wastes storage resources and also leads to data inconsistency, where the same attribute may have different values.

B. Program-Data Dependence
o refers to the coupling of data stored in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data.
o Every traditional computer program has to describe the location and nature of the data with which it works. In a traditional file environment, any change in a software program could require a change in the data accessed by that program.

C. Lack of Flexibility
o A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion.
o The information required by ad hoc requests is somewhere in the system but may be too expensive to retrieve.
o Several programmers might have to work for weeks to put together the required data items in a new file.

D. Poor Security
o Because there is little control or management of data, access to and dissemination of information may be out of control.
o Management may have no way of knowing who is accessing or even making changes to the organization’s data.

E. Lack of Data Sharing and Availability
o Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner.
o Information cannot flow freely across different functional areas or different parts of the organization.
o If users find different values of the same piece of information in two different systems, they may not want to use these systems because they cannot trust the accuracy of their data.

A database management system (DBMS) solves these problems with software that permits centralization of data and data management so that businesses have a single consistent source for all their data needs. Using a DBMS minimizes redundant and inconsistent files.

6-2 What are the major capabilities of database management systems (DBMS), and why is a relational DBMS so powerful?
A more rigorous definition of a database is a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data.
Rather than storing data in separate files for each application, data appear to users as being stored in only one location. A single database services multiple applications.

Database Management Systems (DBMS)
o software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.
o acts as an interface between application programs and the physical data files.
o relieves the programmer or end user from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data.
  ▪ Logical view presents data as they would be perceived by end users or business specialists.
  ▪ Physical view shows how data are actually organized and structured on physical storage media.
o makes the physical database available for different logical views required by users.
When the application program calls for a data item, such as gross pay, the DBMS finds this item in the database and presents it to the application program. Using traditional data files, the programmer would have to specify the size and format of each data element used in the program and then tell the computer where they were located.

How a DBMS Solves the Problems of the Traditional File Environment
Database Management Systems (DBMS)
o reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated.
o may not enable the organization to eliminate data redundancy entirely, but it can help control redundancy.
o can help the organization ensure that every occurrence of redundant data has the same values.
o uncouples programs and data, enabling data to stand on their own.
o enables the organization to centrally manage data, their use, and security.
Data sharing throughout the organization is easier because the data are presented to users as being in a single location rather than fragmented in many different systems and files.

Relational DBMS
o the most popular type of DBMS today for PCs as well as for larger computers and mainframes.
o has been the primary method for organizing and maintaining data in information systems because it is so flexible and accessible.
o represents data as two-dimensional tables (called relations).
o Tables may be referred to as files. Each table contains data on an entity and its attributes. Each row represents a record, and each column represents an attribute or field. Each table also contains a key field to uniquely identify each record for retrieval or manipulation.

Microsoft Access is a relational DBMS for desktop systems.
DB2, Oracle Database, and Microsoft SQL Server are relational DBMS for large mainframes and midrange computers.
MySQL is a popular open-source DBMS.

Row
o the actual information about a single supplier that resides in a table.
o commonly referred to as records, or in very technical terms, as tuples.
Key Field
o the unique identifier for all the information in any row of the table.
  ▪ Primary Key
      ▪ Each table in a relational database has one field that is designated as its primary key.
      ▪ This primary key cannot be duplicated.
  ▪ Foreign Key
      ▪ It is essentially a lookup field to look up data about the supplier of a specific part.

Relational database tables can be combined easily to deliver data required by users, provided that any two tables share a common data element.

Three Basic Operations of a Relational DBMS
1. Select Operation
o creates a subset consisting of all records in the file that meet stated criteria.
o in other words, it creates a subset of rows that meet certain criteria.
2. Join Operation
o combines relational tables to provide the user with more information than is available in individual tables.
3. Project Operation
o creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required.

Capabilities of Database Management Systems
A DBMS includes capabilities and tools for organizing, managing, and accessing the data in the database. The principal capabilities of a DBMS:
a. Data Definition Language
   o capability to specify the structure of the content of the database.
   o it would be used to create database tables and to define the characteristics of the fields in each table.
b. Data dictionary
   o where the information about the database would be documented.
   o an automated or manual file that stores definitions of data elements and their characteristics.
   o Microsoft Access has a rudimentary data dictionary capability that displays information about the size, format, and other characteristics of each field in a database.
c. Data Manipulation Language
   o a specialized language for accessing and manipulating the data in the database.
   o used to add, change, delete, and retrieve the data in the database.
   o contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.

A. Querying and Reporting
o The most prominent data manipulation language today is Structured Query Language (SQL).
o Users of DBMS for large and midrange computers, such as DB2, Oracle, or SQL Server, would employ SQL to retrieve information they needed from the database.
o Microsoft Access also uses SQL, but it provides its own set of user-friendly tools for querying databases and for organizing data from databases into more polished reports.
o In Microsoft Access, you will find features that enable users to create queries by identifying the tables and fields they want and the results and then selecting the rows from the database that meet particular criteria. These actions in turn are translated into SQL commands.
o Microsoft Access and other DBMS include capabilities for report generation so that the data of interest can be displayed in a more structured and polished format than would be possible just by querying.
o Crystal Reports is a popular report generator for large corporate DBMS, although it can also be used with Access.
o Access also has capabilities for developing desktop system applications. These include tools for creating data entry screens, reports, and developing the logic for processing transactions.

Designing Databases
o To create a database, you must understand:
  ▪ the relationships among the data,
  ▪ the type of data that will be maintained in the database,
  ▪ how the data will be used,
  ▪ how the organization will need to change to manage data from a companywide perspective.
o The database requires both a conceptual design and a physical design.
   a. The conceptual/logical design of a database is an abstract model of the database from a business perspective.
   b. The physical design shows how the database is actually arranged on direct access storage devices.

A. Normalization and Entity-Relationship Diagrams
Conceptual Database Design
o describes how the data elements in the database are to be grouped.
o the design process identifies relationships among data elements and the most efficient way of grouping data elements together to meet business information requirements.
o the process also identifies redundant data elements and the groupings of data elements required for specific application programs.
o groups of data are organized, refined, and streamlined until an overall logical view of the relationships among all the data in the database emerges.

To use a relational database model effectively, complex groupings of data must be streamlined to minimize redundant data elements and awkward many-to-many relationships.

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data is called normalization.
An unnormalized relation contains repeating groups. For example, an unnormalized ORDER relation contains repeating data groups because there can be many parts on a single order to a given supplier.

A more efficient way to arrange the data is to break down ORDER into smaller relations, each of which describes a single entity. After normalization, the original relation (ORDER) has been broken down into four smaller relations.

Relational database systems try to enforce referential integrity rules to ensure that relationships between coupled tables remain consistent. When one table has a foreign key that points to another table, you may not add a record to the table with the foreign key unless there is a corresponding record in the linked table. A well-designed relational database will not have many-to-many relationships, and all attributes for a specific entity will apply only to that entity.
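The ideas above (data definition, data manipulation with SQL, the select, join, and project operations, and referential integrity) can be sketched in a few lines. This is a minimal illustration only, assuming Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical and are not taken from the chapter's figures.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Data definition: one table per entity, each with a primary key.
    conn.execute("""
        CREATE TABLE supplier (
            supplier_number INTEGER PRIMARY KEY,
            supplier_name   TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE part (
            part_number     INTEGER PRIMARY KEY,
            part_name       TEXT NOT NULL,
            unit_price      REAL NOT NULL,
            supplier_number INTEGER NOT NULL
                REFERENCES supplier(supplier_number)  -- foreign key
        )""")
    # Data manipulation: add rows (illustrative sample data).
    conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                     [(1, "Acme Supply"), (2, "Bolt Works")])
    conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?)",
                     [(137, "Door latch", 22.00, 1),
                      (150, "Door seal", 6.00, 2),
                      (152, "Compressor", 54.00, 1)])
    conn.commit()
    return conn


def parts_with_suppliers(conn, min_price):
    """Combine the three basic relational operations in one query."""
    query = """
        SELECT part.part_name, supplier.supplier_name   -- project: keep two columns
        FROM part
        JOIN supplier                                   -- join: match on the shared field
          ON part.supplier_number = supplier.supplier_number
        WHERE part.unit_price >= ?                      -- select: keep matching rows
        ORDER BY part.part_number
    """
    return conn.execute(query, (min_price,)).fetchall()


def try_add_orphan_part(conn):
    """Try to add a part whose supplier does not exist; return True if accepted."""
    try:
        conn.execute("INSERT INTO part VALUES (999, 'Orphan part', 1.0, 42)")
    except sqlite3.IntegrityError:
        return False  # rejected: no supplier 42 in the linked table
    return True


if __name__ == "__main__":
    conn = build_demo_db()
    print(parts_with_suppliers(conn, 20.0))
    print(try_add_orphan_part(conn))
```

In this sketch the insert of a part that points to a nonexistent supplier is refused, which is the referential integrity rule in action. Other DBMS products express the same ideas with their own SQL dialects and tools.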
Entity-Relationship Diagram
Database designers document their data model with an entity-relationship diagram. An entity-relationship diagram graphically depicts the relationship between entities (tables) in a relational database.

This diagram illustrates the relationship between the entities SUPPLIER, PART, LINE_ITEM, and ORDER.
o The boxes represent entities.
o The lines connecting the boxes represent relationships.
  ▪ A line connecting two entities that ends in two short marks designates a one-to-one relationship.
  ▪ A line connecting two entities that ends with a crow’s foot topped by a short mark indicates a one-to-many relationship.

If the business doesn’t get its data model right, the system won’t be able to serve the business well. The company’s systems will not be as effective as they could be because they’ll have to work with data that may be inaccurate, incomplete, or difficult to retrieve.

Non-relational Databases and Databases in the Cloud
Cloud computing, unprecedented data volumes, massive workloads for web services, and the need to store new types of data require database alternatives to the traditional relational model of organizing data in the form of tables, columns, and rows. Companies are turning to “NoSQL” non-relational database technologies for this purpose.

Non-relational Database Management Systems
o use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down.
o useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools.
There are several different kinds of NoSQL databases, each with its own technical features and behavior.
a. Oracle NoSQL Database
b. Amazon’s SimpleDB: one of the Amazon Web Services that run in the cloud. It provides a simple web services interface to create and store multiple data sets, query data easily, and return the results. There is no need to predefine a formal database structure or change that definition if new data are added later.
c. MongoDB, an open-source NoSQL database: used to quickly integrate disparate data on more than 100 million customers and deliver a consolidated view of each.

The NoSQL database is able to use structured, semi-structured, and unstructured information without requiring tedious, expensive, and time-consuming database mapping.
Non-relational databases are becoming popular for managing types of data that can’t be handled easily by the relational data model. Both relational and non-relational database products are available as cloud computing services.

B. Cloud Databases
Amazon and other cloud computing vendors provide relational database services as well.
a. Amazon Relational Database Service (Amazon RDS) offers MySQL, SQL Server, Oracle Database, PostgreSQL, MariaDB, or Amazon Aurora DB (compatible with MySQL) as database engines. Pricing is based on usage.
b. Oracle has its own Database Cloud Services using its relational Oracle Database.
c. Microsoft Windows SQL Azure Database is a cloud-based relational database service based on Microsoft’s SQL Server DBMS.

Cloud-based data management services have special appeal for web-focused start-ups or small to medium-sized businesses seeking database capabilities at a lower price than in-house database products.

6-3 What are the principal tools and technologies for accessing information from databases to improve business performance and decision making?

A. Big Data
o describes data sets with volumes so huge that they are beyond the ability of typical DBMS to capture, store, and analyze.
o doesn’t refer to any specific quantity but usually refers to data in the petabyte and exabyte range (in other words, billions to trillions of records, all from different sources).
o produced in much larger quantities and much more rapidly than traditional data.
o these data may be unstructured or semi-structured and thus not suitable for relational database products that organize data in the form of columns and rows.

The Challenge of Big Data
o Businesses are interested in big data because they can reveal more patterns and interesting relationships than smaller data sets, with the potential to provide new insights into customer behavior, weather patterns, financial market activity, or other phenomena.
o Big data is also finding many uses in the public sector.
o However, to derive business value from these data, organizations need new technologies and tools capable of managing and analyzing nontraditional data along with their traditional enterprise data. They also need to know what questions to ask of the data and the limitations of big data. Capturing, storing, and analyzing big data can be expensive, and information from big data may not necessarily help decision makers. It’s important to have a clear understanding of the problem big data will solve for the business.
B. Business Intelligence Infrastructure
A contemporary infrastructure for business intelligence has an array of tools for obtaining useful information from all the different types of data used by businesses today, including semi-structured and unstructured big data in vast quantities. These capabilities include:

1. Data Warehouses and Data Marts
Data Warehouse
o a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from website transactions.
o extracts current and historical data from multiple operational systems inside the organization. These data are combined with data from external sources and transformed by correcting inaccurate and incomplete data and restructuring the data for management reporting and analysis before being loaded into the data warehouse.
o makes the data available for anyone to access as needed, but the data cannot be altered.
o provides a range of ad hoc and standardized query tools, analytical tools, and graphical reporting facilities.
Enterprise-wide Data Warehouses
o a central data warehouse serves the entire organization; alternatively, companies create smaller, decentralized warehouses called data marts.
Data Mart
o a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.

2. Hadoop
o Relational DBMS and data warehouse products are not well suited for organizing and analyzing big data or data that do not easily fit into columns and rows used in their data models.
o For handling unstructured and semi-structured data in vast quantities, as well as structured data, organizations are using Hadoop.
o Hadoop is an open-source software framework managed by the Apache Software Foundation that enables distributed parallel processing of huge amounts of data across inexpensive computers.
o It breaks a big data problem down into sub-problems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the result into a smaller data set that is easier to analyze.
o Hadoop consists of several key services:
   a. Hadoop Distributed File System (HDFS) for data storage and MapReduce for high-performance parallel data processing. HDFS links together the file systems on the numerous nodes in a Hadoop cluster to turn them into one big file system.
   b. Hadoop’s MapReduce was inspired by Google’s MapReduce system for breaking down processing of huge data sets and assigning work to the various nodes in a cluster.
   c. HBase, Hadoop’s non-relational database, provides rapid access to the data stored on HDFS and a transactional platform for running high-scale real-time applications.
o Hadoop can process large quantities of any kind of data, including structured transactional data, loosely structured data such as Facebook and Twitter feeds, complex data such as web server log files, and unstructured audio and video data.
o Hadoop runs on a cluster of inexpensive servers, and processors can be added or removed as needed.
o Companies use Hadoop for analyzing very large volumes of data as well as for a staging area for unstructured and semi-structured data before they are loaded into a data warehouse.
o Yahoo uses Hadoop to track users’ behavior so it can modify its home page to fit their interests.
o Life sciences research firm NextBio uses Hadoop and HBase to process data for pharmaceutical companies conducting genomic research.

3. In-Memory Computing
o It relies primarily on a computer’s main memory (RAM) for data storage. Conventional DBMS use disk storage systems.
o Users access data stored in system primary memory, thereby eliminating bottlenecks from retrieving and reading data in a traditional, disk-based database and dramatically shortening query response times.
o In-memory processing makes it possible for very large sets of data, amounting to the size of a data mart or small data warehouse, to reside entirely in memory.
o Complex business calculations that used to take hours or days can be completed within seconds, and this can even be accomplished using handheld devices.
o Leading commercial products for in-memory computing include SAP HANA and Oracle Exalytics. Each provides a set of integrated software components, including in-memory database software and specialized analytics software, that run on hardware optimized for in-memory computing work.

4. Analytic Platforms
o Commercial database vendors have developed specialized high-speed analytic platforms using both relational and non-relational technology that are optimized for analyzing large data sets.
o Analytic platforms, such as IBM PureData System for Analytics, feature preconfigured hardware-software systems that are specifically designed for query processing and analytics. IBM PureData System for Analytics features tightly integrated database, server, and storage components that handle complex analytic queries 10 to 100 times faster than traditional systems.
o Analytic platforms also include in-memory systems and NoSQL non-relational database management systems.
o Analytic platforms are now available as cloud services.

C. Analytical Tools: Relationships, Patterns, Trends
1. Online Analytical Processing (OLAP)
o OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions.
o Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.
o OLAP represents relationships among data as a multidimensional structure, which can be visualized as cubes of data and cubes within cubes of data, enabling more sophisticated data analysis.
o OLAP enables users to obtain online answers to ad hoc questions fairly rapidly, even when the data are stored in very large databases.

2. Data Mining
o Data mining is more discovery-driven than OLAP.
o Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions.
o Data mining analyzes large pools of data, including the contents of data warehouses, to find patterns and rules that can be used to predict future behavior and guide decision making.
o The types of information obtainable from data mining include:
   a. Associations: occurrences linked to a single event.
   b. Sequences: events are linked over time.
   c. Classification: recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules. It helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers.
   d. Clustering: works in a manner similar to classification when no groups have yet been defined. A data mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.
   e. Forecasting: uses predictions in a different way. It uses a series of existing values to forecast what other values will be.
o These systems perform high-level analyses of patterns or trends, but they can also drill down to provide more detail when needed.
There are data mining applications for all the functional areas of business and for government and scientific work.
o One popular use for data mining is to provide detailed analyses of patterns in customer data for one-to-one marketing campaigns or for identifying profitable customers.

3. Text Mining and Web Mining
Unstructured Data
o mostly in the form of text files.
o is believed to account for more than 80 percent of useful organizational information.
o one of the major sources of big data that firms want to analyze.
o e-mail, memos, call center transcripts, survey responses, legal cases, patent descriptions, and service reports are all valuable for finding patterns and trends that will help employees make better business decisions.

Text Mining
o tools are available to help businesses analyze large unstructured data sets consisting of text.
o able to extract key elements from unstructured big data sets, discover patterns and relationships, and summarize the information.
o can analyze transcripts of calls to customer service centers to identify major service and repair issues or to measure customer sentiment about the company.
Sentiment Analysis
o able to mine text comments in an e-mail message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects.
Analytic Software
o analyzes customer service notes, e-mails, survey responses, and online discussions to discover signs of dissatisfaction that might cause a customer to stop using the company’s services.
o able to automatically identify the various “voices” customers use to express their feedback (such as a positive, negative, or conditional voice) to pinpoint a person’s intent to buy, intent to leave, or reaction to a specific product or marketing message.
Web
o another rich source of unstructured big data for revealing patterns, trends, and insights into customer behavior.
Web Mining
o discovery and analysis of useful patterns and information from the World Wide Web (or web).
o involves examining the structure of websites and activities of website users as well as the contents of webpages.
o helps businesses understand customer behavior, evaluate the effectiveness of a particular website, or quantify the success of a marketing campaign.
o looks for patterns in data through content mining, structure mining, and usage mining.
o Content mining: the process of extracting knowledge from the content of webpages, which may include text, image, audio, and video data.
o Structure mining: examines data related to the structure of a particular website.
o Usage mining: examines user interaction data recorded by a web server whenever requests for a website’s resources are received.

Databases and the Web
Because many back-end databases cannot interpret commands written in HTML, the web server passes these requests for data to software that translates HTML commands into SQL so the commands can be processed by the DBMS working with the database.

Conventional databases can be linked via middleware to the web or a web interface to facilitate user access to an organization’s internal data.
In a client/server environment, the DBMS resides on a dedicated computer called a database server. The DBMS receives the SQL requests and provides the required data. Middleware transfers information from the organization’s internal database back to the web server for delivery in the form of a web page to the user.

The middleware working between the web server and the DBMS is an application server running on its own dedicated computer.
The application server software handles all application operations, including transaction processing and data access, between browser-based computers and a company’s back-end business applications or databases. The application server takes requests from the web server, runs the business logic to process transactions based on those requests, and provides connectivity to the organization’s back-end systems or databases.
Alternatively, the software for handling these operations could be a custom program or a CGI script. A CGI script is a compact program using the Common Gateway Interface (CGI) specification for processing data on a web server.

There are a number of advantages to using the web to access an organization’s internal databases:
o the web browser software is much easier to use than proprietary query tools.
o the web interface requires few or no changes to the internal database. It costs much less to ...

o provides an up-to-date online directory of more than 700,000 suppliers of industrial products.
o used to send out huge paper catalogs with this information; now it provides this information to users online via its website and has become a smaller, leaner company.
Facebook (social networking service)
o helps users stay connected with each other and meet new people.
o features “profiles” with information on 1.6 billion active users, including interests, friends, photos, and groups with which they are affiliated.
o maintains a very large database to house and manage all of this content.

6-4 Why are information policy, data administration, and data quality assurance essential for managing the firm’s data resources?

Establishing an Information Policy
Every business, large and small, needs an information policy. Your firm’s data are an important resource, and you don’t want people doing whatever they want with them. You need to have rules on how the data are to be organized and maintained and who is allowed to view the data or change them.

Information Policy
o specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information.
o lays out specific procedures and accountabilities, identifying which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information.
o governs the maintenance, distribution, and use of information in the organization.
If you are in a small business, the information policy would be established and implemented by the owners or managers. In a large organization, managing and planning for information as a corporate resource often require a formal data administration function.

Data Administration
o responsible for the specific policies and procedures through which data can be managed as an organizational resource.
o these responsibilities include developing an information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end user groups use data.
Data Governance
o used to describe many of these activities.
add a web interface in front of a legacy
o promoted by IBM, data governance deals
system than to redesign and rebuild the
with the policies and processes for
system to improve user access.
managing the availability, usability, integrity,
o accessing corporate databases through the
and security of the data employed in an
web is creating new efficiencies,
enterprise with special emphasis on
opportunities, and business models.
promoting privacy, security, data quality, and
ThomasNet.com (formerly Thomas Register) compliance with government regulations.
In close cooperation with users, the design
group establishes the physical database, the logical
relations among elements, and the access rules
and security procedures. The functions it performs
are called database administration.
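The access rules mentioned above (who may view the data and who may change them) can be pictured as a small lookup table. The sketch below is only an illustration: the data set names, the roles, and the is_allowed function are hypothetical and do not come from the chapter or from any particular DBMS.

```python
# Hypothetical access rules: which roles may view or change each data set.
ACCESS_RULES = {
    "customer_records": {
        "view": {"sales", "billing", "dba"},
        "change": {"billing", "dba"},
    },
    "payroll": {
        "view": {"hr", "dba"},
        "change": {"hr"},
    },
}


def is_allowed(role, action, dataset, rules=ACCESS_RULES):
    """Return True only if the rules explicitly grant the action to the role."""
    return role in rules.get(dataset, {}).get(action, set())


# Sales staff may look at customer records but may not change them.
can_view = is_allowed("sales", "view", "customer_records")
can_change = is_allowed("sales", "change", "customer_records")
```

Anything not listed is denied by default, which mirrors the idea that the rules, not individual users, decide who can view or change the data.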
Ensuring Data Quality
If a database is properly designed and
enterprise-wide data standards established,
duplicate or inconsistent data elements should be
minimal. Most data quality problems, however, such
as misspelled names, transposed numbers, or
incorrect or missing codes, stem from errors during
data input. The incidence of such errors is rising as
companies move their businesses to the web and
allow customers and suppliers to enter data into
their websites that directly update internal systems.
Before a new database is in place,
organizations need to identify and correct their
faulty data and establish better routines for editing
data once their database is in operation.
Data Quality Audit
o analysis of data quality often begins with a
data quality audit.
o a structured survey of the accuracy and
level of completeness of the data in an
information system.
o can be performed by surveying entire data
files, surveying samples from data files, or
surveying end users for their perceptions of
data quality.
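A data quality audit of this kind can be sketched in a few lines of code. The example below is a minimal illustration, not a standard tool: the audit_data_quality function, the customer fields, and the validation rules are all assumptions made up for the sketch, and "accuracy" here only means that a value passes a simple format check.

```python
import random


def audit_data_quality(records, required_fields, validators,
                       sample_size=None, seed=0):
    """Survey all records (or a random sample) and report, per field,
    the share of filled-in values and the share passing a validity check."""
    if sample_size is not None and sample_size < len(records):
        records = random.Random(seed).sample(records, sample_size)
    total = len(records)
    report = {}
    for field in required_fields:
        present = [r.get(field) for r in records
                   if r.get(field) not in (None, "")]
        check = validators.get(field, lambda value: True)
        valid = sum(1 for value in present if check(value))
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "accuracy": valid / len(present) if present else 0.0,
        }
    return report


# Hypothetical customer file with missing and malformed entries.
customers = [
    {"name": "Ana Cruz", "zip": "1000", "email": "ana@example.com"},
    {"name": "Ben Diaz", "zip": "", "email": "ben@example"},
    {"name": "", "zip": "10A0", "email": "cy@example.com"},
    {"name": "Dee Lim", "zip": "4027", "email": None},
]
validators = {
    "zip": lambda v: v.isdigit() and len(v) == 4,
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
}
report = audit_data_quality(customers, ["name", "zip", "email"], validators)
```

Passing sample_size surveys a random sample instead of the entire file, matching the second audit approach listed above. Surveying end users for their perceptions is not something code like this can capture.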
Data Cleansing/Data Scrubbing
o consists of activities for detecting and
correcting data in a database that are
incorrect, incomplete, improperly formatted,
or redundant.
o not only corrects errors but also enforces
consistency among different sets of data
that originated in separate information
systems.
o specialized data-cleansing software is
available to automatically survey data files,
correct errors in the data, and integrate the
data in a consistent companywide format.
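The cleansing steps above can be illustrated with a short sketch. It assumes two hypothetical source systems (sales and billing) holding customer records, and the clean_record and cleanse functions are invented for this example. Commercial data-cleansing software does far more than this.

```python
import re


def clean_record(record):
    """Return a copy of the record in one consistent format."""
    name = " ".join(record.get("name", "").split()).title()
    phone = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
    email = record.get("email", "").strip().lower()
    return {"name": name, "phone": phone, "email": email}


def cleanse(*sources):
    """Merge records from several systems, skipping incomplete rows and
    rows whose email address has already been loaded."""
    seen = set()
    cleaned = []
    for source in sources:
        for record in source:
            row = clean_record(record)
            if not row["name"] or not row["email"]:
                continue  # incomplete; a real tool would flag it for review
            if row["email"] in seen:
                continue  # redundant; same customer already loaded
            seen.add(row["email"])
            cleaned.append(row)
    return cleaned


# Hypothetical records from two separate information systems.
sales_system = [
    {"name": "  maria  SANTOS ", "phone": "(02) 555-0101",
     "email": "Maria@Example.com"},
    {"name": "Jose Reyes", "phone": "555 0102", "email": ""},
]
billing_system = [
    {"name": "Maria Santos", "phone": "02-555-0101",
     "email": "maria@example.com "},
    {"name": "liza tan", "phone": "5550103", "email": "liza@example.com"},
]
result = cleanse(sales_system, billing_system)
```

Using the email address as the matching key is a simplifying assumption. Real data sets usually need fuzzier matching to recognize the same customer across systems.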
Data Quality Problems
o are not just business problems; they also pose
serious problems for individuals, affecting
their financial condition and even their jobs.
A small minority of companies allow
individual departments to be in charge of
maintaining the quality of their own data. However,
best data administration practices call for
centralizing data governance, standardization of
organizational data, data quality maintenance, and
accessibility to data assets.
Firms must take special steps to make sure
they have a high level of data quality. These include
using enterprise-wide data standards, databases
designed to minimize inconsistent and redundant
data, data quality audits, and data cleansing
software.
