DATA & KNOWLEDGE MANAGEMENT
CONTENTS
❖Introduction
❖Managing Data
❖Database Approach
❖Big Data
❖Data Warehouse and Data Mart
❖Knowledge Management
DATA & KNOWLEDGE MANAGEMENT -
INTRODUCTION
• Data, information, and knowledge are terms that are often used interchangeably, but they have distinct meanings,
especially in the context of information systems and technology.
• Data
▪ Data refers to raw, unorganized facts or figures that are collected and stored. It can be in the form of
numbers, text, images, or any other type of input.
▪ Data, by itself, lacks context and meaning. It is the most basic form of representation and requires further
processing to become useful. A few examples of data:
✓ A spreadsheet containing sales figures
✓ A database with customer information
✓ Temperature readings from a weather station
✓ GPS coordinates from a mobile device
• Information
▪ Information refers to data that have been organized or processed so that they have meaning and
value to the recipient. For example, a monthly total computed from individual sales figures is
information.
DATA & KNOWLEDGE MANAGEMENT -
INTRODUCTION
• Knowledge
▪ Knowledge goes beyond information in that it involves understanding and expertise. It is the result of gaining
insights, experience, and being able to apply information in a meaningful way.
▪ Knowledge is the culmination of information and personal understanding, allowing individuals to make
informed judgments and take effective action.
▪ Examples of knowledge include:
✓ A doctor diagnosing and treating a patient based on their symptoms
✓ A chef creating a new recipe by combining different ingredients and cooking techniques
✓ A lawyer using legal precedents to argue a case in court
✓ Customer support utilizing company processes and procedures to answer a client’s question
MANAGING DATA
• All IT applications require data.
• These data should be of high quality, meaning that they should be accurate, complete, timely, consistent,
accessible, relevant, and concise.
• Unfortunately, the process of acquiring, keeping, and managing data is becoming increasingly difficult.
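The quality dimensions listed above can be checked programmatically. The sketch below is a minimal, hypothetical example (record fields and thresholds are assumptions, not from the source) that flags two of those dimensions, completeness and timeliness, for a set of customer records:

```python
from datetime import date, timedelta

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "updated": date.today()},
    {"id": 2, "email": "", "updated": date.today() - timedelta(days=400)},
]

def quality_issues(rec, max_age_days=365):
    """Flag two quality dimensions: completeness and timeliness."""
    issues = []
    if not rec["email"]:  # completeness: a required field is missing
        issues.append("incomplete: missing email")
    if (date.today() - rec["updated"]).days > max_age_days:
        issues.append("stale: not updated in over a year")  # timeliness
    return issues

report = {rec["id"]: quality_issues(rec) for rec in records}
```

In practice such rules would run as part of a data-governance pipeline, but the idea is the same: each quality dimension becomes a testable condition.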
DIFFICULTIES IN MANAGING DATA
• The amount of data is increasing exponentially with time. Much historical data must be kept for a long time, and
new data are added rapidly. For example, to support millions of customers, large retailers such as Walmart must
manage many petabytes of data. (A petabyte is approximately 1,000 terabytes, or one quadrillion bytes.)
• Data are also scattered throughout organizations, and they are collected by many individuals using various
methods and devices. These data are frequently stored in numerous servers and locations and in different
computing systems, databases, formats, and human and computer languages.
• Another problem is that data are generated from multiple sources: internal sources (for example, corporate
databases and company documents); personal sources (for example, personal thoughts, opinions, and
experiences); and external sources (for example, commercial databases, government reports, and corporate
websites).
DIFFICULTIES IN MANAGING DATA
• Adding to these problems is the fact that new sources of data such as blogs, podcasts, tweets, Facebook posts,
YouTube videos, texts, and RFID tags and other wireless sensors are constantly being developed, and the data
these technologies generate must be managed. Also, the data become less current over time. For example,
customers move to new addresses or they change their names, companies go out of business or are bought,
new products are developed, employees are hired or fired, and companies expand into new countries.
• Data are also subject to data rot. Data rot refers primarily to problems with the media on which the data are
stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media
and thus make it difficult to access data. The second aspect of data rot is that finding the machines needed to
access the data can be difficult. For example, it is almost impossible today to find 8-track players to listen to
music on. Consequently, a library of 8-track tapes has become relatively worthless, unless you have a
functioning 8-track player or you convert the tapes to a more modern medium such as DVDs.
DIFFICULTIES IN MANAGING DATA
• Data security, quality, and integrity are critical, yet they are easily jeopardized. Legal requirements relating to
data also differ among countries as well as among industries, and they change frequently.
• Another problem arises from the fact that, over time, organizations have developed information systems for
specific business processes, such as transaction processing, supply chain management, and customer relationship
management. Information systems that specifically support these processes impose unique requirements on
data, which results in repetition and conflicts across the organization. For example, the marketing function
might maintain information on customers, sales territories, and markets.
DATA GOVERNANCE
• Data governance is an approach to managing information across an entire organization. It involves a formal
set of business processes and policies that are designed to ensure that data are handled in a certain, well-
defined fashion. That is, the organization follows unambiguous rules for creating, collecting, handling, and
protecting its information. The objective is to make information available, transparent, and useful for the people
who are authorized to access it, from the moment it enters an organization until it is outdated and deleted.
• One strategy for implementing data governance is master data management. Master data management is a
process that spans all organizational business processes and applications. It provides companies with the ability
to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for
the company’s master data.
DATA GOVERNANCE
• Master data are a set of core data, such as customer, product, employee, vendor, geographic location, and so
on, that span the enterprise information systems. It is important to distinguish between master data and
transaction data. Transaction data, which are generated and captured by operational systems, describe the
business’s activities, or transactions. In contrast, master data are applied to multiple transactions and are used to
categorize, aggregate, and evaluate the transaction data.
• Let’s look at an example of a transaction: You (Mary Jones) purchase one Samsung 42-inch plasma television, part
number 1234, from Bill Roberts at Best Buy, for $2,000, on April 20, 2013. In this example, the master data are
“product sold,” “vendor,” “salesperson,” “store,” “part number,” “purchase price,” and “date.” When specific
values are applied to the master data, then a transaction is represented. Therefore, transaction data would be,
respectively, “42-inch plasma television,” “Samsung,” “Best Buy,” “Bill Roberts,” “1234,” “$2,000,” and “April 20,
2013.”
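The distinction in the Mary Jones example can be sketched in code: master data are stable reference entities held once, while a transaction is a single event that points at them by key. The identifiers and structure below are illustrative assumptions, echoing the textbook example:

```python
# Master data: stable reference entities reused across many transactions.
master = {
    "products": {"1234": {"name": "42-inch plasma television", "brand": "Samsung"}},
    "stores": {"BB-01": {"name": "Best Buy"}},
    "salespeople": {"S-77": {"name": "Bill Roberts"}},
}

# Transaction data: one event that references master data by key.
transaction = {
    "customer": "Mary Jones",
    "product_id": "1234",
    "store_id": "BB-01",
    "salesperson_id": "S-77",
    "price": 2000.00,
    "date": "2013-04-20",
}

# Resolving the keys reproduces the full picture of the sale.
product = master["products"][transaction["product_id"]]
store = master["stores"][transaction["store_id"]]
```

Because many transactions share the same master records, correcting a product name once in the master data fixes it for every transaction that references it, which is the point of a “single version of the truth.”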
BIG DATA - DEFINITION
• It is difficult to define Big Data. Here we present two descriptions of the phenomenon. First, the technology research firm Gartner
(www.gartner.com) defines Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing
in order to enhance decision making, lead to insights, and optimize business processes. Second, the Big Data Institute defines Big Data as vast datasets that:
➢ Exhibit variety;
➢ Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
BIG DATA - CHARACTERISTICS
• Big Data has three distinct characteristics: volume, velocity, and variety. These characteristics distinguish Big
Data from traditional data:
1. Volume: We have noted the huge volume of Big Data. Consider machine-generated data, which are
generated in much larger quantities than nontraditional data. For example, sensors in a single jet engine
can generate 10 terabytes of data in 30 minutes. (See our discussion of the Internet of Things in
Chapter 8.) With more than 25,000 airline flights per day, the daily volume of data from just this single
source is incredible. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from
automobiles compound the volume problem.
BIG DATA - CHARACTERISTICS
2. Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it
increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners.
For example, the Internet and mobile technology enable online retailers to compile histories not only on final sales,
but on their customers’ every click and interaction. Companies that can quickly use that information—for
example, by recommending additional purchases—gain competitive advantage.
3. Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly.
Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data
formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, web page
content, scans of government documents, and comments posted on social networks.
BIG DATA - ISSUES
• Despite its extreme value, Big Data does have issues. In this section, we take a look at data integrity, data
quality, and the nuances of analysis that are worth noting.
• Big Data Can Come from Untrusted Sources. One of the characteristics of Big Data is variety,
meaning that Big Data can come from numerous, widely varied sources. These sources may be internal or
external to the organization. For example, a company might want to integrate data from unstructured
sources such as e-mails, call center notes, and social media posts with structured data about its customers
from its data warehouse. The question is, how trustworthy are those external sources of data? For
example, how trustworthy is a Tweet? The data may come from an unverified source. Furthermore, the
data itself, reported by the source, may be false or misleading.
BIG DATA - ISSUES
• Big Data Is Dirty. Dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data.
Examples of such problems are misspelling of words, and duplicate data such as retweets or company
press releases that appear multiple times in social media. Suppose a company is interested in performing a
competitive analysis using social media data. The company wants to see how often a competitor’s product
appears in social media outlets as well as the sentiments associated with those posts. The company notices
that the number of positive posts about the competitor is twice as great as the number of positive posts
about itself. This finding could simply be a case of the competitor pushing out its press releases to multiple
sources; in essence, blowing its own horn. Alternatively, the competitor could be getting many people to
retweet an announcement.
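The retweet problem described above is, at bottom, a deduplication problem. The following minimal sketch (the posts and the normalization rule are hypothetical assumptions) shows how collapsing near-duplicates changes the positive-post count in the competitive analysis:

```python
# Hypothetical social media posts about a competitor; duplicates such as
# retweets and re-posted press releases inflate the raw positive count.
posts = [
    {"text": "Competitor's new phone is great!", "sentiment": "positive"},
    {"text": "RT Competitor's new phone is great!", "sentiment": "positive"},
    {"text": "Competitor's new phone is great!", "sentiment": "positive"},
    {"text": "Competitor's phone disappoints.", "sentiment": "negative"},
]

def normalize(text):
    # Strip a retweet marker and casing so duplicates collapse to one form.
    return text.removeprefix("RT ").strip().lower()

unique = {normalize(p["text"]): p for p in posts}

raw_positive = sum(p["sentiment"] == "positive" for p in posts)
deduped_positive = sum(p["sentiment"] == "positive" for p in unique.values())
```

Here the raw data show three positive posts but only one distinct positive opinion; a competitive analysis run on the raw counts would overstate the competitor's standing threefold.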
BIG DATA - ISSUES
• Big Data Changes, Especially in Data Streams. Organizations must be aware that data quality in an
analysis can change, or the data themselves can change, because the conditions under which the data are
captured can change. For example, imagine a utility company that analyzes weather data and smart-meter
data to predict customer power usage. What happens when the utility is analyzing these data in real time
and it discovers that data are missing from some of its smart meters?
MANAGING BIG DATA
• Big Data makes it possible to do many things that were previously impossible; for example, spot business
trends more rapidly and accurately, prevent disease, track crime, and so on. When properly analyzed, Big
Data can reveal valuable patterns and information that were previously hidden because of the amount of
work required to discover them. Leading corporations, such as Walmart and Google, have been able to
process Big Data for years, but only at great expense. Today’s hardware, cloud computing, and open-source software make processing Big Data affordable for most organizations.
MANAGING BIG DATA
• Managing big data involves the collection, storage, processing, analysis, and visualization of large and complex datasets
that traditional data processing tools cannot handle efficiently. Here's a detailed overview of the key aspects of
managing big data:
1. Data Collection
✓ Sources: Data can be collected from various sources such as social media, IoT devices, transactional systems, and
more.
✓ Tools: Use tools like Apache Flume, Apache Kafka, and AWS Kinesis to stream and collect data in real-time.
2. Data Storage
✓ Storage Systems: Use distributed storage systems capable of handling large volumes of data. Examples include
Hadoop Distributed File System (HDFS), Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage.
✓ Database Management Systems: NoSQL databases like Apache Cassandra, MongoDB, and HBase are often used
to store big data due to their scalability and flexibility.
MANAGING BIG DATA
3. Data Processing
✓ Batch Processing: Tools like Apache Hadoop and Apache Spark are used for processing large datasets in batches. They allow
for the distribution of data processing tasks across multiple nodes in a cluster.
✓ Stream Processing: For real-time data processing, tools like Apache Storm, Apache Flink, and Apache Kafka Streams are used
to process data as it arrives.
4. Data Analysis
✓ Data Warehousing: Solutions like Amazon Redshift, Google BigQuery, and Snowflake provide data warehousing capabilities
to store and analyze big data.
✓ Machine Learning: Big data can be analyzed using machine learning algorithms to uncover patterns and make predictions.
Tools like Apache Mahout, TensorFlow, and Spark MLlib are commonly used.
✓ Data Mining: Techniques such as clustering, classification, and association are used to extract valuable insights from big data.
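The batch-versus-stream distinction in item 3 can be sketched in plain Python (the meter readings and class names are hypothetical): batch processing aggregates a complete dataset in one pass, while stream processing updates a running result as each record arrives, the way engines such as Flink or Kafka Streams maintain running aggregates.

```python
from collections import defaultdict

readings = [("meter-1", 5), ("meter-2", 3), ("meter-1", 4)]  # hypothetical data

# Batch processing: the whole dataset is available; aggregate in one pass.
def batch_totals(data):
    totals = defaultdict(int)
    for meter, kwh in data:
        totals[meter] += kwh
    return dict(totals)

# Stream processing: records arrive one at a time; state is updated
# incrementally and an up-to-date result is available after every record.
class StreamTotals:
    def __init__(self):
        self.totals = defaultdict(int)

    def on_record(self, meter, kwh):
        self.totals[meter] += kwh
        return self.totals[meter]  # current total, available immediately

stream = StreamTotals()
for meter, kwh in readings:
    stream.on_record(meter, kwh)
```

Both approaches reach the same totals; the difference is latency. The batch job answers only after the full dataset is processed, while the stream keeps an answer current at all times, which is why velocity-sensitive uses such as recommendations favor streaming.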
MANAGING BIG DATA
5. Data Visualization
✓Visualization Tools: Tools like Tableau, Power BI, and D3.js are used to create interactive and informative
visualizations of big data.
✓Dashboards: Custom dashboards can be built to monitor key metrics and trends in real-time.
6. Data Security and Privacy
✓Encryption: Ensure data is encrypted both in transit and at rest to protect sensitive information.
✓Access Control: Implement strict access control policies to ensure that only authorized users can access
and manipulate data.
✓Compliance: Ensure compliance with data protection regulations such as GDPR, CCPA, and HIPAA.
DATA WAREHOUSE AND DATA MART
• Data Warehouse
• A data warehouse is a large, centralized repository of integrated data from various sources. It is designed to support
business intelligence activities, such as querying and analysis. Data warehouses store historical data and are optimized
for read access and complex queries, making them ideal for reporting and data analysis.
• Key Characteristics
1. Subject-Oriented: Organized around major subjects such as sales, finance, or customer data.
2. Integrated: Combines data from various sources into a consistent format.
3. Non-Volatile: Data is stable and does not change frequently. Once data is entered into the warehouse, it is not
typically updated or deleted.
4. Time-Variant: Historical data is maintained, allowing for analysis over different periods.
DATA WAREHOUSE AND DATA MART
Architecture
• ETL Process: Data is extracted from source systems, transformed to ensure consistency and quality, and loaded into
the data warehouse.
• Data Storage: Utilizes a schema design, such as star schema or snowflake schema, to organize data.
• OLAP (Online Analytical Processing): Supports complex queries and multi-dimensional analysis.
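The ETL-into-star-schema flow above can be sketched minimally in code. In the hypothetical example below (source rows, field names, and cleaning rules are all assumptions), inconsistent source rows are extracted, transformed into a consistent format, and loaded into one fact table that references dimension tables by surrogate key:

```python
# Extract: rows as they arrive from a hypothetical source system,
# with inconsistent casing and amounts stored as text.
raw_sales = [
    {"prod": "tv-1234", "store": "best buy", "amount": "2000"},
    {"prod": "tv-1234", "store": "BEST BUY", "amount": "1800"},
]

dim_product, dim_store, fact_sales = {}, {}, []

def key_of(table, value):
    # Assign a surrogate key the first time a dimension value is seen.
    return table.setdefault(value, len(table) + 1)

for row in raw_sales:
    store = row["store"].title()   # Transform: unify inconsistent casing
    amount = float(row["amount"])  # Transform: cast text to a number
    fact_sales.append({            # Load: facts reference dimension keys
        "product_key": key_of(dim_product, row["prod"]),
        "store_key": key_of(dim_store, store),
        "amount": amount,
    })
```

After the transform step, “best buy” and “BEST BUY” collapse into a single store dimension row, which is exactly the consistency a warehouse schema is meant to guarantee before analysis.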
Benefits
• Consolidates data from multiple sources, providing a unified view.
• Improves data quality and consistency.
• Enhances decision-making by providing historical insights.
• Optimizes performance for complex queries and reporting.
DATA WAREHOUSE AND DATA MART
• Data Mart
• A data mart is a subset of a data warehouse, focused on a specific business line or department, such as sales,
marketing, or finance. Data marts are designed to meet the specific needs of a particular group of users.
• Key Characteristics
1.Subject-Specific: Focuses on a single subject or area, providing targeted data.
2.Smaller in Scope: Contains a smaller volume of data compared to a data warehouse.
3.User-Friendly: Simplifies access to data for specific user groups, often with tailored interfaces.
4.Quicker Implementation: Easier and faster to implement than a full-scale data warehouse.
KNOWLEDGE MANAGEMENT
• Knowledge management (KM) is a process that helps organizations manipulate important knowledge that
comprises part of the organization’s memory, usually in an unstructured format. For an organization to be
successful, knowledge, as a form of capital, must exist in a format that can be exchanged among persons.
• It aims to enhance organizational learning, innovation, and efficiency by managing both explicit and tacit knowledge.
• Key Components of Knowledge Management
1. Knowledge Creation and Capture
• Explicit Knowledge: Documented information such as manuals, reports, and databases.
• Tacit Knowledge: Personal know-how and experiences that are harder to formalize and communicate.
• Techniques: Knowledge can be captured through interviews, surveys, documentation, and collaboration tools.
KNOWLEDGE MANAGEMENT
• A functioning KMS follows a cycle that consists of six steps (see Figure
5.13). The reason the system is cyclical is that knowledge is dynamically
refined over time. The knowledge in an effective KMS is never finalized
because the environment changes over time and knowledge must be
updated to reflect these changes. The cycle works as follows:
1. Create knowledge. Knowledge is created as people determine new
ways of doing things or develop know-how. Sometimes external knowledge
is brought in.
2. Capture knowledge. New knowledge must be identified as valuable and
be represented in a reasonable way.
KNOWLEDGE MANAGEMENT CYCLE