Business Data Analytics Part 3
Source data
Tasks
1/ Plan Data Collection
2/ Determine the Data Sets
3/ Collect Data
4/ Validate Data
Plan data collection
Planning considerations
❖ what data is needed
❖ the availability of the data
❖ the need for historical data
❖ determining when and how the data will be collected
❖ how the data will be validated once collected
What is the difference between structured and unstructured data?
Structured data is data that is organized, well-thought-out and formatted, such as data residing in a database management system (DBMS).
Structured data is easily accessed by initiating a query in a query language such as SQL (Structured Query Language).
Unstructured data is the exact opposite of structured data as it exists outside of any organized repository like a database.
Unstructured data takes on many forms and sources such as text from word processing documents, emails, social media sites, image, audio, or video files.
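To make the contrast concrete, here is a minimal sketch in Python. The table, column names, sample rows and email text are all hypothetical and chosen only for illustration. The structured data is answered with a SQL query against an in-memory SQLite database, while the unstructured text has to be parsed before any field can be used.

```python
import re
import sqlite3

# Structured: rows with named, typed columns held in a DBMS
# (an in-memory SQLite database; table and values are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Perth"), (2, "Ben", "Sydney"), (3, "Cy", "Perth")],
)
perth_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", ("Perth",)
    )
]

# Unstructured: free text has no schema, so a field such as the
# order number must be extracted (here with a regular expression).
email_body = "Hi, this is Ana from Perth. Please call me about order 1042."
order_numbers = re.findall(r"order (\d+)", email_body)
```

The query returns exactly the columns asked for, whereas the regular expression only works as long as the text happens to follow the assumed wording.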
Case study: sourcing data
Determine the data sets
A five Vs assessment helps to determine which datasets to consider:
❖ Volume
❖ Velocity
❖ Variety
❖ Veracity
❖ Value
Technique: Data modelling
Data models describe business entities and
relationships between them
Customers Orders Products
[Diagrams: the Customers, Orders and Products model shown in First Normal Form, Second Normal Form and Third Normal Form. Recoverable labels: ID, Owner_ID, Name, Address; an order line holding Order ID, Product ID and Quantity; a one-to-many (1 to *) relationship.]
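The entities named above can be sketched as a small relational schema. This is one possible normalised layout, not the slide's exact diagram: the `order_lines` table name, the `customer_id` column, the product `name` column and all sample rows are assumptions made for illustration. Order ID, Product ID and Quantity come from the slide labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships below

# One table per business entity; the order line resolves the
# many-to-many relationship between orders and products.
conn.executescript("""
CREATE TABLE customers (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT
);
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)  -- 1 customer, many orders
);
CREATE TABLE order_lines (
    order_id   INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO customers VALUES (1, 'Ana', '1 Example St')")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, "Widget"), (11, "Gadget")])
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)", [(100, 10, 2), (100, 11, 1)])

# Each fact is stored once and reassembled with joins.
rows = conn.execute("""
    SELECT c.name, p.name, ol.quantity
    FROM order_lines AS ol
    JOIN orders    AS o ON o.id = ol.order_id
    JOIN customers AS c ON c.id = o.customer_id
    JOIN products  AS p ON p.id = ol.product_id
    ORDER BY p.id
""").fetchall()
```

Because customer and product details live in their own tables, changing a customer's address touches one row rather than every order line.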
Users
● format
● attributes of interest or potential interest
● data type and data size of the attributes
Customers
● format
● new attributes that need to be created
● source attributes that need to be transformed
● creation of new custom fields
Mapping considerations
❖ Which attributes will be migrated
❖ Which new attributes need to be created in the target repository
❖ Data size
❖ New custom defined attributes that need manipulation or calculation

Target             Source                               Rules
Customers.Name     Users.First_Name, Users.Last_Name    Concatenate with space
Customers.Address  Users.Address                        N/A
Customers.Mobile   Users.Phone                          If starts with 04
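The three mapping rules can be expressed as a small transformation function. This is a sketch under stated assumptions: the function name `map_user_to_customer` and the sample records are hypothetical, and the slide does not say what happens when a phone number does not start with 04, so leaving `Mobile` empty in that case is an assumption.

```python
from typing import Optional


def map_user_to_customer(user: dict) -> dict:
    """Apply the Users -> Customers mapping rules to one source record."""
    phone = user.get("Phone") or ""
    # Rule: Users.Phone is copied to Customers.Mobile if it starts with 04.
    # Assumption: otherwise Mobile is left empty (not stated in the source).
    mobile: Optional[str] = phone if phone.startswith("04") else None
    return {
        # Rule: concatenate first and last name with a space.
        "Name": f"{user['First_Name']} {user['Last_Name']}",
        # Rule: N/A, the address is copied unchanged.
        "Address": user["Address"],
        "Mobile": mobile,
    }


# Hypothetical source records.
users = [
    {"First_Name": "Ana", "Last_Name": "Lee", "Address": "1 Example St", "Phone": "0400000000"},
    {"First_Name": "Ben", "Last_Name": "Ng", "Address": "2 Example Rd", "Phone": "0890000000"},
]
customers = [map_user_to_customer(u) for u in users]
```

Keeping each rule on its own line next to a comment makes the code easy to check against the mapping table during sign-off.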
Usage considerations
Strengths: Limitations:
Marketing team    Total marketing budget    Currency per time    $40000 per month    Martech system
Process
❖ Get alignment
❖ Get sign off
❖ Publish
❖ Maintain
Collect data
Collecting data involves the activities performed to
help with data setup, preparation, and collection.
Passive Data Collection: unobtrusive data collection from users in their day-to-day transactions with the organization.
This type of data is available without an analytics objective in mind, and a large portion of such data may already exist within the organization.
Active Data Collection: actively seeking information from stakeholders for a specific goal.
This type of data is not readily available within the organization (and requires e.g. surveys or self-reports).
Before data professionals begin collecting
large amounts of data, it may be necessary to
test the data collection approach by using a
small number of observations.
Technique: Extract, Transform, and Load (ETL)
Core principles of ETL
Data store
a collection of data where data may be read repeatedly
and where it can be stored for future use.
Process
a manual or automated activity performed for a business
reason.
Data flow
the movement of data between an external entity, a process,
and a data store.
[Diagram: data flow diagram with the labels Customer, Order, Bill, Ordering, Inventory details, Order details, Aggregated data, Reporting, Manager]
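A minimal ETL sketch in Python may help tie the three steps together. Everything specific here is an assumption for illustration: the CSV layout, the `product_totals` table and the function names are hypothetical, and real pipelines usually read from files or source systems rather than an inline string. The example extracts order details, transforms them into aggregated data, and loads the result into a data store for reporting.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source extract (order details as CSV text).
RAW_ORDER_DETAILS = """order_id,product,quantity
1,Widget,2
1,Gadget,1
2,Widget,3
"""


def extract(source_text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: clean the values and aggregate quantity per product."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["product"].strip()] += int(row["quantity"])
    return sorted(totals.items())


def load(conn, totals):
    """Load: write the aggregated data into the target data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_totals "
        "(product TEXT PRIMARY KEY, total_quantity INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO product_totals VALUES (?, ?)", totals)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_ORDER_DETAILS)))
report = conn.execute(
    "SELECT product, total_quantity FROM product_totals ORDER BY product"
).fetchall()
```

Keeping the three steps as separate functions means each one can be tested on a small number of observations before the full collection is run.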
Validate data
Validating data involves assessing that the planned data sources can and should be used and, when accessed, the data obtained are providing the types of results expected.
Characteristics of high-quality data:
1/ Accuracy
2/ Completeness
3/ Consistency
4/ Uniqueness
5/ Timeliness
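Several of these characteristics can be checked mechanically. The sketch below is illustrative only: the records, field names, allowed values and the 365-day threshold are hypothetical. It covers completeness, consistency, uniqueness and timeliness; accuracy is not included because it usually requires comparison against a trusted reference source.

```python
from datetime import date

# Hypothetical records with deliberate quality problems.
records = [
    {"id": 1, "email": "ana@example.com", "country": "AU", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "country": "au", "updated": date(2024, 5, 3)},
    {"id": 2, "email": "ben@example.com", "country": "AU", "updated": date(2022, 1, 10)},
]


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def inconsistent_values(rows, field, allowed):
    """Values that do not follow the agreed representation."""
    return sorted({r[field] for r in rows if r[field] not in allowed})


def duplicate_ids(rows, key="id"):
    """Key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        else:
            seen.add(r[key])
    return sorted(dupes)


def stale_count(rows, as_of, max_age_days):
    """Number of rows last updated more than max_age_days before as_of."""
    return sum(1 for r in rows if (as_of - r["updated"]).days > max_age_days)


email_completeness = completeness(records, "email")               # 2 of 3 rows
bad_countries = inconsistent_values(records, "country", {"AU"})   # lower-case code
repeated_ids = duplicate_ids(records)                             # id 2 appears twice
stale_rows = stale_count(records, date(2024, 6, 1), 365)          # the 2022 record
```

Checks like these support data validation; the business validation described next still depends on stakeholders who know whether the values are right.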
Two types of validation
Data validation may be performed by a data analyst, data scientist, or business analysis practitioner with sufficient skills to use the necessary tools to access data and the underlying competencies to analyze the results.
Business validation is performed by key stakeholders with the authority to approve data sources for use in analytics initiatives and the knowledge to assess data accuracy.