Business Analytics Unit I

What is business analytics?

Business analytics (BA) is a set of disciplines and technologies for solving business problems using data analysis, statistical models and other quantitative methods. It involves an iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis, to drive decision-making.

Data-driven companies treat their data as a business asset and actively look for
ways to turn it into a competitive advantage. Success with business analytics
depends on data quality, skilled analysts who understand the technologies and the
business, and a commitment to using data to gain insights that inform business
decisions.

How business analytics works


Before any data analysis takes place, BA starts with several foundational
processes:

 Determine the business goal of the analysis.
 Select an analysis methodology.
 Get business data to support the analysis, often from various systems and sources.
 Cleanse and integrate data into a single repository, such as a data warehouse or data mart.

Initial analysis is typically performed on a smaller sample of the data.


Analytics tools range from spreadsheets with statistical functions to complex data
mining and predictive modeling applications. Patterns and relationships in the raw
data are revealed. Then new questions are asked, and the analytic process iterates
until the business goal is met.

Deployment of predictive models involves a statistical process known as scoring and uses records typically located in a database. Scores help enterprises make more informed, real-time decisions within applications and business processes.
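As a rough illustration of scoring, the Python sketch below pulls records from a database, applies a previously fitted model and attaches a score to each record. The SQLite file "crm.db", the "customers" table, its columns and the use of pandas and scikit-learn are assumptions made purely for the example.

```python
# Minimal scoring sketch; database, table and column names are invented.
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression

conn = sqlite3.connect("crm.db")

# Fit a simple churn model on historical records (schematic).
history = pd.read_sql(
    "SELECT tenure, monthly_spend, churned FROM customers WHERE churned IS NOT NULL", conn)
model = LogisticRegression().fit(history[["tenure", "monthly_spend"]], history["churned"])

# "Scoring": apply the fitted model to current records pulled from the database.
current = pd.read_sql("SELECT customer_id, tenure, monthly_spend FROM customers", conn)
current["churn_score"] = model.predict_proba(current[["tenure", "monthly_spend"]])[:, 1]
print(current.sort_values("churn_score", ascending=False).head())
```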
What is Data Science
Data Science is a field that deals with extracting meaningful information and insights by applying various algorithms, preprocessing techniques and scientific methods to structured and unstructured data. The field is related to Artificial Intelligence and is currently one of the most in-demand skills. Data science combines mathematics, computation, statistics, programming, etc., to gain meaningful insights from the large amounts of data provided in various formats.

What is Data Analytics

Data Analytics is used to draw conclusions by processing raw data. It is helpful in various businesses because it helps a company make decisions based on the conclusions drawn from the data. Essentially, data analytics converts a large number of figures into plain-English conclusions that support in-depth decision-making. Below is a table of differences between Data Science and Data Analytics:
Difference Between Data Science and Data Analytics

There is a significant difference between Data Science and Data Analytics. We will see them one by one for each feature.

Feature | Data Science | Data Analytics
Coding Language | Python is the most commonly used language for data science, along with other languages such as C++, Java and Perl. | Knowledge of Python and the R language is essential for data analytics.
Programming Skills | In-depth knowledge of programming is required for data science. | Basic programming skills are necessary for data analytics.
Use of Machine Learning | Data science makes use of machine learning algorithms to get insights. | Data analytics does not use machine learning to get insights from data.
Other Skills | Data science makes use of data mining activities to obtain meaningful insights. | Hadoop-based analysis is used for getting conclusions from raw data.
Scope | The scope of data science is large. | The scope of data analysis is micro, i.e., small.
Goals | Data science deals with exploration and new innovations. | Data analysis makes use of existing resources.
Data Type | Data science mostly deals with unstructured data. | Data analytics deals with structured data.
Statistical Skills | Statistical skills are necessary in the field of data science. | Statistical skills are of minimal or no use in data analytics.

Data analytics skills

Data analytics requires a wide range of skills to be performed effectively. According to search and enrollment data among Coursera’s community of 87 million global learners, these are the top in-demand data analytics skills, as of December 2021:
 Structured Query Language (SQL), a programming language commonly used for
databases
 Statistical programming languages, such as R and Python, commonly used to create
advanced data analysis programs
 Machine learning, a branch of artificial intelligence that involves using algorithms to
spot data patterns
 Probability and statistics, in order to better analyze and interpret data trends
 Data management, or the practices around collecting, organizing and storing data
 Data visualization, or the ability to use charts and graphs to tell a story with data
 Econometrics, or the ability to use data trends to create mathematical models that forecast future trends based on historical data (a short sketch illustrating a few of these skills appears after this list)
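As a small, hedged illustration of a few of these skills working together (SQL, descriptive statistics and visualization), the Python sketch below assumes an invented SQLite database "sales.db" with an "orders" table; none of these names come from the text.

```python
# Illustrative only: a tiny pass that touches SQL, statistics and visualization.
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

conn = sqlite3.connect("sales.db")

# SQL: pull the raw records we want to analyze.
orders = pd.read_sql("SELECT order_date, region, amount FROM orders", conn)

# Probability and statistics: summarize the distribution of order amounts.
print(orders["amount"].describe())

# Data visualization: total sales per region as a simple bar chart.
orders.groupby("region")["amount"].sum().plot(kind="bar", title="Sales by region")
plt.tight_layout()
plt.show()
```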

While careers in data analytics require a certain amount of technical knowledge,


approaching the above skills methodically—for example by learning a little bit each
day or learning from your mistakes—can help lead to mastery, and it’s never too late
to get started.

Life Cycle Phases of Data Analytics

In this tutorial, we will go over the different phases of the data analytics life cycle and then discuss each phase in detail.

Life Cycle of Data Analytics

The data analytics lifecycle was designed to address Big Data problems and data science projects. The process is iterative, reflecting the way real projects unfold. To address the specific demands of conducting analysis on Big Data, a step-by-step methodology is required to plan the various tasks associated with the acquisition, processing, analysis, and reuse of data.

Phase 1: Discovery -

o The data science team learns the business domain and researches the problem.
o Create context and gain understanding.
o Learn about the data sources that are needed and accessible to the project.
o The team comes up with an initial hypothesis, which can later be confirmed with evidence.

Phase 2: Data Preparation -

o The team investigates the possibilities for pre-processing, analysing, and preparing data before analysis and modelling.
o An analytic sandbox is required. The team extracts, loads, and transforms data to bring it into the sandbox.
o Data preparation tasks can be repeated and do not follow a predetermined sequence.
o Some of the tools commonly used for this process include Hadoop, Alpine Miner, OpenRefine, etc.

Phase 3: Model Planning -

o The team studies the data to discover the connections between variables. Later, it selects the most significant variables as well as the most effective models.
o In this phase, the data science team creates data sets that can be used for training, testing, and production purposes.
o The team builds and implements models based on the work completed in the model planning phase.
o Some of the tools commonly used for this stage are MATLAB and STATISTICA.

Phase 4: Model Building -

o The team creates datasets for training, testing, and production use (see the short sketch after this list).
o The team also evaluates whether its current tools are sufficient to run the models or whether an even more robust environment is required.
o Free or open-source tools - R and PL/R, Octave, WEKA.
o Commercial tools - MATLAB, STATISTICA.
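The sketch below illustrates the Phase 4 idea of separate training and testing datasets. It uses Python with scikit-learn on synthetic data, which is an assumption for illustration only; the text itself names tools such as R, Octave, WEKA, MATLAB, and STATISTICA.

```python
# Minimal model-building sketch on a synthetic data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# Separate datasets for training and testing, as described in Phase 4.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```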

Phase 5: Communication Results -

o Following the execution of the model, team members will need to evaluate
the outcomes of the model to establish criteria for the success or failure of
the model.
o The team is considering how best to present findings and outcomes to the
various members of the team and other stakeholders while taking into
consideration cautionary tales and assumptions.
o The team should determine the most important findings, quantify their value
to the business and create a narrative to present findings and summarize
them to all stakeholders.

Phase 6: Operationalize -

o The team distributes the benefits of the project to a wider audience. It sets up
a pilot project that will deploy the work in a controlled manner prior to
expanding the project to the entire enterprise of users.
o This technique allows the team to gain insight into the performance and
constraints related to the model within a production setting at a small scale
and then make necessary adjustments before full deployment.
o The team produces the final reports, presentations, and code.
o Open-source or free tools used at this stage include WEKA, SQL, MADlib, and Octave.
Types of Data Analytics

There are four major types of data analytics:


1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics

1.Predictive Analytics
Predictive analytics turns data into valuable, actionable information. It uses data to determine the probable outcome of an event or the likelihood of a situation occurring. Predictive analytics draws on a variety of statistical techniques from modeling, machine learning, data mining, and game theory that analyze current and historical facts to make predictions about a future event (a minimal regression sketch follows the lists below). Techniques that are used for predictive analytics are:
 Linear Regression
 Time Series Analysis and Forecasting
 Data Mining
Basic Corner Stones of Predictive Analytics
 Predictive modeling
 Decision Analysis and optimization
 Transaction profiling
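A minimal predictive-analytics sketch using linear regression is shown below; the advertising-spend and sales figures are invented purely to illustrate fitting a model on historical data and forecasting a future value.

```python
# Fit on historical observations, then forecast a future value.
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[10.0], [15.0], [20.0], [25.0], [30.0]])   # historical spend
sales = np.array([110.0, 135.0, 162.0, 185.0, 211.0])           # historical sales

model = LinearRegression().fit(ad_spend, sales)
print("forecast sales at spend=40:", model.predict([[40.0]])[0])
```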
2.Descriptive Analytics
Descriptive analytics looks at data and analyzes past events for insight into how to approach future events. It looks at past performance and understands it by mining historical data to understand the causes of success or failure in the past. Almost all management reporting, such as sales, marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify customers or prospects into groups. Unlike a predictive model that focuses on predicting the behavior of a single customer, descriptive analytics identifies many different relationships between customers and products (a short descriptive-statistics sketch follows the list below).
Common examples of Descriptive analytics are company reports that provide
historic reviews like:
 Data Queries
 Reports
 Descriptive Statistics
 Data dashboard
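Below is a small descriptive-statistics sketch in Python with pandas; the sales table is invented and simply shows the kind of historic summary a management report might contain.

```python
# Descriptive analytics: summarize what already happened.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "West"],
    "revenue": [120.0, 95.0, 140.0, 80.0, 160.0],
})

print(sales["revenue"].describe())                 # descriptive statistics
print(sales.groupby("region")["revenue"].sum())    # a typical historic-review report
```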

3.Prescriptive Analytics
Prescriptive analytics automatically synthesizes big data, mathematical science, business rules, and machine learning to make a prediction and then suggests a decision option to take advantage of the prediction.
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that benefit from the predictions and showing the decision maker the implication of each decision option. Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk, and it can illustrate the implication of each decision option.
For example, prescriptive analytics can benefit healthcare strategic planning by using analytics to leverage operational and usage data combined with data on external factors such as economic data, population demographics, etc.
4.Diagnostic Analytics
In this type of analysis, we generally use historical data over other data to answer a question or solve a particular problem. We try to find dependencies and patterns in the historical data of the particular problem.
For example, companies go for this analysis because it gives great insight into a problem, and it also requires them to keep detailed information at their disposal; otherwise, data collection may have to be done individually for every problem, which would be very time-consuming (a small correlation sketch follows the list below). Common techniques used for Diagnostic Analytics are:
 Data discovery
 Data mining
 Correlations
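A small diagnostic sketch follows: computing a correlation matrix over invented historical data to look for variables that move together with the outcome of interest.

```python
# Diagnostic analytics: look for dependencies and patterns in historical data.
import pandas as pd

history = pd.DataFrame({
    "ad_spend":    [10, 12, 9, 15, 18, 20],
    "site_visits": [200, 230, 180, 260, 300, 320],
    "sales":       [110, 125, 100, 150, 170, 185],
})

# The correlation matrix highlights which variables track the rise or fall in sales.
print(history.corr())
```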

Future Scope of Data Analytics

1. Retail: To study sales patterns, consumer behavior, and inventory


management, data analytics can be applied in the retail sector. Data analytics
can be used by retailers to make data-driven decisions regarding what products
to stock, how to price them, and how to best organize their stores.
2. Healthcare: Data analytics can be used to evaluate patient data, spot trends in
patient health, and create individualized treatment regimens. Data analytics
can be used by healthcare companies to enhance patient outcomes and lower
healthcare expenditures.
3. Finance: In the field of finance, data analytics can be used to evaluate
investment data, spot trends in the financial markets, and make wise
investment decisions. Data analytics can be used by financial institutions to
lower risk and boost the performance of investment portfolios.
4. Marketing: By analyzing customer data, spotting trends in consumer behavior,
and creating customized marketing strategies, data analytics can be used in
marketing. Data analytics can be used by marketers to boost the efficiency of
their campaigns and their overall impact.
5. Manufacturing: Data analytics can be used to examine production data, spot
trends in production methods, and boost production efficiency in the
manufacturing sector. Data analytics can be used by manufacturers to cut costs
and enhance product quality.
6. Transportation: To evaluate logistics data, spot trends in transportation routes,
and improve transportation routes, the transportation sector can employ data
analytics. Data analytics can help transportation businesses cut expenses and
speed up delivery times.
What is Data Collection?

Data collection is the process of collecting and evaluating information or data from
multiple sources to find answers to research problems, answer questions, evaluate
outcomes, and forecast trends and probabilities. It is an essential phase in all types
of research, analysis, and decision-making, including that done in the social
sciences, business, and healthcare.

Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity.

During data collection, the researchers must identify the data types, the sources of
data, and what methods are being used. We will soon see that there are many
different data collection methods. There is heavy reliance on data collection in
research, commercial, and government fields.

Before an analyst begins collecting data, they must answer three questions first:

 What’s the goal or purpose of this research?


 What kinds of data are they planning on gathering?
 What methods and procedures will be used to collect, store, and process the
information?
Additionally, we can break up data into qualitative and quantitative types.
Qualitative data covers descriptions such as color, size, quality, and appearance.
Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll
numbers, percentages, etc.

Why Do We Need Data Collection?

Before a judge makes a ruling in a court case or a general creates a plan of attack,
they must have as many relevant facts as possible. The best courses of action come
from informed decisions, and information and data are synonymous.

The concept of data collection isn’t a new one, as we’ll see later, but the world has
changed. There is far more data available today, and it exists in forms that were
unheard of a century ago. The data collection process has had to change and grow
with the times, keeping pace with technology.
Whether you’re in the world of academia, trying to conduct research, or part of the
commercial sector, thinking of how to promote a new product, you need data
collection to help you make better choices.

Now that you know what is data collection and why we need it, let's take a look at
the different methods of data collection. While the phrase “data collection” may
sound all high-tech and digital, it doesn’t necessarily entail things like
computers, big data, and the internet. Data collection could mean a telephone
survey, a mail-in comment card, or even some guy with a clipboard asking
passersby some questions. But let’s see if we can sort the different data collection
methods into a semblance of organized categories.

What Are the Different Data Collection Methods?

Primary and secondary methods of data collection are two approaches used to
gather information for research or analysis purposes. Let's explore each data
collection method in detail:

1. Primary Data Collection:

Primary data collection involves the collection of original data directly from the
source or through direct interaction with the respondents. This method allows
researchers to obtain firsthand information specifically tailored to their research
objectives. There are various techniques for primary data collection, including:

a. Surveys and Questionnaires: Researchers design structured questionnaires or


surveys to collect data from individuals or groups. These can be conducted through
face-to-face interviews, telephone calls, mail, or online platforms.

b. Interviews: Interviews involve direct interaction between the researcher and the
respondent. They can be conducted in person, over the phone, or through video
conferencing. Interviews can be structured (with predefined questions), semi-
structured (allowing flexibility), or unstructured (more conversational).

c. Observations: Researchers observe and record behaviors, actions, or events in


their natural setting. This method is useful for gathering data on human behavior,
interactions, or phenomena without direct intervention.
d. Experiments: Experimental studies involve the manipulation of variables to
observe their impact on the outcome. Researchers control the conditions and
collect data to draw conclusions about cause-and-effect relationships.

e. Focus Groups: Focus groups bring together a small group of individuals who
discuss specific topics in a moderated setting. This method helps in understanding
opinions, perceptions, and experiences shared by the participants.

2. Secondary Data Collection:

Secondary data collection involves using existing data collected by someone else
for a purpose different from the original intent. Researchers analyze and interpret
this data to extract relevant information. Secondary data can be obtained from
various sources, including:

a. Published Sources: Researchers refer to books, academic journals, magazines,


newspapers, government reports, and other published materials that contain
relevant data.

b. Online Databases: Numerous online databases provide access to a wide range of


secondary data, such as research articles, statistical information, economic data,
and social surveys.

c. Government and Institutional Records: Government agencies, research


institutions, and organizations often maintain databases or records that can be used
for research purposes.

d. Publicly Available Data: Data shared by individuals, organizations, or


communities on public platforms, websites, or social media can be accessed and
utilized for research.

e. Past Research Studies: Previous research studies and their findings can serve as
valuable secondary data sources. Researchers can review and analyze the data to
gain insights or build upon existing knowledge.

Data Collection Tools

Now that we’ve explained the various techniques, let’s narrow our focus even
further by looking at some specific tools. For example, we mentioned interviews as
a technique, but we can further break that down into different interview types (or
“tools”).

 Word Association

The researcher gives the respondent a set of words and asks them what comes to
mind when they hear each word.

 Sentence Completion

Researchers use sentence completion to understand what kind of ideas the


respondent has. This tool involves giving an incomplete sentence and seeing how
the interviewee finishes it.

 Role-Playing

Respondents are presented with an imaginary situation and asked how they would
act or react if it was real.

 In-Person Surveys

The researcher asks questions in person.

 Online/Web Surveys

These surveys are easy to accomplish, but some users may be unwilling to answer
truthfully, if at all.

 Mobile Surveys

These surveys take advantage of the increasing proliferation of mobile technology.


Mobile collection surveys rely on mobile devices like tablets or smartphones to
conduct surveys via SMS or mobile apps.

 Phone Surveys

No researcher can call thousands of people at once, so they need a third party to
handle the chore. However, many people have call screening and won’t answer.
 Observation

Sometimes, the simplest method is the best. Researchers who make direct
observations collect data quickly and easily, with little intrusion or third-party bias.
Naturally, it’s only effective in small-scale situations.

The Importance of Ensuring Accurate and Appropriate Data Collection

Accurate data collecting is crucial to preserving the integrity of research,


regardless of the subject of study or preferred method for defining data
(quantitative, qualitative). Errors are less likely to occur when the right data
gathering tools are used (whether they are brand-new ones, updated versions of
them, or already available).

The effects of incorrectly performed data collection include the following -

 Erroneous conclusions that squander resources


 Decisions that compromise public policy
 Incapacity to correctly respond to research inquiries
 Bringing harm to participants who are humans or animals
 Deceiving other researchers into pursuing futile research avenues
 The study's inability to be replicated and validated
Although the degree of influence from flawed data collection varies by discipline and type of investigation, there is the potential for disproportionate harm when such study findings are used to support recommendations for public policy.

Let us now look at the various issues that we might face while maintaining the
integrity of data collection.

Issues Related to Maintaining the Integrity of Data Collection


The main justification for maintaining data integrity is to support the detection of errors in the data gathering process, whether they were made purposefully (deliberate falsifications) or not (systematic or random errors).

Quality assurance and quality control are two strategies that help protect data
integrity and guarantee the scientific validity of study results.

Each strategy is used at various stages of the research timeline:

 Quality control - tasks that are performed both after and during data collecting
 Quality assurance - events that happen before data gathering starts
Let us explore each of them in more detail now.

Quality Assurance

Since quality assurance comes before data collection, its primary goal is "prevention" (i.e., forestalling problems with data collection). Prevention is the best way to protect the accuracy of data collection. The uniformity of protocol created in a thorough and exhaustive procedures manual for data collection serves as the best example of this proactive step.

The likelihood of failing to spot issues and mistakes early in the research attempt
increases when guides are written poorly. There are several ways to show these
shortcomings:

 Failure to determine the precise subjects and methods for training or retraining staff members in data collection
 An incomplete list of the items to be collected
 No system in place to track modifications to procedures that may occur as the investigation continues
 A vague description of the data gathering tools that will be employed, instead of detailed, step-by-step instructions on how to administer tests
 Uncertainty regarding the date, procedure, and identity of the person or people in charge of examining the data
 Incomprehensible guidelines for using, adjusting, and calibrating the data collection equipment.
Now, let us look at how to ensure Quality Control.

Quality Control

Despite the fact that quality control actions (detection/monitoring and intervention)
take place both after and during data collection, the specifics should be
meticulously detailed in the procedures manual. Establishing monitoring systems
requires a specific communication structure, which is a prerequisite. Following the
discovery of data collection problems, there should be no ambiguity regarding the
information flow between the primary investigators and staff personnel. A poorly
designed communication system promotes slack oversight and reduces
opportunities for error detection.

Direct staff observation during site visits, conference calls, or frequent or routine assessments of data reports to spot discrepancies, excessive numbers, or invalid codes can all be used as forms of detection or monitoring. Site visits might not be appropriate for all disciplines. Still, without routine auditing of records, whether qualitative or quantitative, it will be challenging for investigators to confirm that data gathering is taking place in accordance with the manual's defined methods.
Additionally, quality control determines the appropriate solutions, or "actions," to
fix flawed data gathering procedures and reduce recurrences.

Problems with data collection, for instance, that call for immediate action include:

 Fraud or misbehavior
 Systematic mistakes, procedure violations
 Individual data items with errors
 Issues with certain staff members or a site's performance
In the social and behavioral sciences, where primary data collection entails using human subjects, researchers are trained to include one or more secondary measures that can be used to verify the quality of information being obtained from the human subject.

For instance, a researcher conducting a survey might be interested in learning more about the prevalence of risky behaviors among young adults, as well as the social factors that influence the propensity for and frequency of these risky behaviors.
Let us now explore the common challenges with regard to data collection.
What are Common Challenges in Data Collection?

There are some prevalent challenges faced while collecting data, let us explore a
few of them to understand them better and avoid them.

1.Data Quality Issues

The main threat to the broad and successful application of machine learning is poor data quality. Data quality must be your top priority if you want to make technologies like machine learning work for you. Some of the most prevalent data quality problems, and how to fix them, are discussed below.

2.Inconsistent Data

When working with various data sources, it's conceivable that the same
information will have discrepancies between sources. The differences could be in
formats, units, or occasionally spellings. The introduction of inconsistent data
might also occur during firm mergers or relocations. Inconsistencies in data have a
tendency to accumulate and reduce the value of data if they are not continually
resolved. Organizations that have heavily focused on data consistency do so
because they only want reliable data to support their analytics.

3.Data Downtime

Data is the driving force behind the decisions and operations of data-driven
businesses. However, there may be brief periods when their data is unreliable or
not prepared. Customer complaints and subpar analytical outcomes are only two
ways that this data unavailability can have a significant impact on businesses. A
data engineer spends about 80% of their time updating, maintaining, and
guaranteeing the integrity of the data pipeline. In order to ask the next business
question, there is a high marginal cost due to the lengthy operational lead time
from data capture to insight.

Schema modifications and migration problems are just two examples of the causes
of data downtime. Data pipelines can be difficult due to their size and complexity.
Data downtime must be continuously monitored, and it must be reduced through
automation.
4.Ambiguous Data

Even with thorough oversight, some errors can still occur in massive databases or
data lakes. For data streaming at a fast speed, the issue becomes more
overwhelming. Spelling mistakes can go unnoticed, formatting difficulties can
occur, and column heads might be deceptive. This unclear data might cause a
number of problems for reporting and analytics.

What are the Key Steps in the Data Collection Process?

In the Data Collection Process, there are 5 key steps. They are explained briefly
below -

1. Decide What Data You Want to Gather

The first thing that we need to do is decide what information we want to gather.
We must choose the subjects the data will cover, the sources we will use to gather
it, and the quantity of information that we would require. For instance, we may
choose to gather information on the categories of products that an average e-
commerce website visitor between the ages of 30 and 45 most frequently searches
for.

2. Establish a Deadline for Data Collection

The process of creating a strategy for data collection can now begin. We should set
a deadline for our data collection at the outset of our planning phase. Some forms
of data we might want to continuously collect. We might want to build up a
technique for tracking transactional data and website visitor statistics over the long
term, for instance. However, we will track the data throughout a certain time frame
if we are tracking it for a particular campaign. In these situations, we will have a
schedule for when we will begin and finish gathering data.

3. Select a Data Collection Approach

We will select the data collection technique that will serve as the foundation of our
data gathering plan at this stage. We must take into account the type of information
that we wish to gather, the time period during which we will receive it, and the
other factors we decide on to choose the best gathering strategy.
4. Gather Information

Once our plan is complete, we can put our data collection plan into action and begin gathering data. We can store and arrange our data in our data management platform (DMP). We need to be careful to follow our plan and keep an eye on how it's doing.
are collecting data regularly, setting up a timetable for when we will be checking in
on how our data gathering is going may be helpful. As circumstances alter and we
learn new details, we might need to amend our plan.

5. Examine the Information and Apply Your Findings

It's time to examine our data and arrange our findings after we have gathered all of
our information. The analysis stage is essential because it transforms unprocessed
data into insightful knowledge that can be applied to better our marketing plans,
goods, and business judgments. The analytics tools included in our DMP can be
used to assist with this phase. We can put the discoveries to use to enhance our
business once we have discovered the patterns and insights in our data.

What is data preparation?

Data preparation is the process of gathering, combining, structuring and organizing data so it can be used in business intelligence (BI), analytics and data visualization applications. The components of data preparation include data preprocessing, profiling, cleansing, validation and transformation; it often also involves pulling together data from different internal systems and external sources.

Purposes of data preparation

One of the primary purposes of data preparation is to ensure that raw data being
readied for processing and analysis is accurate and consistent so the results of BI
and analytics applications will be valid. Data is commonly created with missing
values, inaccuracies or other errors, and separate data sets often have different
formats that need to be reconciled when they're combined. Correcting data errors,
validating data quality and consolidating data sets are big parts of data preparation
projects.
Data preparation also involves finding relevant data to ensure that analytics
applications deliver meaningful information and actionable insights for business
decision-making. The data often is enriched and optimized to make it more
informative and useful -- for example, by blending internal and external data sets,
creating new data fields, eliminating outlier values and addressing imbalanced data
sets that could skew analytics results.

What are the benefits of data preparation?

Data scientists often complain that they spend most of their time gathering,
cleansing and structuring data instead of analyzing it. A big benefit of an effective
data preparation process is that they and other end users can focus more on data
mining and data analysis -- the parts of their job that generate business value. For
example, data preparation can be done more quickly, and prepared data can
automatically be fed to users for recurring analytics applications.

Done properly, data preparation also helps an organization do the following:

 ensure the data used in analytics applications produces reliable results;


 identify and fix data issues that otherwise might not be detected;
 enable more informed decision-making by business executives and operational
workers;
 reduce data management and analytics costs;
 avoid duplication of effort in preparing data for use in multiple applications;
and
 get a higher ROI from BI and analytics initiatives.

Effective data preparation is particularly beneficial in big data environments that


store a combination of structured, semistructured and unstructured data, often in
raw form until it's needed for specific analytics uses. Those uses include predictive
analytics, machine learning (ML) and other forms of advanced analytics that
typically involve large amounts of data to prepare. For example, in an article
on preparing data for machine learning, Felix Wick, corporate vice president of
data science at supply chain software vendor Blue Yonder, is quoted as saying that
data preparation "is at the heart of ML."

Steps in the data preparation process

Data preparation is done in a series of steps. There's some variation in the data
preparation steps listed by different data professionals and software vendors, but
the process typically involves the following tasks:

1. Data collection. Relevant data is gathered from operational systems, data


warehouses, data lakes and other data sources. During this step, data scientists,
members of the BI team, other data professionals and end users who collect
data should confirm that it's a good fit for the objectives of the planned
analytics applications.
2. Data discovery and profiling. The next step is to explore the collected data to
better understand what it contains and what needs to be done to prepare it for
the intended uses. To help with that, data profiling identifies patterns,
relationships and other attributes in the data, as well as inconsistencies,
anomalies, missing values and other issues so they can be addressed.
3. Data cleansing. Next, the identified data errors and issues are corrected to
create complete and accurate data sets. For example, as part of cleansing data
sets, faulty data is removed or fixed, missing values are filled in and
inconsistent entries are harmonized.
4. Data structuring. At this point, the data needs to be modeled and organized to
meet the analytics requirements. For example, data stored in comma-separated
values (CSV) files or other file formats has to be converted into tables to make
it accessible to BI and analytics tools.
5. Data transformation and enrichment. In addition to being structured, the data typically must be transformed into a unified and usable format. For example, data transformation may involve creating new fields or columns that aggregate values from existing ones (see the short sketch after this list). Data enrichment further enhances and optimizes data sets as needed, through measures such as augmenting and adding data.
6. Data validation and publishing. In this last step, automated routines are run
against the data to validate its consistency, completeness and accuracy. The
prepared data is then stored in a data warehouse, a data lake or another
repository and either used directly by whoever prepared it or made available for
other users to access.
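To make the cleansing, structuring, transformation and validation steps concrete, here is a small pandas sketch; the file names and columns are invented, and the specific choices (median imputation, a derived month column) are illustrative rather than prescribed.

```python
# A small data-preparation pass over an invented orders file.
import pandas as pd

# Collection / structuring: load a CSV file into a table.
orders = pd.read_csv("raw_orders.csv")   # e.g. columns: order_id, region, amount, order_date

# Profiling: inspect types, missing values and obvious anomalies.
print(orders.info())
print(orders.isna().sum())

# Cleansing: fill missing values and harmonize inconsistent entries.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())
orders["region"] = orders["region"].str.strip().str.title()

# Transformation / enrichment: derive a new field from existing ones.
orders["order_month"] = pd.to_datetime(orders["order_date"]).dt.to_period("M")

# Validation and publishing: simple automated check, then store the prepared set.
assert orders["amount"].ge(0).all(), "negative order amounts found"
orders.to_csv("prepared_orders.csv", index=False)
```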

Data preparation can also incorporate or feed into data curation work that creates
and oversees ready-to-use data sets for BI and analytics. Data curation involves
tasks such as indexing, cataloging and maintaining data sets and their associated
metadata to help users find and access the data. In some organizations, data curator
is a formal role that works collaboratively with data scientists, business analysts,
other users and the IT and data management teams. In others, data may be curated
by data stewards, data engineers, database administrators or data scientists and
business users themselves.
What are the challenges of data preparation?

Data preparation is inherently complicated. Data sets pulled together from different
source systems are highly likely to have numerous data quality, accuracy and
consistency issues to resolve. The data also must be manipulated to make it usable,
and irrelevant data needs to be weeded out. As noted above, it's a time-consuming
process: The 80/20 rule is often applied to analytics applications, with about 80%
of the work said to be devoted to collecting and preparing data and only 20% to
analyzing it.

In an article on common data preparation challenges, Rick Sherman, managing partner of consulting firm Athena IT Solutions, detailed the following seven challenges along with advice on how to overcome each of them:

 Inadequate or nonexistent data profiling. If data isn't properly profiled, errors,


anomalies and other problems might not be identified, which can result in
flawed analytics.
 Missing or incomplete data. Data sets often have missing values and other
forms of incomplete data; such issues need to be assessed as possible errors and
addressed if so.
 Invalid data values. Misspellings, other typos and wrong numbers are examples
of invalid entries that frequently occur in data and must be fixed to ensure
analytics accuracy.
 Name and address standardization. Names and addresses may be inconsistent in
data from different systems, with variations that can affect views of customers
and other entities.
 Inconsistent data across enterprise systems. Other inconsistencies in data sets
drawn from multiple source systems, such as different terminology and unique
identifiers, are also a pervasive issue in data preparation efforts.
 Data enrichment. Deciding how to enrich a data set -- for example, what to add
to it -- is a complex task that requires a strong understanding of business needs
and analytics goals.
 Maintaining and expanding data prep processes. Data preparation work often
becomes a recurring process that needs to be sustained and enhanced on an
ongoing basis.

Data preparation tools and the self-service data prep market

Data preparation can pull skilled BI, analytics and data management practitioners
away from more high-value work, especially as the volume of data used in
analytics applications continues to grow. However, various software vendors have
introduced self-service tools that automate data preparation methods, enabling both
data professionals and business users to get data ready for analysis in a streamlined
and interactive way.

The self-service data preparation tools run data sets through a workflow to apply
the operations and functions outlined in the previous section. They also feature
graphical user interfaces (GUIs) designed to further simplify those steps. As
Donald Farmer, principal at consultancy TreeHive Strategy, wrote in an article on
self-service data preparation (linked to above), people outside of IT can use the
self-service software "to do the work of sourcing data, shaping it and cleaning it
up, frequently from simple-to-use desktop or cloud applications."

In a July 2021 report on emerging data management technologies, consulting firm


Gartner gave data preparation tools a "High" rating on benefits for users but said
they're still in the "early mainstream" stage of maturity. On the plus side, the tools
can reduce the time it takes to start analyzing data and help drive increased data
sharing, user collaboration and data science experimentation, Gartner said.

But, it added, some tools lack the ability to scale from individual self-service
projects to enterprise-level ones or to exchange metadata with other data
management technologies, such as data quality software. Gartner recommended
that organizations evaluate products partly on those features. It also cautioned
against looking at data preparation software as a replacement for traditional data
integration technologies, particularly extract, transform and load (ETL) tools.

Several vendors that focused on self-service data preparation have now been
acquired by other companies; Trifacta, the last of the best-known data prep
specialists, agreed to be bought by analytics and data management software
provider Alteryx in early 2022. Alteryx itself already supports data preparation in
its software platform. Other prominent BI, analytics and data management vendors
that offer data preparation tools or capabilities include the following:

 Altair
 Boomi
 Datameer
 DataRobot
 IBM
 Informatica
 Microsoft
 Precisely
 SAP
 SAS
 Tableau
 Talend
 Tamr
 Tibco Software

Data preparation software typically provides capabilities that cover the preparation steps described above.


DATA COLLECTION

In business analytics, data collection methods play a vital role in data gathering.
The techniques used in business analytics can be broadly classified into two main
types: qualitative and quantitative. Qualitative techniques are used to gather
descriptive data, while quantitative techniques are used to collect data that can be
analyzed statistically.

Qualitative data collection methods include interviews, focus groups, and


observations. Quantitative data collection methods include surveys, questionnaires,
and experiments.

Data collection is a process of gathering information from various sources. It can


be done manually or through automated means. The data can be gathered from
primary or secondary sources.

Primary data is the data that is collected directly from the source. It is collected
through surveys, interviews, focus groups, and observations.

Secondary data is the data that is already available and has been collected by
someone else. It can be gathered from sources like books, articles, websites, and
government reports.

The data collected through business analytics can be used to make decisions about
various aspects of the business, like marketing, product development, and human
resources. The data can also be used to improve the efficiency of business
processes.

7 BUSINESS ANALYTICS DATA COLLECTION METHODS

1. Surveys

Physical or digital questionnaires are used in surveys to gather qualitative as well as quantitative data from participants. Internet surveys offer the chance for widespread distribution and could also be reasonably priced. Using a free application can make conducting a survey completely free, according to the US Bureau of Labor Statistics.
When designing and conducting surveys, be aware of bias’s effects, which include:

 Collection prejudice:

It is possible to unintentionally craft biased survey questions. When constructing


questions, be mindful of this to ensure that your respondents respond truthfully and
aren’t influenced by your wording.

 Contextual bias:

Your respondents’ replies can be skewed toward what is socially acceptable


because they know you will read them. To get the entire picture, think about
combining survey data with behavioural information from other techniques of data
collecting.

2. Transactional Tracking

Keeping track of transactional data might help you better understand your consumer base and make judgments about focused marketing campaigns.

E-commerce and point-of-sale platforms frequently provide the ability to save data as soon as it is collected, making this a seamless data-collecting technique that can pay off in the form of customer insights.

3. Interviews and Focus Groups

You can employ both focus groups and interviews to collect qualitative and quantitative data. Focus groups usually consist of multiple persons, whereas interviews are normally conducted one-on-one. Observing participants' interactions with your product in real time and recording their emotions and inquiries can yield insightful information.

Like surveys, focus groups and interviews are data collection techniques that allow you to inquire about individuals' thoughts, motivations, and emotions surrounding your brand or product. You can also employ a facilitator to plan and carry out the interviews on your behalf.
4. Observation

Due to the candour it provides, observing users engage with your product or website can be helpful for data collection. You can see in real time whether your customer experience is challenging or unclear.

However, organising observational sessions can take time and effort. You can
monitor a user’s involvement with a beta version of your website or product by
using a third-party service to capture users’ navigation across your site.

Observations give you the opportunity to examine how users engage with your
product or website directly. However, they are less accessible than other data
collection techniques. To enhance and build on areas of success, you can use the
qualitative and quantitative data collected from this.

5. Online Tracking

Using pixels and cookies, you can collect behavioural data. Both of these
programmes track users’ online activity across several websites and give
information about the material they are most interested in and interact with.

Additionally, you may monitor user activity on your company’s website, including
the most popular pages, whether or not visitors are perplexed while using it, and
how much time users spend on product pages. You can utilise this to enhance the
website’s look and facilitate users’ navigation to their desired location.

Inserting a pixel is frequently free and simple to set up. The cost of implementing cookies can be high, but the quality of the data you’ll get might make it worthwhile. Once pixels and cookies are installed, they begin to collect data on their own and require little to no upkeep.

It’s crucial to remember that tracking online activity may have ethical and legal
privacy concerns. Ensure you comply with regional and industry data privacy rules
before tracking users’ online activity.

6. Forms
Online forms are useful for collecting qualitative information about users,
particularly contact or demographic details. You can utilize them to gate content or
registration, such as for webinars and email newsletters, and they’re reasonably
cheap and easy to set up.

Afterwards, you may make use of this information to get in touch with potential
customers, develop demographic profiles of current clients, and carry out
remarketing activities like email workflows and content recommendations.

7. Social Media Monitoring

Monitoring follower interaction on your brand’s social media accounts is a simple


method to keep tabs on information about the motives and interests of your
audience. Although many social media platforms come with statistics, other third-
party social media sites provide more thorough, organised insights gleaned from
many channels.

You can use social media data to ascertain which topics are most significant to
your following. For instance, you might observe a sharp rise in engagements when
your business posts about its environmental initiatives.

What is Hypothesis Generation?

Hypothesis generation is an educated “guess” about the various factors that are impacting the business problem that needs to be solved using machine learning. In framing a hypothesis, the data scientist must not know the outcome of the hypothesis; it is generated before looking at any evidence.

“A hypothesis may be simply defined as a guess. A scientific hypothesis is an


intelligent guess.” – Isaac Asimov
Hypothesis generation is a crucial step in any data science project. If you skip this
or skim through this, the likelihood of the project failing increases exponentially.
Hypothesis Generation vs. Hypothesis Testing

Confusing these two is a very common mistake that data science beginners make.

Hypothesis generation is a process beginning with an educated guess, whereas hypothesis testing is a process to conclude whether the educated guess is true or false, i.e., whether the relationship between the variables is statistically significant or not.
This latter part can be used for further research using statistical proof. A hypothesis is accepted or rejected based on the significance level and the test score of the test used for testing the hypothesis.
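As an illustration of the testing side (not part of the original text), the sketch below runs a two-sample t-test with SciPy on invented numbers and applies a significance level to decide whether to reject the null hypothesis.

```python
# Hypothesis-testing sketch: compare a metric across two groups.
from scipy import stats

group_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]   # e.g. metric under the current design
group_b = [13.2, 13.5, 12.9, 13.8, 13.1, 13.4]   # e.g. metric under a new design

t_stat, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05  # significance level

# The hypothesis is rejected or not based on the significance level and test score.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```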


How Does Hypothesis Generation Help?

Here are 5 key reasons why hypothesis generation is so important in data science:

 Hypothesis generation helps in comprehending the business problem as we dive deep into inferring the various factors affecting our target variable
 You will get a much better idea of the major factors that are responsible for solving the problem
 It identifies the data that needs to be collected from various sources, which is key in converting your business problem into a data science-based problem
 It improves your domain knowledge if you are new to the domain, as you spend time understanding the problem
 It helps you approach the problem in a structured manner

When Should you Perform Hypothesis Generation?

The million-dollar question – when in the world should you perform hypothesis
generation?

 The hypothesis generation should be made before looking at the dataset or


collection of the data
 You will notice that if you have done your hypothesis generation adequately, you
would have included all the variables present in the dataset in your hypothesis
generation
 You might also have included variables that are not present in the dataset

What is a Data Model?

Good data allows organizations to establish baselines, benchmarks, and goals to


keep moving forward. In order for data to allow this measuring, it has to be
organized through data description, data semantics, and consistency constraints of
data. A Data Model is this abstract model that allows the further building of
conceptual models and to set relationships between data items.

An organization may have a huge data repository; however, if there is no standard


to ensure the basic accuracy and interpretability of that data, then it is of no use. A
proper data model certifies actionable downstream results, knowledge of best
practices regarding the data, and the best tools to access it.

Next, let's look at what data modelling is and then at some examples.


What is Data Modeling?

Data Modeling in software engineering is the process of simplifying the diagram or


data model of a software system by applying certain formal techniques. It involves
expressing data and information through text and symbols. The data model
provides the blueprint for building a new database or reengineering legacy
applications.

In the light of the above, it is the first critical step in defining the structure of
available data. Data Modeling is the process of creating data models by which data
associations and constraints are described and eventually coded to reuse. It
conceptually represents data with diagrams, symbols, or text to visualize the
interrelation.

Data Modeling thus helps to increase consistency in naming, rules, semantics, and
security. This, in turn, improves data analytics. The emphasis is on the need for
availability and organization of data, independent of the manner of its application.

Data Modeling Process

Data modeling is a process of creating a conceptual representation of data objects


and their relationships to one another. The process of data modeling typically
involves several steps, including requirements gathering, conceptual design, logical
design, physical design, and implementation. During each step of the process, data
modelers work with stakeholders to understand the data requirements, define the
entities and attributes, establish the relationships between the data objects, and
create a model that accurately represents the data in a way that can be used by
application developers, database administrators, and other stakeholders.

Levels Of Data Abstraction

Data modeling typically involves several levels of abstraction, including:


 Conceptual level: The conceptual level involves defining the high-level entities
and relationships in the data model, often using diagrams or other visual
representations.
 Logical level: The logical level involves defining the relationships and
constraints between the data objects in more detail, often using data modeling
languages such as SQL or ER diagrams.
 Physical level: The physical level involves defining the specific details of how
the data will be stored, including data types, indexes, and other technical details.

Data Modeling Examples

The best way to picture a data model is to think about a building plan of an
architect. An architectural building plan assists in putting up all subsequent
conceptual models, and so does a data model.

These data modeling examples will clarify how data models and the process of
data modeling highlights essential data and the way to arrange it.

1. ER (Entity-Relationship) Model

This model is based on the notion of real-world entities and relationships among
them. It creates an entity set, relationship set, general attributes, and constraints.

Here, an entity is a real-world object; for instance, an employee is an entity in an


employee database. An attribute is a property with value, and entity sets share
attributes of identical value. Finally, there is the relationship between entities.

2. Hierarchical Model

This data model arranges the data in the form of a tree with one root, to which
other data is connected. The hierarchy begins with the root and extends like a tree.
This model effectively explains several real-time relationships with a single one-
to-many relationship between two different kinds of data.
For example, one supermarket can have different departments and many aisles.
Thus, the ‘root’ node supermarket will have two ‘child’ nodes of (1) Pantry, (2)
Packaged Food.

3. Network Model

This database model enables many-to-many relationships among the connected


nodes. The data is arranged in a graph-like structure, and here ‘child’ nodes can
have multiple ‘parent’ nodes. The parent nodes are known as owners, and the child
nodes are called members.

4. Relational Model

This popular data model example arranges the data into tables. The tables have
columns and rows, each cataloging an attribute present in the entity. It makes
relationships between data points easy to identify.

For example, e-commerce websites can process purchases and track inventory
using the relational model.
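The sketch below gives a minimal, illustrative flavour of the relational model using Python's built-in sqlite3: two tables linked by a key and queried with a join. The table and column names are invented for the e-commerce example above.

```python
# Relational model sketch: tables, a one-to-many relationship, and a join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        stock      INTEGER
    );
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY,
        product_id  INTEGER REFERENCES products(product_id),
        quantity    INTEGER
    );
""")
conn.execute("INSERT INTO products VALUES (1, 'Notebook', 50)")
conn.execute("INSERT INTO purchases VALUES (1, 1, 2)")

# The relationship between rows is easy to identify via the shared key.
for row in conn.execute("""
        SELECT p.name, pu.quantity, p.stock - pu.quantity AS remaining
        FROM purchases pu JOIN products p ON p.product_id = pu.product_id"""):
    print(row)
```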

5. Object-Oriented Database Model

This data model defines a database as an object collection, or recyclable software


components, with related methods and features.

For instance, architectural and engineering real-time systems used in 3D modeling use this data modeling process.
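A minimal sketch of the object-oriented idea, using a hypothetical Point3D class to show data and behaviour packaged together in one reusable object:

# A rough sketch of the object-oriented model: attributes (data) and
# methods (behaviour) live together. The Point3D class is hypothetical.
import math

class Point3D:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z     # attributes (the data)

    def distance_to(self, other):            # method (related behaviour)
        return math.dist((self.x, self.y, self.z), (other.x, other.y, other.z))

a, b = Point3D(0, 0, 0), Point3D(1, 2, 2)
print(a.distance_to(b))   # 3.0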

6. Object-Relational Model

This model is a combination of an object-oriented database model and a relational database model. Therefore, it blends the advanced functionalities of the object-oriented model with the ease of the relational data model.

The data modeling process helps organizations become more data-driven, and it starts with cleaning and modeling the data. Next, we turn to how models are validated.
Model Validation

 Model validation is defined within regulatory guidance as “the set of processes and activities intended to verify that models are performing as expected, in line
with their design objectives, and business uses.” It also identifies “potential
limitations and assumptions, and assesses their possible impact.”

 Generally, validation activities are performed by individuals independent of model development or use; models, therefore, should not be validated by their owners. Because models can be highly technical, some institutions may find it difficult to assemble a model risk team that has sufficient functional and technical expertise to carry out independent validation. When faced with this obstacle, institutions often outsource the validation task to third parties.

 In statistics, model validation is the task of confirming that the outputs of a statistical model are acceptable with respect to the real data-generating process.
In other words, model validation is the task of confirming that the outputs of a
statistical model have enough fidelity to the outputs of the data-generating
process that the objectives of the investigation can be achieved.

The Four Elements

Model validation consists of four crucial elements which should be considered:

1. Conceptual Design

The foundation of any model validation is its conceptual design, which needs a documented coverage assessment that supports the model’s ability to meet business and regulatory needs and to address the unique risks facing a bank.
The design and capabilities of a model can have a profound effect on the overall
effectiveness of a bank’s ability to identify and respond to risks. For example, a
poorly designed risk assessment model may result in a bank establishing
relationships with clients that present a risk that is greater than its risk appetite, thus
exposing the bank to regulatory scrutiny and reputation damage.

A validation should independently challenge the underlying conceptual design and ensure that documentation is appropriate to support the model’s logic and the
model’s ability to achieve desired regulatory and business outcomes for which it is
designed.

2. System Validation

All technology and automated systems implemented to support models have limitations. An effective validation includes: firstly, evaluating the processes used
to integrate the model’s conceptual design and functionality into the organisation’s
business setting; and, secondly, examining the processes implemented to execute
the model’s overall design. Where gaps or limitations are observed, controls should
be evaluated to enable the model to function effectively.

3. Data Validation and Quality Assessment

Data errors or irregularities impair results and might lead to an organisation’s failure to identify and respond to risks. Best practice indicates that institutions
should apply a risk-based data validation, which enables the reviewer to consider
risks unique to the organisation and the model.
To establish a robust framework for data validation, guidance indicates that the
accuracy of source data be assessed. This is a vital step because data can be derived
from a variety of sources, some of which might lack controls on data integrity, so
the data might be incomplete or inaccurate.

4. Process Validation

To verify that a model is operating effectively, it is important to prove that the established processes for the model’s ongoing administration, including governance
policies and procedures, support the model’s sustainability. A review of the
processes also determines whether the models are producing output that is accurate,
managed effectively, and subject to the appropriate controls.

If done effectively, model validation will enable your bank to have every confidence in its various models’ accuracy, as well as in their alignment with the bank’s business and regulatory expectations. By failing to validate models, banks increase the risk of regulatory criticism, fines, and penalties.

The complex and resource-intensive nature of validation makes it necessary to dedicate sufficient resources to it. An independent validation team well versed in
data management, technology, and relevant financial products or services — for
example, credit, capital management, insurance, or financial crime compliance — is
vital for success. Where shortfalls in the validation process are identified, timely
remedial actions should be taken to close the gaps.
Model Evaluation

 Model Evaluation is an integral part of the model development process. It helps to find the best model that represents our data and how well the chosen model
will work in the future. Evaluating model performance with the data used for
training is not acceptable in data science because it can easily generate
overoptimistic and overfitted models. There are two methods of evaluating
models in data science, Hold-Out and Cross-Validation. To avoid overfitting,
both methods use a test set (not seen by the model) to evaluate model
performance.

 Hold-Out: In this method, a (typically large) dataset is randomly divided into three subsets:

1. Training set is a subset of the dataset used to build predictive models.

2. Validation set is a subset of the dataset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning the model’s parameters and selecting the best-performing model. Not all modelling algorithms need a validation set.

3. Test set (or unseen examples) is a subset of the dataset used to assess the likely future performance of a model. If a model fits the training set much better than it fits the test set, overfitting is probably the cause.
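A minimal hold-out sketch, assuming the scikit-learn library is available; the 60/20/20 split proportions and the iris dataset are illustrative choices, not requirements:

# A minimal hold-out sketch, assuming scikit-learn is available.
# The 60/20/20 split ratios below are illustrative, not prescriptive.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off the test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # roughly 60% / 20% / 20%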

 Cross-Validation: When only a limited amount of data is available, we use k-fold cross-validation to achieve an unbiased estimate of the model performance. In k-fold cross-validation, we divide the data into k subsets of equal size. We build models k times, each time leaving out one of the subsets from training and using it as the test set. If k equals the sample size, this is called “leave-one-out”.
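A short k-fold sketch, again assuming scikit-learn; the choice of k = 5, the iris dataset, and logistic regression are purely illustrative:

# A minimal k-fold cross-validation sketch, assuming scikit-learn is available.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k = 5: the data is split into 5 folds; each fold serves once as the test set.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())   # average accuracy across the 5 folds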

Model evaluation can be divided into two sections:

 Classification Evaluation

 Regression Evaluation
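As a hedged illustration of these two branches (assuming scikit-learn; the toy labels and values are invented), classification is commonly scored with accuracy or a confusion matrix, while regression is scored with error measures such as MAE or RMSE:

# A small sketch of the two evaluation branches, assuming scikit-learn.
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Classification evaluation: compare predicted labels with true labels.
y_true_cls = [1, 0, 1, 1, 0]
y_pred_cls = [1, 0, 0, 1, 0]
print(accuracy_score(y_true_cls, y_pred_cls))      # 0.8
print(confusion_matrix(y_true_cls, y_pred_cls))

# Regression evaluation: compare predicted numbers with true numbers.
y_true_reg = [3.0, 5.0, 2.5]
y_pred_reg = [2.8, 5.4, 2.9]
print(mean_absolute_error(y_true_reg, y_pred_reg))
print(mean_squared_error(y_true_reg, y_pred_reg) ** 0.5)   # RMSE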

What Is Data Interpretation?

Data interpretation refers to the process of using diverse analytical methods to review data and arrive at relevant conclusions. The interpretation of data helps
researchers to categorize, manipulate, and summarize the information in order to
answer critical questions.

The importance of data interpretation is evident and this is why it needs to be done
properly. Data is very likely to arrive from multiple sources and has a tendency to
enter the analysis process with haphazard ordering. Data analysis tends to be
extremely subjective. That is to say, the nature and goal of interpretation will vary
from business to business, likely correlating to the type of data being analyzed.
While there are several types of processes that are implemented based on
individual data nature, the two broadest and most common categories are
“quantitative and qualitative analysis”.

Yet, before any serious data interpretation inquiry can begin, a sound decision must be made regarding the scale of measurement for the data, as visual presentations of findings carry little meaning without it and the choice will have a long-term impact on data interpretation ROI. The varying scales include:

 Nominal Scale: non-numeric categories that cannot be ranked or compared quantitatively. Variables are exclusive and exhaustive.
 Ordinal Scale: categories that are exclusive and exhaustive but with a logical order. Quality ratings and agreement ratings are examples of ordinal scales (i.e., good, very good, fair, etc., OR agree, strongly agree, disagree, etc.).
 Interval: a measurement scale where data is grouped into categories with orderly
and equal distances between the categories. There is always an arbitrary zero point.
 Ratio: contains features of all three scales, plus a true zero point.

For a more in-depth review of scales of measurement, read our article on data
analysis questions. Once scales of measurement have been selected, it is time to
select which of the two broad interpretation processes will best suit your data
needs. Let’s take a closer look at those specific methods and possible data
interpretation problems.

How To Interpret Data?

When interpreting data, an analyst must try to discern the differences between
correlation, causation, and coincidences, as well as many other biases – but he also
has to consider all the factors involved that may have led to a result. There are
various data interpretation methods one can use to achieve this.

The interpretation of data is designed to help people make sense of numerical data
that has been collected, analyzed, and presented. Having a baseline method for
interpreting data will provide your analyst teams with a structure and consistent
foundation. Indeed, if several departments have different approaches to interpreting
the same data while sharing the same goals, some mismatched objectives can
result. Disparate methods will lead to duplicated efforts, inconsistent solutions,
wasted energy, and inevitably – time and money. In this part, we will look at the
two main methods of interpretation of data: qualitative and quantitative analysis.

Qualitative Data Interpretation

Qualitative data analysis can be summed up in one word – categorical. With this
type of analysis, data is not described through numerical values or patterns, but
through the use of descriptive context (i.e., text). Typically, narrative data is
gathered by employing a wide variety of person-to-person techniques. These
techniques include:

 Observations: detailing behavioral patterns that occur within an observation group. These patterns could be the amount of time spent in an activity, the type of activity,
and the method of communication employed.
 Focus groups: Group people and ask them relevant questions to generate a
collaborative discussion about a research topic.
 Secondary Research: much like how patterns of behavior can be observed, various
types of documentation resources can be coded and divided based on the type of
material they contain.
 Interviews: one of the best collection methods for narrative data. Inquiry responses
can be grouped by theme, topic, or category. The interview approach allows for
highly-focused data segmentation.

A key difference between qualitative and quantitative analysis is clearly noticeable in the interpretation stage. The first one is widely open to interpretation and must
be “coded” so as to facilitate the grouping and labeling of data into identifiable
themes. As person-to-person data collection techniques can often result in disputes
pertaining to proper analysis, qualitative data analysis is often summarized through
three basic principles: notice things, collect things, and think about things.

After qualitative data has been collected through transcripts, questionnaires, audio
and video recordings, or the researcher’s notes, it is time to interpret it. For that
purpose, there are some common methods used by researchers and analysts.

 Content analysis: As its name suggests, this is a research method used to identify
frequencies and recurring words, subjects and concepts in image, video, or audio
content. It transforms qualitative information into quantitative data to help in the
discovery of trends and conclusions that will later support important research or
business decisions. This method is often used by marketers to understand brand
sentiment from the mouths of customers themselves. Through that, they can extract
valuable information to improve their products and services. It is recommended to
use content analytics tools for this method as manually performing it is very time-
consuming and can lead to human error or subjectivity issues. Having a clear goal
in mind before diving into it is another great practice for avoiding getting lost in
the fog. A small word-frequency sketch illustrating this idea appears after this list.
 Thematic analysis: This method focuses on analyzing qualitative data such as
interview transcripts, survey questions, and others, to identify common patterns
and separate the data into different groups according to found similarities or
themes. For example, imagine you want to analyze what customers think about
your restaurant. For this purpose, you do a thematic analysis on 1000 reviews and
find common themes such as “fresh food”, “cold food”, “small portions”, “friendly
staff”, etc. With those recurring themes in hand, you can extract conclusions about
what could be improved or enhanced based on your customers’ experiences. Since
this technique is more exploratory, be open to changing your research questions or
goals as you go.
 Narrative analysis: A bit more specific and complicated than the two previous
methods, narrative analysis is used to analyze stories and discover the meaning
behind them. These stories can be extracted from testimonials, case studies, and
interviews as these formats give people more space to tell their experiences. Given
that collecting this kind of data is harder and more time-consuming, sample sizes
for narrative analysis are usually smaller, which makes it harder to reproduce its
findings. However, it still proves to be a valuable technique in cases such as
understanding customers' preferences and mindsets.
 Discourse analysis: This method is used to draw the meaning of any type of visual,
written, or symbolic language in relation to a social, political, cultural, or historical
context. It is used to understand how context can affect the way language is carried
out and understood. For example, if you are doing research on power dynamics,
using discourse analysis to analyze a conversation between a janitor and a CEO
and draw conclusions about their responses based on the context and your research
questions is a great use case for this technique. That said, like all methods in this section, discourse analysis is time-consuming, as the data needs to be analyzed until no new insights emerge.
 Grounded theory analysis: The grounded theory approach aims at creating or
discovering a new theory by carefully testing and evaluating the data available.
Unlike all other qualitative approaches on this list, grounded theory analysis helps
in extracting conclusions and hypotheses from the data, instead of going into the
analysis with a defined hypothesis. This method is very popular amongst
researchers, analysts, and marketers as the results are completely data-backed,
providing a factual explanation of any scenario. It is often used when researching a completely new topic, or one about which little is known, as it gives researchers the space to start from the ground up.
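The word-frequency sketch promised under content analysis above is shown here; the review texts are invented, and a real content analysis would add stop-word removal, stemming, and careful human coding on top of simple counting:

# A minimal content-analysis sketch: counting recurring words in
# hypothetical customer reviews to surface candidate themes.
from collections import Counter
import re

reviews = [
    "Fresh food and friendly staff",
    "Friendly staff but cold food",
    "Small portions, food was cold",
]

words = []
for review in reviews:
    words.extend(re.findall(r"[a-z]+", review.lower()))

print(Counter(words).most_common(5))   # e.g. [('food', 3), ('friendly', 2), ...]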

Quantitative Data Interpretation

If quantitative data interpretation could be summed up in one word (and it really can’t) that word would be “numerical.” There are few certainties when it comes to
data analysis, but you can be sure that if the research you are engaging in has no
numbers involved, it is not quantitative research as this analysis refers to a set of
processes by which numerical data is analyzed. More often than not, it involves the use of statistical measures such as the standard deviation, mean, and median. Let’s quickly review the most common statistical terms:

 Mean: a mean represents a numerical average for a set of responses. When dealing
with a data set (or multiple data sets), a mean will represent a central value of a
specific set of numbers. It is the sum of the values divided by the number of values
within the data set. Other terms that can be used to describe the concept are
arithmetic mean, average and mathematical expectation.
 Standard deviation: this is another statistical term commonly appearing in
quantitative analysis. Standard deviation reveals the distribution of the responses
around the mean. It describes the degree of consistency within the responses;
together with the mean, it provides insight into data sets.
 Frequency distribution: this is a measurement gauging the rate at which a response appears within a data set. When using a survey, for example, frequency distribution can determine the number of times a specific ordinal-scale response appears (i.e., agree, strongly agree, disagree, etc.). Frequency distribution is extremely useful in determining the degree of consensus among data points.
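These three terms can be illustrated with a short sketch using only Python’s standard library; the survey scores below are invented:

# A quick sketch of the three terms above, using hypothetical survey scores.
from statistics import mean, stdev
from collections import Counter

responses = [4, 5, 3, 4, 4, 5, 2, 4]   # e.g. 1-5 agreement ratings

print(mean(responses))      # the average response
print(stdev(responses))     # spread of responses around the mean
print(Counter(responses))   # frequency of each response value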

Typically, quantitative data is measured by visually presenting correlation tests between two or more variables of significance. Different processes can be used
together or separately, and comparisons can be made to ultimately arrive at a
conclusion. Other signature interpretation processes of quantitative data include:

 Regression analysis: Essentially, it uses historical data to understand the relationship between a dependent variable and one or more independent variables (see the sketch after this list).
Knowing which variables are related and how they developed in the past allows
you to anticipate possible outcomes and make better decisions going forward. For
example, if you want to predict your sales for next month you can use regression to
understand what factors will affect them such as products on sale, and the launch
of a new campaign, among many others.
 Cohort analysis: This method identifies groups of users who share common
characteristics during a particular time period. In a business scenario, cohort
analysis is commonly used to understand customer behaviors. For example, a
cohort could be all users who have signed up for a free trial on a given day. An
analysis would be carried out to see how these users behave, what actions they
carry out, and how their behavior differs from other user groups.
 Predictive analysis: As its name suggests, the predictive method aims to predict
future developments by analyzing historical and current data. Powered by
technologies such as artificial intelligence and machine learning, predictive
analytics practices enable businesses to identify patterns or potential issues and
plan informed strategies in advance.
 Prescriptive analysis: Also powered by predictions, the prescriptive method uses
techniques such as graph analysis, complex event processing, and neural networks,
among others, to try to unravel the effect that future decisions will have in order to
adjust them before they are actually made. This helps businesses to develop
responsive, practical business strategies.
 Conjoint analysis: Typically applied to survey analysis, the conjoint approach is
used to analyze how individuals value different attributes of a product or service.
This helps researchers and businesses to define pricing, product features,
packaging, and many other attributes. A common use is menu-based conjoint
analysis in which individuals are given a “menu” of options from which they can
build their ideal concept or product. Through this, analysts can understand which attributes individuals would pick over others and draw conclusions.
 Cluster analysis: Last but not least, cluster analysis is a method used to group objects into
categories. Since there is no target variable when using cluster analysis, it is a
useful method to find hidden trends and patterns in the data. In a business context
clustering is used for audience segmentation to create targeted experiences, and in
market research, it is often used to identify age groups, geographical information,
and earnings, among others.
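The regression sketch referenced above, assuming scikit-learn; the advertising-spend and sales figures are invented for illustration, and the fitted relationship should not be read as a real finding:

# A minimal regression sketch, assuming scikit-learn.
# The monthly ad-spend and sales figures are invented for illustration.
from sklearn.linear_model import LinearRegression

ad_spend = [[10], [20], [30], [40], [50]]   # independent variable (e.g. thousands spent)
sales    = [105, 212, 290, 405, 510]        # dependent variable (e.g. units sold)

model = LinearRegression().fit(ad_spend, sales)
print(model.coef_[0], model.intercept_)     # estimated relationship
print(model.predict([[60]]))                # anticipated sales at a new spend level

In practice the same pattern extends to several independent variables (products on sale, campaign launches, and so on), which is how the scenario described in the regression bullet would be modelled.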
