
Introduction to Data Mining & Analytics

The document outlines the syllabus for Unit I of a Data Mining and Analytics course at MIT School of Computing, focusing on key concepts such as data types, data mining techniques, and the knowledge discovery process. It covers various data types, the importance of data analytics in decision-making, and the methodologies involved in data mining, including data cleaning, integration, and presentation. Additionally, it highlights the applications of data mining across different industries, emphasizing its role in improving operations and customer service.

MIT Art Design and Technology University

MIT School of Computing, Pune

21BTCS005 –Data Mining and Analytics

Class - T.Y. (SEM V), AIA

Unit - I Introduction to Data Mining and Analytics


Dr. Ranjana Kale
Prof. Dr. Vipul Dalal
Prof. Dr. Amol Bhosle
Prof. Jyoti Gavhane
Prof. Swati Kadam

AY 2025-2026 SEM-V
Unit I - Syllabus

Unit I – Introduction to Data Mining and Analytics


09 hours
Key Concepts: Data, Elements, Data types. Overview of data mining and
analytics. Introduction to data mining systems, Knowledge Discovery Process
(KDD), data mining techniques, applications of data mining, Ethics in Data
Mining and Data Privacy. Data Analysis types: Descriptive and diagnostic,
prescriptive and predictive. Operational data and strategic data, applications of
Data Analytics.
Unit I: Introduction to Data Mining and
Analytics
■ Key Concepts-
■ Data
■ Data elements
■ Data types/ Attributes
■ Overview of Data mining
■ Overview of Data Analytics
Data, Elements, Data types.
• Data : a collection of facts, observations, or representations of
objects, typically organized in a structured or unstructured format.
• These facts can be numbers, words, measurements, observations,
or descriptions of things.
Data elements
■ Individual units of information that make up a dataset.
■ They represent specific attributes or fields within a record, such as a name, phone
number, or purchase date.
■ They are the building blocks for data analysis and understanding, providing the raw material for
uncovering patterns and insights.
■ Data elements are the smallest named items that convey meaningful information.
■ Examples include a customer's name, address, product ID, or transaction amount.
■ Data elements are often defined in a data dictionary, which provides details about their
meaning, data type, and other properties.
Data Types
• Nominal data
• Binary
• Ordinal data
• Numeric
• Discrete data
• Continuous data

Data Types
⚫ Nominal: categories, states, or “names of things”
or symbols
⚪ Also called categorical
⚪ Hair_color = {auburn, black, blond, brown, grey, red, white}
⚪ marital status, occupation, ID numbers, zip codes
⚪ Not quantitative
⚪ Represent names with numbers
E.g. 0 for black, 1 for brown, and so on…
⚪ Makes no sense to calculate mean or median
⚪ Mode is the main measure of central tendency
Data Types
• Binary
• Nominal attribute with only 2 states (0 and 1)
• Symmetric binary: both outcomes equally important
• e.g., gender
• Asymmetric binary: outcomes not equally important.
• e.g., medical test (positive vs. negative)
• Convention: assign 1 to most important outcome
(e.g., HIV positive)

⚫ Ordinal
⚪ Values have a meaningful order (ranking) but magnitude between successive
values is not known.
⚪ Grades: A+, A, A-, B+, B, B-, …
⚪ Size = {small, medium, large}, professional rankings
⚪ Often used in surveys for rating
⚪ How satisfied customers are:
0 - very dissatisfied
1 - somewhat dissatisfied
2 - neutral
3 - satisfied
4 - very satisfied
⚪ Central tendency can be represented by mode and median, but not mean
Numeric
• Quantity (integer or real-valued)
• Interval-scaled
• Measured on a scale of equal-sized units
• Values have order
• E.g., temperature in °C or °F, calendar dates
• No true zero-point
• Ratio-scaled
• Inherent zero-point
• Values can be expressed as multiples of one another
(10 K is twice as high as 5 K).
• E.g., temperature in Kelvin, length, counts, monetary
quantities
Discrete vs. Continuous
• Discrete
• Has only a finite or countably infinite set of values
• E.g., zip codes, profession, or the set of words in a collection of
documents
• Sometimes, represented as integer variables
• Note: Binary attributes are a special case of discrete
attributes
• Continuous
• Has real numbers as attribute values
• E.g., temperature, height, or weight
• Practically, real values can only be measured and
represented using a finite number of digits
• Continuous attributes are typically represented as
floating-point variables
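
The attribute types above determine which summary statistics make sense. The following is a minimal sketch, assuming small hypothetical samples (the lists and variable names are illustrative and not from the slides): mode for nominal data, median for ordinal data, mean for numeric data, and a check of why ratios are meaningful in Kelvin but not in Celsius.

```python
from statistics import mean, median, mode

# Hypothetical sample values, one list per attribute type.
hair_color = ["black", "brown", "black", "red", "black"]  # nominal
satisfaction = [0, 2, 3, 3, 4]      # ordinal codes, 0 = very dissatisfied
height_cm = [150.0, 160.0, 170.0, 180.0]                  # numeric, ratio-scaled

nominal_center = mode(hair_color)      # most frequent category
ordinal_center = median(satisfaction)  # middle rank; gaps between codes carry no meaning
numeric_center = mean(height_cm)       # the arithmetic mean is meaningful for numeric data

# Ratio-scaled: Kelvin has a true zero, so 10 K really is twice 5 K.
ratio_kelvin = 10.0 / 5.0
# Interval-scaled: 10 degrees C is not "twice" 5 degrees C once both are put on the Kelvin scale.
ratio_celsius_as_kelvin = (10.0 + 273.15) / (5.0 + 273.15)

print(nominal_center, ordinal_center, numeric_center)
print(ratio_kelvin, round(ratio_celsius_as_kelvin, 3))
```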
Overview of data mining and analytics
The Explosive Growth of Data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, Web, computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, digital cameras, YouTube

We are drowning in data, but starving for knowledge!


“Necessity is the mother of invention”: data mining and data analytics provide
automated analysis of massive data sets
Definition of Data Mining

The nontrivial extraction of implicit, previously unknown, and
potentially useful information
What Is Data Mining?
Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) information or patterns from data in large
databases
Alternative names and their “inside stories”:
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging (data fishing), information harvesting, business
intelligence, etc.
What is not data mining?
(Deductive) query processing.
Expert systems or small machine learning/statistical programs
Process of semi-automatically analyzing large databases to
find patterns that are:
valid: hold on new data with some certainty
novel: non-obvious to the system
useful: should be possible to act on the item
understandable: humans should be able to interpret the
pattern
also known as knowledge discovery in databases (KDD)
Types of Data
• Relational databases
• Data warehouses
• Transactional Databases
• Advanced database systems
• Object-relational
• Spatial and Temporal Time-series
• Multimedia Data Mining
• Text Mining
• Web Mining
Overview of Data Analytics
• Data analytics is an important field that involves the process of collecting, processing,
and interpreting data to uncover insights and help in making decisions.
• The main benefit of data-driven decisions is that they are based on observed past
trends that have led to beneficial results.
• Data analytics is the process of manipulating data to extract useful trends and hidden
patterns that can help us derive valuable insights and make business predictions.
• Understanding Data Analytics
• Data analytics encompasses a wide array of techniques for analyzing data to gain
valuable insights that can enhance various aspects of operations. By scrutinizing
information, businesses can uncover patterns and metrics that might otherwise go
unnoticed, enabling them to optimize processes and improve overall efficiency.
• For instance, in manufacturing, companies collect data on machine runtime, downtime,
and work queues to analyze and improve workload planning, ensuring machines operate
at optimal levels.
The Role of Data Analytics

• Data analytics plays a pivotal role in enhancing operations, efficiency, and
performance across various industries by uncovering valuable patterns and
insights. Implementing data analytics techniques can provide companies with a
competitive advantage. The process typically involves four fundamental steps:
• Data Mining : This step involves gathering data and information from diverse
sources and transforming them into a standardized format for subsequent
analysis. Data mining can be a time-intensive process compared to other steps
but is crucial for obtaining a comprehensive dataset.
• Data Management : Once collected, data needs to be stored, managed, and
made accessible. Creating a database is essential for managing the vast amounts
of information collected during the mining process. SQL (Structured Query
Language) remains a widely used tool for database management, facilitating
efficient querying and analysis of relational databases.
• Statistical Analysis :
• In this step, the gathered data is subjected to statistical analysis to
identify trends and patterns.
• Statistical modeling is used to interpret the data and make predictions
about future trends.
• Open-source programming languages such as Python and R are
commonly used for statistical analysis and graphical
modeling.
• Data Presentation :
• The insights derived from data analytics need to be effectively
communicated to stakeholders.
• This final step involves formatting the results in a manner that is
accessible and understandable to various stakeholders, including
decision-makers, analysts, and shareholders.
• Clear and concise data presentation is essential for informed
decision-making and business growth.
Usage of Data Analytics
• Data analytics has played a vital role in several key domains and
strategic planning techniques:
• Improved Decision-Making - When data supports a decision, we can
implement it with a higher probability of success. For example, if a
certain decision or plan has led to better outcomes in the past, there
is little doubt about implementing it again.
• Efficient Operations - Data analytics helps us understand what a
situation demands and what should be done to get better results, so
that we can streamline our processes, which in turn leads to
efficient operations.
• Effective Marketing - Market segmentation techniques help identify
the marketing approaches that increase sales and leads, resulting in
effective marketing strategies.

• Better Customer Service - Churn modeling is a good example: we try to predict or
identify what leads to customer churn and change those things accordingly, so that
customer attrition stays as low as possible, which is an important factor in any organization.

Churn modeling is used by companies to figure out why customers stop using their services
(this is called customer churn).
By studying patterns in customer behavior, the company tries to predict which customers
might leave in the future.
Once they know this, they can make changes—like improving service or offering discounts—
to keep more customers happy and reduce the number who leave.
This is very important because keeping existing customers is often cheaper
and more valuable than finding new ones.
Knowledge Discovery (KDD) Process

■ Data mining is the core of the knowledge discovery process
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Process of KDD
■ Data Cleaning: remove noise, inconsistencies, and unwanted data.
■ Data Integration: combine multiple data sources and bring them to a common
format.
■ Data Selection: retrieve the data relevant to the analysis task from the databases;
e.g., credit card customer profiling looks only at transactions.
■ Data Transformation: transform and consolidate data into forms appropriate for
mining by performing summary or aggregation operations.
■ Data Mining: the essential step in which intelligent methods are applied
to extract data patterns.
■ Pattern Evaluation: evaluate the discovered patterns.
■ Knowledge Presentation: present the discovered knowledge to users.
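
The KDD steps listed above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two data sources, their field names, the thresholds, and the "high spender" rule standing in for a real mining algorithm are all hypothetical.

```python
from collections import Counter

# Two hypothetical sources with inconsistent field names and one missing value.
store_a = [{"customer": "C1", "amount": "120.50"}, {"customer": "C2", "amount": None}]
store_b = [{"cust_id": "C1", "amt": 80.0}, {"cust_id": "C3", "amt": 15.0}]

def clean(records, amount_key):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r.get(amount_key) is not None]

def integrate(a, b):
    """Data integration: bring both sources to one common format."""
    merged = [{"customer": r["customer"], "amount": float(r["amount"])} for r in a]
    merged += [{"customer": r["cust_id"], "amount": float(r["amt"])} for r in b]
    return merged

def select(records, min_amount):
    """Data selection: keep only the task-relevant records."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: aggregate to total spend per customer."""
    totals = Counter()
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

def mine(totals, threshold):
    """Data mining (toy stand-in): flag customers whose total meets the threshold."""
    return sorted(c for c, total in totals.items() if total >= threshold)

records = integrate(clean(store_a, "amount"), clean(store_b, "amt"))
totals = transform(select(records, min_amount=20.0))
pattern = mine(totals, threshold=100.0)

# Pattern evaluation and knowledge presentation are reduced to a printed summary here.
print("High spenders:", pattern)
```

In practice each stage would be far richer; the point of the sketch is only the order in which the stages feed one another.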


Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
Why Not Traditional Data Analysis?
■ Tremendous amount of data
■ Algorithms must be highly scalable to handle data volumes such as terabytes of data
■ High-dimensionality of data
■ Micro-array may have tens of thousands of dimensions
■ High complexity of data
■ Data streams and sensor data
■ Time-series data, temporal data, sequence data
■ Structured data, graphs, social networks and multi-linked data
■ Heterogeneous databases and legacy databases
■ Spatial, spatiotemporal, multimedia, text and Web data
■ Software programs, scientific simulations
■ New and sophisticated applications
Data Mining Techniques
• Data mining techniques extract meaningful patterns and insights from large
datasets.
• Common techniques include classification, clustering, association rule learning,
regression, anomaly detection, and sequential pattern mining.
• Each technique serves a different purpose, from categorizing data
to identifying relationships and predicting future trends.
Data Mining Techniques
■ Multidimensional concept description: Characterization and discrimination
■ Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association:
• looks for patterns where certain items or conditions tend to appear together in a
dataset.
• used in market basket analysis to see which products are often bought together.
• Frequent patterns, association, correlation vs. causality
■ Milk->Bread [0.9%, 85%]
■ Classification and prediction
■ Classification builds models to sort data into different categories.
■ The model is trained on data with known labels and is then used to predict labels for
unknown data
■ E.g., classify countries based on (climate), or classify cars based on (gas mileage)
■ Predict some unknown or missing numerical values
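
The classification idea above (train on labeled data, then predict labels for unknown data) can be illustrated with a very small sketch. It uses a 1-nearest-neighbor rule, chosen here only for brevity; the slides do not prescribe an algorithm, and the mileage values and the labels "low" and "high" are hypothetical.

```python
# Hypothetical labeled training data: (gas mileage in mpg, class label).
training = [(12.0, "low"), (15.0, "low"), (28.0, "high"), (35.0, "high")]

def predict(mileage, labeled):
    """Return the label of the training example closest to `mileage` (1-nearest neighbor)."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - mileage))
    return nearest[1]

# Cars with unknown labels are assigned the label of their nearest known neighbor.
label_a = predict(14.0, training)
label_b = predict(30.0, training)
print(label_a, label_b)
```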

[Figure: classification example with two classes, Fruit and Vegetables]


■ Cluster analysis
■ Class label is unknown: Group data to form new classes, e.g., cluster houses to
find distribution patterns
■ Maximizing intra-class similarity & minimizing interclass similarity
■ Outlier analysis
■ Outlier: Data object that does not comply with the general behavior of the data
■ Noise or exception? Useful in fraud detection, rare events analysis
■ Trend and evolution analysis
■ Trend and deviation: e.g., regression analysis
■ Sequential pattern mining: e.g., digital camera -> large SD memory
■ Periodicity analysis
■ Similarity-based analysis
■ Other pattern-directed or statistical analyses
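
An association rule such as Milk->Bread [0.9%, 85%] in the list above is usually read as [support, confidence]; the slide does not spell this out, so treat that reading as an assumption. The sketch below computes both measures on a handful of hypothetical transactions (the baskets and function names are illustrative only).

```python
# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "jam"},
    {"milk", "bread", "butter"},
]

def count(itemset, baskets):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets)

def support(itemset, baskets):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

rule_support = support({"milk", "bread"}, transactions)
rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(f"milk -> bread [support={rule_support:.0%}, confidence={rule_confidence:.0%}]")
```

A high confidence only says the items co-occur; as the slide notes, correlation is not causality.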
Applications of Data Mining
Data analysis and decision support
Market analysis and management
Target marketing, customer relationship management (CRM), market basket analysis,
cross-selling, market segmentation (e.g., a clothing retailer might offer winter wear
in colder climates and lighter clothing in warmer climates)
Risk analysis and management
Forecasting, customer retention (e.g., a restaurant might offer discounts to customers based on their
frequency of visits or preferences), improved underwriting, quality control, competitive analysis
Fraud detection and detection of unusual patterns (outliers)
Other Applications
Text mining (news group, email, documents) and Web mining
Stream data mining
Bioinformatics and bio-data analysis
Applications of Data Mining
• Data mining is a young discipline with wide and diverse applications
• There is still a nontrivial gap between general principles of data mining and domain-specific,
effective data mining tools for particular applications
• Some application domains
Biomedical and DNA data analysis
Financial data analysis
Retail industry
Telecommunication industry
Biomedical Data Mining and DNA Analysis

• DNA sequences: 4 basic building blocks (nucleotides): adenine (A),
cytosine (C), guanine (G), and thymine (T).
• Gene: a sequence of hundreds of individual nucleotides arranged in a
particular order
• Humans have roughly 20,000 protein-coding genes (early estimates were around 100,000)
• Tremendous number of ways that the nucleotides can be ordered and
sequenced to form distinct genes
• Semantic integration of heterogeneous, distributed genome databases
• Current: highly distributed, uncontrolled generation and use of a
wide variety of DNA data
• Data cleaning and data integration methods developed in data
mining will help
DNA Analysis: Examples
• Similarity search and comparison among DNA sequences
• Compare the frequently occurring patterns of each class (e.g., diseased and
healthy)
• Identify gene sequence patterns that play roles in various diseases
• Association analysis: identification of co-occurring gene sequences
• Most diseases are not triggered by a single gene but by a combination of genes
acting together
• Association analysis may help determine the kinds of genes that are likely to
co-occur together in target samples
• Path analysis: linking genes to different disease development stages
• Different genes may become active at different stages of the disease
• Develop pharmaceutical interventions that target the different stages separately
• Visualization tools and genetic data analysis
Data Mining for Financial Data Analysis

• Financial data collected in banks and financial institutions are
often relatively complete, reliable, and of high quality
• Design and construction of data warehouses for
multidimensional data analysis and data mining
• View the debt and revenue changes by month, by region, by sector, and by other
factors
• Access statistical information such as max, min, total, average, trend, etc.
• Loan payment prediction/consumer credit policy analysis
• Feature selection and attribute relevance ranking
• Loan payment performance
• Consumer credit rating
Financial Data Mining
• Classification and clustering of customers for targeted
marketing
• multidimensional segmentation by nearest-neighbor, classification, decision
trees, etc. to identify customer groups or associate a new customer to an
appropriate customer group
• Detection of money laundering and other financial crimes
• Integration of data from multiple DBs (e.g., bank transactions, federal/state crime
history DBs)
• Tools: data visualization, linkage analysis, classification, clustering tools, outlier
analysis, and sequential pattern analysis tools (find unusual access sequences)
Data Mining for Retail Industry

• Retail industry: huge amounts of data on sales, customer
shopping history, etc.
• Applications of retail data mining
• Identify customer buying behaviors
• Discover customer shopping patterns and trends
• Improve the quality of customer service
• Achieve better customer retention and satisfaction
• Enhance goods consumption ratios
• Design more effective goods transportation and distribution policies
Data Mining in Retail Industry: Examples
• Design and construction of data warehouses based on the
benefits of data mining
• Multidimensional analysis of sales, customers, products, time, and region

• Analysis of the effectiveness of sales campaigns


• Customer retention: Analysis of customer loyalty
• Use customer loyalty card information to register sequences of purchases of particular
customers
• Use sequential pattern mining to investigate changes in customer consumption or
loyalty
• Suggest adjustments on the pricing and variety of goods

• Purchase recommendation and cross-reference of items


Data Mining for Telecomm. Industry (1)

• A rapidly expanding and highly competitive industry and a great
demand for data mining
• Huge data sets of customers, network and call data.
• Understand the business involved
• Identify telecommunication patterns
• Spot defects in a network to isolate the faults
• Catch fraudulent activities
• Make better use of resources
• Improve the quality of service
Data Mining for Telecomm. Industry (2)
• Multidimensional analysis of telecommunication data
• Whenever a call starts in the telecommunication network, the details of the call are recorded:
the date and time at which it starts, its duration, and the time at which it ends.
• Since all the data of a call is collected in real time, it is ready to be processed with data mining techniques.
• However, the data should be aggregated at the customer level, not analyzed at the level of isolated single phone calls.
• With the data organized this way, one can find each customer's calling pattern.
Some of the data that help to find the pattern are:
• average time duration of calls
• Time in which the call took place (Daytime/Night-time)
• The average number of calls on weekdays
• Calls generated with varied area code
• Calls generated per day, etc.
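
The customer-level aggregation described above can be sketched as follows. The call records, the field layout, and the 06:00 to 18:00 daytime window are assumptions made for illustration; real call detail records carry many more fields.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call detail records: (customer id, call start, duration in seconds, area code).
calls = [
    ("C1", datetime(2025, 7, 7, 9, 30), 120, "020"),
    ("C1", datetime(2025, 7, 7, 22, 15), 300, "022"),
    ("C1", datetime(2025, 7, 12, 14, 0), 60, "020"),
    ("C2", datetime(2025, 7, 8, 23, 45), 600, "011"),
]

def is_daytime(ts):
    """Assumed cut-off: 06:00 up to (not including) 18:00 counts as daytime."""
    return 6 <= ts.hour < 18

def customer_profiles(records):
    """Group single calls by customer and derive per-customer calling features."""
    grouped = defaultdict(list)
    for customer, start, duration, area in records:
        grouped[customer].append((start, duration, area))
    profiles = {}
    for customer, rows in grouped.items():
        profiles[customer] = {
            "avg_duration_s": sum(d for _, d, _ in rows) / len(rows),
            "daytime_calls": sum(is_daytime(s) for s, _, _ in rows),
            "night_calls": sum(not is_daytime(s) for s, _, _ in rows),
            "weekday_calls": sum(s.weekday() < 5 for s, _, _ in rows),  # Monday=0 .. Friday=4
            "distinct_area_codes": len({a for _, _, a in rows}),
        }
    return profiles

profiles = customer_profiles(calls)
for customer, features in profiles.items():
    print(customer, features)
```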
Data Mining for Telecomm. Industry (3)
• Fraudulent pattern analysis and the identification of unusual patterns
• Identify potentially fraudulent users and their atypical usage patterns
• Detect attempts to gain fraudulent entry to customer accounts
• Discover unusual patterns which may need special attention
• Multidimensional association and sequential pattern analysis
• Find usage patterns for a set of communication services by customer group, by month,
etc.
• Promote the sales of specific services
• Improve the availability of particular services in a region
• Use of visualization tools in telecommunication data analysis


Ethics in Data Mining and Data Privacy
• Due to improvements in data collection and warehousing
technologies, businesses are amassing ever-increasing volumes of
customer data.
• However, as this information is collected and raw data is
transformed into useful information, concerns about privacy and
misuse of data are also increasing.
• Today’s data leaders face ethical challenges as they navigate a
contentious legal and financial environment.
• Privacy: Respecting an individual's data through confidentiality and
consent.
• Fairness and Bias: Ensuring fairness in data-driven processes and addressing
biases that may arise in algorithms, preventing discrimination against certain
groups.
• Accountability: Holding individuals and organizations accountable for their
actions and decisions based on data.
• Security: Implementing robust security measures to protect sensitive data
from unauthorized access and breaches.
• Data Quality: Ensuring the accuracy, completeness, and reliability
of the data to prevent misinformation.
Ethical Concerns in Data Mining
• Transparency:
• Customers should have a certain amount of visibility into and control over how their
data is collected and used.
• Companies should be forthcoming with their data collection and use practices and ask
permission before acting rather than asking for forgiveness after the fact.
• However, transparency with opt-in or opt-out procedures is not sufficient.
• Customers should be presented with and asked to explicitly consent to specific
language around data access and usage in order to make informed choices.
• Mass broadcasts of fine print opt-in messages are not solving today’s data collection
and usage transparency concerns.
Ethical Concerns in Data Mining
• Data Accuracy and Bias: Data mining algorithms may produce
inaccurate or biased results if the data used for analysis is
incomplete, outdated, or biased itself. This can lead to unfair
treatment or discrimination against certain groups or individuals
• Governance:
• Even in the European Union (EU), where the GDPR (General Data
Protection Regulation) offers a more comprehensive legal framework
for data practices, internal control within companies is just as essential to
protecting consumer data.
• There must be leaders assigned to policy development, supervision
and enforcement. Without proper governance, ethical lapses and
legal troubles are inevitable.
Data Privacy
• Data mining often involves collecting and analyzing large amounts of personal data, which can
raise concerns about individuals' privacy rights. This data can include sensitive information such
as health records, financial transactions, and browsing history. Respecting data privacy laws and
obtaining informed consent are foundational principles:
• Informed Consent: Clearly communicating to individuals how their data will be used, ensuring
they understand and agree to its usage.
• Anonymization: Stripping personally identifiable information whenever possible to protect
individual identities.
• Compliance: Adhering to legal frameworks such as GDPR, HIPAA, or CCPA to ensure lawful
and ethical data handling.
• GDPR (General Data Protection Regulation): a European Union law focused on protecting the personal
data of individuals within the EU and EEA.
• HIPAA (Health Insurance Portability and Accountability Act): a US federal law enacted in
1996. Its main purpose is to protect the privacy and security of individuals' health
information.
• CCPA (California Consumer Privacy Act): a state law in California that grants consumers
more control over the personal information that businesses collect about them.
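
As a rough illustration of stripping personally identifiable information before analysis, the sketch below drops direct identifiers and replaces the customer id with a keyed hash. Note the assumptions: the record layout, the field names, and the key are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, since the remaining fields may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key must be kept out of source code.
SECRET_KEY = b"replace-with-a-secret-key"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed SHA-256 hash so records can still be linked."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def strip_pii(record: dict, direct_identifiers=("name", "phone")) -> dict:
    """Drop direct identifiers and replace the customer id with a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["customer_id"] = pseudonymize(record["customer_id"])
    return cleaned

record = {"customer_id": "C1", "name": "A. Person", "phone": "555-0100", "amount": 120.5}
safe = strip_pii(record)
print(sorted(safe))
```

The same input always maps to the same pseudonym, which keeps per-customer analysis possible without exposing the original identifier.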
Data Analysis types: Descriptive and
diagnostic
• Data is collected, processed, and analyzed
• Data analytics is the practice of examining raw data to identify
trends, draw conclusions, and extract meaningful information.
• It involves various techniques and tools to process and transform
data into valuable insights that can be used for decision-making.
Types of data analysis/ Analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Predictive analytics
• It turns data into valuable, actionable information.
• It uses data to determine the probable outcome of an event or the
likelihood of a situation occurring.
• Predictive analytics is a branch of data science that leverages statistical
techniques, machine learning algorithms, and historical data to make
data-driven predictions about future outcomes.
• 1. Define a Problem:
• Clearly express the challenge that the organization aims to address using data analysis.
• A well-defined problem statement helps determine the appropriate predictive analytics approach to employ.
• 2. Gather and Organize Data:
• Collect and prepare relevant data from various sources, such as databases, data
warehouses, external data providers, APIs, logs, and surveys, that can be used to build and train
predictive models.
• 3. Pre-process Data:
• Raw data collected from different sources is rarely in an ideal state for analysis, so it needs to be
pre-processed before a predictive model is developed.
• Pre-processing involves cleaning the data to remove anomalies, handling missing data points,
addressing outliers that could be caused by input errors, and transforming the data into a form suitable for
further analysis.
• Pre-processing ensures that the data is of high quality and ready for model development.
• 4. Develop Predictive Models:
• Data scientists or data analysts use a range of tools and techniques to develop predictive
models based on the problem statement and the nature of the datasets.
• Machine learning algorithms, regression models, decision trees, and neural networks are
among the common techniques for this.
• These models are trained on the prepared data to identify correlations and patterns that can be
used for making predictions.
• 5. Validate and Deploy Results:
• After building the predictive model, validation is a critical step to assess the accuracy and
reliability of predictions.
• Data scientists rigorously evaluate the model's performance against known outcomes or test
datasets.
• If required, modifications are implemented to improve the accuracy of the model.
• Once the model achieves satisfactory results, it can be deployed to deliver predictions to
stakeholders.
• This can be done through applications, websites or data dashboards, making the insights easily
accessible to decision makers or stakeholders.
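
Steps 3 to 5 above (pre-process, develop a model, validate against known outcomes) can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical history of advertising spend versus sales and a simple least-squares line as the model; it is not meant as the course's prescribed method.

```python
# Hypothetical history: (advertising spend, sales); None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2),
       (5.0, 10.8), (6.0, 13.1), (7.0, 15.0)]

# Step 3: pre-process by dropping rows with missing values.
data = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold out the last two rows as a test set with known outcomes.
train, test = data[:-2], data[-2:]

def fit_line(points):
    """Step 4: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, points):
    """Step 5: compare predictions against known outcomes."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)

model = fit_line(train)
test_error = mean_absolute_error(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test MAE={test_error:.3f}")
```

If the test error were too large, the workflow would loop back to data preparation or model choice before deployment.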
Predictive Analytics Techniques:
• Predictive analytical models leverage historical data to anticipate future events or outcomes, employing
several distinct types:
• Classification Models: These predict categorical outcomes or categorize data into predefined groups.
Examples include Logistic Regression, Decision Trees, Random Forest, and Support Vector Machine.
• Regression Models: Used to forecast continuous outcome variables based on one or more independent
variables. Examples include Linear Regression, Multiple Regression, and Polynomial Regression.
• Clustering Models: These group similar data points together based on shared characteristics or patterns.
Examples comprise K-Means Clustering and Hierarchical Clustering.
• Time Series Models and forecasting : Designed to predict future values by analyzing patterns in historical
time-dependent data. Examples include Autoregressive Integrated Moving Average
(ARIMA) and Exponential Smoothing Models.
• Neural Networks Models: Advanced predictive models capable of discerning complex data patterns and
relationships. Examples encompass Feed Forward Neural Networks, Recurrent Neural Networks,
and Convolutional Neural Networks.
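
Of the model families listed above, exponential smoothing is simple enough to sketch directly. The version below is simple (single) exponential smoothing, with the first observation used as the starting level; the demand series and the smoothing factor alpha are hypothetical choices for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last level serves as the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        # New level = weighted blend of the latest observation and the previous level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [10.0, 12.0, 11.0, 13.0]  # hypothetical monthly demand
levels = simple_exponential_smoothing(demand, alpha=0.5)
forecast = levels[-1]
print(levels, forecast)
```

A larger alpha makes the forecast react faster to recent values; a smaller alpha smooths more heavily.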
Basic Cornerstones of Predictive Analytics
• Predictive modeling
• Decision Analysis and optimization
• Transaction profiling
Why is Predictive Analytics Important?
• Predictive analytics is important for several reasons:
• Informed Decision-Making: By anticipating future trends and outcomes, businesses and
organizations can make more strategic decisions. Imagine being able to predict customer churn
(when a customer stops using your service) or equipment failure before it happens. This allows
for proactive measures to retain customers or prevent costly downtime.
• Risk Management: Predictive analytics helps identify and mitigate potential risks. For example,
financial institutions can use it to detect fraudulent transactions, while healthcare providers can
predict the spread of diseases.
• Optimization and Efficiency: Predictive models can optimize processes and resource
allocation. Businesses can forecast demand and optimize inventory levels, or predict equipment
maintenance needs to avoid disruptions.
• Personalized Experiences: Predictive analytics allows for personalization and customization.
Retailers can use it to recommend products to customers based on their past purchases and
browsing behavior.
• Innovation and Competitive Advantage: Predictive analytics empowers organizations to
identify new opportunities and develop innovative products and services. By understanding
customer needs and market trends, businesses can stay ahead of the competition.
Applications of Predictive Analytics
Applications of Predictive Analytics in Business
• Customer Relationship Management (CRM): Predicting customer churn (customer leaving),
recommending products based on past purchases, and personalizing marketing campaigns.
• Supply Chain Management: Forecasting demand for products, optimizing inventory levels, and
predicting potential disruptions in the supply chain.
• Fraud Detection: Identifying fraudulent transactions in real-time for financial institutions and
e-commerce platforms.
Applications of Predictive Analytics in Finance
• Credit Risk Assessment: Predicting the likelihood of loan defaults to make informed lending decisions.
• Stock Market Analysis: Identifying trends and patterns in stock prices to inform investment strategies.
• Algorithmic Trading: Using models to automate trading decisions based on real-time market data.
Applications of Predictive Analytics in Healthcare
• Disease Outbreak Prediction: Identifying potential outbreaks of infectious diseases to enable early
intervention.
• Personalized Medicine: Tailoring treatment plans to individual patients based on their genetic makeup
and medical history.
• Readmission Risk Prediction: Identifying patients at high risk of being readmitted to the hospital to
improve patient care and reduce costs.
Descriptive Analytics
• Descriptive analytics looks at data and analyzes past events for insight into how to
approach future events.
• It examines past performance by mining historical data to understand the causes of
past success or failure.
• Almost all management reporting such as sales, marketing, operations, and finance
uses this type of analysis.
• This process helps decision-makers understand what occurred in the past using
historical data.
• Then, they can use the data findings to find opportunities and challenges looking
forward.
• Common algorithm types used in descriptive analytics include clustering algorithms,
decision trees, association rules, regression analysis, and time series analysis.
• These methods, among others, help produce measures of frequency, central
tendency, dispersion, and position, along with contingency tables and scatter plots.
• Compilation and Summary: The goal of descriptive analytics is to offer a high-level overview of
the data. This frequently requires aggregating the data to obtain key metrics and statistics such
as the mean, median, mode, range, and standard deviation.
• Narrative Creation: In addition to visuals, descriptive analytics can include written descriptions
that offer a logical, contextualized explanation of the data. This is especially helpful when
communicating findings to an audience that may not be familiar with the complexities of the data.
• Interpretation: To obtain meaningful knowledge, analysts interpret the outcomes of descriptive
analytics. This involves understanding the implications of the trends and patterns seen in the data.
Descriptive analytics concentrates on the "what happened" question, while interpretation provides
the foundation for more in-depth analyses that investigate "why" and "what might happen in the
future."
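The summary statistics listed above can be sketched with Python's standard `statistics` module. The sales figures and the helper name `summarize` are hypothetical, chosen only to illustrate the calculation.

```python
import statistics

# Hypothetical monthly sales figures (illustrative values, not from the slides).
monthly_sales = [120, 135, 135, 150, 160, 175, 210]

def summarize(values):
    """Return the basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }

summary = summarize(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that these numbers only describe what happened. They say nothing about why sales rose, which is the job of the later analytics types.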
Advantages of Descriptive Analytics
• Descriptive analytics helps organizational workflows by making difficult
concepts accessible to everyone and simplifying the distribution of information.
• Data-driven decision making: By evaluating and simplifying data, it supports
well-informed decisions based on facts rather than gut instinct.
• Presents data clearly: Descriptive analytics simplifies complex data, making it
easy to understand through reports and visualizations like charts and graphs.
• Easy to understand: Data that has been summarized and graphically
represented is easier for a larger audience to interpret and evaluate.
• Identifies Relevant Data Points: It offers straightforward metrics that give an
accurate estimation of important data points.
• Simple and cost-effective: Descriptive analytics is simple to use and
requires only basic arithmetic knowledge.
• Efficient with tools: Tools like Python or MS Excel make the analysis
fast and easy.
Disadvantages of Descriptive Analytics
• No Cause Analysis: The main goal of descriptive analytics is to explain
historical events. It doesn't explore the root causes or reasons for the patterns
that are seen.
• Analysis Simplicity: The reach of descriptive analytics is restricted to basic
analyses that look at the relationships between a small number of variables.
• Doesn't Explain Why: Descriptive analytics reports the facts of what happened,
but it provides neither causes nor predictions.
• Inappropriate for Making Decisions in Real Time: Descriptive analytics
normally produces summary information at intervals, which might not be
the best option for decision-making when time matters. In many situations
fast responsiveness is vital, so relying only on descriptive analytics
can leave you behind.
• Limited ability to handle unstructured data: Descriptive analytics is better
suited to structured, well-organized datasets. When analyzing
semi-structured or unstructured data, such as text, photos, or multimedia, it
can be challenging to offer insightful analysis.
Applications of Descriptive Analytics
• Financial Performance Evaluation: Descriptive analytics is often used to assess a firm's past
performance. Organizations can detect trends, patterns, and opportunities for change by tracking key
performance indicators (KPIs) over different periods of time. This awareness supports the strategic
planning of business operations.
• Marketing and Analysis of Customer Behavior: Companies need to analyze and understand their
customers' behavior. Firms use descriptive analytics to weigh historical data on consumer interactions,
purchasing patterns, and preferences.
• Friction Analysis in Business Processes: Descriptive analytics is applied in business learning and
development, and to detect and reduce friction in business processes. Friction is anything that blocks
a process or impairs its efficiency. Organizations can pinpoint the bottlenecks in their business
processes by looking at historical data on workflow delays, resource usage, and process times.
• Social Networking Analytics: Descriptive analytics is used in social media to analyze user involvement,
content performance, and audience demographics. It helps businesses tailor their social
media plans according to past performance.
• Human Resources Management: HR uses descriptive analytics to analyze staff. It helps businesses
analyze historical information on worker performance, turnover rates, training effectiveness, and
other HR indicators.
• Crime and Fraud Detection
Prescriptive Analytics
• Prescriptive Analytics is the area of Business Analytics dedicated to finding the best
solution to day-to-day problems. It is directly related to the other two comparable
processes, i.e. Descriptive and Predictive Analytics.
• Prescriptive Analytics can be defined as a type of data analytics that uses algorithms and
analysis of raw data to reach better and more effective decisions over both the short and
the long term.
• It suggests strategies based on possible scenarios, accumulated statistics, and past and present
data collected from the consumer community.
• Prescriptive Analytics anticipates not only what will happen and when it will happen, but also
why it will happen.
• Further, Prescriptive Analytics can suggest decision options on how to take advantage of a
future opportunity or mitigate a future risk and illustrate the implication of each decision
option.
• For example, Prescriptive Analytics can benefit healthcare strategic planning by using analytics
to leverage operational and usage data combined with data of external factors such as
economic data, population demography, etc.
Prescriptive Analytics Approach
• Step 1 Data Collection: Gather data on customer locations and requirements,
company warehouses, and transportation.
• Step 2 Mathematical Modeling: Create mathematical models that handle
supply chain data such as customer location, time, warehouse location, and routes, and
define an objective function that minimizes company cost and delivery time.
• Step 3 Optimization: Use an optimization approach such as linear programming or
differential calculus to solve the mathematical models and find the optimal locations.
• Step 4 Scenario Analysis: Perform a scenario analysis on the assumptions and
variables of the models.
• Step 5 Decision Support: Based on the data modeling and the business knowledge gained
from the raw data, create dashboards and visualizations that help
stakeholders make decisions.
• Step 6 Implementation: The final and most important part, after completing the first five steps,
is to implement the solution with changes that maximize the company's revenue.
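Steps 1 to 4 can be sketched on a toy warehouse-location problem. This is only an illustration under stated assumptions: the coordinates, demand figures, and site names are hypothetical, cost is taken to be demand times straight-line distance, and the optimization step uses an exhaustive search over three candidate sites rather than linear programming or calculus.

```python
# Step 1 (data collection): hypothetical customer coordinates and weekly demand.
locations = {"A": (0, 0), "B": (4, 0), "C": (4, 3)}
base_demand = {"A": 10, "B": 5, "C": 20}

# Hypothetical candidate warehouse sites.
candidate_sites = {"W1": (0, 0), "W2": (4, 3), "W3": (2, 1)}

def delivery_cost(site, demand):
    """Step 2 (model): cost = sum over customers of demand * distance to site."""
    sx, sy = site
    return sum(
        demand[name] * ((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
        for name, (x, y) in locations.items()
    )

def best_site(demand):
    """Step 3 (optimization): exhaustive search over the candidate sites."""
    return min(
        candidate_sites,
        key=lambda name: delivery_cost(candidate_sites[name], demand),
    )

# Step 4 (scenario analysis): re-run the optimization under a changed assumption.
scenarios = {
    "baseline": base_demand,
    "customer A grows": {"A": 40, "B": 5, "C": 20},
}
for label, demand in scenarios.items():
    site = best_site(demand)
    cost = delivery_cost(candidate_sites[site], demand)
    print(f"{label}: choose {site} (cost {cost:.2f})")
```

With these made-up numbers the recommended site changes between scenarios, which is the kind of sensitivity a scenario analysis is meant to expose before a decision is implemented.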
Advantages of Prescriptive Analytics

• Effortlessly maps business analysis to the steps necessary to
avoid failure and achieve success.
• Accurate and comprehensive data aggregation and
analysis also reduces human error and bias.
• Supports decision-making on the problem at hand rather than
jumping to unreliable conclusions based on instinct.
• Removing immediate uncertainties helps prevent fraud,
limit risk, increase efficiency, and create loyal customers.
Difference between Descriptive vs. Predictive vs. Prescriptive Analytics
Feature | Descriptive Analytics | Predictive Analytics | Prescriptive Analytics
Purpose | Understand what happened in the past. | Forecast what might happen in the future. | Recommend actions to achieve desired outcomes.
Focus | Historical data analysis. | Future trends and patterns. | Decision-making and optimization.
Time Frame | Past events and trends. | Future events and probabilities. | Future actions and recommendations.
Examples | Summarizing sales data from the previous month. | Predicting future sales based on market trends and historical data. | Recommending product pricing strategies to maximize profits.
Tools | Reporting tools, dashboards, data visualization. | Statistical models, machine learning algorithms. | Optimization algorithms, decision support systems.
Key Metrics | Descriptive statistics: mean, median, mode, etc. | Predictive accuracy metrics: RMSE, MAE, etc. | Prescriptive performance metrics: ROI, cost-benefit analysis, etc.
Decision Support | Provides insights for informed decision-making. | Guides future actions and strategies. | Offers actionable recommendations to achieve specific goals.
Example Application | Analyzing website traffic to understand user behavior. | Predicting customer churn to anticipate and prevent losses. | Suggesting personalized marketing campaigns based on customer segmentation.
Objective | Historical understanding and trend analysis. | Future prediction and risk assessment. | Optimal decision-making and performance improvement.
Impact | Historical insights for strategy refinement. | Anticipating future scenarios for proactive decision-making. | Maximizing outcomes and efficiency through informed actions.
Data Requirements | Historical data sets. | Historical data sets, future predictors. | Historical data sets, future predictors, decision variables.
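The two predictive accuracy metrics named under Key Metrics, MAE (mean absolute error) and RMSE (root mean squared error), can be computed as in the sketch below. The sales and forecast values are hypothetical and serve only to show the arithmetic.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical actual sales and model forecasts (illustrative values).
actual_sales = [100, 120, 130, 150]
forecast_sales = [110, 115, 135, 140]

mae_value = mae(actual_sales, forecast_sales)
rmse_value = rmse(actual_sales, forecast_sales)
print(f"MAE:  {mae_value:.2f}")
print(f"RMSE: {rmse_value:.2f}")
```

RMSE is never smaller than MAE on the same data, and a large gap between the two suggests that a few forecasts are far off.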
Diagnostic Analytics
• Diagnostic analytics generally relies on historical data, in preference to other data, to answer
a question or solve a problem.
• We look for dependencies and patterns in the historical data for the particular problem.
• The main purpose of diagnostic analytics is to find the root causes behind trends or problems
in the data. It goes beyond describing what is happening and helps businesses
understand why certain events occurred. Diagnostic analytics helps to:
I. Identify Root Causes: It helps businesses understand their data better and identify
the key factors behind specific outcomes, which leads to clear, actionable insights.
II. Solve Problems: By identifying root causes, companies can choose targeted solutions to
resolve issues and improve performance.
III. Inform Future Decisions: Understanding past events and their causes helps businesses
make data-driven decisions and develop smarter strategies.
• Common techniques used for Diagnostic Analytics are:
1. Data discovery
2. Data mining
3. Correlations
• Diagnostic Analytics plays an important role in today’s data-driven world by helping
businesses understand not just what happened but also why it happened.
Think of it like solving a mystery and asking questions like “Why did sales fall?”,
“Why are customers leaving?” or “What caused this system breakdown?” By
understanding their data, businesses can identify the main reasons behind these issues
and take action to resolve them.
Key Steps in Diagnostic Analytics
• Identify the Anomaly: Detect irregularities in data by using sources such as website logs,
customer feedback and financial records. This helps in identifying issues that require further
investigation.
• Data Collection: Collect data from various sources, including transaction records, surveys,
system logs or other sources that provide important information to understand the situation
better.
• Data Exploration: Explore the collected data to identify trends, patterns and correlations.
Techniques like statistical analysis and data visualization help us find insights that explain the
anomaly.
• Pattern Identification: Data analysis methods like machine learning and correlation analysis
help to detect recurring patterns or trends. This step helps in linking the anomalies to potential
causes.
• Root Cause Analysis: Examine the identified patterns to find the main cause of the issue. This step
helps to answer questions such as whether the cause is operational issues, external factors or
system issues.
• Testing and Confirmation: Check the hypothesis using various tests or simulations. For example,
test whether a new website feature caused a decline in user engagement or whether it was due to a
marketing change.
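The first and fourth steps can be sketched as a z-score check for anomalies followed by a Pearson correlation against a candidate cause. The daily figures, the threshold of 2.0, and the function names are hypothetical choices made for this illustration.

```python
import statistics

# Hypothetical daily metrics for one week (illustrative values).
sessions = [500, 510, 495, 505, 300, 498, 502]
page_load_seconds = [2.0, 2.1, 2.0, 1.9, 4.5, 2.0, 2.1]

def find_anomalies(values, threshold=2.0):
    """Return indexes whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)

anomalous_days = find_anomalies(sessions)
correlation = pearson(sessions, page_load_seconds)
print("Anomalous day indexes:", anomalous_days)
print(f"Correlation between sessions and page load time: {correlation:.2f}")
```

A strong correlation only points to a candidate cause. It still has to pass the Testing and Confirmation step before it is treated as the root cause.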
Benefits of Diagnostic Analytics
• Deeper Insights: It helps businesses find hidden patterns and trends, which provides a
clearer understanding of the data. This helps them make informed decisions based on
facts rather than assumptions.
• Improved Problem-Solving: By identifying root causes, businesses can apply targeted
solutions that solve the underlying problems.
• Optimized Processes: It highlights inefficiencies in workflows, which allows businesses
to streamline processes. This leads to improved productivity, faster delivery and better
resource utilization.
• Enhanced Decision-Making: With data-driven insights, businesses can make more
informed and strategic decisions. This helps minimize risks and ensures that
actions align with long-term goals.
• Risk Reduction: Early detection of issues helps businesses address risks before they
grow. By taking timely measures, companies can prevent disruptions and avoid
costly mistakes.
• Customer Satisfaction: It helps businesses understand customer needs and
preferences, which helps in creating personalized experiences, improving satisfaction
and building stronger customer loyalty.
Applications of Diagnostic Analytics
• Human Resources: It helps companies understand employee behavior and turnover,
identify issues like low salaries or poor work
culture, and improve retention.
• Healthcare: By analyzing re-admission rates and treatment outcomes, it helps
hospitals improve patient care, optimize discharge plans and increase
operational efficiency.
• Manufacturing: It identifies causes of machine downtime and operational
inaccuracies, which helps manufacturers streamline production processes and
reduce costs.
• Information Technology (IT): It helps in identifying and resolving network issues,
which enhances system performance and ensures smoother IT operations.
• Retail: It provides insights into customer behavior that help optimize marketing
strategies, inventory management and customer experience, leading to
higher sales.
Operational data and strategic data
Operational Data:
• Operational data refers to the real-time, transactional data that drives a
business's day-to-day operations.
• This data is crucial for ensuring the smooth and efficient functioning of an
organization and is often stored in operational databases designed for
high-speed access and frequent updates.
Examples of Operational Data:
• Customer information (names, addresses, contact details)
• Sales records (products purchased, quantities, prices)
• Inventory levels (current stock, reorder points)
• Financial transactions (payments received, expenses incurred)
• Log data from IT systems (error logs, system usage)
• Customer service interactions (support tickets, chat logs)
Key Characteristics of Operational Data:
• Real-time and transactional: Operational data captures events and transactions as they occur,
providing a current view of the business.
• Focus on day-to-day operations: It supports the core functions and processes that keep the
business running, such as order fulfillment, customer service, and inventory management.
• High-volume and write-intensive: Operational systems often handle a large volume of data
updates (writes) and require fast and reliable data storage.
• Data consistency and integrity: Operational databases prioritize data accuracy and consistency
to ensure reliable information for business operations.
Strategic Data:
• Strategic data for a data warehouse refers to the specific data elements and information that are most
crucial for supporting an organization's long-term goals and strategic decision-making.
• Provides insights for long-term planning and strategic decision-making.
• Analyzes historical data to identify trends, patterns, and opportunities.
• Characteristics: Aggregated, historical, and focused on patterns and trends.
• Used for tasks like strategic planning, product development, marketing strategy, and forecasting.
Examples of Strategic Data:
• Customer Data: Information about customer demographics, purchasing behavior, preferences, and
interactions across different channels.
• Sales Data: Detailed sales figures, including product performance, sales trends, and regional
performance.
• Financial Data: Revenue, costs, profitability, and other financial metrics that are relevant to strategic
planning.
Key Aspects of Strategic Data:
• Alignment with Business Goals: Strategic data directly supports the organization's overall
objectives, such as increasing market share, improving customer satisfaction, or optimizing operations.
• Focus on Key Performance Indicators (KPIs): It identifies and includes data related to the most
important KPIs that measure progress towards strategic goals.
• Historical and Predictive Data: Strategic data includes both historical data for analysis of past
performance and predictive data to forecast future trends and outcomes.
• Cross-Functional Data: It often integrates data from various departments and systems within the
organization, providing a holistic view of the business.
• High Quality and Consistency: Strategic data must be accurate, reliable, and consistent to ensure
that analysis and decisions are based on trustworthy information.
Operational data vs strategic data
Feature | Operational Data | Strategic Data
Purpose | Support day-to-day operations | Guide long-term planning
Focus | Immediate actions | Overall business performance
Timeframe | Real-time, short-term | Historical, long-term
Granularity | Detailed, transactional | Aggregated, focused on trends
Examples | Sales transactions, customer interactions | Sales trends, market analysis
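As a rough illustration of the difference in granularity, the sketch below rolls detailed operational records (one row per sale) up into an aggregated monthly trend of the kind used for strategic analysis. The transaction rows and the function name `monthly_revenue` are hypothetical.

```python
from collections import defaultdict

# Hypothetical operational records: one row per sale (date, product, amount).
transactions = [
    ("2025-01-05", "laptop", 900.0),
    ("2025-01-17", "mouse", 25.0),
    ("2025-02-02", "laptop", 950.0),
    ("2025-02-20", "laptop", 920.0),
    ("2025-02-21", "mouse", 30.0),
]

def monthly_revenue(rows):
    """Aggregate transactional rows into total revenue per month (YYYY-MM)."""
    totals = defaultdict(float)
    for date, _product, amount in rows:
        month = date[:7]  # keep only the "YYYY-MM" part of the ISO date
        totals[month] += amount
    return dict(sorted(totals.items()))

trend = monthly_revenue(transactions)
for month, revenue in trend.items():
    print(month, revenue)
```

The individual rows answer operational questions such as what a given customer bought, while the aggregated totals answer strategic ones such as whether revenue is growing month over month.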
Applications of Data Analytics
• Retail: Data analytics can be applied in the retail sector to study
sales patterns, consumer behavior, and inventory management.
Retailers can use it to make data-driven decisions
about which products to stock, how to price them, and how
best to organize their stores.
• Healthcare: Data analytics can be used to evaluate patient data,
spot trends in patient health, and create individualized treatment
regimens. Healthcare companies can use it to
improve patient outcomes and lower healthcare expenditures.
• Finance: In finance, data analytics can be used to
evaluate investment data, spot trends in the financial markets, and
make sound investment decisions. Financial institutions can use it
to lower risk and boost the performance of
investment portfolios.
• Marketing: Data analytics can be used in marketing to analyze customer data,
spot trends in consumer behavior, and create customized marketing
strategies. Marketers can use it to
boost the efficiency of their campaigns and their overall impact.
• Manufacturing: Data analytics can be used to examine production data,
spot trends in production methods, and boost production efficiency in
the manufacturing sector. Manufacturers can use it to
cut costs and improve product quality.
• Transportation: The transportation sector can use data analytics to
evaluate logistics data, spot trends in transportation
routes, and improve those routes. Data analytics can help transportation businesses
cut expenses and speed up delivery times.
Thank You
