
Quantitative Analysis

Predictive Statistics
Overview
• Analysts engage in three types of tasks: 1) descriptive, 2) predictive
and 3) prescriptive.
• Predictive methods include:
• Classification, to predict which class an individual record will
occupy (e.g., will a particular customer buy?)
• Prediction, to predict a numerical outcome for an individual record
(e.g., how much will that customer spend?)
• Now that data is plentiful, data mining enables more accurate
prediction.
Predictive analytics
• is a branch of advanced analytics that utilizes data, statistical
algorithms, and machine learning techniques to identify the
likelihood of future outcomes based on historical data.
• its primary aim is to forecast trends, behavior, and events
to guide decision-making processes effectively.
Importance and Applications
• Financial Services: Banks and financial institutions use
predictive analytics for credit scoring, fraud detection, and risk
assessment.
• Healthcare: Healthcare providers employ predictive analytics
for patient diagnosis, disease prevention, and personalized
treatment plans.
• Marketing: Marketers utilize predictive analytics to forecast
customer behavior, optimize advertising campaigns, and
enhance customer segmentation strategies.
Importance and Applications (continued)
• Manufacturing: Manufacturers apply predictive analytics
for predictive maintenance, quality control, and supply
chain optimization.
• E-commerce: E-commerce platforms leverage predictive
analytics for product recommendations, dynamic pricing,
and customer churn prediction.
PERFORMANCE MEASURES
• A central goal of predictive analysis is to anticipate future outcomes.
• For classification of new instances (e.g., whether a registered voter
will vote), predictive accuracy is measured by the proportion of
instances correctly classified.
• For numerical predictions (e.g., the number of votes received by the
winner in each state), accuracy is measured by the differences between
predicted and actual outcomes.
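Both measures can be computed directly. A minimal NumPy sketch, using invented toy data (the voter labels and vote counts below are illustrative only):

```python
import numpy as np

# Hypothetical classification data: 1 = voted, 0 = did not vote
actual_class = np.array([1, 0, 1, 1, 0])
predicted_class = np.array([1, 0, 0, 1, 0])

# Classification accuracy: fraction of instances correctly classified
accuracy = np.mean(actual_class == predicted_class)
print(accuracy)  # 0.8

# Hypothetical numerical predictions: votes received in each state
actual_votes = np.array([120.0, 95.0, 130.0])
predicted_votes = np.array([110.0, 100.0, 125.0])

# Root-mean-squared error between predicted and actual outcomes
rmse = np.sqrt(np.mean((predicted_votes - actual_votes) ** 2))
print(round(rmse, 2))
```

Accuracy rewards correct class labels; RMSE penalizes large numerical errors more heavily than small ones.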
Predictive Modeling Techniques
• k-Nearest Neighbor
• Naïve Bayes
• Classification and Prediction Trees
• Multiple Linear Regression
• Logistic Regression
• Neural Networks
• Time series analysis
K-NEAREST NEIGHBOR METHOD
• Bases classification of a new case on records most similar to the new
case
• E.g., by Pandora’s Music Genome Project to identify songs that
appeal to a user
• Answers three major questions:
• How to define similarity between records?
• How many neighboring records to use?
• What classification or prediction rule to use?
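The three questions map directly onto the choices in a k-NN implementation. A minimal sketch (the `knn_classify` helper and the toy age/income data are hypothetical; Euclidean distance is just one common similarity measure):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training records."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Q1: similarity measure
    nearest = np.argsort(dists)[:k]                  # Q2: number of neighbors
    votes = Counter(y_train[i] for i in nearest)     # Q3: majority-vote rule
    return votes.most_common(1)[0][0]

# Hypothetical records: two features (age, income in $1000s), label = will buy
X = np.array([[25, 40], [30, 60], [50, 80], [55, 90], [28, 45]])
y = np.array([0, 0, 1, 1, 0])

# A new case most resembles the two older, higher-income buyers
print(knn_classify(X, y, np.array([52, 85]), k=3))  # 1
```

The choice of k trades off noise sensitivity (small k) against over-smoothing (large k).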
THE NAÏVE BAYES METHOD
• Similar to k-Nearest Neighbors, but restricted to situations in which
all predictor variables are categorical
• Example: spam filtering, based on the categorical values Word Appeared
and Word Did Not Appear
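A minimal sketch of the idea on invented word-appearance data. Laplace (add-one) smoothing is a standard refinement not mentioned above; it avoids zero probabilities for unseen word/class combinations:

```python
import numpy as np

# Hypothetical training set: each row records whether three words appeared
# (1) or did not appear (0) in a message; label 1 = spam, 0 = not spam
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 0],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])

def naive_bayes_score(x_new, c):
    """Unnormalized P(class) * product of P(word state | class),
    with Laplace smoothing."""
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    likelihoods = (np.sum(Xc == x_new, axis=0) + 1) / (len(Xc) + 2)
    return prior * np.prod(likelihoods)

x_new = np.array([1, 1, 1])  # a new message in which all three words appear
scores = {c: naive_bayes_score(x_new, c) for c in (0, 1)}
prediction = max(scores, key=scores.get)
print(prediction)  # 1 (classified as spam)
```

The "naïve" part is the assumption that word appearances are independent within each class.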
CLASSIFICATION AND PREDICTION TREES
• Based on the observation that there are subsets of records in
a database that contain mostly 1s or 0s
• Identify the subsets, and we can classify a new record based
on majority outcome in the subset it most resembles
• Example: Predict purchasing behavior of individuals for
whom we know three variables, being age, income and
education
Process of CLASSIFICATION AND PREDICTION
TREES
• The approach for classification with numerical predictors proceeds
as follows:
1. Pick a predictor variable.
2. Sort its values from low to high.
3. Define a set of split points as midpoints between each pair of values.
4. For each split point, divide records into above/below split.
5. Evaluate the homogeneity of records in each subset (the extent to
which records are mostly 1s or 0s).
6. Repeat for all split points for this variable.
7. Choose the split point that gives the most homogeneous subsets.
8. Repeat for all variables.
9. Split on the variable with the highest homogeneity.
10. Repeat for each subset of records.
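Steps 2–7 for a single numeric predictor can be sketched as follows. Gini impurity is used here as one common homogeneity measure (the slides do not name a specific measure), and the age/purchase data are invented:

```python
import numpy as np

def gini(labels):
    """Impurity: low when a subset is mostly 1s or mostly 0s."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Sort values, form midpoints, score each split, keep the best."""
    order = np.argsort(values)
    v, lab = values[order], labels[order]
    midpoints = (v[:-1] + v[1:]) / 2          # step 3: split points
    best = None
    for m in np.unique(midpoints):
        below, above = lab[v <= m], lab[v > m]  # step 4: divide records
        score = (len(below) * gini(below)
                 + len(above) * gini(above)) / len(lab)  # step 5
        if best is None or score < best[1]:
            best = (m, score)                  # step 7: most homogeneous
    return best

# Hypothetical data: age vs. purchased (1/0); splits cleanly at 35.5
age = np.array([22, 25, 30, 41, 48, 55])
bought = np.array([0, 0, 0, 1, 1, 1])
print(best_split(age, bought))
```

Steps 8–10 repeat this search across all predictors and then recurse into each resulting subset.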
LINEAR REGRESSION
• A statistical technique for modeling the relationship between a
dependent variable (also known as the target variable) and one or
more independent variables (also known as predictor variables or
features).
• It assumes a linear relationship between the independent variables
and the dependent variable.
Types of Linear Regression
• Simple Linear Regression: Involves only one independent
variable. The relationship between the independent
variable and the dependent variable is modeled using a
straight line.
• Multiple Linear Regression: Involves more than one
independent variable. The relationship between the
independent variables and the dependent variable is
modeled using a linear equation with multiple
coefficients.
MULTIPLE LINEAR REGRESSION
• One of the most widely used tools from classical statistics
• Used widely in natural and social sciences, more often for explanatory than
predictive modeling
– To determine if specific variables influence outcome variable
• Answers questions like:
– Do the data support the claim that women are paid less than men in
comparable jobs?
– Is there evidence that price discounts and rebates lead to higher long-term
sales?
– Do data support the idea that firms that outsource manufacturing overseas
have higher profits?
Key Concepts of Linear Regression
• Dependent Variable (Y): The variable that you want to predict or explain. It is denoted as Y
and is typically continuous.
• Independent Variables (X): The variables used to predict the dependent variable. There
can be one or more independent variables denoted as X₁, X₂, ..., Xᵣ.
• Linear Relationship: Linear regression assumes that the relationship between the
independent variables and the dependent variable is linear, which means that a change in
the independent variables results in a proportional change in the dependent variable.
• Parameters (Coefficients): Linear regression estimates the coefficients (β₀, β₁, β₂, ..., βᵣ) of
the linear equation that best fits the data. These coefficients represent the slope of the
relationship between each independent variable and the dependent variable.
• Intercept (β₀): The intercept represents the value of the dependent variable when all
independent variables are zero.
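These concepts can be illustrated by estimating the coefficients with ordinary least squares. A minimal NumPy sketch on synthetic data generated from known coefficients (β₀ = 2, β₁ = 3, β₂ = −1), so the fit should recover them:

```python
import numpy as np

# Synthetic data: Y is built exactly as 2 + 3*X1 - 1*X2
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 2 + 3 * X1 - 1 * X2

# Design matrix: a leading column of ones estimates the intercept beta_0
A = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: minimizes sum of squared residuals
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(np.round(beta, 6))  # recovers [2, 3, -1]
```

With real, noisy data the recovered coefficients would only approximate the true ones, and standard errors would quantify that uncertainty.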
LOGISTIC REGRESSION
• A statistical approach to classification of categorical
outcome variables
• Similar to multiple linear regression, but used when the outcome is
categorical rather than numerical (most commonly binary)
• Uses data to produce a probability that a given case will
fall into one of two classes (e.g., flights that leave on
time/delayed, companies that will/will not default on
bonds, employees who will/will not be promoted)
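The probability output can be illustrated with a minimal from-scratch sketch: gradient descent on the log-loss of a single-predictor model. The departure-queue data are invented, and a real analysis would use a statistics package rather than this hand-rolled loop:

```python
import numpy as np

def sigmoid(z):
    """Maps any real number to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical data: departure-queue length vs. delayed (1) / on time (0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit intercept b0 and slope b1 by gradient descent on the log-loss
b0, b1 = 0.0, 0.0
for _ in range(20000):
    p = sigmoid(b0 + b1 * x)        # predicted probability of class 1
    b0 -= 0.1 * np.mean(p - y)
    b1 -= 0.1 * np.mean((p - y) * x)

# The model outputs a probability, not a hard class label
print(sigmoid(b0 + b1 * 5.0))  # near 1: a long queue suggests a delay
print(sigmoid(b0 + b1 * 2.0))  # near 0: a short queue suggests on time
```

A hard classification is then obtained by thresholding the probability, typically at 0.5.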
NEURAL NETWORKS
• An outgrowth of research within artificial
intelligence into how the brain works
• Used for classification and prediction
• Applied to extremely wide variety of areas
(e.g., from financial applications to controlling
robots)
• In finance:
• To predict bankruptcy of firms
• To trade on currency, stock or bond
markets
• To predict credit card fraud
• Complex and difficult to understand but high
predictive accuracy
Time Series Analysis
• A branch of statistical modeling and forecasting that focuses on
analyzing data points collected, recorded, or observed sequentially
over time.
• Key concepts:
• Temporal Structure: Time series data has a natural temporal ordering,
where each data point is associated with a specific time index.
• Components: Time series data can typically be decomposed into
several components:
• Trend: The long-term direction or tendency of the data over time.
• Seasonality: Patterns that repeat at fixed intervals, such as daily,
weekly, or yearly cycles.
• Irregular: Random variation that remains after the trend and
seasonal components are removed.
Temporal Analysis
• The study of time and data is known as "temporal analysis." It involves
looking at data and events across different time measurements, like
hours, days, weeks, months, or years.
Decomposition Process
1. Data Visualization: First, let's visualize the monthly sales data to
get an overview of its pattern and identify any trends or seasonality
2. Seasonal Decomposition: We'll use the STL method to decompose
the time series into its trend, seasonal, and irregular components.
3. Analysis and Interpretation: Once decomposed, we'll analyze and
interpret each component to understand the underlying patterns in
the data.
4. Forecasting: Finally, we can use the decomposed components to
forecast future sales trends.
STL stands for "Seasonal-Trend decomposition using LOESS"
• a popular method for decomposing time series data into its underlying
components: trend, seasonal, and irregular.
• Seasonal: This component captures patterns that repeat at fixed
intervals, such as daily, weekly, or yearly cycles.
• Trend: The trend component represents the long-term direction
or tendency of the data. It captures gradual changes in the data
over time, such as increasing or decreasing values.
• Decomposition: Decomposition refers to the process of breaking
down the time series into its constituent parts, namely trend,
seasonal, and irregular components.
• LOESS: LOESS stands for "Locally Weighted Scatterplot Smoothing," a
non-parametric regression technique used to estimate the trend and
seasonal components of the time series.
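A simplified classical decomposition can be sketched in NumPy. This uses a moving-average trend and per-month seasonal averages rather than LOESS, so it is only an approximation of what STL does (the `statsmodels` library provides a full `STL` implementation); the monthly series is synthetic:

```python
import numpy as np

# Synthetic monthly series: linear trend plus a yearly sine-wave season
period = 12
t = np.arange(48)
series = 10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / period)

# Trend: moving average over one full seasonal cycle
# (edge values are rough because of zero-padding in the convolution)
kernel = np.ones(period) / period
trend = np.convolve(series, kernel, mode="same")

# Seasonal: average detrended value at each position in the cycle
detrended = series - trend
seasonal = np.array([detrended[i::period].mean() for i in range(period)])
seasonal = np.tile(seasonal, len(series) // period)

# Irregular: whatever the trend and seasonal components leave unexplained
irregular = series - trend - seasonal
```

By construction the three components sum back to the original series, which is the defining property of an additive decomposition.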
