FBAS Notes
Chapter 1

INTRODUCTION
Three developments spurred recent explosive growth in the use of analytical methods in business applications:

First development:
- Technological advances—scanner technology, data collection through e-commerce, Internet social networks, and data generated from personal electronic devices—produce incredible amounts of data for businesses.
- Businesses want to use these data to improve the efficiency and profitability of their operations, better understand their customers, price their products more effectively, and gain a competitive advantage.

Second development:
- Ongoing research has resulted in numerous methodological developments, including:
● Advances in computational approaches to effectively handle and explore massive amounts of data.
● Faster algorithms for optimization and simulation.
● More effective approaches for visualizing data.

Third development:
- The methodological developments were paired with an explosion in computing power and storage capability.
- Better computing hardware, parallel computing, and cloud computing have enabled businesses to solve big problems faster and more accurately than ever before.

DECISION MAKING
Defined as the following process:
1. Identify and define the problem.
2. Determine the criteria that will be used to evaluate alternative solutions.
3. Determine the set of alternative solutions.
4. Evaluate the alternatives.
5. Choose an alternative.

Common approaches to making decisions include:
1. Tradition
2. Intuition
3. Rules of thumb
4. Using the relevant data available

Managers’ responsibility: To make (1) strategic, (2) tactical, or (3) operational decisions.

1) Strategic decisions:
- Involve higher-level issues concerned with the overall direction of the organization.
- Define the organization’s overall goals and aspirations for the future.

2) Tactical decisions:
- Concern how the organization should achieve the goals and objectives set by its strategy.
- Are usually the responsibility of midlevel management.

3) Operational decisions:
- Affect how the firm is run from day to day.
- Are the domain of operations managers, who are the closest to the customer.
BUSINESS ANALYTICS
- Scientific process of transforming data into insight for making better decisions.
- Used for data-driven or fact-based decision making, which is often seen as more objective than other alternatives for decision making.

Tools of business analytics can aid decision making by:
- Creating insights from data.
- Improving our ability to more accurately forecast for planning.
- Helping us quantify risk.
- Yielding better alternatives through analysis and optimization.

Categorization of Analytical METHODS and MODELS
I. DESCRIPTIVE Analytics:
Encompasses the set of techniques that describes what has happened in the past.
Examples:
- Data queries
- Reports
- Descriptive statistics
- Data visualization (including data dashboards)
- Data-mining techniques
- Basic what-if spreadsheet models

Data query: A request for information with certain characteristics from a database.

Data dashboards: Collections of tables, charts, maps, and summary statistics that are updated as new data become available.
Uses of dashboards:
- To help management monitor specific aspects of the company’s performance related to their decision-making responsibilities.
- For corporate-level managers, daily data dashboards might summarize sales by region, current inventory levels, and other company-wide metrics.
- Front-line managers may view dashboards that contain metrics related to staffing levels, local inventory levels, and short-term sales forecasts.

Data mining: The use of analytical techniques for better understanding patterns and relationships that exist in large data sets.
Examples:
- Cluster analysis
- Sentiment analysis

II. PREDICTIVE Analytics:
- Consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another.
- Survey data and past purchase behavior may be used to help predict the market share of a new product.

Techniques used in Predictive Analytics:
- Linear regression
- Time series analysis
- Data mining: used to find patterns or relationships among elements of the data in a large database.
- Simulation: involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision.

III. PRESCRIPTIVE Analytics:
- Indicates a best course of action to take.
- A predictive model provides a forecast or prediction but does not, by itself, provide a decision.
- Prescriptive Model: A forecast or prediction combined with a rule.
- Rule-based Model: Prescriptive models that rely on a rule or set of rules.
- Simulation optimization: Combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain settings.

Decision analysis:
- Used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events.
- Employs Utility theory: assigns values to outcomes based on the decision maker’s attitude toward risk, loss, and other factors.

Optimization models: Models that give the best decision subject to the constraints of the situation.
- Portfolio models
- Supply network design models
- Price-markdown models

Examples:
a) Portfolio models
- Finance field
- Use historical investment return data to determine the mix of investments that yield the highest expected return while controlling or limiting exposure to risk (see the optimization sketch after these examples).

c) Price-markdown models
- Retailing field
- Use historical data to yield revenue-maximizing discount levels and the timing of discount offers when goods have not sold as planned.
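To make the portfolio-model idea concrete, below is a minimal optimization sketch in Python, assuming SciPy is available. The two assets, their expected returns, the risk scores, and the risk cap are hypothetical values for illustration only; they are not taken from these notes.

```python
# Minimal prescriptive-analytics sketch: choose portfolio weights that
# maximize expected return subject to a risk-exposure cap.
# All numbers below are hypothetical, for illustration only.
from scipy.optimize import linprog

expected_return = [0.08, 0.05]   # hypothetical expected returns of assets A and B
risk_score      = [0.10, 0.04]   # hypothetical risk exposure per unit invested
risk_cap        = 0.07           # hypothetical limit on total risk exposure

# linprog minimizes, so negate the returns to maximize expected return.
c = [-r for r in expected_return]

# Inequality constraint: total risk exposure <= risk_cap
A_ub = [risk_score]
b_ub = [risk_cap]

# Equality constraint: weights must sum to 1 (fully invested)
A_eq = [[1.0, 1.0]]
b_eq = [1.0]

bounds = [(0, 1), (0, 1)]        # no short selling

result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("Optimal weights:", result.x)      # about [0.5, 0.5] for these numbers
print("Expected return:", -result.fun)   # about 0.065
```

With these illustrative numbers, the best mix is an even split between the two assets, giving an expected return of about 6.5% while staying within the risk cap.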
BIG DATA
Any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software.
- Represents opportunities.
- Presents challenges in terms of data storage and processing, security, and available analytical talent.

IBM describes the phenomenon of big data through the four V’s:
1. Volume
2. Velocity
3. Variety
4. Veracity

The four V’s at a glance: Volume (Data at Rest): terabytes to exabytes of existing data to process; Velocity (Data in Motion): streaming data, milliseconds to seconds to respond; Variety (Data in Many Forms): structured, unstructured, text, multimedia.

1) Volume
- Because data is collected electronically, we are able to collect more of it.
- To be useful, this data must be stored, and this storage has led to vast quantities of data.

2) Velocity
- Real-time capture and analysis of data present unique challenges both in how data is stored and the speed with which that data can be analyzed for decision making.

3) Variety
- More complicated types of data are now available and are proving to be of great value to businesses.
● Text data: collected by monitoring what is being said about a company’s products or services on social media platforms.
● Audio data: collected from service calls.
● Video data: collected by in-store video cameras and used to analyze shopping behavior.
- Analyzing information generated by these nontraditional sources is more complicated in part because of the processing required to transform the data into a numerical form that can be analyzed.

4) Veracity
- How much uncertainty is in the data.
- Inconsistencies in units of measure and the lack of reliability of responses in terms of bias also increase the complexity of the data.
- Variables: Row 1
- Observations: Rows 2 & 3
- There is variation within the observations.

Types of Data

Population and Sample Data
- Population: All elements of interest
- Sample: Subset of the population
● Random Sampling:
- A sampling method to gather a representative sample of the population data; objective; procedural
- We assume that the data is random

Quantitative and Categorical Data
- Data can only be one of the two.

Quantitative Data
Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed.

Categorical Data
Data on which arithmetic operations cannot be performed.

*Modifying Data in Excel (see the section below)

Creating Distributions from Data

Frequency Distributions for Categorical Data
Frequency Distribution
- A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes.
- Classes are typically referred to as bins when dealing with distributions (they used to be called classes).

Raw Data - ungrouped data
Grouped Data - raw data after grouping
Sample Frequency Distribution

Soft Drink    Frequency
Coca-Cola     19
Diet Coke     8
Dr. Pepper    5
Total         32

Frequency Distributions for Quantitative Data
Approximate class (bin) width = (largest data value - smallest data value) / number of bins
= (33 - 12) / 5 (number of bins)
= 4.2

i) Year-End Audit Times (Days)
12 14 19 18
15 15 18 17
20 27 22 23
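The bin logic above can be sketched quickly in Python. Only the audit-time values visible in these notes are listed; the notes' full data set reaches 33 days, which is where the (33 - 12) / 5 = 4.2 width comes from, so the bins computed below from the partial list are illustrative only.

```python
# Frequency distributions: categorical counts and quantitative bins.
# Only the values visible in the notes are used; bin edges computed from
# this partial list are illustrative (the full data set reaches 33 days).
from collections import Counter

# Categorical data: soft drink purchases
soft_drinks = ["Coca-Cola"] * 19 + ["Diet Coke"] * 8 + ["Dr. Pepper"] * 5
print(Counter(soft_drinks))                        # frequency of each bin (class)

# Quantitative data: year-end audit times (days)
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23]
n_bins = 5
low = min(audit_times)
width = (max(audit_times) - low) / n_bins          # class (bin) width

# Assign each observation to a bin and count frequencies.
freq = Counter(min(int((t - low) // width), n_bins - 1) for t in audit_times)
for i in range(n_bins):
    print(f"{low + i * width:.1f} - {low + (i + 1) * width:.1f}: {freq.get(i, 0)}")
```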
Measures of Location

Mean (Arithmetic Mean)
Average value for a variable.
- The mean is denoted by x̄
- n = sample size
- x1 = value of variable x for the 1st observation
- x2 = value of variable x for the 2nd observation
- xi = value of variable x for the ith observation
- Basically: add all the values, then divide by the number of observations.

Median
Value in the middle when the data are arranged in ascending order.
- Odd number of observations = the middle value
- Even number of observations = the average of the 2 middle values

Measures of Variability

Range
Subtracting the smallest value from the largest value in a data set.
- Drawback: the range is based on only 2 of the observations; thus it is highly influenced by extreme values.

Variance
A measure of variability that utilizes all the data.
- It is based on the deviation about the mean, which is the difference between the value of each observation (xi) and the mean.
- The deviations about the mean are squared while computing the variance.
- For the population variance, replace s² with σ².

Standard Deviation
- The positive square root of the variance (s for a sample, σ for a population).

Coefficient of Variation
- Measures the standard deviation relative to the mean: (s / x̄ × 100)%.

Example:
Class size data: 46, 54, 42, 46, 32
x̄ = 44
s = 8
Coefficient of Variation = (8 / 44 × 100)% = 18.2%
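The class-size example can be checked with Python's standard statistics module; this is just a quick verification sketch, not part of the original notes.

```python
# Verify the class-size example: mean, median, range, sample variance,
# standard deviation, and coefficient of variation.
from statistics import mean, median, variance, stdev

class_sizes = [46, 54, 42, 46, 32]

x_bar = mean(class_sizes)                      # 44
med   = median(class_sizes)                    # 46
rng   = max(class_sizes) - min(class_sizes)    # 54 - 32 = 22
s2    = variance(class_sizes)                  # divides by n - 1 -> 64
s     = stdev(class_sizes)                     # 8
cv    = s / x_bar * 100                        # 18.2%

print(x_bar, med, rng, s2, s, round(cv, 1))
```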
MODIFYING DATA IN EXCEL

To sort the automobiles by March 2010 sales:
Step 1: Select cells A1:F21.
Step 2: Click the Data tab in the Ribbon.
Step 3: Click Sort in the Sort & Filter group.
Step 4: Select the check box for My data has headers.
Step 5: In the first Sort by dropdown menu, select Sales (March 2010).
Step 6: In the Order dropdown menu, select Largest to Smallest.
Step 7: Click OK.

To identify the automobile models in Table 2.2 for which sales had decreased from March 2010 to March 2011:
Step 1: Starting with the original data shown in Figure 2.3, select cells F1:F21.
Step 2: Click on the Home tab in the Ribbon.
Step 3: Click Conditional Formatting in the Styles group.
Step 4: Select Highlight Cells Rules, and click Less Than from the dropdown menu.
Step 5: Enter 0% in the Format cells that are LESS THAN: box.
Step 6: Click OK.
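For readers working outside Excel, a rough pandas equivalent of the two procedures above is sketched below. This is not the notes' method: the file name and the "Sales (March 2011)" column name are assumptions (only "Sales (March 2010)" appears in the steps), and the sketch flags decreases by comparing the two sales columns rather than formatting a percent-change column.

```python
# Rough pandas equivalent of the Excel steps above (assumed column/file names).
import pandas as pd

autos = pd.read_csv("auto_sales.csv")  # hypothetical file with the Table 2.2 data

# Sort by March 2010 sales, largest to smallest (Excel Sort, Steps 1-7).
autos_sorted = autos.sort_values("Sales (March 2010)", ascending=False)

# Flag models whose sales decreased from March 2010 to March 2011
# (the intent of the conditional-formatting steps).
decreased = autos_sorted[
    autos_sorted["Sales (March 2011)"] < autos_sorted["Sales (March 2010)"]
]
print(decreased)
```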
Percentiles
Steps to determine the pth percentile:
1. Arrange the data in ascending order.
2. Compute the location of the pth percentile: Lp = (p/100)(n + 1)
○ p = the percentile to find
○ Lp = the position of the pth percentile
3. Interpret: if Lp has a decimal part (e.g., Lp = 11.05), the pth percentile lies that fraction of the way (0.05, or 5%) between the value in position 11 and the value in position 12.
4. Let X be the value in the 11th position and Y the value in the 12th position.
5. Solve: pth percentile = X + B(Y - X), where B is the decimal part of Lp.

Quartiles
When the data are divided into four equal parts:
- Each part contains approximately 25% of the observations.
- The division points are referred to as quartiles.
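The percentile steps translate directly into a short Python function; the audit-time values from earlier in the notes are reused, and the handling of locations that fall outside the data positions is my own addition for illustration.

```python
# Implements the percentile procedure above:
# L_p = (p / 100) * (n + 1), then interpolate between the two
# surrounding positions using the decimal part B.
def percentile(data, p):
    values = sorted(data)                 # step 1: ascending order
    n = len(values)
    L = p / 100 * (n + 1)                 # step 2: location of the pth percentile
    lower_pos = int(L)                    # integer part -> position of X (1-based)
    B = L - lower_pos                     # decimal part -> fraction between X and Y
    if lower_pos <= 0:                    # location before the first value
        return values[0]
    if lower_pos >= n:                    # location at or past the last value
        return values[-1]
    X = values[lower_pos - 1]             # value in the lower position
    Y = values[lower_pos]                 # value in the next position
    return X + B * (Y - X)                # step 5: pth percentile = X + B(Y - X)

audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23]

# Quartiles are simply the 25th, 50th, and 75th percentiles.
q1, q2, q3 = (percentile(audit_times, p) for p in (25, 50, 75))
print(q1, q2, q3)
```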
Empirical Rule (for data with a bell-shaped distribution):
👉 Approximately 68% of the data values are within 1 standard deviation of the mean.
👉 Approximately 95% of the data values are within 2 standard deviations of the mean.
👉 Almost all of the data values are within 3 standard deviations of the mean.

Outliers
- Extreme values in a data set.
- Can be identified using standardized values (z-scores).
- Any data value with a z-score less than –3 or greater than +3 is an outlier.
- Such data values can then be reviewed to determine their accuracy and whether they belong in the data set.

Covariance
- A descriptive measure of the linear association between 2 variables.
- For the population covariance, replace sxy with σxy (and divide by N rather than n - 1).

Correlation Coefficient
- Measures the relationship between 2 variables.
- Not affected by the units of measurement for x and y.
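A small Python sketch of the z-score outlier rule, the sample covariance, and the correlation coefficient follows. All data values below are hypothetical and chosen only to illustrate the calculations.

```python
# Outlier screening with z-scores, then sample covariance and correlation.
# All data values below are hypothetical, for illustration only.
from statistics import mean, stdev

# --- Outliers: any value with a z-score below -3 or above +3 ---
spend = [52, 48, 55, 50, 47, 51, 49, 53, 46, 54] * 3 + [260]   # one extreme value
m, s = mean(spend), stdev(spend)
outliers = [v for v in spend if abs((v - m) / s) > 3]
print("Outliers:", outliers)                                   # flags 260

# --- Sample covariance and correlation coefficient for two variables ---
x = [2, 4, 6, 8, 10]
y = [5, 9, 14, 18, 24]
n = len(x)
mx, my = mean(x), mean(y)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # sample covariance
corr_xy = cov_xy / (stdev(x) * stdev(y))       # unit-free, between -1 and +1
print("Covariance:", cov_xy, "Correlation:", round(corr_xy, 3))
```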
DATA VISUALIZATION
- Reduces the cognitive load, or the amount of effort necessary to accurately and efficiently process the information being communicated by a data visualization.

Preattentive attributes – features that can be used in a data visualization to reduce the cognitive load required by the user to interpret the visualization. Include attributes such as
● color, size, shape, and length, among others.

Central to creating effective tables and charts for data visualization is the idea of the data-ink ratio.

Data-ink
- Ink used in a table or chart that is necessary to convey the meaning to the audience.

Non-data-ink
- Ink used in a table or chart that serves no useful purpose in conveying the data to the audience.

Data-ink ratio
- Measures the proportion of ink used in a table or chart that is necessary to convey the meaning to the audience (known as “data-ink”) to the total ink used for the table or chart.
- Increasing a low data-ink ratio:
○ Add labels to the axes
○ Remove unnecessary gridlines
○ Remove unnecessary lines and labels

TABLES should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.

Table Design Principles
- Keep in mind the data-ink ratio and avoid the use of unnecessary ink in tables.
- Avoid using vertical lines in a table unless they are necessary for clarity.
- Horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place.
- In large tables, vertical lines or light shading can be useful to help the reader differentiate the columns and rows.
- To highlight the differences among locations, the shading could be done for every other row instead of every other column.

Alignment
- Numerical values should be right-aligned.
- Text values should be left-aligned.

Crosstabulation
- A tabular summary of data for two variables.
- The left and top margin labels define the classes for the two variables.
- A crosstabulation in Microsoft Excel is known as a PivotTable (see the sketch after this section).

CHARTS (graphs)
- Visual methods for displaying data.
- Examples: scatter charts, line charts, sparklines, bar charts, and column charts.

Scatter chart: a graphical presentation of the relationship between two quantitative variables.

Line charts
- Similar to scatter charts, but a line connects the points in the chart.
- Very useful for time series data collected over a period of time (minutes, hours, days, years, etc.).
- Time series plot: a line chart for time series data.

Bar charts and column charts
- Provide a graphical summary of categorical data using the preattentive attribute of length to convey relative amounts. Very helpful in making comparisons between categorical variables.
- Bar charts – horizontal bars
- Column charts – vertical bars

Pie charts
- Used to compare categorical data.
- Rely on the preattentive attribute of size to convey information to the user.
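As a sketch of the crosstabulation (PivotTable) idea described above, pandas.crosstab builds the same kind of two-variable frequency table; the quality/price data below are hypothetical, for illustration only.

```python
# A crosstabulation (a PivotTable in Excel) built with pandas.crosstab.
# The data below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "Quality Rating": ["Good", "Very Good", "Good", "Excellent", "Very Good", "Good"],
    "Price Range":    ["$10-19", "$20-29", "$20-29", "$30-39", "$10-19", "$10-19"],
})

# Rows = classes of one variable, columns = classes of the other,
# cells = frequency counts.
crosstab = pd.crosstab(df["Quality Rating"], df["Price Range"])
print(crosstab)
```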
For Multiple Variables:

Bubble chart
- A graphical means of visualizing three variables in a two-dimensional graph (see the sketch after this section).
- A preferred alternative to a 3-D graph.

Scatter-chart matrix
- Allows the reader to easily see the relationships among multiple variables.
- Each scatter chart in the matrix is created in the same manner as for creating a single scatter chart.
- Each column and row in the scatter-chart matrix corresponds to one variable.

Parallel-coordinates plot
- A helpful chart for examining data with more than two variables, which includes a different vertical axis for each variable.
- Each observation in the data set is represented by drawing a line on the parallel-coordinates plot connecting each vertical axis.
- The height of the line on each vertical axis represents the value taken by that observation for the variable corresponding to the vertical axis.
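A minimal bubble-chart sketch in Python with matplotlib, following the definition above (two variables on the axes, a third encoded as bubble size); the data values are hypothetical.

```python
# Bubble chart: two variables on the axes, a third encoded as bubble size.
# All data values are hypothetical, for illustration only.
import matplotlib.pyplot as plt

revenue   = [120, 150, 90, 200, 170]      # x-axis variable
profit    = [15, 22, 8, 35, 20]           # y-axis variable
employees = [30, 45, 20, 80, 60]          # third variable -> bubble size

plt.scatter(revenue, profit, s=[e * 5 for e in employees], alpha=0.5)
plt.xlabel("Revenue")
plt.ylabel("Profit")
plt.title("Bubble chart: bubble size = number of employees")
plt.show()
```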
Data Mining
- The process of discovering patterns and knowledge in large amounts of data; it focuses on analyzing and finding insights.
- The process of sifting through large volumes of data to uncover patterns, trends, and insights that are not immediately obvious.
- Context: It has gained immense popularity due to its applications in various industries, especially following high-profile cases like the Cambridge Analytica scandal, which highlighted the power and potential misuse of data mining techniques.
Core Techniques in Data Mining

A. Classification
Purpose – Categorize data into predefined classes (e.g., “probably pregnant” vs. “probably not pregnant”).

Process:
- Data attributes for each instance are quantified (e.g., purchasing patterns).
- Labels are assigned to the data based on known outcomes (e.g., baby registries).
- Algorithms analyze these labeled examples to identify patterns.

D. Anomaly Detection
Purpose – To identify unusual or unexpected patterns that may indicate problems or opportunities.

Process:
- Anomalies can be detected through statistical methods that look for deviations from the norm (e.g., unusual spending patterns).

Example:
The IRS and credit card companies use anomaly detection to identify potential fraud.
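A minimal sketch of statistical anomaly detection in Python, in the spirit of the deviation-from-the-norm approach described above; the spending amounts and the 3-standard-deviation cutoff are illustrative choices, not from the notes.

```python
# Anomaly detection via simple statistical deviation from the norm:
# flag transactions whose amount is far from the typical spending pattern.
# The amounts and the 3-standard-deviation cutoff are illustrative only.
from statistics import mean, stdev

amounts = [42, 38, 45, 50, 41, 39, 44, 47, 43, 40] * 3 + [900]  # one unusual charge
m, s = mean(amounts), stdev(amounts)

anomalies = [a for a in amounts if abs(a - m) > 3 * s]
print("Flagged as potentially fraudulent:", anomalies)          # flags the 900 charge
```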