
Introduction - Ch.1: Data Mining For Business Analytics in R

The document provides an introduction to data mining for business analytics using R. It discusses key concepts like business analytics, data mining, big data, data science, machine learning methods, and terminologies. It also provides examples of how different organizations use analytics and discusses some differences between statistics and machine learning.

Uploaded by

hasan

Introduction – Ch. 1

Data Mining for Business Analytics in R


Shmueli, Bruce, Yahav, Patel & Lichtendahl

© Galit Shmueli and Peter Bruce 2017 (rev. Sep 10 2019)


ML and AI
Sectors Adopting AI
Business Analytics
⚫ BA: the practice and art of bringing quantitative data to bear on decision making.
⚫ Example: Washington Post
  ⚫ tracking reading time, location, etc.
⚫ BI: data visualization and reporting
  ⚫ what happened (descriptive analytics)
⚫ BA includes BI as well as statistical models and data mining algorithms to
  ⚫ explore data
  ⚫ explain the relationships between measurements
  ⚫ predict/forecast future values
BA Example
⚫ Credit scoring
  ⚫ predictive modeling
  ⚫ evaluate creditworthiness from prior data and other factors
⚫ Future purchases
  ⚫ association rules
  ⚫ after finishing work, a shopper stops by Wal-Mart to buy diapers and beer
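
The diapers-and-beer rule can be made concrete with the two standard association-rule measures, support and confidence. The snippet below is a minimal sketch in Python, used here only for illustration. The baskets are invented toy data, and the helper names `support` and `confidence` are hypothetical, not taken from the slides.

```python
# Toy market baskets, invented for illustration only (not real Wal-Mart data).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under study: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Support says how often the two items appear together at all; confidence says how often beer is bought given that diapers are bought.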
Data Mining
⚫ BA methods that focus more on advanced data analytics
  ⚫ statistical and machine learning methods
Data Mining - 2

⚫ It goes by different names:
  ⚫ predictive modeling
  ⚫ predictive analytics
  ⚫ machine learning
⚫ DM is at the confluence of the fields of statistics and machine learning
⚫ Difference?
  ⚫ STAT: inferences about a population from a sample
  ⚫ ML: predictions at an individual level
Big Data

⚫ A large set of data that is often characterized by four Vs:
  ⚫ Volume: the amount of data
  ⚫ Velocity: the speed at which data are generated and changed
  ⚫ Variety: the different types of data being generated
  ⚫ Veracity: the data are generated by organic, distributed processes rather than under the quality controls of a designed study
Big Data - 2
⚫ Challenges vs opportunities
  ⚫ if a traditional statistical dataset (say, 15 variables and 5,000 records) is the size of a dot,
  ⚫ then the big data generated by Wal-Mart is the size of a football field
⚫ Using big data:
  ⚫ Telenor, a Norwegian mobile phone service company, reduced subscriber turnover by 37%
  ⚫ Allstate, a US insurance company, increased its prediction accuracy for injury liability by using vehicle type
BD funny example
Data Science
⚫ Data science is a mix of skills in:
  ⚫ statistics
  ⚫ machine learning (programming)
  ⚫ business
⚫ Questions to ask:
  ⚫ Which methods to use, given
    ⚫ the problem
    ⚫ the data
  ⚫ How the methods work
  ⚫ Their requirements, strengths and weaknesses
  ⚫ How to assess performance
ML Methods

⚫ Why different methods coexist:
  ⚫ each has advantages and disadvantages
⚫ The usefulness of a method depends on:
  ⚫ the size of the dataset
  ⚫ the types of patterns that exist in the data
  ⚫ how noisy the data are
  ⚫ the particular goal of the analysis
⚫ The norm is to try several different methods and select the one that is most useful for the specified goal, for example the one that produces the best prediction accuracy
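
The "try several methods, keep the best" norm can be sketched as follows. This is an illustrative Python example under stated assumptions: the `(x, y)` records are invented, the two candidate methods (a mean baseline and a least-squares line) are arbitrary stand-ins for whatever methods are being compared, and RMSE on held-out records is one possible accuracy measure among many.

```python
import math

# Hypothetical (x, y) records, invented for illustration.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.2)]
holdout = [(7, 13.8), (8, 16.1), (9, 18.0)]


def fit_mean(data):
    """Baseline method: always predict the average training response."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Least-squares line y = a + b * x fitted on the training records."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def rmse(model, data):
    """Root mean squared prediction error of `model` on `data`."""
    return math.sqrt(sum((y - model(x)) ** 2 for x, y in data) / len(data))


# Fit every candidate on the training records, score each on the holdout records.
candidates = {"mean baseline": fit_mean(train), "linear regression": fit_line(train)}
scores = {name: rmse(model, holdout) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

Scoring on records that were not used for fitting matters here: a method that merely memorizes the training data would otherwise look better than it is.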
Terminologies
⚫ ML vs STAT:
  ⚫ Output (target) variable vs dependent (response) variable
⚫ Algorithm: a specific procedure used to implement a particular data mining technique
⚫ Observation: case, instance, sample, example, record, pattern, row
⚫ Predictor: feature, input variable, independent variable, field, attribute
⚫ Response: dependent variable, output variable, target variable, outcome variable
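
To see how these terms map onto a dataset, here is a small Python sketch. The loan records and the column names `income`, `age` and `default` are hypothetical and invented for illustration only.

```python
# Hypothetical loan records, invented for illustration.
# Each dict is one observation (a.k.a. case, record, row).
records = [
    {"income": 52, "age": 34, "default": "no"},
    {"income": 31, "age": 45, "default": "yes"},
    {"income": 78, "age": 29, "default": "no"},
]

# The response (target, outcome) is the column we want to predict;
# every other column is a predictor (feature, input variable, attribute).
response_name = "default"
predictor_names = [name for name in records[0] if name != response_name]

X = [[row[name] for name in predictor_names] for row in records]  # predictor values
y = [row[response_name] for row in records]                       # response values

print(len(records), "observations; predictors:", predictor_names,
      "; response:", response_name)
```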
Road maps to the book
Q1: One difference between statistics and machine learning is that:
A. STAT uses large datasets, while ML uses traditional datasets
B. ML needs higher computational power, while STAT works with regular power
C. STAT produces outcomes at an individual level, while ML focuses on the average effect in a population sample

Q2: When a large volume of data is produced from an authentic source, we call it:
A. Velocity
B. Variety
C. Veracity

Q3: In machine learning, an observation is also called a:
A. Sample
B. Feature
C. Attribute
