DM 01 Introduction

This document provides an introduction to data mining. It defines data mining as the process of discovering patterns in large data sets. The goals of data mining are to extract useful information and knowledge from large amounts of data. It notes that while there is a lot of data available, extracting meaningful and useful information from it can be challenging. The document then discusses different data mining techniques for discovering various types of patterns, such as classification, clustering, association rule learning, and outlier detection. It also covers considerations for what makes a pattern interesting and the importance of data preparation and selection for successful knowledge discovery.

Uploaded by

Andra Nugraha

Introduction - #1

PAC355 (3 sks)
DATA MINING

Nurdin Bahtiar, MT
1.1 What Motivated Data Mining?
Why Is It Important? (cont’d)

• We are data rich, but information poor.
• “We are drowning in information, but starving for knowledge” (John Naisbitt)
1.2 So, What Is Data Mining?

Data mining:
Searching for knowledge (interesting patterns)
in your data.

• Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data.
• “Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data”
• Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
1.2 So, What Is Data Mining?

• Data mining as a step in the process of knowledge discovery.
1.2 So, What Is Data Mining?

Knowledge discovery as a process consists of an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved
from the database)
4. Data transformation (where data are transformed or consolidated
into forms appropriate for mining by performing summary or
aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are
applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns
representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge
representation techniques are used to present the mined knowledge to
the user).
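
The seven steps above can be sketched end to end on a toy example. This is a minimal illustration, not a reference implementation: the two "stores", their records, the function names, and the 0.5 support threshold are all hypothetical, and the mining step is reduced to counting item pairs that occur together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (toy data for illustration only).
store_a = [
    {"id": 1, "items": "bread, milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},  # noisy record: the item list is missing
]
store_b = [
    {"id": 4, "items": "Milk, Bread"},
    {"id": 5, "items": "butter"},
]

def clean(records):
    """Step 1, data cleaning: drop records with a missing item list."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2, data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3, data selection: keep only the field relevant to the task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4, data transformation: turn each record into a set of lowercase items."""
    return [frozenset(i.strip().lower() for i in s.split(",")) for s in item_strings]

def mine(transactions):
    """Step 5, data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Step 6, pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_transactions
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    }

def present(patterns):
    """Step 7, knowledge presentation: render the patterns as readable lines."""
    return [f"{a} & {b}: support={s:.2f}" for (a, b), s in sorted(patterns.items())]

transactions = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(transactions), len(transactions))
report = present(patterns)
print("\n".join(report))  # bread & milk: support=0.75
```

In practice the process is iterative: a poor result at step 6 usually sends the analyst back to cleaning, selection, or transformation rather than straight to presentation.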
1.2 So, What Is Data Mining?

Architecture of a typical data mining system.


1.3 Data Mining – On What Kind of Data?

• In principle, data mining should be applicable to any kind of data repository, as well as to transient data, such as data streams.

• Thus the scope of our examination of data repositories will include relational databases, data warehouses, transactional databases, advanced database systems, flat files, data streams, and the World Wide Web.
1.4 Data Mining Functionalities - What Kinds of Patterns Can Be Mined?

• In general, data mining tasks can be classified into two categories: descriptive and predictive.
1. Descriptive mining tasks characterize the general properties of the data in the database.
2. Predictive mining tasks perform inference on the current data in order to make predictions.
• Data mining functionalities, and the kinds of patterns they can discover, are:
(1) Characterization and Discrimination,
(2) Mining Frequent Patterns, Associations, and Correlations,
(3) Classification and Prediction,
(4) Cluster Analysis,
(5) Outlier Analysis,
(6) Evolution Analysis.
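
The difference between descriptive and predictive tasks can be illustrated with a small sketch. The customer data, the labels, and the function names below are hypothetical. The descriptive step summarizes each group by its average spend (a simple form of characterization), and the predictive step reuses those averages to label an unseen customer (a nearest-centroid classifier, chosen here only because it is short).

```python
from statistics import mean

# Hypothetical toy data: (annual_spend, label) for a few customers.
customers = [
    (120.0, "occasional"),
    (150.0, "occasional"),
    (900.0, "frequent"),
    (1100.0, "frequent"),
]

def characterize(rows):
    """Descriptive task: summarize each group by its average spend."""
    groups = {}
    for spend, label in rows:
        groups.setdefault(label, []).append(spend)
    return {label: mean(values) for label, values in groups.items()}

def predict(spend, centroids):
    """Predictive task: assign the label whose average spend is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - spend))

centroids = characterize(customers)
print(centroids)                  # {'occasional': 135.0, 'frequent': 1000.0}
print(predict(200.0, centroids))  # occasional
print(predict(800.0, centroids))  # frequent
```

The same summary serves both purposes: read on its own it describes the existing data, and applied to a new record it makes a prediction.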
1.5 Are All of the Patterns Interesting?

• A data mining system has the potential to generate thousands or even millions of patterns, or rules.
• “Are all of the patterns interesting?”
Typically not: only a small fraction of the patterns potentially generated would actually be of interest to any given user.
• “What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?”
1.5 Are All of the Patterns Interesting?

• A pattern is interesting if it is:
(1) Easily understood by humans,
(2) Valid on new or test data with some degree of certainty,
(3) Potentially useful, and
(4) Novel.
• A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.
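
Criterion (2) can be made concrete with objective measures. The slides do not name specific measures, so the sketch below assumes two that are commonly used for association rules: support (how often the whole rule occurs) and confidence (how often the consequent holds when the antecedent does). The transactions, thresholds, and function names are hypothetical. A rule is kept only if it passes both thresholds on the training data and again on separate test data.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def is_valid(transactions, antecedent, consequent,
             min_support=0.4, min_confidence=0.6):
    """Check a rule against assumed minimum support and confidence thresholds."""
    support, confidence = support_and_confidence(transactions, antecedent, consequent)
    return support >= min_support and confidence >= min_confidence

# Hypothetical market-basket data, split into training and test sets.
train = [{"bread", "milk"}, {"bread", "milk", "butter"}, {"bread"}, {"milk"}]
test = [{"bread", "milk"}, {"bread", "butter"}, {"butter"}, {"bread", "milk", "eggs"}]

# Candidate rules as (antecedent, consequent) pairs.
rules = [({"bread"}, {"milk"}), ({"butter"}, {"milk"})]

# Keep only rules that hold on the training data and still hold on the test data.
kept = [
    (a, c) for a, c in rules
    if is_valid(train, a, c) and is_valid(test, a, c)
]
print(kept)
```

Here only the rule bread -> milk survives: butter -> milk is fully confident on the training data but occurs in just one of four transactions, so it fails the support threshold. Measures like these cover validity only; understandability, usefulness, and novelty still depend on the user.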
1.6 Classification of Data Mining Systems

Data mining systems can be categorized according to various criteria, as follows:

• Kinds of databases mined
Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW.
• Kinds of knowledge mined
Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
• Kinds of techniques utilized
Machine learning, statistics, visualization, etc.
• Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, web mining, etc.
1.6 Classification of Data Mining Systems

Data mining as a confluence of multiple disciplines.


Various Applications of Data Mining

Market analysis and management
Identifying target markets, tracking customers' purchasing patterns over time, cross-market analysis, customer profiling, identifying customer needs, assessing customer loyalty, summary information.
Corporate analysis and risk management
Financial planning and asset evaluation, resource planning, monitoring the competition.
Telecommunications
Used to examine millions of incoming transactions with the aim of adding automated services.
Finance
Data mining is used to detect suspicious financial transactions that are hard to find using standard analysis.
Insurance
The Australian Health Insurance Commission used data mining to identify health services and succeeded in saving one million dollars per year.
Sports
IBM Advanced Scout used data mining to analyze NBA game statistics in order to give the New York Knicks a competitive advantage.
Astronomy
The Jet Propulsion Laboratory (JPL) in Pasadena and Palomar Observatory discovered 22 quasars with the help of data mining.
End of File