Data Mining and Data Warehousing

This document provides an introduction to data mining. It defines key terms like data, information, knowledge, data mining, and knowledge discovery in databases. It describes common data mining techniques and tools. It also lists several applications of data mining in areas like marketing, healthcare, finance, and manufacturing. Finally, it discusses some popular data mining tools like RapidMiner and WEKA.

Uploaded by

aarthi dev
Data Mining and Data Warehousing
Unit I: Introduction to Data Mining

Handled by
Mrs. E. Aarthi, Department of Computer Science.
Introduction
Data mining has attracted a great deal of attention in the information
industry in recent years. The major reason is the wide availability of
huge amounts of data and the imminent need to turn such data into
useful information and knowledge.
Introduction
Data mining has numerous applications across various industries, including
marketing, healthcare, finance, and manufacturing.
In marketing, data mining is used to identify customer preferences and behaviors,
and to develop targeted marketing campaigns.
In healthcare, data mining is used to analyze patient data and develop
personalized treatment plans. In finance, data mining is used to detect fraudulent
transactions and assess credit risk.
Introduction
Commonly used techniques in data mining include clustering, classification,
regression, and association rule mining.

Data mining can be performed using a variety of tools and software packages,
including Python, R, SAS, and Tableau.
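
As a small illustration of one of the techniques named above, association rule mining, the sketch below computes the two standard measures of a rule: support and confidence. It is a minimal Python example, not a full mining algorithm; the baskets and the function names are made up for illustration.

```python
# Hypothetical shopping baskets; each set is one transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


if __name__ == "__main__":
    # Rule under test: customers who buy bread also buy milk.
    print("support:", support({"bread", "milk"}, BASKETS))
    print("confidence:", confidence({"bread"}, {"milk"}, BASKETS))
```

A rule such as "bread implies milk" is usually kept only when both measures exceed thresholds chosen by the analyst.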
Data Storage
Data storage became easier as large amounts of computing power became
available at low cost.

The falling cost of processing power and storage has made data cheap.

New machine learning methods for knowledge representation, based on logic
programming and similar approaches, were introduced in addition to the
traditional statistical analysis of data.
Data Storage

The sizes of data storage units are calculated as follows (binary convention, where each unit is 1,024 times the previous one):

• 1 byte = 8 bits
• 1 kilobyte (K/KB) = 2 ^ 10 bytes = 1,024 bytes
• 1 megabyte (M/MB) = 2 ^ 20 bytes = 1,048,576 bytes
• 1 gigabyte (G/GB) = 2 ^ 30 bytes = 1,073,741,824 bytes
• 1 terabyte (T/TB) = 2 ^ 40 bytes = 1,099,511,627,776 bytes
• 1 petabyte (P/PB) = 2 ^ 50 bytes = 1,125,899,906,842,624 bytes
• 1 exabyte (E/EB) = 2 ^ 60 bytes = 1,152,921,504,606,846,976 bytes
• 1 zettabyte (Z/ZB) = 2 ^ 70 bytes = 1,180,591,620,717,411,303,424 bytes (about 10 ^ 21 bytes)
• 1 yottabyte (Y/YB) = 2 ^ 80 bytes = 1,208,925,819,614,629,174,706,176 bytes (about 10 ^ 24 bytes)
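
The powers of two in this list can be reproduced with a few lines of Python. The sketch assumes the binary convention (each unit is 1,024 times the previous one) for every unit, including zettabyte and yottabyte; under the decimal SI convention those two are 10 ^ 21 and 10 ^ 24 bytes. The names `UNITS` and `bytes_in` are made up for illustration.

```python
# Unit names in increasing size; each is assumed to be 1,024 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit` under the binary convention."""
    exponent = 10 * (UNITS.index(unit) + 1)
    return 2 ** exponent


if __name__ == "__main__":
    for position, name in enumerate(UNITS, start=1):
        # The ":," format spec inserts thousands separators.
        print(f"1 {name} = 2 ^ {10 * position} bytes = {bytes_in(name):,} bytes")
```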
Information usages
Business operations
Decision Making
OLTP
OLAP
Data Mining – Data archeology, Data harvesting
Knowledge Discovery in Databases (KDD)
Data Mining Definitions

Data
Data are the raw facts, figures, numbers, or text that can be
processed by a computer.
Today, organizations are gathering massive and growing amounts of
data in different formats and different databases.
These data include operational or transactional data, which record
day-to-day operations (such as inventory data and on-line shopping data),
non-operational data, and metadata, i.e. data about data.
Information

The arrangements, relations, or associations among all
types of data can deliver information. For example, in
retail, analysis of sales transactions shows which
products are selling and when.
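
The retail example can be sketched in a few lines of Python: raw sales transactions (data) are summed per month and product to show which products sell when (information). The transactions, product names, and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical sales transactions: (month, product, quantity).
TRANSACTIONS = [
    ("2023-01", "umbrella", 2),
    ("2023-01", "sunscreen", 1),
    ("2023-01", "umbrella", 3),
    ("2023-07", "sunscreen", 6),
    ("2023-07", "umbrella", 1),
    ("2023-07", "sunscreen", 4),
]


def units_sold_by_month(transactions):
    """Sum the quantity sold for each (month, product) pair."""
    totals = Counter()
    for month, product, quantity in transactions:
        totals[(month, product)] += quantity
    return totals


def best_seller_per_month(transactions):
    """Return the product with the highest total quantity in each month."""
    best = {}
    for (month, product), qty in units_sold_by_month(transactions).items():
        if month not in best or qty > best[month][1]:
            best[month] = (product, qty)
    return {month: product for month, (product, _) in best.items()}


if __name__ == "__main__":
    print(best_seller_per_month(TRANSACTIONS))
```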
Knowledge
Information can be converted into knowledge.
Data accumulated in large data repositories become “data
tombs”.
Data tombs are converted into “golden nuggets” of
knowledge with the use of data mining tools.
Golden nuggets mean “small but valuable facts”.
For example, supermarket sales information can be analyzed
in light of marketing efforts to deliver knowledge of consumer
purchasing habits.
Data Mining definition
The extraction of interesting information or patterns from
data in large databases is known as data mining.
Data Mining Definitions
According to William J. Frawley, Gregory Piatetsky-Shapiro
and Christopher J. Matheus ‘Data Mining, or Knowledge
Discovery in Databases (KDD) is the nontrivial extraction of
implicit, previously unknown, and potentially useful
information from data. This encompasses a number of
different technical approaches, such as clustering, data
summarization, learning classification rules, finding
dependency networks, analyzing changes, and detecting
anomalies’
Data Mining Definitions
According to Marcel Holsheimer and Arno Siebes, “Data mining
is the search for relationships and global patterns that exist in large
databases but are ‘hidden’ among the vast amount of data, such as
a relationship between patient data and their medical diagnosis.
These relationships represent valuable knowledge about the
database and the objects in the database and, if the database is a
faithful mirror, of the real world registered by the database”.

Data Mining
• Data mining refers to “using a variety of techniques to identify
nuggets of information or decision-making knowledge in bodies of
data, and extracting these in such a way that they can be put to use in
the areas such as decision support, prediction, forecasting and
estimation.”
• The data is often voluminous, but as it stands it is of low value because
no direct use can be made of it; it is the hidden information in the data
that is useful.
• Many people treat data mining as a synonym for another popularly
used term, “Knowledge Discovery in Databases”, or KDD.
• Data mining is also called knowledge mining from data,
knowledge extraction, or data/pattern analysis.
KDD
Data mining is the process of discovering patterns,
trends, and insights from large datasets. It is an
interdisciplinary field that draws upon techniques from
statistics, machine learning, database management, and
other areas to extract useful information from data.
Data mining is an essential step in the process of knowledge
discovery in databases.
KDD
Knowledge discovery as a process is depicted in the following figure and consists of an iterative sequence of the following steps:

• Data cleaning: to remove noise or irrelevant data
• Data integration: where multiple data sources may be combined
• Data selection: where data relevant to the analysis task are retrieved from the database
• Data transformation: where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations
• Data mining: an essential process where intelligent methods are applied in order to extract data patterns
• Pattern evaluation: to identify the truly interesting patterns representing knowledge, based on some interestingness measures
• Knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user
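
The sequence of steps above can be sketched as a small pipeline. This is only an illustrative Python sketch, not a prescribed implementation: the store records, the function names, and the 50% support threshold are made-up assumptions, and the mining step is reduced to counting pairs of items bought by the same customer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources: (customer_id, item). None marks noise.
STORE_A = [(1, "bread"), (1, "milk"), (2, "bread"), (2, None), (3, "milk")]
STORE_B = [(2, "milk"), (3, "eggs"), (4, "bread"), (4, "milk"), (4, "pen")]


def clean(records):
    """Data cleaning: drop records with a missing item."""
    return [r for r in records if r[1] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, items_of_interest):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r[1] in items_of_interest]


def transform(records):
    """Data transformation: aggregate records into one basket (set) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: count / n_baskets for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{a} and {b} are bought together in {s:.0%} of baskets"
            for (a, b), s in sorted(patterns.items())]


def run_kdd():
    """Run the seven steps in the order listed above."""
    records = integrate(clean(STORE_A), clean(STORE_B))
    records = select(records, {"bread", "milk", "eggs"})
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
    return present(patterns)


if __name__ == "__main__":
    for line in run_kdd():
        print(line)
```

In practice the process is iterative: the result of pattern evaluation often sends the analyst back to select or transform the data differently.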
DATA MINING TOOLS
RAPID MINER
RapidMiner is an integrated enterprise artificial intelligence
framework that offers AI solutions to positively impact businesses.
It is used as a data science software platform for data extraction, data
mining, deep learning, machine learning, and predictive analytics.
It is widely used in many businesses and commercial applications as
well as in various other fields such as research, training, education,
rapid prototyping, and application development.
All major machine learning processes such as data preparation, model
validation, result visualization, and optimization can be carried
out by using RapidMiner.
RAPID MINER PRODUCTS
RAPID MINER STUDIO
RAPID MINER AUTO MODEL
RAPID MINER TURBO PREP
WEKA
Weka is open-source data mining software that provides a collection of machine
learning algorithms. The algorithms can either be applied directly to a dataset
or called from your own Java code.

Weka contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. It is also well-suited for
developing new machine learning schemes, so that you can develop machine
learning techniques and apply them to real-world data mining problems.
APPLICATIONS OF DATA MINING
Here is the list of areas where data mining is widely used:
• Financial Data Analysis
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection
• Social media app mining
