DMBI
… novel, potentially useful, and ultimately understandable patterns in data.
The goal is to distinguish between unprocessed data and something that may not be obvious but is valuable or …
The overall process of finding and interpreting patterns from data involves the repeated application of the following steps:
(New Syllabus w.e.f. academic year 21-22)(M6-103)
… representations for the data are found.
5. Data Mining
An essential process where intelligent methods are applied to extract data patterns.
Deciding which model and parameters may be appropriate.
6. Pattern Evaluation
To identify the truly interesting patterns representing knowledge based on interestingness measures.
7. Knowledge Presentation
Visualization and knowledge representation techniques are used to present mined knowledge to users.
Visualizations can be in the form of graphs, charts or tables.
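As a rough illustration of the Pattern Evaluation step, the sketch below scores candidate association rules with two common interestingness measures, support and confidence, and keeps only the rules that meet both thresholds. The choice of measures, the thresholds (0.5 and 0.6), the transactions and the function names are assumptions made for illustration; the notes above do not prescribe them.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A and C together) / support(A) for the rule A -> C."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def interesting_rules(transactions, rules, min_support, min_confidence):
    """Keep only rules whose support and confidence reach the thresholds."""
    kept = []
    for a, c in rules:
        s = support(transactions, a | c)
        conf = confidence(transactions, a, c)
        if s >= min_support and conf >= min_confidence:
            kept.append((sorted(a), sorted(c), s, round(conf, 2)))
    return kept

# Hypothetical market-basket transactions and candidate rules
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]
rules = [({"milk"}, {"bread"}), ({"bread"}, {"butter"}), ({"butter"}, {"milk"})]

for a, c, s, conf in interesting_rules(transactions, rules, 0.5, 0.6):
    print(a, "->", c, "support:", s, "confidence:", conf)
```

Here bread -> butter is discarded because only one of the four transactions contains both items (support 0.25), while the other two rules pass both thresholds.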
Tech-Neo Publications... SACHIN SHAH Venture
Bin 4: 45, 45, 71, 72, 73, 75
Smooth the data by bin means
We take the average of each bin and replace each value in the bin by the mean value of that bin.
• Bin 1: 13.83, 13.83, 13.83, 13.83, 13.83, 13.83
• Bin 2: 20.17, 20.17, 20.17, 20.17, 20.17, 20.17
• Bin 3: 30.67, 30.67, 30.67, 30.67, 30.67, 30.67
• Bin 4: 63.50, 63.50, 63.50, 63.50, 63.50, 63.50
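Smoothing by bin means can be sketched in a few lines. This is a minimal illustration, assuming the bins have already been formed; the function name is hypothetical, and only Bin 4 is used because it is the only bin whose raw values appear above.

```python
from statistics import mean

def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's mean (rounded to 2 decimals)."""
    return [[round(mean(b), 2)] * len(b) for b in bins]

# Bin 4 from the example above; the raw values of Bins 1-3 are not shown in these notes.
bin4 = [45, 45, 71, 72, 73, 75]
print(smooth_by_bin_means([bin4]))  # [[63.5, 63.5, 63.5, 63.5, 63.5, 63.5]]
```

The mean 381 / 6 = 63.5 matches the smoothed Bin 4 listed above.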
Smooth the data by bin medians
We replace each value in a bin by the median of that bin. Each bin contains 6 data values, so the median is the average of the two middle values in the bin.
Bin 1: 14, 14, 14, 14, 14, 14
Bin 2: 20, 20, 20, 20, 20, 20
Bin 3: 27, 27, 27, 27, 27, 27
Bin 4: 71.50, 71.50, 71.50, 71.50, 71.50, 71.50
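The same idea with medians is sketched below, again under the assumption that the bins are already formed and using the hypothetical function name `smooth_by_bin_medians`. Python's `statistics.median` sorts the values and, for an even count such as 6, averages the two middle values, which is exactly the rule described above.

```python
from statistics import median

def smooth_by_bin_medians(bins):
    """Replace every value in each bin with that bin's median."""
    # statistics.median averages the two middle values when a bin has an even count.
    return [[median(b)] * len(b) for b in bins]

bin4 = [45, 45, 71, 72, 73, 75]  # Bin 4 from the example above
print(smooth_by_bin_medians([bin4]))  # [[71.5, 71.5, 71.5, 71.5, 71.5, 71.5]]
```

The two middle values of Bin 4 are 71 and 72, giving (71 + 72) / 2 = 71.5.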
[Fig. 2.6.1 : Outlier Analysis by Clustering]
Ex. 2.6.3 : Develop a model to … salary of college … regression methods.
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, lacking in certain behaviors or trends, and is likely to contain many errors.
Data preprocessing is a proven method of resolving such issues: it prepares raw data for further processing.
Data preprocessing is used in database-driven applications such as customer relationship management and rule-based applications (like neural networks).
In Machine Learning (ML) processes, data preprocessing is critical to encode the dataset in a form that could be interpreted and parsed by the algorithm.
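One common way to encode a categorical attribute so that an ML algorithm can parse it is one-hot encoding, sketched below. The notes do not name a specific encoding, so the technique, the function name and the sample colour values are illustrative assumptions.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1 in its category's slot."""
    categories = sorted(set(values))            # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors

# Hypothetical categorical attribute
categories, vectors = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(vectors)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```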
Data goes through a series of steps during preprocessing:
(1) Data Cleaning
(2) Data Integration
(3) Data Transformation
(4) Data Reduction
(5) Data Discretization
(6) Data Sampling
(1) Data Cleaning : Data is cleansed through processes such as filling in missing values or deleting rows with missing data, smoothing the noisy data, or resolving the inconsistencies in the data. Smoothing …
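Two of the cleaning options mentioned above, deleting rows with missing data and filling in missing values, can be sketched with the standard library. Filling with the attribute mean is only one possible fill strategy and is an assumption here; the records and function names are hypothetical.

```python
from statistics import mean

# Hypothetical records; None marks a missing value
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 35, "income": None},
    {"age": 45, "income": 61000},
]

def drop_incomplete(rows):
    """Delete every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_with_mean(rows, attr):
    """Return new rows in which missing values of attr are replaced by the mean of the known values."""
    fill = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

print(len(drop_incomplete(rows)))        # 2
print(fill_with_mean(rows, "age")[1])    # {'age': 35, 'income': 52000}
```

Deleting rows is simpler but discards the other attribute values in those rows; filling keeps every row at the cost of introducing an estimated value.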
… place, and all the dependencies are logical.
(4) Data Reduction : When the volume of data is huge, databases can become slower, costly to access, and challenging to properly store. Data reduction aims to present a reduced representation of the data in a data warehouse.
There are various methods to reduce data. For example, once a subset of relevant attributes is chosen for its significance, anything below a given level is discarded. Encoding mechanisms can be used to reduce the size of data as well. If all original data can be recovered after compression, the operation is labelled as lossless. If some data is lost, then it is called a lossy reduction.
Aggregation can also be used, for example, to condense countless transactions into a single weekly or monthly value, significantly reducing the number of data objects.
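The aggregation example above (many transactions condensed into one monthly value) can be sketched as follows. The transaction dates and amounts are hypothetical, and summing is an assumed choice of aggregate; averages or counts would work the same way.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily transactions: (date, amount)
transactions = [
    (date(2021, 7, 3), 120.0),
    (date(2021, 7, 15), 80.0),
    (date(2021, 8, 2), 200.0),
    (date(2021, 8, 20), 50.0),
    (date(2021, 8, 28), 30.0),
]

def aggregate_monthly(transactions):
    """Condense individual transactions into one total per (year, month)."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))

print(aggregate_monthly(transactions))  # {(2021, 7): 200.0, (2021, 8): 280.0}
```

Five data objects are reduced to two. This is a lossy reduction, since the individual transactions cannot be recovered from the monthly totals.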
(5) Data Discretization : Data could also be discretized to replace raw values with interval levels. This step involves the reduction of a number of values of a continuous attribute by dividing the range of the attribute into intervals.
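A simple way to divide the range of a continuous attribute into intervals is equal-width discretization, sketched below. The notes do not say which partitioning rule is used, so equal width, the number of intervals k = 3, the sample values and the function name are all assumptions.

```python
def equal_width_discretize(values, k):
    """Split [min, max] into k equal-width intervals and return each value's interval index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        i = int((v - lo) / width) if width else 0
        labels.append(min(i, k - 1))  # the maximum value belongs to the last interval
    edges = [lo + i * width for i in range(k + 1)]
    return labels, edges

# Hypothetical continuous attribute
values = [13, 15, 22, 25, 33, 40, 45, 52, 70]
labels, edges = equal_width_discretize(values, 3)
print(edges)   # [13.0, 32.0, 51.0, 70.0]
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 2, 2]
```

Nine distinct raw values are replaced by three interval levels.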
(6) Data Sampling : Sometimes, due to storage or memory constraints, a dataset is too complex to be worked with. Sampling can be used to select and work with just a subset of the dataset, provided that it has approximately the same properties as the original one.
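Simple random sampling without replacement is one basic sampling technique, sketched below with Python's `random.Random.sample`. The population, the sample size and the fixed seed are illustrative assumptions; comparing the sample mean with the population mean is one rough check that the subset keeps approximately the same properties.

```python
import random
from statistics import mean

def simple_random_sample(data, n, seed=None):
    """Draw n distinct items uniformly at random (without replacement)."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(data, n)

population = list(range(1, 10001))  # hypothetical dataset of 10,000 values
sample = simple_random_sample(population, 500, seed=42)
print(len(sample), mean(population), round(mean(sample), 1))
```

Working with 500 values instead of 10,000 cuts memory and processing time, while the sample mean should land close to the population mean of 5000.5.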
… dimensional table itself in a star schema.
Star schema | Snowflake schema
When the dimensional table contains a small number of rows, we can go for star schema. | When the dimensional table contains … number of rows with … information and space is …, we can choose snowflake schema to … store space.
Works best in any data warehouse/data mart. | Better for small data warehouse/data mart.
6.4 Factless Fact Table
In a data warehouse, a factless fact table is a fact table that does not have any measures stored in it. This table will only contain keys from different dimension tables. The fac…
There are two types of factless fact tables:
Event capturing factless fact table
Coverage table: describing conditions
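An event capturing factless fact table can be pictured as rows made only of dimension keys. The sketch below uses student attendance as a hypothetical event; the key names, key values and the `attendance_per_course` helper are assumptions for illustration. Because there is no measure column to sum, analysis works by counting fact rows.

```python
from collections import Counter
from typing import NamedTuple

class AttendanceFact(NamedTuple):
    """One attendance event: only foreign keys to dimension tables, no measures."""
    date_key: int     # key into a hypothetical date dimension
    student_key: int  # key into a hypothetical student dimension
    course_key: int   # key into a hypothetical course dimension

attendance = [
    AttendanceFact(20210901, 1, 101),
    AttendanceFact(20210901, 2, 101),
    AttendanceFact(20210901, 1, 102),
    AttendanceFact(20210902, 3, 101),
]

def attendance_per_course(facts):
    """Count events per course by counting fact rows."""
    return Counter(f.course_key for f in facts)

print(attendance_per_course(attendance))  # Counter({101: 3, 102: 1})
```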