Pre Processing
By E. Sivasankar, NITT
Binning
Binning methods smooth a sorted data value by consulting its
neighborhood, that is, the values around it. The sorted values
are distributed into a number of buckets, or bins.
In smoothing by bin means, each value in a bin is replaced by
the mean value of that bin.
In smoothing by bin boundaries, the minimum and maximum values
in a given bin are identified as the bin boundaries. Each bin
value is then replaced by the closest boundary value.
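The two smoothing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the function names, the equal-frequency partitioning, the tie rule (a value equidistant from both boundaries goes to the lower one), and the sample values are all assumptions made for the example.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size.

    Assumes 1 <= n_bins <= len(values), so that no bin is empty.
    """
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin by the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value in a bin by the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the min and max
        # Assumed tie rule: equidistant values go to the lower boundary.
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Illustrative values only; they do not come from the slides.
sample_values = [15, 4, 21, 8, 24, 21, 34, 25, 28]
sample_bins = equal_frequency_bins(sample_values, 3)
by_means = smooth_by_means(sample_bins)
by_boundaries = smooth_by_boundaries(sample_bins)
```

With these sample values the bins are [4, 8, 15], [21, 21, 24] and [25, 28, 34]. Smoothing by means turns the first bin into three copies of 9.0, while smoothing by boundaries turns it into [4, 4, 15].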
[Figure: the line y = x + 1 plotted against x, with X1 marked on the x-axis and Y1, Y1' marked on the y-axis]
3. Data compression
Here encoding mechanisms are used to reduce
the size of the data set.
4. Discretization and concept hierarchy generation
Here raw data values for attributes are replaced
by ranges or by higher conceptual levels.
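As a rough sketch of item 4, the snippet below replaces a raw numeric value first by a range and then by a higher-level concept. The attribute (age), the range width of 10, the cut points 30 and 60, and the labels are hypothetical choices made for illustration; the slides do not specify them.

```python
def to_range(age, width=10):
    """Discretization: replace a raw age by the range it falls in."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_concept(age):
    """Concept hierarchy: replace a raw age by a higher-level label.

    The cut points 30 and 60 are assumptions for this example only.
    """
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


# Illustrative ages only; they do not come from the slides.
ages = [23, 35, 47, 61, 29]
age_ranges = [to_range(a) for a in ages]
age_concepts = [to_concept(a) for a in ages]
```

Five distinct raw values collapse to four ranges and then to three concepts, which is the reduction the slide describes.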
Data Compression
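[Figure: Original Data is encoded into Compressed Data. Lossless compression recovers the Original Data exactly; lossy compression recovers only the Original Data Approximated.]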
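The lossless case can be checked with a short round trip. This sketch uses Python's standard zlib module as one possible encoder; the slides do not name a specific compression scheme, and the repetitive sample data is invented for the example.

```python
import zlib

# Illustrative data only: a highly repetitive byte string compresses well.
raw = b"low,low,low,high,high,low,low,low," * 50

compressed = zlib.compress(raw)         # encode to a smaller representation
restored = zlib.decompress(compressed)  # decode back to the original bytes

# Lossless means the round trip reproduces the input exactly.
is_lossless = restored == raw
size_ratio = len(compressed) / len(raw)
```

A lossy scheme would instead accept `restored != raw` in exchange for a smaller `size_ratio`, returning only an approximation of the original data.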
[Figure: axes X1 and X2, with axes Y1 and Y2]
[Figure: two samples drawn from the Raw Data: SRSWOR (simple random sample without replacement) and SRSWR (simple random sample with replacement)]
February 14, 2025
Sampling
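The two schemes named in the sampling figure, SRSWOR and SRSWR, differ only in whether a drawn item can be drawn again. A minimal sketch using Python's standard random module follows; the function names, the seed parameter, and the sample records are assumptions for illustration.

```python
import random


def srswor(data, n, seed=None):
    """Simple random sample WITHOUT replacement: no position is picked twice.

    Requires n <= len(data); random.sample raises ValueError otherwise.
    """
    rng = random.Random(seed)
    return rng.sample(data, n)


def srswr(data, n, seed=None):
    """Simple random sample WITH replacement: the same item may recur."""
    rng = random.Random(seed)
    return rng.choices(data, k=n)


# Illustrative raw data only: eight hypothetical record identifiers.
raw_data = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]
without_replacement = srswor(raw_data, 4, seed=0)
with_replacement = srswr(raw_data, 4, seed=0)
```

Because SRSWR puts each item back after drawing it, it can return more items than the raw data contains (for example `srswr(raw_data, 20)`), whereas SRSWOR cannot.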
country: 15 distinct values