Data Normalization

Data normalization is a process that scales attribute values into a smaller range to make data easier to analyze and understand. It is necessary when attributes have values on different scales, which can dilute the effectiveness of important attributes or lead to poor data models. There are several normalization methods, including decimal scaling, min-max normalization, and z-score normalization; the last of these rescales values based on the mean and standard deviation. Normalization transforms data to fall within a common range and can improve the performance of machine learning algorithms.

Uploaded by

Ruchira Saha
Copyright
© All Rights Reserved
Data Normalization
Data normalization makes data easier to classify and understand. It is used to
scale the values of an attribute so that they fall within a smaller range.

Why Is Normalization Needed?
• Normalization is generally required when a data set has multiple attributes whose
values lie on different scales; left as-is, this may lead to poor data models when
performing data mining operations.
• Without normalization, an equally important attribute on a smaller scale can lose
effectiveness because another attribute has values on a larger scale.
• Heterogeneous data with different units usually needs to be normalized. If the data
already share the same unit and the same order of magnitude, normalization might not
be necessary.
• Unless normalized at pre-processing, variables with disparate ranges or varying
precision acquire different driving values.
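
As a minimal sketch of the dilution effect described in the points above, the following Python snippet compares Euclidean distances before and after rescaling. The attribute names (age, income), the sample records, and the attribute ranges are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math

# Hypothetical records: (age in years, income in currency units).
a = (25, 50_000)
b = (60, 51_000)   # very different age, similar income
c = (26, 58_000)   # similar age, different income

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Assumed (min, max) range of each attribute, used only for this illustration.
RANGES = [(18, 70), (20_000, 100_000)]

def rescale(p):
    """Map each attribute to [0, 1] using its assumed (min, max) range."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(p, RANGES))

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
norm_ab = euclidean(rescale(a), rescale(b))
norm_ac = euclidean(rescale(a), rescale(c))

# On the raw data, income dominates: b looks closer to a than c does,
# even though b differs from a by 35 years of age.
print(raw_ab < raw_ac)    # True
# After rescaling, the age gap counts as well, and c becomes the closer record.
print(norm_ac < norm_ab)  # True
```

The larger-scale attribute (income) decides the raw comparison almost on its own; once both attributes share a common range, the age difference contributes in proportion.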
2. Data Transformation: Data Normalization contd..
Example: [Figures: chart for raw data; chart for normalized data]

Methods of Data Normalization:
a. Decimal Scaling
b. Min-Max Normalization
c. z-Score Normalization (zero-mean Normalization)

There are also several normalization approaches that can be used in deep learning models:

- Batch Normalization
- Layer Normalization
- Group Normalization
- Instance Normalization
- Weight Normalization
a. Decimal Scaling Normalization
- It normalizes by moving the decimal point of the data values.
- To normalize the data by this technique, we divide each data value by a power
of 10 chosen from the maximum absolute value of the data set.
- The data value v_i is normalized to v'_i by using the formula

$$ v'_i = \frac{v_i}{10^j} $$

[where j is the smallest integer such that max(|v'_i|) < 1.]

In this technique, the values are scaled in terms of decimals: each result is obtained by
multiplying or dividing the original value by a power of 10, i.e. pow(10, j).

Example:
- Input data: 15, 121, 201, 421, 561, 601, 850
- Step 1: The maximum absolute value in the given data (m) is 850, so the smallest
power of 10 greater than it is 1000.
- Step 2: Divide each value by 1000 (i.e. j = 3), giving 0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85.
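
The two steps above can be sketched in Python as follows. This is an illustrative sketch, not code from the slides: the function name `decimal_scale` is hypothetical, j is assumed to be non-negative, and the leading dash in the example input is read as punctuation, so the first value is taken to be 15.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest non-negative
    integer such that max(|v| / 10**j) < 1. Returns (scaled_values, j)."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Shift the decimal point one place at a time until the largest
    # absolute value drops below 1.
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

data = [15, 121, 201, 421, 561, 601, 850]
scaled, j = decimal_scale(data)
print(j)       # 3
print(scaled)  # [0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85]
```

Because only the maximum absolute value matters when choosing j, negative inputs are handled the same way and keep their sign after scaling.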
b. Min-Max Normalization (Linear Transformation)
- The minimum and maximum values of the data are fetched, and each value is
replaced according to the following formula.

$$ v' = \frac{v - \min(A)}{\max(A) - \min(A)} \left( \text{new\_max}(A) - \text{new\_min}(A) \right) + \text{new\_min}(A) $$

Where - A is the attribute data (column)
- v and v' are the old and new values of each entry in the data
- min(A), max(A) are the minimum and maximum of A
- new_max(A), new_min(A) are the maximum and minimum values of the
required range (i.e. the boundary values), respectively.

Example
Input:- 10, 15, 50, 60
Normalized to range 0 to 1.
Here min=10, max= 60, new_min=0, new_max=1
Output:- 0, 0.1, 0.8, 1
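
A minimal Python sketch of the min-max formula, applied to the example above. The function name `min_max_normalize` and its default arguments are assumptions made for illustration; the guard for a constant column is also an addition, since the formula divides by max(A) - min(A).

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula divides by (max - min), which is zero for a constant column.
        raise ValueError("all values are equal; min-max normalization is undefined")
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]

print(min_max_normalize([10, 15, 50, 60]))  # [0.0, 0.1, 0.8, 1.0]
```

Passing a different target range, for example `new_min=-1.0, new_max=1.0`, changes only the final stretch-and-shift step; the relative spacing of the values stays the same.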
c. z-Score Normalization (zero-mean Normalization)
- Values are normalized based on the mean and standard deviation of the data A.
- It is also called the Standard Deviation method.
- Unstructured data can be normalized using the z-score parameter,

$$ v' = \frac{v - \bar{A}}{S} $$

where - \bar{A} is the mean of A
- S is the standard deviation of A
- v and v' are the old and new values of each data entry

Input:- 10, 15, 50, 60

$$ \text{mean} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 33.75 $$

$$ SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \approx 24.96 $$

Output:- -0.9515, -0.7512, 0.6510, 1.0517
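
The z-score computation for this input can be sketched in Python with the standard library's `statistics` module. The function name `z_score_normalize` is hypothetical, and the sketch assumes the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
import statistics

def z_score_normalize(values):
    """Return (v - mean) / sd for each value, using the sample standard
    deviation (divisor n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # requires at least two values
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / sd for v in values]

data = [10, 15, 50, 60]
print(statistics.mean(data))  # 33.75
z = z_score_normalize(data)
print(z)
```

The resulting z-scores agree with the output listed above to about three decimal places; small differences in the last digit come from rounding versus truncation. Using the population standard deviation (`statistics.pstdev`, divisor n) instead would give slightly larger magnitudes.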
