ML Assignment
Lossy compression:
K-means clustering, when used for compression, falls under the lossy compression category.
Lossy compression reduces file size by discarding the data whose loss is least
noticeable. To make a photo smaller, it discards the parts of the photo that matter
least, so the compressed file cannot be restored to its exact original form. With this
type of compression, data quality is traded for a smaller size. Lossy compression is
used mainly for image, audio, and video compression.
We will use the K-means clustering technique for image compression, which is a
vector quantization method of compression. Using K-means clustering, we will quantize
the colors present in the image, which reduces the amount of data needed to store the
image.
The input image is 750*1000 pixels. Each pixel has 3 values representing its RGB
intensities, and each intensity ranges from 0 to 255. Since an intensity takes one of
256 values (2**8), it needs 8 bits, so the storage required for each pixel is 3*8 bits.
The initial size of the image is therefore (750*1000*3*8) bits.
The total number of possible colors is 256*256*256 = 16,777,216. Since the human eye
cannot distinguish that many colors at once, the idea is to group similar colors
together and use fewer colors to represent the image.
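As a quick check of the arithmetic above, the following Python sketch computes the
uncompressed size and the number of representable colors. The variable names are
illustrative and are not taken from the assignment code.

```python
# Uncompressed size of a 750*1000 RGB image with 8 bits per channel.
rows, cols = 750, 1000
channels = 3            # R, G, B
bits_per_channel = 8    # 256 = 2**8 intensity levels

original_bits = rows * cols * channels * bits_per_channel
original_bytes = original_bits // 8
total_colors = (2 ** bits_per_channel) ** channels

print(original_bits)   # 18000000
print(original_bytes)  # 2250000
print(total_colors)    # 16777216
```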
We will use K-means clustering to find k colors, each representative of a group of
similar colors. These k colors are the centroids found by the algorithm. We will then
replace each pixel value with the centroid of its cluster. The resulting image contains
at most k distinct colors, far fewer than the total number of possible colors. We will
try different values of k and observe the output image.
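The replacement step can be illustrated on its own with a small NumPy sketch. It
assumes the centroids are already known: the two-color palette below is hypothetical
and hand-picked, not the output of K-means, and the function name quantize_to_palette
is introduced only for this example.

```python
import numpy as np


def quantize_to_palette(pixels, palette):
    """Map each RGB pixel to its nearest palette color (squared Euclidean distance)."""
    pixels = np.asarray(pixels, dtype=np.float64)    # shape (n_pixels, 3)
    palette = np.asarray(palette, dtype=np.float64)  # shape (k, 3)
    # Distance from every pixel to every palette color: shape (n_pixels, k).
    distances = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    labels = distances.argmin(axis=1)                # index of the nearest color
    return labels, palette[labels]


# Four pixels: two reddish, two bluish (made-up values).
pixels = np.array([[250, 5, 5], [10, 10, 240], [200, 30, 30], [0, 0, 255]])
# Hypothetical palette standing in for k = 2 centroids.
palette = np.array([[255, 0, 0], [0, 0, 255]])

labels, quantized = quantize_to_palette(pixels, palette)
print(labels.tolist())  # [0, 1, 0, 1]
```

Only the labels and the palette need to be stored, which is where the saving comes from.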
If k=64, the final size of the output image is (750*1000*6 + 64*3*8) bits: each pixel
stores a 6-bit index into the table of centroids (64 = 2**6), and the table itself
holds 64 colors of 3*8 bits each.
If k=128, the final size of the output image is (750*1000*7 + 128*3*8) bits, because
each pixel stores a 7-bit index (128 = 2**7).
Hence the final image is considerably smaller than the original: about 4.5 million
bits for k=64 and about 5.25 million bits for k=128, against 18 million bits for the
original image.
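The size estimate can be written as a small Python function. This is a sketch of the
idealized calculation only: it assumes k is at least 2 and that the pixel indices are
packed with no overhead, so the size of an actual image file on disk will differ. The
function name compressed_bits is introduced here for illustration.

```python
def compressed_bits(rows, cols, k, channels=3, bits_per_channel=8):
    """Idealized size in bits: one index per pixel plus the table of k colors."""
    index_bits = (k - 1).bit_length()           # bits needed to index k colors (k >= 2)
    table_bits = k * channels * bits_per_channel
    return rows * cols * index_bits + table_bits


original = 750 * 1000 * 3 * 8                   # 18,000,000 bits
for k in (64, 128):
    size = compressed_bits(750, 1000, k)
    print(k, size, round(original / size, 2))
# 64 4501536 4.0
# 128 5253072 3.43
```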
ALGORITHM CODE:
from skimage import io
from sklearn.cluster import KMeans
import numpy as np
Code Explanation:
Image Input (lines 6–8): Load the image from disk.
Reshape Input Image (line 15): The input image has shape (rows, cols, 3). Flatten
the pixels into a single dimension of length rows*cols, keeping the 3 RGB values
of each pixel, so that the flattened image has shape (rows*cols, 3).
Clustering (lines 18–19): Run the K-means clustering algorithm to find k centroids,
each representing the colors that surround it.
Replace each pixel with its centroid (lines 22–23): Each of the rows*cols pixels
is now represented by the centroid of its cluster. Replace the value of each pixel
with that centroid.
Reshape Compressed Image (line 26): Reshape the compressed image from
(rows*cols, 3) back to the original (rows, cols, 3).
Output Compressed Image (lines 29–31): Display the output image and save it to
disk.
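The steps above can be sketched as follows. This is a minimal reconstruction under
stated assumptions, not the original listing: the function names compress_image and
compress_file, the seed parameter, and the choice of n_init=10 are all introduced here,
and the input is assumed to be an RGB (or RGBA) image with 8 bits per channel.

```python
import numpy as np
from sklearn.cluster import KMeans


def compress_image(image, k, seed=0):
    """Return a copy of `image` that uses at most k distinct colors."""
    rows, cols, channels = image.shape
    # Reshape: flatten to (rows*cols, 3) so that each row is one pixel.
    pixels = image.reshape(rows * cols, channels).astype(np.float64)
    # Clustering: find k centroid colors.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    # Replace each pixel with the centroid of its cluster.
    quantized = kmeans.cluster_centers_[kmeans.labels_]
    # Round back to 8-bit values and restore the (rows, cols, 3) shape.
    quantized = np.clip(np.rint(quantized), 0, 255).astype(np.uint8)
    return quantized.reshape(rows, cols, channels)


def compress_file(in_path, out_path, k):
    """Hypothetical helper: load an image, compress it, and save the result."""
    from skimage import io  # imported here so compress_image has no I/O dependency
    image = io.imread(in_path)[..., :3]  # drop an alpha channel if present
    io.imsave(out_path, compress_image(image, k))


# Example call, using the file naming pattern from this report:
# compress_file("original_image.png", "compressed_image_64.png", 64)
```

Keeping the clustering in compress_image, separate from the disk I/O, makes it easy to
try several values of k on the same loaded image.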
Results and Conclusion:
The dimensions of every compressed image (compressed_image_k.png) are the
same as those of the input image (original_image.png).
The size of the compressed image decreases as k decreases.
For k=32, 64, 128, 256, the compressed images look reasonably good: they lose
colors, but the loss is not visible to the human eye. For k=32, the compressed
image is almost 3 times smaller than the original image.
For k=16 and k=8, the compressed images lose a lot of colors, and the effect of
the lossy compression is visible to the human eye.
For k=4, the compressed image loses almost all of its colors, and some content
of the image is lost as well.
Thus, we can easily compress an image with the K-means clustering algorithm by choosing
the number of clusters. The number of colors present in the compressed image depends
on the number of clusters chosen.
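One way to check the link between k and the number of colors is to count the distinct
colors in an image. The sketch below uses a tiny hand-made array rather than the
images from this report, and count_colors is a name introduced for this example.

```python
import numpy as np


def count_colors(image):
    """Count the distinct colors in an image of shape (rows, cols, channels)."""
    pixels = image.reshape(-1, image.shape[-1])  # one row per pixel
    return len(np.unique(pixels, axis=0))        # unique rows = unique colors


# A 2*2 image with two red pixels, one blue pixel, and one green pixel.
image = np.array(
    [[[255, 0, 0], [255, 0, 0]],
     [[0, 0, 255], [0, 255, 0]]],
    dtype=np.uint8,
)
print(count_colors(image))  # 3
```

Applied to a compressed image, the count should not exceed the k used to produce it.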