Color Quantization for Image Reduction using K-Means Clustering Algorithm

Keshav Tangri
Analytics Vidhya
Published in
4 min readJul 5, 2020

--

Introduction :

There’s a saying that ‘A picture is worth a thousand words’[1]. In today’s era of technology , we are all surrounded by images. From identifying people from images in official records, to uploading pictures on Instagram, Changing profile picture on whatsapp, facebook, using pictures to understand space galaxies, storing memories etc. , images got us covered from all sides.

The Images actually use a big amount of disk space photos from an event can sum upto a couple of GigaBytes. That’s where Image Compression comes in play. Image compression basically means reducing the disk size of the image without much of the effect on its quality. One such Image compression approach is Color Quantization.

Color Quantization :

Color Quantization is a process in which the number of distinct colours in an image is reduced , so as to make the resulting image similar to the original image but with a reduced size [2]. One such approach to reduce the colors of the image is via K-Means Clustering method. The algorithm is explained below

K-Means Clustering

This algorithm comes in play for unsupervised data (unlabelled data), which means that given some set of points , we would like to find the structure in the data. The algorithm that helps to form the clusters of similar data points is known as clustering algorithm. K-Means Clustering is a technique in which the data points are segregated into K different clusters. Each cluster is uniquely identified by the centroid of the cluster. This algorithm groups similar colour values into K clusters and each pixel value (in the final output image)is replaced by the value of the centroid of the cluster to which it belongs.

K Means cluster the unstructured data(left) into 2 clusters (right) (Image Courtesy : Andre Ng ML Course[3])

After randomly initializing the K centroids, the iterative algorithm of K-Means have mainly 2 steps:

  1. Cluster Assignment : In this we allot the centroid index to the pixel values such that the euclidean distance[2] between the pixel and the centroid is minimum. The distance between 2 points (a1,b1,c1) and (a2,b2,c2) is given by : sqrt((a2-a1)² + (b2-b1)² + (c2-c1)² ) [
  2. Moving Centroids : In this, we calculate the new centroids, by taking the average of all the pixel values that is alloted to a given centroid.

How the size is reduced ?

A coloured image has 3 different channels , i.e Red(R) , Green(G), Blue(B) or RGB and each pixel uses 24 bits (8 bits (unsigned integers fro 0 to 255) each for R,G & B ) to represent a color. In this color quantization technique , say we reduce the number of colours to 32 and thus the size of image can be reduced as we can store only these 32 selected RGB values and thus for each pixel we can store the index of these colour which will take only 5 (2⁵ = 32)bits instead 24 bits. After this, we will replace the pixel values with the 32 selected colors [3].

In depth explanation of size reduction is beautifully presented by Satyam Kumar in his story[4].

Results and Conclusion

Now we will see the outcome of running the K-Means Algorithm. See the below Images when compressed using K-Means with number of colours ranging as 2,4,8,16,32 and 64. The Implementation for Color Quantization using K-Means is given here. I will discuss the implementation in the next story where how to implement it from scratch in python will be discussed and in depth explanation will be discussed. If you like the story , do check out my LinkedIn.

Original Image (Image Courtesy : apprentice_fotografo)
Number of Clusters (K) = 2
Number of Clusters (K) = 4
Number of Clusters (K) = 8
Number of Clusters (K) = 16
Number of Clusters (K) = 32
Number of Clusters (K) = 64

References

[1] https://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_words

[2] https://en.wikipedia.org/wiki/Color_quantization

[3] https://coursera.org/share/f5010265b57b1dd841eb6dde1c554e4b

[4]https://towardsdatascience.com/image-compression-using-k-means-clustering-aa0c91bb0eeb

--

--

Keshav Tangri
Analytics Vidhya

Deep Learning Enthusiast, Full Stack Native Android App Dev + Web Developer, Software Developer - Python