K-means clustering is one of the algorithms that solve the well-known clustering problem. It follows a simple procedure for grouping data into clusters: the basic approach is to define K centroids, one for each cluster. The algorithm has been adapted to many problem domains and is used on unlabelled or uncategorized data.
The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. Classification belongs to supervised learning, whereas clustering belongs to unsupervised learning, where we have no information about a dependent variable.
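In the algorithm, we try to group similar data points together while keeping the groups as far from each other as possible. It is an iterative process: it keeps reassigning points to the nearest centroid and recomputing the centroids until the assignments stop changing. The result is a local optimum, which is not guaranteed to be the best possible solution.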
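The iterative process described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name `kmeans`, its parameters, the choice of random data points as starting centroids, and the sample data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Start from k data points chosen at random (an assumed init strategy).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Made-up sample data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # cluster index assigned to each point
print(centroids)  # final centroid of each cluster
```

On data this well separated, the first three points should end up sharing one label and the last three the other. Which group is numbered 0 depends on the random starting centroids.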
When the clusters are globular, it tends to produce tighter clusters than hierarchical clustering. When the number of variables is large, it is usually computationally faster than hierarchical clustering, provided K is kept small. It is a versatile algorithm that can be used for many kinds of grouping, but it has certain drawbacks: a different initial partition can produce different final clusters, and it works poorly with clusters of different sizes and densities.
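The sensitivity to the initial partition is easy to reproduce on a tiny example. The sketch below runs the same procedure twice on the four corners of a wide rectangle, changing only the starting centroids. The helper names `lloyd` and `sse`, the data, and the starting positions are assumptions chosen for illustration.

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Run K-means from the given starting centroids; return (centroids, labels)."""
    centroids = list(centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments are stable
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def sse(points, centroids, labels):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Four corners of a rectangle that is 4 wide and 1 tall.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one centroid near each short side.
cents_a, labels_a = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])
# Start B: one centroid on each long side.
cents_b, labels_b = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse(points, cents_a, labels_a))  # [0, 0, 1, 1] 1.0
print(labels_b, sse(points, cents_b, labels_b))  # [0, 1, 0, 1] 16.0
```

Both runs converge, since no point changes cluster in either case, but start B settles on a much looser grouping. A common way to reduce this risk is to run the algorithm from several different starting points and keep the result with the lowest within-cluster sum of squares.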
A commonly cited application is in search engines such as Google and Yahoo, where web pages are clustered by the similarity of their content so that results can be retrieved with less computation. Implementations are readily available in Python (for example, the KMeans class in scikit-learn) and in R (the built-in kmeans function).
K-means clustering: how it works