Cluster Analysis
Usage:
Identify groups of similar items or patterns within a dataset. Clustering is a core unsupervised machine learning technique, since it requires no labelled data.
Goal:
Partition a set of data points into meaningful and homogeneous subgroups (clusters).
Objects within the same cluster are more similar to each other than to those in other clusters.
Objectives:
Grouping similar data points – data points are grouped based on their similarity or proximity in the feature space.
Discovering inherent patterns – clustering helps reveal inherent structure or patterns in the data that might not be immediately apparent.
Reducing complexity – clustering can simplify the analysis by grouping similar objects together, making it easier to understand and interpret complex datasets.
Methods/Algorithms:
K-means
It divides the dataset into K clusters, with each cluster represented by its centroid.
The sum of the squared distances between the objects and their assigned cluster mean is minimised.
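The optimisation alternates between two steps: assigning each data point to its closest centroid, and recomputing each centroid as the mean of the points assigned to it. Neither step can increase the sum of squared distances, and the two steps are repeated until the assignments stop changing (convergence). The result is a local optimum that depends on the initial centroids, so the algorithm is commonly run several times from different starting points and the best solution is kept.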
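The assignment and update steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function name kmeans, its parameters and the six sample points are all hypothetical, and the initial centroids are simply K data points drawn at random.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumption: initialise centroids with k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances to the assigned centroid (the quantity being minimised).
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical toy data: two small, well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels, sse = kmeans(points, k=2)
print(labels, round(sse, 3))
```

On data this well separated, any starting centroids lead to the same two groups. On harder data, different seeds can give different local optima, which is why several restarts are usually compared by their sum of squared distances.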
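Elbow method: a heuristic for choosing the number of clusters K in the K-means algorithm. The within-cluster sum of squared distances is plotted against K, and the value of K at which the curve bends and further increases bring only small improvements (the "elbow") is chosen.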
—————
Python code sample
—————
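As a hedged sketch of K-means and the elbow method described above, the example below uses scikit-learn's KMeans and reads the within-cluster sum of squared distances from its inertia_ attribute. The synthetic data from make_blobs, the three chosen centres, the range of K and the variable names are assumptions made only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic 2-D data with three well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[(-6, -6), (0, 0), (6, 6)],
    cluster_std=0.8,
    random_state=42,
)

inertias = {}
for k in range(1, 9):
    # n_init=10 restarts K-means from 10 initialisations and keeps the best run.
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.1f}")
```

Plotting these values against K (for example with matplotlib) should show a sharp drop up to K=3 and a nearly flat curve afterwards, which is the elbow. On real data the bend is often less clear-cut, so the method is best treated as a guide rather than a rule.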
Hierarchical Clustering
It builds a hierarchical tree-like structure of clusters either from the bottom up (agglomerative) or from the top down (divisive) to represent the relationships between clusters.
—————
Python code sample
—————
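A minimal sketch of the bottom-up (agglomerative) variant, using SciPy's linkage and fcluster functions. The toy points, the choice of Ward linkage and the cut into two clusters are assumptions for illustration. The top-down (divisive) variant is not covered here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Assumption: six hypothetical 2-D points forming two tight groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])

# Build the merge tree bottom-up; each row of Z records one merge
# (the two clusters joined, their distance, and the new cluster size).
Z = linkage(X, method="ward")

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix Z can also be passed to scipy.cluster.hierarchy.dendrogram to draw the tree, which helps when deciding where to cut it.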
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) – groups data points that lie in dense regions of the feature space and marks points in sparse regions as noise.
Mean shift – shifts data points towards the mode of the data distribution to find dense regions.
Agglomerative Clustering – merges the closest data points or clusters until a stopping criterion is met.
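The three methods listed above are all available in scikit-learn, and the sketch below runs them on the same toy data. The data, the eps, min_samples and bandwidth values, and the choice of three clusters are assumptions picked to suit this tiny example. Real data would need its own tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, MeanShift

# Assumption: two tight groups of four points plus one isolated point.
X = np.array([[1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [9.0, 1.0]])

# DBSCAN: a point with at least min_samples neighbours within eps is a core
# point; points that belong to no dense region get the noise label -1.
db_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# Mean shift: each point is moved towards the local mean within the
# bandwidth until it settles on a mode of the density.
ms = MeanShift(bandwidth=1.0).fit(X)

# Agglomerative clustering: merge the closest clusters until the requested
# number of clusters (the stopping criterion used here) is reached.
agg_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print("DBSCAN labels:       ", db_labels)
print("Mean shift centres:\n", ms.cluster_centers_)
print("Agglomerative labels:", agg_labels)
```

One difference worth noticing is how the isolated point is treated: DBSCAN flags it as noise, whereas mean shift and agglomerative clustering assign every point to some cluster, so the outlier ends up in a cluster of its own.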