# COMMON CLUSTERING ALGORITHMS (PART 3)

Jun 02, 2021
Misc

## Mean Shift

##### Steps

- Randomly pick up a data point as centroid.
- Calculate the Euclidean distance between the centroid and other data points.
- If the distance is less than the bandwidth, add the data point to the cluster
*c*._{1}

- If the distance is less than the bandwidth, add the data point to the cluster
- Calculate the average of all data points within the cluster
*c*, and shift the cluster to the average. Update the new centroid._{1}- Repeat until converge.

- If
*c*is overlapped with_{1}*c*, another cluster, only the cluster with more points is preserved._{2} - Repeat until all data points are assigned to a cluster.

##### Pros

- Single parameter (bandwidth)
- Does not assume any prior shape on data clusters
- Robust to outliers

##### Cons

- Output depends on the bandwidth
- Time complexity of
*O(n*^{2})

## OPTICS

Similar to DBSCAN, OPTICS has following parameters:

: The minimum number of points to form a dense region;*MinPts*: The maximum distance between two data points to be considered as a cluster.*Eps (Optional)*

Besides, there are more definitions:

: A point p is a core point if at least MinPts points are found within its Eps-neighborhood;*Core point*: The minimum Eps to make a distinct point a core point;*Core distance*: The reachability distance of a core point*Reachability distance**p*with regards to another data point*o*equals to*max(coreDist(p), dist(o, p))*

##### Steps

- Initialize a ordered list for storing the final sorted points;
- Choose an unprocessed point from the dataset, mark it as processed and put it in the ordered list:
- If it’s a core point, then initiate another priority queue and iterate through all its neighbors:
- For each unprocessed neighbor point, calculate its reachability distance from the core point:
- If it’s not in the priority queue, insert the neighbor point with its reachability distance;
- If it’s already in the priority queue and the new reachability distance is smaller, update the reachability distance of the neighbor point;

- After iterating all unprocessed neighbor points, iterate through the priority queue, mark each point as processed, put it in the ordered list and update its neighbor points’ reachability distance

- For each point
*p*in the ordered list, denote its reachability distance as`r_p`

and its core distance as`c_p`

:

Below is a pseudocode of step 2 from Wikipedia:

##### OPTICS vs DBSCAN

- DBSCAN is sensitive on radius since Eps is fixed, which means it could be difficult to handle unbalanced density;
- OPTICS costs more on memory due to its usage of priority queue.