A cluster is a set of instances or data-points. HC can be either agglomerative (bottom-up approach) or divisive (top-down approach). The distance between each pair of instances is calculated using a dissimilarity function, and the distance between clusters is calculated using a linkage criterion. Each step of HC produces a new cluster-set, i.e., a set of clusters, from the cluster-set of the previous step.
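The agglomerative procedure described above can be illustrated with a minimal Python sketch. It is not this library's API: the function names (`agglomerate`, `single_linkage`, `complete_linkage`), the Euclidean dissimilarity, and the sample points are assumptions chosen for illustration, and the naive pairwise search is meant to be readable rather than efficient.

```python
from itertools import combinations


def euclidean(a, b):
    """Dissimilarity between two instances (assumed: Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(c1, c2, dist):
    """Nearest neighbor: smallest distance between members of two clusters."""
    return min(dist(a, b) for a in c1 for b in c2)


def complete_linkage(c1, c2, dist):
    """Farthest neighbor: largest distance between members of two clusters."""
    return max(dist(a, b) for a in c1 for b in c2)


def agglomerate(points, linkage=single_linkage, dist=euclidean):
    """Bottom-up HC: return the cluster-set produced at every step."""
    # Start with one singleton cluster per instance.
    cluster_set = [(p,) for p in points]
    history = [list(cluster_set)]
    while len(cluster_set) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(cluster_set)), 2),
            key=lambda ij: linkage(cluster_set[ij[0]], cluster_set[ij[1]], dist),
        )
        merged = cluster_set[i] + cluster_set[j]
        # The new cluster-set is the previous one with the pair replaced by its merge.
        cluster_set = [c for k, c in enumerate(cluster_set) if k not in (i, j)]
        cluster_set.append(merged)
        history.append(list(cluster_set))
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(points)
for step, cluster_set in enumerate(history):
    print(step, cluster_set)
```

Each entry of `history` is one cluster-set, so the list as a whole records the full merge sequence from singletons down to a single cluster. Passing `linkage=complete_linkage` swaps the linkage criterion without changing the rest of the procedure.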
- Supports the following linkage criteria, used to measure the dissimilarity between clusters:
  - Complete (farthest neighbor), average (UPGMA), centroid, minimum energy, single (nearest neighbor), Ward’s minimum variance method.
- Provides the following external clustering evaluation criteria, used to evaluate the quality of a given cluster-set when each data-point has an associated label / class:
  - Purity, normalized mutual information, accuracy, precision, recall, F-measure.
- Provides the following internal clustering evaluation criteria, used to select the optimal number of clusters when no ground truth is available:
  - Silhouette coefficient, Dunn index, Davies-Bouldin index, Calinski-Harabasz index, modified Gamma statistic, Xie-Beni index, within-between ratio, I-index, Xu index, RMSSD, R-squared.
- CSV export
  - Exports the result of clustering to a comma-separated values (CSV) file.
- D3.js export
  - Exports the result of clustering to a JSON file that contains the hierarchical structure of the clustering procedure and can be loaded into Dendrogram Viewer to produce a dendrogram.
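
As an illustration of the external evaluation criteria listed above, purity can be sketched in a few lines of Python. This is a generic formulation, not the library's implementation: the `purity` function, the dictionary-based label lookup, and the sample data are assumptions, and every cluster is assumed to be non-empty.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of data-points that carry the majority label of their cluster.

    clusters: list of non-empty clusters, each a list of data-point ids.
    labels:   mapping from data-point id to its ground-truth label / class.
    """
    total = sum(len(cluster) for cluster in clusters)
    # For each cluster, count how many members share its most frequent label.
    majority = sum(
        Counter(labels[p] for p in cluster).most_common(1)[0][1]
        for cluster in clusters
    )
    return majority / total


# Hypothetical cluster-set and labels, for illustration only.
clusters = [["a", "b", "c"], ["d", "e"]]
labels = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "y"}
print(purity(clusters, labels))
```

In this example the first cluster has two of three points with label `x` and the second has both points with label `y`, so four of the five points match their cluster's majority label. Purity rewards small, homogeneous clusters, which is one reason it is usually read alongside the other criteria rather than on its own.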