CLUSTER_NUMBER Function
Computes cluster membership for a hierarchical cluster tree.
Usage
result = CLUSTER_NUMBER(clson, crson, k)
Input Parameters
clson—Vector of length npt – 1 containing the left son cluster numbers, where npt is the number of data points to be clustered. For i = 1, ..., npt – 1, cluster npt + i is formed by merging clusters clson(i – 1) and crson(i – 1).
crson—Vector of length npt – 1 containing the right son cluster numbers. For i = 1, ..., npt – 1, cluster npt + i is formed by merging clusters clson(i – 1) and crson(i – 1).
k—Desired number of clusters.
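The numbering convention for clson and crson can be made concrete with a short sketch. The following is written in Python rather than PV-WAVE, purely for illustration; the helper name list_merges is hypothetical and is not part of the library. It assumes the observations are numbered 1 through npt and uses the vectors from Example 1.

```python
def list_merges(clson, crson):
    """Return (new_cluster, left_son, right_son) for every merge.

    Hypothetical helper: cluster npt + i is formed from
    clson[i - 1] and crson[i - 1], for i = 1, ..., npt - 1.
    """
    if len(clson) != len(crson):
        raise ValueError("clson and crson must have the same length")
    npt = len(clson) + 1  # number of original observations
    return [(npt + i, clson[i - 1], crson[i - 1]) for i in range(1, npt)]


# Vectors taken from Example 1 (npt = 5).
merges = list_merges([5, 6, 4, 7], [3, 1, 2, 8])
for new_cluster, left, right in merges:
    print(f"cluster {new_cluster} = cluster {left} + cluster {right}")
# cluster 6 = cluster 5 + cluster 3
# cluster 7 = cluster 6 + cluster 1
# cluster 8 = cluster 4 + cluster 2
# cluster 9 = cluster 7 + cluster 8
```

Cluster numbers 1 through npt denote single observations, and the last cluster formed (here, cluster 9) contains every observation.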
Returned Value
result—Long vector of length npt containing the cluster membership of each observation.
Output Keywords
Obs_per_cluster—Array of length K containing the number of observations in each cluster.
Discussion
Given a fixed number of clusters (K) and the cluster tree (vectors crson and clson) produced by the hierarchical clustering algorithm (see CLUSTER_HIERARCHICAL Procedure), CLUSTER_NUMBER determines the cluster membership of each observation. It first determines the root nodes of the K distinct subtrees that form the K clusters, and then traverses each subtree to assign its observations to that cluster. CLUSTER_NUMBER also returns the number of observations found in each cluster.
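The two steps described above (find the K subtree roots, then traverse each subtree) can be sketched as follows. This is an illustrative Python sketch, not the library's implementation, and it rests on two assumptions: the merges are listed in the order they were formed, so that undoing the last K – 1 merges yields K clusters, and cluster labels are assigned in increasing order of root node number. The labeling used by CLUSTER_NUMBER itself may differ, although this choice reproduces the output of Example 1.

```python
def cluster_number(clson, crson, k):
    """Sketch: return (membership, obs_per_cluster) for k clusters.

    Assumes merges are ordered as formed; label order is an assumption.
    """
    npt = len(clson) + 1              # number of observations
    if not 1 <= k <= npt:
        raise ValueError("k must satisfy 1 <= k <= npt")
    top = 2 * npt - 1                 # node holding all observations
    cutoff = 2 * npt - k              # nodes above cutoff are split apart

    def sons(node):
        i = node - npt                # merge index, 1-based
        return clson[i - 1], crson[i - 1]

    # Step 1: roots are sons of split nodes that are not split themselves.
    if k == 1:
        roots = [top]
    else:
        roots = [son
                 for node in range(cutoff + 1, top + 1)
                 for son in sons(node)
                 if son <= cutoff]
    roots.sort()                      # assumed labeling order

    # Step 2: traverse each subtree and label its observations.
    membership = [0] * npt
    counts = [0] * k
    for label, root in enumerate(roots, start=1):
        stack = [root]
        while stack:
            node = stack.pop()
            if node <= npt:           # leaf: an original observation
                membership[node - 1] = label
                counts[label - 1] += 1
            else:                     # internal node: visit both sons
                stack.extend(sons(node))
    return membership, counts


# Vectors from Example 1, with k = 2.
iclus, nclus = cluster_number([5, 6, 4, 7], [3, 1, 2, 8], 2)
print(iclus)  # [1, 2, 1, 2, 1]
print(nclus)  # [3, 2]
```

An explicit stack is used instead of recursion so that very unbalanced trees, which can be as deep as npt, do not exhaust the call stack.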
Example 1
In the following example, cluster membership for K = 2 clusters is found for the five-observation cluster tree defined by the vectors iclson and icrson. The output vector iclus contains the cluster number of each observation.
; Set up the input data
k = 2
iclson = [5, 6, 4, 7]
icrson = [3, 1, 2, 8]
 
; Call the CLUSTER_NUMBER routine
iclus = CLUSTER_NUMBER(iclson, icrson, k, Obs_per_cluster=nclus)
 
PRINT,""
PRINT,"         OUTPUT"
PRINT,"       ----------"
PRINT,""
PRINT,"          iclus"
PRINT,iclus, Format="(5I5)"
Output
         OUTPUT
       ----------
 
          iclus
   1   2   1   2   1
Example 2
This example illustrates the typical usage of CLUSTER_NUMBER by clustering the Fisher iris data. First, the distances between the irises are computed using DISSIMILARITIES, and the code adds a small random perturbation to each off-diagonal distance. The resulting distance matrix is then clustered using CLUSTER_HIERARCHICAL. The cluster membership for 5 clusters is then obtained via CLUSTER_NUMBER using the output from CLUSTER_HIERARCHICAL. The choice of 5 clusters can be made either on theoretical grounds or by examining a cluster tree. Finally, the cluster membership of each iris observation is printed, followed by the number of observations in each cluster.
data = STATDATA(3)
k = 5
 
dist = DISSIMILARITIES(data,Index=[1,2,3,4])
 
RANDOMOPT,Set=4
 
FOR i=0L, 149 DO  BEGIN & $
  FOR j=i+1, 149 DO BEGIN & $
    r = RANDOM(1, /Uniform) & $
    dist(i,j) = MAX([0.0, dist(i,j) + .001 * r]) & $
    dist(j,i) = dist(i,j) & $
  ENDFOR & $
  dist(i,i) = 0. & $
ENDFOR
 
CLUSTER_HIERARCHICAL, dist, Clevel=clevel, Clson=clson, $
   Crson=crson
iclus = CLUSTER_NUMBER(clson, crson, k, Obs_per_cluster=nclus)
 
PRINT,""
PRINT,"     OUTPUT"
PRINT,"   ----------"
PRINT,""
PRINT,"      ICLUS"
PRINT,iclus,Format='(5I3)'
PRINT,""
PRINT,"          NCLUS"
PRINT,nclus,Format='(5I4)'
Output
     OUTPUT
   ----------
 
      ICLUS
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  5  5  5  5  5
  2  2  2  2  2
  2  2  1  2  2
  1  2  2  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  2  2  1  2
  2  2  2  1  2
  2  2  2  2  2
  2  3  2  2  2
  2  2  2  2  2
  2  2  4  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  4  2  2  2
  2  2  2  2  2
  2  2  2  2  2
  2  2  2  2  2
 
          NCLUS
   4  93   1   2  50