CLUSTBOX Function
Standard Library function that partitions a data set into boxes.
enabled.
Usage
clst = CLUSTBOX(data)
Input Parameters
data—An (m, n) array of m points in n-space.
Returned Value
clst—An (m, n) index array. A box containing the data is partitioned into uniform boxes labeled b(i1, ..., in), where index ij ranges from 0 to the number of partitions along dimension j. The point data(i, *) is contained in the box indexed by clst (i, *).
Keywords
Rnge—(Input and Output) An (2, n) array specifying a range for each dimension. The default range is the same as that of the data. If used as an input, Rnge must encompass the default range.
Nbxs—(Input and Output) An n-element integer vector designating the number of boxes along each dimension of the range. If undefined on input then each element is set to the same number m, where m is the largest power of two yielding data in each box. Irregular data domains are accommodated by admitting empty boxes but ensuring some degree of connectivity in the nonempty boxes (see the Irrg and Kcon keywords). The power of two solution can be refined using the Fine, Bsqr, and Refi keywords.
Nmin—If Nbxs is undefined on input then Nmin specifies a required minimum number of data points in each nonempty box. Default is 1.
Axes—(Output) An n-element list containing the partition coordinate vector for each dimension. The axes demarcate the box boundaries.
Fine—If set to any scalar and Nbxs is undefined on input, the default power of two solution Nbxs(*)=2p is refined with a binary search that attempts to find the largest integer q such that the partition Nbxs(*)=q has at least Nmin points per nonempty box. By default this keyword is not set.
Bsqr—If set to any scalar and Nbxs is undefined on input then the solution for Nbxs requires that the number of boxes along each dimension is proportional to the extent of the data in that dimension: Nbxs = q * (Rnge(1,*) – Rnge(0,*))/max(Rnge(1,*) – Rnge(0,*)) where q is the largest integer obtained in the search for the finest partition. By default this keyword is not set.
Refi—A 2-element integer array. Refi invokes keyword Fine which produces a partition defined by integer q=max(Nbxs). Finer partitions are sought by incrementing the current solution r (initially r = q) in steps of Refi(0) until an uninterrupted sequence of Refi(1) steps fails to produce a better solution. Sometimes it is useful to express Refi(0) as a fraction of q. If Refi(0) is a floating point value in the range (0,1) then the algorithm replaces it with integer Refi(0)*q and proceed as before.
Irrg—If set to any scalar, then the clustering algorithm adapts to irregular domains by allowing empty boxes but ensuring that the nonempty boxes form a face-connected subset of the partition. For looser measures of connectivity, see the Kcon keyword. By default this keyword is not set.
Kcon—A positive integer (≤ n) defining connectivity for the algorithm invoked by keyword Irrg: Two boxes are connected if they share a common boundary point, and if regarded as unit cubes their centroids are within sqrt(Kcon) of each other. Kcon=1 by default, which implies connected boxes share a common face.
Verb—If set to any scalar, four parameters are printed for each iteration on max(Nbxs): the current solution and its associated average number of points per nonempty box, the previous attempt, and elapsed clock time. By default this keyword is not set.
Discussion
The CLUSTBOX function partitions an n-dimensional point set into n-dimensional boxes of uniform size. The boxes can be specified by the user or can be computed according to the distribution of the points. This type of partitioning is useful as a quick way to define neighborhoods in a point set.
Example
note | This example requires PV-WAVE Advantage. |
; create random data on an ellipse and plot it
MATH_INIT
RANDOMOPT, Set=2
data = RANDOM( 200, /Sphere, Param=2 )
data(*,1) = 0.5 * data(*,1)
posi = [ 0.1, 0.1, 0.9, 0.9 ]
WINDOW, 0, Xsize=1000, Ysize=500
PLOT, data(*,0), data(*,1), Psym=2, Syms=0.2, Posi=posi, Xs=2, $
Ys=2
; cluster data with refinement and with boxes as square as
; possible set Irrg since the data domain is not rectangular
c = CLUSTBOX( data, Rnge=rn, Nbxs=nb, Axes=ax, /Fine, /Bsqr, $
/Irrg )
; display the box-partition
FOR i=0L,nb(0) DO PLOTS, ax(0,[i,i]), rn(*,1)
FOR i=0L,nb(1) DO PLOTS, rn(*,0), ax(1,[i,i])
For an extensive set of interactive examples, refer to the file <RW_DIR>/wave/lib/user/examples/clustbox_ex.pro.
In this demo, the data can be input directly or can be randomly generated with controls over quantity, dimensionality, and range. Also, the random data can be confined to an ND elliptical annulus with control over the radii. Randomly generated data is returned to the user so that it can be used again if desired. For 1D and 2D data, the data is plotted along with the partition boxes. For data of any dimensionality, the clustering statistics are printed to the console. The demo provides a keyword interface to each CLUSTBOX keyword so the user can compare the effects of different keywords. All CLUSTBOX output is returned to the caller of the demo procedure.