SORTDATA Function
Sorts observations by specified keys, with an option to tally cases into a multiway frequency table.
Usage
result = SORTDATA(x, n_keys)
Input Parameters
x—One- or two-dimensional array containing the observations to be sorted.
n_keys—Number of columns of x on which to sort. The first n_keys columns of x are used as the sorting keys. (Exception: See keyword Indices_Keys).
Returned Value
result—The sorted array.
Input Keywords
Double—If present and nonzero, double precision is used.
Indices_Keys—One-dimensional array of length n_keys giving the column numbers of x which are to be used in the sort. Default: Indices_Keys(*) = 0, 1, ..., n_keys – 1
Frequencies—One-dimensional array containing the frequency for each observation in x. Default: Frequencies (*) = 1
Ascending—If present and nonzero, the sort is in ascending order (the default). Keywords Ascending and Descending cannot be used together.
Descending—If present and nonzero, the sort is in descending order. Keywords Ascending and Descending cannot be used together.
Output Keywords
Permutation—Named variable into which a one-dimensional array containing the rearrangement (permutation) of the observations (rows) is stored.
Table_N—Named variable into which a one-dimensional array of length n_keys, containing in its ith element (i = 0, 1, ..., (n_keys – 1)) the number of levels or categories of the ith classification variable (column), is stored. Keywords Table_N, Table_Values, and Table_Bal must be used together.
Table_Values—Named variable into which an array of length Table_N(0) + Table_N(1) + ... + Table_N(n_keys – 1), containing the values of the classification variables, is stored. The first Table_N(0) elements of Table_Values contain the values for the first classification variable. The next Table_N(1) contain the values for the second variable. The last Table_N(n_keys – 1) positions contain the values for the last classification variable. Keywords Table_N, Table_Values, and Table_Bal must be used together.
Table_Bal—Named variable into which an array of length Table_N(0) × Table_N(1) × ... × Table_N(n_keys – 1), containing the frequencies in the cells of the table to be fit, is stored. Empty cells are included in Table_Bal, and each element of Table_Bal is nonnegative. The cells of Table_Bal are sequenced so that the first variable cycles through its Table_N(0) categories one time, the second variable cycles through its Table_N(1) categories Table_N(0) times, the third variable cycles through its Table_N(2) categories Table_N(0) × Table_N(1) times, etc., up to the n_keys-th variable, which cycles through its Table_N(n_keys – 1) categories Table_N(0) × Table_N(1) × ... × Table_N(n_keys – 2) times. Keywords Table_N, Table_Values, and Table_Bal must be used together.
N_List_Cells—Named variable into which the number of nonempty cells is stored. Keywords N_List_Cells, List_Cells, and Table_Unbal must be used together.
List_Cells—Named variable into which a two-dimensional array of size N_List_Cells × n_keys is stored. Each row lists the levels of the n_keys classification variables that describe one cell. Keywords N_List_Cells, List_Cells, and Table_Unbal must be used together.
Table_Unbal—Named variable into which the one-dimensional array of length N_List_Cells containing the frequency for each cell is stored. Keywords N_List_Cells, List_Cells, and Table_Unbal must be used together.
N_Cells—Named variable into which a one-dimensional array containing the number of observations per group is stored. A group contains observations (rows) in x that are equal with respect to the method of comparison. The first N_Cells(0) rows of the sorted x are in group number 1. The next N_Cells(1) rows of the sorted x are in group number 2, etc. The last N_Cells(N_ELEMENTS(N_Cells) – 1) rows of the sorted x are in group number N_ELEMENTS(N_Cells).
Discussion
Function SORTDATA can perform a key sort, a tabulation of frequencies into a multiway frequency table, or both.
Sorting
Function SORTDATA sorts the rows of real matrix x using particular columns in x as the keys. The sort is algebraic with the first key as the most significant, the second key as the next most significant, etc. When x is sorted in ascending order, the resulting sorted array is such that the following is true:
For i = 0, 1, ..., N_ELEMENTS(x(*, 0)) – 2:
x(i, Indices_Keys(0)) ≤ x(i + 1, Indices_Keys(0))
For k = 1, ..., n_keys – 1, if x(i, Indices_Keys(j)) = x(i + 1, Indices_Keys(j)) for j = 0, 1, ..., k – 1, then
x(i, Indices_Keys(k)) ≤ x(i + 1, Indices_Keys(k))
The observations also can be sorted in descending order.
The rows of x containing the missing value code NaN in at least one of the specified columns are considered as an additional group. These rows are moved to the end of the sorted x.
The sorting algorithm is based on a quicksort method given by Singleton (1969) with modifications by Griffin and Redish (1970) and Petro (1970).
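As a brief illustration of the sorting options described above, the following sketch sorts on columns other than the first n_keys and in descending order. The data and variable names are made up for this illustration, only keywords documented above are used, and the row order noted in the last comment was worked out by hand from the ordering rule rather than captured from a PV-WAVE session.
c0 = [1.0, 3.0, 2.0, 1.0]
c1 = [10.0, 20.0, 30.0, 40.0]
c2 = [1.0, 2.0, 1.0, 2.0]
x = [ [c0], [c1], [c2] ]
; Column 2 is the most significant key, column 0 the next.
y = SORTDATA(x, 2, Indices_Keys = [2, 0], /Descending)
PM, y, Title = 'Sorted on columns 2 and 0 (descending)'
; The key pairs (column 2, column 0) are all distinct here, so
; the rows of x should appear in the order 1, 3, 2, 0.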
Frequency Tabulation
Function SORTDATA determines the distinct values in multivariate data and computes frequencies for the data. This function accepts the data in the matrix x but performs computations only for the variables (columns) in the first n_keys columns of x (Exception: see optional keyword Indices_Keys). In general, the variables for which frequencies should be computed are discrete; they should take on a relatively small number of different values. Variables that are continuous can be grouped first. The function FREQTABLE can be used to group variables and determine the frequencies of groups.
When Table_N, Table_Values, and Table_Bal are specified, SORTDATA fills the vector Table_Values with the unique values of the variables and tallies the number of unique values of each variable in the vector Table_N. Each combination of one value from each variable forms a cell in a multiway table. The frequencies of these cells are entered in Table_Bal so that the first variable cycles through its values exactly once and the last variable cycles through its values most rapidly. Some cells may not correspond to any observations in the data; in other words, “missing cells” are included in the Table_Bal table and have a value of zero.
When N_List_Cells, List_Cells, and Table_Unbal are specified, the frequency of each cell is entered in Table_Unbal so that the first variable cycles through its values exactly once and the last variable cycles through its values most rapidly. All cells have a frequency of at least 1, i.e., there is no “missing cell.” The array List_Cells can be considered “parallel” to Table_Unbal because row i of List_Cells is the set of n_keys values that describes the cell whose frequency is stored in element i of Table_Unbal.
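The two tabulation forms can be compared on a small data set. The sketch below uses made-up data with two classification variables, the first with levels 1 and 2 and the second with levels 1, 2, and 3, in which the cells (1, 3) and (2, 2) are empty. The variable names are chosen for this illustration, and the contents listed in the comments were worked out by hand from the descriptions above, not captured from a PV-WAVE session.
c0 = [1.0, 1.0, 2.0, 2.0, 1.0, 2.0]
c1 = [1.0, 2.0, 1.0, 3.0, 1.0, 3.0]
x = [ [c0], [c1] ]
; Balanced form: every combination of levels is a cell,
; including the empty ones.
y = SORTDATA(x, 2, Table_N = table_n, $
   Table_Values = table_values, Table_Bal = table_bal)
PRINT, table_n
PRINT, table_values
PRINT, table_bal
; Expected from the definitions:
;   table_n      = 2, 3
;   table_values = 1, 2, 1, 2, 3
;   table_bal    = 2, 1, 0, 1, 0, 2
; Unbalanced form: only the nonempty cells are listed.
y = SORTDATA(x, 2, N_List_Cells = n_list_cells, $
   List_Cells = list_cells, Table_Unbal = table_unbal)
PRINT, n_list_cells
PM, list_cells
PRINT, table_unbal
; Expected from the definitions:
;   n_list_cells = 4
;   list_cells rows: (1, 1), (1, 2), (2, 1), (2, 3)
;   table_unbal  = 2, 1, 1, 2
The balanced table has Table_N(0) × Table_N(1) = 6 entries, two of them zero, while the unbalanced table keeps only the four nonempty cells.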
Example 1
The rows of a 10 × 3 matrix x are sorted in ascending order using Columns 0 and 1 as the keys. There are two missing values (NaNs) in the keys. The observations containing these values are moved to the end of the sorted array.
f = MACHINE(/Float)
c0 =[1.0, 2.0, 1.0, 1.0, 2.0, 1.0, f.NaN, 1.0, 2.0, 1.0]
c1 =[1.0, 1.0, 1.0, 1.0, f.NaN, 2.0, 2.0, 1.0, 2.0, 1.0]
c2 =[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
x = [ [c0], [c1], [c2] ]
PM, x, Title = 'Unsorted Matrix'
; PV-WAVE prints the following:
; Unsorted Matrix
; 1.00000 1.00000 1.00000
; 2.00000 1.00000 2.00000
; 1.00000 1.00000 3.00000
; 1.00000 1.00000 4.00000
; 2.00000 NaN 5.00000
; 1.00000 2.00000 6.00000
; NaN 2.00000 7.00000
; 1.00000 1.00000 8.00000
; 2.00000 2.00000 9.00000
; 1.00000 1.00000 9.00000
PM, SORTDATA(x, 2), Title = 'Sorted Matrix'
; PV-WAVE prints the following:
; Sorted Matrix
; 1.00000 1.00000 1.00000
; 1.00000 1.00000 9.00000
; 1.00000 1.00000 3.00000
; 1.00000 1.00000 4.00000
; 1.00000 1.00000 8.00000
; 1.00000 2.00000 6.00000
; 2.00000 1.00000 2.00000
; 2.00000 2.00000 9.00000
; NaN 2.00000 7.00000
; 2.00000 NaN 5.00000
Example 2
This example uses the same data as the previous example. The permutation of the rows is output using the keyword Permutation.
f = MACHINE(/Float)
; Fill up a matrix, including some missing values.
c0 =[1.0, 2.0, 1.0, 1.0, 2.0, 1.0, f.NaN, 1.0, 2.0, 1.0]
c1 =[1.0, 1.0, 1.0, 1.0, f.NaN, 2.0, 2.0, 1.0, 2.0, 1.0]
c2 =[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
x = [ [c0], [c1], [c2] ]
; Output the unsorted matrix.
PM, x, Title = 'Unsorted Matrix'
; PV-WAVE prints the following:
; Unsorted Matrix
; 1.00000 1.00000 1.00000
; 2.00000 1.00000 2.00000
; 1.00000 1.00000 3.00000
; 1.00000 1.00000 4.00000
; 2.00000 NaN 5.00000
; 1.00000 2.00000 6.00000
; NaN 2.00000 7.00000
; 1.00000 1.00000 8.00000
; 2.00000 2.00000 9.00000
; 1.00000 1.00000 9.00000
; Use SORTDATA to sort x.
y = SORTDATA(x, 2, Permutation = permutation)
PM, y, Title = 'Sorted Matrix:'
; PV-WAVE prints the following:
; Sorted Matrix:
; 1.00000 1.00000 1.00000
; 1.00000 1.00000 9.00000
; 1.00000 1.00000 3.00000
; 1.00000 1.00000 4.00000
; 1.00000 1.00000 8.00000
; 1.00000 2.00000 6.00000
; 2.00000 1.00000 2.00000
; 2.00000 2.00000 9.00000
; NaN 2.00000 7.00000
; 2.00000 NaN 5.00000
; Print the permutation vector.
PM, permutation, Title = 'Permutation Matrix:'
; PV-WAVE prints the following:
; Permutation Matrix:
; 0
; 9
; 2
; 3
; 7
; 5
; 1
; 8
; 6
; 4
; Use the permutation vector to sort the data.
z = x(permutation, *)
PM, z, Title = 'Sorted Matrix'
; PV-WAVE prints the following:
; Sorted Matrix
; 1.00000 1.00000 1.00000
; 1.00000 1.00000 9.00000
; 1.00000 1.00000 3.00000
; 1.00000 1.00000 4.00000
; 1.00000 1.00000 8.00000
; 1.00000 2.00000 6.00000
; 2.00000 1.00000 2.00000
; 2.00000 2.00000 9.00000
; NaN 2.00000 7.00000
; 2.00000 NaN 5.00000
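Example 3
This example is a sketch rather than captured output. It uses the same data as the previous examples and retrieves the group sizes with keyword N_Cells, which can be used to locate each group of equal keys in the sorted array. The variable names n_cells and g are chosen for this illustration, and no output is shown.
f = MACHINE(/Float)
c0 =[1.0, 2.0, 1.0, 1.0, 2.0, 1.0, f.NaN, 1.0, 2.0, 1.0]
c1 =[1.0, 1.0, 1.0, 1.0, f.NaN, 2.0, 2.0, 1.0, 2.0, 1.0]
c2 =[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
x = [ [c0], [c1], [c2] ]
; Sort on the first two columns and request the group sizes.
y = SORTDATA(x, 2, N_Cells = n_cells)
; Print the number of groups.
PRINT, N_ELEMENTS(n_cells)
; Print each group number and the number of rows in that group.
FOR g = 0L, N_ELEMENTS(n_cells) - 1 DO PRINT, g + 1, n_cells(g)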