Querying a Table
To query a table usually means to subset the data in it. The QUERY_TABLE function returns a new table containing your query results, usually a subset of the original table.
QUERY_TABLE lets you:
*Rearrange a table and rename columns.
*Remove duplicate rows from a table.
*Summarize related groups of data with functions that add, average, count, and perform other calculations.
*Sort columns of data into ascending or descending order.
*Subset a table using Boolean and relational operators to retrieve specific ranges of data.
Restoring a Sample Table
The phone_data table described in this chapter is available in a save file in the WAVE_DATA directory. To restore this file, use the RESTORE command:
RESTORE, !Data_Dir+'phone_example.sav'
If you restore this file, you can practice using most of the commands described in this chapter.
QUERY_TABLE Function
The complete syntax (usage) of the QUERY_TABLE function is:
result = QUERY_TABLE(table, 
' [Distinct] * | col1 [alias] [, ..., coln [alias]]
[Where cond]
[Group By colg1 [,... colgn]] |
[Order By colo1 [direction][,..,colon [direction]]] ' )
Note that the second parameter is one string and must be enclosed in quotes.
For a description of the function’s syntax, see the PV‑WAVE Reference.
Rearranging a Table
One of the simplest uses of QUERY_TABLE is to rearrange and/or rename the columns of an existing table (a table already created with the BUILD_TABLE function). To create a new table from phone_data containing only the phone extensions, area code, and phone number of each call made, you could enter:
new_table = QUERY_TABLE(phone_data, 'EXT, AREA, NUMBER')
Here is a portion of the resulting table:
 
EXT
AREA
NUMBER
311
215
2155554242
358
303
5553869
320
214
2145559893
289
303
5555836
248
617
6175551999
332
614
6145555553
 
note
You can print or plot data from a table. For information on printing table data, see "Formatting and Printing Tables". For information on plotting table data, see "Plotting Table Data".
Renaming Columns
The following command is similar to the previous one, except that aliases (Extension and Area_Code) are used to rename two of the columns:
new_table = QUERY_TABLE(phone_data, $
'EXT Extension, AREA Area_Code, NUMBER')
You can see that these new names are in effect with the INFO command:
INFO, /Structure, new_table
** Structure TABLE_QT_2, 3 tags, 16 length:
 
EXTENSION
LONG
311
AREA_CODE
LONG
215
NUMBER
STRING
'2155554242'
Using the Distinct Qualifier
The Distinct qualifier removes duplicate rows from the columns specified in the QUERY_TABLE command. For example, the following command returns the unique dates appearing in the table:
dates = QUERY_TABLE(phone_data, 'Distinct DATE') 
The result is a one-column table containing unique dates on which data were gathered. All duplicate dates have been filtered out of the result.
PRINT, dates
  {901002}   {901003}   {901004}
 
note
The same basic result can be accomplished with the UNIQUE function, described in the PV‑WAVE Reference. UNIQUE returns the unique elements of any one-dimensional array. When used to find unique elements of a table column, data-structure notation must be used to specify the column (for more information, see "Tables and Structures"). For example:
dates = UNIQUE(phone_data.DATE)
 
Summarizing Data with Group By
The Group By clause sorts the table into rows grouped by common values in specified columns. Used with calculation functions, Group By lets you produce summaries of data associated with each grouping. For example, you can find the total cost of all calls made from each extension:
new_tbl = QUERY_TABLE(phone_data, 'EXT, Sum(COST) Group By EXT')
Or, you can find the number of calls made on each date:
new_tbl = QUERY_TABLE(phone_data, $
'DATE, Count(NUMBER) Group By DATE')
Or, you can obtain the total duration from each extension on each date (a multiple grouping):
tbl = QUERY_TABLE(phone_data, $
'EXT, DATE, Sum(DUR) Group By EXT, DATE')
 
note
The GROUP_BY function performs the same basic operation as the Group By clause of QUERY_TABLE, but with a more compact syntax. For detailed information on GROUP_BY, see the PV-WAVE Reference.
Calculation Functions Used with Group By
Group By is used in conjunction with one or more calculation functions, such as Sum and Count. These functions, shown in Table 9-1: Calculation Functions, operate on the lowest-level grouping to produce the desired result.
 
Calculation Functions  
Function
Description
Phone_Data Applications
Sum( )
Returns the total of the values in the group.
total duration: Sum(DUR)
total cost: Sum(COST)
Count( )
Returns the number of items in the group.
how many calls made:
Count(NUMBER)
Min( )
Returns the smallest element in the group.
first date: Min(DATE)
Max( )
Returns the largest
element in the group.
last day: Max(DATE)
Avg( )
Returns the average of the values in the group.
average cost: Avg(COST)
average duration: Avg(DUR)
 
note
These functions are described further in the description the QUERY_TABLE function in the PV‑WAVE Reference.
Using More than One Calculation Function
More than one calculation function can be placed in a single QUERY_TABLE command. For example, you can create a table showing the total cost and total duration of calls made from each phone extension for the period of time the data were collected.
cost_sum = QUERY_TABLE(phone_data, $
'EXT, Sum(COST), Sum(DUR) Group By EXT')
This produces the new table, called cost_sum containing the columns EXT, SUM_COST, and SUM_DUR. The cost and duration columns are renamed, by default, with the prefix SUM_. This prevents confusion with the existing table columns that are already named COST and DUR.
A portion of the resulting table is shown below. The values in the SUM_COST and SUM_DUR columns represent the total cost and total duration of calls made from each extension.
 
EXT
SUM_COST
SUM_DUR
0
0.00000
4.49000
248
0.350000
1.31000
289
0.00000
16.2300
311
5.78000
21.4000
320
4.71000
17.4400
331
1.02000
3.77000
The INFO command shows the basic structure of this new table:
INFO, /structure, cost_sum
** Structure TABLE_GB_2, 3 tags, 12 length:
 EXT LONG 0
 SUM_COST FLOAT 0.370000
 SUM_DUR FLOAT 592.140
 
note
You could rename the columns in the previous command by adding an alias after the column names. For example, Total_Cost and Total_Time are aliases in the following function:
cost_sum = QUERY_TABLE(phone_data, $
'EXT, Sum(COST) TOTAL_COST, Sum(DUR) '+$
'TOTAL_TIME Group By EXT')
Multiple Groupings
Finally, you can specify more than one column in the Group By clause. For example, you can obtain a grouping by extension and by date. The result is a “group within a group”.
The following command produces such a table:
tbl = QUERY_TABLE(phone_data, $
'EXT, DATE, Sum(DUR) Group By EXT, DATE')
For more information on producing multiple groupings, see the description of QUERY_TABLE in the
Sorting Data with Order By
The Order By clause is used to sort a table. Order By sorts columns into ascending or descending order.
Suppose you want to rearrange the phone data table so that it is sorted by extension, in ascending order (ascending order is the default). You can do this with the following command:
ext_sort = QUERY_TABLE(phone_data, '* Order By EXT')
The asterisk (*) before Order By is a wildcard character that pulls all the columns in phone_data into the resulting table.
 
note
The ORDER_BY function performs the same basic operation as the Order By clause of QUERY_TABLE, but with a more compact syntax. For detailed information on ORDER_BY, see the PV-WAVE Reference.
A portion of the resulting table is shown following this section. Note that the EXT column is sorted in ascending order.
 
DATE
TIME
DUR
INIT
EXT
COST
AREA
NUMBER
901004
95300
1.36
JAT
0
0.00
303
480320
901004
94700
0.80
JAT
0
0.00
303
480320
901002
91600
2.33
JAT
0
0.00
303
480344
901002
94800
1.31
RLD
248
0.35
617
6174941999
901002
94800
16.2
TDW
289
0.00
303
2955836
Sorting in Descending Order
Use the Desc qualifier to sort a column in descending order. For example, the previous table can be further refined by sorting the COST field in descending order:
cost_sort = QUERY_TABLE(phone_data, $
'EXT, COST, DATE Order By EXT, COST Desc')
This command produces a subsetted table with the COST column sorted in descending order (as specified with the Desc qualifier) within each group of extensions. The following table illustrates part of the new table organization, where extensions are sorted first, and then cost is sorted within each primary grouping of extensions:
 
EXT
COST
DATE
370
0.12
901003
370
0.00
901004
379
0.52
901004
379
0.41
901003
418
0.27
901003
Subsetting a Table with the Where Clause
To produce a subset of data in a table, use the QUERY_TABLE function in conjunction with a Where clause. A Where clause begins with the word Where and is followed by Boolean (AND, OR, NOT) and/or relational operators (<, >, <>, =, >=, <=) that describe how the data is to be subsetted. See the PV‑WAVE Reference for more information on these operators.
You can use relational operators (EQ, GE, GT, LE, LT, and NE in a Where clause instead of the SQL-style operators listed above.
For example, to create a subset of the phone_data table that only contains calls made on one particular day:
new_table = QUERY_TABLE(phone_data, '* Where DATE = 901002')
The asterisk (*) before Where is a wildcard character that pulls all the columns in phone_data into the resulting table.
Here is a portion of the resulting table—only rows with date 901002 are included:
 
DATE
TIME
DUR
INIT
EXT
COST
AREA
NUMBER
901002
093200
21.40
TAC
311
5.78
215
2155554242
901002
094700
1.05
2
358
0.0
303
5553869
901002
094700
17.44
1
320
4.71
214
2145559893
901002
094800
16.23
2
289
0.0
303
5555836
901002
094800
1.31
1
248
.35
617
6175551999
To find the calls made on 901002 with a duration of greater than 10 minutes, enter:
new_table = QUERY_TABLE(phone_data, $
'* Where DATE = 901002 AND DUR > 10.0')
The resulting subset is illustrated in the following table. All rows contain dates 90102 and durations greater than 10.0.
 
DATE
TIME
DUR
INIT
EXT
COST
AREA
NUMBER
901002
093200
21.40
TAC
311
5.78
215
2155554242
901002
094700
17.44
EBH
320
4.71
214
2145559893
901002
094800
16.23
TDW
289
0.0
303
5555836
 
note
If you are familiar with SQL, you will see that this Where clause is similar to the Where clause in the SQL SELECT command.
Using Strings in Where Clauses
The Where clause lets you filter strings in a number of different ways. In the simplest case, you want to find information related to a single string, such as a set of initials. For example, to find the calls made by the person with the initials TAC, you can enter:
res = QUERY_TABLE(phone_data, '* Where INIT = "TAC" ')
Note that the string must be enclosed in quotes inside the function call. Also note that double quotation marks are used to delimit TAC. This is because apostrophes were used to delimit the entire QUERY_TABLE string parameter.
 
note
If the string is passed into the function as a variable parameter, as explained in the section "Passing Variable Parameters into Table Functions", then the quotes are unnecessary.
In a more complex case, you can use relational and Boolean operators to filter the strings in a column to find a particular subset of strings. For example, the following command uses relational and Boolean operators to filter the INIT column, which contains the initials of callers:
res = QUERY_TABLE(phone_data, $
'* Where (INIT >= "B") AND (INIT < "D") ')
The result of this query is a new table containing information on the calls made by people whose initials begin with the letter B and C.
Passing Variable Parameters into Table Functions
Any string or numeric constant used in the QUERY_TABLE function can be passed in as a variable parameter. This means that you can use variables for numeric and string values that are used in the QUERY_TABLE function. For example, you can create a string variable called name and use it in the QUERY_TABLE function:
name = 'TAC'
tbl = QUERY_TABLE(phone_data, '* Where INIT = name')
Because name is a variable and not an actual string, you do not have to enclose it in double quotes inside the function call.
The command shown in the previous section that finds the calls made on 901002 with a duration of greater than 10 minutes can also be written with variable parameters in place of actual values:
day = 901002
calldur = 10.0
new_table = QUERY_TABLE(phone_data, $
'* Where DATE = day AND DUR > calldur')
 
note
If the variable name and the column name in a comparison are the same, the result of the comparison simply returns “true” for all cases, and the desired comparison may not be made. The following example is similar to the previous example, except the day variable is changed to date, which is also a column name.
date = 901002
calldur = 10.0
new_table = QUERY_TABLE(phone_data, $
' * Where DATE = date AND DUR > calldur')
In this QUERY_TABLE call, DATE = date returns “true” for all cases, rather than only for cases where the date is 901002. The comparison DATE = 901002 is not made as might be expected. Thus, try to choose column names that are different from the variable names.
Using the In Operator
The In operator provides another means of filtering data in a table. This operator tests for membership in a set (one-dimensional array) of values. For example, the following array contains a subset of the initials found in the INIT column of the phone_data table:
nameset = ['TAC', 'BWD', 'TDW', 'RLD']
The following QUERY_TABLE call produces a new table that contains information only on the members of nameset:
res = QUERY_TABLE(phone_data, '* Where INIT In nameset')
Combining Multiple Clauses in a Query
You can place more than one clause in a QUERY_TABLE call to produce more complicated and specific queries. Once you understand the basic parts of QUERY_TABLE, combining these parts into more complex queries is a straightforward process.
Within the QUERY_TABLE function, the Group By and Order By functions are mutually exclusive. That is, you cannot place both Group By and Order By in the same QUERY_TABLE call.
Example
The following command produces a table that:
*includes only calls with a duration of more than one minute.
*includes only calls with an area code not equal to 303 (out-of-state calls only).
*sorts the table by phone extension, in ascending order.
*sorts the table, within extension subgroups, by date in descending order.
*sorts the table, within date subgroups, by duration in ascending order.
result = QUERY_TABLE(phone_data, $
'* Where (DUR > 1.0) And (AREA <> 303) '+ $
'Order By EXT, DATE Desc, DUR Desc')
A portion of the table is shown below:
 
DATE
TIME
DUR
INIT
EXT
COST
AREA
NUMBER
901002
094800
1.31
RLD
248
.35
617
6175551999
901002
093200
21.40
TAC
311
5.78
215
2155554242
901002
094700
17.44
EBH
320
4.71
214
2145559893
901004
095000
3.77
DJC
331
1.02
512
5125551228
901003
091500
2.53
DLH
332
.68
614
6145555553
901004
094900
1.93
SRB
379
.52
818
8185552880
901003
091600
1.53
SRB
379
.41
212
2125556618