DC_READ_FREE Function
Reads freely-formatted ASCII files, which have the following limits:
maximum field width (non white space characters): 256
maximum number of columns: 2048
maximum number of variables: 2048
maximum number of bytes per row: 7422
Usage
status = DC_READ_FREE(filename, var_list)
Input Parameters
filename — A string containing the pathname and filename of the file containing the data.
Output Parameters
var_list — The list of variables into which the data is read. Include as many variables names in var_list as you want to be filled with data, up to a maximum of 2048. Note that variables of type structure are not supported. An exception to this is the !DT, or date/time, structure. It is possible to transfer date/time data using this routine.
note | The variables in the var_list do not need to be predefined unless multiple data types exist in the data file. An example of a file with multiple data types is: 08/04/1994 10:00:00 23.00 -94.00 11.00 Since the above example contains date/time and float data types, all of the variables holding this data will need to be declared before the DC_READ_FREE function is called. |
Returned Value
status — The value returned by DC_READ_FREE; expected values are:
< 0 — Indicates an error, such as an invalid filename or an I/O error.
0 — Indicates a successful read.
Keywords
Column — A flag that signifies filename is a column-organized file.
Delim — An array of single-character strings that are the field separators used in the data file. If not provided, a comma- or space- delimited file is assumed.
note | TAB is NOT a default delimeter for DC_READ_FREE, use the octal value '\011' or decimal value 9B to identify the TAB character. For example: status = DC_READ_FREE('file',v1,v2,..., Delim='\011') or status = DC_READ_FREE('file',v1,v2,..., Delim=9B) |
Dt_Template — An array of integers indicating the date/time templates that are to be used for interpreting date/time data. Positive numbers refer to date templates; negative numbers refer to time templates. For more details, see
"Example 5". To see a complete list of date/time templates, see the PV‑WAVE
Programmer’s Guide.
Filters — An array of one-character strings that PV-WAVE should check for and filter out as it reads the data. A character found on the keyboard can be typed; a special character not found on the keyboard is specified by ASCII code. The maximum number of filter characters is 10. For more details about the
Filters keyword, see
"Filtering and Substitution While Reading Data".
Get_Columns — An array of integers indicating column numbers to read in the file. If not provided or if set equal to zero (0), all columns are read. Ignored if the Row keyword is supplied.
note | Get_Columns defines an array of column numbers that correspond with variables supplied in var_list. For example, if an array, Get_Columns=[5, 1, 3], is supplied with variables a1, a2, a3, the Get_Columns array will automatically be sorted before reading the columns, resulting in a1 = column #1, a2 = column #3, a3 = column #5. |
Ignore — An array of strings; if any of these strings are encountered, PV-WAVE skips the entire line and starts reading data from the next line. The maximum number of ignore strings is 64. Any string is allowed, but the following five strings have special meanings:
$BLANK_LINES — Skip all blank lines; this prevents those lines from being interpreted as a series of zeroes.
$TEXT_IN_NUMERIC — This depends upon whether
Column or
Row is set:
Column is set: | Skip any line where text is found in a numeric field. If all lines are skipped, all numeric and date/time variables in var_list are set to 0, all string variables are set to an empty string, and the following error message is issued: % DC_READ_FREE: Text is found in a numeric field, and all lines are skipped. |
Row is set: | Skip the data when it is not a numeric value and fill in the variable with the next data. No warning message is issued. |
$BAD_DATE_TIME — Skip lines where invalid date/time data is found.
$NSKIP — Take the passed-in values in the
Ignore keyword into account when skipping records via the
Nskip keyword. This was not done prior to PV-WAVE version 2017.0, so it must be enabled with the
$NSKIP value in the
Ignore array to avoid breaking existing user code.
$NRECS — Take ignored records into account when counting records via the
Nrecs keyword. Ignored records were still counted towards the total number of records read for the purposes of the
Nrecs keyword prior to PV-WAVE version 2017.0. To avoid breaking existing code, the
$NRECS string must be used to enable the new behavior.
Note that this keyword causes the incoming buffer to be preprocessed and to have all ignored records stripped out before variable processing begins. One side effect of this is that, when used in conjunction with $TEXT_IN_NUMERIC, the entire record is disqualified if there is any text in any column, whether you specify that column for reading or not.
For an example showing the
Ignore keyword, see
"Example 2".
Miss_Str — A single string or string array that specifies strings that may be present in the data file to represent missing data. If not present, PV-WAVE does not check for missing data as it reads the file.
Miss_Vals — A scalar or a floating point array of values, each of which corresponds to a string in Miss_Str. As PV-WAVE reads the input data file, occurrences of strings that match those in Miss_Str are replaced by the corresponding element of Miss_Vals.
MissStr_Vals — A single string or an array of strings, each of which corresponds to a string in Miss_Str. As PV-WAVE reads the input data file, occurrences of strings that match those in Miss_Str are replaced by the corresponding string in MissStr_Vals.
Nrecs — Number of records to read. If not provided or if set equal to zero (0), the entire file is read. For more information about records, see
"Physical Records vs. Logical Records".
Nskip — Number of physical records in the file to skip before data is read. If not provided or if set equal to zero (0), no records are skipped.
Resize — An array of integers indicating the variables in var_list that can be resized based on the number of records detected in the input data file. Values in Resize should be in the range:
1 ≤ Resizen ≤ #_of_vars_in_var_list
For an example showing how to use the
Resize keyword, see DC_READ_FIXED,
"Example 4", or DC_READ_FREE,
"Example 4".
Row — A flag that signifies filename is a row-organized file. If neither Row nor Column is present, Row is the default.
Vals_Per_Rec — A long integer that specifies how many values comprise a single record in the input data file; use only with column-oriented files. If not provided, each line of data in the file is treated as a new record. For more details about when to use the
Vals_Per_Rec keyword, see
"Example 4".
Discussion
DC_READ_FREE is very adept at reading column-oriented data files. Also, DC_READ_FREE handles many steps that you have to do yourself when using other PV-WAVE functions and procedures. These steps include: 1) opening the file, 2) assigning it a logical unit number (LUN), and 3) closing the file when you are done reading the data.
DC_READ_FREE relieves you of the task of composing a format string that describes the organization of the data in the input file. All you do is tell DC_READ_FREE which delimiters to expect in the file; comma and space are the default delimiters expected. In other words, DC_READ_FREE easily reads data values separated by any combination of commas and spaces, or any other delimiters that you explicitly define using the Delim keyword.
If neither the Row nor Column keywords are provided, the file is assumed to be organized by rows. If both keywords are used, the Row keyword is assumed.
note | This function can be used to read data into date/time structures, but not into any other kind of structures. |
String Resources Used By This Function
Upon execution, the DC_READ_FREE function examines two strings in a string resource file. These strings, described below, allow you to control how the function handles binary files.
The string resource file is:
(UNIX) <wavedir>/xres/!Lang/kernel/dc.ads
(WIN) <wavedir>\xres\!Lang\kernel\dc.ads
Where <wavedir> is the main PV‑WAVE directory.
The strings that are examined are DC_binary_check and DC_allow_chars.
DC_binary_check — This string can be set to the values True or False. If set to True, the data file is checked for the presence of binary characters before the file is read. If binary characters are found, the file is not read. If this string is set to False, no binary character checking is performed. (Default: True)
For example, to turn off binary checking, set the string as follows in the dc.ads file:
DC_binary_check: False
DC_allow_chars — Specifies additional characters to allow in the check for binary files. If non-printable characters are found, the file is considered to be a binary file and the file is not read. By default, all printable characters in the system locale are allowed. Characters may be specified either by entering them directly or numerically by three digit decimal values by preceding them with a “\” (backslash).
For example, to allow characters 165 and 220, set the string as follows in the dc.ads file:
DC_allow_chars: \165\220
How the Data is Transferred into Variables
As many as 2048 variables can be included in the input argument var_list. You can use the continuation character ($) to continue the function call onto additional lines, if needed. Any undeclared variables in var_list are assumed to have a data type of float (single-precision floating-point).
As data is being transferred into multi-dimensional variables, those variables are treated as collections of scalar variables, meaning the first subscript of the import variable varies the fastest. For two-dimensional import variables, this implies that column index varies faster than the row index. In other words, data is transferred into a two-dimensional import variable one row at a time. For more details about reading column-oriented data into multi-dimensional variables, see
"Example 4".
If the current input line is empty or DC_READ_FREE has reached the end of the line and there are still unused variables in var_list, the next line is read. When there are no unused variables left in var_list, the remainder of the line is ignored.
When reading into numeric variables, PV-WAVE attempts to convert the input into a value of the expected type. Decimal points are optional and scientific notation is allowed when reading into FLOAT and DOUBLE variables and the string $NRECS is not included in your Ignore keyword settings. If a real value is provided for an integer variable, the value is truncated at the decimal point.
note | If the file contains string data, make sure the strings do not contain delimiter characters. Otherwise, the string will be interpreted as more than one string, and the data in the file will not match the variable list. |
Once all variables in the variable list have been filled with data, DC_READ_FREE stops reading data, and returns a status code of zero (0). Even if an error occurs, and status is nonzero, the data that has been read successfully (prior to the error) is returned in the var_list variables.
note | If an error does occur, use the PRINT command to view the contents of the variables to see where the last successfully read value occurs. This will enable you to isolate the portion of the file in which the error occurred. |
Physical Records vs. Logical Records
In an ASCII text file, the end-of-line is signified by the presence of either a CTRL-J or a CTRL-M character, and a record extends from one end-of-line character to the next. However, there are actually two kinds of records:
physical records
logical records
For column-oriented files, the amount of data in a physical record is often sufficient to provide exactly one value for each variable in var_list, and then it is a logical record, as well. For row-oriented files, the concept of logical records is not relevant, since data is merely read as contiguous values separated by delimiters, and the end-of-line is merely interpreted as another delimiter.
note | The Nrecs keyword counts by logical records, if they have been defined. The Nskip keyword, on the other hand, counts by physical records, regardless of any logical record size that has been defined. |
Changing the Logical Record Size
You can use the
Vals_Per_Rec keyword to explicitly define a different logical record size, if you wish. However, in most cases, you do not need to provide this keyword. For an example of when to use the
Vals_Per_Rec keyword, see
"Example 4".
note | By default, DC_READ_FREE considers the physical record to be one line in the file, and the concept of a logical record is not needed. But if you are using logical records, the physical records in the file must all contain the same number of values. The Vals_Per_Rec keyword can be used only with column-oriented data files. |
Filtering and Substitution While Reading Data
If you want certain characters filtered out of the data as it is read, use the Filters keyword to specify these characters. Each character (or sequence of digits that represents the ASCII code for a character) must be enclosed with single quotes. For example, either of the following is a valid specification:
',' or '44'
Furthermore, the two specifications shown above are equivalent to one another. For an example of using the
Filters keyword, see
"Example 4".
note | Be sure not to filter characters that were used in the file as delimiters. The delimiters enable DC_READ_FREE to discern where one data value ends and another one begins. |
Characters that match one of the values in Filters are treated as if they aren’t even there; in other words, these characters are not treated as data and do not contribute to the size of the logical record, if one has been defined using the Vals_Per_Rec keyword.
| If you want to supply multi-character strings instead of individual characters, you can do this with the Ignore keyword. However, keep in mind that a character that matches Filters is simply discarded, and filtering resumes from that point, while a string that matches Ignore causes that entire line to be skipped. |
So if you are reading a data file that contains a value such as #$*10.00**, but you don’t want the entire line to be skipped, filter the characters individually with Filters = ['#', '$', '*'] instead of collectively with Ignore = ['#$*', '**'].
Data Substitution and Missing Data
PV-WAVE expects to substitute an element from Miss_Vals or MissStr_Vals whenever it encounters one of the following situations:
a string from
Miss_Str in the data or
two or more non-whitespace delimiters in a row with no data in between them.
To have PV-WAVE insert data when two or more non-whitespace delimiters in a row with no data in between them is encountered, include an empty string in the Miss_Str keyword array. If you do not include an empty missing string, a 0 (zero) will be inserted for the missing data. This feature works for strings as well as all other data types; for strings, substituted values are read from the MissStr_Vals keyword array.
note | The Miss_Vals or MissStr_Vals keyword is required if Miss_Str is used. If two delimiters in a row are detected when the Miss_Str array contains an empty string, the corresponding value from the Miss_Vals or MissStr_Vals array is substituted. Missing data will not be detected if whitespace delimiters, tabs or spaces, are used in the data file. If the number of elements in Miss_Str does not match the number of elements in Miss_Vals or MissStr_Vals, a nonzero status is returned and no data is read. The maximum number of values permitted in Miss_Str, Miss_Vals, and MissStr_Vals is 10. |
If the end of the file is reached before all variables are filled with data, the remainder of each variable is set to Miss_Vals(0) or MissStr_Vals(0) if one of these keywords was specified, or 0 (zero) if neither was specified. In this case, status is returned with a value less than zero to signify an unexpected end-of-file condition.
Delimiters in the Input File
Values in the file can be separated by commas, spaces, and any other delimiter characters specified with the Delim keyword. If you use any other delimiter, the delimiter character is treated as data and type conversion is attempted. If type conversion is not possible, status is set to less than zero to signify an error condition.
note | Use a different delimiter to separate data values in the file than you use to separate the different fields of dates and times, such as months, days, hours, and minutes. Otherwise, your date/time data may not be interpreted correctly. The only delimiters that can be used inside date/time data are: slash ( / ), colon (:), hyphen (–), and comma (,). |
Reading Row-Oriented Files
If you include the Row keyword, each variable in var_list is completely filled before any data is transferred to the next variable.
When reading row-oriented data, only the dimensionality of the last variable in var_list can be unknown; a variable of length n is created, where n is the number of values remaining in the file. All other variables in var_list must be pre-dimensioned.
If you include the Resize keyword with the call to the function DC_READ_FREE, the last variable can be redimensioned to match the actual number of values that were transferred to the variable during the read operation.
If you are interested in an illustration showing what row-oriented data can look like inside a file, see the PV‑WAVE Programmer’s Guide.
Reading Column-Oriented Files
If you include the Column keyword, DC_READ_FREE views the data files as a series of columns, with a one-to-one correspondence between columns in the file and variables in the variable list. In other words, one value from the first record of the file is transferred into each variable in var_list, then another value from the next record of the file is transferred into each variable in var_list, and so forth, until all the data in the file has been read, or until the variables are completely filled with data.
If a variable in var_list is undefined, a floating-point variable of length n is created, where n is the number of records read from the file. To get a similar effect in an existing variable, include the Resize keyword with the function call.
All variables specified with the Resize keyword are redimensioned to the same length — the length of the longest column of data in the file. The variables that correspond to the shortest columns in the file will have one or more values added to the end; either Miss_Vals(0) if it was specified, or 0 (zero) if Miss_Vals was not specified.
If you are interested in an illustration demonstrating what column-oriented data can look like inside a file, see the PV‑WAVE Programmer’s Guide.
For more information about how column-oriented data in a file is read into multi-dimensional variables, see
"Multi-dimensional Variables".
Example 1
The data file shown below is a freely-formatted ASCII file named monotonic.dat:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
The function call:
status = DC_READ_FREE(!Data_dir + 'monotonic.dat', var1, $
/Column, Get_Columns=[3])
results in var1=[3.0, 8.0, 13.0, 18.0]. Because var1 was not predefined, DC_READ_FREE creates it as a resizable one-dimensional floating-point array.
On the other hand, the commands:
var1 = INTARR(2)
var2 = INTARR(2)
status = DC_READ_FREE('monotonic.dat', var1, $
var2, /Column, Get_Columns=[2, 4], Nskip=2)
result in var1=[12, 17] and var2=[14, 19].
Example 2
The data file shown below is a freely-formatted ASCII file named measure.dat:
0 5 10 15 20 25 30 35 40 45 50 56 61 66 71 76 81 86 91
96 101 107 112 117 122 127 132 137 142 147 152 158 163 168 173 178 183 188
193 198 203 209 214 219 224 229 234 239 244 249 255 255 255 255 255 255 255
255 255 255 255 255
The commands:
var1 = INTARR(5)
var2 = INTARR(5)
status = DC_READ_FREE(!Data_dir + 'measure.dat', $
var1, var2, Ignore=['$BLANK_LINES'])
result in var1 = [0, 5, 10, 15, 20] and var2 = [25, 30, 35, 40, 45]. Note that the file was interpreted as row-oriented data, since neither the Row or Column keyword was specified. All blank lines are ignored.
note | If the Resize = [2] keyword had been provided, var2 would have been resizable and would have ended up having many more elements. Specifically, var2 would have ended up with 57 elements instead of just 5. |
Example 3
The data file shown below is a freely-formatted ASCII file named intake.dat:
151-182-BADY-214-515
316-197-BADX-199-206
The commands:
valve = INTARR(30)
status = DC_READ_FREE(!Data_dir + 'intake.dat', $
valve, Miss_Str=['BADX','BADY'], $
Miss_Vals=[9999, -9999], Resize=[1], Delim=['-'])
result in valve=[151, 182, –9999, 214, 515, 316, 197, 9999, 199, 206]. The hyphens in the data are filtered out. Because valve is resizable, it ends up with 10 elements instead of 30. The two values from Miss_Vals are substituted for the two strings in the file, 'BADX' and 'BADY'.
Example 4
The data file shown below is a freely-formatted ASCII file named level.dat. This data file uses the semi-colon (;) and the slash (/) as delimiters, and the comma (,) to separate the thousands digit from the hundreds digit. This file has three logical records on every line; at the end of each logical record is a slash:
5,992;17,121/8,348;17,562/5,672;19,451/
5,459;18,659/7,088;17,052/8,541;13,437/
6,362;15,894/8,992;17,509/7,785;14,796/
The commands:
gap = INTARR(20)
bar = INTARR(20)
status = DC_READ_FREE(!Data_dir + 'level.dat', gap, bar, $
/Column, Delim=[';', '/'], Filter=[','], $
Resize=[1, 2], Vals_Per_Rec=2)
result in:
gap = [5992, 8348, 5672, 5459, 7088, 8541,
6362, 8992, 7785] and bar = [17121, 17562,
19451, 18659, 17052, 13437, 15894, 17509,
14796].
The commas have been filtered out of the data because of the value of the string that was provided with the Filter keyword.
Suppose you wanted gap and bar to be dimensioned as 3-by-3 arrays instead of 1-by-9 vectors. The best way to do this is by reading the data with the commands shown above, and then using the REFORM command to redimension the variables:
gaparr = REFORM(gap, 3, 3)
bararr = REFORM(bar, 3, 3)
By approaching the data transfer in this way, DC_READ_FREE does not expect to transfer two columns of data into the same multi-dimensional variable.
For example, the following commands demonstrate the problem:
gap = INTARR(3, 3)
bar = INTARR(3, 3)
status = DC_READ_FREE('level.dat', gap, bar, $
/Column, Delim=[';', '/'], Filter=[','], $
Resize=[1, 2], Vals_Per_Rec=2)
results in:
and:
The data is transferred into gap using the rule, “The first subscript varies fastest.” With Vals_Per_Rec set to “2”, no value is available for the third column — hence, every element in the third column is set equal to “0” (zero). Furthermore, notice that gap gets all the data (it is resizable) and bar gets none of the data.
Example 5
Assume that you have a file, events.dat, that contains some data values and also some chronological information about when those data values were recorded:
01/01/92 5:45:12 10 01-01-92 3276
02/01/92 10:10:10 15.89 06-15-91 99
05/15/91 2:02:02 14.2 12-25-92 876
The date/time templates that will be used to transfer this data have the following definitions:
1 — MM*DD*YY (* = any delimiter)
–1 — HH*MM*SS (* = any delimiter)
To read the date and time from the first two columns into one date/time variable and read the third column of floating point data into another variable, use the following commands:
; The system structure definition of date/time is !DT. Date/time
; variables must be defined as !DT structure arrays before being
; used if the date/time data is to be read as such.
date1 = REPLICATE({!DT},3)
; Note, the variable date1 is listed twice; this way, both the date
; data and the time data can be stored in the same variable, date1.
status = DC_READ_FREE(!Data_dir + 'events.dat', date1, $
date1, float1, /Column, Dt_Template=[1,-1], Delim=[' '])
; To see the values of the two variables, you can use the PRINT command.
; Print one row at a time.
FOR I = 0,2 DO PRINT, date1(I), float1(I)
To see the values of the two variables, you can use the PRINT command:
; Print one row at a time.
FOR I = 0,2 DO BEGIN
PRINT, date1(I), float1(I)
ENDFOR
Executing these statements results in the following output:
{ 1992 01 01 05 45 12.00 87402.240 0}
10.0000 { 1992 02 01 10 10 10.00 87433.424 0}
15.8900 { 1992 05 15 02 02 02.00 87537.035 0} 14.2000
Because date1 is a structure, curly braces, “{” and “}”, are placed around the output. When displaying the value of date1 and float1, PV-WAVE uses default formats for formatting the values, and attempts to place as many items as possible onto each line.
note | Another alternative to view the contents of date1 and float1 is to use the DT_PRINT command instead of PRINT. |
For more information about the internal organization of the !DT system structure, see the PV‑WAVE Programmer’s Guide.
To read the first, second, fourth, and fifth columns, define an integer array and another date/time variable, and change the call to DC_READ_FREE as shown below:
calib = INTARR(3)
date2 = REPLICATE({!DT},3)
status = DC_READ_FREE('events.dat', date1, $
date1, date2, calib, /Column, Delim=[' '], $
Get_Columns= [1, 2, 4, 5], Dt_Template = $
[1, -1], Ignore=['$BAD_DATE_TIME'])
Notice how the date/time templates are reused. For each new record, Template 1 is used first to read the date data into date1. Next, Template –1 is used to read the time data into date1. Finally, since there is another date/time variable to be read (date2) and there are no more templates left, the template list is reset and Template 1 is used again. The template list is reset for each record.
Normally an error would be reported if the input text to be read as date/time is invalid and cannot be converted. But because the Ignore=["$BAD_DATE_TIME"] keyword was provided, any record containing this type of error is ignored and no error is reported.
Example 6
The following data file is a freely-formatted ASCII file named num.dat:
0,1,,3,4
5,6,7,8,9
To substitute the number 99 for a missing value, use the following:
; Initialize the value to be returned to an integer
returnval = 0
status = DC_READ_FREE(!Data_dir + 'num.dat', returnval, /Column, $
Resize=[1], Get_Col=[3], Miss_Str=[''], Miss_Val=[99])
results in returnval = 99.
Example 7
The data file shown below is a freely-formatted ASCII file named char.dat:
a,b,c,d
e,BAD,g,h
To substitute the string GOOD for a missing value, use the following:
; Initialize value to be returned to string type
returnval=' '
status = DC_READ_FREE(!Data_dir + 'char.dat', returnval, /Col, $
Resize=[1], Get_Col=[2], Miss_Str=['BAD'], MissStr_Val=['GOOD'])
results in returnval = GOOD.
Example 8
The data file shown below is a freely-formatted ASCII file named chemicals.dat:
Elemental Carbon
Sulfate
Benzo[e]pyrene
Indeno[1,2,3-cd]pyrene
n-Heptadecanoic acid
This file has a single column, where the chemical names contain commas and spaces, both of which are typically used as delimiters. To read this file, the Delim keyword needs to specify a delimiter that does not actually exist. For this example we know that there are no Tab characters so Tab is specified as the delimiter (ASCII byte value 9). However, another single character could be specified, such as ‘$’ or ‘Q’.
chemicals = STRARR(1)
status = DC_READ_FREE(!Data_dir + 'chemicals.txt', chemicals, $
/Column, Resize=[1], Delim=[9B] )
PM, chemicals
; PV-WAVE prints:
; Elemental Carbon
; Sulfate
; Benzo[e]pyrene
; Indeno[1,2,3-cd]pyrene
; n-Heptadecanoic acid
See Also
See the PV‑WAVE Programmer’s Guide for more information about free format I/O in PV-WAVE.
note | For an example showing how to use DC_READ_FREE to import data from a Microsoft Excel spreadsheet, see the PV‑WAVE Programmer’s Guide. |
Version 2017.0
Copyright © 2017, Rogue Wave Software, Inc. All Rights Reserved.