
Data File Organization
An ASCII file can be organized either by rows or by columns; because ASCII files are human-readable, their contents are easy to interpret. The organization of a binary file, however, may be much less clear. To fully understand it, you need to know something about the application that created the file and the operating system under which that application was running.
Column-Oriented ASCII Data Files
A column-oriented data file contains multiple data values arranged in columns; because it is ASCII, the data is human-readable. Each row ends with an end-of-line control character, such as Ctrl-J (line feed) or Ctrl-M (carriage return).
In a column-oriented file, the values in each column are related in some way; ultimately, you will probably want to read each column into a separate variable for further analysis. Figure 9-1: Typical File Organization, Column Oriented shows the typical organization of a column-oriented ASCII data import file. In this example, the five columns are associated, from left to right, with variables named Month, Hour, Fahrenheit, CO, and SO2.
 
note
Not all files that contain columns of values contain column-oriented data. For example, if you are reading every value in the file into the same variable, the file is probably a row-oriented file, despite its apparent columnar organization. The organization of row-oriented files is discussed further in "Row-Oriented ASCII Data Files".
 
Figure 9-1: Typical File Organization, Column Oriented
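To make the column-to-variable mapping concrete, here is a minimal Python sketch (not PV-WAVE code) that reads each column of a small, invented data sample into its own variable. The helper name read_columns and the sample values are hypothetical; the variable names come from the figure description above.

```python
# Hypothetical sample rows: Month, Hour, Fahrenheit, CO, SO2 (values invented).
SAMPLE = "1 0 32.5 0.8 0.012\n1 1 31.9 0.7 0.011\r\n1 2 31.0 0.9 0.013\n"

NAMES = ["Month", "Hour", "Fahrenheit", "CO", "SO2"]
TYPES = [int, int, float, float, float]


def read_columns(text, names, types):
    """Collect column i of every row into the variable names[i]."""
    columns = {name: [] for name in names}
    # splitlines() breaks on Ctrl-J (LF), Ctrl-M (CR), or CR-LF.
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(fields)}: {line!r}")
        for name, convert, field in zip(names, types, fields):
            columns[name].append(convert(field))
    return columns


data = read_columns(SAMPLE, NAMES, TYPES)
print(data["Fahrenheit"])  # [32.5, 31.9, 31.0]
```

Each row must supply exactly one value per variable; the sketch rejects rows that do not, rather than guessing how to fill the gap.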
Row-Oriented ASCII Data Files
A row-oriented data file contains multiple data values arranged in a continuous stream; because it is ASCII, the data is human-readable. When PV-WAVE reads this kind of file, the size of each variable in the variable list determines how many values are transferred into it. The data type of each variable also affects how the data is interpreted: if a value is not of the expected type, PV-WAVE converts it as it reads the data. Figure 9-2: Typical File Organization, Row Oriented shows a typical organization for a row-oriented ASCII data import file, with spaces used as delimiters between adjacent values. In this example, the first group of data is associated with a variable named Source, the second with Date, the third with Bin, the fourth with Chute, the fifth with Mill, and the sixth with Phase_Shift.
 
Figure 9-2: Typical File Organization, Row Oriented
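The following Python sketch (not PV-WAVE code) illustrates how variable size and type drive a row-oriented read. The variable names come from the figure description, but their sizes, types, and the data are invented for illustration. Conversion here uses Python's int() and float(); PV-WAVE's own conversion rules may differ.

```python
# Hypothetical variable list: (name, number of values, type).
VARIABLES = [
    ("Source", 2, int),
    ("Date", 1, float),
    ("Bin", 3, float),
    ("Chute", 1, int),
    ("Mill", 2, int),
    ("Phase_Shift", 2, float),
]

# Invented data; line breaks fall in the middle of variables on purpose.
STREAM = "12 7 3.5 4.0\n2.25 9 1 2 3\n8 0.5\n"


def read_stream(text, variables):
    """Fill each variable in order, consuming as many values as its size requires."""
    tokens = iter(text.split())  # spaces and end-of-line characters both delimit values
    result = {}
    for name, count, convert in variables:
        values = []
        for _ in range(count):
            token = next(tokens, None)
            if token is None:
                raise ValueError(f"ran out of data while reading {name}")
            values.append(convert(token))  # e.g. "9" read into a float variable becomes 9.0
        result[name] = values[0] if count == 1 else values
    return result


values = read_stream(STREAM, VARIABLES)
print(values["Bin"])  # [4.0, 2.25, 9.0]
```

Note that Bin draws its values from two different lines: in a row-oriented read, only the variable list decides where one variable ends and the next begins.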
How Long is a Record?
Understanding records can be important, especially for certain types of I/O. The following sections discuss records in the context of both formatted and unformatted data.
Record Length in ASCII (Formatted) Files
In an ASCII text file, the end-of-line is signified by the presence of either a Ctrl-J or a Ctrl-M character, and a record extends from one end-of-line character to the next. However, there are actually two kinds of records:
* physical records
* logical records
A physical record is one line of the file, delimited by end-of-line characters. A logical record is the amount of data needed to supply one value for each variable in the variable list. For column-oriented files, a physical record often holds exactly one value for each variable, in which case it is also a logical record. For row-oriented files, logical records are not relevant: the data is read as contiguous values separated by delimiters, and each end-of-line is treated as just another delimiter.
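The distinction between physical and logical records can be sketched in a few lines of Python (not PV-WAVE code). The helper names physical_records and is_logical_record are hypothetical, and the data is invented.

```python
def physical_records(text):
    """Split text at end-of-line characters (Ctrl-J, Ctrl-M, or CR-LF), dropping blank lines."""
    return [line for line in text.splitlines() if line.strip()]


def is_logical_record(record, n_variables):
    """True when the physical record supplies exactly one value per variable."""
    return len(record.split()) == n_variables


# Column-oriented: every line holds one value for each of three variables.
COLUMN_DATA = "1 0 32.5\r\n1 1 31.9\r\n"
records = physical_records(COLUMN_DATA)
print(len(records), all(is_logical_record(r, 3) for r in records))  # 2 True

# Row-oriented: the same values broken across lines differently read identically,
# because end-of-line is just another delimiter.
ROW_A = "5 6 7 8 9 10\n"
ROW_B = "5 6\n7 8 9\n10\n"
print(ROW_A.split() == ROW_B.split())  # True
```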
Changing the Logical Record Size
If you are using one of the DC_READ routines for simplified I/O to read column-oriented data, you can use a keyword to define a different logical record size explicitly. The “DC” routines are introduced in "Functions for Simplified Data Connectivity".
 
note
By default, PV-WAVE treats each line of the file as one physical record, and no separate logical record is needed, so in most cases you do not need to define one. If you do use logical records, however, all the physical records in the file must be the same length.
For more details about the keywords that control logical record size, refer to the descriptions for the DC_READ_FIXED and DC_READ_FREE routines; these descriptions are found in the PV‑WAVE Reference.
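Why equal-length physical records matter can be illustrated with a hypothetical Python sketch (not PV-WAVE code, and not how the DC_READ keywords are defined). Here a logical record size is assumed to be measured in characters, excluding end-of-line characters, and physical records are regrouped into logical records of that size. The function name logical_records is invented for illustration.

```python
def logical_records(text, record_size):
    """Regroup equal-length physical records into logical records of record_size characters.

    Hypothetical: record_size counts characters, excluding end-of-line characters.
    """
    physical = [line for line in text.splitlines() if line]
    lengths = {len(line) for line in physical}
    if len(lengths) > 1:
        raise ValueError(f"physical records differ in length: {sorted(lengths)}")
    stream = "".join(physical)  # end-of-line characters are not part of the data
    if len(stream) % record_size:
        raise ValueError("data does not divide evenly into logical records")
    return [stream[i:i + record_size] for i in range(0, len(stream), record_size)]


# Four 6-character physical records regrouped into two 12-character logical records.
TEXT = "  1  2\n  3  4\n  5  6\n  7  8\n"
for record in logical_records(TEXT, 12):
    print(record.split())
# ['1', '2', '3', '4']
# ['5', '6', '7', '8']
```

In this sketch, a single short line would shift every later logical-record boundary, which is why it refuses files whose physical records differ in length.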
Record Length in Binary (Unformatted) Files
Binary data is a continuous stream of ones and zeros. To fully understand how a binary file is organized, you need to know something about the application that created it and the operating system under which that application was running. You then choose variables for the variable list that match that organization: the type and size of those variables establish the framework by which the ones and zeros in the file are interpreted.
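The idea that variable type and size form the interpretive framework can be shown with Python's standard struct module (this is Python, not PV-WAVE). The record layout below, one 32-bit integer followed by three 64-bit floats, is a hypothetical example of what some application might write.

```python
import struct

# Hypothetical record layout written by some application: one little-endian
# 32-bit integer followed by three 64-bit floats ("<" means no padding).
LAYOUT = struct.Struct("<i3d")
raw = LAYOUT.pack(7, 1.5, -2.0, 0.25)

count, *samples = LAYOUT.unpack(raw)
print(LAYOUT.size, count, samples)  # 28 7 [1.5, -2.0, 0.25]

# The same four bytes mean different things under a different framework.
four = struct.pack("<f", 1.5)
print(struct.unpack("<f", four)[0])  # 1.5 (32-bit float, correct byte order)
print(struct.unpack("<i", four)[0])  # 1069547520 (same bits read as a 32-bit int)
print(struct.unpack(">f", four)[0] == 1.5)  # False (wrong byte order)
```

Nothing in the bytes themselves says which reading is correct; only knowledge of the writing application and platform (including its byte order) does.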
For more information about how the operating system affects the transfer of binary data, refer to "Reading UNIX FORTRAN-Generated Binary Data".
Number of Records in a File
On UNIX, a file is simply a sequence of bytes; it is not divided into records unless the application or person that created it chose to organize it into records.
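Because a UNIX file carries no record structure of its own, counting records requires knowing the writer's convention. As one example, many UNIX FORTRAN compilers frame each unformatted sequential record with a 4-byte length marker before and after the data; marker size and byte order vary by compiler and platform, so the little-endian 4-byte markers assumed below are an assumption. The helper names in this Python sketch are hypothetical.

```python
import struct


def write_fortran_records(records, byteorder="<"):
    """Frame each payload with leading and trailing 4-byte length markers (assumed layout)."""
    out = bytearray()
    for payload in records:
        marker = struct.pack(byteorder + "i", len(payload))
        out += marker + payload + marker
    return bytes(out)


def count_fortran_records(data, byteorder="<"):
    """Walk the byte stream marker by marker; raise ValueError if it is not framed as assumed."""
    count = 0
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated record marker")
        (length,) = struct.unpack_from(byteorder + "i", data, pos)
        end = pos + 4 + length
        if length < 0 or end + 4 > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        (trailer,) = struct.unpack_from(byteorder + "i", data, end)
        if trailer != length:
            raise ValueError(f"leading and trailing markers differ at offset {pos}")
        count += 1
        pos = end + 4
    return count


framed = write_fortran_records([b"abc", struct.pack("<3d", 1.0, 2.0, 3.0), b""])
print(count_fortran_records(framed))  # 3
```

A plain byte stream written without such markers has no records to count; this sketch simply rejects it as badly framed.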

Version 2017.0
Copyright © 2017, Rogue Wave Software, Inc. All Rights Reserved.