Access Large Arrays by Memory Order
When an array’s size is larger than or near the working set size, it should always, if possible, be accessed in memory-address order.
To illustrate some side-effects of the virtual memory environment, consider the process of transposing a large array. Assume the array is a 512-by-512 byte image and there is a 100-kilobyte working set. The array requires 512 x 512 or approximately 250 kilobytes. Clearly, less than half of the image may be in memory at any one instant.
In the transpose operation, each row must be interchanged with the corresponding column. The first row, containing the first 512 bytes of the image, will be read into memory, if necessary, and written to the first column.
Because arrays are stored in row order (the first subscript varies the fastest), one column of the image spans a range of addresses almost equal to the size of the entire image. In order to write the first column, 250,000 bytes of data must be read into physical memory, updated, and written back to the disk. This process must be repeated for each column, requiring the entire array be read and written almost 512 times!
The time required to transpose the array using the above naive method will be on the order of minutes. The TRANSPOSE function transposes large arrays by dividing them into subarrays smaller than the working set size and will transpose a 512-by-512 image in less than 10 seconds.
Example
Consider the operation of the statement:
FOR X = 0, 511 DO FOR Y = 0, 511 DO ARR(X, Y) = ...
This statement will require an extremely large amount of time to execute because the entire array must be transferred between memory and the disk 512 times. The proper form of the statement is to process the points in address order:
FOR Y = 0, 511 DO FOR X = 0, 511 DO ARR(X, Y) = ...
The time savings are at least a factor of 50 for this example.