Slices
An Essential Math Module vector view describes a slice of data. This slice is defined by a pointer to the beginning of the data, a length, and the increment between successive data items. Indexing into a vector thus requires a multiply as well as an add in order to account for the stride between successive data items. This contrasts with C, where only an add is necessary.
Accessing data through slices can be just as efficient as with a C-style vector. To scan through a regular vector of contiguous elements, a pointer must be incremented by the size of an element; for example, 8 in the case of a double on a typical machine. To scan through a slice requires incrementing by the size of the element times the stride length. The only difference is that the size of the element is known at compile time (always 8), while the stride length is known at run time (8 times stride length). Hence, the former can be held as an immediate operand in a move statement, and the latter must be held as an indirect operand, requiring one more memory access. If the machine has enough registers, the memory access is only required at the start of the scan. From then on, it can be held in a register, and there is no time penalty beyond the initial memory access. The net effect is that the time it takes to multiply two vector slices together, for example, is essentially the same as to multiply two vectors of contiguous elements. Furthermore, much of the Essential Math Module is based on the BLAS package, which was written in terms of slices. (See Basic Linear Algebra Package.)
With access times essentially the same, the challenge is to make the actual construction of a slice as cheap as possible. The Essential Math Module class library does this by making every vector a slice. This makes the construction of a helper class to keep track of the slice unnecessary. Hence, there is very little performance penalty for working with slices instead of vectors of contiguous elements.
Indexing, however, is a different story. This requires a multiply by the stride length with every access, not just the first one. Tests have shown about a 20% slowdown to index random elements in a slice versus a vector of contiguous elements.