Overall Efficiency

The following comparative example demonstrates the efficiency you can achieve with the Essential Math Module. Consider the Fortran program given below:

      double precision a(8100), b(8100), c(8100)
      double precision second, start, dt

      Niter = 1000
      do 500 N=500,4000,500
        do 100 i=1,N
          a(i) = 3
          b(i) = 5
100     continue
c       Second returns elapsed time
        start = second()
        do 200 iter=1, Niter
        do 200 i=1, N
          c(i) = a(i) * b(i)
200     continue
        dt = second() - start
        write(6,2000) N, dt, dt/Niter
2000    format(i6, 5x, f8.5, 12x, f8.5)
500   continue
      end

Here is the C++ program, using the Essential Math Module:

#include <rw/math/mathvec.h>
#include <iostream>

double second();   // Returns elapsed time

int main() {
  int Niter = 1000;
  for(int N=500; N<=4000; N+=500){
    RWMathVec<double> a(N, 3);       // N elements, each set to 3
    RWMathVec<double> b(N, 5);       // N elements, each set to 5
    double start = second();
    for(int iter=0; iter<Niter; iter++){
      RWMathVec<double> c = a*b;     // Element-by-element product
    }
    double dt = second() - start;
    std::cout << N << " " << dt << " " << dt/Niter << std::endl;
  }
  return 0;
}
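Both listings call a routine named second() that is declared but not defined in this section. The following is a minimal sketch of one way to supply it on the C++ side, assuming that a monotonic clock from the standard <chrono> header is acceptable for the measurement. It is an illustration only, not the routine used for any timings reported for the Essential Math Module.

```cpp
#include <chrono>

// Hypothetical stand-in for the undefined second() routine.
// Returns the number of seconds elapsed since the first call,
// measured with a monotonic clock so that differences are never negative.
double second() {
    using clock = std::chrono::steady_clock;
    static const clock::time_point origin = clock::now();  // set on first call
    return std::chrono::duration<double>(clock::now() - origin).count();
}
```

Only differences between two calls are meaningful, which matches how the benchmark uses the value (second() - start).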

Notice that the C++ program is much shorter: the initialization loop reduces to two constructor calls, and the multiply loop reduces to the single expression a*b. Also, the Fortran program uses static memory; the arrays are declared and sized at compile time. This greatly simplifies the job for the compiler, since the addresses of the arrays are known at compile time, but greatly complicates the job for the programmer, since you must know the maximum size of your arrays in advance. The Essential Math Module version uses dynamic memory, which is obtained at run time.
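The trade-off between static and dynamic memory can be sketched in plain C++. The example below is an illustration under stated assumptions: it uses std::vector in place of RWMathVec, and the names kMaxN, fixed_a, fill_fixed, and make_dynamic are hypothetical. The bound of 8100 is taken from the Fortran listing.

```cpp
#include <cstddef>
#include <vector>

// Static memory: the capacity is fixed when the program is compiled,
// as with a(8100) in the Fortran listing.
constexpr std::size_t kMaxN = 8100;
static double fixed_a[kMaxN];

// Fills the first n elements of the fixed buffer.
// Returns false when n exceeds the compile-time capacity.
bool fill_fixed(std::size_t n, double value) {
    if (n > kMaxN) return false;
    for (std::size_t i = 0; i < n; ++i) fixed_a[i] = value;
    return true;
}

// Dynamic memory: storage is requested at run time, once n is known.
std::vector<double> make_dynamic(std::size_t n, double value) {
    return std::vector<double>(n, value);
}
```

With the fixed buffer, a request for 9000 elements has to be rejected or the program recompiled with a larger bound. The dynamic version handles the same request without any change to the source.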

Despite the use of dynamic memory and built-in slices, the Essential Math Module version is faster than the Fortran version. Because C++ allows all of the code for the various binary and unary operators to be localized in one spot, it becomes easy to write optimizations in the critical spots. This is what we have done. With Fortran, you are at the mercy of the compiler.

In this example, the critical code resides inside the binary multiply operator:

template <class T>
RWMathVec<T> operator*(const RWMathVec<T>&,
                       const RWMathVec<T>&);

This operator requires that three addresses be juggled: those of the two vector operands and that of the return value. The two operands can potentially be slices (that is, their stride may not be 1), so their strides must be held as well. Note that other operators, such as:

RWMathVec<double>&
RWMathVec<double>::operator+=(const RWMathVec<double>&);

involve only two operands, and so are much easier to optimize.
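The difference between the two cases can be sketched with a pair of stand-alone loops. This is a hypothetical illustration, not the library's implementation: the function names multiply_kernel and plus_assign_kernel and their parameter lists are assumptions made for the example, and the result of the multiply is assumed to be freshly allocated with stride 1.

```cpp
#include <cstddef>

// Sketch of c = a * b, element by element. Three base addresses are live
// (a, b, c), and the two inputs each carry a stride because they may be slices.
void multiply_kernel(const double* a, std::ptrdiff_t a_stride,
                     const double* b, std::ptrdiff_t b_stride,
                     double* c, std::ptrdiff_t n) {
    if (a_stride == 1 && b_stride == 1) {
        // Fast path: both operands are contiguous.
        for (std::ptrdiff_t i = 0; i < n; ++i) c[i] = a[i] * b[i];
        return;
    }
    // General path: honor each operand's stride.
    for (std::ptrdiff_t i = 0; i < n; ++i)
        c[i] = a[i * a_stride] * b[i * b_stride];
}

// Sketch of lhs += rhs. Only two base addresses are live,
// because the result is written back into the left-hand operand.
void plus_assign_kernel(double* lhs, std::ptrdiff_t lhs_stride,
                        const double* rhs, std::ptrdiff_t rhs_stride,
                        std::ptrdiff_t n) {
    for (std::ptrdiff_t i = 0; i < n; ++i)
        lhs[i * lhs_stride] += rhs[i * rhs_stride];
}
```

Because every use of the multiply operator would pass through one such loop, a special case like the unit-stride fast path only has to be written once.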