MLFF_NETWORK_TRAINER Function
Trains a multilayered feedforward neural network.
Usage
result = MLFF_NETWORK_TRAINER (network, categorical, continuous, output)
Input Parameters
network—A structure containing the feedforward network. See the MLFF_NETWORK Function. On return, the weights and bias values are updated.
categorical—Array of size n_patterns by n_nominal containing values for the nominal input attributes, where n_patterns is the number of network training patterns, and n_nominal is the number of nominal attributes. The ith row contains the nominal input attributes for the ith training pattern.
continuous—Array of size n_patterns by n_continuous containing values for the continuous input attributes, where n_continuous is the number of continuous attributes. The ith row contains the continuous input attributes for the ith training pattern.
output—Array of size n_patterns by n_outputs containing the output training patterns, where n_outputs is the number of output perceptrons in the network. n_outputs = network.n_outputs. For more details, see the MLFF_NETWORK Function.
Returned Value
result—Array of length 5 containing the summary statistics from the network training, organized as follows:
*result(0) = Error sum of squares at the optimum.
*result(1) = Total number of Stage I iterations.
*result(2) = Smallest error sum of squares after Stage I training.
*result(3) = Total number of Stage II iterations.
*result(4) = Smallest error sum of squares after Stage II training.
If training is unsuccessful, NULL is returned.
Input Keywords
Grad_tol—A scalar float defining the scaled gradient tolerance in the optimizer. Default: Grad_tol = ε^(1/2), where ε is the machine precision; ε^(1/3) is used in double precision.
Max_fcn—A scalar long value indicating the maximum number of function evaluations in the optimizer, per epoch. Default: Max_fcn = 400.
Max_itn—A scalar long value indicating the maximum number of iterations in the optimizer, per epoch. Default: Max_itn = 1000.
Max_step—A scalar float value indicating the maximum allowable step size in the optimizer. Default: Max_step = 1000.
No_stage_II—If present and non-zero, Stage II training is not performed. By default, in Stage I, network weights are learned using a steepest descent optimization. Stage II begins with these weights and uses a Quasi-Newton optimization to seek improved values. Default: Stage II training is performed.
Print—If present and nonzero, this option turns on printing of the intermediate results during network training. By default, intermediate results are not printed.
Rel_fcn_tol—A scalar float defining the relative function tolerance in the optimizer. Default: Rel_fcn_tol = max(10^(-10), ε^(2/3)), or max(10^(-20), ε^(2/3)) in double precision, where ε is the machine precision.
Tolerance—A scalar float value indicating the absolute accuracy tolerance for the sum of squared errors in the optimizer. Default: Tolerance = 0.1.
Stage_I—A two element integer array, [n_epochs, epoch_size], where n_epochs is the number of epochs used for Stage I training and epoch_size is the number of observations used during each epoch. If epoch training is not needed, set epoch_size = n_patterns and n_epochs = 1. By default, n_epochs = 15, epoch_size = n_patterns.
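For example, epoch training can be disabled by using a single epoch that covers all of the training patterns. A minimal sketch (net, categorical, continuous, output, and n_patterns are assumed to be defined as described above):
; Disable epoch training: a single epoch containing all
; n_patterns observations.
stats = MLFF_NETWORK_TRAINER(net, categorical, continuous, $
   output, Stage_I=[1, n_patterns])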
Init_weights_method—Specifies the weight initialization method to be used. Valid values for Init_weights_method are listed in Table 14-19: Init_weights_method Values.
 
Init_weights_method Values

Value   Enumeration                   Description
0       IMSLS_NN_NETWORK              No initialization is performed; the weights in the NN_Network structure network are used instead.
1       IMSLS_EQUAL                   Equal weights
2       IMSLS_RANDOM                  Random weights
3       IMSLS_PRINCIPAL_COMPONENTS    Principal component weights
See the MLFF_INITIALIZE_WEIGHTS Function for a detailed description of the initialization methods. Default: Init_weights_method = IMSLS_RANDOM.
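For example, principal component initialization (value 3 in the table above) could be requested as follows. A minimal sketch (net, categorical, continuous, and output are assumed to be defined as described above):
; Request principal component weight initialization.
stats = MLFF_NETWORK_TRAINER(net, categorical, continuous, $
   output, Init_weights_method=3)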
Output Keywords
Residuals—Array of size n_patterns by n_outputs containing the residuals for each observation in the training data, where n_outputs is the number of output perceptrons in the network:
n_outputs = network.n_outputs
Gradients—Array of size n_links + n_nodes − n_inputs containing the gradients for each weight found at the optimum training stage, where:
n_links = network.n_links
n_nodes = network.n_nodes
n_inputs = network.n_inputs
Weights—This keyword has been deprecated starting with version 10.0 of PV-WAVE.
Forecasts—Array of size n_patterns by n_outputs, where n_outputs is the number of output perceptrons in the network:
n_outputs = network.layers(network.n_layers-1).n_nodes
The values of the ith row are the forecasts for the outputs for the ith training pattern.
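The following is a minimal sketch (with assumed variable names) of retrieving these output keywords; note from the example output at the end of this section that the residuals follow the sign convention residual = forecast − target:
; Retrieve forecasts, residuals, and gradients from training.
stats = MLFF_NETWORK_TRAINER(net, categorical, continuous, $
   output, Forecasts=forecasts, Residuals=residuals, $
   Gradients=gradients)
; Check the residual sign convention seen in the example output:
; residual = forecast - target, so this should print ~0.
PRINT, MAX(ABS(residuals - (forecasts - output)))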
Discussion
MLFF_NETWORK_TRAINER trains a multilayered feedforward neural network, returning the forecasts for the training data, their residuals, the optimum weights, and the gradients associated with those weights. Linkages among perceptrons allow for skipped layers, including linkages between inputs and perceptrons. The linkages and activation function for each perceptron, including output perceptrons, can be individually configured. For more details, see the Link_all, Link_layer, and Link_node keywords in the MLFF_NETWORK Function.
Training Data
Neural network training patterns consist of the following three types of data:
1. categorical input attributes
2. continuous input attributes
3. continuous output classes
The first data type contains the encoding of any nominal input attributes. If binary encoding is used, this encoding consists of creating columns of zeros and ones for each class value associated with every nominal attribute. If only one nominal attribute is used for input, the number of columns equals the number of classes for that attribute. If more than one nominal attribute is used, each attribute is associated with its own group of columns, one for each of its classes.
Each column contains a one if that class value is associated with the case, and a zero otherwise. Consider an example with one nominal variable and two classes, male and female, and the five cases (male, male, female, male, female). With binary encoding, the following matrix is sent to the training engine to represent this data:

1 0
1 0
0 1
1 0
0 1

Here the first column encodes male and the second encodes female.
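A minimal sketch (not part of the original example) of producing this encoding from a vector of integer class labels:
; Binary-encode integer class labels 1..n_classes into a
; matrix with one 0/1 column per class (1=male, 2=female).
labels = [1, 1, 2, 1, 2]
n_classes = MAX(labels)
encoded = FLTARR(N_ELEMENTS(labels), n_classes)
FOR i = 0L, N_ELEMENTS(labels)-1 DO encoded(i, labels(i)-1) = 1.0
PM, encoded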
Continuous input and output data are passed to the training engine using two double-precision arrays: continuous and output. The number of rows in each of these matrices is n_patterns. The number of columns in continuous and output corresponds to the number of input and output variables, respectively.
Network Configuration
The network configuration consists of the following:
*the number of inputs and outputs
*the number of hidden layers
*a description of the number of perceptrons in each layer
*a description of the linkages among the perceptrons
This description is passed into MLFF_NETWORK_TRAINER using the structure NN_Network. See the MLFF_NETWORK Function.
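A minimal sketch (mirroring the example below) of building one such configuration, a 4-input, 1-output network with a single fully linked hidden layer of three perceptrons:
; Initialize a network with 4 inputs and 1 output, add a
; hidden layer of 3 perceptrons, and link all nodes.
net = MLFF_NETWORK_INIT(4, 1)
net = MLFF_NETWORK(net, Create_hidden_layer=3)
net = MLFF_NETWORK(net, /Link_all)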
Training Efficiency
The training efficiency determines the time it takes to train the network. This is controlled by several factors. One of the most important factors is the initial weights used by the optimization algorithm. These are taken from the initial values provided in the structure NN_Network, network.links(i).weight. Equally important are the scaling and filtering applied to the training data.
In most cases, all variables, particularly output variables, should be scaled to fall within a narrow range, such as [0, 1]. If variables are unscaled and have widely varied ranges, then numerical overflow conditions can terminate network training before an optimum solution is calculated.
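A minimal sketch (not part of the original example) of min-max scaling a variable into [0, 1]:
; Scale x into [0, 1]; assumes x is a float array that is
; not constant.
scaled = (x - MIN(x)) / (MAX(x) - MIN(x))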
Output
Output from MLFF_NETWORK_TRAINER consists of scaled values for the network outputs, a corresponding forecast array for these outputs, a weights array for the trained network, and the training statistics. The NN_Network structure is updated with the weights and bias values and can be used as input to the MLFF_NETWORK_FORECAST Function. For more details about the weights and bias values, see Table 14-17: Structure Members and Their Descriptions.
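A minimal sketch of forecasting a single pattern with the trained network; the exact argument list of MLFF_NETWORK_FORECAST is assumed here, so consult that function's reference page:
; Forecast the first training pattern (argument list assumed).
forecast = MLFF_NETWORK_FORECAST(ff_net, categorical(0,*), $
   continuous(0,*))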
Example
This example trains a two-layer network using 100 training patterns from one nominal and one continuous input attribute. The nominal attribute has three classifications which are encoded using binary encoding. This results in three binary network input columns. The continuous input attribute is scaled to fall in the interval [0,1].
The network training targets were generated using the relationship:
Y = 10*X1 + 20*X2 + 30*X3 + 20*X4
where X1, X2, X3 are the three binary columns, corresponding to the categories 1-3 of the nominal attribute, and X4 is the scaled continuous attribute.
The structure of the network consists of four input nodes and two layers, with three perceptrons in the hidden layer and one in the output layer. Figure 14-9: A 2-layer, Feedforward Network with 4 Inputs and 1 Output illustrates this structure:
 
Figure 14-9: A 2-layer, Feedforward Network with 4 Inputs and 1 Output
There are a total of 15 weights and 4 bias weights in this network. In the output below, 19 weight values are printed. Weights 0–14 correspond to the links between the network nodes. Weights 15–18 correspond to the bias values associated with the four non-input nodes, X4, X5, X6, and X7. The activation functions are all linear.
Since the target output is a linear function of the input attributes, linear activation functions guarantee that the network forecasts will exactly match their targets. Of course, the same result could have been obtained using multiple regression. Printing is turned on (by setting the Print keyword) to show progress during the training session.
n_obs  = 100
n_cat  = 3
n_cont = 1
 
categorical = [ $
   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, $
   0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, $
   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, $
   0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, $
   0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, $
   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, $
   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, $
   0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, $
   1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, $
   0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
 
; To see the categorical 3-element vars by row, do:
; PM, TRANSPOSE(REFORM(categorical, 3,100))
 
continuous = [ $
   4.007054658, 7.10028447,  4.740350984, 5.714553211, $
   6.205437459, 2.598930065, 8.65089967,  5.705787357, $
   2.513348184, 2.723795955, 4.1829356,   1.93280416, $
   0.332941608, 6.745567628, 5.593588463, 7.273544478, $
   3.162117939, 4.205381208, 0.16414745,  2.883418275, $
   0.629342241, 1.082223406, 8.180324708, 8.004894314, $
   7.856215418, 7.797143157, 8.350033996, 3.778254431, $
   6.964837082, 6.13938006,  0.48610387,  5.686627923, $
   8.146173848, 5.879852653, 4.587492779, 0.714028533, $
   7.56324211,  8.406012623, 4.225261454, 6.369220241, $
   4.432772218, 9.52166984,  7.935791508, 4.557155333, $
   7.976015058, 4.913538616, 1.473658514, 2.592338905, $
   1.386872932, 7.046051685, 1.432128376, 1.153580985, $
   5.6561491,   3.31163251,  4.648324851, 5.042514515, $
   0.657054195, 7.958308093, 7.557870384, 7.901990083, $
   5.2363088,   6.95582150,  8.362167045, 4.875903563, $
   1.729229471, 4.380370223, 8.527875685, 2.489198107, $
   3.711472959, 4.17692681,  5.844828801, 4.825754155, $
   5.642267843, 5.339937786, 4.440813223, 1.615143829, $
   7.542969339, 8.100542684, 0.98625265,  4.744819569, $
   8.926039258, 8.813441887, 7.749383991, 6.551841576, $
   8.637046998, 4.560281415, 1.386055087, 0.778869034, $
   3.883379045, 2.364501589, 9.648737525, 1.21754765, $
   3.908879368, 4.253313879, 9.31189696,  3.811953836, $
   5.78471629,  3.414486452, 9.345413015, 1.024053777]
 
continuous = continuous/10.0
 
output = FLTARR(n_obs, /Nozero)
FOR i=0L, n_obs-1 DO output(i) = (10 * categorical(i*3)) $
   + (20 * categorical(i*3+1)) $
   + (30 * categorical(i*3+2)) $
   + (20 * continuous(i))
 
; Reform the categorical array to be 2D (three columns 
; corresponding to the three categorical variables, 100 
; observations each).
categorical = TRANSPOSE(REFORM(categorical, 3,100))      
 
ff_net = MLFF_NETWORK_INIT(4, 1)
ff_net = MLFF_NETWORK(ff_net, Create_hidden_layer=3)
ff_net = MLFF_NETWORK(ff_net, /Link_all)
ff_net = MLFF_NETWORK(ff_net, Activation_fcn_layer_id=1, $
   Activation_fcn_values=[1,1,1])
 
RANDOMOPT, Set=12345
stats = MLFF_NETWORK_TRAINER(ff_net, $
   categorical, $
   continuous, $
   output, $
   /Print, $
   Rel_fcn_tol=1.0e-20, $
   Grad_tol=1.0e-20, $
   Max_step=5.0, $
   Max_fcn=1000, $
   Tolerance=1.0e-5, $
   Stage_I=[10,100], $
   Forecasts=forecasts, $ 
   Residuals=residuals)
 
PRINT, 'Error sum of squares at the optimum: ', stats(0)
PRINT, 'Total number of Stage I iterations:  ', stats(1)
PRINT, 'Smallest error sum of squares after Stage I training: ', $
   stats(2)
PRINT, 'Total number of Stage II iterations: ', stats(3)
PRINT, 'Smallest error sum of squares after Stage II ' + $
   'training: ', stats(4)
PRINT
 
PM, [[output(90:99)], [forecasts(90:99)], [residuals(90:99)]], $
   Title='Model Fit for Last Ten Observations:'
Output
 TRAINING PARAMETERS:
  Stage II Opt.   = 1
  n_epochs        = 10
  epoch_size      = 100
  max_itn         = 1000
  max_fcn         = 1000
  max_step        = 5.000000
  rfcn_tol        = 1e-020
  grad_tol        = 1e-020
  tolerance       = 0.000010
 
STAGE I TRAINING STARTING
Stage I: Epoch 1 - Epoch Error SS = 4349.96 (Iterations=7)
Stage I: Epoch 2 - Epoch Error SS = 3406.89 (Iterations=7)
Stage I: Epoch 3 - Epoch Error SS = 4748.62 (Iterations=7)
Stage I: Epoch 4 - Epoch Error SS = 1825.62 (Iterations=7)
Stage I: Epoch 5 - Epoch Error SS = 3353.35 (Iterations=7)
Stage I: Epoch 6 - Epoch Error SS = 3771.22 (Iterations=7)
Stage I: Epoch 7 - Epoch Error SS = 2769.11 (Iterations=7)
Stage I: Epoch 8 - Epoch Error SS = 3781.3 (Iterations=9)
Stage I: Epoch 9 - Epoch Error SS = 2404.1 (Iterations=7)
Stage I: Epoch 10 - Epoch Error SS = 4350.14 (Iterations=7)
 
 
STAGE I FINAL ERROR SS = 1825.617676
 
OPTIMUM WEIGHTS AFTER STAGE I TRAINING:
weight[0] = -2.31313 weight[1] = 0.389252   weight[2] = 1.89219
weight[3] = 1.76989 weight[4] = -0.975819   weight[5] = 0.91344
weight[6] = 2.38119 weight[7] = 1.42829    weight[8] = -2.60983
weight[9] = 1.09477 weight[10] = 3.04915   weight[11] = 2.49006
weight[12] = 7.95465 weight[13] = 10.7354 weight[14] = 10.2354 
weight[15] = -0.614357 weight[16] = 1.22405 
weight[17] = 1.72196 weight[18] = 4.6308
 
STAGE II TRAINING USING QUASI-NEWTON
 
STAGE II FINAL ERROR SS = 0.319787
 
OPTIMUM WEIGHTS AFTER STAGE II TRAINING:
weight[0] = -6.81913 weight[1] = -7.35462 weight[2] = -3.6998
weight[3] = 5.64984 weight[4] = -0.740951 weight[5] = 1.21874
weight[6] = -0.726229 weight[7] = 4.05967 weight[8] = -2.42175
weight[9] = -0.580469 weight[10] = 4.85256 weight[11] = 3.45859
weight[12] = 10.4209 weight[13] = 16.9226 weight[14] = 20.8385
weight[15] = -0.944827 weight[16] = -0.143303 
weight[17] = -1.44022 weight[18] = 4.91185
 
 
GRADIENT AT THE OPTIMUM WEIGHTS
g[0] =   0.031620        weight[0] =    -6.819134
g[1] =  -0.245708        weight[1] =    -7.354622
g[2] =  -0.551255        weight[2] =    -3.699798
g[3] =  -0.550290        weight[3] =     5.649842
g[4] =   1.109601        weight[4] =    -0.740951
g[5] =   0.120410        weight[5] =     1.218741
g[6] =  -2.830182        weight[6] =    -0.726229
g[7] =  -0.902171        weight[7] =     4.059670
g[8] =   0.197736        weight[8] =    -2.421750
g[9] =   1.453753        weight[9] =    -0.580469
g[10] = -0.178615        weight[10] =    4.852565
g[11] =  0.208161        weight[11] =    3.458595
g[12] = -0.166397        weight[12] =   10.420921
g[13] = -0.396238        weight[13] =   16.922632
g[14] = -0.924226        weight[14] =   20.838531
g[15] = -0.765342        weight[15] =   -0.944827
g[16] = -1.600171        weight[16] =   -0.143303
g[17] =  1.472873        weight[17] =   -1.440215
g[18] = -0.417480        weight[18] =    4.911847
 
Training Completed
 
Error sum of squares at the optimum:       0.319787
Total number of Stage I iterations:         7.00000
Smallest error sum of squares after Stage I training:   1825.62
Total number of Stage II iterations:        1015.00
Smallest error sum of squares after Stage II training: 0.319787
 
 
Model Fit for Last Ten Observations:
       49.2975       49.0868     -0.210632
       32.4351       32.3968    -0.0383034
       37.8178       37.7850    -0.0328064
       38.5066       38.4759    -0.0307388
       48.6238       48.5395    -0.0843124
       37.6239       37.5900    -0.0339203
       41.5694       41.5405    -0.0289001
       36.8290       36.7881    -0.0409050
       48.6908       48.5958    -0.0950241
       32.0481       32.0286    -0.0194893