MLFF_INITIALIZE_WEIGHTS Function
Initializes the weights of a multilayered feedforward neural network prior to network training, using one of four user-selected methods.
Usage
result = MLFF_INITIALIZE_WEIGHTS (network, nominal, continuous)
Input Parameters
network—A structure of type NN_Network containing the parameters that define the feedforward network’s architecture, including network weights and bias values. For more details, see the MLFF_NETWORK Function. When network training is successful, the weights and bias values in network are replaced with the values calculated for the optimum trained network.
nominal—Array of size n_patterns by n_nominal containing the nominal input variables, where n_patterns is the number of training patterns and n_nominal is the number of unencoded nominal attributes.
continuous—Array of size n_patterns by n_continuous containing the continuous and scaled ordinal input variables, where n_continuous is the number of continuous attributes, including ordinal attributes encoded using cumulative percentage.
Returned Value
An array of length:
network.n_links + (network.n_nodes – network.n_inputs) 
containing the initialized weights. See the Discussion section for details on weight ordering.
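For example, the expected length of the returned array can be computed directly from the network structure. A minimal sketch, assuming the 9-input network built in Example 1 below:

   ; Expected weights array length for Example 1's network.
   network = MLFF_NETWORK_INIT(9L, 1L)
   network = MLFF_NETWORK(network, Create_hidden=3)
   network = MLFF_NETWORK(network, Create_hidden=2)
   network = MLFF_NETWORK(network, /Link_all)
   n_weights = network.n_links + (network.n_nodes - network.n_inputs)
   PRINT, n_weights   ; 41 for this architecture (35 links + 6 biases)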
Input Keywords
Init_weights_method—Specifies the algorithm to use for initializing weights. Valid values for Init_weights_method are listed in Table 14-18: Init_weights_method Values.
 
Table 14-18: Init_weights_method Values

Value   Enumeration                   Description
1       IMSLS_EQUAL                   Equal weights
2       IMSLS_RANDOM                  Random weights
3       IMSLS_PRINCIPAL_COMPONENTS    Principal component weights
4       IMSLS_DISCRIMINANT            Discriminant analysis weights
The discriminant weights method can only be used to initialize weights for classification networks without binary-encoded nominal attributes. See the Discussion section for details. Default: Init_weights_method = IMSLS_RANDOM.
Print—If present and nonzero, this option turns on printing of the initial weights. By default, initial weights are not printed.
Classification—An array of length n_patterns containing the encoded training target classifications, which must be integers from 0 to n_classes – 1. Here n_classes = network.n_outputs, except when n_outputs = 1, in which case n_classes = 2. Classification(i) is the target classification for the ith training pattern described by nominal(i) and continuous(i). This keyword is used by the discriminant analysis weight initialization and is ignored by all other methods.
Discussion
MLFF_INITIALIZE_WEIGHTS calculates initial values for the weights of a feedforward neural network using one of the following algorithms:
*IMSLS_EQUAL—Equal weights
*IMSLS_RANDOM—Random weights
*IMSLS_PRINCIPAL_COMPONENTS—Principal component weights
*IMSLS_DISCRIMINANT—Discriminant analysis weights
The Init_weights_method keyword selects the algorithm for weight initialization. By default, the random weights algorithm is used.
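For instance, a minimal sketch of selecting the principal components method, assuming network, nominal, and continuous have already been defined (the IMSLS_* constants are made available by @CMAST_COMMON, as in Example 2 below):

   @CMAST_COMMON   ; defines the IMSLS_* method constants
   weights = MLFF_INITIALIZE_WEIGHTS(network, nominal, continuous, $
                Init_weights_method=IMSLS_PRINCIPAL_COMPONENTS)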
The 3-layer feed forward network with 3 input attributes and 6 perceptrons in Figure 14-8: 3-layer, Feed Forward Network with 3 Input Attributes and 6 Perceptrons is used to describe the initialization algorithms. In this example, one of the input attributes is continuous (X3) and the others are nominal (X1 and X2).
 
Figure 14-8: 3-layer, Feed Forward Network with 3 Input Attributes and 6 Perceptrons
This network has a total of 23 weights. The first nine weights, labeled W1, W2, ..., W9, are the weights assigned to the links connecting the network inputs to the perceptrons in the first hidden layer. Note that W1, W2, W4, W5, W7, and W8 are assigned to the two nominal attributes and W3, W6 and W9 are assigned to the continuous attribute. All neural network functions in the C Numerical Library use this weight ordering. Weights for all nominal attributes are placed before the weights for any continuous attributes.
 
 
Perceptron   Potential
H1,1         g1 = W18 + W1X1 + W2X2 + W3X3
H1,2         g2 = W19 + W4X1 + W5X2 + W6X3
H1,3         g3 = W20 + W7X1 + W8X2 + W9X3
H2,1         g4 = W21 + W10g1 + W11g2 + W12g3
H2,2         g5 = W22 + W13g1 + W14g2 + W15g3
Z1           g6 = W23 + W16g4 + W17g5
The next six weights, W10 through W15, are the weights for the links between the first and second hidden layers, and W16 and W17 are the weights for the links connecting the second hidden layer to the output layer. The last six elements in the weights array are the perceptron bias weights. These weights, W18, W19, ..., W23, are the bias weights for perceptrons H1,1, H1,2, H1,3, H2,1, H2,2, and Z1, respectively.
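Given this ordering, the returned array can be split into its link-weight and bias-weight portions. A minimal sketch, assuming weights was returned by MLFF_INITIALIZE_WEIGHTS for this network:

   ; Link weights come first, followed by one bias weight per
   ; non-input node (see Returned Value above).
   n_links = network.n_links                          ; 17 for this network
   n_bias  = network.n_nodes - network.n_inputs       ; 6 for this network
   link_weights = weights(0:n_links-1)                ; W1, ..., W17
   bias_weights = weights(n_links:n_links+n_bias-1)   ; W18, ..., W23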
The perceptron potential calculations for this network are described in the table above. Following the notation presented in the introduction to this chapter, g1, g2, ..., g5 are the potentials for perceptrons H1,1, H1,2, H1,3, H2,1, and H2,2, respectively, and g6 is the potential for the output perceptron Z1.
All initialization algorithms in MLFF_INITIALIZE_WEIGHTS set the weights for perceptrons not linked directly to the input attributes in the same manner. Bias weights for perceptrons not directly linked to input attributes are set to zero. All non-bias weights for these same perceptrons are assigned a value of 1/k, where k is the number of links into that perceptron (network.nodes(i).n_inlinks).
For example, in this network, the last three bias weights W21, W22 and W23 are initialized to zero since perceptrons H2,1, H2,2 and Z1 are not directly connected to the input attributes. The other weights to perceptrons H2,1 and H2,2 are assigned a value of one half since these perceptrons each have only two input links. The weights to the output perceptron, Z1, are also one half since Z1 has two input links.
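A minimal sketch of this rule for a single perceptron i (index assumed) that is not connected to the input attributes:

   k      = network.nodes(i).n_inlinks   ; number of links into perceptron i
   w_in   = REPLICATE(1.0/k, k)          ; every incoming link weight is 1/k
   w_bias = 0.0                          ; its bias weight is zero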
The weights for the links between the input attributes and the first-layer perceptrons are initialized differently by the four algorithms. All algorithms, however, scale these weights so that the average potential for the first layer perceptrons is zero. This reduces the possibility of saturation or numerical overflow during the initial stages of optimization.
Equal Weights (Init_weights_method=IMSLS_EQUAL)
In this algorithm, the non-bias weights for each link between the input attributes and the perceptrons in the first layer are initialized to:

Wi = 1/(n·Si)

where Wi is the weight for all links from the ith input attribute, n is equal to the total number of input attributes, and Si is equal to the standard deviation of the potential for the ith input attribute. In the above example, the weights W1, W2, ..., W9 would each be set to:

Wi = 1/(3·Si)

since this network has three input attributes.
Next the average potential for each of the perceptrons connected to the input layer is calculated by:

ḡ = W1x̄1 + W2x̄2 + ... + Wnx̄n

where x̄i is equal to the average potential for the ith input attribute, and the bias weight for each of these perceptrons is set to –ḡ so that its average potential is zero. All other bias weights are set to zero.
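A minimal numeric sketch of the equal weights rule, assuming x is an n_patterns by n array of input potentials (a name introduced here for illustration):

   n = 3                          ; input attributes (figure example)
   FOR i=0L, n-1 DO BEGIN
      si = STDEV(x(*,i))          ; std. dev. of attribute i's potential
      wi = 1.0/(n*si)             ; equal weight for every link from attribute i
      PRINT, 'Attribute', i, ' weight:', wi
   ENDFOR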
Random Weights (Init_weights_method=IMSLS_RANDOM)
This algorithm first generates random values for the input layer weights using the Uniform [–0.5, +0.5] distribution. These are then scaled using the standard deviation of the input layer potentials:

Wi = U/Si

where U is a random number uniformly distributed on the interval [–0.5, +0.5] and Si is equal to the standard deviation of the potential for the ith input attribute.
Next the average potential for each of the perceptrons connected to the input layer is calculated by:

ḡ = W1x̄1 + W2x̄2 + ... + Wnx̄n

where x̄i is equal to the average potential for the ith input attribute, and the bias weight for each of these perceptrons is set to –ḡ so that its average potential is zero. All other bias weights are set to zero.
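A minimal sketch of this rule, with assumed values for the potential standard deviations si (PV-WAVE's native RANDOMU is used here purely for illustration; the function itself draws from the random number generator controlled by RANDOMOPT, as the examples below show):

   n        = 3                              ; input attributes (figure example)
   n_layer1 = 3                              ; first-layer perceptrons
   si       = [1.0, 1.0, 0.5]                ; assumed potential std. deviations
   w        = FLTARR(n, n_layer1)
   FOR j=0L, n_layer1-1 DO $
      FOR i=0L, n-1 DO $
         w(i,j) = (RANDOMU(seed) - 0.5)/si(i)   ; Wi = U/Si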
Principal Component Weights (Init_weights_method=IMSLS_PRINCIPAL_COMPONENTS)
This method uses principal component analysis to generate weights. The arrays nominal and continuous are combined into a single matrix, and the correlation matrix of this combined matrix is decomposed using principal component analysis. The elements of the principal components from this analysis are used to initialize the weights associated with the network inputs. As with the other methods, the principal component weights are scaled using the standard deviation of the potential for the perceptrons connected to the input layer:

Wij = ξij/Si

where Wij is the weight for the link between the ith input attribute and the jth perceptron, ξij is the ith value of the jth principal component, and Si is equal to the standard deviation of the potential for the ith input attribute.
If the number of principal components is less than the number of perceptrons in the first layer, i.e., (n_continuous+n_nominal) < n_layer1, where n_layer1 is the number of perceptrons in the first layer, then it is not possible to initialize all weights with principal components. In this case, the first (n_continuous + n_nominal) perceptrons are initialized using the principal components and then the remainder are initialized using random weights (Init_weights_method=IMSLS_RANDOM).
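A minimal sketch of this fallback logic, where xi (the principal component matrix) and si (the potential standard deviations) are assumed inputs with hypothetical names:

   n_inputs     = 2                      ; n_continuous + n_nominal
   n_layer1     = 3                      ; first-layer perceptrons
   n_components = n_inputs
   xi = FLTARR(n_inputs, n_inputs)       ; principal component matrix (assumed)
   si = REPLICATE(1.0, n_inputs)         ; potential std. deviations (assumed)
   w  = FLTARR(n_inputs, n_layer1)
   FOR j=0L, n_layer1-1 DO BEGIN
      FOR i=0L, n_inputs-1 DO BEGIN
         IF (j LT n_components) THEN $
            w(i,j) = xi(i,j)/si(i) $                 ; component weight
         ELSE $
            w(i,j) = (RANDOMU(seed) - 0.5)/si(i)     ; random fallback
      ENDFOR
   ENDFOR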
As with the other methods, the bias weight for each of the first layer perceptrons is set to ensure that the average potential in this layer is equal to zero:

biasj = –(ḡ1j/S1j + ḡ2j/S2j + ... + ḡnj/Snj)

where ḡij is equal to the average potential for the link between the ith input attribute and the jth first layer perceptron, and Sij is the standard deviation for this same potential.
Discriminant Weights (Init_weights_method=IMSLS_DISCRIMINANT)
This method is very similar to principal component weights, except that discriminant analysis elements replace the principal component elements. The weights between the ith input attribute and the jth perceptron in the first layer are calculated by:

Wij = θij/Si

where Wij is the weight for the link between the ith input attribute and the jth perceptron, θij is the ith value of the jth discriminant component, and Si is equal to the standard deviation of the potential for the ith input attribute.
If the number of discriminant components is less than the number of perceptrons in the first layer, i.e., (n_continuous + n_nominal) < n_layer1, where n_layer1 is the number of perceptrons in the first layer, then it is not possible to initialize all weights with components from the discriminant analysis. In this case, the first (n_continuous + n_nominal) perceptrons are initialized using the discriminant components and then the remainder are initialized using random weights (Init_weights_method=IMSLS_RANDOM).
As with the other methods, the bias weight for each of the first layer perceptrons is set to ensure that the average potential in this layer is equal to zero:

biasj = –(ḡ1j/S1j + ḡ2j/S2j + ... + ḡnj/Snj)

where ḡij is equal to the average potential for the link between the ith input attribute and the jth first layer perceptron, and Sij is the standard deviation for this same potential.
Example 1
This example illustrates the random weights initialization algorithm for a three-layer network with one output. The first and second hidden layers contain three and two perceptrons, respectively, for a total of five hidden-layer perceptrons.
The nine input attributes consist of two continuous attributes plus seven binary inputs produced by binary encoding two nominal attributes.
The weights are initialized using the random weights algorithm. This results in different weights for every perceptron in the first hidden layer. The weights in other layers are initialized using equal weights. It should be noted that the bias weights in the first layer are not random. Except for the discriminant weights algorithm, the bias weights are always calculated to ensure that the average potential for each perceptron in the first layer is zero.
PRO mlff_initialize_weights_ex1
    
   n_patterns    = 24L    ; no. of training patterns  
   n_nvars       = 2L     ; 2 nominal unencoded variables  
   n_nominal     = 7L     ; 7 inputs for the binary encoded  
                         ; nominal vars
   n_inputs      = 9L 
   n_outputs     = 1L 
   n_continuous  = 2L     ; 2 continuous input attributes  
   nominalIn = LONARR(n_patterns)  ; work array used to encode
                                   ; the nominal variables
   classification = $ 
      [0L, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,$ 
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1] 
   ; raw nominal input data  
    nominal_unencoded = $
      [0, 0, 0, 1, 0, 2, $ 
       1, 0, 1, 1, 1, 2, $
       2, 0, 2, 1, 2, 2, $
       3, 0, 3, 1, 3, 2, $
       0, 0, 0, 1, 0, 2, $
       1, 0, 1, 1, 1, 2, $
       2, 0, 2, 1, 2, 2, $
       3, 0, 3, 1, 3, 2 ]
 
   ; Input array for binary encoded version of  
   ; nominal_unencoded() array above.
   nominal = FLTARR(n_patterns,n_nominal) 
   continuous= $       ; (n_patterns,n_continuous)
   [ [0.00,0.02,0.04,0.06,0.08,0.10,0.12,0.14,$
      0.16,0.18,0.20,0.22,0.24,0.26,0.28,0.30,$
      0.32,0.34,0.36,0.38,0.40,0.42,0.44,0.46],$
     [0.00,0.02,0.04,0.06,0.08,0.10,0.12,0.14,$
      0.16,0.18,0.20,0.22,0.28,0.30,0.32,0.34,$
      0.36,0.38,0.40,0.42,0.44,0.46,0.48,0.50]]
 
   ; Set up nominal input attributes using binary encoding.  
   m=0 
   FOR i=0L, n_nvars-1 DO BEGIN 
      FOR j=0L, n_patterns-1 DO BEGIN 
          nominalIn(j) = nominal_unencoded(2*j+i) + 1 
      ENDFOR 
      nominalOut = UNSUPERVISED_NOMINAL_FILTER(nominalIn,$  
                                      N_classes=n_classes) 
      FOR k=0L, n_classes-1 DO BEGIN 
         FOR j=0L, n_patterns-1 DO BEGIN 
             nominal(j,m) = nominalOut(j,k) 
         ENDFOR 
         m = m+1 
      ENDFOR 
   ENDFOR 
 
   PRINT,"INPUT TRAINING PATTERNS" 
   PRINT,"Y   Nom1 Nom2    X0       X1" 
   FOR i=0L, n_patterns-1 DO BEGIN
      PRINT,STRTRIM(classification(i),2),"     ",$ 
         STRTRIM(nominal_unencoded(i*2),2),"    ",$
         STRTRIM(nominal_unencoded(i*2+1),2),"   ",$  
         STRING(continuous(i,0),Format="(f6.4)"),"   ",$
         STRING(continuous(i,1),Format="(f6.4)") 
   ENDFOR
 
   ; Binary classification network 9 inputs 1 output=2 classes.
   network = MLFF_NETWORK_INIT(n_inputs, n_outputs) 
   network = MLFF_NETWORK(network, Create_hidden=3) 
   network = MLFF_NETWORK(network, Create_hidden=2)  
   network = MLFF_NETWORK(network,/Link_all)
 
   ; Note the following statement is for repeatable output.
   RANDOMOPT,Set=5555 
 
   ; Random Weights  
   weights = MLFF_INITIALIZE_WEIGHTS(network, $
                            nominal, continuous,$
                            /Print)
END
Output
INPUT TRAINING PATTERNS
Y   Nom1 Nom2    X0       X1
0     0    0   0.0000   0.0000
0     0    1   0.0200   0.0200
0     0    2   0.0400   0.0400
0     1    0   0.0600   0.0600
0     1    1   0.0800   0.0800
0     1    2   0.1000   0.1000
1     2    0   0.1200   0.1200
1     2    1   0.1400   0.1400
1     2    2   0.1600   0.1600
1     3    0   0.1800   0.1800
1     3    1   0.2000   0.2000
1     3    2   0.2200   0.2200
0     0    0   0.2400   0.2800
0     0    1   0.2600   0.3000
0     0    2   0.2800   0.3200
0     1    0   0.3000   0.3400
0     1    1   0.3200   0.3600
0     1    2   0.3400   0.3800
1     2    0   0.3600   0.4000
1     2    1   0.3800   0.4200
1     2    2   0.4000   0.4400
1     3    0   0.4200   0.4600
1     3    1   0.4400   0.4800
1     3    2   0.4600   0.5000
 
-------------------------------------------
-        NETWORK WEIGHTS INITIALIZED USING 
-        RANDOM WEIGHTS
-         Input Attributes:    9
-            Nominal:          2
-            Nominal(encoded): 7
-            Continuous:       2
-         Output Attributes:   1
-            n_classes:        2
-         Layers:              3
-         Perceptrons:         6
-         Weights:             41
-         Patterns:            24
-------------------------------------------
------------- HIDDEN  LAYER 1 -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        N0                       0.937069 
        N1                       -0.547569 
        N2                       1.468247 
        N3                       0.107160 
        N4                       -0.884992 
        N5                       -0.814069 
        N6                       -1.979680 
        X7                       -0.041228 
        X8                       -1.368315 
        Bias 3.3099 
 
        --- Perceptron 1 ---
        Link from Input Node     Weight 
        N0                       -0.308421 
        N1                       -1.058450 
        N2                       -0.981207 
        N3                       1.040820 
        N4                       -0.033493 
        N5                       -0.575732 
        N6                       0.571939 
        X7                       0.811886 
        X8                       -0.415498 
        Bias 0.573287 
 
        --- Perceptron 2 ---
        Link from Input Node     Weight 
        N0                       -1.117744 
        N1                       0.620799 
        N2                       0.174895 
        N3                       -0.100458 
        N4                       -0.961071 
        N5                       0.854179 
        N6                       0.046423 
        X7                       0.880998 
        X8                       -0.903982 
        Bias 1.00437 
 
-------------------------------------------
------------- HIDDEN  LAYER 2 -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        P0                       0.333333 
        P1                       0.333333 
        P2                       0.333333 
        Bias 0 
 
        --- Perceptron 1 ---
        Link from Input Node     Weight 
        P0                       0.333333 
        P1                       0.333333 
        P2                       0.333333 
        Bias 0 
 
-------------------------------------------
------------- OUTPUT   LAYER  -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        P3                       0.500000 
        P4                       0.500000 
        Bias 0 
 
-------------------------------------------
Example 2
This example illustrates the discriminant weights initialization algorithm for a three-layer network with one output. The first and second hidden layers contain three and two perceptrons, respectively, for a total of five hidden-layer perceptrons.
The data is the same as Example 1, and the network structure is the same except that all nominal input attributes are removed. This was necessary since the discriminant weights algorithm only works when all input attributes are continuous.
The discriminant weights algorithm initializes the weights in the first hidden layer to the coefficients of the discriminant functions. Since this example is a binary classification example, the number of discriminant functions is equal to the number of classes, two, but there are three perceptrons in the first layer. The weights for the first two perceptrons in this layer are the discriminant function coefficients, including the bias weight. The weights for the last perceptron in this layer were determined randomly.
PRO mlff_initialize_weights_ex2    
 
   ; initialize CNL constants and COMMON blocks
   @CMAST_COMMON
 
   n_patterns    =24L    ; no. of training patterns  
   n_continuous  =2L     ; 2 continuous input attributes
   n_inputs      =2L
   n_outputs     =1L  
   classification = $ 
      [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,$
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1] 
   continuous= $                   ;(n_patterns,n_continuous)
   [ [0.00,0.02,0.04,0.06,0.08,0.10,0.12,0.14,$
      0.16,0.18,0.20,0.22,0.24,0.26,0.28,0.30,$
      0.32,0.34,0.36,0.38,0.40,0.42,0.44,0.46],$
     [0.00,0.02,0.04,0.06,0.08,0.10,0.12,0.14,$
      0.16,0.18,0.20,0.22,0.28,0.30,0.32,0.34,$
      0.36,0.38,0.40,0.42,0.44,0.46,0.48,0.50]] 
 
   PRINT,"INPUT TRAINING PATTERNS" 
   PRINT,"Y        X0       X1" 
   FOR i=0L, n_patterns-1 DO BEGIN
      PRINT,STRTRIM(classification(i),2),"     ",$ 
         STRING(continuous(i,0),Format="(f6.4)"),"   ",$
         STRING(continuous(i,1),Format="(f6.4)") 
   ENDFOR   
 
   ; Binary classification network 2 inputs 1 output=2 classes
   network = MLFF_NETWORK_INIT(n_inputs, n_outputs) 
   network = MLFF_NETWORK(network, Create_hidden=3) 
   network = MLFF_NETWORK(network, Create_hidden=2)  
   network = MLFF_NETWORK(network,/Link_all)
   
   ; Discriminant weights  
   ; Set seed for consistent results  
   RANDOMOPT, set=12357 
   weights = MLFF_INITIALIZE_WEIGHTS(network,$
              0L, $       ;Note: there are no nominal attributes
              continuous, $  
              Init_weights_method=IMSLS_DISCRIMINANT, $  
              Classification=classification, $ 
              /Print)
END
Output
INPUT TRAINING PATTERNS
Y        X0       X1
0     0.0000   0.0000
0     0.0200   0.0200
0     0.0400   0.0400
0     0.0600   0.0600
0     0.0800   0.0800
0     0.1000   0.1000
1     0.1200   0.1200
1     0.1400   0.1400
1     0.1600   0.1600
1     0.1800   0.1800
1     0.2000   0.2000
1     0.2200   0.2200
0     0.2400   0.2800
0     0.2600   0.3000
0     0.2800   0.3200
0     0.3000   0.3400
0     0.3200   0.3600
0     0.3400   0.3800
1     0.3600   0.4000
1     0.3800   0.4200
1     0.4000   0.4400
1     0.4200   0.4600
1     0.4400   0.4800
1     0.4600   0.5000
Discriminant Analysis Classification Error Rate = 0.000000 
 
-------------------------------------------
-        NETWORK WEIGHTS INITIALIZED USING 
-        DISCRIMINANT WEIGHTS
-         Input Attributes:    2
-            Nominal:          0
-            Nominal(encoded): 0
-            Continuous:       2
-         Output Attributes:   1
-            n_classes:        2
-         Layers:              3
-         Perceptrons:         6
-         Weights:             20
-         Patterns:            24
-------------------------------------------
------------- HIDDEN  LAYER 1 -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        X0                       229.164703 
        X1                       -189.879257 
        Bias -2.13361 
 
        --- Perceptron 1 ---
        Link from Input Node     Weight 
        X0                       889.166504 
        X1                       -755.595093 
        Bias -12.5051 
 
        --- Perceptron 2 ---
        Link from Input Node     Weight 
        X0                       -4.495895 
        X1                       -0.976034 
        Bias 6.07218 
 
-------------------------------------------
------------- HIDDEN  LAYER 2 -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        P0                       0.333333 
        P1                       0.333333 
        P2                       0.333333 
        Bias 0 
 
        --- Perceptron 1 ---
        Link from Input Node     Weight 
        P0                       0.333333 
        P1                       0.333333 
        P2                       0.333333 
        Bias 0 
 
-------------------------------------------
------------- OUTPUT   LAYER  -------------
 
        --- Perceptron 0 ---
        Link from Input Node     Weight 
        P3                       0.500000 
        P4                       0.500000 
        Bias 0 
 
-------------------------------------------