MLFF_CLASSIFICATION_TRAINER Function
Trains a multilayered feedforward neural network for classification.
Usage
result = MLFF_CLASSIFICATION_TRAINER (network,
classification, nominal, continuous)
Input Parameters
network—A structure of type NN_Network containing the feedforward network’s architecture, including network weights and bias values. For more details, see the MLFF_NETWORK Function. When network training is successful, the weights and bias values in network are replaced with the values calculated for the optimum trained network.
classification—Array of size n_patterns containing the target classifications for the training patterns, where n_patterns is the number of network training patterns. These must be numbered sequentially from 0 to n_classes – 1, where n_classes is the number of target categories. For binary classification problems, n_classes = 2. For other problems, n_classes = n_outputs = network.n_outputs. For more details, see the MLFF_NETWORK Function.
nominal—Array of size n_patterns by n_nominal containing values for the nominal input attributes, where n_nominal is the number of nominal input attributes. The ith row contains the nominal input attributes for the ith training pattern. If n_nominal = 0, this argument is ignored.
continuous—Array of size n_patterns by n_continuous containing values for the continuous input attributes, where n_continuous is the number of continuous attributes. The ith row contains the continuous input attributes for the ith training pattern. If n_continuous = 0, this argument is ignored.
Returned Value
result—An array of training statistics, containing six summary statistics from the classification neural network, organized as follows:

Element   Training Statistics
0         Minimum Cross-Entropy at the optimum.
1         Total number of Stage I iterations.
2         Minimum Cross-Entropy after Stage I training.
3         Total number of Stage II iterations.
4         Minimum Cross-Entropy after Stage II training.
5         Classification error rate from optimum network.
The classification error rate is calculated using the ratio n_errors/n_patterns, where n_errors is the number of patterns that are incorrectly classified using the trained neural network. For each training pattern, the probability that it belongs to each of the target classes is calculated from the trained network. A pattern is considered incorrectly classified if the classification probability for its target classification is not the largest among that pattern’s classification probabilities.
A classification error of zero indicates that all training patterns are correctly classified into their target classifications. A value near one indicates that most patterns are not classified into their target classification.
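The error-rate rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the routine's implementation: the function name `classification_error_rate` and the sample probabilities are hypothetical, and counting a tie for the largest probability as a correct classification is an assumption.

```python
def classification_error_rate(probabilities, targets):
    """Return n_errors / n_patterns for per-pattern class probabilities."""
    n_errors = 0
    for probs, target in zip(probabilities, targets):
        # A pattern is misclassified when the probability of its target
        # class is not the largest of that pattern's probabilities.
        # Assumption: a tie for the largest value counts as correct.
        if probs[target] < max(probs):
            n_errors += 1
    return n_errors / len(targets)


# Hypothetical probabilities for three patterns and three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.6, 0.3],
         [0.5, 0.3, 0.2]]
targets = [0, 1, 2]

rate = classification_error_rate(probs, targets)
print(rate)  # one of the three patterns is misclassified
```

Here only the third pattern is misclassified, because its target class (2) has probability 0.2 while class 0 has 0.5.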
If training is unsuccessful, NULL is returned.
Input Keywords
Stage_I—A two-element integer array, [n_epochs, epoch_size], where n_epochs is the number of epochs used for Stage I training and epoch_size is the number of observations used during each epoch. If epoch training is not needed, set epoch_size = n_patterns and n_epochs = 1. Stage I training is implemented using steepest descent optimization and backward propagation for gradient calculations. By default, n_epochs = 15, epoch_size = n_patterns.
No_stage_II—If present and non-zero, Stage II training is not performed. By default, in Stage I, network weights are learned using a steepest descent optimization. Stage II begins with these weights and uses a Quasi-Newton optimization to seek improved values. Default: Stage II training is performed.
Max_step—A scalar float value indicating the maximum allowable step size in the optimizer. Default: Max_step = 10.
Max_itn—A scalar long value indicating the maximum number of iterations in the optimizer, per epoch. Default: Max_itn = 1000.
Max_fcn—A scalar long value indicating the maximum number of function evaluations in the optimizer, per epoch. Default: Max_fcn = 1000.
Rel_fcn_tol—A scalar float defining the relative function tolerance in the optimizer. Default: Rel_fcn_tol = max(10^(-10), ε^(2/3)) in single precision and max(10^(-20), ε^(2/3)) in double precision, where ε is the machine precision.
Grad_tol—A scalar float defining the scaled gradient tolerance in the optimizer. Default: Grad_tol = ε^(1/2) in single precision and ε^(1/3) in double precision, where ε is the machine precision.
Tolerance—A scalar float value indicating the absolute accuracy tolerance for the entropy. If the network entropy for an epoch during Stage I training falls below tolerance, the network is considered optimized, training is halted and the network with the minimum entropy is returned. Default: Tolerance = ε^(1/3) in single precision and ε^(2/3) in double precision, where ε is the machine precision.
Print—If present and nonzero, this option turns on printing of the intermediate results during network training. By default, intermediate results are not printed.
Init_weights_method—Specifies the algorithm to use for initializing weights prior to network training. One of the five values listed in Table 14-20: Init_weights_method Values is accepted.
 
Init_weights_method Values
Value   Enumeration                  Description
0       IMSLS_NN_NETWORK             No initialization method will be performed. Weights in NN_Network structure network will be used instead.
1       IMSLS_EQUAL                  Equal weights
2       IMSLS_RANDOM                 Random weights
3       IMSLS_PRINCIPAL_COMPONENTS   Principal component weights
4       IMSLS_DISCRIMINANT           Discriminant analysis weights
Default: Init_weights_method = IMSLS_RANDOM.
Logistictable—If present and nonzero, during Stage I optimization all logistic activation functions in the hidden layers are calculated using a table lookup approximation to the logistic function. This reduces the time for Stage I training with logistic activation. However, during Stage II optimization this setting is ignored. Default: All logistic activations are calculated without table lookup.
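As a rough check on the optimizer defaults listed above, the following Python sketch evaluates the Rel_fcn_tol and Grad_tol formulas, read as max(10^(-10), ε^(2/3)) and ε^(1/2) in single precision. It assumes IEEE 754 arithmetic, with ε = 2^(-23) for single precision and ε = 2^(-52) for double precision. The helper names are hypothetical and are not part of PV-WAVE.

```python
import sys

EPS_SINGLE = 2.0 ** -23              # IEEE 754 single-precision machine epsilon
EPS_DOUBLE = sys.float_info.epsilon  # IEEE 754 double-precision machine epsilon


def default_rel_fcn_tol(eps, double=False):
    """max(1e-10, eps^(2/3)), or max(1e-20, eps^(2/3)) in double precision."""
    floor = 1e-20 if double else 1e-10
    return max(floor, eps ** (2.0 / 3.0))


def default_grad_tol(eps, double=False):
    """eps^(1/2), or eps^(1/3) in double precision."""
    return eps ** (1.0 / 3.0) if double else eps ** 0.5


rel_tol = default_rel_fcn_tol(EPS_SINGLE)
grad_tol = default_grad_tol(EPS_SINGLE)
print(f"{rel_tol:.6g} {grad_tol:.6g}")
```

Under these assumptions the single-precision values come out near 2.42218e-05 and 0.000345267, which agree with the functionTol and gradientTol entries printed in the training log of Example 1.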
Output Keywords
Predicted_class—An array of size n_patterns containing the predicted classification for each training pattern.
Gradients—An array of size network.n_links + network.n_nodes – network.n_inputs containing the gradients for each weight in the optimum network.
Predicted_prob—An array of size n_patterns by n_classes, where n_classes is the number of target classes in the network. For binary classification problems, n_classes = 2, but for all other problems n_classes = n_outputs, where n_outputs is the number of outputs in the network, network.n_outputs. The values of the ith row are the predicted probabilities associated with the target classes. For binary classification, Predicted_prob(i) is the predicted probability that the ith pattern is associated with class = 0. For other classification problems values in the ith row of Predicted_prob are the predicted probabilities that this pattern belongs to each of the target classes.
Class_error—An array of size n_patterns containing the classification probability errors for each pattern in the training data. The classification error for the ith training pattern is equal to 1 – Predicted_prob(i, k), where k = classification(i).
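The relationship between the output keywords can be illustrated with a short Python sketch. It assumes that the per-pattern classification error is one minus the predicted probability of that pattern's target class, and that the predicted class is the one with the largest probability. The function names and the sample rows are hypothetical.

```python
def class_errors(predicted_prob, classification):
    """Per-pattern probability error: 1 - P(target class of that pattern)."""
    return [1.0 - row[k] for row, k in zip(predicted_prob, classification)]


def predicted_classes(predicted_prob):
    """Index of the largest probability in each row (first index wins ties)."""
    return [row.index(max(row)) for row in predicted_prob]


# Hypothetical probabilities for two patterns and three classes.
predicted_prob = [[0.70, 0.20, 0.10],
                  [0.25, 0.50, 0.25]]
classification = [0, 2]

errors = class_errors(predicted_prob, classification)
classes = predicted_classes(predicted_prob)
print(errors, classes)
```

The second pattern has target class 2 but its largest probability belongs to class 1, so it has both a large probability error and a wrong predicted class.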
Discussion
MLFF_CLASSIFICATION_TRAINER trains a multilayered feedforward neural network for classifying patterns. It returns training summaries, the classification probabilities associated with training patterns, their classification errors, the optimum network weights and gradients. Linkages among perceptrons allow for skipped layers, including linkages between inputs and output perceptrons. Except for output perceptrons, the linkages and activation function for each perceptron can be individually configured. For more details, see keywords Link_all, Link_layer, and Link_node in MLFF_NETWORK Function.
Binary classification is handled differently from classification problems involving more than two classes. Binary classification problems only have two target classes, which must be coded as either zero or one. This is represented using a single network output with logistic activation. The output is an estimate of the probability that the pattern belongs to class = 0. The probability associated with class = 1 can be calculated from the relationship P(class = 1) = 1 – P(class = 0).
Networks designed to classify patterns into more than two categories use one output for each target class, i.e., n_classes = n_outputs. The first output predicts P(class = 0), the second P(class = 1), etc. All output perceptrons are normalized using softmax activation. This ensures that the estimated class probabilities are between zero and one, and that they always sum to one.
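The two output conventions can be sketched in Python as follows. This uses the standard definitions of the logistic and softmax functions. The potentials passed in are made-up numbers, and subtracting the largest potential before exponentiating is a common numerical safeguard rather than something the source describes.

```python
import math


def logistic(x):
    """Logistic activation: maps any potential to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(potentials):
    """Softmax activation: positive values that sum to one."""
    shift = max(potentials)  # subtract the maximum for numerical stability
    exps = [math.exp(p - shift) for p in potentials]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: a single output estimates P(class = 0).
p0 = logistic(0.8)
p1 = 1.0 - p0

# Three-class case: one output per class.
probs = softmax([2.0, 1.0, 0.1])
print(p0, p1, probs, sum(probs))
```

In the binary case only one probability has to be estimated, while in the multi-class case the normalization is what keeps the outputs interpretable as probabilities.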
Training Patterns
Neural network training patterns consist of the following three types of data:
1. nominal input attributes
2. continuous input attributes, including encoded ordinal attributes
3. pattern classifications numbered 0, 1, ..., n_classes – 1
The first data type, nominal data, contains the encoding of nominal input attributes, if any. Nominal input attributes must be encoded into multiple columns for network input. Although not required, binary encoding is typically used to create these input columns. Binary encoding consists of creating columns of zeros and ones for each class value associated with every nominal attribute. If only one attribute is used for input, then the number of columns is equal to the number of classes for that attribute. If several nominal attributes appear in the data, then each attribute is associated with several columns, one for each of its classes.
The UNSUPERVISED_NOMINAL_FILTER Function can be used to generate these columns. For a nominal attribute with m classes, UNSUPERVISED_NOMINAL_FILTER returns an n_patterns by m matrix. Each column of this matrix consists of zeros and ones. The column value is set to zero if the pattern is not associated with this classification; otherwise, the value is set to one indicating that this pattern is associated with this classification.
Consider an example with one nominal variable that has two classes, male and female, and five training patterns: male, male, female, male, female. With binary encoding, and taking the first column to indicate male and the second to indicate female, the following matrix is used as nominal network input to represent these patterns:
1  0
1  0
0  1
1  0
0  1
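The binary encoding of this example can be reproduced with a minimal Python sketch. The helper `binary_encode` is hypothetical and only mimics the idea of one zero/one column per class. It is not the UNSUPERVISED_NOMINAL_FILTER Function, and the column order (male first, female second) is an assumption.

```python
def binary_encode(values, classes):
    """One row per pattern, one 0/1 column per class of the attribute."""
    return [[1 if value == c else 0 for c in classes] for value in values]


patterns = ["male", "male", "female", "male", "female"]
encoded = binary_encode(patterns, classes=["male", "female"])

for row in encoded:
    print(row)
```

Each row contains exactly one 1, in the column of the class that the pattern belongs to.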
Continuous input attribute data, including ordinal data encoded to cumulative percentages, are passed to this routine in a separate floating point array, continuous. The number of rows in this array is n_patterns. The number of columns is n_continuous. If the continuous input attributes have widely different ranges, then typically it is advantageous to scale these attributes before using them in network training. The SCALE_FILTER Function can be used for scaling continuous input attributes before using it in network training. Ordinal attributes can be encoded using the UNSUPERVISED_ORDINAL_FILTER Procedure.
It is important to note that if input attributes are encoded or scaled for network training, then the network weights are calculated for that encoding and scaling. Subsequent pattern classifications using these weights must also use the identical encoding and scaling used during training.
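A minimal Python sketch of this point, using z-score scaling as the example: the scaling parameters are computed once from the training patterns and then reused unchanged for any later pattern. The function names are hypothetical, and the use of the sample standard deviation (divisor n – 1) is an assumption. The SCALE_FILTER Function may define its scaling differently.

```python
import math


def fit_z_score(rows):
    """Return per-column (means, standard deviations) of the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Assumption: sample standard deviation with divisor n - 1.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
            for col, m in zip(columns, means)]
    return means, stds


def apply_z_score(rows, means, stds):
    """Scale rows with previously fitted means and standard deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_z_score(train)
train_scaled = apply_z_score(train, means, stds)

# A later pattern is scaled with the training means and deviations,
# not with statistics recomputed from the new data.
new_scaled = apply_z_score([[4.0, 20.0]], means, stds)
print(train_scaled, new_scaled)
```

Keeping `means` and `stds` alongside the trained network is one way to guarantee that later classifications see inputs on the same scale as the training data.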
Training pattern classification targets are stored in the one-dimensional integer array classification. The ith value in this array is the class assignment for the ith training pattern. Class assignments must be represented using the integers 0, 1, ..., n_classes – 1. This encoding is arbitrary, but it should be consistent. For example, if the class assignments correspond to the colors red, white and blue, then they must be encoded as zero, one, and two. However, it is arbitrary whether red gets assigned to class = 0, 1 or 2 provided that assignment is used for every pattern.
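One simple way to build such a consistent integer coding is sketched below in Python. The helper `encode_classes` is hypothetical. It assigns codes in order of first appearance, which is only one of the arbitrary but consistent assignments the text allows.

```python
def encode_classes(labels):
    """Map each distinct label to 0, 1, ..., n_classes - 1 consistently."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused integer code
    codes = [mapping[label] for label in labels]
    return codes, mapping


labels = ["red", "white", "blue", "white", "red"]
codes, mapping = encode_classes(labels)
print(codes, mapping)
```

The returned mapping should be kept so that predicted class numbers can later be translated back to the original labels.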
Network Configuration
The network configuration consists of the following:
* number of inputs and outputs,
* number of hidden layers,
* description of the number of perceptrons in each layer,
* description of the linkages among the perceptrons, and
* initial values for network weights, including bias weights.
This description is passed into MLFF_CLASSIFICATION_TRAINER using the structure NN_Network. See the MLFF_NETWORK Function.
Training Efficiency
INITIAL NETWORK WEIGHTS: The training efficiency determines the speed of network training. This is controlled by several factors. One of the most important factors is the initial weights used by the optimization algorithm. By default, these are set randomly. Other options can be specified through the keyword Init_weights_method. See the MLFF_INITIALIZE_WEIGHTS Function for a detailed description of the available initialization methods.
Initial weights are scaled to reduce the possibility of perceptron saturation during the initial phases of network training. Saturation occurs when initial perceptron potential calculations are so large, or so small, that the activation calculation for a potential is driven to the largest or smallest possible values that can be represented on the computer in the stated precision (single or double). If saturation occurs, warning messages may appear indicating that network training did not converge to an optimum solution.
The scaled initial weights are modified prior to every epoch by adding noise to these base values. The noise component is uniformly distributed over the interval [–0.5, +0.5].
SCALING INPUTS: Although automatic scaling of network weights protects against saturation during initial training iterations, the training algorithm can push the weights into regions that may cause saturation. Typically this occurs when input attributes have widely different scaling. For that reason, it is recommended to also scale all continuous input attributes to z-scores or a common interval, such as [–1, +1]. The SCALE_FILTER Function can be used to scale continuous input attributes to z-scores or a common interval.
LOGISTIC CALCULATIONS: If Stage I training is slow, the Logistictable keyword can reduce this time by using a table lookup for calculating the logistic activation function in hidden layer perceptrons. This option is ignored during Stage II training. If Stage II training is used, then weights for the optimum network will be calculated using exact calculations for any logistic activation functions. If Stage II training is not used and the Logistictable option is invoked, care must be taken to ensure that this option is also used for any network classification predictions made with the MLFF_PATTERN_CLASSIFICATION Function.
NUMBER OF EPOCHS AND EPOCH SIZE: To ensure that a globally optimum network results from the training, several training sessions are conducted and compared. Each session is referred to as an epoch. The training for each epoch is conducted using all of the training patterns or a random sample of all available patterns.
Both the number of epochs and epoch size can be set using the Stage_I keyword. By default the number of epochs during Stage I training is 15 and the epoch size is equal to the total number of training patterns. Increasing the number of epochs increases the training time, but it can result in a more accurate classification network.
During Stage I training, the network entropy is calculated after each epoch. If that value is smaller than Tolerance, Stage I training stops, since a network with entropy that low is assumed to be acceptably accurate and further training is unnecessary. The value for Tolerance can be set using the Tolerance keyword. Setting this to a larger value, such as 0.001, is useful for initially evaluating alternate network architectures.
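The epoch logic described in this section can be summarized with a control-flow sketch in Python. It is not the library's algorithm: the optimizer itself is replaced by a caller-supplied function (`train_epoch`, hypothetical), and the names `stage_one` and `fake_train_epoch`, the default tolerance of 0.001, and the entropy values are assumptions made for illustration.

```python
import random


def stage_one(base_weights, train_epoch, n_epochs=15, tolerance=1e-3, seed=0):
    """Keep the lowest-entropy epoch; stop early once entropy < tolerance."""
    rng = random.Random(seed)
    best_entropy, best_weights = float("inf"), None
    epochs_run = 0
    for _ in range(n_epochs):
        # Before every epoch, perturb the scaled base weights with noise
        # drawn uniformly from [-0.5, +0.5].
        start = [w + rng.uniform(-0.5, 0.5) for w in base_weights]
        entropy, weights = train_epoch(start)
        epochs_run += 1
        if entropy < best_entropy:
            best_entropy, best_weights = entropy, weights
        if entropy < tolerance:
            break  # considered accurate enough; skip the remaining epochs
    return best_entropy, best_weights, epochs_run


# Stand-in for the real optimizer: returns preset entropies (made-up values).
fake_entropies = iter([5.5, 4.8, 0.0005, 6.0])


def fake_train_epoch(start_weights):
    return next(fake_entropies), start_weights


best, weights, epochs_run = stage_one([0.0, 0.0], fake_train_epoch,
                                      n_epochs=4, tolerance=1e-3)
print(best, epochs_run)
```

With these made-up entropies the loop stops after the third epoch, because 0.0005 is below the tolerance, and the fourth epoch is never run.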
NETWORK SIZE AND VALIDATION: The network architecture, that is, the number of perceptrons and network layers, also plays a key role in network training. Larger networks with many inputs and perceptrons have a larger number of weights. Large networks can provide very accurate classifications, driving the misclassification error rate for the training patterns to zero. However, networks with too many weights can take too long to train, and can be inaccurate for classifying patterns not adequately represented among the training patterns.
A starting point is to ensure the total number of network weights is approximately equal to the number of training patterns. A trained network of this size typically has a low misclassification error rate when calculated for the training patterns. That is, it is able to accurately reproduce the training data. However, it might be inaccurate for classifying other patterns.
One approach to this validation is to split the total number of training patterns into two or more subsets then train the network using only one of the subsets and classify the remaining data using the trained network. The misclassification error rate for the data not used in training will be a better estimate of the true classification error rate for this network.
However, this approach to validation is only possible when the number of training patterns is large.
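The split described above can be sketched in Python as a random partition of pattern indices. The function name `split_patterns`, the 70/30 proportion, and the fixed seed are assumptions for illustration. The source does not prescribe any particular split.

```python
import random


def split_patterns(n_patterns, train_fraction=0.7, seed=0):
    """Randomly partition pattern indices into training and holdout sets."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_patterns))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, holdout_idx = split_patterns(150)
print(len(train_idx), len(holdout_idx))
```

The network would be trained on the rows selected by `train_idx` only, and the misclassification error rate would then be computed on the rows selected by `holdout_idx`.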
Output
Output from MLFF_CLASSIFICATION_TRAINER consists of classification probabilities calculated for each training pattern, a classification error array for these patterns, predicted classifications, weights and their associated gradients for the trained network, and the training statistics. The NN_Network structure is automatically updated with the weights, gradients and bias values for use as input to the MLFF_PATTERN_CLASSIFICATION Function.
For more details about the weights and bias values, see Table 14-17: Structure Members and Their Descriptions.
Example 1
This example trains a three-layer network using 48 training patterns with two nominal and two continuous input attributes. The first nominal attribute has three classifications and the second has four. Classifications for the nominal attributes are encoded using the UNSUPERVISED_NOMINAL_FILTER Function. This function uses binary encoding, generating a total of 7 input attributes to represent the two nominal attributes. The two additional continuous attributes increase the total number of network inputs to 9.
In this example, the target classification is binary, either zero or one. The continuous input attributes were scaled to the interval [0,1].
The structure of the network consists of nine input attributes in the input layer and three other layers. There are three perceptrons in the 1st hidden layer, and two in the 2nd. Since the classification target in this example is binary, there is only one perceptron in the output layer.
All perceptrons use the logistic function for activation, including the output perceptron. Since logistic activation values are always between 0 and 1, the output from this network can be interpreted directly as the estimated probability, P(0), that a pattern belongs to target classification 0.
 
Figure 14-11: A Binary, 3-layer Classification Network with 9 Inputs and 6 Perceptrons
There are a total of 41 weights in this network. Six are bias weights and the remaining 35 are the weights for the input links to every perceptron, i.e., 35 = 9 × 3 + 3 × 2 + 2 × 1.
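The weight count can be checked with a small Python sketch. It assumes a network in which every node is linked only to every perceptron in the next layer, with one bias weight per perceptron. The function name `count_weights` is hypothetical.

```python
def count_weights(layer_sizes):
    """Count weights for [n_inputs, hidden..., n_outputs], adjacent layers linked."""
    # One link weight for every pair of nodes in adjacent layers.
    n_links = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias weight for every perceptron (all nodes except the inputs).
    n_bias = sum(layer_sizes[1:])
    return n_links, n_bias, n_links + n_bias


example_1 = count_weights([9, 3, 2, 1])  # 9 inputs, hidden layers of 3 and 2, 1 output
example_2 = count_weights([4, 2, 3])     # 4 inputs, one hidden layer of 2, 3 outputs
print(example_1, example_2)
```

The first call gives 35 link weights, 6 bias weights, and 41 weights in total for the network of this example. The second call applies the same count to the Iris network of Example 2 and gives 14 link weights, 5 bias weights, and 19 weights in total.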
Printing is turned on to show progress during the training session.
PRO t_mlff_classification_trainer_ex1
 
   n_patterns =48  ; # of training patterns.
   n_inputs   =9   ; 2 nominal (7 classes) and 2 continuous.
   n_nominal  =7   ; 2 attributes with 3 and 4 classes each.
   n_continuous =2 ; 2 continuous input attributes.
   n_outputs    =1 ; binary classification.
   classification = LONARR(48) 
   nominalAtt = LONARR(n_patterns,n_nominal) 
   n_cat = 2 
   nomTempIn = LONARR(48) 
   inputData = $     ; size = 5*n_patterns
      [0.00, 0.00, 0, 0, 0, 0.02, 0.02, 0, 1, 0, 0.04,$
      0.04, 0, 2, 0, 0.06, 0.06, 0, 3, 0, 0.08, 0.08, $ 
      1, 0, 0, 0.10, 0.10, 1, 1, 0, 0.12, 0.12, 1, 2, $ 
      0, 0.14, 0.14, 1, 3, 0, 0.16, 0.16, 2, 0, 0,    $
      0.18, 0.18, 2, 1, 0, 0.20, 0.20, 2, 2, 0, 0.22, $
      0.22, 2, 3, 0, 0.24, 0.28, 0, 0, 0, 0.26, 0.30, $
      0, 1, 0, 0.28, 0.32, 0, 2, 0, 0.30, 0.34, 0, 3, $
      0, 0.32, 0.36, 1, 0, 0, 0.34, 0.38, 1, 1, 0, 0.36,$
      0.40, 1, 2, 0, 0.38, 0.42, 1, 3, 0, 0.40, 0.44, $
      2, 0, 0, 0.42, 0.46, 2, 1, 0, 0.44, 0.48, 2, 2, $
      0, 0.46, 0.50, 2, 3, 0, 0.52, 0.48, 0, 0, 0, 0.54,$
      0.50, 0, 1, 1, 0.56, 0.52, 0, 2, 1, 0.58, 0.54, 0,$
      3, 1, 0.60, 0.56, 1, 0, 1, 0.62, 0.58, 1, 1, 1, $
      0.64, 0.60, 1, 2, 1, 0.66, 0.62, 1, 3, 1, 0.68, $
      0.64, 2, 0, 0, 0.70, 0.66, 2, 1, 0, 0.72, 0.68, $
      2, 2, 0, 0.74, 0.70, 2, 3, 0, 0.76, 0.76, 0, 0, $
      1, 0.78, 0.78, 0, 1, 1, 0.80, 0.80, 0, 2, 1, 0.82,$
      0.82, 0, 3, 1, 0.84, 0.84, 1, 0, 1, 0.86, 0.86, $
      1, 1, 1, 0.88, 0.88, 1, 2, 1, 0.90, 0.90, 1, 3, $
      1, 0.92, 0.92, 2, 0, 0, 0.94, 0.94, 2, 1, 0, 0.96,$
      0.96, 2, 2, 0, 0.98, 0.98, 2, 3, 0]
   contAtt = FLTARR(n_patterns,n_continuous)
   colLabels = ["Pattern", "Class=0", "Class=1"]
 
   PRINT,"***********************************" 
   PRINT,"*  BINARY CLASSIFICATION EXAMPLE  *"
   PRINT,"***********************************"
 
   ;  Set up continuous input attributes and   
   ;  classification target arrays.
     
   FOR i=0L, n_patterns-1 DO BEGIN
      ; Assign input to array for continuous input attributes. 
      contAtt(i,0)    = inputData(i*5)
      contAtt(i,1)  = inputData(i*5+1)
      ; Assign input to classification target array.
      classification(i) = LONG(inputData(i*5+4))
   ENDFOR   
   ; Set up nominal input attributes using binary encoding.
   m=0
   FOR i=0L, n_cat-1 DO BEGIN 
      FOR j=0L, n_patterns-1 DO BEGIN 
         nomTempIn(j) = LONG(inputData(j*5+n_continuous+i)+1)
      ENDFOR
      nomTempOut= UNSUPERVISED_NOMINAL_FILTER(nomTempIn,$  
                                        N_classes=nClass)
      FOR k=0L, nClass-1 DO BEGIN 
         FOR j=0L, n_patterns-1 DO BEGIN 
            nominalAtt(j,m) = nomTempOut(j,k)
         ENDFOR
         m = m+1 
      ENDFOR 
   ENDFOR 
   PRINT,"      TRAINING PATTERNS"
   PRINT,"Y  N1  N2     Z1     Z2"
   FOR i=0L, n_patterns-1 DO BEGIN 
      j = LONG(inputData(i*5+2)) 
      k = LONG(inputData(i*5+3))
      PRINT,STRTRIM(classification(i),2),$
      "   ",STRTRIM(j,2),$
      "   ",STRTRIM(k,2),$
      "   ",STRING(contAtt(i,0),Format="(f4.2)"),$ 
      "   ",STRING(contAtt(i,1),Format="(f4.2)") 
   ENDFOR
 
   network = MLFF_NETWORK_INIT(n_inputs, n_outputs) 
   network = MLFF_NETWORK(network,Create_hidden_layer=3)  
   network = MLFF_NETWORK(network,Create_hidden_layer=2) 
   network = MLFF_NETWORK(network,/Link_all) 
 
   ; Note the following statement is for repeatable output.
   RANDOMOPT,Set=5555l 
   ; Train the classification network.
   PRINT,"STARTING NETWORK TRAINING" 
   trainStats = MLFF_CLASSIFICATION_TRAINER(network,$ 
                classification, nominalAtt, contAtt,$  
                /Print,Predicted_prob=classProb) 
 
   ; Print class predictions.
   PRINT,"Predicted Classification Probabilities"
   PRINT,"    Pattern   Class=0"
      FOR i=0L, n_patterns-1 DO BEGIN  
        PRINT,i+1,"        ", STRING(classprob(i),Format="(F3.1)") 
      ENDFOR
END
Output
Notice that although the default number of Stage I epochs is 15, in this case Stage I optimization is halted after the first epoch. This occurs because the minimum entropy for that epoch is less than the default tolerance.
***********************************
*  BINARY CLASSIFICATION EXAMPLE  *
***********************************
      TRAINING PATTERNS
Y  N1  N2     Z1   Z2
0   0   0   0.00   0.00
0   0   1   0.02   0.02
0   0   2   0.04   0.04
0   0   3   0.06   0.06
0   1   0   0.08   0.08
0   1   1   0.10   0.10
0   1   2   0.12   0.12
0   1   3   0.14   0.14
0   2   0   0.16   0.16
0   2   1   0.18   0.18
0   2   2   0.20   0.20
0   2   3   0.22   0.22
0   0   0   0.24   0.28
0   0   1   0.26   0.30
0   0   2   0.28   0.32
0   0   3   0.30   0.34
0   1   0   0.32   0.36
0   1   1   0.34   0.38
0   1   2   0.36   0.40
0   1   3   0.38   0.42
0   2   0   0.40   0.44
0   2   1   0.42   0.46
0   2   2   0.44   0.48
0   2   3   0.46   0.50
0   0   0   0.52   0.48
1   0   1   0.54   0.50
1   0   2   0.56   0.52
1   0   3   0.58   0.54
1   1   0   0.60   0.56
1   1   1   0.62   0.58
1   1   2   0.64   0.60
1   1   3   0.66   0.62
0   2   0   0.68   0.64
0   2   1   0.70   0.66
0   2   2   0.72   0.68
0   2   3   0.74   0.70
1   0   0   0.76   0.76
1   0   1   0.78   0.78
1   0   2   0.80   0.80
1   0   3   0.82   0.82
1   1   0   0.84   0.84
1   1   1   0.86   0.86
1   1   2   0.88   0.88
1   1   3   0.90   0.90
0   2   0   0.92   0.92
0   2   1   0.94   0.94
0   2   2   0.96   0.96
0   2   3   0.98   0.98
 
STARTING NETWORK TRAINING
 
TRAINING PARAMETERS:
  Stage II Opt.   = 1 
  n_epochs        = 15 
  epoch_size      = 48 
  maxIterations   = 1000 
  maxFunctionEval = 1000 
  maxStep         = 10.000000 
  functionTol     = 2.42218e-05 
  gradientTol     = 0.000345267 
  accuracy        = 0.000345267 
  n_inputs        = 9 
  n_continuous    = 2 
  n_nominal       = 7 
  n_classes       = 2 
  n_outputs       = 1 
  n_patterns      = 48 
  n_layers        = 3 
  n_perceptrons   = 6 
  n_weights       = 41 
 
STAGE I TRAINING STARTING 
Stage I: Epoch 1 - Cross-Entropy Error = 1.2219e-05 (Iterations=50)
(CPU Min.=0.000167)
Stage I Training Converged at Epoch = 1 
 
 
STAGE I FINAL CROSS-ENTROPY ERROR = 0.000012 (CPU Min.=0.000167)
 
OPTIMUM WEIGHTS AFTER STAGE I TRAINING: 
weight[0] =     -0.62051        weight[1] =     3.09276         
weight[2] =     0.758899        weight[3] =     3.25391         
weight[4] =     0.880689        weight[5] =     0.570943        
weight[6] =     0.582033        weight[7] =     2.02618         
weight[8] =     2.10231         weight[9] =     4.52622         
weight[10] =    3.17279         weight[11] =    -10.8302        
weight[12] =    -1.11139        weight[13] =    0.922095        
weight[14] =    -0.715357       weight[15] =    -0.427637       
weight[16] =    11.0583         weight[17] =    8.9387  
weight[18] =    2.18456         weight[19] =    1.82989         
weight[20] =    -8.44236        weight[21] =    -3.11275        
weight[22] =    -0.00365442     weight[23] =    0.626157        
weight[24] =    -0.229832       weight[25] =    13.5145         
weight[26] =    11.8758         weight[27] =    -4.80003        
weight[28] =    -18.7821        weight[29] =    -9.04489        
weight[30] =    -6.97589        weight[31] =    -14.9861        
weight[32] =    -17.6179        weight[33] =    23.4709         
weight[34] =    11.2042         weight[35] =    -6.75005        
weight[36] =    -12.9537        weight[37] =    -14.0837        
weight[38] =    17.7174         weight[39] =    21.5824         
weight[40] =    -19.7159        
 
STAGE I TRAINING CONVERGED
STAGE I CROSS-ENTROPY ERROR = 0.000012 
 
0 PATTERNS OUT OF 48 INCORRECTLY CLASSIFIED
 
 
GRADIENT AT THE OPTIMUM WEIGHTS 
-->g[0] =       0.000000         weight[0] =    -0.620510 
-->g[1] =       -0.000001        weight[1] =    3.092761 
-->g[2] =       0.000000         weight[2] =    0.758899 
-->g[3] =       -0.000001        weight[3] =    3.253912 
-->g[4] =       0.000000         weight[4] =    0.880689 
-->g[5] =       0.000000         weight[5] =    0.570943 
-->g[6] =       0.000000         weight[6] =    0.582033 
-->g[7] =       -0.000000        weight[7] =    2.026184 
-->g[8] =       -0.000000        weight[8] =    2.102315 
-->g[9] =       0.000000         weight[9] =    4.526219 
-->g[10] =      -0.000006        weight[10] =   3.172789 
-->g[11] =      0.000000         weight[11] =   -10.830215 
-->g[12] =      -0.000006        weight[12] =   -1.111387 
-->g[13] =      0.000000         weight[13] =   0.922095 
-->g[14] =      0.000000         weight[14] =   -0.715357 
-->g[15] =      0.000000         weight[15] =   -0.427637 
-->g[16] =      -0.000003        weight[16] =   11.058254 
-->g[17] =      -0.000003        weight[17] =   8.938703 
-->g[18] =      0.000000         weight[18] =   2.184557 
-->g[19] =      -0.000005        weight[19] =   1.829893 
-->g[20] =      0.000000         weight[20] =   -8.442363 
-->g[21] =      -0.000005        weight[21] =   -3.112754 
-->g[22] =      0.000000         weight[22] =   -0.003654 
-->g[23] =      0.000000         weight[23] =   0.626157 
-->g[24] =      0.000000         weight[24] =   -0.229832 
-->g[25] =      -0.000003        weight[25] =   13.514451 
-->g[26] =      -0.000003        weight[26] =   11.875845 
-->g[27] =      0.000001         weight[27] =   -4.800026 
-->g[28] =      0.000000         weight[28] =   -18.782066 
-->g[29] =      0.000000         weight[29] =   -9.044890 
-->g[30] =      0.000001         weight[30] =   -6.975893 
-->g[31] =      0.000001         weight[31] =   -14.986099 
-->g[32] =      0.000000         weight[32] =   -17.617901 
-->g[33] =      -0.000012        weight[33] =   23.470867 
-->g[34] =      -0.000012        weight[34] =   11.204187 
-->g[35] =      -0.000001        weight[35] =   -6.750047 
-->g[36] =      -0.000006        weight[36] =   -12.953682 
-->g[37] =      -0.000005        weight[37] =   -14.083694 
-->g[38] =      0.000001         weight[38] =   17.717358 
-->g[39] =      0.000001         weight[39] =   21.582405 
-->g[40] =      -0.000011        weight[40] =   -19.715902 
 
Training Completed - leaving training engine (CPU Min.=0.000167) 
 
Predicted Classification Probabilities
    Pattern   Class=0
       1        1.0
       2        1.0
       3        1.0
       4        1.0
       5        1.0
       6        1.0
       7        1.0
       8        1.0
       9        1.0
      10        1.0
      11        1.0
      12        1.0
      13        1.0
      14        1.0
      15        1.0
      16        1.0
      17        1.0
      18        1.0
      19        1.0
      20        1.0
      21        1.0
      22        1.0
      23        1.0
      24        1.0
      25        1.0
      26        0.0
      27        0.0
      28        0.0
      29        0.0
      30        0.0
      31        0.0
      32        0.0
      33        1.0
      34        1.0
      35        1.0
      36        1.0
      37        0.0
      38        0.0
      39        0.0
      40        0.0
      41        0.0
      42        0.0
      43        0.0
      44        0.0
      45        1.0
      46        1.0
      47        1.0
      48        1.0
Example 2
Fisher’s (1936) Iris data is often used for benchmarking discriminant analysis and classification solutions. It is part of the IMSL data sets and consists of the following continuous input attributes and classification target:
Continuous Attributes — X1(sepal length), X2(sepal width), X3(petal length), and X4(petal width)
Classification Target (Iris Type) — Setosa, Versicolour or Virginica.
These data consist of 150 patterns. Since all pattern input attributes are continuous, linear discriminant analysis can be used for classifying these patterns (see Example 1 of the DISCR_ANALYSIS Procedure). Linear discriminant analysis is able to correctly classify 98% of the training patterns. In this example, the simple neural network illustrated in the following figure is able to achieve 100% accuracy.
 
Figure 14-12: A 2-layer Classification Network with 4 Inputs, 5 Perceptrons, and a Target Classification with 3 Classes
The hidden layer in this example consists of only two perceptrons with logistic activation. Since the target attribute in this example has three classes, the network output layer consists of three perceptrons, one for each class.
There are a total of 19 weights in this network. Fourteen of the weights are assigned to the input links, i.e., 4 × 2 + 2 × 3 = 14. The last five weights are the bias weights for each of the five network perceptrons. All weights were initialized using principal components, i.e. Init_weights_method = IMSLS_PRINCIPAL_COMPONENTS.
Although in these data the continuous attributes have similar ranges, they were scaled using z-score scaling to make network training more efficient. For more details, see the SCALE_FILTER Function.
For non-binary classification problems, MLFF_CLASSIFICATION_TRAINER uses softmax activation for output perceptrons. This ensures that the estimates of the classification probabilities sum to one, i.e., for the three classes in this example: P(class = 0) + P(class = 1) + P(class = 2) = 1.
Note that Max_step was changed from its default of 10 to 1000. Training with the default setting also converged to a network with 100% classification accuracy, but the following warning message appeared:
*** WARNING Error IMSLS_UNBOUNDED from imsls_d_mlff_classification_trainer.
*** Five consecutive steps of length "max_step" have been taken;
*** either the function is unbounded below, or has a finite
*** asymptote in some direction or the maximum allowable step
*** size "max_step" is too small.
In addition, the number of iterations used in each epoch was well below the default maximum (1000), and the gradients at the optimum solution for this network were not zero, as the following output from the default run shows:
STAGE I TRAINING STARTING
Stage I: Epoch 1 - Cross-Entropy Error = 5.50552 (Iterations=40) (CPU Min.=0.000260)
Stage I: Epoch 2 - Cross-Entropy Error = 5.65875 (Iterations=69) (CPU Min.=0.000260)
Stage I: Epoch 3 - Cross-Entropy Error = 4.83886 (Iterations=81) (CPU Min.=0.000260)
Stage I: Epoch 4 - Cross-Entropy Error = 5.94979 (Iterations=108) (CPU Min.=0.000521)
Stage I: Epoch 5 - Cross-Entropy Error = 5.54461 (Iterations=47) (CPU Min.=0.000260)
Stage I: Epoch 6 - Cross-Entropy Error = 6.04163 (Iterations=51) (CPU Min.=0.000260)
Stage I: Epoch 7 - Cross-Entropy Error = 5.95148 (Iterations=151) (CPU Min.=0.000521)
Stage I: Epoch 8 - Cross-Entropy Error = 5.5646 (Iterations=55) (CPU Min.=0.000260)
Stage I: Epoch 9 - Cross-Entropy Error = 5.94914 (Iterations=442) (CPU Min.=0.001563)
Stage I: Epoch 10 - Cross-Entropy Error = 5.94381 (Iterations=271) (CPU Min.=0.001302)
Stage I: Epoch 11 - Cross-Entropy Error = 5.41955 (Iterations=35) (CPU Min.=0.000000)
Stage I: Epoch 12 - Cross-Entropy Error = 6.01766 (Iterations=48) (CPU Min.=0.000260)
Stage I: Epoch 13 - Cross-Entropy Error = 4.20551 (Iterations=112) (CPU Min.=0.000521)
Stage I: Epoch 14 - Cross-Entropy Error = 5.95085 (Iterations=103) (CPU Min.=0.000260)
Stage I: Epoch 15 - Cross-Entropy Error = 5.9596 (Iterations=55) (CPU Min.=0.000260)
Stage I: Epoch 16 - Cross-Entropy Error = 5.96131 (Iterations=59) (CPU Min.=0.000260)
Stage I: Epoch 17 - Cross-Entropy Error = 4.83231 (Iterations=74) (CPU Min.=0.000260)
Stage I: Epoch 18 - Cross-Entropy Error = 17.1345 (Iterations=30) (CPU Min.=0.000260)
Stage I: Epoch 19 - Cross-Entropy Error = 5.95569 (Iterations=92) (CPU Min.=0.000260)
Stage I: Epoch 20 - Cross-Entropy Error = 3.15336 (Iterations=46) (CPU Min.=0.000260)
GRADIENT AT THE OPTIMUM WEIGHTS
-->g[0] = 0.675632 weight[0] = 0.075861
-->g[1] = -0.953480 weight[1] = -0.078585
-->g[2] = 1.065184 weight[2] = 2.841074
-->g[3] = 0.535531 weight[3] = -0.941049
-->g[4] = -0.019011 weight[4] = -10.638772
-->g[5] = 0.001459 weight[5] = -14.573394
-->g[6] = -0.031098 weight[6] = 6.037813
-->g[7] = -0.035305 weight[7] = 72.382073
-->g[8] = 0.011015 weight[8] = -73.564433
-->g[9] = 0.000000 weight[9] = -14.853988
-->g[10] = -0.074332 weight[10] = 2.057743
-->g[11] = 0.000522 weight[11] = -39.952435
-->g[12] = 0.063316 weight[12] = 73.164141
-->g[13] = -0.000522 weight[13] = 57.065975
-->g[14] = 1.279914 weight[14] = -0.661036
-->g[15] = -0.043097 weight[15] = -61.171894
-->g[16] = 0.003227 weight[16] = 24.236722
-->g[17] = -0.108146 weight[17] = 14.968424
-->g[18] = 0.104919 weight[18] = -39.079343
Combined, this information suggests that either the default tolerances were too high or the maximum step size was too small. As shown in the Output section below, when the maximum step size was changed to 1000, the number of iterations per epoch generally increased, the gradients fell to nearly zero, and the step-size warning disappeared.
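One rough way to compare the two runs is to look at the largest gradient magnitude in each. The sketch below copies the five largest-magnitude entries (g[0] to g[3] and g[14]) from the default-run listing above and from the Max_step = 1000 listing in the Output section; the variable names are illustrative.

```pvwave
; Largest-magnitude gradient entries, copied from the printed listings.
g_default = [0.675632, -0.953480, 1.065184, 0.535531, 1.279914]
g_bigstep = [0.002142, 0.002150, 0.001187, 0.000373, -0.001546]

PRINT, "Largest |g|, Max_step = 10:   ", MAX(ABS(g_default))
PRINT, "Largest |g|, Max_step = 1000: ", MAX(ABS(g_bigstep))
```

The largest gradient component drops from about 1.28 to about 0.002, which is consistent with the optimizer having stopped short of a stationary point under the default step limit.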
PRO mlff_classification_trainer_ex2
 
   ; Define the IMSLS_* constants.
   @CMAST_COMMON
 
   n_patterns    = 150 
   n_inputs      = 4       ; all continuous inputs 
   n_nominal     = 0       ; no nominal    
   n_continuous  = 4    
   n_outputs     = 3   
 
   act_fcn = [1, 1, 1] 
   classification = LONARR(n_patterns)
   unscaledX = FLTARR(n_patterns)
   scaledX = FLTARR(n_patterns) 
   contAtt = FLTARR(n_patterns,n_continuous) 
  
   mean = FLTARR(n_continuous)
   s = FLTARR(n_continuous) 
 
   PRINT,"*****************************************************"
   PRINT,"* IRIS CLASSIFICATION EXAMPLE                       *"
   PRINT,"*****************************************************"
 
   ;  irisData():  The raw data matrix.  This is a 2-D matrix with
   ;               150 rows and 5 columns. The last 4 columns are
   ;               the continuous input attributes and the 1st
   ;               column is the classification category
   ;               (1-3). These data contain no nominal input
   ;               attributes.
     
   irisData = STATDATA(3) 
 
   ; Set up the continuous attribute input array, contAtt(), and
   ; the network target classification array, classification(),
   ; using the above raw data matrix.
      
   classification(*) = LONG(irisData(*,0)-1)
   contAtt(*,*) = irisData(*,1:4)      
 
   ; Scale continuous input attributes using z-score method.
   FOR j=0L, n_continuous-1 DO BEGIN  
      unscaledX(*) = contAtt(*,j)
      scaledX = SCALE_FILTER(unscaledX, 2,$ 
             Return_center_spread=centerspread)   
      contAtt(*,j) = scaledX(*) 
      mean(j) = centerspread(0) 
      s(j)    = centerspread(1) 
   ENDFOR
   PRINT,"Scale Parameters: " 
   FOR j=0L, n_continuous-1 DO BEGIN
      PRINT,"Var ",STRTRIM(j+1,2),$
      "  Mean = ",STRING(mean(j),Format="(f10.5)"),$
      "  S = ",STRING(s(j),Format="(f10.5)")
   ENDFOR
 
   network = MLFF_NETWORK_INIT(n_inputs, n_outputs) 
   network = MLFF_NETWORK(network, Create_hidden=2)
   network = MLFF_NETWORK(network, /Link_all)  
 
   ; Note the following statement is for repeatable output.
   RANDOMOPT,Set=5555 
   ; Train classification network. nominalAtt is never assigned;
   ; because n_nominal = 0, the nominal argument is ignored.
   startTime = SYSTIME(1) 
   trainStats = MLFF_CLASSIFICATION_TRAINER(network,$ 
      classification, $
      nominalAtt, $
      contAtt,  $
      /Print,   $
      StageI=[20L, 150L], $ 
      Init_weights_Method=IMSLS_PRINCIPAL_COMPONENTS, $ 
      Max_step=1000.0, $ 
      Predicted_class=predicted_class, $ 
      Predicted_prob=predicted_class_prob,$ 
      Class_error=class_error)
 
   endTime = SYSTIME(1) 
   PRINT,"================================================="
   PRINT,"Minimum Cross-Entropy Error: ",$
          STRING(trainStats(0),Format="(f8.5)") 
   PRINT,"Classification Error Rate: ", $
          STRING(trainStats(5),Format="(f8.5)") 
   PRINT,"Execution Time (Sec.):     ",$
          STRING((endTime-startTime),Format="(f8.5)") 
  
   hdr  = "Predicted                           Class"
   hdr1 = "Class  |    P(0)    P(1)    P(2)  | Error"
   line= "------------------------------------------"
   PRINT,hdr
   PRINT,line
   FOR i=0L, n_patterns-1 DO BEGIN
      PRINT,STRTRIM(predicted_class(i),2),$ 
      "      | ",$
      STRING(predicted_class_prob(i,0),Format="(f8.5)"),$ 
      STRING(predicted_class_prob(i,1),Format="(f8.5)"),$
      STRING(predicted_class_prob(i,2),Format="(f8.5)"),$
      " | ",$ 
      STRING(class_error(i),Format="(f8.5)")
      ; Repeat the column headers after each block of 50 patterns.
      IF i EQ 49 OR i EQ 99 THEN BEGIN 
          PRINT,hdr 
          PRINT,hdr1
          PRINT,line 
      ENDIF
   ENDFOR 
END
Output
Note that the misclassification error rate is zero and that Stage I training halts automatically at the 16th of the 20 requested epochs, because the cross-entropy error after the 16th epoch is below the default tolerance.
*******************************************************
* IRIS CLASSIFICATION EXAMPLE                         *
*******************************************************
% SCALE_FILTER: Alert: STAT_NORMAL_UNDERFLOW
        The normal distribution is used for large degrees of freedom.  However, it has produced underflow.  Therefore, the probability is set to zero.
% SCALE_FILTER: Alert: STAT_NORMAL_UNDERFLOW
        The normal distribution is used for large degrees of freedom.  However, it has produced underflow.  Therefore, the probability is set to zero.
% SCALE_FILTER: Alert: STAT_NORMAL_UNDERFLOW
        The normal distribution is used for large degrees of freedom.  However, it has produced underflow.  Therefore, the probability is set to zero.
% SCALE_FILTER: Alert: STAT_NORMAL_UNDERFLOW
        The normal distribution is used for large degrees of freedom.  However, it has produced underflow.  Therefore, the probability is set to zero.
Scale Parameters: 
Var 1  Mean =    5.84333  S =    0.82807
Var 2  Mean =    3.05733  S =    0.43587
Var 3  Mean =    3.75800  S =    1.76530
Var 4  Mean =    1.19933  S =    0.76224
 
TRAINING PARAMETERS:
  Stage II Opt.   = 1 
  n_epochs        = 20 
  epoch_size      = 150 
  maxIterations   = 1000 
  maxFunctionEval = 1000 
  maxStep         = 1000.000000 
  functionTol     = 2.42218e-05 
  gradientTol     = 0.000345267 
  accuracy        = 0.000345267 
  n_inputs        = 4 
  n_continuous    = 4 
  n_nominal       = 0 
  n_classes       = 3 
  n_outputs       = 3 
  n_patterns      = 150 
  n_layers        = 2 
  n_perceptrons   = 5 
  n_weights       = 19 
 
STAGE I TRAINING STARTING 
Stage I: Epoch 1 - Cross-Entropy Error = 4.92196 (Iterations=73) (CPU Min.=0.001000)
Stage I: Epoch 2 - Cross-Entropy Error = 5.9593 (Iterations=119) (CPU Min.=0.001333)
Stage I: Epoch 3 - Cross-Entropy Error = 5.95304 (Iterations=193) (CPU Min.=0.002000)
Stage I: Epoch 4 - Cross-Entropy Error = 74.9248 (Iterations=30) (CPU Min.=0.000167)
Stage I: Epoch 5 - Cross-Entropy Error = 4.92197 (Iterations=102) (CPU Min.=0.001500)
Stage I: Epoch 6 - Cross-Entropy Error = 5.9556 (Iterations=197) (CPU Min.=0.003500)
Stage I: Epoch 7 - Cross-Entropy Error = 4.92196 (Iterations=101) (CPU Min.=0.001333)
Stage I: Epoch 8 - Cross-Entropy Error = 4.92197 (Iterations=90) (CPU Min.=0.001167)
Stage I: Epoch 9 - Cross-Entropy Error = 5.02341 (Iterations=476) (CPU Min.=0.006833)
Stage I: Epoch 10 - Cross-Entropy Error = 5.94134 (Iterations=161) (CPU Min.=0.001167)
Stage I: Epoch 11 - Cross-Entropy Error = 4.92203 (Iterations=82) (CPU Min.=0.001167)
Stage I: Epoch 12 - Cross-Entropy Error = 4.92198 (Iterations=88) (CPU Min.=0.001000)
Stage I: Epoch 13 - Cross-Entropy Error = 4.92195 (Iterations=97) (CPU Min.=0.001333)
Stage I: Epoch 14 - Cross-Entropy Error = 5.95137 (Iterations=178) (CPU Min.=0.001167)
Stage I: Epoch 15 - Cross-Entropy Error = 5.95102 (Iterations=130) (CPU Min.=0.001333)
Stage I: Epoch 16 - Cross-Entropy Error = 6.90229e-05 (Iterations=184) (CPU Min.=0.002500)
Stage I Training Converged at Epoch = 16 
 
 
STAGE I FINAL CROSS-ENTROPY ERROR = 0.000069 (CPU Min.=0.028500)
 
OPTIMUM WEIGHTS AFTER STAGE I TRAINING: 
weight[0] =     0.299444        weight[1] =     -0.149455       
weight[2] =     4.55187         weight[3] =     -1.94491        
weight[4] =     -13.4141        weight[5] =     -474.311        
weight[6] =     1202.39         weight[7] =     2515.17         
weight[8] =     -3152.97        weight[9] =     -176.632        
weight[10] =    106.797         weight[11] =    -1195.65        
weight[12] =    3048.21         weight[13] =    1374.08         
weight[14] =    0.991334        weight[15] =    -2747.78        
weight[16] =    1629.43         weight[17] =    622.712         
weight[18] =    -2252.16        
 
STAGE I TRAINING CONVERGED
STAGE I CROSS-ENTROPY ERROR = 0.000069 
 
0 PATTERNS OUT OF 150 INCORRECTLY CLASSIFIED
 
 
GRADIENT AT THE OPTIMUM WEIGHTS 
-->g[0] =       0.002142         weight[0] =    0.299444 
-->g[1] =       0.002150         weight[1] =    -0.149455 
-->g[2] =       0.001187         weight[2] =    4.551874 
-->g[3] =       0.000373         weight[3] =    -1.944913 
-->g[4] =       0.000000         weight[4] =    -13.414126 
-->g[5] =       0.000000         weight[5] =    -474.311462 
-->g[6] =       0.000000         weight[6] =    1202.392822 
-->g[7] =       0.000000         weight[7] =    2515.173584 
-->g[8] =       0.000001         weight[8] =    -3152.970947 
-->g[9] =       0.000000         weight[9] =    -176.631683 
-->g[10] =      -0.000000        weight[10] =   106.796753 
-->g[11] =      0.000000         weight[11] =   -1195.651855 
-->g[12] =      -0.000001        weight[12] =   3048.212646 
-->g[13] =      0.000000         weight[13] =   1374.076416 
-->g[14] =      -0.001546        weight[14] =   0.991334 
-->g[15] =      0.000000         weight[15] =   -2747.777588 
-->g[16] =      0.000003         weight[16] =   1629.434448 
-->g[17] =      -0.000003        weight[17] =   622.712341 
-->g[18] =      -0.000000        weight[18] =   -2252.163818 
 
Training Completed - leaving training engine (CPU Min.=0.028500) 
 
=================================================
Minimum Cross-Entropy Error:  0.00007
Classification Error Rate:  0.00000
Execution Time (Sec.):      1.74515
Predicted                           Class
------------------------------------------
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
0      |  1.00000 0.00000 0.00000 |  0.00000
Predicted                           Class
Class  |    P(0)    P(1)    P(2)  | Error
------------------------------------------
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 0.99999 0.00001 |  0.00001
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 0.99999 0.00001 |  0.00001
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 0.99998 0.00002 |  0.00002
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
1      |  0.00000 1.00000 0.00000 |  0.00000
Predicted                           Class
Class  |    P(0)    P(1)    P(2)  | Error
------------------------------------------
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00003 0.99997 |  0.00003
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
2      |  0.00000 0.00000 1.00000 |  0.00000
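The error rate reported in trainStats(5) can be cross-checked by comparing predicted and target classes directly. The sketch below uses two short made-up label vectors rather than the Iris results so that it runs on its own; in the example program the same comparison would be applied to predicted_class and classification.

```pvwave
; Count misclassified patterns (illustrative labels, not Iris results).
classification  = [0L, 0L, 1L, 1L, 2L, 2L]   ; target classes
predicted_class = [0L, 0L, 1L, 2L, 2L, 2L]   ; hypothetical predictions

; WHERE returns the indices that differ; n_wrong receives their count.
idx = WHERE(predicted_class NE classification, n_wrong)

n_patterns = N_ELEMENTS(classification)
PRINT, "Misclassified patterns: ", n_wrong
PRINT, "Error rate: ", FLOAT(n_wrong) / n_patterns
```

Here one of the six patterns differs, giving an error rate of 1/6. For the Iris run above, the same check should report zero misclassified patterns, matching the "0 PATTERNS OUT OF 150 INCORRECTLY CLASSIFIED" line in the training log.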