MLFF_PATTERN_CLASSIFICATION Function
Calculates classifications for trained multilayered feedforward neural networks.
Usage
result = MLFF_PATTERN_CLASSIFICATION (network, n_patterns, nominal, continuous)
Input Parameters
network—A structure of type NN_Network containing the trained feedforward network. For more details, see the MLFF_NETWORK Function.
n_patterns—A scalar long value indicating the number of patterns to classify.
nominal—Array of size n_patterns by n_nominal containing the encoded nominal input variables, where n_nominal is the number of nominal input columns (with binary encoding, one column for each class of each nominal attribute). If n_nominal = 0, this argument is ignored.
continuous—Array of size n_patterns by n_continuous containing values for the continuous and scaled ordinal input variables, where n_continuous is the number of continuous input columns. If n_continuous = 0, this argument is ignored.
Returned Value
result—An array of size n_patterns by n_classes containing the predicted class probabilities for each input pattern, where n_classes is the number of possible target classifications. For binary classification, n_classes = 2; otherwise, n_classes = network.n_outputs.
Input Keywords
Logistictable—If present and nonzero, all logistic activation functions are calculated using the table lookup approximation. This is needed only when the network was trained with this option and Stage II training was bypassed. If Stage II training was not bypassed, the weights come from the optimum network found in Stage II, which never uses the table lookup approximation to calculate logistic activations. By default, logistic activations are calculated without the table lookup approximation.
Output Keywords
Pred_class—An array of size n_patterns containing the predicted classification for each pattern.
Discussion
MLFF_PATTERN_CLASSIFICATION calculates classification probabilities from a previously trained multilayered feedforward neural network, using the same network structure and scaling applied during training. The NN_Network structure describes the network that was originally trained and also stores its weights, the key output of training, which this function uses as input.
In addition, two two-dimensional arrays, nominal and continuous, supply the values of the nominal and continuous attributes used as network inputs. Optionally, the function also returns the predicted classifications in Pred_class. The predicted classification for a pattern is the target class with the highest probability in the returned array.
MLFF_PATTERN_CLASSIFICATION returns classification probabilities for the network input patterns.
Pattern Classification Attributes
Neural network classification inputs consist of the following types of attributes:
1. nominal attributes
2. continuous attributes, including ordinal attributes encoded to cumulative percentages
The first type contains the encoding of any nominal input attributes. If binary encoding is used, this encoding creates a column of zeros and ones for each class value of every nominal attribute. The UNSUPERVISED_NOMINAL_FILTER Function can be used for this encoding.
When only one nominal attribute is used for input, the number of binary-encoded columns equals the number of classes for that attribute. When there are several nominal attributes, each one is associated with several columns, one for each of its classes. A column value is one if that class is assigned to the pattern and zero otherwise.
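As a rough illustration of this binary encoding, the plain-Python sketch below builds the indicator rows for a single nominal attribute. It is not the UNSUPERVISED_NOMINAL_FILTER API: the helper binary_encode is hypothetical, and it assumes class codes run from 1 to n_classes (Example 2 adds 1 to the raw codes before filtering, which suggests this convention).

```python
def binary_encode(codes, n_classes):
    """Return one row of 0/1 indicators per pattern.

    Hypothetical helper: codes are assumed to be class numbers
    1..n_classes, and column k is 1 when the pattern belongs to
    class k + 1.
    """
    rows = []
    for c in codes:
        if not 1 <= c <= n_classes:
            raise ValueError(f"class code {c} outside 1..{n_classes}")
        row = [0] * n_classes
        row[c - 1] = 1  # exactly one indicator set per pattern
        rows.append(row)
    return rows


# One nominal attribute with three classes: 1 = red, 2 = green, 3 = blue.
for row in binary_encode([3, 1, 2, 1], 3):
    print(row)
```

With several nominal attributes, the indicator rows for each attribute are placed side by side, so every pattern contributes one row of n_nominal zeros and ones.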
Consider an example with one nominal variable that has two classes, male and female, and the following five patterns: male, male, female, male, female. With binary encoding (first column male, second column female), the following 5 by 2 matrix is sent to MLFF_PATTERN_CLASSIFICATION to request classification probabilities for these patterns:
1 0
1 0
0 1
1 0
0 1
The second category of input attributes corresponds to continuous attributes. They are passed to this classification function via the floating point array continuous. The number of rows in this matrix is n_patterns, and the number of columns is n_continuous, corresponding to the number of continuous input attributes.
Ordinal input attributes, if used, are typically encoded to cumulative percentages. Since these are floating point values, they are placed into a column of the continuous array and n_continuous is set equal to the number of columns in this array.
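A minimal sketch of one way to compute cumulative percentages for an ordinal attribute follows. It assumes each value is replaced by the percentage of observations whose ordinal level is less than or equal to it; the helper cumulative_percentages is hypothetical, and the exact encoding used by the IMSL ordinal filtering routines may differ, for example in how ties are treated.

```python
from collections import Counter


def cumulative_percentages(values):
    """Map each ordinal value to the percentage of observations <= it.

    Assumed definition for illustration only; the IMSL encoding may
    differ in detail.
    """
    n = len(values)
    counts = Counter(values)
    running = 0
    table = {}
    for level in sorted(counts):  # ordinal levels in ascending order
        running += counts[level]
        table[level] = 100.0 * running / n
    return [table[v] for v in values]


# Ordinal ratings 1 (low) .. 3 (high) for five patterns.
print(cumulative_percentages([1, 3, 2, 2, 3]))
```

Each encoded value is a floating-point number, so the resulting list becomes one column of the continuous array.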
In some cases, one of these types of input attributes may not exist. In that case, either n_nominal = 0 or n_continuous = 0, and the corresponding input array is ignored; the examples below pass 0L in its place.
Network Configuration
The configuration of the network consists of a description of the number of perceptrons for each layer, the number of hidden layers, the number of inputs and outputs, and a description of the linkages among the perceptrons. This description is passed into this function through the structure NN_Network; see the MLFF_NETWORK Function. For binary problems there is only a single output, since P(class = 0) = 1 – P(class = 1). For other classification problems, n_outputs = n_classes, and P(class = j) is given by column j + 1 of the returned array, that is, result(*, j).
Classification Probabilities
Classification probabilities are calculated from the input attributes, network structure and weights provided in network.
Classification probabilities are returned in a two-dimensional array, result, with n_patterns rows and n_classes columns. The values in the ith column are estimated probabilities for class = (i – 1).
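The relationship between the returned probabilities and Pred_class can be sketched in plain Python: the predicted class is the zero-based column index of each row's largest probability. The helper predicted_classes is hypothetical, and breaking ties toward the lower class index is an assumption of this sketch, not documented behavior.

```python
def predicted_classes(class_prob):
    """Return the zero-based index of the largest probability in each row.

    class_prob holds one row per pattern; column j is the estimated
    P(class = j). max() keeps the first maximum it meets, so ties go
    to the lower class index (an assumption for this sketch).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in class_prob]


probs = [[1.0, 0.0, 0.0],
         [0.000002, 0.999998, 0.0],
         [0.1, 0.2, 0.7]]
print(predicted_classes(probs))
```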
Example 1
Fisher’s (1936) Iris data is often used for benchmarking discriminant analysis and classification solutions. It is part of the IMSL data sets and consists of the following continuous input attributes and classification target:
Continuous Attributes — X1 (sepal length), X2 (sepal width), X3 (petal length), and X4 (petal width)
Classification Target (Iris Type) — Setosa, Versicolour or Virginica.
The input attributes were scaled to z-scores using the SCALE_FILTER Function. The hidden layer contained only two perceptrons, and the output layer consisted of three perceptrons, one for each classification target.
Example 2 of the MLFF_CLASSIFICATION_TRAINER Function used the following network structure for the 150 patterns in these data:
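The z-score scaling can be sketched in plain Python with the standard statistics module. This is not SCALE_FILTER itself; the helper z_scores is hypothetical, and using the sample (n − 1) standard deviation is an assumption, one that agrees with the printed scale parameters (S = 0.82807 for sepal length matches the sample standard deviation of Fisher's data).

```python
import statistics


def z_scores(column):
    """Scale one column to z-scores using its mean and sample std. dev.

    Returns the scaled values plus (center, spread), analogous to the
    values the PV-WAVE example reads from Return_center_spread.
    """
    center = statistics.mean(column)
    spread = statistics.stdev(column)  # sample (n - 1) standard deviation
    return [(x - center) / spread for x in column], (center, spread)


scaled, (center, spread) = z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(center, round(spread, 5))
```

The same (center, spread) pair must be reused when scaling new patterns before classification, so the network sees inputs on the scale it was trained on.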
 
Figure 14-13: A 2-layer, Classification Network with 4 Inputs 5 Perceptrons and a Target Classification with 3 Classes
MLFF_CLASSIFICATION_TRAINER found the following 19 weights for this network:
W1 = -0.109866  W2 = -0.0534655  W3 = 4.92944   W4 = -2.04734
W5 = 10.2339    W6 = -1495.09    W7 = 3336.49   W8 =  7372.98
W9 = -9143.53   W10 = 48.8937    W11 = 240.958  W12 = -3386.21
W13 = 8904.6    W14 = 3339.1     W15 = 0.874638 W16 = -7978.42
W17 = 4586.22   W18 = 1931.89    W19 = -6518.14
The association of these weights with the calculation of the potentials for each perceptron is described in the following table:
 
Association of Network Weights with Perceptron Calculations

Perceptron   Potential                                Activation
H1,1         W15 + X1·W1 + X2·W2 + X3·W3 + X4·W4      LOGISTIC
H1,2         W16 + X1·W5 + X2·W6 + X3·W7 + X4·W8      LOGISTIC
H2,1         W17 + H1,1·W9 + H1,2·W10                 SOFTMAX
H2,2         W18 + H1,1·W11 + H1,2·W12                SOFTMAX
H2,3         W19 + H1,1·W13 + H1,2·W14                SOFTMAX
The potential of each perceptron is passed through its assigned activation function. In this example, the default activations were used, i.e., logistic for the hidden perceptrons H1,1 and H1,2 and softmax for the output perceptrons H2,1, H2,2, and H2,3.
Note that in this example the trained network, including its weights, is restored from the save file written by Example 2 of the MLFF_CLASSIFICATION_TRAINER Function; the file name is formed from iris_classification_, the test_arch_str argument, and the .sav extension. The restored NN_Network structure, network, is passed directly to MLFF_PATTERN_CLASSIFICATION.
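To make the weight table concrete, the plain-Python sketch below evaluates the network by hand for three iris patterns, using the 19 weights and the scale parameters printed in this example. It assumes the table gives the exact weight association, that the printed (rounded) values are accurate enough, and that rows 1, 51, and 101 of Fisher's data in its usual order are the first Setosa, Versicolour, and Virginica patterns. It illustrates the calculation only; it is not how MLFF_PATTERN_CLASSIFICATION is implemented.

```python
import math

# Weights W1..W19 as printed; index 0 is a placeholder so W[k] is Wk.
W = [None,
     -0.109866, -0.0534655, 4.92944, -2.04734,   # W1..W4
     10.2339, -1495.09, 3336.49, 7372.98,        # W5..W8
     -9143.53, 48.8937, 240.958, -3386.21,       # W9..W12
     8904.6, 3339.1, 0.874638, -7978.42,         # W13..W16
     4586.22, 1931.89, -6518.14]                 # W17..W19

# (mean, S) for X1..X4, as printed in the example output.
SCALE = [(5.84333, 0.82807), (3.05733, 0.43587),
         (3.75800, 1.76530), (1.19933, 0.76224)]


def logistic(z):
    # Stable form: never calls exp() on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(zs):
    m = max(zs)  # shift by the maximum to avoid overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(raw):
    """Return [P(class=0), P(class=1), P(class=2)] for unscaled inputs."""
    x1, x2, x3, x4 = [(v - c) / s for v, (c, s) in zip(raw, SCALE)]
    h11 = logistic(W[15] + x1 * W[1] + x2 * W[2] + x3 * W[3] + x4 * W[4])
    h12 = logistic(W[16] + x1 * W[5] + x2 * W[6] + x3 * W[7] + x4 * W[8])
    return softmax([W[17] + h11 * W[9] + h12 * W[10],
                    W[18] + h11 * W[11] + h12 * W[12],
                    W[19] + h11 * W[13] + h12 * W[14]])


# Rows 1, 51 and 101 of Fisher's iris data, one per species.
samples = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
for raw in samples:
    probs = classify(raw)
    print(probs.index(max(probs)), [round(p, 4) for p in probs])
```

Each pattern's largest probability should fall in the column for its species (0, 1, and 2), consistent with the example output below.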
PRO mlff_pattern_classification_ex1, test_arch_str  
 
   @CMAST_COMMON
 
   n_patterns    =150 
   n_inputs      =4    ; four inputs, all continuous  
   n_nominal     =0    ; no nominal input attributes  
   n_continuous  =4    ; four continuous input attributes  
   n_outputs     =3    ; total number of output perceptrons  
   act_fcn = [1, 1, 1]
   classification = LONARR(n_patterns) 
   unscaledX = FLTARR(n_patterns)
   scaledX   = FLTARR(n_patterns)
   contAtt   = FLTARR(n_patterns,n_continuous) 
   mean = FLTARR(n_continuous)
   s    = FLTARR(n_continuous)
 
   PRINT,"***************************************************"
   PRINT," IRIS CLASSIFICATION EXAMPLE-PATTERN CLASSIFICATION"
   PRINT,"***************************************************" 
   irisData = STATDATA(3) 
 
   ; 
   ; Set up the continuous attribute input array, contAtt(),
   ; and the network target classification array,  
   ; classification(), using the above raw data matrix. 
   classification(*) = LONG(irisData(*,0)-1)
   contAtt(*,*) = irisData(*,1:4)       
 
   ; Scale continuous input attributes using z-score method.
   FOR j=0L, n_continuous-1 DO BEGIN  
      unscaledX(*) = contAtt(*,j); 
      scaledX = SCALE_FILTER(unscaledX, 2,$ 
             Return_center_spread=centerspread)   
      contAtt(*,j) = scaledX(*) 
      mean(j) = centerspread(0) 
      s(j)    = centerspread(1) 
   ENDFOR
   PRINT,"Scale Parameters: " 
   FOR j=0L, n_continuous-1 DO BEGIN
      PRINT,"Var ",STRTRIM(j+1,2),$
      "  Mean = ",STRING(mean(j),Format="(f10.5)"),$
      "  S = ",STRING(s(j),Format="(f10.5)")
   ENDFOR
 
   ; Restore the saved network from 
   ; MLFF_CLASSIFICATION_TRAINER example 2
   filename = "iris_classification_"+test_arch_str+".sav"
   restore, Filename=filename
  
   ; Use pattern classification routine to classify training 
   ; patterns using trained network.  
   classProb = MLFF_PATTERN_CLASSIFICATION(network,$ 
                            n_patterns, 0L, contAtt,$  
                            Pred_class=predicted_class) 
 
   ; Print class predictions 
   prtLabel ="Predicted_Class  |  P(0)   P(1)   P(2)" 
   dashes   ="-------------------------------------------" 
 
   PRINT, prtLabel 
   PRINT, dashes 
   FOR i=0L, n_patterns-1 DO BEGIN
      PRINT,STRTRIM(predicted_class(i),2),$
      "                | ",$
      STRING(classProb(i,0),Format="(f6.4)")," ",$
      STRING(classProb(i,1),Format="(f6.4)")," ",$
      STRING(classProb(i,2),Format="(f6.4)") 
      IF (i EQ 49 OR i EQ 99) THEN BEGIN 
         PRINT,prtLabel 
         PRINT,dashes 
      ENDIF
   ENDFOR
END
Output
The output for this example reproduces the 100% classification accuracy found during network training. For details, see Example 2 of the MLFF_CLASSIFICATION_TRAINER Function.
******************************************************
 IRIS CLASSIFICATION EXAMPLE - PATTERN CLASSIFICATION 
******************************************************
 
Scale Parameters: 
Var 1  Mean =    5.84333  S =    0.82807
Var 2  Mean =    3.05733  S =    0.43587
Var 3  Mean =    3.75800  S =    1.76530
Var 4  Mean =    1.19933  S =    0.76224
 
Predicted_Class  |  P(0)   P(1)   P(2)
-------------------------------------------
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
0                | 1.0000 0.0000 0.0000
Predicted_Class  |  P(0)   P(1)   P(2)
-------------------------------------------
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
1                | 0.0000 1.0000 0.0000
Predicted_Class  |  P(0)   P(1)   P(2)
-------------------------------------------
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
2                | 0.0000 0.0000 1.0000
Example 2
Pattern classification is often used for pattern recognition, including playing simple games such as tic-tac-toe. The University of California at Irvine maintains a repository of data mining data, http://kdd.ics.uci.edu/. One of its data sets, donated by David Aha, consists of 958 tic-tac-toe board positions. See http://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame for access to the actual data.
Each of the 958 patterns is described by nine nominal input attributes and one classification target. The nine nominal input attributes are the nine board positions in the game. Each has three classes: occupied by X, occupied by O, or vacant.
The target class is binary. A value of one indicates that X has one of the eight possible three-in-a-row wins; a value of zero indicates that X does not have a winning position. Of the 958 patterns, 65.3% have class = 1.
The nine nominal input attributes are mapped into 27 binary-encoded columns, three for each nominal attribute, which form the 27 network inputs. In this example, a neural network with one hidden layer containing eight perceptrons was found to provide 100% classification accuracy. This configuration is illustrated in Figure 14-14: A 2-layer, Binary Classification Network for Playing Tic-Tac-Toe.
 
Figure 14-14: A 2-layer, Binary Classification Network for Playing Tic-Tac-Toe
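The mapping from one board to the 27 network inputs can be sketched in plain Python. The letter codes x, o, and b (blank) and their order are hypothetical; the codes in STATDATA(10), and therefore the column order produced by UNSUPERVISED_NOMINAL_FILTER, may differ.

```python
# Hypothetical class codes for each board cell; the real data may differ.
CODES = {"x": 1, "o": 2, "b": 3}  # b = blank (vacant)


def encode_board(board):
    """Binary-encode a 9-cell board (row-major string) into 27 columns."""
    if len(board) != 9:
        raise ValueError("a tic-tac-toe board has exactly nine cells")
    row = []
    for cell in board:
        indicators = [0, 0, 0]            # three columns per position
        indicators[CODES[cell] - 1] = 1
        row.extend(indicators)
    return row


encoded = encode_board("xxxoobbbb")  # hypothetical board: X holds the top row
print(len(encoded), sum(encoded))
```

Every board yields 27 entries, exactly nine of which are one (one per position), matching the 27 network inputs.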
All hidden layer perceptrons use the default logistic activation. Since the classification target is binary, only one output perceptron, also with logistic activation, is used to calculate the probability that X does not win, i.e., P(class = 0). All logistic activations are calculated with the Logistictable keyword, which can reduce Stage I training time. Because Stage II training is bypassed, this keyword must also be used with MLFF_PATTERN_CLASSIFICATION; this is the only situation in which it is needed. Had Stage II training been part of network training, the final network weights would have been calculated without the logistic table approximation.
This structure has 27 × 8 + 8 + 9 = 233 weights: 216 input-to-hidden weights, 8 hidden-to-output weights, and 9 bias weights (one for each of the 8 hidden perceptrons and one for the output perceptron). Notably, with only 233 weights for 958 training patterns, the trained network still achieves 100% classification accuracy.
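The weight count can be checked with a small helper. The sketch assumes a single fully linked hidden layer (every input linked to every hidden perceptron, and every hidden perceptron to every output, with no direct input-to-output links) and one bias weight per hidden and output perceptron; n_weights is a hypothetical name.

```python
def n_weights(n_inputs, n_hidden, n_outputs):
    """Count weights in a fully linked network with one hidden layer."""
    links = n_inputs * n_hidden + n_hidden * n_outputs  # connection weights
    biases = n_hidden + n_outputs                       # one per perceptron
    return links + biases


print(n_weights(27, 8, 1))  # tic-tac-toe network in this example
print(n_weights(4, 2, 3))   # iris network in Example 1
```

The two calls reproduce the 233 weights stated here and the 19 weights listed for the iris network.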
Unlike Example 1 in which the network was trained previously and retrieved, this example first trains the network and then passes the network structure network into MLFF_PATTERN_CLASSIFICATION.
PRO mlff_pattern_classification_ex2
 
   n_cat         =9    ; 9 nominal input attributes  
   n_categorical =27   ; 9 Encoded = 27 categorical inputs  
   n_classes     =2    ; positive or negative 
   n_outputs     =1    ; binary classification
 
   ; Get tic-tac-toe data   
   inputData = STATDATA(10)  
 
   n_patterns = (SIZE(inputData,/Dim))(0)
   n_var      = (SIZE(inputData,/Dim))(1)
 
   classification = LONARR(n_patterns)
 
   PRINT,"" 
   PRINT,"****************************************************"
   PRINT,"* TIC-TAC-TOE BINARY CLASSIFICATION EXAMPLE        *"
   PRINT,"****************************************************"
   ; Allocate memory for categoricalATT array  
   categoricalAtt    = LONARR(n_patterns,n_categorical) 
   ; Populate categoricalAtt from inputData using binary encoding  
   nomTempIn  = LONARR(n_patterns)
 
   m=0L 
   FOR i=0L, n_cat-1 DO BEGIN 
      nomTempIn(*) = inputData(*,i)+1;  
      nomTempOut = UNSUPERVISED_NOMINAL_FILTER(nomTempIn, $
                                       N_classes=nClass)
      FOR k=0L, nClass-1 DO BEGIN 
         categoricalAtt(*,m) = nomTempOut(*,k)
         m = m+1 
      ENDFOR 
   ENDFOR 
    
 
   ; Set up the classification array, classification()  
   classification(*) = inputData(*,n_var-1); 
 
   network = MLFF_NETWORK_INIT(n_categorical, n_outputs)
   network = MLFF_NETWORK(network, Create_hidden=8) 
   network = MLFF_NETWORK(network,/Link_all) 
 
   RANDOMOPT,set=5555 
   ; Train Classification Network  
   trainStats = MLFF_CLASSIFICATION_TRAINER(network,$
                classification, categoricalAtt, 0L, $ 
                StageI=[30L, n_patterns], /NoStageII,$
                /LogisticTable, $ 
                Init_weights_method=IMSLS_EQUAL) 
 
   ; Use pattern classification routine to classify training 
   ; patterns using trained network.  This reproduces the  
   ; classifications found during training.
   classProb = MLFF_PATTERN_CLASSIFICATION(network, $
       n_patterns, categoricalAtt, 0L,$ 
       /LogisticTable, $
       Pred_class=predictedClass) 
 
   ; Printing Classification Predictions  
   PRINT,"****************************************************"
   PRINT,"Classification Minimum Cross-Entropy Error: ", $  
      STRING(trainStats(0),Format="(f8.6)") 
   PRINT,"Classification Error Rate: ",$
      STRING(trainStats(5),Format="(f8.6)")  
   PRINT,"****************************************************" 
   PRINT," "
   PRINT,"PRINTING FIRST TEN PREDICTIONS FOR EACH TARGET CLASS" 
   PRINT,"*****************************************************"
   PRINT,"        |TARGET|PREDICTED|             |            *" 
   PRINT,"PATTERN |CLASS |  CLASS  | P(class=0)  |  P(class=1)*" 
   PRINT,"*****************************************************" 
   FOR k=0L, n_classes-1 DO BEGIN
      FOR i=k*627, (k*627+10)-1 DO BEGIN
         PRINT,STRTRIM(i+1,2),"       |  ",$
               STRTRIM(classification(i),2),"   |    ",$
               STRTRIM(predictedClass(i),2),"    |  ", $
               STRING(classProb(i,0),Format="(f8.6)"),"      ",$
               STRING(classProb(i,1),Format="(f8.6)")  
      ENDFOR 
      PRINT,"" 
   ENDFOR 
END
Output
The output for this example demonstrates how MLFF_PATTERN_CLASSIFICATION reproduces the 100% classification accuracy found during network training.
*******************************************************
* TIC-TAC-TOE BINARY CLASSIFICATION EXAMPLE           *
*******************************************************
*******************************************************
Classification Minimum Cross-Entropy Error: 0.000126
Classification Error Rate: 0.000000
*******************************************************
 
PRINTING FIRST TEN PREDICTIONS FOR EACH TARGET CLASS
*******************************************************
        |TARGET|PREDICTED|             |              *
PATTERN |CLASS |  CLASS  | P(class=0)  |  P(class=1)  *
*******************************************************
1       |  1   |    1    |  0.000000      1.000000
2       |  1   |    1    |  0.000000      1.000000
3       |  1   |    1    |  0.000002      0.999998
4       |  1   |    1    |  0.000000      1.000000
5       |  1   |    1    |  0.000000      1.000000
6       |  1   |    1    |  0.000000      1.000000
7       |  1   |    1    |  0.000000      1.000000
8       |  1   |    1    |  0.000000      1.000000
9       |  1   |    1    |  0.000000      1.000000
10       |  1   |    1    |  0.000000      1.000000
 
628       |  0   |    0    |  1.000000      0.000000
629       |  0   |    0    |  1.000000      0.000000
630       |  0   |    0    |  1.000000      0.000000
631       |  0   |    0    |  1.000000      0.000000
632       |  0   |    0    |  1.000000      0.000000
633       |  0   |    0    |  1.000000      0.000000
634       |  0   |    0    |  1.000000      0.000000
635       |  0   |    0    |  1.000000      0.000000
636       |  0   |    0    |  1.000000      0.000000
637       |  0   |    0    |  1.000000      0.000000