MLFF_PATTERN_CLASSIFICATION Function

Calculates classifications for trained multilayered feedforward neural networks.

Usage

result = MLFF_PATTERN_CLASSIFICATION (network, n_patterns, nominal, continuous)

Input Parameters

network—A structure of type NN_Network containing the trained feedforward network. For more details, see the MLFF_NETWORK Function.

n_patterns—A scalar long value indicating the number of patterns to classify.

nominal—Array of size n_patterns by n_nominal containing the nominal input variables, where n_nominal is the number of nominal input attributes. If n_nominal = 0, this argument is ignored.

continuous—Array of size n_patterns by n_continuous containing values for the continuous and scaled ordinal input variables, where n_continuous is the number of continuous attributes. If n_continuous = 0, this argument is ignored.

Returned Value

result—An array of size n_patterns by n_classes containing the predicted class probabilities associated with each input pattern, where n_classes is the number of possible target classifications. n_classes = network.n_outputs for non-binary classification categories. For binary classification, n_classes = 2.

Input Keywords

Logistictable—If present and nonzero, all logistic activation functions are calculated using the table lookup approximation. This keyword is needed only when the network was trained with this option and Stage II training was bypassed. If Stage II training was not bypassed during network training, the weights are based on the optimum network from Stage II, which never uses a table lookup approximation to calculate logistic activations. By default, logistic activations are calculated without the table lookup approximation.
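To make the idea of a table lookup approximation concrete, the following Python sketch approximates the logistic function from a precomputed table. It is only an illustration: the table used internally by the library is not described here, so the grid range, the table size, and the function names below are all assumptions.

```python
import math

# Illustrative only: the grid range and resolution are assumptions,
# not the values used by the IMSL library.
Z_MIN, Z_MAX, N_POINTS = -10.0, 10.0, 2001
STEP = (Z_MAX - Z_MIN) / (N_POINTS - 1)

# Precompute logistic values once on an evenly spaced grid.
LOGISTIC_TABLE = [1.0 / (1.0 + math.exp(-(Z_MIN + i * STEP)))
                  for i in range(N_POINTS)]

def logistic_exact(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_lookup(z):
    """Approximate the logistic function using the nearest table entry."""
    z = min(max(z, Z_MIN), Z_MAX)           # clamp to the table range
    index = int(round((z - Z_MIN) / STEP))  # nearest grid point
    return LOGISTIC_TABLE[index]
```

Because the looked-up values differ slightly from the exact ones, weights fitted with one method are best evaluated with the same method, which is the situation the Logistictable keyword addresses.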

Output Keywords

Pred_class—An array of size n_patterns containing the predicted classification for each pattern.

Discussion

MLFF_PATTERN_CLASSIFICATION calculates classification probabilities from a previously trained multilayered feedforward neural network using the same network structure and scaling applied during the training. The structure NN_Network describes the network structure used to originally train the network. The weights, which are the key output from training, are used as input to this function. The weights are stored in the NN_Network structure.

In addition, two two-dimensional arrays describe the values of the nominal and continuous attributes that are used as network inputs for calculating classification probabilities. Optionally, the function also returns the predicted classifications in Pred_class. The predicted classification for a pattern is the target class with the highest probability in the returned array.

MLFF_PATTERN_CLASSIFICATION returns classification probabilities for the network input patterns.
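The relationship between the returned probabilities and the predicted classification can be sketched in a few lines of Python. This is not the library routine; the function name and the sample probabilities are hypothetical, and ties are resolved in favor of the lowest class index, which is an assumption.

```python
def predict_classes(class_prob):
    """Return, for each pattern (row), the index of the most probable class."""
    predicted = []
    for row in class_prob:
        # Zero-based class index of the largest probability in this row.
        best = max(range(len(row)), key=lambda j: row[j])
        predicted.append(best)
    return predicted

# Hypothetical probabilities for three patterns and three classes.
example_prob = [
    [0.90, 0.07, 0.03],
    [0.10, 0.15, 0.75],
    [0.20, 0.60, 0.20],
]
print(predict_classes(example_prob))  # prints [0, 2, 1]
```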

Pattern Classification Attributes

Neural network classification inputs consist of the following types of attributes:

1. nominal attributes

2. continuous attributes, including ordinal attributes encoded to cumulative percentages

The first data type contains the encoding of any nominal input attributes. If binary encoding is used, this encoding consists of creating columns of zeros and ones for each class value associated with every nominal attribute. The UNSUPERVISED_NOMINAL_FILTER Function can be used for this encoding.

When only one nominal attribute is used for input, the number of binary encoded columns is equal to the number of classes for that attribute. If more nominal attributes appear in the data, each nominal attribute is associated with several columns, one for each of its classes. Each column consists of zeros and ones: the column value is one if that class is assigned to this pattern, and zero otherwise.

Consider an example with one nominal variable that has two classes, male and female, and the following five patterns: male, male, female, male, female. With binary encoding, the following 5 by 2 matrix is sent to the pattern classification function to request classification probabilities for these patterns (the first column indicates male and the second indicates female):

1  0
1  0
0  1
1  0
0  1
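The binary encoding of the male/female example can be sketched in Python as follows. This is a stand-in for illustration, not the UNSUPERVISED_NOMINAL_FILTER routine; the function name and the column order (male first, female second) are assumptions.

```python
def binary_encode(values, classes):
    """Encode one nominal attribute as columns of zeros and ones.

    `classes` fixes the column order; that order is an assumption here.
    """
    return [[1 if value == c else 0 for c in classes] for value in values]

patterns = ["male", "male", "female", "male", "female"]
encoded = binary_encode(patterns, classes=["male", "female"])
for row in encoded:
    print(row)
```

Each row contains exactly one 1, in the column of the class assigned to that pattern.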

The second category of input attributes corresponds to continuous attributes. They are passed to this classification function via the floating point array continuous. The number of rows in this matrix is n_patterns, and the number of columns is n_continuous, corresponding to the number of continuous input attributes.

Ordinal input attributes, if used, are typically encoded to cumulative percentages. Since these are floating point values, they are placed into a column of the continuous array and n_continuous is set equal to the number of columns in this array.
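The text does not spell out how the cumulative percentages are formed. The Python sketch below assumes that each ordinal class is replaced by the percentage of patterns whose value is less than or equal to that class. The function name and the sample data are hypothetical, and this is not the library's ordinal filter.

```python
from collections import Counter

def cumulative_percent_encode(values):
    """Map each ordinal value to the percentage of patterns <= that value."""
    counts = Counter(values)
    total = len(values)
    running = 0
    encoding = {}
    for level in sorted(counts):
        running += counts[level]
        encoding[level] = 100.0 * running / total
    return [encoding[v] for v in values]

# Hypothetical ordinal attribute with levels 1 < 2 < 3.
ordinal = [1, 2, 2, 3, 3, 3, 3, 3, 1, 2]
print(cumulative_percent_encode(ordinal))
```

The encoded values are floating point, so they can occupy one column of the continuous array alongside the other continuous attributes.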

In some cases, one of these types of input attributes may not exist. In that case, either n_nominal = 0 or n_continuous = 0, and the corresponding input matrix is ignored.

Network Configuration

The configuration of the network consists of a description of the number of perceptrons for each layer, the number of hidden layers, the number of inputs and outputs, and a description of the linkages among the perceptrons. This description is passed into this classification routine through the structure NN_Network. See the MLFF_NETWORK Function. For binary problems there is only a single output, since the probability P(class = 0) is equal to 1 – P(class = 1). For other classification problems, however, n_outputs = n_classes, and P(class = j) is given by column j + 1 of the returned array of classification probabilities.

Classification Probabilities

Classification probabilities are calculated from the input attributes, network structure and weights provided in network.

Classification probabilities are returned in a two-dimensional array with n_patterns rows and n_classes columns. The values in the ith column are the estimated probabilities for class = (i – 1).
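For a binary target, the two columns of the probability array follow from a single probability because P(class = 0) = 1 – P(class = 1). The Python sketch below illustrates this bookkeeping. It assumes the single value supplied is P(class = 1); if a network output instead represents P(class = 0), the two columns simply swap roles. The function name is hypothetical.

```python
def expand_binary_output(p_class1):
    """Build rows [P(class=0), P(class=1)] from a list of P(class=1) values."""
    return [[1.0 - p, p] for p in p_class1]

# Hypothetical probabilities for three patterns.
rows = expand_binary_output([0.25, 1.0, 0.5])
for row in rows:
    print(row)
```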

Example 1

Fisher’s (1936) Iris data is often used for benchmarking discriminant analysis and classification solutions. It is part of the IMSL data sets and consists of the following continuous input attributes and classification target:

Continuous Attributes — X1 (sepal length), X2 (sepal width), X3 (petal length), and X4 (petal width)

Classification Target (Iris Type) — Setosa, Versicolour or Virginica.

The input attributes were scaled to z-scores using the SCALE_FILTER Function. The hidden layer contained only 2 perceptrons and the output layer consisted of three perceptrons, one for each classification target.
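The z-score scaling step can be sketched in Python as shown below. This is not the SCALE_FILTER routine; the function name and the sample values are hypothetical. Using the sample standard deviation (dividing by n - 1) is an assumption, although it appears consistent with the spread values printed in the output of this example.

```python
import statistics

def z_scores(values):
    """Scale values to z-scores using the mean and sample standard deviation."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)   # divides by n - 1
    scaled = [(v - center) / spread for v in values]
    return scaled, center, spread

# Hypothetical measurements, not the Iris data.
scaled, center, spread = z_scores([4.9, 5.1, 5.8, 6.4, 7.0])
print(center, spread)
```

The center and spread must be retained, because new patterns have to be scaled with the same values that were used for the training data.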

Example 2 for MLFF_CLASSIFICATION_TRAINER Function used the following network structure for the 150 patterns in these data:

Figure 14-13: A 2-layer, Classification Network with 4 Inputs 5 Perceptrons and a Target Classification with 3 Classes

MLFF_CLASSIFICATION_TRAINER found the following 19 weights for this network:

W1 = -0.109866  W2 = -0.0534655  W3 = 4.92944   W4 = -2.04734

W5 = 10.2339    W6 = -1495.09    W7 = 3336.49   W8 =  7372.98

W9 = -9143.53   W10 = 48.8937    W11 = 240.958  W12 = -3386.21

W13 = 8904.6    W14 = 3339.1     W15 = 0.874638 W16 = -7978.42

W17 = 4586.22   W18 = 1931.89    W19 = -6518.14

The association of these weights with the calculation of the potentials for each perceptron is described in the following table:

Association of Network Weights with Perceptron Calculations
Perceptron	Potential	Activation
H1,1	W15 + X1·W1 + X2·W2 + X3·W3 + X4·W4	LOGISTIC
H1,2	W16 + X1·W5 + X2·W6 + X3·W7 + X4·W8	LOGISTIC
H2,1	W17 + H1,1·W9 + H1,2·W10	SOFTMAX
H2,2	W18 + H1,1·W11 + H1,2·W12	SOFTMAX
H2,3	W19 + H1,1·W13 + H1,2·W14	SOFTMAX

The potential calculated for each perceptron is passed through the assigned activation function. In this example, default activations were used, that is, logistic for H1,1 and H1,2 and softmax for the output perceptrons H2,1, H2,2 and H2,3.
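The calculation described by the table can be traced with a short Python sketch. It follows the weight associations in the table literally and uses the 19 weights listed above; it is an illustration of the forward pass, not the library implementation. The helper names and the input pattern are hypothetical, and the logistic and softmax functions are written in numerically stable form because several weights are very large.

```python
import math

# Weights W1..W19 as listed above; index 0 is unused so W[k] matches Wk.
W = [None,
     -0.109866, -0.0534655, 4.92944, -2.04734,
     10.2339, -1495.09, 3336.49, 7372.98,
     -9143.53, 48.8937, 240.958, -3386.21,
     8904.6, 3339.1, 0.874638, -7978.42,
     4586.22, 1931.89, -6518.14]

def logistic(z):
    """Numerically stable logistic activation."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(potentials):
    """Softmax activation, shifted by the maximum to avoid overflow."""
    top = max(potentials)
    exps = [math.exp(p - top) for p in potentials]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x):
    """Return class probabilities for one pattern of four scaled inputs."""
    x1, x2, x3, x4 = x
    h11 = logistic(W[15] + x1 * W[1] + x2 * W[2] + x3 * W[3] + x4 * W[4])
    h12 = logistic(W[16] + x1 * W[5] + x2 * W[6] + x3 * W[7] + x4 * W[8])
    outputs = [W[17] + h11 * W[9] + h12 * W[10],
               W[18] + h11 * W[11] + h12 * W[12],
               W[19] + h11 * W[13] + h12 * W[14]]
    return softmax(outputs)

# Hypothetical z-scored input pattern (not taken from the data set).
probs = classify([-0.90, 1.02, -1.34, -1.31])
print(probs.index(max(probs)), [round(p, 4) for p in probs])
```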

Note that in this case the trained network was restored from a file saved by Example 2 of the MLFF_CLASSIFICATION_TRAINER Function (the code below builds the file name as iris_classification_ followed by test_arch_str and the extension .sav). The restored weights are passed directly to MLFF_PATTERN_CLASSIFICATION in the NN_Network structure.

PRO mlff_pattern_classification_ex1, test_arch_str
   @CMAST_COMMON
   n_patterns    =150
   n_inputs      =4    ; four inputs, all continuous
   n_nominal     =0    ; no nominal input attributes
   n_continuous  =4    ; four continuous input attributes
   n_outputs     =3    ; total number of output perceptrons
   act_fcn = [1, 1, 1]
   classification = LONARR(n_patterns)
   unscaledX = FLTARR(n_patterns)
   scaledX   = FLTARR(n_patterns)
   contAtt   = FLTARR(n_patterns,n_continuous)
   mean = FLTARR(n_continuous)
   s    = FLTARR(n_continuous)
   PRINT,"***************************************************"
   PRINT," IRIS CLASSIFICATION EXAMPLE-PATTERN CLASSIFICATION"
   PRINT,"***************************************************"
   irisData = STATDATA(3)

   ; Set up the continuous attribute input array, contAtt(),
   ; and the network target classification array,
   ; classification(), using the above raw data matrix.
   classification(*) = LONG(irisData(*,0)-1)
   contAtt(*,*) = irisData(*,1:4)

   ; Scale continuous input attributes using z-score method.
   FOR j=0L, n_continuous-1 DO BEGIN
      unscaledX(*) = contAtt(*,j)
      scaledX = SCALE_FILTER(unscaledX, 2,$
             Return_center_spread=centerspread)
      contAtt(*,j) = scaledX(*)
      mean(j) = centerspread(0)
      s(j)    = centerspread(1)
   ENDFOR

   ; Print the center (mean) and spread (S) used for each attribute.
   PRINT,"Scale Parameters: "
   FOR j=0L, n_continuous-1 DO BEGIN
      PRINT,"Var ",STRTRIM(j+1,2),$
      "  Mean = ",STRING(mean(j),Format="(f10.5)"),$
      "  S = ",STRING(s(j),Format="(f10.5)")
   ENDFOR

   ; Restore the saved network from
   ; MLFF_CLASSIFICATION_TRAINER example 2
   filename = "iris_classification_"+test_arch_str+".sav"
   restore, Filename=filename

   ; Use pattern classification routine to classify training
   ; patterns using trained network.
   classProb = MLFF_PATTERN_CLASSIFICATION(network,$
                            n_patterns, 0L, contAtt,$
                            Pred_class=predicted_class)

   ; Print class predictions
   prtLabel ="Predicted_Class  |  P(0)   P(1)   P(2)"
   dashes   ="-------------------------------------------"
   PRINT, prtLabel
   PRINT, dashes
   FOR i=0L, n_patterns-1 DO BEGIN
      PRINT,STRTRIM(predicted_class(i),2),$
      "                | ",$
      STRING(classProb(i,0),Format="(f6.4)")," ",$
      STRING(classProb(i,1),Format="(f6.4)")," ",$
      STRING(classProb(i,2),Format="(f6.4)")
      ; Repeat the column header after each block of 50 patterns.
      IF (i EQ 49 OR i EQ 99) THEN BEGIN
         PRINT,prtLabel
         PRINT,dashes
      ENDIF
   ENDFOR
END

Output

The output for this example reproduces the 100% classification accuracy found during network training. For details, see Example 2 of the MLFF_CLASSIFICATION_TRAINER Function.

******************************************************

 IRIS CLASSIFICATION EXAMPLE - PATTERN CLASSIFICATION

******************************************************

Scale Parameters:

Var 1  Mean =    5.84333  S =    0.82807

Var 2  Mean =    3.05733  S =    0.43587

Var 3  Mean =    3.75800  S =    1.76530

Var 4  Mean =    1.19933  S =    0.76224

Predicted_Class  |  P(0)   P(1)   P(2)

-------------------------------------------

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

0                | 1.0000 0.0000 0.0000

Predicted_Class  |  P(0)   P(1)   P(2)

-------------------------------------------

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

1                | 0.0000 1.0000 0.0000

Predicted_Class  |  P(0)   P(1)   P(2)

-------------------------------------------

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

2                | 0.0000 0.0000 1.0000

Example 2

Pattern classification is often used for pattern recognition, including playing simple games such as tic-tac-toe. The University of California at Irvine maintains a repository of data mining data, http://kdd.ics.uci.edu/. One data set in this repository consists of 958 patterns for board positions in tic-tac-toe, donated by David Aha. See http://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame for access to the actual data.

Each of the 958 patterns is described by nine nominal input attributes and one classification target. The nine nominal input attributes are the nine board positions in the game. Each has three classifications: X occupies the position, O occupies the position and vacant.

The target class is binary. A value of one indicates that the X player has one of eight possible wins in the next move. A value of zero indicates that this player does not have a winning position. 65.3% of the 958 patterns have a class = 1.

The nine nominal input attributes are mapped into 27 binary encoded columns, three for each of the nominal attributes. This makes a total of 27 input columns for the network. In this example, a neural network with one hidden layer containing eight perceptrons was found to provide 100% classification accuracy. This configuration is illustrated in Figure 14-14.

Figure 14-14: A 2-layer, Binary Classification Network for Playing Tic-Tac-Toe
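The mapping from nine board positions to 27 network inputs can be sketched in Python as follows. The function name, the symbols used for the three classes, and their column order are assumptions for illustration; the example program itself performs this step with UNSUPERVISED_NOMINAL_FILTER.

```python
def encode_board(board, classes=("x", "o", "b")):
    """Binary-encode nine board positions into 9 * 3 = 27 columns.

    The symbols and their column order are assumptions for illustration.
    """
    columns = []
    for square in board:
        # Three columns per square, exactly one of which is set to 1.
        columns.extend(1 if square == c else 0 for c in classes)
    return columns

# Hypothetical board, listed square by square.
row = encode_board(["x", "x", "x", "o", "o", "b", "b", "b", "o"])
print(len(row), sum(row))  # prints 27 9
```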

All hidden layer perceptrons used the default activation, logistic, and since the classification target is binary only one perceptron with logistic activation is used to calculate the probability of a loss for X, i.e., P(class = 0). All logistic activations are calculated using the Logistictable keyword, which can reduce Stage I training time. Since Stage II training is bypassed, this option must also be used with the MLFF_PATTERN_CLASSIFICATION routine. This is the only time this option is used. If Stage II training was part of the network training, the final network weights would have been calculated without using the logistic table to approximate the calculations.

This structure results in a network with 27 × 8 + 8 + 9 = 233 weights: 27 × 8 links from the inputs to the hidden layer, 8 links from the hidden layer to the output perceptron, and 9 bias weights. It is surprising that with so small a number of weights relative to the number of training patterns, the trained network achieves 100% classification accuracy.

Unlike Example 1, in which the network was trained previously and retrieved, this example first trains the network and then passes the trained network structure, network, into MLFF_PATTERN_CLASSIFICATION.
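The weight count can be checked with a small Python sketch. It assumes a fully linked feedforward network with one bias weight per hidden and output perceptron; the function name is hypothetical.

```python
def count_weights(layer_sizes):
    """Count link weights plus one bias weight per non-input perceptron
    for a fully linked feedforward network."""
    links = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return links + biases

print(count_weights([27, 8, 1]))  # prints 233
print(count_weights([4, 2, 3]))   # prints 19
```

The second call corresponds to the Iris network of Example 1, which has 19 weights.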

PRO mlff_pattern_classification_ex2
   n_cat         =9    ; 9 nominal input attributes
   n_categorical =27   ; 9 attributes x 3 classes = 27 encoded inputs
   n_classes     =2    ; positive or negative
   n_outputs     =1    ; binary classification

   ; Get tic-tac-toe data
   inputData = STATDATA(10)
   n_patterns = (SIZE(inputData,/Dim))(0)
   n_var      = (SIZE(inputData,/Dim))(1)
   classification = LONARR(n_patterns)
   PRINT,""
   PRINT,"****************************************************"
   PRINT,"* TIC-TAC-TOE BINARY CLASSIFICATION EXAMPLE        *"
   PRINT,"****************************************************"

   ; Allocate memory for categoricalAtt array
   categoricalAtt    = LONARR(n_patterns,n_categorical)

   ; Populate categoricalAtt from inputData using binary encoding
   nomTempIn  = LONARR(n_patterns)
   m=0L
   FOR i=0L, n_cat-1 DO BEGIN
      nomTempIn(*) = inputData(*,i)+1
      nomTempOut = UNSUPERVISED_NOMINAL_FILTER(nomTempIn, $
                                       N_classes=nClass)
      ; Copy the encoded columns for attribute i into the next
      ; free columns of categoricalAtt.
      FOR k=0L, nClass-1 DO BEGIN
         categoricalAtt(*,m) = nomTempOut(*,k)
         m = m+1
      ENDFOR
   ENDFOR

   ; Set up the classification array, classification()
   classification(*) = inputData(*,n_var-1)

   ; Build a fully linked network with one hidden layer
   ; of 8 perceptrons.
   network = MLFF_NETWORK_INIT(n_categorical, n_outputs)
   network = MLFF_NETWORK(network, Create_hidden=8)
   network = MLFF_NETWORK(network,/Link_all)
   RANDOMOPT,set=5555

   ; Train Classification Network
   trainStats = MLFF_CLASSIFICATION_TRAINER(network,$
                classification, categoricalAtt, 0L, $
                StageI=[30L, n_patterns], /NoStageII,$
                /LogisticTable, $
                Init_weights_method=IMSLS_EQUAL)

   ; Use pattern classification routine to classify training
   ; patterns using trained network.  This reproduces the
   ; training results; the predicted classes are returned
   ; in predictedClass().
   classProb = MLFF_PATTERN_CLASSIFICATION(network, $
       n_patterns, categoricalAtt, 0L,$
       /LogisticTable, $
       Pred_class=predictedClass)

   ; Printing Classification Predictions
   PRINT,"****************************************************"
   PRINT,"Classification Minimum Cross-Entropy Error: ", $
      STRING(trainStats(0),Format="(f8.6)")
   PRINT,"Classification Error Rate: ",$
      STRING(trainStats(5),Format="(f8.6)")
   PRINT,"****************************************************"
   PRINT," "
   PRINT,"PRINTING FIRST TEN PREDICTIONS FOR EACH TARGET CLASS"
   PRINT,"*****************************************************"
   PRINT,"        |TARGET|PREDICTED|             |            *"
   PRINT,"PATTERN |CLASS |  CLASS  | P(class=0)  |  P(class=1)*"
   PRINT,"*****************************************************"
   FOR k=0L, n_classes-1 DO BEGIN
      FOR i=k*627, (k*627+10)-1 DO BEGIN
         PRINT,STRTRIM(i+1,2),"       |  ",$
               STRTRIM(classification(i),2),"   |    ",$
               STRTRIM(predictedClass(i),2),"    |  ", $
               STRING(classProb(i,0),Format="(f8.6)"),"      ",$
               STRING(classProb(i,1),Format="(f8.6)")
      ENDFOR
      PRINT,""
   ENDFOR
END

Output

The output for this example demonstrates how MLFF_PATTERN_CLASSIFICATION reproduces the 100% classification accuracy found during network training.

*******************************************************
* TIC-TAC-TOE BINARY CLASSIFICATION EXAMPLE           *
*******************************************************
*******************************************************
Classification Minimum Cross-Entropy Error: 0.000126
Classification Error Rate: 0.000000
*******************************************************
PRINTING FIRST TEN PREDICTIONS FOR EACH TARGET CLASS
*******************************************************
        |TARGET|PREDICTED|             |              *
PATTERN |CLASS |  CLASS  | P(class=0)  |  P(class=1)  *
*******************************************************
1       |  1   |    1    |  0.000000      1.000000
2       |  1   |    1    |  0.000000      1.000000
3       |  1   |    1    |  0.000002      0.999998
4       |  1   |    1    |  0.000000      1.000000
5       |  1   |    1    |  0.000000      1.000000
6       |  1   |    1    |  0.000000      1.000000
7       |  1   |    1    |  0.000000      1.000000
8       |  1   |    1    |  0.000000      1.000000
9       |  1   |    1    |  0.000000      1.000000
10       |  1   |    1    |  0.000000      1.000000
628       |  0   |    0    |  1.000000      0.000000
629       |  0   |    0    |  1.000000      0.000000
630       |  0   |    0    |  1.000000      0.000000
631       |  0   |    0    |  1.000000      0.000000
632       |  0   |    0    |  1.000000      0.000000
633       |  0   |    0    |  1.000000      0.000000
634       |  0   |    0    |  1.000000      0.000000
635       |  0   |    0    |  1.000000      0.000000
636       |  0   |    0    |  1.000000      0.000000
637       |  0   |    0    |  1.000000      0.000000
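The error rate printed above is taken from the statistics returned by the trainer. Assuming it is the fraction of patterns whose predicted class differs from the target class, the same quantity could be recomputed from the target and predicted classes, as in the following Python sketch. The function name and the sample labels are hypothetical.

```python
def error_rate(target, predicted):
    """Fraction of patterns whose predicted class differs from the target."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    wrong = sum(1 for t, p in zip(target, predicted) if t != p)
    return wrong / len(target)

# Hypothetical labels, not the tic-tac-toe results.
print(error_rate([1, 1, 0, 0], [1, 1, 0, 0]))  # prints 0.0
print(error_rate([1, 1, 0, 0], [1, 0, 0, 0]))  # prints 0.25
```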