The criteria dialog box allows you to select parameters that determine how to measure
the effectiveness of a partition.
11.1 Criterion
Full definitions of the eight Criterion menu items are given in Appendix 3.
The Within MSE criterion can be used for continuous Y, as can Subgroup
MSE. The Entropy, Chi-square and Gini diversity indices are for discrete categorical
Y and allow prior probabilities to be specified. At most 10 categories are
allowed for each, and they must be coded with the integer digits 0, 1, ..., 9. The Log-rank criterion
is for survival data, with Y a time measure. It allows censoring; if it is selected,
much of the output generated from the Process menu is given as incidence rates and rate
ratios rather than means.
An attribute must be specified for the censoring; its
converse represents a "fail" in incidence rate computations.
11.2 Balancing parameter gamma
This refers to the balancing parameter gamma, $\gamma$
(see reference 2). The effectiveness criterion is multiplied by $(p_A p_{A-})^{\gamma}$,
where $p_A$ is the proportion of the data in A and $p_{A-}$ is the proportion in A-.
If you do not want the value currently assigned, edit the item to enter a new
value. Setting $\gamma = 0$ gives no balancing.
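The balancing adjustment can be illustrated with a minimal Python sketch. This is not the program's own code: the function name `balanced_criterion` and its arguments are hypothetical, and the sketch assumes that A- is the complement of A, so that the proportion in A- equals one minus the proportion in A.

```python
def balanced_criterion(criterion: float, p_a: float, gamma: float) -> float:
    """Multiply an effectiveness criterion by (p_A * p_A-) ** gamma.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a proportion between 0 and 1")
    # Assumption: A- is the complement of A, so its proportion is 1 - p_A.
    p_not_a = 1.0 - p_a
    return criterion * (p_a * p_not_a) ** gamma


# With gamma = 0 the multiplier is 1, so the criterion is unchanged.
unbalanced = balanced_criterion(2.0, 0.1, 0.0)

# With gamma > 0 an even split (p_A = 0.5) is favoured over a lopsided one.
even_split = balanced_criterion(2.0, 0.5, 1.0)
lopsided_split = balanced_criterion(2.0, 0.1, 1.0)
```

Because the product of the two proportions is largest when both equal 0.5, larger values of gamma shift effectiveness towards partitions that divide the data more evenly.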
11.3 Penalty parameter beta
This refers to the partition complexity penalising parameter beta, $\beta$
(see reference 2). If a partition has complexity c, the effectiveness criterion is
penalised by subtracting $\beta c$.
The value entered for beta becomes the default, but it can be changed later in the program (see
15.4). If automatic is entered, $\beta$ is chosen automatically, as described
in 15.4.
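As a rough illustration of the penalty, the following Python sketch subtracts beta times the complexity from an effectiveness value. The function name `penalised_criterion` is hypothetical, and the automatic choice of beta described in 15.4 is not modelled here.

```python
def penalised_criterion(criterion: float, beta: float, complexity: float) -> float:
    """Return the effectiveness criterion minus beta * complexity.

    Hypothetical helper for illustration only; beta must be supplied by
    the caller (the automatic selection of 15.4 is not reproduced).
    """
    return criterion - beta * complexity


# A partition of complexity 3 loses 3 * beta from its effectiveness.
simple_partition = penalised_criterion(10.0, 0.5, 1)
complex_partition = penalised_criterion(10.0, 0.5, 3)
```

With beta = 0 no penalty is applied; as beta increases, more complex partitions need a correspondingly higher raw effectiveness to be preferred.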
11.4 Complexity
There are three ways to determine the complexity (c) of a partition: q+q'-1, the number
of attributes, or min(q,q'). These three options are discussed in reference 2. The
q+q'-1 option is the default and is the one advocated.
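The three options can be written out as a small Python sketch. The function name, the option labels and the argument names are all hypothetical; the quantities q and q' are defined in reference 2 and are simply taken here as given integers.

```python
def partition_complexity(q: int, q_prime: int, n_attributes: int,
                         option: str = "q+q'-1") -> int:
    """Return the complexity c of a partition under one of three options.

    Hypothetical helper for illustration only. q and q_prime stand for the
    quantities q and q' of reference 2; n_attributes is the number of
    attributes used by the partition.
    """
    if option == "q+q'-1":          # the default option
        return q + q_prime - 1
    if option == "attributes":      # number of attributes
        return n_attributes
    if option == "min":             # min(q, q')
        return min(q, q_prime)
    raise ValueError(f"unknown complexity option: {option!r}")


c_default = partition_complexity(2, 3, 4)
c_attributes = partition_complexity(2, 3, 4, option="attributes")
c_min = partition_complexity(2, 3, 4, option="min")
```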
11.5 Prior for Entropy/Gini
If you select the Entropy or Gini indices, you can specify your own prior probabilities
for the categories of a categorical outcome (see A3.3.1 for details). With user-specified
prior probabilities, Process:Statistics produces a frequency table of "prior
adjusted pseudo counts" (see 16.5.5).
11.6 Costs for Gini/Entropy and Quality index
If you select the Entropy, Gini or Quality indices, you can specify utilities for
each category in A and in A-. For example, in the Criteria panel above, with a binary
Y variable bwbinary, a cost of 10 is assigned to the (A,0) combination (a "false positive"),
1 to (A,1) (a "correct positive"), 5 to (A-,1) (a "false negative") and zero to (A-,0)
(a "correct negative"). The expected cost is calculated for each partition
if Specify utilities is checked. If Weigh criterion by utilities is also checked, the
Entropy and Gini effectiveness measures are weighted by expected cost, that is,
they are multiplied by expected cost. The idea is that partitions with greater expected
utility will have greater effectiveness. The expected cost, EC, is defined, for
costs $c_{iy}$ assigned to outcome y in subgroup i (i=A or i=A-) and data
proportions $p_{iy}$, by
$EC = \sum_i \sum_y c_{iy} p_{iy}$
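The expected cost formula can be sketched in Python as follows. The costs are those of the bwbinary example above; the data proportions are made-up values chosen only to illustrate the arithmetic, and the function name `expected_cost` is hypothetical.

```python
def expected_cost(costs: dict, proportions: dict) -> float:
    """Sum cost * proportion over every (subgroup, outcome) combination.

    Hypothetical helper for illustration only. Both dictionaries are keyed
    by (subgroup, outcome) pairs such as ("A", 0) or ("A-", 1).
    """
    if costs.keys() != proportions.keys():
        raise ValueError("costs and proportions must cover the same cells")
    return sum(costs[cell] * proportions[cell] for cell in costs)


# Costs from the bwbinary example in the text.
costs = {
    ("A", 0): 10,   # false positive
    ("A", 1): 1,    # correct positive
    ("A-", 1): 5,   # false negative
    ("A-", 0): 0,   # correct negative
}

# Assumed proportions of the data in each cell (illustrative, sum to 1).
proportions = {
    ("A", 0): 0.1,
    ("A", 1): 0.3,
    ("A-", 1): 0.2,
    ("A-", 0): 0.4,
}

ec = expected_cost(costs, proportions)
```

With these assumed proportions the four terms are 10 x 0.1, 1 x 0.3, 5 x 0.2 and 0 x 0.4, giving an expected cost of about 2.3.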
For the Quality index measure of effectiveness, the costs are used to calculate
the index, r, in the measure QI(r). See
Appendix 3.