School of Population Health

11. Criteria

 

The criteria dialog box allows you to select parameters that determine how to measure the effectiveness of a partition.

11.1 Criterion

Full definitions of the eight Criterion menu items are given in Appendix 3. The Within MSE criterion can be used for continuous Y, as can Subgroup MSE. The Entropy, Chi-square and Gini diversity indices are for discrete categorical Y and allow the possibility of specified priors. Only 10 categories are allowed for each, and they must be coded with the integer digits 0, 1, ..., 9. The Log-rank criterion is for survival data, Y being a time measure. It allows censoring and, if selected, much of the output generated from the Process menu will be given as incidence rates and rate ratios rather than means. An attribute must be specified for the censoring; its converse represents a "fail" in incidence rate computations.


11.2 Balancing parameter gamma

This refers to the balancing parameter $\gamma$ (see reference 2). The effectiveness criterion is multiplied by $(p_A \, p_{A-})^{\gamma}$, where $p_A$ is the proportion of the data in A. If you do not want the value currently assigned, change the item to input a new value. Setting $\gamma = 0$ gives no balancing.
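As a rough illustration of the balancing step, the sketch below multiplies a raw criterion value by the balancing factor. It is not the program's own code: the function name `balanced_criterion` is hypothetical, and it assumes that A- is the complement of A, so that the proportion in A- is one minus the proportion in A.

```python
def balanced_criterion(criterion: float, p_a: float, gamma: float) -> float:
    """Multiply a raw effectiveness value by (p_A * p_A-) ** gamma.

    Assumption (not stated in this section): A- is the complement of A,
    so the proportion of the data in A- is 1 - p_a.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a proportion between 0 and 1")
    p_a_complement = 1.0 - p_a
    return criterion * (p_a * p_a_complement) ** gamma


# Hypothetical raw criterion value, for illustration only.
raw = 2.0
unbalanced = balanced_criterion(raw, p_a=0.1, gamma=0.0)  # factor is 1, so value is unchanged
even_split = balanced_criterion(raw, p_a=0.5, gamma=1.0)  # factor is 0.5 * 0.5
skewed_split = balanced_criterion(raw, p_a=0.1, gamma=1.0)  # smaller factor than the even split
```

Under these assumptions the factor is largest when the data are split evenly between A and A-, so a positive gamma favours balanced partitions, and gamma = 0 leaves the criterion unchanged.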


11.3 Penalty parameter beta

This refers to the partition complexity penalising parameter $\beta$ (see reference 2). If a partition has complexity $c$, the effectiveness criterion is penalised by subtracting $\beta c$. The value entered for beta becomes the default, but it can be changed later in the program (see 15.4). If automatic is entered, the $\beta$ parameter is chosen automatically, as described in 15.4.
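The effect of the penalty can be sketched as follows. This is a minimal illustration, not the program's implementation: the function name `penalised_criterion` and all numeric values are hypothetical, chosen only to show how a simpler partition can overtake a more complex one once the penalty is applied.

```python
def penalised_criterion(criterion: float, complexity: int, beta: float) -> float:
    """Subtract beta * c from a raw effectiveness value."""
    return criterion - beta * complexity


# Two hypothetical partitions: (raw criterion, complexity c).
simple_partition = (0.80, 2)
complex_partition = (0.85, 5)

beta = 0.05  # hypothetical penalty value
simple_score = penalised_criterion(*simple_partition, beta)
complex_score = penalised_criterion(*complex_partition, beta)

# With this beta the simpler partition scores higher after penalisation,
# even though its raw criterion is lower.
preferred = "simple" if simple_score > complex_score else "complex"
```

With beta = 0 the ranking would follow the raw criterion alone; larger values of beta shift the preference towards partitions of lower complexity.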


11.4 Complexity

There are three ways to determine the complexity ($c$) of a partition: q+q'-1, number of attributes, and min(q,q'). These three options are discussed in reference 2. The q+q'-1 option is the default and is advocated.
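The three options can be compared side by side with a small sketch. The quantities q and q' are defined in reference 2 rather than in this section, so the code simply treats them, and the attribute count, as given integers. The function name `partition_complexity` and the option labels are hypothetical and are not the program's menu identifiers.

```python
def partition_complexity(q: int, q_prime: int, n_attributes: int,
                         method: str = "q+q'-1") -> int:
    """Return the complexity c of a partition under one of three options.

    q, q_prime and n_attributes are taken as given; their definitions
    are in reference 2, not in this section.
    """
    if method == "q+q'-1":
        return q + q_prime - 1
    if method == "attributes":
        return n_attributes
    if method == "min":
        return min(q, q_prime)
    raise ValueError(f"unknown complexity option: {method!r}")


# Hypothetical partition with q = 2, q' = 3, using 4 attributes.
default_c = partition_complexity(2, 3, 4)
attribute_c = partition_complexity(2, 3, 4, method="attributes")
min_c = partition_complexity(2, 3, 4, method="min")
```

The same partition can therefore receive different complexity values depending on the option chosen, which in turn changes how strongly the beta penalty acts on it.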

11.5 Prior for Entropy/Gini

If you select the Entropy or Gini indices, you can specify your own prior probabilities for the categories of a categorical outcome (see A3.3.1 for details). With user-specified prior probabilities, Process:Statistics will produce a frequency table of "prior adjusted pseudo counts" (see 16.5.5).


11.6 Costs for Gini/Entropy and Quality index

If you select the Entropy, Gini or Quality indices, you can specify utilities for each category in A and in A-. For example, in the Criteria panel above, with a binary Y variable bwbinary, a cost of 10 is assigned to the (A,0) combination (a "false positive"), 1 to (A,1) (a "correct positive"), 5 to (A-,1) (a "false negative") and 0 to (A-,0) (a "correct negative"). If Specify utilities is checked, the expected cost is calculated for each partition. If Weigh criterion by utilities is checked, the Entropy and Gini effectiveness measures are weighted by expected cost, that is, they are multiplied by the expected cost. The idea is that partitions with greater expected utility will have greater effectiveness. The expected cost, EC, for costs $c_{iy}$ assigned to outcome $y$ in $i = A$ or $i = A-$, and data proportions $p_{iy}$, is defined by

$$EC = \sum_{i} \sum_{y} c_{iy} \, p_{iy}$$
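A small worked example may help here. The sketch below computes the expected cost as the sum, over every combination of partition side (A or A-) and outcome y, of cost times data proportion, using the costs from the bwbinary example (10, 1, 5 and 0). The data proportions are invented for illustration, the function name `expected_cost` is hypothetical, and the code assumes that the proportions are joint proportions over the whole data set and so sum to 1.

```python
# Costs from the bwbinary example, keyed by (partition side, outcome y).
costs = {
    ("A", 0): 10.0,   # "false positive"
    ("A", 1): 1.0,    # "correct positive"
    ("A-", 1): 5.0,   # "false negative"
    ("A-", 0): 0.0,   # "correct negative"
}

# Hypothetical data proportions, assumed here to be joint proportions
# over the whole data set (an assumption, not stated in the text).
proportions = {
    ("A", 0): 0.1,
    ("A", 1): 0.2,
    ("A-", 1): 0.1,
    ("A-", 0): 0.6,
}


def expected_cost(costs: dict, proportions: dict) -> float:
    """Sum cost * proportion over every (side, outcome) combination."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions are assumed to sum to 1")
    return sum(costs[key] * proportions[key] for key in proportions)


ec = expected_cost(costs, proportions)
# By hand: 10*0.1 + 1*0.2 + 5*0.1 + 0*0.6 = 1.7 (up to floating-point rounding).
```

Under these assumed proportions, the false positives contribute most of the expected cost, because they carry the largest cost.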

For the Quality index measure of effectiveness, the costs are used to calculate the index, r, in the measure QI(r). See Appendix 3.
