School of Population Health

12. Rank

 

The Rank menu is used to determine positive attributes, to obtain the optimal cuts, to rank the attributes and to create ROC curves. To understand what is meant here by an optimal cut, consider the attributes formed by the following line of a control file:

X 10 20 30 40

It generates the attributes X > 10, X > 20, X > 30 and X > 40. Each of these induces a partition of the data, and each partition can be assessed by the chosen effectiveness criterion. The cut that produces the best effectiveness is the optimal cut. Any of the Rank menu items will select the optimal cut (if more than one is specified on a control file line), rank all attributes and make them the currently defined attribute set. The ranked attributes will appear tagged in the Select attribute set dialog box (see 13).
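The selection of an optimal cut can be sketched as follows. This is a minimal illustration, not SPAN's implementation: the function names are hypothetical, and the criterion used here (absolute difference in mean Y between the two sides of the partition) is only a stand-in for whichever effectiveness criterion has been chosen.

```python
def best_cut(x, y, cuts, criterion):
    """Return (cut, score) for the attribute X > cut with the highest score."""
    scored = []
    for c in cuts:
        indicator = [1 if xi > c else 0 for xi in x]  # the attribute X > c
        scored.append((c, criterion(indicator, y)))
    return max(scored, key=lambda pair: pair[1])


def abs_mean_difference(indicator, y):
    """Stand-in criterion (an assumption): |mean Y inside - mean Y outside|."""
    inside = [yi for a, yi in zip(indicator, y) if a == 1]
    outside = [yi for a, yi in zip(indicator, y) if a == 0]
    if not inside or not outside:
        return 0.0  # a cut that leaves one side empty separates nothing
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


# Made-up data for the control file line "X 10 20 30 40".
x = [5, 12, 18, 25, 33, 38, 45, 50]
y = [0, 0, 0, 0, 1, 1, 1, 1]
cut, score = best_cut(x, y, [10, 20, 30, 40], abs_mean_difference)
print(cut, score)
```

With these made-up data the attribute X > 30 separates the two Y groups completely, so it is the one picked.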

12.1 Attribute Rank Plot

The Rank:Attribute Rank Plot item produces a Rank Attributes graphic which displays the optimal attributes in rank order:

 

The graphic shows the effectiveness criterion on the vertical axis and the rank on the horizontal axis. Each plotting position corresponds to an attribute created in the control file, or to an added attribute created within SPAN.

Plotting positions of attributes created by the same line of the control file are aligned vertically when more than one cut is specified. In the above graphic there are four attributes created for the variable CLASS, corresponding to different cuts, with CLASS>2 being the optimal. Other control file lines have only one specified cut. Note, however, that three different attributes of the variable ALC are specified by three separate lines of the control file.

The best partition is EXEC=yes, the next ALC=none and so on.

The P-value corresponding to each point is also shown. There is no vertical P-value axis; instead, horizontal white reference lines are drawn at the levels P = 0.1, 0.01, 0.001, and so on. The P-values are those of a chi-square test if Y is dichotomised, binary or nominal, and of an F test if Y is interval. If Y is multivariate, the P-value for each Y is calculated and the smallest is output.
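For a binary Y and a single attribute, the P-value calculation can be sketched as below. This is an illustration under stated assumptions: it uses the Pearson chi-square statistic for a 2x2 table without continuity correction (the manual does not say which variant SPAN uses), and the function names are hypothetical. For one degree of freedom the upper tail probability of a chi-square value c equals erfc(sqrt(c / 2)).

```python
import math


def chi_square_p_2x2(attribute, y):
    """P-value of the Pearson chi-square test (1 df, no continuity correction)."""
    a = b = c = d = 0
    for ai, yi in zip(attribute, y):
        if ai and yi:
            a += 1  # attribute present, Y = 1
        elif ai:
            b += 1  # attribute present, Y = 0
        elif yi:
            c += 1  # attribute absent, Y = 1
        else:
            d += 1  # attribute absent, Y = 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))


def smallest_p(attribute, ys):
    """For multivariate Y, report the smallest of the individual P-values."""
    return min(chi_square_p_2x2(attribute, y) for y in ys)


attribute = [1, 1, 1, 1, 0, 0, 0, 0]
y1 = [1, 1, 1, 0, 0, 0, 0, 1]
y2 = [1, 1, 1, 1, 0, 0, 0, 0]
print(chi_square_p_2x2(attribute, y1), smallest_p(attribute, [y1, y2]))
```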

12.2 Fix Positive attributes

 

Rank:Fix Positive attributes is used to determine the positive attributes according to the direction of a measure of correlation between Y and the attributes. It can also be used to switch an attribute from positive to negative, or to ignore certain attributes.

When it is run, a measure of correlation between Y and each attribute is calculated, as described below. If the correlation is negative, the positive and negative attributes are interchanged. Otherwise the attributes are left unchanged.

A dialog box Rank:Fix Positive attributes is presented which lists the calculated positive attributes, as determined by this process.  You can switch an attribute from positive to negative by using the Swap button. Highlight an item and press  Swap.

The As in CF button is used to revert all attributes to the way they are specified in the control file, where the attributes as written are the default positive attributes.

The Ignore button is used to ensure that selected attributes will not be considered in a search. Highlight an item and click Ignore, and the attribute to be ignored will be listed enclosed in square brackets [...]. Highlighting the item and clicking Ignore again toggles it back.

Correlation calculation

The correlation is calculated as follows. Consider an attribute X = 1 and its complement X = 0, and suppose x is the binary indicator of X = 1. The (product-moment) correlation between x and Y is calculated (with Y assumed, for the moment, to be univariate; see below for how multivariate Y is handled). If the correlation is positive, X = 1 is deemed the positive attribute; otherwise X = 0 is.
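A minimal sketch of this calculation, assuming a numeric univariate Y. The function names are hypothetical, and returning 0.0 when either variable is constant is a convention chosen for this sketch only.

```python
import math


def correlation(x, y):
    """Product-moment correlation between the indicator x and Y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant variable; treated as zero here
    return sxy / math.sqrt(sxx * syy)


def positive_attribute(x, y):
    """X = 1 is positive if the correlation is positive, otherwise X = 0 is."""
    return "X = 1" if correlation(x, y) > 0 else "X = 0"


x = [1, 1, 0, 0]
print(positive_attribute(x, [3.0, 5.0, 1.0, 2.0]))  # Y is higher where X = 1
print(positive_attribute(x, [1.0, 2.0, 3.0, 5.0]))  # Y is higher where X = 0
```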

When attributes are formed from a string of cuts of a line of the control file, for example,

X <0 >1 >2

they are all assumed to be positive or all assumed to be negative. When you highlight and swap, say, X<=0, you will therefore find that all three attributes are swapped. In this case the correlation with each attribute is calculated and shown. The correlation that is largest in absolute value ("Max. over cuts" in the dialog) determines whether they are all positive or all negative. For example, if the correlations calculated for X<=0, X>1 and X>2 are 0.5, -0.3 and -0.4 respectively, the largest in absolute value is 0.5, which is positive, so all three are deemed positive. On the other hand, if the correlations were 0.2, -0.3 and -0.4, then X>0, X<=1 and X<=2 would become the positive attributes, because the correlation largest in absolute value is -0.4, for X>2, making it a negative attribute.
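The "Max. over cuts" rule can be written out in a few lines. The function name and the "keep"/"swap" labels are hypothetical; the two calls reproduce the two numerical examples above.

```python
def direction_over_cuts(correlations):
    """Decide the direction of all attributes on one control file line.

    correlations maps each attribute label to its correlation with Y.
    The correlation largest in absolute value decides for the whole line.
    """
    label, r = max(correlations.items(), key=lambda item: abs(item[1]))
    decision = "keep" if r > 0 else "swap"
    return label, r, decision


print(direction_over_cuts({"X<=0": 0.5, "X>1": -0.3, "X>2": -0.4}))
print(direction_over_cuts({"X<=0": 0.2, "X>1": -0.3, "X>2": -0.4}))
```

In the first case the attributes stay as written; in the second they are all replaced by their complements.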

Note that correlations are based on numeric values of Y correlated with the attribute and its complement, coded 1 and 0 respectively. This measure is effectively a scaled difference between the mean Y for the group defined by the attribute and the mean Y for the group defined by its complement. If the values of Y are just coded nominal categories, the correlation may have no meaning. If Y is itself binary, the correlation is simply the so-called phi-coefficient, taking account of the direction of the relationship. If the log-rank effectiveness measure has been chosen, the correlation is the measure (rate difference)/(rate sum), for the rates in the attribute and its complement.

12.2.1 Multiple Y

If multivariate Y is specified, with k Y's, Y1, Y2, ..., Yk, the correlation is calculated for each Yi and the reported value is the largest in absolute value. Further, this is the value that is used to determine positive attributes, as described above.
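A short sketch of the multivariate rule, with hypothetical function names: the correlation is computed against each Yi and the signed value with the largest magnitude is reported.

```python
import math


def pearson(x, y):
    """Product-moment correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def reported_correlation(x, ys):
    """Signed correlation that is largest in absolute value over all Yi."""
    return max((pearson(x, y) for y in ys), key=abs)


x = [1, 1, 0, 0]
y1 = [1, 0, 1, 0]  # unrelated to x
y2 = [0, 0, 1, 1]  # perfectly negatively related to x
print(reported_correlation(x, [y1, y2]))
```

Because the sign is kept, a negative value here would lead to the attribute being swapped, exactly as in the univariate case.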

12.3 Effectiveness versus Cuts plot

This produces, for variables with more than one cut, a plot of the cut value versus the specified effectiveness criterion. A dialog box appears which lists the variables associated with each line of the control file that specifies more than one cut. Select a variable from the list. If there are two or more lines of the control file associated with the same variable, the dialog displays them as, for instance, AGE 1, AGE 2 and so on. Here is an effectiveness versus cuts plot for a variable AGP for which 19 percentile cuts are specified in the control file (with the multiple cuts syntax AGP P(5-95)5).

 

The number of plotting positions is the number of specified cuts in a control file line. SPAN picks the cut with the largest effectiveness to be the optimal cut.

12.4 ROC curves

 

An ROC (Receiver Operating Characteristic) curve can be produced if the Y variable is binary (corresponding to the variable that defines the "disease" and "no disease" categories of the ROC). If the specified Y has more than two states, or is defined to be an interval variable, ROC is not enabled.

The variable that defines the "test" (e.g. blood pressure) must be one of the other variables, and you must specify a sequence of cutpoints for it in the control file. SPAN does not (as some ROC programs do) use every unique data value as a plotting position. The cutpoints that are specified in the control file provide the plotting positions on the curve.
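The plotting positions can be sketched as follows, one point per specified cutpoint. This is an illustration only: the function name and data are made up, and it assumes that a test result counts as positive when the value exceeds the cut, in line with attributes of the form X > cut.

```python
def roc_points(test_values, disease, cuts):
    """Return (cut, sensitivity, specificity) for each specified cutpoint.

    disease holds 1 for "disease" and 0 for "no disease".
    A test is taken as positive when its value exceeds the cut (an assumption).
    """
    n_pos = sum(disease)
    n_neg = len(disease) - n_pos
    points = []
    for c in cuts:
        tp = sum(1 for v, d in zip(test_values, disease) if d == 1 and v > c)
        tn = sum(1 for v, d in zip(test_values, disease) if d == 0 and v <= c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points


bp = [78, 85, 92, 99, 104, 110, 118, 125]
disease = [0, 0, 0, 1, 0, 1, 1, 1]
for cut, sens, spec in roc_points(bp, disease, [80, 100, 120]):
    print(cut, sens, spec)
```

Only three points are produced here because only three cutpoints were given, regardless of how many distinct data values there are.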

A dialog box ROC:Select variable appears which lists the variables associated with each line of the control file. Select a variable from the list. Options are given to have the X axis display specificity or 1-specificity, to superimpose ROC curves and to connect points.

If there are two or more lines of the control file associated with the same variable, the dialog displays them as, for instance, AGE 1, AGE 2 and so on.

The "multiple cuts" (see 9.2.3) syntax is useful for specifying (up to 50) cutpoints:

BP (80-120)2

specifies cutpoints 80, 82, 84 and so on, up to 120.
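The expansion of such a range can be sketched as below. The helper is hypothetical and is not SPAN's parser; it assumes integer bounds with both endpoints included, which is consistent with P(5-95)5 giving 19 percentile cuts, and it enforces the limit of 50 cutpoints.

```python
def expand_cuts(low, high, step, max_cuts=50):
    """Expand a (low-high)step range into a list of cutpoints, ends included."""
    if step <= 0:
        raise ValueError("step must be positive")
    cuts = list(range(low, high + 1, step))
    if len(cuts) > max_cuts:
        raise ValueError("more than %d cutpoints requested" % max_cuts)
    return cuts


bp_cuts = expand_cuts(80, 120, 2)    # as in BP (80-120)2
print(bp_cuts[:3], len(bp_cuts))
print(len(expand_cuts(5, 95, 5)))    # as in AGP P(5-95)5
```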

The optimal cut, with respect to the chosen effectiveness criterion, is highlighted.

The area under the ROC is computed by the trapezoidal method and an associated standard error is calculated using formulae in Hanley and McNeil [6].
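A sketch of both calculations follows. The function names are hypothetical. Two assumptions are made: the curve is closed with the points (0, 0) and (1, 1) before the trapezoids are summed, and the standard error uses the widely quoted Hanley and McNeil approximation based on Q1 = A/(2 - A) and Q2 = 2A^2/(1 + A); the manual does not state which of their formulae SPAN applies.

```python
import math


def trapezoid_auc(points):
    """Area under the curve through (1 - specificity, sensitivity) points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})  # close the curve
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the area (Hanley and McNeil approximation)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Three plotting positions, given as (1 - specificity, sensitivity).
points = [(0.75, 1.0), (0.25, 0.75), (0.0, 0.25)]
area = trapezoid_auc(points)
print(area, hanley_mcneil_se(area, n_pos=4, n_neg=4))
```

With few cutpoints the trapezoidal area tends to understate the area that a finer grid of cuts would give, which is one reason to specify a reasonably dense sequence of cutpoints.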

12.5 QROC curves

 

QROC stands for Quality adjusted Receiver Operating Characteristic. These plots are described by Kraemer [4]. The y-axis is quality adjusted sensitivity (QI(1) in Kraemer's notation) and the x-axis is quality adjusted specificity (QI(0)).

The construction is done exactly as for ROC curves, using plotting positions as cuts specified in the control file.
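As a rough sketch of what is plotted at each cut, the quality adjusted values can be computed from the 2x2 table of test result against disease. The formulas below are an assumption based on the usual statement of Kraemer's quality indices, in which sensitivity and specificity are rescaled by the level of the test Q (the proportion of test positives); the function name is hypothetical and SPAN's exact computation is not given in this manual.

```python
def quality_indices(tp, fp, fn, tn):
    """Return (quality adjusted sensitivity, quality adjusted specificity).

    Assumed definitions, with Q the proportion of positive tests:
      adjusted sensitivity = (sensitivity - Q) / (1 - Q)
      adjusted specificity = (specificity - (1 - Q)) / Q
    Requires 0 < Q < 1, i.e. some positive and some negative tests.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    q = (tp + fp) / n
    adjusted_sensitivity = (sensitivity - q) / (1 - q)
    adjusted_specificity = (specificity - (1 - q)) / q
    return adjusted_sensitivity, adjusted_specificity


print(quality_indices(tp=3, fp=1, fn=1, tn=3))  # a moderately good cut
print(quality_indices(tp=4, fp=0, fn=0, tn=4))  # a perfect cut
```

Under these definitions a cut that performs no better than chance plots at the origin and a perfect cut plots at (1, 1).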
