
The Rank menu is used to determine positive attributes, to obtain the optimal cuts,
to rank the attributes and to create ROC curves. To understand what is meant here by
an optimal cut, consider the attributes formed by the following line of a control file:
X 10 20 30 40
It generates the attributes X > 10, X > 20, X > 30 and X > 40. Each of these
induces a partition of the data, and each partition can be assessed by the chosen
effectiveness criterion. The cut that produces the best effectiveness is the optimal
cut. Any of the Rank menu items will select the optimal cut (if more than one is
specified on a control file line), rank all attributes and make them the currently
defined attribute set. The ranked attributes will appear tagged in the Select attribute
set dialog box (see 13).
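The selection and ranking steps can be sketched in Python as follows. This is a minimal sketch, not SPAN's implementation: the effectiveness criterion here is a hypothetical stand-in (the Pearson chi-square statistic of the 2x2 table formed by X > cut and a binary Y), and the helper names `effectiveness`, `optimal_cut` and `rank_lines` are illustrative only.

```python
# Hypothetical sketch: SPAN lets the user choose the effectiveness criterion;
# the chi-square statistic below is only an assumed example criterion.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate partition: treat as no effectiveness
    return n * (a * d - b * c) ** 2 / denom


def effectiveness(x, y, cut):
    """Assumed criterion: chi-square of the partition X > cut versus binary y."""
    a = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
    return chi_square_2x2(a, b, c, d)


def optimal_cut(x, y, cuts):
    """Return (cut, effectiveness) for the best cut on one control-file line."""
    scored = [(cut, effectiveness(x, y, cut)) for cut in cuts]
    return max(scored, key=lambda pair: pair[1])  # first cut wins ties


def rank_lines(data, y, lines):
    """lines maps a variable name to its cuts; returns optimal attributes, best first."""
    best = []
    for name, cuts in lines.items():
        cut, eff = optimal_cut(data[name], y, cuts)
        best.append((f"{name}>{cut}", eff))
    return sorted(best, key=lambda pair: pair[1], reverse=True)


x = [5, 15, 25, 35, 45, 12, 22, 32]
y = [0, 0, 1, 1, 1, 0, 1, 1]
z = [1, 2, 1, 2, 1, 2, 1, 2]
print(optimal_cut(x, y, [10, 20, 30, 40]))  # (20, 8.0)
print(rank_lines({"X": x, "Z": z}, y, {"X": [10, 20, 30, 40], "Z": [1]}))
```

With this toy data the cut X > 20 separates the two Y groups perfectly, so it is chosen as the optimal cut for X and ranked above the single-cut attribute Z > 1.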
12.1 Attribute Rank Plot
The Rank:Attribute Rank Plot item produces a Rank Attributes graphic which displays
the optimal attributes in rank order:
The graphic shows the effectiveness criterion on the vertical axis and the rank
on the horizontal axis. Each plotting position corresponds to an attribute created
in the control file or to an added attribute created within SPAN.
Plotting positions of attributes created by the same line of the control file are
aligned vertically when more than one cut is specified. In the above graphic
there are four attributes created for the variable CLASS, corresponding to different
cuts, with CLASS>2 being the optimal one.
The other control file lines have only one specified cut. Note, however, that three
different attributes of the variable ALC are specified by three separate lines of the
control file.
The best partition is EXEC=yes, the next is ALC=none, and so on.
The P-values corresponding to each point are also shown. There is no vertical P-value
axis; instead, horizontal white reference lines are drawn at the levels
P = 0.1, 0.01, 0.001, .... The P-values are those associated with a chi-square
test if Y is dichotomised, binary or nominal, and with an F test if Y is interval.
If Y is multivariate, the P-value for each Y is calculated and the smallest is output.
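For the simplest case, a binary Y and a single attribute, the chi-square test is on a 2x2 table with one degree of freedom, and its P-value can be computed with the standard library alone. The sketch below assumes an uncorrected Pearson chi-square (SPAN may differ, for example by applying a continuity correction) and uses hypothetical helper names; the F test for interval Y needs an F-distribution CDF and is not shown.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    # If Z is standard normal, Z**2 is chi-square(1), so P(chi2 > s) = P(|Z| > sqrt(s)).
    return math.erfc(math.sqrt(stat / 2.0))


def smallest_p(tables):
    """Multivariate Y (assumed layout): one 2x2 table per Y; report the smallest P."""
    return min(p_value_df1(chi_square_2x2(*t)) for t in tables)


print(round(p_value_df1(3.841458820694124), 6))  # 0.05
print(smallest_p([(5, 0, 0, 3), (2, 2, 3, 1)]))
```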
12.2 Fix Positive attributes
Rank:Fix Positive attributes is used to determine the positive attributes
according to the direction of a measure of correlation between Y and the
attributes. It can also be used to switch an attribute from positive to negative,
or to ignore certain attributes.
When it is run, a measure of correlation between Y and each attribute is
calculated, as described below. If the correlation is negative, the positive and
negative attributes are interchanged; otherwise the attributes are left unchanged.
A dialog box, Rank:Fix Positive attributes, is then presented, listing the positive
attributes determined by this process. You can switch an attribute from positive to
negative with the Swap button: highlight an item and press Swap.
The As in CF button is used to revert all attributes to the way they are
specified, as default positive attributes, in the control file.
The Ignore button is used to ensure that selected attributes will not be considered
in a search. Highlight an item and click Ignore; the ignored attribute will then
be listed enclosed in square brackets [...]. Highlighting it and clicking Ignore
again toggles it back.
Correlation calculation
The correlation is calculated as follows. Consider an attribute X = 1 and
its complement X = 0, and let x be the binary indicator of X = 1. The
(product-moment) correlation between x and Y is calculated (assuming, for the
moment, that Y is univariate; see 12.2.1 for how multivariate Y is handled).
If the correlation is positive, X = 1 is deemed the positive attribute;
otherwise X = 0 is.
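A minimal Python sketch of this rule, assuming Y is supplied as numbers and that neither the indicator nor Y is constant (otherwise the correlation is undefined). The names `pearson` and `positive_attribute` are hypothetical.

```python
import math


def pearson(x, y):
    """Product-moment correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def positive_attribute(attribute_holds, y):
    """Return ('X = 1' or 'X = 0', r) following the sign rule described above."""
    x = [1 if holds else 0 for holds in attribute_holds]  # binary indicator of X = 1
    r = pearson(x, y)
    return ("X = 1" if r > 0 else "X = 0"), r


print(positive_attribute([True, True, False, False], [5.0, 4.0, 1.0, 2.0]))
```

Here the cases with the attribute have the larger Y values, so the correlation (3/sqrt(10), about 0.95) is positive and X = 1 is the positive attribute.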
When attributes are formed from a string of cuts on one line of the control file,
for example,
X <0 >1 >2
they are assumed to be either all positive or all negative. When you highlight
and swap, say, X<=0, you will therefore find that all three attributes are swapped.
In this case the correlation with each attribute is calculated and shown. The
correlation that is largest in absolute value ("Max. over cuts" in the dialog)
is used to determine whether they are all positive or all negative. For example,
if the correlations calculated for X<=0, X>1 and X>2 are 0.5, -0.3 and -0.4
respectively, they are all deemed positive. On the other hand, if the correlations
were 0.2, -0.3 and -0.4, then X>0, X<=1 and X<=2 would become the positive attributes,
because the largest absolute correlation is -0.4, for X>2, making it a negative
attribute.
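The "Max. over cuts" rule can be sketched in a few lines of Python, reproducing the two worked examples above. The function names are hypothetical, and the handling of ties (first cut wins) and of a zero correlation (treated as not positive) are assumptions.

```python
def max_over_cuts(correlations):
    """Signed correlation that is largest in absolute value ("Max. over cuts")."""
    return max(correlations, key=abs)  # first one wins on ties (assumption)


def positive_set(attributes, complements, correlations):
    """All cuts on a line share one direction, decided by the max-over-cuts value."""
    if max_over_cuts(correlations) > 0:
        return list(attributes)
    return list(complements)


attrs = ["X<=0", "X>1", "X>2"]
comps = ["X>0", "X<=1", "X<=2"]
print(positive_set(attrs, comps, [0.5, -0.3, -0.4]))  # ['X<=0', 'X>1', 'X>2']
print(positive_set(attrs, comps, [0.2, -0.3, -0.4]))  # ['X>0', 'X<=1', 'X<=2']
```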
Note that the correlations are based on the numeric values of Y correlated with
the attribute and its complement, coded 1 and 0 respectively. This measure is
effectively a scaled difference between the mean Y in the group defined by the
attribute and the mean Y in the group defined by its complement. If the values of
Y are merely codes for nominal categories, the correlation may have no meaning. If
Y is itself binary, the correlation is simply the so-called phi coefficient, signed
to reflect the direction of the relationship. If the log-rank effectiveness measure
has been chosen, the measure used in place of the correlation is
(rate difference)/(rate sum), for the rates in the attribute and its complement.
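The "scaled difference" can be made precise with the point-biserial identity: writing p for the proportion of cases with the attribute and s_Y for the standard deviation of Y (divisor n), the correlation equals (mean Y with attribute - mean Y with complement) * sqrt(p(1 - p)) / s_Y. The Python sketch below computes this, the phi coefficient for a binary Y and the (rate difference)/(rate sum) measure. The names are hypothetical, and how SPAN estimates the log-rank rates is not shown; the rates are simply taken as inputs.

```python
import math


def scaled_mean_difference(x, y):
    """(mean Y | x=1 - mean Y | x=0) * sqrt(p(1-p)) / sd(Y), sd with divisor n."""
    n = len(y)
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    p = len(y1) / n  # proportion of cases with the attribute
    mean = sum(y) / n
    sd = math.sqrt(sum((yi - mean) ** 2 for yi in y) / n)
    diff = sum(y1) / len(y1) - sum(y0) / len(y0)
    return diff * math.sqrt(p * (1 - p)) / sd


def phi_coefficient(a, b, c, d):
    """a: attribute & Y=1, b: attribute & Y=0, c: complement & Y=1, d: complement & Y=0."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def rate_measure(rate_attribute, rate_complement):
    """(rate difference) / (rate sum); the rates are assumed to be given."""
    return (rate_attribute - rate_complement) / (rate_attribute + rate_complement)


x = [1, 1, 1, 0, 0, 0]
y = [1, 1, 0, 1, 0, 0]  # binary Y: counts a=2, b=1, c=1, d=2
print(scaled_mean_difference(x, y), phi_coefficient(2, 1, 1, 2))  # both about 1/3
```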
12.2.1 Multiple Y
If multivariate Y is specified, with k Y's, Y1, Y2, ..., Yk, the correlation is
calculated for each Yi and the reported value is the one largest in absolute value.
This is also the value used to determine the positive attributes, as described above.
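A short Python sketch of this rule, assuming each Yi is supplied as a named list of numbers. It keeps the sign of the winning correlation, since that sign decides the positive attribute; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Product-moment correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def multivariate_correlation(x, ys):
    """ys maps each Y name to its values; returns (name, r) for the largest |r|."""
    rs = {name: pearson(x, y) for name, y in ys.items()}
    best = max(rs, key=lambda name: abs(rs[name]))
    return best, rs[best]


x = [1, 1, 0, 0]
print(multivariate_correlation(x, {"Y1": [5, 4, 1, 2], "Y2": [1, 2, 9, 8]}))
```

In this example Y2 has the larger absolute correlation and it is negative, so the complement X = 0 would be deemed positive.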
12.3 Effectiveness versus Cuts plot
This produces, for variables with more than one cut, a plot of the cut value versus
the specified effectiveness criterion. A dialog box appears listing the variables
associated with each line of the control file that specifies more than one cut.
Select a variable from the list. If two or more lines of the control file are
associated with the same variable, the dialog displays them as, for instance, AGE 1,
AGE 2 and so on. Here is an effectiveness versus cuts plot for a variable AGP
for which 19 percentile cuts are specified in the control file (with the multiple
cuts syntax AGP P(5-95)5).
The number of plotting positions is the number of cuts specified on the control file
line. SPAN picks the cut with the largest effectiveness as the optimal cut.
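The count of 19 follows from the percentiles 5, 10, ..., 95. As a hedged illustration, the Python sketch below turns a P(start-stop)step specification into cut values using `statistics.quantiles`; the function name is hypothetical, and the "inclusive" interpolation method is an assumption, since SPAN's own percentile definition is not stated here.

```python
import statistics


def percentile_cuts(values, start, stop, step):
    """Hypothetical reading of P(start-stop)step: the start-th, ..., stop-th percentiles."""
    # quantiles(n=100) returns the 1st..99th percentiles; the "inclusive" method is an
    # assumption, SPAN may interpolate differently.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    return [pct[p - 1] for p in range(start, stop + 1, step)]


cuts = percentile_cuts(list(range(101)), 5, 95, 5)
print(len(cuts))  # 19
print(cuts[:3])   # [5.0, 10.0, 15.0]
```

Each cut would then be scored with the chosen effectiveness criterion, and the cut with the largest score is the optimal one.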
12.4 ROC curves
An ROC (Receiver Operating Characteristic) curve can be produced if the Y
variable is binary (corresponding to the variable that defines the "disease" and
"no disease" categories of the ROC). If the specified Y has more than two
states or is defined as an interval variable, the ROC option is not enabled.
The variable that defines the "test" (e.g. blood pressure) must be one of the other
variables, and you must specify a sequence of cutpoints for it in the control file.
SPAN does not (as some ROC programs do) use every unique data value as a plotting
position; the cutpoints specified in the control file provide the plotting
positions on the curve.
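A minimal Python sketch of how the plotting positions follow from the specified cutpoints. It assumes a case "tests positive" when the test value exceeds the cut and that disease is coded 1; the name `roc_points` is hypothetical.

```python
def roc_points(test, disease, cuts):
    """Return (cut, sensitivity, specificity) per cut; positive test means test > cut."""
    n_pos = sum(1 for d in disease if d == 1)
    n_neg = len(disease) - n_pos
    points = []
    for cut in cuts:  # only the specified cutpoints, not every unique data value
        tp = sum(1 for t, d in zip(test, disease) if t > cut and d == 1)
        tn = sum(1 for t, d in zip(test, disease) if t <= cut and d == 0)
        points.append((cut, tp / n_pos, tn / n_neg))
    return points


print(roc_points([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1], [2, 3, 4]))
```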
A dialog box, ROC:Select variable, appears listing the variables associated
with each line of the control file. Select a variable from the list. Options are
given to have the X axis display specificity or 1-specificity, to superimpose ROC
curves and to connect points.
If two or more lines of the control file are associated with the same variable,
the dialog displays them as, for instance, AGE 1, AGE 2 and so on.
The "multiple cuts" syntax (see 9.2.3) is useful for specifying up to 50 cutpoints.
For example,
BP (80-120)2
specifies the cutpoints 80, 82, 84, ..., 120.
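The range form of the syntax can be illustrated with a small Python parser. This is a hypothetical sketch, not SPAN's parser: it accepts only integer bounds and steps in the form NAME (lo-hi)step, treats hi as included (as the 19 cuts of P(5-95)5 suggest), and enforces the 50-cutpoint limit mentioned above.

```python
import re


def expand_range_cuts(spec):
    """Hypothetical parser for 'NAME (lo-hi)step', e.g. 'BP (80-120)2'."""
    m = re.fullmatch(r"\s*(\w+)\s*\((\d+)-(\d+)\)(\d+)\s*", spec)
    if m is None:
        raise ValueError(f"not a plain range specification: {spec!r}")
    name = m.group(1)
    lo, hi, step = int(m.group(2)), int(m.group(3)), int(m.group(4))
    cuts = list(range(lo, hi + 1, step))  # upper bound included (assumption)
    if len(cuts) > 50:
        raise ValueError("at most 50 cutpoints may be specified")
    return name, cuts


name, cuts = expand_range_cuts("BP (80-120)2")
print(name, cuts[:3], cuts[-1], len(cuts))  # BP [80, 82, 84] 120 21
```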
The optimal cut, with respect to the chosen effectiveness criterion, is highlighted.
The area under the ROC curve is computed by the trapezoidal method, and an associated
standard error is calculated using the formulae of Hanley and McNeil [6].
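A Python sketch of both calculations. The trapezoidal rule is applied to (1 - specificity, sensitivity) points; adding the corner points (0, 0) and (1, 1) is an assumption about how the curve is closed. The standard error uses the Hanley and McNeil approximation with Q1 = A/(2 - A) and Q2 = 2A^2/(1 + A). Function names are hypothetical.

```python
import math


def trapezoidal_auc(points):
    """points: (1 - specificity, sensitivity) pairs; (0, 0) and (1, 1) are added (assumption)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # area of one trapezoid
    return area


def hanley_mcneil_se(auc, n_diseased, n_healthy):
    """Standard error of the area from the Hanley and McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_diseased - 1) * (q1 - auc ** 2)
           + (n_healthy - 1) * (q2 - auc ** 2)) / (n_diseased * n_healthy)
    return math.sqrt(var)


print(trapezoidal_auc([(0.5, 0.5)]))  # 0.5 (a chance-level curve)
print(hanley_mcneil_se(0.5, 10, 10))
```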
12.5 QROC curves
QROC stands for Quality adjusted Receiver Operating Characteristic. These plots are
described by Kraemer [4]. The y-axis is quality adjusted sensitivity (QI(1) in
Kraemer's notation) and the x-axis is quality adjusted specificity (QI(0)).
The construction is done exactly as for ROC curves, using the cuts specified in the
control file as plotting positions.
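A hedged Python sketch of one QROC plotting position, assuming Kraemer's usual definitions: with Q the proportion of cases testing positive, QI(1) = (Se - Q)/(1 - Q) and QI(0) = (Sp - (1 - Q))/Q, so a test independent of disease scores 0 on both and a perfect test scores 1. SPAN's exact computation may differ; `qroc_point` is a hypothetical name, and Q must lie strictly between 0 and 1.

```python
def qroc_point(tp, fp, fn, tn):
    """Return (QI(1), QI(0)) for one cut from its 2x2 counts (assumed definitions)."""
    n = tp + fp + fn + tn
    q = (tp + fp) / n   # level of the test: proportion testing positive
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    qi1 = (se - q) / (1 - q)    # quality adjusted sensitivity (y-axis)
    qi0 = (sp - (1 - q)) / q    # quality adjusted specificity (x-axis)
    return qi1, qi0


print(qroc_point(3, 0, 0, 5))  # perfect test: (1.0, 1.0)
print(qroc_point(2, 2, 2, 2))  # test unrelated to disease: (0.0, 0.0)
```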