The Y menu allows you to select the dependent variable(s) for the analysis. It also
provides scatter and distribution plots, dichotomising and transforming of variables,
and sub-group analyses, which enable cross-validation.
10.1 OK var
When you have defined the dependent variable(s), using Select Y (below),
this item fixes the selection. If you have selected more than one dependent variable,
the menu item changes to OK Multiple Y: vars for a list of variables.
10.2 Distribution
This item creates a graphic of the (univariate) distribution of Y. There are five
options: Empirical pdf (probability density function), Bar chart frequency, Bar
chart relative frequency, Survival and Empirical cdf (cumulative distribution function)
plot. These options are activated by buttons in the Distribution window:
Empirical pdf
The empirical probability density plot is a kernel density estimate using
a normal (Gaussian) kernel smoother. It is only appropriate for interval data and is
disallowed if there are fewer than 10 distinct values of the variable. If the data
are censored, the effect of censoring is ignored. The bandwidth of the normal smoother
can be adjusted by clicking the Band+ and Band- buttons that become active in
the window. Here is an example:
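As a rough illustration outside SPAN, the following Python sketch shows what a kernel density estimate with a normal kernel computes and how changing the bandwidth (the role played by Band+ and Band-) affects the curve. The function names, the data and the bandwidth values are hypothetical; SPAN's own default bandwidth and step sizes are not documented here.

```python
import math

def gaussian_kde(values, bandwidth, grid):
    """Kernel density estimate with a normal kernel, evaluated at each grid point."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]

def can_plot_pdf(values, min_distinct=10):
    """Mirror the rule above: the plot needs at least 10 distinct values."""
    return len(set(values)) >= min_distinct

# Hypothetical interval-scaled data with 11 distinct values
data = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.6, 5.2, 5.9, 6.5, 7.1]
grid = [i * 0.1 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0

narrow = gaussian_kde(data, 0.3, grid)  # small bandwidth: bumpier curve
wide = gaussian_kde(data, 1.0, grid)    # large bandwidth: smoother curve
```

A smaller bandwidth follows the data more closely, while a larger one gives a smoother and flatter curve.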
Bar Chart
The Bar_n and Bar_% options produce a bar chart of counts or percentages respectively,
with one bar for each unique value; no grouping is done initially. Once the chart
appears, the top buttons change to Bin+ and Bin-, which allow grouping into bins. The
bin widths are determined automatically; Bin+ increases and Bin- decreases the number of bins.
Survival, empirical cdf
Survival and empirical cdf plots are estimated non-parametrically.
If the variable is censored (which is only allowed when the log-rank effectiveness
criterion has been selected, see A3.7), the Kaplan-Meier product-limit method is
used. To obtain a Kaplan-Meier plot you must already have selected the log-rank criterion
in the Criteria dialog box, which allows you to set the censoring attribute.
Survival is simply 1 - empirical cdf.
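For readers unfamiliar with the product-limit method, here is a minimal Python sketch of the standard Kaplan-Meier estimator. It is not SPAN's code; the function name and the example data are hypothetical, and events are coded 1 for an observed event and 0 for a censored observation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect all observations tied at time t
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical data: two of the six observations are censored
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

When no observation is censored, the estimate reduces to 1 - empirical cdf, which matches the relationship stated above.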
Note that if Y is multivariate, the distribution is done only for the first variable
in the list of Y.
The number and percentage of missing values are shown on the distribution plots.
10.3 Scatterplot/crosstab
This item opens a dialog in which you choose the X and Y variables to be plotted
in a scatter diagram.
An example scatter diagram is as follows:
The scatter window becomes active for mouse input and a series of buttons is shown.
These act as follows:
Missd - missing values (if there are any) are shown on the plot to the left
of Y or below X axis.
X->Y - interchanges the X and Y axes.
+/- Pts - increases or decreases the size of plotting positions.
Z - allows a third variable or attribute to be displayed, as described more
fully below.
Dense - allows the intensity or density of data to be visualised. Plotting positions
that overlay existing plotting positions gradually increase in tonal intensity.
Here is an example:
JittX, JittY - jitters the points in either the X or Y direction.
Box - if one of the X or Y variables is detected as being discrete and the other continuous,
a box plot is produced, like this for the above data:
If data are grouped, the plotting position is scaled according to the
size of the groups, i.e. such that the area of the square is proportional to
FREQ (see 8.3).
Also, if both variables are seen as discrete, rectangles are drawn scaled according to the
number of observations. For example:
In this case a 2-way cross-tabulation is also written to the output log with associated
chi-square statistic and P-value (not continuity corrected).
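The cross-tabulation statistic can be reproduced by hand. The sketch below computes a Pearson chi-square statistic without continuity correction and its P-value for two discrete variables. It is an illustration in plain Python, not SPAN's implementation; the function names and data are hypothetical, and the P-value uses the closed-form series for a chi-square upper tail with integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def crosstab_chi2(x, y):
    """Return (statistic, df, p_value) for the 2-way table of x by y, uncorrected."""
    rows, cols = sorted(set(x)), sorted(set(y))
    counts = {(r, c): 0 for r in rows for c in cols}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    n = len(x)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2x2 table with cell counts 10, 20 / 20, 10
x = [0] * 30 + [1] * 30
y = [0] * 10 + [1] * 20 + [0] * 20 + [1] * 10
stat, df, p_value = crosstab_chi2(x, y)
```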
10.3.1 Z: Adding a third variable. Rotating 3D plots
A third variable can be included in the scatter function in two ways. First, plotting
points can be distinguished according to the presence or absence of an attribute; the attribute values
are distinguished by colour: red or black. Second, the third variable can be
included as a third axis in a 3D rotating plot. If you click the Z button
on a 2D scatter diagram a "Third variable Z" dialog appears. For a rotating plot
choose a "Z variable for rotating plot" from those listed. Buttons on the scatter
window will then change to allow animation of the 3D plot. Here is an example, with
density shading:
If all three X, Y, Z variables are detected to be discrete, a cube proportional
in volume to the number of observations is produced:
10.4 Transform
Transform allows you to create new variables by some simple operations:
You need to highlight Y1 and Y2 and choose how they are to be combined. The name of
the new variable must also be entered. For log and square root transformations, you only
need to highlight Y1. If either Y1 or Y2 is missing (negative), the result is assumed
to be missing too.
It is possible that the difference Y1-Y2 and the log operations will produce negative
values, which SPAN would normally interpret as missing. To avoid this a constant
will be added. A message informing of this, and the size of the constant, will be
issued.
If the denominator in the division operation Y1/Y2 is zero, the result is assumed
to be missing. Similarly for log(Y1) when Y1 is zero.
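The missing-value rules above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, missing values are coded as -1.0, and the constant is assumed to be the smallest shift that makes every non-missing result non-negative. The text does not say how SPAN actually chooses the constant.

```python
import math

MISSING = -1.0  # assumption: any negative value stands for "missing"

def is_missing(v):
    return v < 0

def shift_non_negative(results):
    """Add a constant so no valid result is negative; None marks a missing result.

    Assumption: the constant is minus the smallest valid result when that is negative.
    """
    valid = [r for r in results if r is not None]
    constant = -min(valid) if valid and min(valid) < 0 else 0.0
    return [MISSING if r is None else r + constant for r in results], constant

def difference(y1, y2):
    raw = [None if is_missing(a) or is_missing(b) else a - b for a, b in zip(y1, y2)]
    return shift_non_negative(raw)

def ratio(y1, y2):
    # A zero denominator gives a missing result
    return [
        MISSING if is_missing(a) or is_missing(b) or b == 0 else a / b
        for a, b in zip(y1, y2)
    ]

def log_transform(y1):
    # Zero or missing input gives a missing result
    raw = [None if v <= 0 else math.log(v) for v in y1]
    return shift_non_negative(raw)

diff_values, diff_constant = difference([5, 2, -1, 3], [3, 4, 1, 3])
ratio_values = ratio([4, 1, -1], [2, 0, 3])
log_values, log_constant = log_transform([1, 0.5, 0])
```

In this sketch the third difference stays missing because Y1 is missing, and the remaining differences 2, -2 and 0 are shifted by the constant 2.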
Y1 o Y2 is a transformation of binary Y1 and Y2 that assigns values (a,b,c)
as follows: (0,0)=a, (1,0)=b, (1,1)=c, where the values a, b and c
are entered by the user in the right-hand panel.
perm Y1 creates a random re-ordering (permutation) of the values of Y1.
The new variable is statistically independent of all other variables.
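A permutation of this kind can be sketched in a few lines of Python. The function name and the optional seed are hypothetical and only serve the illustration; how SPAN seeds its random number generator is not stated.

```python
import random

def permute(values, seed=None):
    """Return a randomly re-ordered copy of values; the original list is untouched."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

y1 = [3, 1, 4, 1, 5, 9, 2, 6]
y1_perm = permute(y1, seed=1)
```

The permuted variable has exactly the same set of values as Y1, only in a different order, so its distribution is unchanged while its pairing with the other variables is broken.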
For each transformation a new variable is created. There is
no mechanism to save the derived variables.
10.4.1 Creating attributes from binary transformations
The transformation Y1=a,b.. forms a binary variable from Y1 that has value 1
if Y1 takes any of the values a,b... Also, a<Y1<=b forms
a binary variable which has value 1 on the range a to b. For both these transformations
a corresponding attribute can optionally also be formed, in response to a question
dialog which appears automatically. Attributes formed this way are "primitive" in the sense that the variable from
which they are formed is not remembered when it comes to manipulating Boolean expressions.
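The two binary transformations amount to a membership test and a half-open range test. The Python sketch below uses hypothetical function names and does not address how SPAN treats missing values in these transformations, which the text does not specify.

```python
def in_values(y1, values):
    """Y1=a,b..: 1 where Y1 takes any of the listed values, else 0."""
    wanted = set(values)
    return [1 if v in wanted else 0 for v in y1]

def in_range(y1, a, b):
    """a<Y1<=b: 1 where Y1 is greater than a and at most b, else 0."""
    return [1 if a < v <= b else 0 for v in y1]

listed = in_values([1, 2, 3, 4, 2], [2, 4])
ranged = in_range([10, 20, 30, 40], 10, 30)
```

Note that the lower bound is exclusive and the upper bound inclusive, so in the second example 10 maps to 0 while 30 maps to 1.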
10.5 Select Y
This option lets you pick new Y variable(s). A list of variables will appear and
variables can be picked from it.
You can select more than one variable, so allowing a multivariate outcome measure
(see 11.5). In this case the correlation matrix of the selected variables is written
to the output log. This feature is still experimental; in most cases it should not be used.
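To show what such a correlation matrix contains, here is a small Python sketch. It assumes the ordinary Pearson product-moment correlation, which the text does not confirm, and the function and variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns maps variable name -> list of values; returns a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical selection of three Y variables
selected = {"y1": [1, 2, 3, 4], "y2": [2, 4, 6, 8], "y3": [4, 3, 2, 1]}
corr = correlation_matrix(selected)
```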
10.6 By-group/cross-validate
This option runs separate analyses in different groups, and so effectively
allows cross-validation. A By analysis dialog box will appear:
By-groups are defined in two possible ways, by value and by attribute, as indicated
by radio buttons.
By value
In this option by-groups are defined in terms of the distinct values of a selected variable
of the data set. There will be as many groups as there are distinct values of the
variable. A variable needs to be picked from the displayed menu. If there are more
than 20 distinct values no by-group analysis is allowed. With this option by-groups
are mutually exclusive.
By attribute
In this option by-groups are defined according to attributes formed from a selected
variable. For example, suppose a control file line for a variable X
is X i >20 >40. SPAN creates attributes X>20 and X>40, and a by-group is identified
for each of these two attributes.
As in this example, the by-groups formed from attributes are not necessarily mutually exclusive.
When you choose by attribute and there is more than one line
in the control file defining attributes for the selected variable, an Attribute groups
box appears. This lists the lines of the control file that specify attributes for
the variable. You need to choose a line to determine the by-groups.
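The difference between the two kinds of by-group can be sketched in Python. The function names and data are hypothetical, and the control file is not parsed here: the cutpoints 20 and 40 from the example above are passed in directly. Each group is returned as a list of row indices.

```python
def groups_by_value(values, max_groups=20):
    """One by-group per distinct value; refused above 20 distinct values."""
    distinct = sorted(set(values))
    if len(distinct) > max_groups:
        raise ValueError("more than 20 distinct values: no by-group analysis")
    return {d: [i for i, v in enumerate(values) if v == d] for d in distinct}

def groups_by_attribute(values, cutpoints):
    """One by-group per attribute X>c; a row may belong to several groups."""
    return {
        f"X>{c}": [i for i, v in enumerate(values) if v > c]
        for c in cutpoints
    }

by_value = groups_by_value([1, 1, 2, 3, 2])
by_attribute = groups_by_attribute([10, 25, 45, 30, 50], [20, 40])
```

In the by-attribute result, rows 2 and 4 fall into both X>20 and X>40, which is why these by-groups are not mutually exclusive.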
Cross-validation
The by-group facility allows cross-validation to be accomplished. When a Search:
Go is done with by-group selected, a separate search is done for each by-group.
The optimal partition found in each search is then immediately applied to the data
not in the search by-group, to establish the misclassification error on data not used
to generate the partition. These misclassification rates are then averaged.
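The procedure can be outlined in Python as below. SPAN's search is not reproduced: a single-variable threshold rule (hypothetical function best_threshold) stands in for the optimal partition, and the data and group labels are invented for the illustration. Only the loop structure, search within a by-group, test on the rest, then average, follows the description above.

```python
def best_threshold(xs, ys):
    """Stand-in for a search: the cut c for which 'x > c predicts 1' errs least."""
    best_cut, best_err = None, None
    for c in sorted(set(xs)):
        err = sum((x > c) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_cut, best_err = c, err
    return best_cut

def misclassification(cut, xs, ys):
    """Proportion of cases where 'x > cut' disagrees with the binary outcome."""
    return sum((x > cut) != bool(y) for x, y in zip(xs, ys)) / len(xs)

def by_group_cross_validate(xs, ys, groups):
    """Search within each by-group, test on all other data, average the rates."""
    rates = []
    for g in sorted(set(groups)):
        inside = [i for i, k in enumerate(groups) if k == g]
        outside = [i for i, k in enumerate(groups) if k != g]
        cut = best_threshold([xs[i] for i in inside], [ys[i] for i in inside])
        rates.append(
            misclassification(cut, [xs[i] for i in outside], [ys[i] for i in outside])
        )
    return sum(rates) / len(rates)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1, 1, 1]
groups = ["A", "B", "A", "B", "A", "B", "A", "B"]
average_rate = by_group_cross_validate(xs, ys, groups)
```

Here the rule found in group A misclassifies none of group B, while the rule found in group B misclassifies one of the four cases in group A, giving an average rate of 0.125.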