Proc scatterplot displays data points in one or two dimensions using the
current data set
and
current plotting area.
It can produce traditional scatterplots and distributions and also can be used as a general
technique for rendering data points or text at specific locations.
Data points can be rendered
as symbols, line segments, or bits of text. Data point color, shape, size,
and/or text content can be driven by data.
Duplicate data points can be clustered
in a variety of ways, or duplication can be represented by a color change.
(Unadjusted duplicate data points can appear as just one point, which may be misleading.)
Clickmap and mouseover text labels
are supported for data points.
See the
gallery scatterplot examples
and
heatmap examples.
Attributes
For a 2-D scatterplot both xfield and yfield must
be specified. For conventional scatterplots you'll probably also
want to specify a particular symbol.
Data point position
xfield
dfield
The contents of this field control the X location of data points.
First field is 1.
Example: xfield: 2
yfield
dfield
The contents of this field control the Y location of data points.
First field is 1.
Example: yfield: 1
xlocation
locvalue
If specified, a 1-D distribution will be rendered, with data points to be distributed
(before any clustering) vertically at X = locvalue.
yfield can be used for the Y component.
ylocation
locvalue
If specified, a 1-D distribution will be rendered, with data points to be distributed
(before any clustering) horizontally at Y = locvalue.
xfield can be used for the X component.
Displaying data points using symbols
symbol
symboldetails
If specified, a geometric point symbol will mark data points.
This specifies the attributes of the symbols to be used.
Example: symbol: style=fill shape=circle fillcolor=red
rectangle
width height
[outline]
If specified, data points will be displayed using a rectangle centered around the data point
of width data units wide and height data units high.
If outline is specified, the rectangles will be outlined with a line
(controllable using linedetails). The color of the rectangle can be
controlled via a datafield (see symfield, symrangefield, and
dupsleg below).
Example: rectangle: 1 1 outline
Example: rectangle: 0.9 0.9
See also the "Data-driven control" options for controlling the appearance of data points, below.
Displaying data points using line segments
linelen
n
If specified, data points will be displayed as short line segments.
The line segments will be of length n in
absolute units.
The default direction of the line will
be appropriate for 1-D scatterplots; for 2-D it is horizontal.
Line color, etc. may be controlled using linedetails.
Line length may also be influenced using sizefield.
Line direction may be explicitly controlled using linedir.
Example: linelen: 0.2
linedir
h|v|u|r
Allows explicit control of direction of line when displaying data points
as line segments (linelen).
h = horizontal (centered);
v = vertical (centered);
u = upward;
r = rightward.
Example: linedir: v
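The four linedir choices can be pictured with a small Python sketch that computes segment endpoints. This is an illustration only, not Ploticus code: the function name is hypothetical, and it assumes that u and r segments start at the data point itself.

```python
from typing import Tuple

Point = Tuple[float, float]


def segment_endpoints(x: float, y: float, n: float, linedir: str) -> Tuple[Point, Point]:
    """Return the two endpoints of a segment of length n for a point at (x, y)."""
    half = n / 2.0
    if linedir == "h":  # horizontal, centered on the data point
        return (x - half, y), (x + half, y)
    if linedir == "v":  # vertical, centered on the data point
        return (x, y - half), (x, y + half)
    if linedir == "u":  # upward; assumed to start at the data point
        return (x, y), (x, y + n)
    if linedir == "r":  # rightward; assumed to start at the data point
        return (x, y), (x + n, y)
    raise ValueError("linedir must be one of h, v, u, r")


print(segment_endpoints(1.0, 2.0, 0.5, "v"))
```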
linedetails
linedetails
If points are displayed using line segments (linelen), this
attribute allows control of color, line width, etc. Also can
be used to control outline when rectangle is used.
Displaying data points using text
text
text
If specified, data points will be displayed using the
given text, centered around the data point.
This attribute may be used with or without a symbol.
Example: text: A
labelfield
dfield
If specified, data points will be displayed using the text in data
field dfield. The text will be centered around the data point.
May not be used with symbol; to draw a datafield-driven label
plus a symbol, proc scatterplot must be invoked twice.
Example: labelfield: 4
labelword
string
A template for displaying the values rendered by labelfield.
The value will be substituted in at the token @@VAL.
Example:
labelfield: 2
labelword: N=@@VAL
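The effect of the template can be sketched in Python as a plain token replacement. The function name is hypothetical and the sketch only models the @@VAL substitution described above, not how Ploticus implements it.

```python
def apply_labelword(template: str, value: str) -> str:
    """Substitute a labelfield value into a labelword template at @@VAL."""
    return template.replace("@@VAL", value)


# With labelword "N=@@VAL" and a labelfield value of 17:
print(apply_labelword("N=@@VAL", "17"))
```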
textdetails
textdetails
Details concerning the rendering of data point text or data point labels.
Example: textdetails: size=6
verticaltext
yes | no
If yes, label text will be rendered vertically.
Data-driven color / shape / size of data points
sizefield
dfield
If specified, the size of data point symbols, lines, or text is controlled by
this data field, effectively allowing another variable to be presented.
For symbols or text the value in dfield will be taken to be a character point size
(see also sizescale).
For line segments, the value in dfield
will scale the length of the lines, i.e. a data value
of 2.0 doubles it and 0.5 halves it.
sizescale
n
May be used with sizefield when the size of data point symbols or text is
being controlled by a datafield. This attribute may be used
to scale the size of the point symbols to the desired range.
Scaling is based on symbol area rather than diameter.
A value of 2.0 doubles the resulting size; 0.5 halves it.
colorfield
dfield
If specified, the color of data point symbols, lines, text, or rectangles is controlled by this data field,
effectively allowing another variable to be presented. The data field should contain
color specifications.
(New in 2.41)
altwhen
select expression
altsymbol
symboldetails
In a scatterplot that is displaying data points as symbols (not lines or text),
you can use this easy method to conditionally display an alternate data point symbol.
New in version 2.40
symfield
dfield
If specified, the data point color, size, shape, etc. can be driven by specific data values
in this field.
This attribute uses the
legend-driven technique
(the legend structure is used to map data values to symbol appearance specifications).
If rendering symbols, symbol attributes should be given in the legendentries details;
if rendering rectangles, colors should be given.
Example:
symfld
symrangefield
dfield
If specified, the symbol color, size, shape, etc. can be driven by numeric data
in this field.
Similar to symfield above, except that numeric range comparison is used
when finding the appropriate legend entry, using the
legend-driven technique
(the legend structure is used to map data values to symbol appearance specifications).
Each legend tag must be a single numeric value.
Legend entries must be specified in numerical order by tag, from highest to lowest.
Prospective values will be compared against legend entries in the order specified (highest to lowest);
when a legend entry tag is found that is less than or equal to the contents of
the symrangefield data field,
that legend entry is chosen, and the point will be rendered using the symbol
described in that entry.
Examples:
symrangefld
and
heatmap3
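The range comparison described above can be sketched in Python. This is an illustrative model, not Ploticus source: the function name and the sample entries are hypothetical, and returning None when no tag matches is an assumption.

```python
from typing import Optional, Sequence, Tuple


def pick_entry(value: float, entries: Sequence[Tuple[float, str]]) -> Optional[str]:
    """Return the details of the first entry whose tag is <= value.

    entries must be ordered by tag from highest to lowest.
    """
    for tag, details in entries:
        if tag <= value:
            return details
    return None  # assumption: no entry applies when value is below every tag


# Hypothetical legend entries, highest tag first.
entries = [(100.0, "fillcolor=red"), (50.0, "fillcolor=orange"), (0.0, "fillcolor=yellow")]
print(pick_entry(72.5, entries))
```

Because the scan stops at the first match, listing entries out of order would let a low tag capture values meant for a higher one, which is why the highest-to-lowest ordering matters.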
Displaying duplication by clustering of data points
cluster
yes | no
If yes, data will be sorted on X,Y and duplicate (or near-duplicate) data points
will be detected and offset slightly to show duplication.
The default is no (changed in 2.33).
2-D clusters may be as large as N=38 (after this, points will overlap).
Additional attributes related to clustering are described below.
Note: If labelfield and/or sizefield are being used, clustering
will work properly only when data are presorted into X,Y order.
clustermethod
2d | horiz | vert | upward | rightward
Explicitly control the way that duplicate points will be clustered.
2d clusters the points evenly around the data point.
horiz clusters the points evenly leftward and rightward.
vert clusters the points evenly upward and downward.
upward strings the points upward only for generating little vertical bars.
rightward strings the points rightward only for generating little horizontal bars.
Default is 2d for 2-D scatterplots, or horiz or vert for 1-D
scatterplots depending on orientation.
An example of using clustermethod: upward to form rows of little bars is
snpmap1
To represent duplicate points using different symbol colors (etc.) see dupsleg.
clusterfact
f
May be used when clustering is being done. The clustering offset distance
will be multiplied by f.
A value of 2.0 spreads clustered points out more, and 0.5 spreads them out less.
clusterdiff
f
May be used when clustering is being done. Two values
that are within f
absolute units (inches)
of each other will be considered duplicates
eligible for clustering. Default value is 0.001.
clustevery
n
With clustering, normally every duplicate point is offset from all
the others, which may become cluttered and ineffective with large numbers of duplicates.
This attribute may be used to offset
only for every nth duplicate encountered.
Example: clustevery: 5 would result in a point having 35 duplicates
being represented using 7 point marks.
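The arithmetic behind this example can be sketched in Python. The function name is hypothetical, and rounding up when the duplicate count is not a multiple of n is an assumption; only the 35-to-7 case comes from the text above.

```python
def marks_drawn(duplicates: int, clustevery: int) -> int:
    """Estimate how many point marks represent a group of duplicates."""
    if clustevery < 1:
        raise ValueError("clustevery must be at least 1")
    # Integer ceiling division (rounding up is an assumption).
    return (duplicates + clustevery - 1) // clustevery


print(marks_drawn(35, 5))
```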
Displaying duplication by data point color
dupsleg
yes | no
If yes, the appearance details of data points will be controlled by
the number of duplicate points counted.
This attribute can be used when rendering data points as symbols or rectangles, and it uses the
legend-driven technique
(the legend structure is used to map data values to appearance specifications).
Each legend entry must have a tag that is an integer.
If you're rendering symbols, supply
symboldetails
for the legendentry details; supply a
color
if rendering rectangles.
Legend entries must be specified in numerical order by tag, from highest to lowest.
As the scatterplot is drawn and duplicate points are detected,
a count of duplicates is maintained.
Then the count is compared against the set of tags (from highest to lowest).
When a tag is found that is <= the duplicate count, that
legend entry is chosen, and the point will be rendered using the symbol
described in that entry.
Example:
dupsleg
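The counting and lookup steps can be modelled with a short Python sketch. It is a simplification under stated assumptions: points are treated as duplicates only when exactly equal, the count is the number of occurrences of a point, and the function name and legend entries are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def style_by_duplicates(points: Sequence[Point],
                        entries: Sequence[Tuple[int, str]]) -> Dict[Point, str]:
    """Map each distinct point to the details of the first entry whose tag <= its count.

    entries must be ordered by integer tag from highest to lowest.
    """
    counts = Counter(points)  # assumption: count = number of occurrences
    styles: Dict[Point, str] = {}
    for point, count in counts.items():
        for tag, details in entries:
            if tag <= count:
                styles[point] = details
                break
    return styles


# Hypothetical legend entries, highest tag first.
entries = [(3, "fillcolor=red"), (2, "fillcolor=orange"), (1, "fillcolor=gray")]
points = [(1, 1), (1, 1), (2, 3), (1, 1), (4, 4), (4, 4)]
print(style_by_duplicates(points, entries))
```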
Selecting certain data points
select
select expression
May be used to select data rows for inclusion into the scatterplot.
Example: select: @@3 = AA
xrange
low high
If specified, only data points within the given plottable range in X
will be shown. By default the points will be drawn only if within
the plotting area.
Example: xrange: 0 50
yrange
low high
If specified, only data points within the given plottable range in Y
will be shown. By default the points will be drawn only if within
the plotting area.
Example: yrange: 0 50
Legend
legendlabel
text
A label to be associated with the current set of points in the legend.
proc legend must be executed later in order to
render the legend. @NVALUES may be used to signify number of
points rendered.
The \\n construct can be used to force a line break when the legend is displayed,
or the label can be wordwrapped using proc legend wraplen attribute (2.32+).
If
proc getdata field names
are being used, use of
the special symbols #usexname (or #useyname) causes the field name of xfield (or yfield)
to be automatically used as the legend label (2.04+).
Example: legendlabel: Group 4, N=@NVALUES
Example: legendlabel: Round 2
Example: legendlabel: #useyname
Clickmap and mouseover
Note: clickmap is not supported when data points are displayed using line segments.
clickmapurl
url-template
If generating an
HTML clickmap,
this specifies a url template, and
causes the data points (symbol or character) to be mapped.
This attribute usually contains one or more embedded
data field references
preceded by double at-sign (@@).
See
HTML clickmap
for more details and examples.
Example: clickmapurl: http://abc.com/mycgi?category=@@3
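How a template of this kind expands for one data row can be sketched in Python. The function name and the sample row are hypothetical; the sketch handles only numeric @@N references (first field is 1) and does no URL encoding, which real use may require.

```python
import re
from typing import Sequence


def expand_template(template: str, row: Sequence[str]) -> str:
    """Replace each @@N reference with the Nth field of row (first field is 1)."""
    def substitute(match: "re.Match[str]") -> str:
        return row[int(match.group(1)) - 1]

    return re.sub(r"@@(\d+)", substitute, template)


row = ["12.5", "40.1", "AA"]  # hypothetical data row
print(expand_template("http://abc.com/mycgi?category=@@3", row))
```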
clickmaplabel
label-template
If generating a
client-side clickmap,
this specifies a template for building mouseover text labels.
Example: clickmaplabel: @@3 (@@4)
clickmaplabeltext
multiline text
Same as clickmaplabel but multiline text. Must be terminated with a blank line.
Variables that are set by proc scatterplot
NVALUES = the number of in-range plottable points that were rendered.
Note: this may be used in the legendlabel.
MAXDUPS = the maximum number of clustered duplicate points. If clustermethod
is 2d this maxes out at 37.