ploticus: proc scatterplot

proc scatterplot

Proc scatterplot displays data points in one or two dimensions using the current data set and current plotting area. It can produce traditional scatterplots and distributions, and it can also be used as a general technique for rendering data points or text at specific locations. Data points can be rendered as symbols, line segments, or bits of text. Data point color, shape, size, and text content can be driven by data. Duplicate data points can be clustered in a variety of ways, or duplicity can be represented by a color change. (Unadjusted duplicate data points can appear as just one point, which may be misleading.) Clickmap and mouseover text labels are supported for data points. See the gallery scatterplot examples and heatmap examples.

Attributes
For a 2-D scatterplot both xfield and yfield must be specified. For conventional scatterplots you'll probably also want to specify a particular symbol.

Data point position

xfield dfield
The contents of this field control the X location of data points. First field is 1. Example: xfield: 2

yfield dfield
The contents of this field control the Y location of data points. First field is 1. Example: yfield: 1

xlocation locvalue
If specified, a 1-D distribution will be rendered, with data points to be distributed (before any clustering) vertically at X = locvalue. yfield can be used for the Y component.

ylocation locvalue
If specified, a 1-D distribution will be rendered, with data points to be distributed (before any clustering) horizontally at Y = locvalue. xfield can be used for the X component.
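
As a rough illustration of the 1-D case, the fragment below lines up the values of data field 1 along a vertical line. It is only a sketch. The `#proc scatterplot` line is the usual way a Ploticus script invokes a proc. The value 2.0 is a made-up position, and its units follow whatever rules the manual gives for locvalue. A data set and plotting area are assumed to have been set up earlier in the script.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
// Y position of each point comes from data field 1
yfield: 1
// all points are lined up vertically at this X location (hypothetical value)
xlocation: 2.0
symbol: style=fill shape=circle fillcolor=red
```

Swapping in xfield and ylocation would give the horizontal equivalent.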

Displaying data points using symbols

symbol symboldetails
If specified, a geometric point symbol will mark data points. This specifies the attributes of the symbols to be used.
Example: symbol: style=fill shape=circle fillcolor=red

rectangle width height [outline]
If specified, each data point will be displayed as a rectangle centered on the data point, width data units wide and height data units high. If outline is specified, the rectangles will be outlined with a line (controllable using linedetails). The color of the rectangle can be controlled via a data field (see symfield, symrangefield, and dupsleg below).
Example: rectangle: 1 1 outline
Example: rectangle: 0.9 0.9

See also the "Data-driven control" options for controlling the appearance of data points, below.
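
Putting the position and symbol attributes together, a conventional 2-D scatterplot might look something like the first proc below; the second proc draws the same points as rectangles instead. This is a sketch under assumptions: the field numbers are arbitrary, and a data set and plotting area are assumed to exist already.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red

// Same points drawn as outlined rectangles, 1 data unit on each side
#proc scatterplot
xfield: 1
yfield: 2
rectangle: 1 1 outline
```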

Displaying data points using line segments

linelen n
If specified, data points will be displayed as short line segments. The line segments will be of length n in absolute units. The default direction of the line will be appropriate for 1-D scatterplots; for 2-D it is horizontal. Line color, etc. may be controlled using linedetails. Line length may also be influenced using sizefield. Line direction may be explicitly controlled using linedir. Example: linelen: 0.2

linedir h|v|u|r
Allows explicit control of direction of line when displaying data points as line segments (linelen). h = horizontal (centered); v = vertical (centered); u = upward; r = rightward. Example: linedir: v

linedetails linedetails
If points are displayed using line segments (linelen), this attribute allows control of color, line width, etc. It can also be used to control the outline when rectangle is used.
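
The three line-segment attributes are typically used together. The following is a hedged sketch rather than a tested script: the field numbers are arbitrary, and the linedetails settings shown (color and width) are assumed from the general linedetails syntax documented elsewhere in the manual.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
// draw each point as a segment 0.2 absolute units long
linelen: 0.2
// v = vertical, centered on the data point
linedir: v
// assumed linedetails settings; check the linedetails documentation
linedetails: color=blue width=1.0
```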

Displaying data points using text

text text
If specified, data points will be displayed using the given text, centered around the data point. This attribute may be used with or without a symbol. Example: text: A

labelfield dfield
If specified, data points will be displayed using the text in data field dfield. The text will be centered around the data point. labelfield may not be used with symbol; to draw a data-field-driven label plus a symbol, proc scatterplot must be invoked twice.
Example: labelfield: 4

labelword string
A template for displaying the values rendered by labelfield. The value will be substituted in at the token @@VAL.
Example:
labelfield: 2
labelword: N=@@VAL

textdetails textdetails
Details concerning the rendering of data point text or data point labels.
Example: textdetails: size=6

verticaltext yes | no
If yes, label text will be rendered vertically.
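
Because labelfield cannot be combined with symbol, a symbol plus a data-driven label takes two passes over the same data. The sketch below shows one way that might be written; field numbers are arbitrary, and a data set and plotting area are assumed to exist. Since the text is centered on the data point, it will be drawn over the symbol.

```
// Sketch only: assumes a data set and plotting area already exist.
// First pass: draw a symbol at each point
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red

// Second pass: draw the text from field 4 at the same points
#proc scatterplot
xfield: 1
yfield: 2
labelfield: 4
// each label is rendered as N= followed by the field 4 value
labelword: N=@@VAL
textdetails: size=6
```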

Data-driven color / shape / size of data points

sizefield dfield
If specified, the size of data point symbols, lines, or text is controlled by this data field, effectively allowing another variable to be presented. For symbols or text the value in dfield will be taken to be a character point size (see also sizescale). For line segments, the value in dfield will scale the length of the lines, i.e. a data value of 2.0 doubles it and 0.5 halves it.

sizescale n
May be used with sizefield when the size of data point symbols or text is being controlled by a datafield. This attribute may be used to scale the size of the point symbols to the desired range. Scaling is based on symbol area rather than diameter. A value of 2.0 doubles the resulting size; 0.5 halves it.
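
A third variable can be shown through symbol size roughly as follows. This is an illustrative sketch only: field 3 is assumed to hold values suitable as point sizes, and the sizescale value is arbitrary.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red
// field 3 is taken as the character point size of each symbol
sizefield: 3
// halve the resulting sizes (illustrative value)
sizescale: 0.5
```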

colorfield dfield
If specified, the color of data point symbols, lines, text, or rectangles is controlled by this data field, effectively allowing another variable to be presented. The data field should contain color specifications. (New in 2.41)

altwhen select expression
altsymbol symboldetails
In a scatterplot that is displaying data points as symbols (not lines or text), these two attributes provide an easy way to conditionally display an alternate data point symbol: data points for which the altwhen expression is true are drawn using altsymbol. (New in 2.40)
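
For instance, to draw one group of rows in a different color, something like the following might be used. It is a sketch under assumptions: field 3 is assumed to hold a group code, and the expression reuses the select expression form shown elsewhere on this page.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
// default symbol for most points
symbol: style=fill shape=circle fillcolor=red
// rows where field 3 equals AA get the alternate symbol instead
altwhen: @@3 = AA
altsymbol: style=fill shape=circle fillcolor=blue
```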

symfield dfield
If specified, the data point color, size, shape, etc. can be driven by specific data values in this field. This attribute uses the legend-driven technique (the legend structure is used to map data values to symbol appearance specifications). If rendering symbols, symbol attributes should be given in the legendentries details; if rendering rectangles, colors should be given.
Example: see the gallery example symfld.

symrangefield dfield
If specified, the symbol color, size, shape, etc. can be driven by numeric data in this field. This is similar to symfield above, except that numeric range comparison is used when finding the appropriate legend entry, using the legend-driven technique (the legend structure is used to map data values to symbol appearance specifications). Each legend tag must be a single numeric value, and legend entries must be specified in numerical order by tag, from highest to lowest. Each data value is compared against the legend entries in the order specified (highest to lowest); the first entry whose tag is less than or equal to the contents of the symrangefield data field is chosen, and the point is rendered using the symbol described in that entry. Examples: see the gallery examples symrangefld and heatmap3.
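
The highest-to-lowest rule is easier to see with concrete entries. The sketch below is an assumption-laden illustration, not a tested script: the legend entries are written with proc legendentry, whose attribute names (sampletype, details, label, tag) come from that proc's own manual page rather than from this one, and the thresholds and colors are invented.

```
// Sketch only: assumes a data set and plotting area already exist.
// Legend entries must be given highest tag first.
#proc legendentry
sampletype: symbol
details: style=fill shape=circle fillcolor=red
label: 50 and up
tag: 50

#proc legendentry
sampletype: symbol
details: style=fill shape=circle fillcolor=orange
label: 20 to under 50
tag: 20

#proc legendentry
sampletype: symbol
details: style=fill shape=circle fillcolor=yellow
label: 0 to under 20
tag: 0

#proc scatterplot
xfield: 1
yfield: 2
// field 3 is compared against tags 50, 20, 0 in that order
symrangefield: 3
```

With these entries a field 3 value of 35 fails the comparison with 50, passes the comparison with 20, and so is drawn with the orange symbol.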

Displaying duplicity by clustering of data points

cluster yes | no
If yes, data will be sorted on X,Y and duplicate (or near-duplicate) data points will be detected and offset slightly to show duplicity. The default is no (changed in 2.33). 2-D clusters may be as large as N=38 (after this, points will overlap). Additional attributes related to clustering are described below. Note: If labelfield and/or sizefield are being used, clustering will work properly only when data are presorted into X,Y order.

clustermethod 2d | horiz | vert | upward | rightward
Explicitly control the way that duplicate points will be clustered.
2d clusters the points evenly around the data point.
horiz clusters the points evenly leftward and rightward.
vert clusters the points evenly upward and downward.
upward strings the points upward only for generating little vertical bars.
rightward strings the points rightward only for generating little horizontal bars.
Default is 2d for 2-D scatterplots, or horiz or vert for 1-D scatterplots depending on orientation. The gallery example snpmap1 uses clustermethod: upward to form rows of little bars.

To represent duplicate points using different symbol colors (etc.) see dupsleg.

clusterfact f
May be used when clustering is being done. The clustering offset distance will be multiplied by f. A value of 2.0 spreads clustered points out more, and 0.5 spreads them out less.

clusterdiff f
May be used when clustering is being done. Two values that are within f absolute units (inches) of each other will be considered duplicates eligible for clustering. Default value is 0.001.

clustevery n
With clustering, normally every duplicate point is offset from all the others, which may become cluttered and ineffective with large numbers of duplicates. This attribute may be used to offset only for every nth duplicate encountered.
Example: clustevery: 5 would result in a point having 35 duplicates being represented using 7 point marks.
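
The clustering attributes can be combined along these lines. This is a sketch only: the values are illustrative, and a data set and plotting area are assumed to exist.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red
// clustering is off by default, so turn it on
cluster: yes
// string duplicates upward so they form little vertical bars
clustermethod: upward
// offset only on every 5th duplicate
clustevery: 5
// spread the marks twice as far apart as the default
clusterfact: 2.0
```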

Displaying duplicity by data point color

dupsleg yes | no
If yes, the appearance details of data points will be controlled by the number of duplicate points counted. This attribute can be used when rendering data points as symbols or rectangles, and it uses the legend-driven technique (the legend structure is used to map data values to appearance specifications). Each legend entry must have a tag that is an integer. If you're rendering symbols, supply symboldetails for the legendentry details; supply a color if rendering rectangles. Legend entries must be specified in numerical order by tag, from highest to lowest. As the scatterplot is drawn and duplicate points are detected, a count of duplicates is maintained. The count is then compared against the set of tags (from highest to lowest). The first entry whose tag is <= the duplicate count is chosen, and the point will be rendered using the symbol or color described in that entry. Example: see the gallery example dupsleg.
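
A heatmap-style use of dupsleg with rectangles might be sketched as follows. As before, the proc legendentry attribute names are taken from that proc's manual page and should be checked there, and the tags and colors are invented. This page does not say how dupsleg interacts with the clustering attributes, so the fragment leaves them out.

```
// Sketch only: assumes a data set and plotting area already exist.
// Integer tags, highest first; a color is given because rectangles are drawn.
#proc legendentry
sampletype: color
details: red
label: 5 or more duplicates
tag: 5

#proc legendentry
sampletype: color
details: orange
label: 2 to 4 duplicates
tag: 2

#proc legendentry
sampletype: color
details: yellow
label: fewer than 2 duplicates
tag: 0

#proc scatterplot
xfield: 1
yfield: 2
rectangle: 1 1
// duplicate count is compared against tags 5, 2, 0 in that order
dupsleg: yes
```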

Selecting certain data points

select select expression
May be used to select data rows for inclusion into the scatterplot.
Example: select: @@3 = AA

xrange low high
If specified, only data points within the given plottable range in X will be shown. By default the points will be drawn only if within the plotting area. Example: xrange: 0 50

yrange low high
If specified, only data points within the given plottable range in Y will be shown. By default the points will be drawn only if within the plotting area. Example: yrange: 0 50
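
The selection attributes can be used together to narrow what is drawn. A minimal sketch, with arbitrary field numbers and limits:

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red
// keep only rows whose field 3 is AA
select: @@3 = AA
// and only points with X and Y between 0 and 50
xrange: 0 50
yrange: 0 50
```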

Legend

legendlabel text
A label to be associated with the current set of points in the legend. proc legend must be executed later in order to render the legend. @NVALUES may be used to signify number of points rendered. The \\n construct can be used to force a line break when the legend is displayed, or the label can be wordwrapped using proc legend wraplen attribute (2.32+). If proc getdata field names are being used, use of the special symbols #usexname (or #useyname) causes the field name of xfield (or yfield) to be automatically used as the legend label (2.04+).
Example: legendlabel: Group 4, N=@NVALUES
Example: legendlabel: Round 2
Example: legendlabel: #useyname
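
A two-group plot with a legend could be laid out roughly like this. It is a sketch under assumptions: field 3 is assumed to hold a group code, and proc legend is invoked with no attributes on the assumption that its defaults are acceptable.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red
select: @@3 = AA
// @NVALUES is replaced by the number of points rendered
legendlabel: Group AA, N=@NVALUES

#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=blue
select: @@3 = BB
legendlabel: Group BB, N=@NVALUES

// proc legend must run after the scatterplots to render the entries
#proc legend
```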

Clickmap and mouseover

Note: clickmap is not supported when data points are displayed using line segments.

clickmapurl url-template
If generating an HTML clickmap, this specifies a url template, and causes the data points (symbol or character) to be mapped. This attribute usually contains one or more embedded data field references preceded by double at-sign (@@). See HTML clickmap for more details and examples.
Example: clickmapurl: http://abc.com/mycgi?category=@@3

clickmaplabel label-template
If generating a client-side clickmap, this specifies a template for building mouseover text labels.
Example: clickmaplabel: @@3 (@@4)

clickmaplabeltext multiline text
Same as clickmaplabel but multiline text. Must be terminated with a blank line.
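
Combining the two templates, a clickable scatterplot might be set up as in the sketch below. The URL is the placeholder from the example above, the field numbers are arbitrary, and clickmap generation itself is assumed to be enabled separately as described under HTML clickmap.

```
// Sketch only: assumes a data set and plotting area already exist.
#proc scatterplot
xfield: 1
yfield: 2
symbol: style=fill shape=circle fillcolor=red
// each point links to a URL built from field 3
clickmapurl: http://abc.com/mycgi?category=@@3
// mouseover text built from fields 3 and 4
clickmaplabel: @@3 (@@4)
```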

Variables that are set by proc scatterplot

NVALUES = the number of in-range plottable points that were rendered. Note: this may be used in the legendlabel.

MAXDUPS = the maximum number of clustered duplicate points. If clustermethod is 2d this maxes out at 37.

Ploticus 2.42 ... May 2013