proc boxplot renders boxplots, also known as box-and-whisker plots, in the
current plotting area.
By default it produces median-based boxplots, which use the median and interquartile range, but it can also produce
mean-based boxplots, which use the mean and standard deviation (SD).
The recommended method is to use proc getdata to read in raw data, then use proc processdata (action: summaryplus) to
compute the necessary summary statistics, then invoke this proc to render the boxplots.
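As a rough sketch of this workflow, a script might look like the following. The file name mydata.dat, the field numbers, and the areadef ranges are hypothetical, and the fields and valfield attributes of proc processdata are assumptions that should be checked against that proc's manual page; only action: summaryplus is taken from this document.

```
// read the raw data (hypothetical file: field 1 = group, field 2 = value)
#proc getdata
file: mydata.dat

// compute per-group summary statistics for the boxplots
// (fields and valfield are assumed attribute names)
#proc processdata
action: summaryplus
fields: 1
valfield: 2

// set up a plotting area (rectangle and ranges are placeholders)
#proc areadef
rectangle: 1 1 5 4
xrange: 0 4
yrange: 0 100

// render one boxplot per group, using the default median-based style
#proc boxplot
```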
Prior to version 2.40 this proc was called proc rangebar, and it rendered a single boxplot (including computation of the
necessary statistics). In version 2.40 proc processdata's summaryplus action was implemented.
Correspondingly, this proc's operation changed: it now renders a series of boxplots and is more tightly coupled with proc processdata.
Some previously available functionality was removed, either out of necessity
or because it was deemed obscure. This includes the options for display of outlier data points,
1.5 IQR tails, log transform, setting of variables such as RANGEBARMIN, and output of computed stats.
Boxplots may be vertical (the default) or horizontal (see orientation below);
to keep the explanations in this document simple, vertical boxplots are assumed.
basis: median | mean
If median (the default), median-based boxplots are produced (using the median and the 5th, 25th, 75th, and 95th percentiles).
If mean, mean-based boxplots are produced (using the mean, +/- SD, min, and max).
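For illustration, a mean-based boxplot could be requested with a fragment like this one (a minimal sketch; the data-reading and plotting-area parts of the script are omitted):

```
#proc boxplot
basis: mean
```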
tailmode: 5/95 | minmax
Used only with median-based boxplots.
Specifies whether the boxplot tails extend to the 5th and 95th percentiles or to the min and max.
Default is 5/95.
Example: tailmode: minmax
The data field indicating boxplot location in X (or location in Y with orientation: horizontal).
If not given, boxplots will be rendered at X=1, X=2, X=3 .. X=N.
If you're using proc processdata (action: summaryplus) first, this attribute should not be set (it will be automatic).
If you computed statistics externally, you will need to provide a list of the data fields holding your statistics, in
the exact order shown here.
For median-based boxplots, specify the fields for number of observations, 5th percentile, 25th percentile, median, 75th percentile, 95th percentile, and
optionally the mean (if you're using min/max tails, substitute min for the 5th percentile and max for the 95th percentile).
For mean-based boxplots, specify the fields for number of observations, mean, SD, min, and max.
orientation: vertical | horizontal
Bar appearance details
The width of the box portion of the boxplot.
Default is 0.12 inches.
Example: barwidth: 0.1
Specifies the color of the box area.
Example: color: yellow
mediansym: symboldetails | dot | line
Specifies the symbol displayed at the median (or at the mean with basis: mean).
May be line, which is the default, or a symbol specification (to get dots, etc.).
dot or yes gives a small black dot.
Example: mediansym: shape=diamond
yes | no
If yes, the box is outlined with a line.
Controls color, linewidth, etc. of tail lines.
Example: taildetails: color=blue width=1.8
Controls color, linewidth, etc. of box outline and tics.
yes | no
If yes, bars are truncated to the plotting area.
Default is yes.
The size of the tics that appear at the ends of the tails.
Default is 70% of the width of the bar.
95tics: yes | no
On median-based boxplots using tailmode: minmax, this option allows the 5th and 95th percentiles to be displayed by adding tics.
Example: 95tics: yes
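The two tail-related settings can be combined. The fragment below is a minimal sketch using only the attribute values shown in this document. It requests tails that extend to the min and max, with extra tics marking the 5th and 95th percentiles:

```
#proc boxplot
tailmode: minmax
95tics: yes
```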
meansym: yes | dot | symboldetails | no
With median-based boxplots, this option causes a second data point symbol to be placed at the mean.
The result is two symbols: one at the median and another at the mean.
If yes or dot, a default symbol (a small black dot) is placed at the mean.
Other symbols may be rendered by giving other symboldetails specifications.
If no, a mean symbol is not rendered. Default is no.
Example: meansym: shape=circle style=filled fillcolor=black radius=0.04
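As a sketch of how the appearance settings fit together, the fragment below collects the example values given above into a single proc boxplot invocation. The combination is illustrative only and has not been checked against rendered output:

```
#proc boxplot
barwidth: 0.1
color: yellow
mediansym: shape=diamond
meansym: shape=circle style=filled fillcolor=black radius=0.04
taildetails: color=blue width=1.8
```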
Selecting data rows
Allows cases to be selected for inclusion using a selection expression.
Example: select: @2 = B
A label to be associated with the current set of bars, in a legend to be rendered later (proc legend).
The \\n construct can be used to force a line break,
or the label can be word-wrapped using the proc legend wraplen attribute.
color | symbol
The legend sample can be either the boxplot color or the data point symbol used for the median or mean.
Default is color.
printn: yes | no
If yes, a label showing N (the number of observations) is produced. Default is yes.
Example: printn: no
Where to position the N label. The label will be aligned with the boxplot.
For vertical boxplots the location indicates where to place the label
in Y; for horizontal boxplots, in X.
Example: nlocation: -4
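Row selection and the N label can be used in the same invocation. The fragment below is a hedged sketch that reuses the example values from this document. The selection expression and the label location are placeholders that depend on your data and on the range of your plotting area:

```
#proc boxplot
select: @2 = B
nlocation: -4
```

Because printn defaults to yes, the N label is drawn without setting it explicitly. Adding printn: no would suppress the label.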