proc categories
Proc categories defines a category set to be associated with the X or Y axis, for
handling categorical data, typically for positioning bars and similar plot elements.
This proc also controls attributes related to category processing.
Categories can be defined using this proc before invoking proc areadef.
In older Ploticus versions, categories were defined within proc areadef using an
old syntax (described below),
which will continue to be supported.
However, additional capabilities and higher capacities are available via
proc categories.
Proc categories also assumes the functionality of the old proc catslide
(see the slideamount attribute).
Category sets
The category scaletype allows
positioning of data points using categorical bins rather than a continuous scale,
often useful in positioning bars, rangebars, etc.
Category names are alphanumeric labels, and are generally short (fewer than 40 characters).
Embedded whitespace in a category name is allowed.
Categories are often used as the basis for an axis, and when this is done the category name can be given as a
locvalue
to position labels, etc. (for example in proc annotate's location parameter).
When category names that contain embedded spaces are used this way, use underscores instead of spaces.
During plotting, data are categorized by comparing a given data field with each
defined category label until a match is found, then the point is plotted at that
location.
If no match is found nothing is plotted, and an error is issued if the -showbad
command line option is in effect.
One category set may be defined for the X axis, and one for the Y axis.
Category sets and associated attributes are independent of individual
plotting areas (thus categories may be defined one time and then used in
several different plotting areas). Category sets are also completely independent
from input data sets (thus categories may be defined from one set of data,
then still be in effect after different data are read in).
Category sets may be taken from a data field or specified explicitly.
Category labels should always be unique within an axis, and are normally
displayed in the same order as specified.
The default maximum number of categories is 250 in X and 250 in Y.
These limits can be raised using the listsize attribute.
Example
See the
boxplot1
gallery example.
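As a rough, self-contained sketch of the overall pattern (not taken from the gallery), the script below defines an X category set from a data field and then plots bars against it. The data values are made up. The areadef and bars attributes shown (rectangle, xscaletype, yrange, xaxis.stubs, yaxis.stubs, locfield, lenfield) are documented on their own manual pages rather than here, so treat their exact usage as an assumption to verify there.

```ploticus
// hypothetical inline data: field 1 = category label, field 2 = value
#proc getdata
  data:
    Apples 12
    Pears 7
    Plums 9

// define the X category set from data field 1, before areadef
#proc categories
  axis: x
  datafield: 1

// plotting area using category scaling in X (assumed areadef usage)
#proc areadef
  rectangle: 1 1 4 3
  xscaletype: categories
  yrange: 0 15
  xaxis.stubs: usecategories
  yaxis.stubs: inc 5

// each row is positioned by matching field 1 against the category labels
#proc bars
  locfield: 1
  lenfield: 2
```

Because the category set is independent of the plotting area, a second #proc areadef later in the same script could reuse it without repeating the #proc categories block.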
Attributes
Some attributes need to be specified in a certain order, unlike most other ploticus procs.
The axis attribute must be specified before any other attribute.
Also, #clone is not supported.
axis
x | y
Which axis the category set is associated with.
This attribute must be the first one specified.
Example: axis: x
datafield
dfield
Specify a data field to get category labels from.
Example: datafield: measnum
Example: datafield: 2
categories
multi-line text
List of category labels, one per line. Terminated with a blank line.
Example:
categories:
red
blue
orange
select
select expression
Allows data rows to be selected for inclusion as categories using a selection expression.
This only has an effect when used with datafield, and it must be specified before datafield.
Example: select: @4 != null
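Since select must come before datafield, a block that combines the two might look like the following sketch. The field numbers are arbitrary and only illustrate the ordering.

```ploticus
#proc categories
  axis: x
  // only rows where field 4 is non-null contribute a category
  select: @4 != null
  // category labels are then taken from field 2 of the selected rows
  datafield: 2
```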
extracategory
text
Allows an extra category to be added explicitly. For example, this attribute might
be useful when categories are being set by a data field and it is desired to have an additional
"Total" category.
The position of this attribute relative to others is important.
If specified before the category set is defined,
the extra category will be added to the beginning of the category list and it will appear
near the axis min. If specified after, the extra category will be
appended to the category list and appear near the axis max.
This attribute may be specified as many times as necessary, with each adding an additional category.
Example: extracategory: Total
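To illustrate the position dependence, here is a minimal sketch that adds one extra category on each side of a set taken from a data field. The labels Baseline and Total and the field number are hypothetical.

```ploticus
#proc categories
  axis: x
  // given before the set is defined: goes first, near the axis min
  extracategory: Baseline
  datafield: 1
  // given after the set is defined: goes last, near the axis max
  extracategory: Total
```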
checkuniq
yes|no
Controls whether incoming category tags are checked for uniqueness against the list of known tags.
Default is yes.
The only reason to set this to no is to get a tiny gain in efficiency
with data sets where each category tag is guaranteed to appear once and only once,
because incoming category tags then won't be checked against the list of known tags.
Since the maximum number of categories is a few hundred by default, this doesn't amount to much savings anyway.
comparemethod
exact | beginslike | length=n
When data points are being plotted using category scaletype, a given data
field is compared against each defined category label until a match is found,
then the data is plotted at that location.
This attribute controls the method used for matching.
Default is exact.
To compare only up to the length of the data field (so a data value matches any label that begins with it), use beginslike.
To compare for a specific length, use length=n, where n is
the number of characters.
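As an illustration, the sketch below sets up month categories and asks for matching on the first three characters. It assumes that length=n compares the first n characters of both the data field and the category label, as the older catcompmethod description further down states. Under that assumption a data value such as March would be plotted at the Mar category. The labels are hypothetical.

```ploticus
#proc categories
  axis: x
  // match on the first 3 characters only
  comparemethod: length=3
  categories:
    Jan
    Feb
    Mar

```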
roundrobin
yes | no
Default is yes. Normally a round-robin style lookup algorithm is used,
which is most efficient when the category labels are encountered in the same
order as defined. In practice this is most often the case.
However, this attribute can be set to no, which will cause the lookup to
be sequential, starting each time at the beginning of the list.
This might perform better in certain situations.
Example: roundrobin: no
slideamount
n
Adjust category locations by a small amount. For categories in X
this attribute shifts the location of all categories rightward when given a positive slideamount value,
and leftward for negative values. For categories in Y this attribute shifts categories
downward when given positive slideamount values, and upward for negative values (which may be contrary to
what you'd expect).
This attribute is often used to set up category "bins". For example in X,
the first category is located at X=1, the second at X=2, and so on by default. For certain data displays
it's nice to have the first category located at X=0.5, the second at X=1.5, and so on, so that the first
category is immediately adjacent to the origin. slideamount allows you to do this.
Another common use is to display pairs or clusters. slideamount can be used to shift a bit to one side
to do the first member of a pair, then shift back the other way to do the second member.
Note: when areadef sets up the plotting area and scaling it cancels any
slideamount currently in effect. So slideamount must be specified in a separate
#proc categories block after #proc areadef, as shown below.
The following will slide the categorical X axis 0.1 scale units to the left:
#proc categories
axis: x
...
#proc areadef
...
#proc categories
axis: x
slideamount: -0.1
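The "bins" arrangement described above follows the same pattern. As a sketch, assuming the default locations of X=1, X=2, and so on, a slideamount of -0.5 would place the first category at X=0.5, the second at X=1.5, and so on. The data field number is arbitrary and the areadef attributes are left elided.

```ploticus
#proc categories
  axis: x
  datafield: 1

#proc areadef
  ...

// must come after areadef, which cancels any slideamount in effect
#proc categories
  axis: x
  slideamount: -0.5
```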
listsize
n
Specify the size of the category list. Default capacity is 250 categories per axis.
If you need more categories, you can specify the upper limit here.
This attribute may be specified only one time per script, and must be given before
any categories are defined for the axis.
Example:
#proc categories
axis: x
listsize: 1000
datafield: 2
Old syntax for setting up categories
Here is a summary of the old syntax used within
proc areadef
to specify categories. This syntax will continue to be supported, but new work
should use proc categories (above).
xcategories datafield=dfield [selectrows=conditional expression]
..OR..
xcategories
multi-line text
Defines a set of categories for use on the X axis.
To take categories from a data field, use the construct
datafield=dfield
where dfield is a
data field specification.
Or, category names may be listed explicitly one per line, terminating with a blank line.
An optional
select expression
may be supplied when taking categories from a data field,
to use selected data rows only (new in 2.03; see Example 2 below).
Example 1: xcategories: datafield=1
Example 2: xcategories: datafield=1 selectrows=@3 like S*
Example 3: xcategories: Red
Blue
Green
ycategories datafield=dfield [selectrows=conditional expression]
..OR..
ycategories
multi-line text
Specify categories for use in Y, one per line.
Same syntax as xcategories above.
Default orientation of categories along Y is from top to bottom.
xextracategory
text
Allows an extra X axis category to be added explicitly. For example, this attribute might
be useful when categories are
being set by a data field and it is desired to have an additional "Total" category.
Unlike most other ploticus attributes, its behavior is position-dependent.
If specified before (above) xcategories in the proc areadef attributes,
the extra category will be added to the beginning of the category list and it will appear
near the X axis min. If specified after, the extra category will be
appended to the category list and appear near the X axis max.
This attribute may be specified more than once, with each occurrence
adding a category.
Example: xextracategory: Total
xextracategory: Weekly average
yextracategory
text
Same as xextracategory above, but for the Y axis.
catcompmethod beginswith | exact | length=N
Control the details of how category comparisons are done.
The default is beginswith for backward compatibility; exact
is highly recommended for new work.
In all cases, the comparisons are case-insensitive, and work from the beginning of the
categories list to the end, stopping when a match is found.
beginswith = the comparison is successful if the data item matches
the category name but only to the length of the data item.
exact = the comparison is successful if the data item exactly
matches the category name.
length=N = the comparison is successful if the first N characters
of the data item match the first N characters of the category name.
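For scripts that still use the old syntax, a sketch of an areadef block that takes X categories from a data field and requests exact matching might look like this. The field number is arbitrary and the remaining areadef attributes are left elided.

```ploticus
#proc areadef
  ...
  // old-style category definition, taken from data field 1
  xcategories: datafield=1
  // override the backward-compatible beginswith default
  catcompmethod: exact
```

With the beginswith default, a short data item can match the first category that starts with it, since the search stops at the first match. This is one reason exact is the safer choice for new work.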
Old syntax for proc catslide
Here's an example of the old syntax for proc catslide, which has been superseded
by the slideamount attribute:
#proc catslide
axis: x
amount: -0.1