ploticus: proc categories

proc categories

proc categories defines a category set to be associated with the X or Y axis for handling categorical data, typically to position bars and similar plot elements. This proc also controls attributes related to category processing.
Categories can be defined using this proc before invoking proc areadef. In older Ploticus versions categories were defined within proc areadef using an old syntax described below which will continue to be supported. However, additional capabilities and higher capacities are available via proc categories.
Proc categories also assumes the functionality of the old proc catslide (see the slideamount attribute).

Category sets
The category scaletype allows positioning of data points using categorical bins rather than a continuous scale, often useful in positioning bars, rangebars, etc.
Category names are alphanumeric labels, and are generally short (fewer than 40 characters). Embedded whitespace in a category name is allowed.
Categories are often used as the basis for an axis, and when this is done the category name can be given as a locvalue to position labels, etc. (for example in proc annotate's location parameter). When category names are used this way and contain embedded spaces, write underscores in place of the spaces.
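As an illustration of the underscore rule, the following sketch places an annotation at a category named New York. The category name, the Y value, and the annotation text are hypothetical. The sketch assumes that a categorical X axis is already in effect and that location accepts scaled values written with the (s) suffix:

```ploticus
// Hypothetical category "New York", written with an underscore.
#proc annotate
  location: New_York(s) 40(s)
  text: Peak month
```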
During plotting, data are categorized by comparing a given data field with each defined category label until a match is found, then the point is plotted at that location. If no match is found nothing is plotted, and an error is issued if the -showbad command line option is in effect.
One category set may be defined for the X axis, and one for the Y axis. Category sets and associated attributes are independent of individual plotting areas (thus categories may be defined one time and then used in several different plotting areas). Category sets are also completely independent from input data sets (thus categories may be defined from one set of data, then still be in effect after different data are read in).
Category sets may be taken from a data field or specified explicitly. Category labels should always be unique within an axis, and are normally displayed in the same order as specified.
The default maximum number of categories is 250 in X and 250 in Y. These limits can be raised using the listsize attribute.

Example
See the boxplot1 gallery example.
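For orientation, here is a minimal sketch of a complete script that takes X categories from a data field and draws one bar per category. The data values, rectangle, and ranges are made up for illustration, and the getdata, areadef, xaxis, yaxis, and bars attributes are assumed to behave as in their usual simple usage:

```ploticus
// Data values below are made up for illustration.
#proc getdata
  data:
    apples  12
    pears   7
    plums   9

// Define the X category set from data field 1, before areadef.
#proc categories
  axis: x
  datafield: 1

#proc areadef
  rectangle: 1 1 5 3
  xscaletype: categories
  yrange: 0 15

#proc xaxis
  stubs: usecategories

#proc yaxis
  stubs: inc 5

// Each bar is positioned by matching field 1 against the category labels.
#proc bars
  locfield: 1
  lenfield: 2
```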

Attributes
Some attributes need to be specified in a certain order, unlike most other ploticus procs. The axis attribute must be specified before any other attribute. Also, #clone is not supported.
axis x | y
Which axis the category set is associated with. This attribute must be the first one specified.
Example: axis: x

datafield dfield
Specify a data field to get category labels from.
Example: datafield: measnum
Example: datafield: 2

categories multi-line text
List of category labels, one per line. Terminated with a blank line. Example:
categories:
    red
    blue
    orange

select select expression
Allows data rows to be selected for inclusion as categories using a selection expression. This only has an effect when used with datafield, and it must be specified before datafield.
Example: select: @4 != null
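Because of the ordering requirement, a block that combines the two attributes might look like the following sketch. The field numbers are hypothetical; the point is only that select comes after axis and before datafield:

```ploticus
// Take categories from field 1, using only rows where field 4 is not null.
#proc categories
  axis: x
  select: @4 != null
  datafield: 1
```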

extracategory text
Allows an extra category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and it is desired to have an additional "Total" category.

The position of this attribute relative to others is important. If specified before the category set is defined, the extra category will be added to the beginning of the category list and it will appear near the axis min. If specified after, the extra category will be appended to the category list and appear near the axis max.

This attribute may be specified as many times as necessary, with each adding an additional category.
Example: extracategory: Total
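The effect of position can be sketched as follows. The labels Total and Average and the field number are hypothetical. Under the rules above, Total would be placed at the beginning of the list (near the axis min) and Average at the end (near the axis max):

```ploticus
// Resulting order: Total, then the categories from field 1, then Average.
#proc categories
  axis: x
  extracategory: Total
  datafield: 1
  extracategory: Average
```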

checkuniq yes|no
Default is yes. The only situation where one might set this to no is with data sets where each category tag is guaranteed to appear once and only once. This gives a tiny gain in efficiency, because incoming category tags won't be checked against the list of known tags. Since the maximum number of categories is typically a few hundred, the savings are small anyway.

comparemethod exact | beginslike | length=n
When data points are being plotted using the category scaletype, a given data field is compared against each defined category label until a match is found, and the data point is then plotted at that location. This attribute controls the method used for matching. Default is exact. To compare only up to the length of the data field, use beginslike. To compare a specific number of leading characters, use length=n, where n is the number of characters.
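A minimal sketch of a non-default comparison method follows. The labels are hypothetical, and the sketch assumes that comparemethod may be given before the category list. With length=3, only the first three characters are compared, so a data value such as Northern would be expected to match the label North:

```ploticus
// Match on the first 3 characters only.
#proc categories
  axis: x
  comparemethod: length=3
  categories:
    North
    South
```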

roundrobin yes | no
Default is yes. Normally a round-robin style lookup algorithm is used, which is most efficient when the category labels are encountered in the same order as defined. In practice this is most often the case. However, this attribute can be set to no, which causes the lookup to be sequential, starting each time at the beginning of the list. This might perform better in certain situations.
Example: roundrobin: no

slideamount n
Adjust category locations by a small amount. For categories in X this attribute shifts the location of all categories rightward when given a positive slideamount value, and leftward for negative values. For categories in Y this attribute shifts categories downward when given positive slideamount values, and upward for negative values (which may be contrary to what you'd expect).

This attribute is often used to set up category "bins". For example in X, the first category is located at X=1, the second at X=2, and so on by default. For certain data displays it's nice to have the first category located at X=0.5, the second at X=1.5, and so on, so that the first category is immediately adjacent to the origin. slideamount allows you to do this.
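The bin arrangement described above can be sketched as follows, assuming hypothetical rectangle, range, and field values. With the default locations of X=1, X=2, and so on, a slideamount of -0.5 moves the first category to X=0.5 and the second to X=1.5. The slide is given in a separate #proc categories block after #proc areadef, because areadef cancels any slideamount in effect:

```ploticus
#proc categories
  axis: x
  datafield: 1

#proc areadef
  rectangle: 1 1 5 3
  xscaletype: categories
  yrange: 0 100

// Shift all X categories 0.5 scale units to the left.
#proc categories
  axis: x
  slideamount: -0.5
```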

Another common use is to display pairs or clusters. slideamount can be used to shift a bit to one side to plot the first member of a pair, then shift back the other way to plot the second member.
Note: when areadef sets up the plotting area and scaling it cancels any slideamount currently in effect. So slideamount must be specified in a separate #proc categories block after #proc areadef, as shown below. The following will slide the categorical X axis 0.1 scale units to the left:
#proc categories
  axis: x
  ...

#proc areadef
  ...

#proc categories
  axis: x
  slideamount: -0.1

listsize n
Specify the size of the category list. Default capacity is 250 categories per axis. If you need more categories, you can specify the upper limit here. This attribute may be specified only one time per script, and must be given before any categories are defined for the axis. Example:
#proc categories
  axis: x
  listsize: 1000
  datafield: 2

Old syntax for setting up categories
Here is a summary of the old syntax used within proc areadef to specify categories. This syntax will continue to be supported, but new work should use proc categories (above).

xcategories datafield=dfield [selectrows=conditional expression]
..OR..
xcategories multi-line text
Defines a set of categories for use on the X axis. To take categories from a data field, use the construct datafield=dfield, where dfield is a data field specification. Alternatively, category names may be listed explicitly, one per line, terminated with a blank line. When taking categories from a data field, an optional select expression may be supplied to use selected data rows only (new in 2.03; see Example 2 below).
Example 1:   xcategories: datafield=1

Example 2:   xcategories: datafield=1  selectrows=@3 like S*

Example 3:   xcategories: Red
                          Blue
                          Green

ycategories datafield=dfield [selectrows=conditional expression]
..OR..
ycategories multi-line text
Specify categories for use in Y, one per line. Same syntax as xcategories above. Default orientation of categories along Y is from top to bottom.

xextracategory text
Allows an extra X axis category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and an additional "Total" category is desired. Unlike most other ploticus attributes, its behavior is position-dependent. If specified before (above) xcategories in the proc areadef attributes, the extra category is added to the beginning of the category list and appears near the X axis min. If specified after, the extra category is appended to the category list and appears near the X axis max. This attribute may be specified more than once, with each occurrence adding a category.
Example: xextracategory: Total
         xextracategory: Weekly average

yextracategory text
Same as xextracategory above, but for the Y axis.

catcompmethod beginswith | exact | length=N
Controls the details of how category comparisons are done. The default is beginswith for backward compatibility; exact is highly recommended for new work. In all cases, the comparisons are case-insensitive and work from the beginning of the categories list to the end, stopping when a match is found.

beginswith = the comparison is successful if the data item matches the beginning of the category name; only as many characters as the data item contains are compared.

exact = the comparison is successful if the data item exactly matches the category name.

length=N = the comparison is successful if the first N characters of the data item match the first N characters of the category name.
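For scripts that stay on the old syntax, a sketch of an areadef block that sets the recommended exact comparison might look like this. The rectangle, range, and field number are hypothetical:

```ploticus
// Old-style category setup inside areadef, with exact matching.
#proc areadef
  rectangle: 1 1 5 3
  xscaletype: categories
  xcategories: datafield=1
  catcompmethod: exact
  yrange: 0 100
```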

Old syntax for proc catslide
Here's an example of the old syntax for proc catslide, which has been superseded by the slideamount attribute:
#proc catslide
  axis: x
  amount: -0.1

Ploticus 2.42 ... May 2013