proc categories


proc categories defines a category set to be associated with the X or Y axis. Category sets handle categorical data and are typically used to position bars and similar plot elements. This proc also controls attributes related to category processing.

Categories can be defined using this proc before invoking proc areadef. In older Ploticus versions categories were defined within proc areadef using an old syntax, described below, which will continue to be supported. However, additional capabilities and higher capacities are available via proc categories.

Proc categories also assumes the functionality of the old proc catslide (see the slideamount attribute).




Category sets

The category scaletype allows positioning of data points using categorical bins rather than a continuous scale, often useful in positioning bars, rangebars, etc.

Category names are alphanumeric labels, and are generally short (fewer than 40 characters). Embedded whitespace in a category name is allowed.

Categories are often used as the basis for an axis, and when this is done the category name can be given as a locvalue to position labels, etc (for example in proc annotate's location parameter). When category names are used this way, and the category names contain embedded spaces, use underscores instead of spaces.
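
As a rough illustration of the underscore convention, the sketch below places a label at a category whose name contains a space. It assumes that an X category set containing New York and a numeric Y scale are already in effect, and that the (s) suffix is what marks a locvalue as a scaled location. The category name, the Y value, and the label text are made up for the example.

```
// Assumes "New York" is one of the X categories and Y is a numeric scale.
// The embedded space is written as an underscore in the locvalue.
#proc annotate
  location: New_York(s) 40(s)
  text:
    Peak month

```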

During plotting, data are categorized by comparing a given data field with each defined category label until a match is found, then the point is plotted at that location. If no match is found nothing is plotted, and an error is issued if the -showbad command line option is in effect.

One category set may be defined for the X axis, and one for the Y axis. Category sets and associated attributes are independent of individual plotting areas (thus categories may be defined one time and then used in several different plotting areas). Category sets are also completely independent from input data sets (thus categories may be defined from one set of data, then still be in effect after different data are read in).

Category sets may be taken from a data field or specified explicitly. Category labels should always be unique within an axis, and are normally displayed in the same order as specified.

The default maximum number of categories is 250 in X and 250 in Y. These limits can be raised using the listsize attribute.
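
The following is a minimal sketch of how a category set might fit into a complete script; it is not taken from the gallery. The data values, the plotting rectangle, and the Y range are invented for illustration, and the sketch assumes that proc areadef picks up the previously defined X category set when xscaletype is categories.

```
// Hypothetical data: field 1 holds the category label, field 2 the value.
#proc getdata
  data:
    apples  12
    pears   7
    plums   9

// Define the X category set from field 1, before proc areadef.
#proc categories
  axis: x
  datafield: 1

#proc areadef
  rectangle: 1 1 4 3
  xscaletype: categories
  yrange: 0 15

#proc xaxis
  stubs: usecategories

#proc yaxis
  stubs: inc 5

// Each bar is positioned by matching field 1 against the category labels.
#proc bars
  locfield: 1
  lenfield: 2
```

Because the category set is independent of the plotting area, the same proc categories block could serve further proc areadef blocks later in the script.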




Example

See the boxplot1 gallery example.




Attributes

Some attributes need to be specified in a certain order, unlike most other ploticus procs. The axis attribute must be specified before any other attribute. Also, #clone is not supported.

axis     x | y

    Which axis the category set is associated with. This attribute must be the first one specified.
    Example: axis: x

datafield     dfield

    Specify a data field to get category labels from.
    Example: datafield: measnum
    Example: datafield: 2

categories     multi-line text

    List of category labels, one per line. Terminated with a blank line. Example:
    categories:
        red
        blue
        orange
    


select     select expression

    Allows data rows to be selected for inclusion as categories using a selection expression. This only has an effect when used with datafield, and it must be specified before datafield.
    Example: select: @4 != null
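
    A short sketch of the ordering requirement, with select given before datafield. The field numbers and the condition are hypothetical.

    ```
    // Only rows whose field 3 is greater than 0 contribute a category label.
    #proc categories
      axis: y
      select: @3 > 0
      datafield: 1
    ```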

extracategory     text

    Allows an extra category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and it is desired to have an additional "Total" category.
    The position of this attribute relative to others is important. If specified before the category set is defined, the extra category will be added to the beginning of the category list and it will appear near the axis min. If specified after, the extra category will be appended to the category list and appear near the axis max.
    This attribute may be specified as many times as necessary, with each adding an additional category.
    Example: extracategory: Total
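
    To make the position dependence concrete, here is a small sketch. It assumes that field 1 of the current data supplies the labels east, west and north, in that order; the labels and the extra category names are made up.

    ```
    // Going by the rules above, the resulting order should be:
    //   Baseline, east, west, north, Total
    #proc categories
      axis: x
      extracategory: Baseline
      datafield: 1
      extracategory: Total
    ```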

checkuniq     yes|no

    Default is yes. The only situation where one might set this to no is with data sets where each category tag is guaranteed to appear once and only once. Doing so gives a tiny gain in efficiency, because incoming category tags are not checked against the list of known tags. Since the maximum number of categories is a few hundred, this doesn't amount to much savings anyway.

comparemethod     exact | beginslike | length=n

    When data points are being plotted using category scaletype, a given data field is compared against each defined category label until a match is found, then the data is plotted at that location. This attribute controls the method used for matching. Default is exact. To compare only up to the length of the data field, use beginslike. To compare only the first n characters, use length=n, where n is the number of characters.
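
    A hedged sketch of length=n matching follows. The month labels are hypothetical, and the expectation that a data value such as January or Janvier would be plotted at the Jan category assumes that length=n compares the first n characters of both strings, as described for the old catcompmethod attribute further down.

    ```
    #proc categories
      axis: x
      categories:
        Jan
        Feb
        Mar

      comparemethod: length=3
    ```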

roundrobin     yes | no

    Default is yes. Normally a round-robin style lookup algorithm is used, which is most efficient when the category labels are encountered in the same order as defined. In practice this is most often the case. However, this attribute can be set to no, which will cause the lookup to be sequential, starting each time at the beginning of the list. This might perform better in certain situations.
    Example: roundrobin: no

slideamount     n

    Adjust category locations by a small amount. For categories in X this attribute shifts the location of all categories rightward when given a positive slideamount value, and leftward for negative values. For categories in Y this attribute shifts categories downward when given positive slideamount values, and upward for negative values (which may be contrary to what you'd expect).
    This attribute is often used to set up category "bins". For example in X, the first category is located at X=1, the second at X=2, and so on by default. For certain data displays it's nice to have the first category located at X=0.5, the second at X=1.5, and so on, so that the first category is immediately adjacent to the origin. slideamount allows you to do this.
    Another common use is to display pairs or clusters. slideamount can be used to shift a bit to one side to do the first member of a pair, then shift back the other way to do the second member.
    Note: when areadef sets up the plotting area and scaling it cancels any slideamount currently in effect. So slideamount must be specified in a separate #proc categories block after #proc areadef, as shown below. The following will slide the categorical X axis 0.1 scale units to the left:
    #proc categories
      axis: x
      ...

    #proc areadef
      ...

    #proc categories
      axis: x
      slideamount: -0.1
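
    The "bins" arrangement described earlier can be sketched the same way. Since the first X category sits at X=1 by default, a slideamount of -0.5 should move it to X=0.5, the second to X=1.5, and so on. The datafield, rectangle, and yrange values are placeholders.

    ```
    #proc categories
      axis: x
      datafield: 1

    #proc areadef
      rectangle: 1 1 4 3
      xscaletype: categories
      yrange: 0 15

    // Given after areadef, because areadef cancels any slideamount in effect.
    #proc categories
      axis: x
      slideamount: -0.5
    ```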
    


listsize     n

    Specify the size of the category list. Default capacity is 250 categories per axis. If you need more categories, you can specify the upper limit here. This attribute may be specified only one time per script, and must be given before any categories are defined for the axis. Example:
    #proc categories
      axis: x
      listsize: 1000
      datafield: 2
    





Old syntax for setting up categories

Here is a summary of the old syntax used within proc areadef to specify categories. This syntax will continue to be supported, but new work should use proc categories (above).

xcategories datafield=dfield [selectrows=conditional expression]
..OR..
xcategories multi-line text

    Defines a set of categories for use on the X axis. To take categories from a data field, use the construct datafield=dfield where dfield is a data field specification. Or, category names may be listed explicitly one per line, terminating with a blank line. When taking categories from a data field, an optional select expression may be supplied to use selected data rows only (new in 2.03; see example 2 below).
    Example 1:   xcategories: datafield=1
    
    Example 2:   xcategories: datafield=1  selectrows=@3 like S*
    
    Example 3:   xcategories: Red
                              Blue
                              Green
    

ycategories datafield=dfield [selectrows=conditional expression]
..OR..
ycategories multi-line text

    Specify categories for use in Y, one per line. Same syntax as xcategories above. Default orientation of categories along Y is from top to bottom.

xextracategory text

    Allows an extra X axis category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and it is desired to have an additional "Total" category. Unlike most other ploticus attributes, its behavior is position-dependent. If specified before (above) xcategories in the proc areadef attributes, the extra category will be added to the beginning of the category list and it will appear near the X axis min. If specified after, the extra category will be appended to the category list and appear near the X axis max. This attribute may be specified one or more times, with each adding a category.
    Example:     xextracategory: Total
                 xextracategory: Weekly average
    


yextracategory text

    Same as xextracategory above, but for the Y axis.

catcompmethod beginswith | exact | length=N

    Control the details of how category comparisons are done. The default is beginswith for backward compatibility; exact is highly recommended for new work. In all cases, the comparisons are case-insensitive, and work from the beginning of the categories list to the end, stopping when a match is found.
    beginswith = the comparison is successful if the data item matches the category name but only to the length of the data item.
    exact = the comparison is successful if the data item exactly matches the category name.
    length=N = the comparison is successful if the first N characters of the data item match the first N characters of the category name.
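
    For scripts being migrated, the sketch below sets an old-style areadef next to an approximate counterpart written with proc categories. The two halves are alternatives, not one script. The rectangle, yrange, and field number are placeholders, and the comparison method is given explicitly in both because the defaults differ (beginswith in the old syntax, exact in proc categories).

    ```
    // Old syntax: categories set up inside proc areadef.
    #proc areadef
      rectangle: 1 1 4 3
      xscaletype: categories
      xcategories: datafield=1
      xextracategory: Total
      catcompmethod: exact
      yrange: 0 15

    // Approximate counterpart: proc categories given before proc areadef.
    #proc categories
      axis: x
      datafield: 1
      extracategory: Total
      comparemethod: exact

    #proc areadef
      rectangle: 1 1 4 3
      xscaletype: categories
      yrange: 0 15
    ```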




Old syntax for proc catslide

Here's an example of the old syntax for proc catslide, which has been superseded by the slideamount attribute:
#proc catslide
  axis: x
  amount: -0.1





 


Ploticus 2.42 ... May 2013