proc curvefit


proc curvefit uses the current data set to compute a curve, which it then renders in the current plotting area. Available curve types are moving average, average, linear regression, bspline, and interpolated. Typical uses are clarifying overall trends in the data and smoothing. (If you just want to draw a line connecting your data points, without any smoothing, use proc lineplot.) The data do not have to be in X order; they will be sorted on X as part of the process (except with the interpolated curve type, where sorting is controlled by the xsort attribute). See also the gallery curvefit examples.

Limitations: The maximum number of input points for a bspline curve is 100. The default maximum number of input points for all other curve types is 1000; to raise this limit, use the proc curvefit attribute maxinpoints. Generated curve points are placed into the plotting vector, whose size can be controlled using the command line argument -maxvect.




Attributes

The yfield attribute MUST be specified.

yfield     dfield

    Data field to use for Y values. Example: yfield: 1

xfield     dfield

    Data field to use for X values. If not given, sequential unit locations in X will be used. Example: xfield: 4

curvetype     movingavg | regression | bspline | avg | interpolated

    The type of curve fitting computation to perform.
    movingavg - for each point, it takes the average of the current point and n-1 points to the left (or as many points as are available). n is controlled by the order attribute. Often used in finance.
    regression - computes the linear regression for the set of points. The result will be a straight line expressing the relationship between X and Y. Often used with scatterplots. The variables REGRESSION_LINE and CORRELATION will be set (see "Variables that are set by proc curvefit" below).
    bspline - draws a curve using the bspline algorithm. The order and resolution attributes control the appearance of the result. May be used to fit a curve to a histogram.
    avg - similar to movingavg except that it also includes n-1 points to the right of the current point (or as many points as are available) in the average. Thus, for a point that is far from either edge, 2n-1 points will be averaged.
    interpolated - a spline interpolation between the given data points, i.e. the curve will pass through all input data points (this type is new in version 2.20; code contributed by Oliver Koch).
    Example: curvetype: movingavg
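
The difference between movingavg and avg can be sketched in a few lines of Python. This is only an illustration of the windows described above, not the Ploticus source: the function names are hypothetical, and it assumes the Y values are already sorted on X and that one averaged value is produced per input point.

```python
def moving_avg(ys, order):
    """movingavg as described: the current point plus up to order-1 points to its left."""
    out = []
    for i in range(len(ys)):
        window = ys[max(0, i - order + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def centered_avg(ys, order):
    """avg as described: also includes up to order-1 points to the right."""
    out = []
    for i in range(len(ys)):
        # Slicing past the end of the list simply stops at the last point.
        window = ys[max(0, i - order + 1): i + order]
        out.append(sum(window) / len(window))
    return out


ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical Y values, in X order
print(moving_avg(ys, 2))          # [2.0, 3.0, 5.0, 7.0, 9.0]
print(centered_avg(ys, 2))        # [3.0, 4.0, 6.0, 8.0, 9.0]
```

With order 2, an interior point is averaged over 2 points by moving_avg and over 2n-1 = 3 points by centered_avg; near the edges both fall back to whatever points are available.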

maxinpoints     n

    Maximum number of input points for curve types other than bspline. Default is 1000. (ver 2.30+)




Details of curve appearance

order     n

    For bspline curves, this is a value between 2 and 20; a lower value yields a more jagged curve, while a higher value gives a smoother curve. The number of data points must be at least this value for a bspline curve to be possible.
    For movingavg curves, this defines the number of points to include in each average computation. For avg curves, 2n - 1 points will be considered, where n = the order value.
    This attribute has no effect with regression or interpolated curves.
    The default order is 4.
    Example: order: 8

resolution     n

    Only relevant for bspline curves. For every input point, n result points will be generated. Default is 5.0.

linedetails     linedetails

    Appearance details for the curve. Note that dash patterns may not be effective with generated curves (other than regression curves) because of point density.
    Example: linedetails: color=red width=2.0

xsort     yes | no

    Whether or not to sort the input data on xfield before generating curves of the interpolated type. Default is no.




Range control & selecting data rows

select     select expression

    Only data rows that satisfy the select expression will be included in the curve computation.
    Example: select: @@3 > 0

calcrange     min     [max]

    Data within this X range will be included in the curve calculation. If only one value is given, it will be taken as the range minimum, and the maximum will be the plottable maximum. If not specified, all data rows will be included.

linerange     min     [max]

    Controls the X range (in scaled units) within which the curve will be rendered. Data points falling outside this range will not be rendered. If accumulation is being done, points outside the range will contribute to the accumulated total. If only one value is given, it will be taken as the range minimum, and the maximum will be the plottable maximum. If not specified, all data rows will be plotted.
    For regression curves, this attribute may be used to limit the X range of the regression line, or to create a regression line that extends beyond the X range of the data. In this case, min and max should both be given.
    Example: linerange: 1
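
The min [max] defaulting rule shared by calcrange and linerange can be sketched as follows. This is a minimal illustration, not Ploticus code: the function name and the plot_max parameter (standing in for the plottable maximum) are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def in_x_range(x, range_min=None, range_max=None, plot_max=float("inf")):
    """Return True if x falls inside a min [max] range as described above."""
    if range_min is None:
        # Attribute not specified: every data row is included.
        return True
    # Only min given: the maximum defaults to the plottable maximum.
    upper = plot_max if range_max is None else range_max
    return range_min <= x <= upper   # inclusive bounds are an assumption


rows = [(0, 1.0), (2, 3.0), (5, 4.0), (9, 2.0)]   # hypothetical (x, y) rows
kept = [row for row in rows if in_x_range(row[0], range_min=2, plot_max=8)]
print(kept)   # [(2, 3.0), (5, 4.0)]
```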

clip     yes | no

    Default is no. If set to yes, the generated curve will be clipped to the plotting area in Y. (Regression curves are always clipped.) (2.30+)




Legend

legendlabel     text

    A label to be associated with the curve in the legend. proc legend must be executed later in order to render the legend. The \\n construct can be used to force a line break, or the label can be wordwrapped using the proc legend wraplen attribute (2.32+). If proc getdata field names are being used, the special symbol #usefname causes the field name of yfield to be automatically used as the legend label (2.04+).
    Example: legendlabel: Northeast region
    Example: legendlabel: #usefname




Accessing the coordinates of the generated curve

showresults     yes | no

    If yes, a listing of the points in the computed curve will be written to the diagnostic stream (-diag).

statsonly     yes | no

    If yes, don't draw the result curve. This is for situations where the user only wants the computed curve values (showresults) or the REGRESSION_LINE and CORRELATION variables to be set.




Variables that are set by proc curvefit

REGRESSION_LINE

    If curvetype is regression, this variable will be set to display the formula for the regression line.

CORRELATION

    If curvetype is regression, this variable will be set to display the Pearson correlation coefficient (r), which ranges from -1.0 to 1.0, where 1.0 is a perfect positive correlation (positive slope), -1.0 is a perfect negative correlation (negative slope), and 0 is no linear correlation.

CORRELATION_P

    If curvetype is regression, this variable will be set to display the p value on r (Pearson). (2.42+)

XFINAL and YFINAL

    These variables are set to the final location (in scaled space) of the end of the drawn curve.
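
For readers who want to cross-check the values reported in REGRESSION_LINE and CORRELATION, the following Python sketch computes an ordinary least-squares line and the Pearson r using the standard textbook formulas. It is not taken from the Ploticus source; the function name and the printed format are hypothetical and may differ from what Ploticus actually emits.

```python
import math


def linear_regression(xs, ys):
    """Return (slope, intercept, r) for a least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # undefined if all X values are equal
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # Pearson correlation coefficient
    return slope, intercept, r


xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical data lying exactly on Y = 1 + 2X
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r = linear_regression(xs, ys)
print(f"Y = {intercept:g} + {slope:g}X, r = {r:g}")
```

Because the sample points lie exactly on a rising line, the sketch yields a slope of 2, an intercept of 1, and r = 1.0, the perfect positive correlation case.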

 


Ploticus 2.42 ... May 2013