proc curvefit uses the current data set to compute a curve, which it then renders in the
current plotting area.
Available curve types are: moving average, average, linear regression, bspline, and interpolated curves.
Typical uses are to clarify overall trends in the data, or for smoothing.
(If you just want to draw a line connecting your data points, without any smoothing, use proc lineplot.)
The data do not have to be in X order; they will be sorted on X as part of the process
(except with the interpolated curve type). See also the gallery curvefit examples.
Limitations:
The maximum number of input points for a bspline curve is 100.
The default maximum number of input points for all other curve types
is 1000; to raise this limit, use the proc curvefit attribute maxinpoints.
Generated curve points are placed into the plotting vector; its size can
be controlled using command line argument -maxvect.
Attributes
The yfield attribute MUST be specified.
yfield
dfield
Data field to use for Y values. Example: yfield: 1
xfield
dfield
Data field to use for X values.
If not given, sequential unit locations in X will be used.
Example: xfield: 4
curvetype
movingavg | regression | bspline | avg | interpolated
The type of curve fitting computation to perform.
movingavg - for each point, it takes the average of the current point and n-1
points to the left (or as many points as are available).
n is controlled by the order attribute.
Often used in finance.
regression - Computes the linear regression for the set of points. The result will be a
straight line expressing the relationship between X and Y. Often used with scatterplots.
The variables REGRESSION_LINE and CORRELATION will be set (see "Variables that are set by proc curvefit" below).
bspline - draws a curve using the bspline algorithm. The order and resolution
attributes control the appearance of the result. May be used to fit a curve to a histogram.
avg - similar to movingavg except that it also includes n-1 points to the right
of the current point (or as many points as are available) in the average.
Thus, for a point that is far from either edge, 2n-1 points will be averaged.
interpolated - a spline interpolation between the given data points, i.e. the curve will pass through
all input data points (this type is new in version 2.20, code contributed by Oliver Koch)
Example: curvetype: movingavg
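The difference between the movingavg and avg window rules described above can be sketched in Python. This is only an illustration of the wording in this page, not ploticus source code; the function names trailing_average and centered_average are hypothetical, and the input is assumed to be a list of Y values already sorted on X.

```python
def trailing_average(ys, order=4):
    """movingavg-style rule: the current point plus up to order-1 points to its left."""
    result = []
    for i in range(len(ys)):
        lo = max(0, i - (order - 1))      # fewer points are used near the left edge
        window = ys[lo:i + 1]
        result.append(sum(window) / len(window))
    return result


def centered_average(ys, order=4):
    """avg-style rule: also includes up to order-1 points to the right (2*order-1 at most)."""
    result = []
    for i in range(len(ys)):
        lo = max(0, i - (order - 1))
        hi = min(len(ys), i + order)      # exclusive upper bound of the slice
        window = ys[lo:hi]
        result.append(sum(window) / len(window))
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(trailing_average(data, order=3))
    print(centered_average(data, order=2))
```

Near either edge the window simply shrinks to the points that are available, which is how this sketch reads "or as many points as are available".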
maxinpoints
n
Maximum number of input points for curve types other than bspline. Default is 1000. (ver 2.30+)
Details of curve appearance
order
n
For bspline curves, this is a value between 2 and 20; a lower value
yields a more jagged curve, while a higher value gives a smoother curve.
The number of data points must be at least this value for a bspline curve
to be possible.
For movingavg curves, this defines the number of points
to include in each average computation. For avg curves, 2n - 1
points will be considered, where n = the order value.
This attribute has no effect with regression or interpolated curves.
Default order is 4.
resolution
n
Only relevant for bspline curves.
For every input point, n result points will be generated.
Default is 5.0.
linedetails
linedetails
Appearance details for the curve.
Note that dash patterns may not be effective with generated curves (other than regression curves)
because of point density.
Example: linedetails: color=red width=2.0
xsort
yes | no
Whether or not to sort the input data on xfield
before generating curves of the interpolated type.
Default is no.
Range control & selecting data rows
select
select expression
Only data rows that satisfy the select expression will be included in the curve computation.
Example: select: @@3 > 0
calcrange
min
[max]
Data within this X range will be included in curve calculation.
If only one value is given, it will be taken as the range
minimum, and the maximum will be the plottable maximum.
If not specified all data rows will be included.
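The calcrange rule above can be sketched as a simple filter in Python. This is an illustrative reading only: the function name apply_calcrange and the plottable_max parameter are hypothetical, and treating both bounds as inclusive is an assumption that this page does not state.

```python
def apply_calcrange(points, range_min=None, range_max=None, plottable_max=float("inf")):
    """Return the (x, y) points whose X value falls within the calculation range.

    No range_min: every point is kept (calcrange not specified).
    No range_max: the plottable maximum is used as the upper bound.
    Bounds are treated as inclusive here, which is an assumption.
    """
    if range_min is None:
        return list(points)
    if range_max is None:
        range_max = plottable_max
    return [(x, y) for (x, y) in points if range_min <= x <= range_max]


if __name__ == "__main__":
    pts = [(0, 1), (5, 2), (10, 3), (15, 4)]
    print(apply_calcrange(pts, 5, 10))
    print(apply_calcrange(pts, 5, plottable_max=12))
```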
linerange
min
[max]
Controls the X range (in scaled units) within which the curve will be rendered.
Data points falling outside this range will not be rendered.
If accumulation is being done, points outside the range will contribute
to the accumulated total.
If only one value is given, it will be taken as the range
minimum, and the maximum will be the plottable maximum.
If not specified all data rows will be plotted.
For regression curves, this attribute may be used to limit
the X range of the regression line, or to create a regression line that extends
beyond the X range of the data.
In this case, min and max should both be given.
clip
yes | no
Default is no. If set to yes, generated curve will be clipped to the
plotting area in Y. (Regression curves are always clipped.) (2.30+)
Legend
legendlabel
text
A label to be associated with the curve in the legend.
proc legend must be executed later in order to
render the legend.
The \\n construct can be used to force a line break
or the label can be wordwrapped using proc legend wraplen attribute (2.32+).
If proc getdata field names are being used,
the special symbol #usefname causes the field name of yfield
to be automatically used as the legend label (2.04+).
Example: legendlabel: Northeast region
Example: legendlabel: #usefname
Accessing the coordinates of the generated curve
showresults
yes | no
If yes, a listing of the points in the computed curve will
be written to the diagnostic stream (-diag).
statsonly
yes | no
If yes, don't draw the result curve. This is for situations where the user only wants
the computed curve values (showresults) or the
REGRESSION_LINE and CORRELATION variables to be set.
Variables that are set by proc curvefit
REGRESSION_LINE
If curvetype is regression, this variable will be set to display
the formula for the regression line.
CORRELATION
If curvetype is regression, this variable will be set to display
the Pearson correlation coefficient (r), which ranges from -1.0 to 1.0, where
1.0 is a perfect positive correlation (positive slope), -1.0 is a perfect negative correlation
(negative slope), and 0 is no linear correlation.
CORRELATION_P
If curvetype is regression, this variable will be set to display the p value on r (Pearson).
New in 2.42
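For readers who want to check these values by hand, the following Python sketch computes an ordinary least-squares line and the Pearson r using the textbook formulas. It is not taken from the ploticus source; the function name fit_line is hypothetical, and the exact text format that ploticus stores in REGRESSION_LINE is not shown here because this page does not specify it.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through (xs, ys).

    Assumes at least two points, and that neither the X values
    nor the Y values are all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    slope, intercept, r = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"Y = {intercept:g} + {slope:g}X   r = {r:g}")
```

Points that lie exactly on a rising line give r = 1.0, and points on a falling line give r = -1.0, matching the range described for CORRELATION.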
XFINAL and YFINAL
are set to the final location (in scaled space) of the end of the drawn curve.