| Title: | Data Sharpening |
|---|---|
| Description: | Functions and data sets inspired by data sharpening - data perturbation to achieve improved performance in nonparametric estimation, as described in Choi, E., Hall, P. and Rousson, V. (2000). Capabilities for enhanced local linear regression function and derivative estimation are included, as well as an asymptotically correct iterated data sharpening estimator for any degree of local polynomial regression estimation. A cross-validation-based bandwidth selector is included which, in concert with the iterated sharpener, will often provide superior performance, according to a median integrated squared error criterion. Sample data sets are provided to illustrate function usage. |
| Authors: | W. John Braun <[email protected]> |
| Maintainer: | W.J. Braun <[email protected]> |
| License: | Unlimited |
| Version: | 1.4 |
| Built: | 2026-05-15 06:54:57 UTC |
| Source: | https://github.com/cran/sharpData |
The burnRate data frame contains laboratory data on the
proportion of remaining fuel in a piece of wood that has burned
for a fixed period of time subjected to a fixed windspeed.
data(burnRate)
This data frame contains the following columns:
a numeric vector
ratio of windspeed, multiplied by density of air, to density of firebrand
factor listing tree species
numeric vector of diameter of burned particle in cm
windspeed in cm per second
length of test in seconds
Albini, F. USDA Forest Service General Technical Report INT-56, 1979.
Cross-validation bandwidth selector for iterated sharpened responses for bias reduction in function estimation.
CVsharp(x, y, deg, nsteps)
| Argument | Description |
|---|---|
| x | a numeric vector containing the predictor variable values. |
| y | a numeric vector containing the response variable values. |
| deg | a numeric vector containing the local polynomial degree used. |
| nsteps | a numeric vector containing the number of iteration steps. |
If nsteps is specified to be 0, then the CV bandwidth for conventional local polynomial regression is provided.
a list containing 3 elements: the candidate bandwidths; the corresponding CV scores; the selected optimal bandwidth.
W.J. Braun
locpoly
library(sharpData)   # provides CVsharp(), sharpiteration() and the MPG data
library(KernSmooth)  # provides locpoly()
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- CVsharp(speed, mpg, 0, 0)$CVh  # conventional local constant regression bandwidth
mpg.l0 <- locpoly(speed, mpg, bandwidth=h, degree=0)
h <- CVsharp(speed, mpg, 0, 1)$CVh  # 1-sharpened local constant regression bandwidth
mpgSharp <- sharpiteration(speed, mpg, 0, h, 1)
mpg.l1 <- locpoly(speed, mpgSharp[[1]], bandwidth=h, degree=0)
h <- CVsharp(speed, mpg, 0, 5)$CVh  # 5-sharpened local constant regression bandwidth
mpgSharp <- sharpiteration(speed, mpg, 0, h, 5)
mpg.l5 <- locpoly(speed, mpgSharp[[5]], bandwidth=h, degree=0)
plot(mpg ~ speed)
lines(mpg.l0)                # unsharpened function estimation
lines(mpg.l1, col=2, lty=2)  # sharpened function estimation (1 step)
lines(mpg.l5, col=4, lty=3)  # sharpened function estimation (5 steps)
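The idea behind a cross-validation bandwidth selector can be illustrated in base R. The following is a minimal sketch only, assuming a leave-one-out criterion, a Gaussian kernel and local constant regression without sharpening (the analogue of `nsteps = 0`). The helpers `loo_cv_score()` and `cv_bandwidth_sketch()`, the returned element names and the simulated data are hypothetical and are not part of sharpData; `CVsharp` may use a different criterion or bandwidth grid.

```r
# Leave-one-out CV score for a local constant (Nadaraya-Watson) fit
# with a Gaussian kernel. Hypothetical helper, not part of sharpData.
loo_cv_score <- function(x, y, h) {
  w <- dnorm(outer(x, x, "-") / h)  # w[i, j]: weight of point j when fitting at x[i]
  diag(w) <- 0                      # leave each point out of its own fit
  pred <- as.vector(w %*% y / rowSums(w))
  mean((y - pred)^2)
}

# Evaluate the CV score over candidate bandwidths and pick the minimiser.
# The element names below are illustrative assumptions.
cv_bandwidth_sketch <- function(x, y, hgrid) {
  scores <- vapply(hgrid, function(h) loo_cv_score(x, y, h), numeric(1))
  list(h = hgrid, cv = scores, h_opt = hgrid[which.min(scores)])
}

# Simulated data (an assumption made for illustration only)
set.seed(1)
x <- seq(0, 1, length.out = 60)
y <- sin(2 * pi * x) + rnorm(60, sd = 0.2)
hgrid <- seq(0.02, 0.3, by = 0.02)
sel <- cv_bandwidth_sketch(x, y, hgrid)
sel$h_opt
```

The sketch returns the candidate bandwidths, their CV scores and the minimising bandwidth, mirroring the three elements described for the value of `CVsharp`.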
Calculation of sharpened responses for bias reduction in function and first derivative estimation, assuming a Gaussian kernel is used in bivariate scatterplot smoothing.
LLsharpen(x, y, h)
| Argument | Description |
|---|---|
| x | a numeric vector containing the predictor variable values. |
| y | a numeric vector containing the response variable values. |
| h | a numeric vector containing the (scalar) bandwidth. |
a vector containing the sharpened (i.e. perturbed) response values, ready for input into a local linear regression estimator.
W.J. Braun
Choi, E., Hall, P. and Rousson, V. (2000) Data sharpening methods for bias reduction in nonparametric regression. Annals of Statistics 28(5) 1339-1355.
locpoly
library(sharpData)   # provides LLsharpen() and the MPG data
library(KernSmooth)  # provides dpill() and locpoly()
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- dpill(speed, mpg)*2
mpgSharp <- LLsharpen(speed, mpg, h)
mpg.lS <- locpoly(speed, mpgSharp, bandwidth=h, drv=1, degree=1)
mpg.lX <- locpoly(speed, mpg, bandwidth=h, drv=1, degree=1)
plot(mpg.lX, type="l")       # unsharpened derivative estimation
lines(mpg.lS, col=2, lty=2)  # sharpened derivative estimation
Local constant and local linear regression are applied to bivariate data. The response is ‘sharpened’, or perturbed, so that the resulting curve estimate is monotonically increasing.
Monolpoly(x, y, h, d=1, xgrid, numgrid = 401, ...)
| Argument | Description |
|---|---|
| x | a vector of explanatory variable observations |
| y | binary vector of responses |
| h | bandwidth |
| d | degree, can be either 0 or 1 |
| xgrid | gridpoints on x-axis where monotonicity constraint is enforced |
| numgrid | number of equally-spaced gridpoints (if xgrid not specified) |
| ... | other arguments for locpoly |
Data are perturbed by the smallest possible L2 distance subject to the constraint that the local linear estimate is monotonically increasing.
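Because a local polynomial estimate is linear in the responses, the problem described above can be written as a quadratic program. The formulation below is a sketch under stated assumptions rather than the package's exact specification: the grid points $t_1, \dots, t_m$ stand for `xgrid`, and the weights $a_i(t_j)$ are notation introduced here for the coefficients of the derivative estimate, which depend only on `x`, `h` and `d`.

```latex
\min_{y^{*} \in \mathbb{R}^{n}} \; \sum_{i=1}^{n} \left( y_i - y_i^{*} \right)^{2}
\quad \text{subject to} \quad
\hat{g}'(t_j) = \sum_{i=1}^{n} a_i(t_j)\, y_i^{*} \;\ge\; 0,
\qquad j = 1, \dots, m .
```

Stacking the weights into an $m \times n$ matrix $A$ with entries $A_{ji} = a_i(t_j)$ turns the constraint into $A y^{*} \ge 0$. Whether the package enforces the inequality exactly or with a tolerance is not stated in this documentation.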
| Component | Description |
|---|---|
| x | locations of function estimate evaluations |
| y | function estimate evaluations (sharpened - monotonized) |
| ysharp | sharpened responses |
W.J. Braun
Braun, W.J. and Hall, P. (2001) Data sharpening for nonparametric estimation subject to constraints. Journal of Computational and Graphical Statistics.
library(sharpData)  # provides Monolpoly()
gridpts <- seq(1, 10, length=101)
x <- seq(1, 10, length=51)
p <- exp(-1 + .2*x)/(1 + exp(-1 + .2*x))
y <- rbinom(51, 1, p)
plot(x, y)
lines(Monolpoly(x, y, h=0.6, xgrid=gridpts))
plot(faithful)  # second example: Old Faithful data
with(faithful, lines(Monolpoly(eruptions, waiting, h=0.1, d=1, range=c(1.55,5.15))))
Computes the matrix of first-derivative coefficients used in the monotonic local linear sharpening problem.
MonoMat(xgrid, x, h, d)
| Argument | Description |
|---|---|
| xgrid | numeric vector of locations where monotonicity constraint is to be enforced |
| x | numeric explanatory vector |
| h | numeric bandwidth |
| d | local polynomial degree, can be either 0 or 1 |
a list containing the A matrix and the number of rows in A.
W.J. Braun
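To make the notion of a derivative coefficient matrix concrete, here is a minimal base R sketch for the local linear case (`d = 1`) with a Gaussian kernel. It uses the standard weighted least squares form of the local linear slope, so that row j of the matrix, multiplied by the response vector, gives the slope estimate at the j-th grid point. The helper `slope_coef_sketch()` is hypothetical and not part of sharpData; the layout, scaling and sign conventions of the matrix returned by `MonoMat` may differ.

```r
# Hypothetical helper, not part of sharpData: coefficients of the local
# linear slope estimate at each grid point (Gaussian kernel, bandwidth h).
slope_coef_sketch <- function(xgrid, x, h) {
  A <- t(vapply(xgrid, function(g) {
    u <- x - g                 # predictors centred at the grid point
    w <- dnorm(u / h)          # Gaussian kernel weights
    s0 <- sum(w)
    s1 <- sum(w * u)
    s2 <- sum(w * u^2)
    # Slope row of (X'WX)^{-1} X'W for the design matrix X = [1, u]
    w * (s0 * u - s1) / (s0 * s2 - s1^2)
  }, numeric(length(x))))
  list(A = A, nrow = nrow(A))
}

# Usage: for exactly linear responses the estimated slope is recovered
x <- seq(0, 1, length.out = 25)
xgrid <- seq(0.1, 0.9, length.out = 9)
coefs <- slope_coef_sketch(xgrid, x, h = 0.2)
slopes <- as.vector(coefs$A %*% (3 + 2 * x))
```

Since local linear regression reproduces straight lines exactly, every entry of `slopes` equals 2 up to rounding error. A monotonicity constraint on the grid then amounts to requiring that the matrix times the sharpened responses be non-negative.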
The MPG data frame has 15 rows and 10 columns.
data(MPG)
This data frame contains the following columns:
a numeric vector of cruising speeds in miles per hour
miles per gallon for a 1988 Corsica
miles per gallon for a 1993 Legacy
miles per gallon for a 1994 Oldsmobile
miles per gallon for a 1994 Oldsmobile Cutlass
miles per gallon for a 1994 Chevrolet Pickup
miles per gallon for a 1994 Jeep Cherokee
miles per gallon for a 1994 Villager
miles per gallon for a 1995 Prizm
miles per gallon for a 1997 Toyota Celica
B.H. West, R.N. McGill, J.W. Hodgson, S.S. Sluder, D.E. Smith, Development and Verification of Light-Duty Modal Emissions and Fuel Consumption Values for Traffic Models, Washington, DC, April 1997, and additional project data, April 1998.
data(MPG)
plot(celica97 ~ speed, data = MPG)
Calculation of iteratively sharpened responses for bias reduction in function estimation, assuming a Gaussian kernel is used in bivariate scatterplot smoothing.
sharpiteration(x, y, deg, h, nsteps, na.rm, ...)
| Argument | Description |
|---|---|
| x | a numeric vector containing the predictor variable values. |
| y | a numeric vector containing the response variable values. |
| deg | a numeric vector containing the local polynomial degree used. |
| h | a numeric vector containing the (scalar) bandwidth. |
| nsteps | a numeric vector containing the number of iteration steps. |
| na.rm | a logical value indicating whether to remove missing values from fitted vectors |
| ... | additional arguments to locpoly |
a list with elements containing the sharpened (i.e. perturbed) response values, ready for input into a local polynomial regression estimator. The ith list element corresponds to i steps of data sharpening.
W.J. Braun
locpoly
library(sharpData)   # provides sharpiteration() and the MPG data
library(KernSmooth)  # provides dpill() and locpoly()
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- dpill(speed, mpg)
mpgSharp <- sharpiteration(speed, mpg, 1, h, 2)
mpg.lS <- locpoly(speed, mpgSharp[[2]], bandwidth=h, degree=1)
mpg.lX <- locpoly(speed, mpg, bandwidth=h, degree=1)
plot(mpg ~ speed)
lines(mpg.lX)                # unsharpened function estimation
lines(mpg.lS, col=2, lty=2)  # sharpened function estimation (2 steps)
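One way to picture iterated sharpening is as a residual add-back scheme: at each step the current sharpened responses are smoothed, and the part of the original responses that the smooth fails to reproduce is added back. The base R sketch below illustrates this for local constant regression with a Gaussian kernel. It is an assumption made for illustration, in the spirit of twicing, and may not match what `sharpiteration` computes; `nw_fit()` and `sharpen_sketch()` are hypothetical helpers that are not part of sharpData.

```r
# Nadaraya-Watson (local constant) fit with a Gaussian kernel,
# evaluated at the design points. Hypothetical helper.
nw_fit <- function(x, y, h) {
  w <- dnorm(outer(x, x, "-") / h)  # w[i, j]: weight of point j when fitting at x[i]
  as.vector(w %*% y / rowSums(w))
}

# Residual add-back sharpening (an assumed scheme, not the package's code).
# Element i of the returned list holds the responses after i steps.
sharpen_sketch <- function(x, y, h, nsteps) {
  ystar <- y
  out <- vector("list", nsteps)
  for (i in seq_len(nsteps)) {
    ystar <- ystar + (y - nw_fit(x, ystar, h))
    out[[i]] <- ystar
  }
  out
}

# Usage on a noise-free curve, so that all remaining error is bias
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x)
h <- 0.1
ys <- sharpen_sketch(x, y, h, nsteps = 3)
mse0 <- mean((y - nw_fit(x, y, h))^2)        # unsharpened
mse1 <- mean((y - nw_fit(x, ys[[1]], h))^2)  # 1 step
mse3 <- mean((y - nw_fit(x, ys[[3]], h))^2)  # 3 steps
```

As with the value of `sharpiteration`, the ith list element corresponds to i steps, and the sharpened responses are passed to the same smoother with the same bandwidth. On noisy data, additional steps reduce bias at the cost of variance, which is why the number of steps and the bandwidth are best chosen together.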
Nursing times for a baby beluga whale.
data(whale)
A data frame with 228 observations on the following 3 variables.
a numeric vector
a numeric vector
a factor with levels 0 104 118 119 126 127 132 135 137 14 144 146 150 151 153 156 157 160 166 167 168 169 170 171 172 174 175 176 180 186 187 189 191 192 193 196 197 198 199 200 204 205 216 218 222 223 225 226 228 229 230 231 232 236 239 243 244 247 252 253 255 257 260 267 271 274 275 277 284 285 286 288 291 292 299 308 320 323 326 332 338 339 340 344 345 349 351 353 354 359 360 362 371 372 377 380 386 404 409 411 419 423 426 429 430 432 433 435 438 440 441 442 443 444 445 446 449 450 453 456 462 463 464 470 473 477 48 485 491 492 494 495 497 504 506 509 51 513 515 524 528 533 537 538 541 565 579 59 590 600 605 613 644 648 659 68 688 69 693 694 702 714 72 720 737 74 750 756 772 80 805 813 825 84 85 870 873 888 92 93 954 96 98 M
Simonoff, J. Smoothing Methods in Statistics, Springer, 1996.