Title: | Data Sharpening |
---|---|
Description: | Functions and data sets inspired by data sharpening - data perturbation to achieve improved performance in nonparametric estimation, as described in Choi, E., Hall, P. and Rousson, V. (2000). Capabilities for enhanced local linear regression function and derivative estimation are included, as well as an asymptotically correct iterated data sharpening estimator for any degree of local polynomial regression estimation. A cross-validation-based bandwidth selector is included which, in concert with the iterated sharpener, will often provide superior performance, according to a median integrated squared error criterion. Sample data sets are provided to illustrate function usage. |
Authors: | W. John Braun <[email protected]> |
Maintainer: | W.J. Braun <[email protected]> |
License: | Unlimited |
Version: | 1.4 |
Built: | 2025-02-19 03:13:06 UTC |
Source: | https://github.com/cran/sharpData |
The burnRate data frame contains laboratory data on the proportion of fuel remaining in a piece of wood that has burned for a fixed period of time while subjected to a fixed windspeed.
data(burnRate)
This data frame contains the following columns:
a numeric vector
ratio of windspeed, multiplied by density of air, to density of firebrand
factor listing tree species
numeric vector of diameter of burned particle in cm
windspeed in cm per second
length of test in seconds
Albini, F. USDA Forest Service General Technical Report INT-56, 1979.
Cross-validation bandwidth selector for iterated sharpened responses for bias reduction in function estimation.
CVsharp(x, y, deg, nsteps)
x |
a numeric vector containing the predictor variable values. |
y |
a numeric vector containing the response variable values. |
deg |
a numeric vector containing the local polynomial degree used. |
nsteps |
a numeric vector containing the number of iteration steps. |
If nsteps is specified to be 0, then the CV bandwidth for conventional local polynomial regression is provided.
a list containing 3 elements: the candidate bandwidths; the corresponding CV scores; the selected optimal bandwidth.
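To make the criterion concrete, here is a minimal base-R sketch of leave-one-out cross-validation for a local constant (Nadaraya-Watson) fit with a Gaussian kernel, which corresponds to the conventional case that nsteps = 0 refers to. The helper loo_cv_bandwidth, its candidate grid, the list element names h and CV, and the simulated data are illustrative assumptions and are not taken from the package; CVsharp may compute its scores differently.

```r
# Leave-one-out CV score for a local constant (Nadaraya-Watson) smoother.
# Hypothetical helper for illustration only; CVsharp() itself may differ.
loo_cv_bandwidth <- function(x, y, hgrid) {
  cv <- sapply(hgrid, function(h) {
    # Gaussian kernel weights between all pairs of design points
    K <- dnorm(outer(x, x, "-") / h)
    diag(K) <- 0                      # leave each observation out of its own fit
    fit <- as.vector(K %*% y) / rowSums(K)
    mean((y - fit)^2)                 # average squared prediction error
  })
  # candidate bandwidths, their CV scores, and the minimizing bandwidth
  list(h = hgrid, CV = cv, CVh = hgrid[which.min(cv)])
}

set.seed(1)
x <- seq(0, 1, length.out = 60)
y <- sin(2 * pi * x) + rnorm(60, sd = 0.2)   # simulated data (assumption)
out <- loo_cv_bandwidth(x, y, hgrid = seq(0.02, 0.3, length.out = 15))
out$CVh   # bandwidth with the smallest CV score
```

The returned list mirrors the three-part structure described above (candidates, scores, selected bandwidth), so the selected value can be passed straight to a smoother as the bandwidth.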
W.J. Braun
locpoly
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- CVsharp(speed, mpg, 0, 0)$CVh  # conventional local constant regression bandwidth
mpg.l0 <- locpoly(speed, mpg, bandwidth=h, degree=0)
h <- CVsharp(speed, mpg, 0, 1)$CVh  # 1-sharpened local constant regression bandwidth
mpgSharp <- sharpiteration(speed, mpg, 0, h, 1)
mpg.l1 <- locpoly(speed, mpgSharp[[1]], bandwidth=h, degree=0)
h <- CVsharp(speed, mpg, 0, 5)$CVh  # 5-sharpened local constant regression bandwidth
mpgSharp <- sharpiteration(speed, mpg, 0, h, 5)
mpg.l5 <- locpoly(speed, mpgSharp[[5]], bandwidth=h, degree=0)
plot(mpg ~ speed)
lines(mpg.l0)  # unsharpened function estimation
lines(mpg.l1, col=2, lty=2)  # sharpened function estimation (1 step)
lines(mpg.l5, col=4, lty=3)  # sharpened function estimation (5 steps)
Calculation of sharpened responses for bias reduction in function and first derivative estimation, assuming a Gaussian kernel is used in bivariate scatterplot smoothing.
LLsharpen(x, y, h)
x |
a numeric vector containing the predictor variable values. |
y |
a numeric vector containing the response variable values. |
h |
a numeric vector containing the (scalar) bandwidth. |
a vector containing the sharpened (i.e. perturbed) response values, ready for input into a local linear regression estimator.
W.J. Braun
Choi, E., Hall, P. and Rousson, V. (2000) Data sharpening methods for bias reduction in nonparametric regression. Annals of Statistics 28(5) 1339-1355.
locpoly
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- dpill(speed, mpg)*2
mpgSharp <- LLsharpen(speed, mpg, h)
mpg.lS <- locpoly(speed, mpgSharp, bandwidth=h, drv=1, degree=1)
mpg.lX <- locpoly(speed, mpg, bandwidth=h, drv=1, degree=1)
plot(mpg.lX, type="l")  # unsharpened derivative estimation
lines(mpg.lS, col=2, lty=2)  # sharpened derivative estimation
Local constant and local linear regression are applied to bivariate data. The response is ‘sharpened’ or perturbed in a way to render a monotonically increasing curve estimate.
Monolpoly(x, y, h, d=1, xgrid, numgrid = 401, ...)
x |
a vector of explanatory variable observations |
y |
binary vector of responses |
h |
bandwidth |
d |
degree, can be either 0 or 1 |
xgrid |
gridpoints on x-axis where monotonicity constraint is enforced |
numgrid |
number of equally-spaced gridpoints (if xgrid not specified) |
... |
other arguments for locpoly |
The data are perturbed by the smallest possible L2 distance subject to the constraint that the local linear estimate is monotonically increasing.
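The constraint described above can be written as a quadratic program. The formulation below is a sketch under one assumption: because a local polynomial estimate is linear in the responses, its first derivative at each grid point is taken to be a linear combination of the sharpened responses. The symbols y*, a_{ji}, n and m are introduced here for illustration and do not appear in the package documentation.

```latex
\min_{y^\ast \in \mathbb{R}^n} \; \sum_{i=1}^{n} \left( y_i^\ast - y_i \right)^2
\quad \text{subject to} \quad
\sum_{i=1}^{n} a_{ji}\, y_i^\ast \;\ge\; 0, \qquad j = 1, \dots, m .
```

Here y_1, ..., y_n are the observed responses, y*_1, ..., y*_n the sharpened responses, and a_{ji} the coefficient of the i-th response in the derivative of the estimate at the j-th of m grid points, so the constraints require a nonnegative slope at every grid point.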
x |
locations of function estimate evaluations |
y |
function estimate evaluations (sharpened - monotonized) |
ysharp |
sharpened responses |
W.J. Braun
Braun, W.J. and Hall, P., Data Sharpening for Nonparametric Estimation Subject to Constraints, Journal of Computational and Graphical Statistics, 2001
gridpts <- seq(1, 10, length=101)
x <- seq(1, 10, length=51)
p <- exp(-1 + .2*x)/(1 + exp(-1 + .2*x))
y <- rbinom(51, 1, p)
plot(x, y)
lines(Monolpoly(x, y, h=0.6, xgrid=gridpts))
##
plot(faithful)
with(faithful, lines(Monolpoly(eruptions, waiting, h=0.1, d=1, range=c(1.55,5.15))))
Computes the matrix of first-derivative coefficients for the monotonic local linear sharpening problem.
MonoMat(xgrid, x, h, d)
xgrid |
numeric vector of locations where monotonicity constraint is to be enforced |
x |
numeric explanatory vector |
h |
numeric bandwidth |
d |
local polynomial degree, can be either 0 or 1 |
a list containing the A matrix and the number of rows in A.
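As an illustration of what such a coefficient matrix can look like, the following base-R sketch builds it for the degree 0 (local constant, Nadaraya-Watson) case with a Gaussian kernel. Each row holds the derivative of the kernel weights with respect to the grid point, so multiplying by the responses gives the slope of the estimate there. The helper nw_deriv_matrix and its list element names are hypothetical, and MonoMat may construct its matrix differently.

```r
# Rows of A hold d/dg of the Nadaraya-Watson weights w_i(g) at each grid point g,
# so that A %*% y is the derivative of the local constant estimate on the grid.
# Hypothetical helper for illustration; MonoMat() may differ.
nw_deriv_matrix <- function(xgrid, x, h) {
  A <- t(sapply(xgrid, function(g) {
    w <- dnorm((x - g) / h)
    w <- w / sum(w)          # normalized Gaussian kernel weights at g
    u <- (x - g) / h^2
    w * (u - sum(w * u))     # derivative of w_i(g) with respect to g
  }))
  list(A = A, nrow = nrow(A))
}

x <- seq(1, 10, length.out = 51)
xgrid <- seq(2, 9, length.out = 8)
M <- nw_deriv_matrix(xgrid, x, h = 0.6)
y <- log(x)                  # increasing responses (assumption for the demo)
slopes <- as.vector(M$A %*% y)
all(slopes >= 0)             # check the monotonicity constraint on the grid
```

Each row sums to zero because the weights always sum to one, which gives a quick sanity check on any matrix of this kind.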
W.J. Braun
The MPG data frame has 15 rows and 10 columns.
data(MPG)
This data frame contains the following columns:
a numeric vector of cruising speeds in miles per hour
miles per gallon for a 1988 Corsica
miles per gallon for a 1993 Legacy
miles per gallon for a 1994 Oldsmobile
miles per gallon for a 1994 Oldsmobile Cutlass
miles per gallon for a 1994 Chevrolet Pickup
miles per gallon for a 1994 Jeep Cherokee
miles per gallon for a 1994 Villager
miles per gallon for a 1995 Prizm
miles per gallon for a 1997 Toyota Celica
B.H. West, R.N. McGill, J.W. Hodgson, S.S. Sluder, D.E. Smith, Development and Verification of Light-Duty Modal Emissions and Fuel Consumption Values for Traffic Models, Washington, DC, April 1997, and additional project data, April 1998.
data(MPG)
plot(celica97 ~ speed, data = MPG)
Calculation of sharpened responses for bias reduction in function estimation, assuming a Gaussian kernel is used in bivariate scatterplot smoothing.
sharpiteration(x, y, deg, h, nsteps, na.rm, ...)
x |
a numeric vector containing the predictor variable values. |
y |
a numeric vector containing the response variable values. |
deg |
a numeric vector containing the local polynomial degree used. |
h |
a numeric vector containing the (scalar) bandwidth. |
nsteps |
a numeric vector containing the number of iteration steps. |
na.rm |
a logical value indicating whether to remove missing values from fitted vectors |
... |
additional arguments to locpoly |
a list with elements containing the sharpened (i.e. perturbed) response values, ready for input into a local polynomial regression estimator. The ith list element corresponds to i steps of data sharpening.
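The list structure described above can be illustrated with a minimal base-R sketch of one common form of iterated response sharpening, in which each step adds the current residuals back to the original responses. This update rule, the local constant smoother, and the helpers nw_smoother and sharpen_iter are assumptions made for illustration; the source does not state the exact recursion that sharpiteration uses.

```r
# Smoother matrix S for a local constant (Nadaraya-Watson) fit with a Gaussian
# kernel, so that the fitted values at the design points are S %*% y.
nw_smoother <- function(x, h) {
  K <- dnorm(outer(x, x, "-") / h)
  K / rowSums(K)
}

# Assumed update: y_k = y + (y_{k-1} - S y_{k-1}), starting from y_0 = y.
# Returns a list whose k-th element holds the responses after k steps.
# Hypothetical helper; sharpiteration() may differ.
sharpen_iter <- function(x, y, h, nsteps) {
  S <- nw_smoother(x, h)
  out <- vector("list", nsteps)
  yk <- y
  for (k in seq_len(nsteps)) {
    yk <- y + (yk - as.vector(S %*% yk))
    out[[k]] <- yk
  }
  out
}

set.seed(2)
x <- seq(0, 1, length.out = 50)
y <- x^2 + rnorm(50, sd = 0.05)    # simulated data (assumption)
ys <- sharpen_iter(x, y, h = 0.1, nsteps = 3)
S <- nw_smoother(x, 0.1)
fit0 <- as.vector(S %*% y)         # unsharpened fit
fit3 <- as.vector(S %*% ys[[3]])   # fit to the 3-step sharpened responses
```

Under this assumed update, the fit to the k-step responses equals (I - (I - S)^(k+1)) y, so each additional step applies the residual operator I - S once more to the part of the signal that the smoother misses.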
W.J. Braun
locpoly
speed <- MPG[, 1]
mpg <- MPG[, 2]
h <- dpill(speed, mpg)
mpgSharp <- sharpiteration(speed, mpg, 1, h, 2)
mpg.lS <- locpoly(speed, mpgSharp[[2]], bandwidth=h, degree=1)
mpg.lX <- locpoly(speed, mpg, bandwidth=h, degree=1)
plot(mpg ~ speed)
lines(mpg.lX)  # unsharpened function estimation
lines(mpg.lS, col=2, lty=2)  # sharpened function estimation (2 steps)
Nursing times for a baby beluga whale.
data(whale)
A data frame with 228 observations on the following 3 variables.
a numeric vector
a numeric vector
a factor with levels 0, 104, 118, 119, 126, 127, 132, 135, 137, 14, 144, 146, 150, 151, 153, 156, 157, 160, 166, 167, 168, 169, 170, 171, 172, 174, 175, 176, 180, 186, 187, 189, 191, 192, 193, 196, 197, 198, 199, 200, 204, 205, 216, 218, 222, 223, 225, 226, 228, 229, 230, 231, 232, 236, 239, 243, 244, 247, 252, 253, 255, 257, 260, 267, 271, 274, 275, 277, 284, 285, 286, 288, 291, 292, 299, 308, 320, 323, 326, 332, 338, 339, 340, 344, 345, 349, 351, 353, 354, 359, 360, 362, 371, 372, 377, 380, 386, 404, 409, 411, 419, 423, 426, 429, 430, 432, 433, 435, 438, 440, 441, 442, 443, 444, 445, 446, 449, 450, 453, 456, 462, 463, 464, 470, 473, 477, 48, 485, 491, 492, 494, 495, 497, 504, 506, 509, 51, 513, 515, 524, 528, 533, 537, 538, 541, 565, 579, 59, 590, 600, 605, 613, 644, 648, 659, 68, 688, 69, 693, 694, 702, 714, 72, 720, 737, 74, 750, 756, 772, 80, 805, 813, 825, 84, 85, 870, 873, 888, 92, 93, 954, 96, 98, M
Simonoff, J. Smoothing Methods in Statistics, Springer, 1996.