| Title: | Data Sets from Montgomery, Peck and Vining |
|---|---|
| Description: | Most of this package consists of data sets from the textbook Introduction to Linear Regression Analysis (3rd ed), by Montgomery, Peck and Vining. Some additional data sets and functions are also included. |
| Authors: | W.J. Braun [aut, cre], S. MacQueen [aut] |
| Maintainer: | W.J. Braun <[email protected]> |
| License: | Unlimited |
| Version: | 2.0 |
| Built: | 2026-05-26 07:13:02 UTC |
| Source: | https://github.com/cran/MPV |
Numbers of aberrant crypt foci (ACF) in the colons of 66 rats subjected to varying numbers of doses of the carcinogen azoxymethane (AOM), sacrificed at 3 different times.
ACF
This data frame contains the following columns:
The number of carcinogen injections
Time of sacrifice, in weeks following injection of AOM
The number of ACF observed in each rat colon
Ranjana P. Bird, Faculty of Human Ecology, University of Manitoba, Winnipeg, Canada.
E.A. McLellan, A. Medline and R.P. Bird. Dose response and proliferative characteristics of aberrant crypt foci: putative preneoplastic lesions in rat colon. Carcinogenesis, 12(11): 2093-2098, 1991.
sapply(split(ACF$COUNT, ACF$T), var)
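The example above compares count variances across sacrifice times. For Poisson counts the variance should be close to the mean, so a variance-to-mean ratio per group is a quick check for overdispersion. The following is a minimal sketch on simulated counts; the data frame `sim` and its rates are hypothetical stand-ins, not the ACF data.

```r
set.seed(1)
# Hypothetical Poisson counts grouped by sacrifice time (not the ACF data)
sim <- data.frame(T = factor(rep(c(6, 12, 18), each = 200)),
                  COUNT = rpois(600, lambda = rep(c(2, 4, 8), each = 200)))

grp_mean <- sapply(split(sim$COUNT, sim$T), mean)
grp_var  <- sapply(split(sim$COUNT, sim$T), var)

# Ratios near 1 are consistent with Poisson variation; much larger
# ratios would point to overdispersion
ratio <- grp_var / grp_mean
round(cbind(mean = grp_mean, var = grp_var, ratio = ratio), 2)
```

Applying the same two `sapply()` calls to `ACF$COUNT` and `ACF$T` would give the corresponding ratios for the real data.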
The airconditioner data frame has 20 observations on 3
variables related to measurements on electricity usage during
a summer month for four different kinds of air conditioning
systems. The measurements were taken in houses that were
randomly selected from five different home types which depended
on factors such as floor space, etc.
data(airconditioner)
This data frame contains the following columns:
a factor representing type of home
a factor representing the air conditioning system
a numeric vector representing electricity usage in KWh
Devore, J.L., and Farnum, N. (2005) Applied Statistics for Engineers and Scientists. 2nd Edition, Thomson.
Flight distances (in meters) for 12 paper airplanes of varying weights.
data("airplane")
A data frame with 12 observations on 2 variables.
weight: factor with 3 levels
distance: numeric flight distances
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there is no difference in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
data("airplane.sim01")
A data frame with 12 observations on 2 variables.
weight: factor with 3 levels
distance: numeric flight distances
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there is no difference in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
data("airplane.sim01")
A data frame with 12 observations on 2 variables.
weight: factor with 3 levels
distance: numeric flight distances
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there are differences in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
data("airplane.sim01")
A data frame with 12 observations on 2 variables.
weight: factor with 3 levels
distance: numeric flight distances
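The simulated airplane data sets above are described as draws with a noise variance of 0.96, either with or without differences in mean flight distance between weights. The sketch below shows how data of that shape could be generated and tested. The balanced design (4 planes per weight), the level labels and the group means are all assumptions made for illustration; they are not the values used to build the package data.

```r
set.seed(42)
# Assumed balanced design: 3 paper weights, 4 planes each (12 observations)
weight <- factor(rep(c("light", "medium", "heavy"), each = 4))
sigma <- sqrt(0.96)   # noise variance 0.96, as quoted in the descriptions

# Illustrative group means (assumptions, not the package's values)
mu_null <- c(light = 5, medium = 5, heavy = 5)   # no weight effect
mu_alt  <- c(light = 6, medium = 5, heavy = 4)   # weight effect present

simulate_flights <- function(mu) {
  data.frame(weight = weight,
             distance = unname(mu[as.character(weight)]) + rnorm(12, sd = sigma))
}
sim_null <- simulate_flights(mu_null)
sim_alt  <- simulate_flights(mu_alt)

# One-way ANOVA F-tests for a weight effect in each simulated data set
anova(lm(distance ~ weight, data = sim_null))
anova(lm(distance ~ weight, data = sim_alt))
```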
Flight distances (in meters) for 20 paper airplanes of varying weights.
data("airplane2")
A data frame with 20 observations on 2 variables.
weight: factor with 4 levels
distance: numeric flight distances
Flight distances (in meters) for 20 paper airplanes of varying weights.
data("airplane3")
A data frame with 20 observations on 2 variables.
weight: factor with 4 levels
distance: numeric flight distances
Graphs of confidence interval estimates for the bias and standard deviation of bias-corrected local polynomial regression curve estimates.
BCCIPlot(data, k1=1, k2=2, h, h2, output, g, layout, incl.biasplot, plotdata)
data |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
output |
if TRUE, numeric output is printed to the console window. |
g |
the target function, if known (for use in simulations). |
layout |
if TRUE, a 2x1 layout of plots is sent to the graphics device. |
incl.biasplot |
if TRUE, the confidence intervals for the bias of the uncorrected estimate are plotted. |
plotdata |
if TRUE, the data points are plotted as a scatter plot. |
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates. Graphs of the curve estimate confidence limits and the bias confidence limits.
W. John Braun and Wenkai Ma
Confidence interval estimates for bias in local polynomial regression.
BCLPBias(xy, k1, k2, h, h2, numgrid=401, alpha=.95)
xy |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
numgrid |
number of gridpoints used in the curve estimator. |
alpha |
nominal confidence level. |
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates and corresponding bias-corrected estimates.
W. John Braun and Wenkai Ma
Graphs of confidence interval estimates for the bias and standard deviation of local polynomial regression curve estimates.
BiasVarPlot(data, k1=1, k2=2, h, h2, output=FALSE, g, layout=TRUE)
data |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
output |
if true, numeric output is printed to the console window. |
g |
the target function, if known (for use in simulations). |
layout |
if true, a 2x1 layout of plots is sent to the graphics device. |
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates. Graphs of the curve estimate confidence limits and the bias confidence limits.
W. John Braun and Wenkai Ma
The BioOxyDemand data frame has 14 rows and 2 columns.
data(BioOxyDemand)
This data frame contains the following columns:
a numeric vector
a numeric vector
Devore, J. L. (2000) Probability and Statistics for Engineering and the Sciences (5th ed), Duxbury
    plot(BioOxyDemand)
    summary(lm(y ~ x, data = BioOxyDemand))
Systolic and diastolic blood pressure readings were taken on a 56-year-old male over a 39-day period, sometimes in the morning (AM) and sometimes in the evening (PM). A varying number of replicate measurements was taken at each time point.
bp
A data frame with 121 observations on the following 4 variables.
TimeofDay: factor with levels AM and PM
Date: numeric
Systolic: numeric
Diastolic: numeric
    library(lattice)
    xyplot(Date ~ Diastolic | TimeofDay, groups = cut(Systolic, c(0, 130, 140, 200)),
           data = bp, col = c(3, 1, 2), pch = 16)
    matplot(bp[, c(3, 4)], type = "l", lwd = 2, ylab = "Pressure")
    n <- nrow(bp)
    abline(v = (1:n)[bp[, 1] == "PM"] - .5, col = "grey")
    abline(v = (1:n)[bp[, 1] == "PM"], col = "grey")
    abline(v = (1:n)[bp[, 1] == "PM"] + .5, col = "grey")
    bp.stk <- stack(bp, c("Systolic", "Diastolic"))
    bp.tmp <- rbind(bp[, 1:2], bp[, 1:2])
    bp.stk <- cbind(bp.tmp, bp.stk)
    names(bp.stk) <- c("TimeofDay", "Date", "Pressure", "Type")
    reps <- NULL
    for (j in rle(paste(bp.stk$Date, bp.stk$TimeofDay))$lengths)
      reps <- c(reps, (1:j))
    bp.stk$Rep <- reps
    xyplot(Pressure ~ I(Date + Rep/24) | TimeofDay, groups = Type, data = bp.stk,
           xlab = "Date", pch = 16)
The cement data frame has 13 rows and 5 columns.
data(cement)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
    data(cement)
    pairs(cement)
On a university campus there are a number of areas designated for smoking. Outside of those areas, smoking is not permitted. One of the smoking areas is towards the north end of the campus near some parking lots and a large walkway towards one of the residences. Along the walkway, cigarette butts are visible in the nearby grass. Numbers of cigarette butts were counted at various distances from the smoking area in 200x80 square-cm quadrats located just west of the walkway.
data("cigbutts")
A data frame with 15 observations on the following 2 variables.
distance: distance from gazebo
count: observed number of butts
Strength measurements of 5 bolts of cloth, each treated with varying amounts of a chemical.
ClothStrength
This data frame contains the following columns:
a factor with 5 levels
a factor with 4 levels
a numeric vector
The earthquake data frame contains measurements of latitude, longitude, focal depth and magnitude for all earthquakes having magnitude greater than 5.8 between 1964 and 1985.
earthquake
This data frame contains 2178 observations on the following columns:
numeric vector of focal depths.
latitudinal coordinate.
longitudinal coordinate.
numeric vector of magnitudes.
Jeffrey S. Simonoff (1996), Smoothing Methods in Statistics, Springer-Verlag, New York.
summary(earthquake)
Rate-of-spread measurements (inches/s) in each of four directions (East, West, North and South) for 31 experimental runs at given slopes, each measured over the given time period (in seconds).
fires
A data frame with 31 observations on the following 7 variables.
Run: numeric
Slope: numeric, vertical rise divided by horizontal run, inclined from East to West
ROS_E: numeric, rate of spread measured in easterly direction
ROS_W: numeric, rate of spread measured in westerly direction
ROS_S: numeric, rate of spread measured in southerly direction
ROS_N: numeric, rate of spread measured in northerly direction
Time: numeric
Braun, W.J. and Woolford, D.G. (2013) Assessing a stochastic fire spread simulator. Journal of Environmental Informatics. 22:1-12.
Graphical analysis of one-way ANOVA data. It allows visualization of the usual F-test.
GANOVA(dataset, var.equal=TRUE, type="QQ", center=TRUE, shift=0)
dataset |
A data frame, whose first column must be the factor variable and whose second column must be the response variable. |
var.equal |
Logical: if TRUE, within-sample variances are assumed to be equal |
type |
"QQ" or "hist" |
center |
if TRUE, center and scale the means to match the scale of the errors |
shift |
on the histogram, lift the points representing the means above the horizontal axis by this amount. |
A QQ-plot or a histogram and rugplot
W. John Braun and Sarah MacQueen
Braun, W.J. 2013. Naive Analysis of Variance. Journal of Statistics Education.
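For reference, the "usual F-test" that the graphical display is meant to visualize can be computed directly from between-group and within-group sums of squares. The sketch below does this on simulated data laid out as the function expects (factor in the first column, response in the second). The data frame `dat` is hypothetical, and the code illustrates the standard one-way ANOVA arithmetic only; it is not the internals of GANOVA.

```r
set.seed(7)
# Hypothetical one-way layout: factor in column 1, response in column 2
dat <- data.frame(group = factor(rep(c("A", "B", "C"), each = 5)),
                  y = rnorm(15, mean = rep(c(10, 11, 13), each = 5)))

k <- nlevels(dat$group)   # number of groups
n <- nrow(dat)            # total sample size
grp_means <- tapply(dat$y, dat$group, mean)
grp_n     <- tapply(dat$y, dat$group, length)

# Between-group and within-group sums of squares
ss_between <- sum(grp_n * (grp_means - mean(dat$y))^2)
ss_within  <- sum((dat$y - grp_means[as.character(dat$group)])^2)

# F = mean square between / mean square within
F_by_hand <- (ss_between / (k - 1)) / (ss_within / (n - k))
F_anova   <- anova(lm(y ~ group, data = dat))[1, "F value"]
all.equal(F_by_hand, F_anova)
```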
This data frame contains the average monthly volume of natural gas used in the furnace of a 1600 square foot house located in London, Ontario, for each month from 2006 until 2011. It also contains the average temperature for each month, and a measure of degree days. Insulation was added to the roof on one occasion, the walls were insulated on a second occasion, and the mid-efficiency furnace was replaced with a high-efficiency furnace on a third occasion.
data("gasdata")
A data frame with 70 observations on the following 9 variables.
month: numeric, 1=January, 12=December
degreedays: numeric, Celsius
cubicmetres: total volume of gas used in a month
dailyusage: average amount of gas used per day
temp: average temperature in Celsius
year: numeric
I1: indicator that roof insulation is present
I2: indicator that wall insulation is present
I3: indicator that high efficiency furnace is present
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
GFplot(X, y, plotIt=TRUE, sortTrt=FALSE, type="hist", includeIntercept=TRUE, labels=FALSE)
X |
The design matrix. |
y |
A numeric vector containing the response. |
plotIt |
Logical: if TRUE, a graph is drawn. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
type |
"QQ" or "hist" |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
labels |
logical: if TRUE, names of predictor variables are used as labels; otherwise, the design matrix column numbers are used as labels |
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
W. John Braun
Braun, W.J. 2013. Regression Analysis and the QR Decomposition. Preprint.
    # Example 1
    X <- p4.18[,-4]
    y <- p4.18[,4]
    GFplot(X, y, type="hist", includeIntercept=FALSE)
    title("Evidence of Regression in the Jojoba Oil Data")
    # Example 2
    set.seed(4571)
    Z <- matrix(rnorm(400), ncol=10)
    A <- matrix(rnorm(81), ncol=9)
    simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
    names(simdata) <- c("y", paste("x", 1:9, sep=""))
    GFplot(simdata[,-1], simdata[,1], type="hist", includeIntercept=FALSE)
    title("Evidence of Regression in Simulated Data Set")
    # Example 3
    GFplot(table.b1[,-1], table.b1[,1], type="hist", includeIntercept=FALSE)
    title("Evidence of Regression in NFL Data Set")
    # An example where stepwise AIC selects the complement
    # of the set of variables that are actually in the true model:
    X <- pathoeg[,-10]
    y <- pathoeg[,10]
    par(mfrow=c(2,2))
    GFplot(X, y)
    GFplot(X, y, sortTrt=TRUE)
    GFplot(X, y, type="QQ")
    GFplot(X, y, sortTrt=TRUE, type="QQ")
    X <- table.b1[,-1] # NFL data
    y <- table.b1[,1]
    GFplot(X, y)
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
GRegplot(X, y, sortTrt=FALSE, includeIntercept=TRUE, type="hist")
X |
The design matrix. |
y |
A numeric vector containing the response. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
type |
Character: hist, for histogram; dot, for stripchart |
A histogram or dotplot and rugplot
W. John Braun
Braun, W.J. 2014. Visualization of Evidence in Regression Analysis with the QR Decomposition. Preprint.
    # Example 1
    X <- p4.18[,-4]
    y <- p4.18[,4]
    GRegplot(X, y, includeIntercept=FALSE)
    title("Evidence of Regression in the Jojoba Oil Data")
    # Example 2
    set.seed(4571)
    Z <- matrix(rnorm(400), ncol=10)
    A <- matrix(rnorm(81), ncol=9)
    simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
    names(simdata) <- c("y", paste("x", 1:9, sep=""))
    GRegplot(simdata[,-1], simdata[,1], includeIntercept=FALSE)
    title("Evidence of Regression in Simulated Data Set")
    # Example 3
    GRegplot(table.b1[,-1], table.b1[,1], includeIntercept=FALSE)
    title("Evidence of Regression in NFL Data Set")
    # An example where stepwise AIC selects the complement
    # of the set of variables that are actually in the true model:
    X <- pathoeg[,-10]
    y <- pathoeg[,10]
    par(mfrow=c(2,1))
    GRegplot(X, y)
    GRegplot(X, y, sortTrt=TRUE)
    X <- table.b1[,-1] # NFL data
    y <- table.b1[,1]
    GRegplot(X, y)
Juliet has 28 rows and 9 columns. The data record the input and output of the spirit still "Juliet" at Endless Summer Distillery. Splitting the data by the Batch factor is suggested for ease of use.
Juliet
The data frame contains the following 9 columns.
Batch: a factor determining how many times the volume has been through the still
Vol1: volume in litres, initial
P1: percent alcohol present, initial
LAA1: litres absolute alcohol, initial, Vol1*P1
Vol2: volume in litres, final
P2: percent alcohol present, final
LAA2: litres absolute alcohol, final, Vol2*P2
Yield: percent yield obtained, LAA2/LAA1
Date: character, date of run
The purpose of this information is to determine the optimal initial volume and percentage. The information is broken down by Batch. A batch factor 1 means that it is the first time the liquid has gone through the spirit still. The first run through the still should have the most loss due to the "heads" and "tails".
Literature states that the first run through a spirit still should yield 70 percent.
A batch factor 2 means that it is the second time the liquid has gone through the spirit still.
A batch factor 3 means that it is the third time or more that the liquid has gone through the spirit still.
Each subsequent distillation should result in a higher yield, never to exceed 95 percent.
Charisse Woods, Endless Summer Distillery, (2015).
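The derived columns follow simple arithmetic: litres of absolute alcohol are volume times alcohol content, and yield is the ratio of final to initial absolute alcohol. The worked example below uses a single hypothetical run, not a row of the Juliet data. It assumes the alcohol content is stored as a fraction (0.30 for 30 percent) and that yield is reported on a percent scale; the scaling used in the actual data frame should be checked before reusing these formulas.

```r
# Hypothetical run (illustrative numbers, not from the Juliet data)
run <- data.frame(Vol1 = 400, P1 = 0.30,   # initial volume (L), alcohol fraction
                  Vol2 = 130, P2 = 0.70)   # final volume (L), alcohol fraction

run$LAA1  <- run$Vol1 * run$P1             # litres absolute alcohol, initial
run$LAA2  <- run$Vol2 * run$P2             # litres absolute alcohol, final
run$Yield <- 100 * run$LAA2 / run$LAA1     # percent yield

run
```

With these numbers the run goes from 120 to 91 litres of absolute alcohol, a yield of roughly 76 percent, which falls between the 70 percent and 95 percent figures mentioned in the description.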
    summary(Juliet)
    # Split apart the Batch factor for easier use.
    juliet <- split(Juliet, Juliet$Batch)
    juliet1 <- juliet$'1'
    juliet2 <- juliet$'2'
    juliet3 <- juliet$'3'
    plot(LAA1 ~ LAA2, data = Juliet)
    plot(LAA1 ~ LAA2, data = juliet1)
The lengthguesses list consists of 2 numeric vectors: one giving 69 guesses of the length of an auditorium made in feet and converted to meters, and the other giving 44 guesses made directly in meters. The actual length of the auditorium was 13.1 m.
data(lengthguesses)
This list contains the following elements:
a numeric vector of 69 student guesses as to the length of an auditorium using the imperial system, converted to meters.
a numeric vector of 44 student guesses as to the length of an auditorium using the metric system.
Hills, M. and the M345 Course Team (1986) M345 Statistical Methods, Unit 1: Data, distributions and uncertainty, Milton Keynes: The Open University. Tables 2.1 and 2.4.
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994) A Handbook of Small Data Sets. Boca Raton: Chapman & Hall/CRC.
with(lengthguesses, t.test(imperial, metric))
Numbers of aberrant crypt foci (ACF) in each of six cross-sectional regions of the colons of 66 rats subjected to varying doses of the carcinogen azoxymethane (AOM), sacrificed at 3 different times.
lesions
This data frame contains the following columns:
Incubation time factor, levels: 6, 12 and 18 weeks
Number of injections
Section of colon, a factor with levels 1 through 6, where 1 denotes the proximal end of the colon and 6 denotes the distal end
Label for animal within a particular T-INJ factor level combination
Total number of ACF lesions in a section of a rat's colon
Sum of ACF multiplicities for a section of a rat's colon
Identifier for each of the 66 rats.
Ranjana P. Bird, University of Northern British Columbia, Prince George, Canada.
E.A. McLellan, A. Medline and R.P. Bird. Dose response and proliferative characteristics of aberrant crypt foci: putative preneoplastic lesions in rat colon. Carcinogenesis, 12(11): 2093-2098, 1991.
    summary(lesions)
    ACF.All <- aggregate(ACF.Total ~ id + INJ + T, FUN=sum, data = lesions)
    lesions.glm <- glm(ACF.Total ~ INJ * T, data = ACF.All, family=poisson)
    summary(lesions.glm)
    lesions.qp <- glm(ACF.Total ~ INJ * T, data = ACF.All, family=quasipoisson)
    summary(lesions.qp)
    lesions.noInt <- glm(ACF.Total ~ INJ + T, data = ACF.All, family=quasipoisson)
    summary(lesions.noInt)
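The example above fits the same mean model with the `poisson` and `quasipoisson` families. The two fits share their coefficient estimates; the quasi-Poisson fit additionally estimates a dispersion parameter and rescales the standard errors accordingly. The sketch below shows that relationship on simulated overdispersed counts. The data frame `sim` and its parameters are hypothetical and unrelated to the lesions data.

```r
set.seed(123)
# Hypothetical overdispersed counts (negative binomial), not the lesions data
sim <- data.frame(INJ = rep(1:3, each = 40))
sim$count <- rnbinom(nrow(sim), mu = exp(0.5 + 0.4 * sim$INJ), size = 2)

fit_p  <- glm(count ~ INJ, data = sim, family = poisson)
fit_qp <- glm(count ~ INJ, data = sim, family = quasipoisson)

# Pearson dispersion estimate: values well above 1 suggest overdispersion
phi <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)

c(by_hand = phi, quasipoisson = summary(fit_qp)$dispersion)

# Quasi-Poisson standard errors are the Poisson ones scaled by sqrt(phi)
se_p  <- coef(summary(fit_p))[, "Std. Error"]
se_qp <- coef(summary(fit_qp))[, "Std. Error"]
cbind(se_p, se_qp, scaled = se_p * sqrt(phi))
```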
Confidence interval estimates for bias in local polynomial regression.
LPBias(xy, k1, k2, h, h2, numgrid=401, alpha=.95)
xy |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
numgrid |
number of gridpoints used in the curve estimator. |
alpha |
nominal confidence level. |
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates.
W. John Braun and Wenkai Ma
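To make the roles of the degree `k1` and bandwidth `h` concrete, the following sketch implements a generic local polynomial curve estimator as a kernel-weighted least squares fit at each grid point. It is a textbook construction under stated assumptions (Gaussian kernel, hypothetical helper `locpoly_at`, simulated data with a known target function `g`); it is not the code used by LPBias and does not compute the bias confidence intervals.

```r
# Local polynomial estimate at a single point x0.
# Gaussian kernel weights are an assumed choice; k is the polynomial degree.
locpoly_at <- function(x0, x, y, h, k = 1) {
  w <- dnorm((x - x0) / h)
  fit <- lm(y ~ poly(x - x0, degree = k, raw = TRUE), weights = w)
  unname(coef(fit)[1])   # intercept = fitted curve value at x0
}

set.seed(10)
g <- function(x) sin(2 * pi * x)        # known target function (simulation only)
x <- sort(runif(200))
y <- g(x) + rnorm(200, sd = 0.2)

grid <- seq(0.1, 0.9, length.out = 41)
est <- sapply(grid, locpoly_at, x = x, y = y, h = 0.05, k = 1)

# Largest absolute error of the estimate over the grid
max(abs(est - g(grid)))
```

A larger `h` smooths more and increases bias where the curve bends; a smaller `h` lowers bias at the cost of higher variance, which is the trade-off the bias intervals are meant to quantify.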
Noise measurements for 5 samples of motors, each sample based on a different brand of bearing.
data("motor")
A data frame with 5 columns.
Brand 1: a numeric vector of length 6
Brand 2: a numeric vector of length 6
Brand 3: a numeric vector of length 6
Brand 4: a numeric vector of length 6
Brand 5: a numeric vector of length 6
Devore, J. and N. Farnum (2005) Applied Statistics for Engineers and Scientists. Thomson.
The noisyimage is a list. The third component is a
noisy version of the third component of tarimage.
data(noisyimage)
This list contains the following elements:
a numeric vector having 101 elements.
a numeric vector having 101 elements.
a numeric matrix having 101 rows and columns
with(noisyimage, image(x, y, xy))
The oldwash data frame has 49 rows and 8 columns.
The data are from the start up of a wash still considering the amount of time it takes to heat up to a specified temperature and possible influencing factors.
data("oldwash")
A data frame with 49 observations on the following 8 variables.
Date: character, the date of the run
startT: degrees Celsius, numeric, initial temperature
endT: degrees Celsius, numeric, final temperature
time: in minutes, numeric, amount of time to reach final temperature
Vol: in litres, numeric, amount of liquid in the tank (max 2000L)
alc: numeric, the percentage of alcohol present in the liquid
who: character, relates to the person who ran the still
batch: factor with levels 1 = first time through, 2 = second time through
The purpose of the wash still is to increase the percentage of alcohol and strip out unwanted particulate. It can take a long time to heat up and this can lead to problems in meeting production time limits.
Charisse Woods, Endless Summer Distillery (2014)
    oldwash.lm <- lm(log(time) ~ startT + endT + Vol + alc + who + batch, data = oldwash)
    summary(oldwash.lm)
    par(mfrow = c(2, 2))
    plot(oldwash.lm)
    data2 <- subset(oldwash, batch == 2)
    hist(data2$time)
    data1 <- subset(oldwash, batch == 1)
    hist(data1$time)
    # batch is constant within each subset, so it is left out of these models
    oldwash.lmc <- lm(time ~ startT + endT + Vol + alc + who, data = data1)
    summary(oldwash.lmc)
    plot(oldwash.lmc)
    oldwash.lmd <- lm(time ~ startT + endT + Vol + alc + who, data = data2)
    summary(oldwash.lmd)
    plot(oldwash.lmd)
The p11.12 data frame has 19 observations on satellite cost.
data(p11.12)
This data frame contains the following columns:
first-unit satellite cost
weight of the electronics suite
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Simpson and Montgomery (1998)
    data(p11.12)
    attach(p11.12)
    plot(cost~x)
    detach(p11.12)
The p11.15 data frame has 9 rows and 2 columns.
data(p11.15)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Ryan (1997), Stefanski (1991)
    data(p11.15)
    plot(p11.15)
    attach(p11.15)
    lines(lowess(x,y))
    detach(p11.15)
The p12.11 data frame has 44 observations on the fraction
of active chlorine in a chemical product as a function of time
after manufacturing.
data(p12.11)
This data frame contains the following columns:
time
available chlorine
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
    data(p12.11)
    plot(p12.11)
    lines(lowess(p12.11))
The p12.12 data frame has 18 observations on a
chemical experiment. A nonlinear model relating concentration to
reaction time and temperature with an additive error is proposed to
fit these data.
data(p12.12)
This data frame contains the following columns:
reaction time (in minutes)
temperature (in degrees Celsius)
concentration (in grams/liter)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
    data(p12.12)
    attach(p12.12)
    # fitting the linearized model
    logy.lm <- lm(I(log(y))~I(log(x1))+I(log(x2)))
    summary(logy.lm)
    plot(logy.lm, which=1) # checking the residuals
    # fitting the nonlinear model
    y.nls <- nls(y ~ theta1*I(x1^theta2)*I(x2^theta3),
                 start=list(theta1=.95, theta2=.76, theta3=.21))
    summary(y.nls)
    plot(resid(y.nls)~fitted(y.nls)) # checking the residuals
    detach(p12.12)
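The example above fits a power-law model twice: once after taking logs, and once by nonlinear least squares with an additive error. A common workflow is to use the linearized fit to supply starting values for `nls()`. The sketch below illustrates that two-step approach on simulated data; the data frame `sim`, the parameter values in `theta` and the noise level are assumptions chosen for illustration, not properties of p12.12.

```r
set.seed(2024)
# Hypothetical data from y = theta1 * x1^theta2 * x2^theta3 + additive error
n <- 60
theta <- c(theta1 = 1, theta2 = 0.75, theta3 = 0.2)
sim <- data.frame(x1 = runif(n, 1, 10), x2 = runif(n, 20, 60))
sim$y <- theta[["theta1"]] * sim$x1^theta[["theta2"]] * sim$x2^theta[["theta3"]] +
  rnorm(n, sd = 0.1)

# Step 1: the log-log linear fit provides starting values
lin <- lm(log(y) ~ log(x1) + log(x2), data = sim)
start <- list(theta1 = exp(coef(lin)[[1]]),
              theta2 = coef(lin)[[2]],
              theta3 = coef(lin)[[3]])

# Step 2: nonlinear least squares on the original scale
fit <- nls(y ~ theta1 * x1^theta2 * x2^theta3, data = sim, start = start)
round(coef(fit), 3)
```

Passing `data = sim` to both fits avoids the need for `attach()` and keeps the variables from leaking into the search path.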
The p12.8 data frame has 14 rows and 2 columns.
data(p12.8)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p12.8)
The p13.1 data frame has 25 observations on the
test-firing results for surface-to-air missiles.
data(p13.1)
This data frame contains the following columns:
target speed (in Knots)
hit (=1) or miss (=0)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.1)
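A binary hit or miss outcome recorded against target speed is the kind of data usually modelled with logistic regression. The following minimal sketch uses simulated values; the variable names `speed` and `hit` and the coefficients are hypothetical, since the column names of p13.1 are not shown here.

```r
set.seed(99)
# Hypothetical hit/miss data (not the p13.1 data)
speed <- runif(200, 200, 500)            # target speed in knots
p_hit <- plogis(6 - 0.02 * speed)        # assumed true hit probability
hit   <- rbinom(200, 1, p_hit)           # 1 = hit, 0 = miss

fit <- glm(hit ~ speed, family = binomial)
summary(fit)$coefficients

# Estimated hit probabilities at two speeds
pred <- predict(fit, newdata = data.frame(speed = c(250, 450)), type = "response")
pred
```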
The p13.16 data frame has 16 rows and 5 columns.
data(p13.16)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.16)
The p13.2 data frame has 20 observations on home ownership.
data(p13.2)
This data frame contains the following columns:
family income
home ownership (1 = yes, 0 = no)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.2)
The p13.20 data frame has 30 rows and 2 columns.
data(p13.20)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.20)
The p13.3 data frame has 10 observations on the
compressive strength of an alloy fastener used in
aircraft construction.
data(p13.3)
This data frame contains the following columns:
load (in psi)
sample size
number failing
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.3)
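Data recorded as a load, a sample size and a number failing are grouped binomial data, which `glm()` accepts as a two-column response of failures and non-failures. The sketch below shows that form on simulated values; the column names `load`, `n` and `r` and the coefficients are hypothetical and are not taken from p13.3.

```r
set.seed(5)
# Hypothetical grouped data: n fasteners tested at each load, r of them failing
dat <- data.frame(load = seq(2500, 4300, by = 200),   # load in psi
                  n    = rep(50, 10))                 # sample size per load
dat$r <- rbinom(10, dat$n, plogis(-5 + 0.0015 * dat$load))

# Grouped binomial response: cbind(successes, failures) in glm's terms,
# here cbind(number failing, number not failing)
fit <- glm(cbind(r, n - r) ~ load, family = binomial, data = dat)
summary(fit)$coefficients

# Observed and fitted failure proportions at each load
cbind(load = dat$load, observed = dat$r / dat$n, fitted = round(fitted(fit), 3))
```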
The p13.4 data frame has 11 observations on the
effectiveness of a price discount coupon on the
purchase of a two-litre beverage.
data(p13.4)
This data frame contains the following columns:
discount
sample size
number redeemed
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.4)
The p13.5 data frame has 20 observations on
new automobile purchases.
data(p13.5)
This data frame contains the following columns:
income
age of oldest vehicle
new purchase less than 6 months later (1=yes, 0=no)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.5)
The p13.6 data frame has 15 observations
on the number of failures of a particular type of valve
in a processing unit.
data(p13.6)
This data frame contains the following columns:
type of valve
number of failures
months
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p13.6)
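Failure counts observed over different numbers of months are often analysed with Poisson regression, using log exposure time as an offset. This sketch uses simulated data and hypothetical variable names (`valve`, `months`, `failures`); it is one possible approach, not necessarily the analysis intended by the textbook problem.

```r
# Simulated stand-in data (hypothetical; not the p13.6 values)
set.seed(4)
valve    <- factor(rep(1:5, each = 3))          # valve type
months   <- sample(6:24, 15, replace = TRUE)    # exposure time in months
failures <- rpois(15, lambda = months * exp(-2 + 0.2 * as.integer(valve)))

# Poisson regression for the failure rate per month
valve.glm <- glm(failures ~ valve + offset(log(months)), family = poisson)
summary(valve.glm)
```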
The p13.7 data frame has 44 observations on the coal
mines of the Appalachian region of western Virginia.
data(p13.7)
This data frame contains the following columns:
number of fractures in upper seams of coal mines
inner burden thickness (in feet), shortest distance between seam floor and the lower seam
percent extraction of the lower previously mined seam
lower seam height (in feet)
time that the mine has been in operation (in years)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Myers (1990)
data(p13.7)
The p14.1 data frame has 15 rows and 3 columns.
data(p14.1)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p14.1)
The p14.2 data frame has 18 rows and 3 columns.
data(p14.2)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p14.2)
The p15.4 data frame has 40 rows and 4 columns.
data(p15.4)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a factor with levels e and p
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p15.4)
The p2.10 data frame has 26 observations on weight and
systolic blood pressure for randomly selected males in the 25-30
age group.
data(p2.10)
This data frame contains the following columns:
in pounds
systolic blood pressure
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p2.10)
attach(p2.10)
cor.test(weight, sysbp, method="pearson") # tests rho=0
# and computes 95% CI for rho
# using Fisher's Z-transform
detach(p2.10)
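To make the Fisher Z-transform interval concrete, the sketch below recomputes it by hand and compares it with the interval returned by `cor.test`. The data are simulated stand-ins, reusing the names `weight` and `sysbp` from the example above; they are not the p2.10 values.

```r
# Simulated stand-in data (hypothetical; not the p2.10 values)
set.seed(1)
weight <- rnorm(26, mean = 170, sd = 20)
sysbp  <- 70 + 0.4 * weight + rnorm(26, sd = 8)

r <- cor(weight, sysbp)
n <- length(weight)

# Fisher's Z-transform: atanh(r) is approximately normal with sd 1/sqrt(n - 3)
z    <- atanh(r)
half <- qnorm(0.975) / sqrt(n - 3)
ci.manual <- tanh(c(z - half, z + half))   # back-transform to the rho scale

# Interval reported by cor.test, for comparison
ci.test <- cor.test(weight, sysbp, method = "pearson")$conf.int
ci.manual
ci.test
```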
The p2.12 data frame has 12 observations on
the number of pounds of steam used per month at a plant and
the average monthly ambient temperature.
data(p2.12)
This data frame contains the following columns:
ambient temperature (in degrees F)
usage (in thousands of pounds)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p2.12)
attach(p2.12)
usage.lm <- lm(usage ~ temp)
summary(usage.lm)
predict(usage.lm, newdata=data.frame(temp=58), interval="prediction")
detach(p2.12)
The p2.13 data frame has 16 observations on the number
of days the ozone levels exceeded 0.2 ppm in the
South Coast Air Basin of California for the years 1976 through
1991. It is believed that these levels are related to temperature.
data(p2.13)
This data frame contains the following columns:
number of days ozone levels exceeded 0.2 ppm
a seasonal meteorological index giving the seasonal average 850 millibar temperature.
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Davidson, A. (1993) Update on Ozone Trends in California's South Coast Air Basin. Air Waste, 43, 226-227.
data(p2.13)
attach(p2.13)
plot(days~index, ylim=c(-20,130))
ozone.lm <- lm(days ~ index)
summary(ozone.lm)
# plots of confidence and prediction intervals:
ozone.conf <- predict(ozone.lm, interval="confidence")
lines(sort(index), ozone.conf[order(index),2], col="red")
lines(sort(index), ozone.conf[order(index),3], col="red")
ozone.pred <- predict(ozone.lm, interval="prediction")
lines(sort(index), ozone.pred[order(index),2], col="blue")
lines(sort(index), ozone.pred[order(index),3], col="blue")
detach(p2.13)
The p2.14 data frame has 8 observations on the molar
ratio of sebacic acid and the intrinsic viscosity of copolyesters.
One is interested in predicting viscosity from the sebacic acid ratio.
data(p2.14)
This data frame contains the following columns:
molar ratio
viscosity
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Hsuie, Ma, and Tsai (1995) Separation and Characterizations of Thermotropic Copolyesters of p-Hydroxybenzoic Acid, Sebacic Acid and Hydroquinone. Journal of Applied Polymer Science, 56, 471-476.
data(p2.14)
attach(p2.14)
plot(p2.14, pch=16, ylim=c(0,1))
visc.lm <- lm(visc ~ ratio)
summary(visc.lm)
visc.conf <- predict(visc.lm, interval="confidence")
lines(ratio, visc.conf[,2], col="red")
lines(ratio, visc.conf[,3], col="red")
visc.pred <- predict(visc.lm, interval="prediction")
lines(ratio, visc.pred[,2], col="blue")
lines(ratio, visc.pred[,3], col="blue")
detach(p2.14)
The p2.15 data frame has 8 observations on the impact
of temperature on the viscosity of toluene-tetralin blends.
This particular data set deals with blends with a 0.4 molar
fraction of toluene.
data(p2.15)
This data frame contains the following columns:
temperature (in degrees Celsius)
viscosity (mPa s)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Byers and Williams (1987) Viscosities of Binary and Ternary Mixtures of Polyaromatic Hydrocarbons. Journal of Chemical and Engineering Data, 32, 349-354.
data(p2.15)
attach(p2.15)
plot(visc ~ temp, pch=16)
visc.lm <- lm(visc ~ temp)
plot(visc.lm, which=1)
detach(p2.15)
The p2.16 data frame has 33 observations on the
pressure in a tank and the volume of liquid.
data(p2.16)
This data frame contains the following columns:
volume of liquid
pressure in the tank
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Carroll and Spiegelman (1986) The Effects of Ignoring Small Measurement Errors in Precision Instrument Calibration. Journal of Quality Technology, 18, 170-173.
data(p2.16)
attach(p2.16)
plot(pressure ~ volume, pch=16)
pressure.lm <- lm(pressure ~ volume)
plot(pressure.lm, which=1)
summary(pressure.lm)
detach(p2.16)
The p2.17 data frame has 17 observations on the
boiling point of water (in Fahrenheit degrees)
for various barometric pressures (in inches of mercury).
data(p2.17)
This data frame contains the following columns:
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Atkinson, A.C. (1985) Plots, Transformations and Regression, Clarendon Press, Oxford.
data(p2.17)
attach(p2.17)
plot(BoilingPoint ~ BarometricPressure, pch=16)
detach(p2.17)
The p2.18 data frame has 21 observations on the
advertising expenses (in millions of US dollars) and retained
impressions (in millions per week)
for various companies.
data(p2.18)
This data frame contains the following columns:
character vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
data(p2.18)
attach(p2.18)
plot(Returned.Impressions ~ Amount.Spent, pch=16)
detach(p2.18)
The p2.7 data frame has 20 observations on the
purity of oxygen produced by a fractionation process. It
is thought that oxygen purity is related to the percentage
of hydrocarbons in the main condenser of the processing
unit.
data(p2.7)
This data frame contains the following columns:
oxygen purity (percentage)
hydrocarbon (percentage)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p2.7)
attach(p2.7)
purity.lm <- lm(purity ~ hydro)
summary(purity.lm)
# confidence interval for mean purity at 1% hydrocarbon:
predict(purity.lm,newdata=data.frame(hydro = 1.00),interval="confidence")
detach(p2.7)
The p2.9 data frame has 25 rows and 2 columns. See
help on softdrink for details.
data(p2.9)
This data frame contains the following columns:
a numeric vector: time
a numeric vector: cases stocked
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p2.9)
The p4.18 data frame has 13 observations on an
experiment to produce a synthetic analogue to jojoba oil.
data(p4.18)
This data frame contains the following columns:
reaction temperature
initial amount of catalyst
pressure
yield
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Coteron, Sanchez, Martinez, and Aracil (1993) Optimization of the Synthesis of an Analogue of Jojoba Oil Using a Fully Central Composite Design. Canadian Journal of Chemical Engineering.
data(p4.18)
y.lm <- lm(y ~ x1 + x2 + x3, data=p4.18)
summary(y.lm)
y.lm <- lm(y ~ x1, data=p4.18)
The p4.19 data frame has 14 observations on
a designed experiment studying the relationship
between abrasion index for a tire tread compound
and three factors.
data(p4.19)
This data frame contains the following columns:
hydrated silica level
silane coupling agent level
sulfur level
abrasion index for a tire tread compound
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Derringer and Suich (1980) Simultaneous Optimization of Several Response Variables. Journal of Quality Technology.
data(p4.19)
attach(p4.19)
y.lm <- lm(y ~ x1 + x2 + x3)
summary(y.lm)
plot(y.lm, which=1)
y.lm <- lm(y ~ x1)
detach(p4.19)
The p4.20 data frame has 26 observations
on a designed experiment to determine the influence
of five factors on the whiteness of rayon.
data(p4.20)
This data frame contains the following columns:
acid bath temperature
cascade acid concentration
water temperature
sulfide concentration
amount of chlorine bleach
a measure of the whiteness of rayon
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Myers and Montgomery (1995) Response Surface Methodology, pp. 267-268.
data(p4.20)
y.lm <- lm(y ~ acidtemp, data=p4.20)
summary(y.lm)
The p5.1 data frame has 8 observations on the impact
of temperature on the viscosity of toluene-tetralin blends.
data(p5.1)
This data frame contains the following columns:
temperature
viscosity
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Byers and Williams (1987) Viscosities of Binary and Ternary Mixtures of Polyaromatic Hydrocarbons. Journal of Chemical and Engineering Data, 32, 349-354.
data(p5.1)
plot(p5.1)
The p5.10 data frame has 27 observations on the
effect of three factors on a printing machine's ability
to apply coloring inks on package labels.
data(p5.10)
This data frame contains the following columns:
speed
pressure
distance
response 1
response 2
response 3
average response
standard deviation of the 3 responses
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.10)
attach(p5.10)
y.lm <- lm(ybar.i ~ x1 + x2 + x3)
plot(y.lm, which=1)
detach(p5.10)
The p5.11 data frame has 8 observations on an
experiment with a catapult.
data(p5.11)
This data frame contains the following columns:
hook
arm length
start angle
stop angle
response 1
response 2
response 3
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.11)
attach(p5.11)
ybar.i <- apply(p5.11[,5:7], 1, mean)
sd.i <- apply(p5.11[,5:7], 1, sd)
y.lm <- lm(ybar.i ~ x1 + x2 + x3 + x4)
plot(y.lm, which=1)
detach(p5.11)
The p5.12 data frame has 27 observations on 9
variables.
data(p5.12)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
response 1
response 2
response 3
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.11)
attach(p5.11)
ybar.i <- apply(p5.11[,5:7], 1, mean)
sd.i <- apply(p5.11[,5:7], 1, sd)
y.lm <- lm(ybar.i ~ x1 + x2 + x3 + x4)
plot(y.lm, which=1)
detach(p5.11)
The p5.2 data frame has 11 observations on the vapor
pressure of water for various temperatures.
data(p5.2)
This data frame contains the following columns:
temperature (K)
vapor pressure (mm Hg)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.2)
plot(p5.2)
The p5.3 data frame has 12 observations on the
number of bacteria surviving in a canned food product and the
number of minutes of exposure to 300 degree Fahrenheit heat.
data(p5.3)
This data frame contains the following columns:
number of surviving bacteria
number of minutes of exposure
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.3)
plot(bact~min, data=p5.3)
The p5.4 data frame has 8 observations on 2 variables.
data(p5.4)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.4)
plot(y ~ x, data=p5.4)
The p5.5 data frame has 14 observations on the average
number of defects per 10000 bottles due to stones in the bottle
wall and the number of weeks since the last furnace overhaul.
data(p5.5)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p5.5)
defects.lm <- lm(defects~weeks, data=p5.5)
plot(defects.lm, which=1)
The p7.1 data frame has 10 observations on a predictor variable.
data(p7.1)
This data frame contains the following columns:
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.1)
attach(p7.1)
x2 <- x^2
detach(p7.1)
The p7.11 data frame has 11 observations on production cost
versus production lot size.
data(p7.11)
This data frame contains the following columns:
production lot size
average production cost per unit
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.11)
plot(y ~ x, data=p7.11)
The p7.15 data frame has 6 observations
on vapor pressure of water at various temperatures.
data(p7.15)
This data frame contains the following columns:
vapor pressure (mm Hg)
temperature (degrees Celsius)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.15)
y.lm <- lm(y ~ x, data=p7.15)
plot(y ~ x, data=p7.15)
abline(coef(y.lm))
plot(y.lm, which=1)
The p7.16 data frame has 26 observations on the
observed mole fraction solubility of a solute at a
constant temperature.
data(p7.16)
This data frame contains the following columns:
negative logarithm of the mole fraction solubility
dispersion partial solubility
dipolar partial solubility
hydrogen bonding Hansen partial solubility
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
(1991) Journal of Pharmaceutical Sciences 80, 971-977.
data(p7.16)
pairs(p7.16)
The p7.19 data frame has 10 observations on the concentration
of green liquor and paper machine speed from a kraft paper
machine.
data(p7.19)
This data frame contains the following columns:
green liquor (g/l)
paper machine speed (ft/min)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
(1986) Tappi Journal.
data(p7.19)
y.lm <- lm(y ~ x + I(x^2), data=p7.19)
summary(y.lm)
The p7.2 data frame has 10 observations on solid-fuel
rocket propellant weight loss.
data(p7.2)
This data frame contains the following columns:
months since production
weight loss (kg)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.2)
y.lm <- lm(y ~ x + I(x^2), data=p7.2)
summary(y.lm)
plot(y ~ x, data=p7.2)
The p7.4 data frame has 12 observations on two variables.
data(p7.4)
This data frame contains the following columns:
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.4)
y.lm <- lm(y ~ x + I(x^2), data = p7.4)
summary(y.lm)
The p7.6 data frame has 12 observations on softdrink
carbonation.
data(p7.6)
This data frame contains the following columns:
carbonation
temperature
pressure
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p7.6)
y.lm <- lm(y ~ x1 + I(x1^2) + x2 + I(x2^2) + I(x1*x2), data=p7.6)
summary(y.lm)
The p8.11 data frame has 25 observations on the tensile
strength of synthetic fibre used for men's shirts.
data(p8.11)
This data frame contains the following columns:
tensile strength
percentage of cotton
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Montgomery (2001)
data(p8.11)
y.lm <- lm(y ~ percent, data=p8.11)
model.matrix(y.lm)
The p8.3 data frame has 25 observations on delivery
times taken by a vending machine route driver.
data(p8.3)
This data frame contains the following columns:
delivery time (in minutes)
number of cases of product stocked
distance walked by route driver
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(p8.3)
pairs(p8.3)
The p9.10 data frame has 31 observations
on the rut depth of asphalt pavements prepared under
different conditions.
data(p9.10)
This data frame contains the following columns:
change in rut depth/million wheel passes (log scale)
viscosity (log scale)
percentage of asphalt in surface course
percentage of asphalt in base course
indicator
percentage of fines in surface course
percentage of voids in surface course
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Gorman and Toman (1966)
data(p9.10)
pairs(p9.10)
Artificial regression data which causes stepwise regression with AIC to produce a highly non-parsimonious model. The true model used to simulate the data has only one real predictor (x8).
pathoeg
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Pads an unstacked data frame with missing values so that all vectors in the resulting list have equal length. This list is then coerced into a data frame for ease of producing tables.
postunstack(x, form, ...)
x |
A list or data frame to be stacked or unstacked. |
form |
a two-sided formula whose left side evaluates to the vector to be unstacked and whose right side evaluates to the indicator of the groups to create. Defaults to 'formula(x)' in the data frame method for 'unstack'. |
... |
further arguments passed to or from other methods. |
a data frame of columns according to the formula 'form'. If the columns do not all have the same length, the resulting list is coerced to a data frame by padding with missing values.
W. John Braun
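The padding idea described above can be sketched in base R as follows. This is an illustration of the technique only, not the package's implementation; the data frame `long` and the helper objects are hypothetical.

```r
# Hypothetical long-format data with unequal group sizes
long <- data.frame(values = c(5.1, 4.8, 6.0, 5.5, 6.2),
                   ind    = factor(c("a", "a", "b", "b", "b")))

# unstack() returns a list here, because the groups differ in length
cols <- unstack(long, values ~ ind)

# Pad each vector with NA up to the length of the longest one
n.max  <- max(lengths(cols))
padded <- lapply(cols, function(v) c(v, rep(NA, n.max - length(v))))

# Equal-length vectors can now be coerced to a data frame
wide <- as.data.frame(padded)
wide
```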
Computation of Allen's PRESS statistic for an lm object.
PRESS(x)
x |
An lm object. |
Allen's PRESS statistic.
W.J. Braun
lm
data(p4.18)
attach(p4.18)
y.lm <- lm(y ~ x1 + I(x1^2))
PRESS(y.lm)
detach(p4.18)
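For readers who want to see what the statistic measures, the sketch below computes PRESS in base R in two ways: from the ordinary residuals and hat values, and by explicit leave-one-out refitting. It uses simulated data and makes no claim about how `PRESS` is implemented in this package.

```r
# Simulated data (hypothetical)
set.seed(5)
x <- runif(15, 0, 10)
y <- 2 + 0.5 * x + rnorm(15)
fit <- lm(y ~ x + I(x^2))

# PRESS = sum of squared leave-one-out prediction errors, e_i / (1 - h_ii)
press.hat <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute-force check: refit the model n times, leaving out one case each time
d <- data.frame(x = x, y = y)
press.loo <- sum(sapply(seq_len(nrow(d)), function(i) {
  fit.i <- lm(y ~ x + I(x^2), data = d[-i, ])
  (d$y[i] - predict(fit.i, newdata = d[i, ]))^2
}))

c(press.hat, press.loo)
```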
This function is used to display the weight of the evidence against null main effects in data coming from a one-factor design, using a QQ plot. In practice this method is often called via the function GANOVA.
qqANOVA(x, y, plot.it = TRUE, xlab = deparse(substitute(x)), ylab = deparse(substitute(y)), ...)
x |
numeric vector of errors |
y |
numeric vector of scaled responses |
plot.it |
logical vector indicating whether to plot or not |
xlab |
character, x-axis label |
ylab |
character, y-axis label |
... |
any other arguments for the plot function |
A QQ plot is drawn.
W. John Braun
Overlays a quadratic curve to a fitted quadratic model.
quadline(lm.obj, ...)
lm.obj |
A |
... |
Other arguments to the |
The function superimposes a quadratic curve onto an existing scatterplot.
W.J. Braun
lm
data(p4.18)
attach(p4.18)
y.lm <- lm(y ~ x1 + I(x1^2))
plot(x1, y)
quadline(y.lm)
detach(p4.18)
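A similar overlay can be drawn with base graphics alone. The following sketch, on simulated data, adds the fitted quadratic with `curve`; it illustrates the idea and is not the source of `quadline`.

```r
# Simulated data (hypothetical)
set.seed(6)
x <- runif(20, 0, 10)
y <- 1 + 2 * x - 0.15 * x^2 + rnorm(20, sd = 0.5)

# Fit a quadratic model and extract its three coefficients
fit <- lm(y ~ x + I(x^2))
b   <- coef(fit)

# Scatterplot with the fitted quadratic curve superimposed
plot(x, y)
curve(b[1] + b[2] * x + b[3] * x^2, add = TRUE, col = "red")
```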
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
Qyplot(X, y, plotIt=TRUE, sortTrt=FALSE, type="hist", includeIntercept=TRUE, labels=FALSE)
X |
The design matrix. |
y |
A numeric vector containing the response. |
plotIt |
Logical: if TRUE, a graph is drawn. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
type |
"QQ" or "hist" |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
labels |
logical: if TRUE, names of predictor variables are used as labels; otherwise, the design matrix column numbers are used as labels |
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
W. John Braun
Braun, W.J. 2013. Regression Analysis and the QR Decomposition. Preprint.
# Example 1
X <- p4.18[,-4]
y <- p4.18[,4]
Qyplot(X, y, type="hist", includeIntercept=FALSE)
title("Evidence of Regression in the Jojoba Oil Data")
# Example 2
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
Qyplot(simdata[,-1], simdata[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in Simulated Data Set")
# Example 3
Qyplot(table.b1[,-1], table.b1[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in NFL Data Set")
# An example where stepwise AIC selects the complement
# of the set of variables that are actually in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,2))
Qyplot(X, y)
Qyplot(X, y, sortTrt=TRUE)
Qyplot(X, y, type="QQ")
Qyplot(X, y, sortTrt=TRUE, type="QQ")
X <- table.b1[,-1] # NFL data
y <- table.b1[,1]
Qyplot(X, y)
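The link between the QR decomposition and the F-test for significance of regression can be checked numerically. The sketch below, on simulated data, computes Q'y with base R's `qr.qty` and rebuilds the overall F statistic from its elements. It is only an illustration of the underlying idea under these assumptions, not a description of what `Qyplot` does internally.

```r
# Simulated regression data (hypothetical)
set.seed(7)
n <- 30
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 1 + 2 * X[, "x1"] + rnorm(n)

# QR decomposition of the design matrix with an intercept column
X1 <- cbind(1, X)
p  <- ncol(X1)
Qy <- qr.qty(qr(X1), y)          # Q'y, one element per observation

# Elements 2..p carry the predictor effects; the rest are residual effects
ss.reg <- sum(Qy[2:p]^2)
ss.res <- sum(Qy[(p + 1):n]^2)
F.qr   <- (ss.reg / (p - 1)) / (ss.res / (n - p))

# Same statistic as reported by summary.lm
F.lm <- summary(lm(y ~ X))$fstatistic["value"]
c(F.qr, F.lm)
```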
Percentage of radon from water released in showers with orifices of various diameters. Four replicates were obtained, but it should be noted that the temperatures for the replicates (in degrees Celsius) are 21, 30, 38, and 46, respectively. This information should really be accounted for in any serious analysis of the data.
data("radon")
A data frame with 15 observations on the following 2 variables.
diameter: shower orifice diameter in mm
rep 1: percentage radon released in first run
rep 2: percentage radon released in second run
rep 3: percentage radon released in third run
rep 4: percentage radon released in fourth run
Hazin, C.A. and Eichholz, G.G. (1992) Influence of Water Temperature and Shower Head Orifice Size on the Release of Radon During Showering, Environment International, 18, 363-369.
Heights, widths and diagonal lengths of several rectangular objects, such as books and photographs, were measured. Only the data in MPV versions 1.62 and later can be trusted; there were errors in the third column in previous versions.
rectangles
A data frame with 51 observations on the following 4 variables.
h: numeric, heights in centimeters
w: numeric, widths in centimeters
d: numeric, diagonal lengths in centimeters
index: numeric, sum of squares of heights and widths
library(KernSmooth) # provides locpoly() and dpill()
x <- sqrt(rectangles$index)
y <- rectangles$d
y.lp <- locpoly(x, y, bandwidth=dpill(x,y), degree=1)
plot(y ~ x)
lines(y.lp, col=2, lty=2)
abline(0,1) # y = x + measurement error
plot(y.lp$y - y.lp$x, type="l", col=2)
The seismictimings data frame has 504 rows and 3 columns.
Thickness of a layer of Alberta substratum as measured by
several transects of geophones.
seismictimings
This data frame contains the following columns:
longitudinal coordinate of geophone.
latitudinal coordinate of geophone.
time for signal to pass through substratum.
plot(y ~ x, data = seismictimings)
The softdrink data frame has 25 rows and 3 columns.
data(softdrink)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(softdrink)
Percent soil moisture measurements at 26 different locations in a forest in southwestern British Columbia. Some of the locations were in stands that had been thinned.
data("soilstudy")
A data frame with 26 observations on the following 3 variables.
location: character vector identifying forest stand
moisture: numeric vector, percentage moisture content
treatment: character vector identifying fuel treatment: thinned or unthinned
Millikin, R.L., Braun, W.J., Alexander, M.E., Fani, S. (2024), The Impact of Fuel Thinning on the Microclimate in Coastal Rainforest Stands of Southwestern British Columbia, Canada. Fire. Vol 7(8), 2024, pp 285-309.
The solar data frame has 29 rows and 6 columns.
data(solar)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(solar)
Data on an experiment to remove ketchup stains from white cotton
fabric by soaking the stained fabric in one of five substrates for
one hour. Remaining stains were scored visually and subjectively
according to a 6-point scale (0 = completely clean, 5 = no change).
The stain data frame has 15 rows and 2 columns.
data(stain)
This data frame contains the following columns:
a factor
a numeric vector
data(stain)
The table.b1 data frame has 28 observations on National
Football League 1976 Team Performance.
data(table.b1)
This data frame contains the following columns:
Games won in a 14 game season
Rushing yards
Passing yards
Punting average (yards/punt)
Field Goal Percentage (FGs made/FGs attempted)
Turnover differential (turnovers acquired - turnovers lost)
Penalty yards
Percent rushing (rushing plays/total plays)
Opponents' rushing yards
Opponents' passing yards
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b1)
attach(table.b1)
y.lm <- lm(y ~ x2 + x7 + x8)
summary(y.lm)
# over-all F-test:
y.null <- lm(y ~ 1)
anova(y.null, y.lm)
# partial F-test for x7:
y7.lm <- lm(y ~ x2 + x8)
anova(y7.lm, y.lm)
detach(table.b1)
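The partial F-test above compares two nested models. As a minimal sketch of what `anova()` reports in that comparison, the following uses simulated data (not `table.b1`; the coefficients and the sample size of 28 are assumptions chosen only for illustration) and reproduces the F statistic from the residual sums of squares.

```r
# Simulated stand-in data; the variable names mirror the example above,
# but the values are hypothetical and unrelated to table.b1.
set.seed(1)
n <- 28
sim <- data.frame(x2 = rnorm(n), x7 = rnorm(n), x8 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x2 + 0.3 * sim$x7 - 0.4 * sim$x8 + rnorm(n)

full    <- lm(y ~ x2 + x7 + x8, data = sim)  # model that includes x7
reduced <- lm(y ~ x2 + x8, data = sim)       # model that omits x7

# Partial F statistic computed from the residual sums of squares
rss_full <- sum(resid(full)^2)
rss_red  <- sum(resid(reduced)^2)
df_diff  <- df.residual(reduced) - df.residual(full)
F_manual <- ((rss_red - rss_full) / df_diff) / (rss_full / df.residual(full))

# The same statistic as reported by anova()
F_anova <- anova(reduced, full)$F[2]
all.equal(F_manual, F_anova)

# With a single dropped term, F equals the squared t statistic for x7
t_x7 <- coef(summary(full))["x7", "t value"]
all.equal(F_manual, t_x7^2)
```

Both comparisons should return `TRUE`, which is why the partial F-test for one coefficient and its t-test in `summary()` lead to the same p-value.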
The table.b10 data frame has 40 observations
on kinematic viscosity of a certain solvent system.
data(table.b10)
This data frame contains the following columns:
Ratio of 2-methoxyethanol to 1,2-dimethoxyethane
Temperature (in degrees Celsius)
Kinematic viscosity (.000001 m2/s)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Viscosimetric Studies on 2-Methoxyethanol + 1, 2-Dimethoxyethane Binary Mixtures from -10 to 80C. Canadian Journal of Chemical Engineering, 75, 494-501.
data(table.b10)
attach(table.b10)
y.lm <- lm(y ~ x1 + x2)
summary(y.lm)
detach(table.b10)
The table.b11 data frame has 38 observations on the
quality of Pinot Noir wine.
data(table.b11)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b11)
attach(table.b11)
Quality.lm <- lm(Quality ~ Clarity + Aroma + Body + Flavor + Oakiness + factor(Region))
summary(Quality.lm)
detach(table.b11)
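The model above wraps `Region` in `factor()` so that it enters as a categorical predictor. A small sketch of the resulting dummy coding under R's default treatment contrasts, using a made-up vector of region codes rather than the wine data:

```r
# Hypothetical region codes for five wines (not taken from table.b11)
Region <- c(1, 2, 3, 1, 3)

# factor() turns the codes into levels; model.matrix() shows the
# indicator columns that lm() would build under treatment contrasts
mm <- model.matrix(~ factor(Region))
colnames(mm)
mm
```

The first level (here region 1) is absorbed into the intercept, so each remaining coefficient is a difference from that baseline.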
The table.b12 data frame has 32 rows and 6 columns.
data(table.b12)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b12)
The table.b13 data frame has 40 rows and 7 columns.
data(table.b13)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b13)
The table.b14 data frame has 25 observations on the transient
points of an electronic inverter.
data(table.b14)
This data frame contains the following columns:
width of the NMOS Device
length of the NMOS Device
width of the PMOS Device
length of the PMOS Device
a numeric vector
transient point of PMOS-NMOS Inverters
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b14)
y.lm <- lm(y ~ x1 + x2 + x3 + x4, data=table.b14)
plot(y.lm, which=1)
The table.b15 data frame has 60 observations on the mortality, environment, and demographic variables for a sample of American cities.
data(table.b15)
This data frame contains the following columns:
character vector
numeric vector, age-adjusted mortality from all causes per 100000
numeric vector, precipitation in inches
numeric vector, median number of school years completed
numeric vector, percentage of 1960 population that is nonwhite
numeric vector, relative pollution potential of nitrous oxides
numeric vector, relative pollution potential of sulfur dioxide
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
McDonald, G. C. and Ayers, J.A. [1978], "Some applications of Chernoff faces: A technique for graphically representing multivariate data", in Graphical Representation of Multivariate Data, Academic Press, New York.
data(table.b15)
pairs(table.b15[,-1])
The table.b16 data frame has 38 observations on 6 variables. Each observation
corresponds to an individual country.
data(table.b16)
This data frame contains the following columns:
character vector
numeric vector, in years
numeric vector
numeric vector
numeric vector, in years
numeric vector, in years
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b17 data frame has 25 observations on 5 variables.
data(table.b17)
This data frame contains the following columns:
numeric vector
numeric vector, in years
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b18 data frame has 16 observations on 9 variables.
data(table.b18)
This data frame contains the following columns:
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b19 data frame has 32 observations on 11 variables.
data(table.b19)
This data frame contains the following columns:
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b2 data frame has 29 rows and 6 columns.
data(table.b2)
This data frame contains the following columns:
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
data(table.b2)
The table.b20 data frame has 18 observations on 6 variables.
data(table.b20)
This data frame contains the following columns:
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
pairs(table.b20)
The table.b22 data frame has 30 observations on 12 variables.
data(table.b22)
This data frame contains the following columns:
character vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
pairs(table.b22[,-1])
The table.b23 data frame has 59 observations on 8 variables.
data(table.b23)
This data frame contains the following columns:
character vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
character vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b24 data frame has 51 observations on 6 variables.
data(table.b24)
This data frame contains the following columns:
character vector
numeric vector
numeric vector
numeric vector
numeric vector
numeric vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b25 data frame has 50 observations on 6 variables.
data(table.b25)
This data frame contains the following columns:
character vector
numeric vector
character vector
character vector
character vector
character vector
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
The table.b3 data frame has observations on gasoline
mileage performance for 32 different automobiles.
data(table.b3)
This data frame contains the following columns:
Miles/gallon
Displacement (cubic in)
Horsepower (ft-lb)
Torque (ft-lb)
Compression ratio
Rear axle ratio
Carburetor (barrels)
No. of transmission speeds
Overall length (in)
Width (in)
Weight (lb)
Type of transmission (1=automatic, 0=manual)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Motor Trend, 1975
data(table.b3)
attach(table.b3)
y.lm <- lm(y ~ x1 + x6)
summary(y.lm)
# testing for the significance of the regression:
y.null <- lm(y ~ 1)
anova(y.null, y.lm)
# 95% CI for mean gas mileage:
predict(y.lm, newdata=data.frame(x1=275, x6=2), interval="confidence")
# 95% PI for gas mileage:
predict(y.lm, newdata=data.frame(x1=275, x6=2), interval="prediction")
detach(table.b3)
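The two `predict()` calls above differ only in the `interval` argument. The sketch below illustrates the distinction using the built-in `mtcars` data as a stand-in (it is not `table.b3`, and the choice of `disp` and `carb` as predictors is an assumption made to parallel `x1` and `x6`). It also passes the data frame through `data=` rather than `attach()`.

```r
# Stand-in model on a built-in data set; not the table.b3 fit
fit <- lm(mpg ~ disp + carb, data = mtcars)
new_car <- data.frame(disp = 275, carb = 2)

# Interval for the mean response at the new predictor values
ci_mean <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

# Interval for a single new observation at the same values
pi_new <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

rbind(confidence = ci_mean[1, ], prediction = pi_new[1, ])
```

Both intervals are centred on the same fitted value, but the prediction interval is wider because it also accounts for the error variance of an individual observation.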
The table.b4 data frame has 24 observations on property
valuation.
data(table.b4)
This data frame contains the following columns:
sale price of the house (in thousands of dollars)
taxes (in thousands of dollars)
number of baths
lot size (in thousands of square feet)
living space (in thousands of square feet)
number of garage stalls
number of rooms
number of bedrooms
age of the home (in years)
number of fireplaces
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Narula, S.C. and Wellington, J.F. (1977) Prediction, Linear Regression and Minimum Sum of Relative Errors. Technometrics, 19.
data(table.b4)
attach(table.b4)
y.lm <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)
summary(y.lm)
detach(table.b4)
The table.b5 data frame has 27 observations on liquefaction.
data(table.b5)
This data frame contains the following columns:
CO2
Space time (in min)
Temperature (in degrees Celsius)
Percent solvation
Oil yield (g/100g MAF)
Coal total
Solvent total
Hydrogen consumption
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
(1978) Belle Ayr Liquefaction Runs with Solvent. Industrial Chemical Process Design Development, 17, 3.
data(table.b5)
attach(table.b5)
y.lm <- lm(y ~ x6 + x7)
summary(y.lm)
detach(table.b5)
The table.b6 data frame has 28 observations on
a tube-flow reactor.
data(table.b6)
This data frame contains the following columns:
NbOCl3 concentration (g-mol/l)
COCl2 concentration (g-mol/l)
Space time (s)
Molar density (g-mol/l)
Mole fraction CO2
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
(1972) Kinetics of Chlorination of Niobium oxychloride by Phosgene in a Tube-Flow Reactor. Industrial and Engineering Chemistry, Process Design Development, 11(2).
data(table.b6)
# Partial Solution to Problem 3.9
attach(table.b6)
y.lm <- lm(y ~ x1 + x4)
summary(y.lm)
detach(table.b6)
The table.b7 data frame has 16 observations on
oil extraction from peanuts.
data(table.b7)
This data frame contains the following columns:
CO2 pressure (bar)
CO2 temperature (in degrees Celsius)
peanut moisture (percent by weight)
CO2 flow rate (L/min)
peanut particle size (mm)
total oil yield
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Kilgo, M.B. An Application of Fractional Experimental Designs. Quality Engineering, 1, 19-23.
data(table.b7)
attach(table.b7)
# partial solution to Problem 3.11:
peanuts.lm <- lm(y ~ x1 + x2 + x3 + x4 + x5)
summary(peanuts.lm)
detach(table.b7)
The table.b8 data frame has 36 observations on Clathrate
formation.
data(table.b8)
This data frame contains the following columns:
Amount of surfactant (mass percentage)
Time (min)
Clathrate formation (mass percentage)
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Tanii, T., Minemoto, M., Nakazawa, K., and Ando, Y. Study on a Cool Storage System Using HCFC-141b Clathrate. Canadian Journal of Chemical Engineering, 75, 353-360.
data(table.b8)
attach(table.b8)
clathrate.lm <- lm(y ~ x1 + x2)
summary(clathrate.lm)
detach(table.b8)
The table.b9 data frame has 62 observations on an
experimental pressure drop.
data(table.b9)
This data frame contains the following columns:
Superficial fluid velocity of the gas (cm/s)
Kinematic viscosity
Mesh opening (cm)
Dimensionless number relating superficial fluid velocity of the gas to the superficial fluid velocity of the liquid
Dimensionless factor for the pressure drop through a bubble cap
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Liu, C.H., Kan, M., and Chen, B.H. A Correlation of Two-Phase Pressure Drops in Screen-Plate Bubble Column. Canadian Journal of Chemical Engineering, 71, 460-463.
data(table.b9)
attach(table.b9)
# Partial Solution to Problem 3.13:
y.lm <- lm(y ~ x1 + x2 + x3 + x4)
summary(y.lm)
detach(table.b9)
The tarimage is a list.
Most of the values are 0, but there are small regions of 1's.
data(tarimage)
This list contains the following elements:
a numeric vector having 101 elements.
a numeric vector having 101 elements.
a numeric matrix having 101 rows and 101 columns.
with(tarimage, image(x, y, xy))
This function analyzes regression data graphically. It allows visualization of the usual t-tests for individual regression coefficients.
tplot(X, y, plotIt=TRUE, type="hist", includeIntercept=TRUE)

| Argument | Description |
|---|---|
| X | The design matrix. |
| y | A numeric vector containing the response. |
| plotIt | Logical: if TRUE, a graph is drawn. |
| type | "QQ" or "hist" |
| includeIntercept | Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
W. John Braun
# Jojoba oil data set
X <- p4.18[,-4]
y <- p4.18[,4]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients in the Jojoba Oil Regression")
# Simulated data set where none of the predictors are in the true model:
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
X <- simdata[,-1]
y <- simdata[,1]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients for the Simulated Data Set")
# NFL Data set:
X <- table.b1[,-1]
y <- table.b1[,1]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients for the NFL Data Set")
# Simulated Data set where x8 is the only predictor in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,2))
tplot(X, y)
tplot(X, y, type="QQ")
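For context, the t-tests that `tplot` visualizes are the usual coefficient tests reported by `summary()` for a linear model. The sketch below computes those statistics with base R on simulated data in which only `x1` affects the response. It does not reproduce `tplot`'s graphics or internals, and all names and values are illustrative assumptions.

```r
# Simulated data: y depends on x1 only (hypothetical values)
set.seed(4571)
n <- 40
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 + 1.5 * X$x1 + rnorm(n)
dat <- data.frame(y = y, X)

fit <- lm(y ~ ., data = dat)

# t statistics as reported in the coefficient table
tstats <- coef(summary(fit))[, "t value"]

# The same statistics by hand: estimate divided by its standard error
t_manual <- coef(fit) / sqrt(diag(vcov(fit)))
all.equal(tstats, t_manual)

# Two-sided 5% critical value on the residual degrees of freedom
crit <- qt(0.975, df = df.residual(fit))
abs(tstats) > crit
```

A display of these statistics against their reference t distribution is, in spirit, what the histogram and QQ-plot options summarize.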
A random sample of observations taken from the 'Loblolly' data frame, one per Seed.
data("tree.sample")
A data frame with 12 observations on the following 2 variables.
height: tree heights (ft)
age: tree ages (yr)
This function graphically displays the coefficient multipliers used in the Regression Plot for the given predictor.
Uplot(X.qr, Xcolumn = 1, ...)

| Argument | Description |
|---|---|
| X.qr | The design matrix or the QR decomposition of the design matrix. |
| Xcolumn | The column(s) of the design matrix under study; this can be either integer valued or a character string. |
| ... | Additional arguments to barchart. |
A bar plot is displayed.
W. John Braun
# Jojoba oil data set
X <- p4.18[,-4]
Uplot(X, 1:4)
# NFL data set; see GFplot result first
X <- table.b1[,-1]
Uplot(X, c(2,3,9))
# In this example, x8 is the only predictor in
# the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
pathoeg.F <- GFplot(X, y, plotIt=FALSE)
Uplot(X, "x8")
Uplot(X, 9) # same as above
Uplot(pathoeg.F$QR, 9) # same as above
X <- table.b1[,-1]
Uplot(X, c("x2", "x3", "x9"))
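`Uplot` accepts either the design matrix or its QR decomposition. As a reminder of what that decomposition is, here is a base R sketch on a small simulated design matrix. It shows only `qr()` and its accessors, not how `Uplot` derives the coefficient multipliers from them, and the matrix itself is a hypothetical example.

```r
# Hypothetical design matrix: an intercept column plus three predictors
set.seed(1)
X <- cbind(1, matrix(rnorm(30), ncol = 3))

X.qr <- qr(X)     # QR decomposition object
Q <- qr.Q(X.qr)   # matrix with orthonormal columns
R <- qr.R(X.qr)   # upper triangular factor

# Q %*% R reconstructs the design matrix
all.equal(Q %*% R, X, check.attributes = FALSE)

# The columns of Q are orthonormal
all.equal(crossprod(Q), diag(ncol(X)))
```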
Measurements in centimeters of the widths of a random collection of books.
widths
A numeric vector of length 24.
The windWin80 data frame has 366 observations on midnight and noon windspeed
at the Winnipeg International Airport for the year 1980.
data(windWin80)
This data frame contains the following columns:
a numeric vector containing the wind speeds at midnight.
a numeric vector containing the wind speeds at the following noon.
data(windWin80)
ts.plot(windWin80$h12^2)
The Wpgtemp data frame has 7671 observations on
daily maximum temperatures at the Winnipeg International Airport for the years 1960
through 1980.
data(Wpgtemp)
This data frame contains the following columns:
A numeric vector containing the temperatures in degrees Celsius
A numeric vector denoting the observation date in numbers of days after December 31, 1959
Environment Canada
summary(Wpgtemp)
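Because the date column counts days after December 31, 1959, it can be converted to calendar dates with `as.Date()` and an origin. A minimal sketch using a few hand-picked day numbers rather than the data frame itself (no column names are assumed here):

```r
# Day numbers counted from December 31, 1959 (illustrative values)
day <- c(1, 60, 366, 7671)

dates <- as.Date(day, origin = "1959-12-31")
format(dates)
# "1960-01-01" "1960-02-29" "1960-12-31" "1980-12-31"
```

The last value matches the 7671 observations covering 1960 through 1980, a span that includes six leap days.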
Daily observations taken from 2012 through 2021 on temperature, rain, snow and wind for Fort Frances, Kenora and Dryden, Ontario.
wxNWO
A data frame with 10959 observations on the following 31 variables.
Longitude: numeric
Latitude: numeric
Station.Name: character
Climate.ID: numeric
Date.Time: numeric
Year: numeric
Month: numeric
Day: numeric
Data.Quality: numeric
Max.Temp: numeric
Max.Temp.Flag: numeric
Min.Temp: numeric
Min.Temp.Flag: numeric
Mean.Temp: numeric
Mean.Temp.Flag: numeric
Heat.Deg.Days: numeric
Heat.Deg.Days.Flag: numeric
Cool.Deg.Days: numeric
Cool.Deg.Days.Flag: numeric
Total.Rain: numeric
Total.Rain.Flag: numeric
Total.Snow: numeric
Total.Snow.Flag: numeric
Total.Precip: numeric
Total.Precip.Flag: numeric
Snow.on.Ground: numeric
Snow.on.Ground.Flag: numeric
Dir.of.Max.Gust: numeric
Dir.of.Max.Gust.Flag: numeric
Speed.of.Max.Gust: numeric
Speed.of.Max.Gust.Flag: numeric
Environment Canada
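The row count is consistent with three stations observed daily over ten years. The sketch below checks that arithmetic and then shows one way to average `Mean.Temp` by station and month on a tiny made-up data frame. The station labels and temperatures are hypothetical; only the column names `Station.Name`, `Month` and `Mean.Temp` come from the variable list above.

```r
# Daily dates from 2012 through 2021, times three stations
days <- seq(as.Date("2012-01-01"), as.Date("2021-12-31"), by = "day")
stations <- c("Fort Frances", "Kenora", "Dryden")  # hypothetical labels
n_rows <- length(days) * length(stations)
n_rows  # 10959

# Toy data frame with made-up temperatures (not the wxNWO values)
toy <- data.frame(
  Station.Name = rep(stations, each = 4),
  Month = rep(c(1, 1, 2, 2), times = 3),
  Mean.Temp = c(-20, -18, -15, -13, -22, -20, -16, -14, -21, -19, -17, -15)
)

# Average temperature for each station and month
monthly <- aggregate(Mean.Temp ~ Station.Name + Month, data = toy, FUN = mean)
monthly
```

The same `aggregate()` call could in principle be pointed at `wxNWO`, where the default formula handling drops rows with missing `Mean.Temp`.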