Contents
Preface
PART ONE
SIMPLE LINEAR REGRESSION 1
Chapter 1
Linear Regression with One Predictor
Variable 2
1.1 Relations between Variables 2
Functional Relation between Two
Variables 2
Statistical Relation between Two Variables 3
1.2 Regression Models and Their Uses 5
Historical Origins 5
Basic Concepts 5
Construction of Regression Models 7
Uses of Regression Analysis 8
Regression and Causality 8
Use of Computers 9
1.3 Simple Linear Regression Model
with Distribution of Error Terms
Unspecified 9
Formal Statement of Model 9
Important Features of Model 9
Meaning of Regression Parameters 11
Alternative Versions of Regression Model 12
1.4 Data for Regression Analysis 12
Observational Data 12
Experimental Data 13
Completely Randomized Design 13
1.5 Overview of Steps in Regression
Analysis 13
1.6 Estimation of Regression Function 15
Method of Least Squares 15
Point Estimation of Mean Response 21
Residuals 22
Properties of Fitted Regression Line 23
1.7 Estimation of Error Terms Variance σ² 24
Point Estimator of σ² 24
1.8 Normal Error Regression Model 26
Model 26
Estimation of Parameters by Method
of Maximum Likelihood 27
Cited References 33
Problems 33
Exercises 37
Projects 38
Chapter 2
Inferences in Regression and Correlation
Analysis 40
2.1 Inferences Concerning β1 40
Sampling Distribution of b1 41
Sampling Distribution of (b1 - β1)/s{b1} 44
Confidence Interval for β1 45
Tests Concerning β1 47
2.2 Inferences Concerning β0 48
Sampling Distribution of b0 48
Sampling Distribution of (b0 - β0)/s{b0} 49
Confidence Interval for β0 49
2.3 Some Considerations on Making Inferences
Concerning β0 and β1 50
Effects of Departures from Normality 50
Interpretation of Confidence Coefficient
and Risks of Errors 50
Spacing of the X Levels 50
Power of Tests 50
2.4 Interval Estimation of E{Yh} 52
Sampling Distribution of Ŷh 52
Sampling Distribution of (Ŷh - E{Yh})/s{Ŷh} 54
Confidence Interval for E{Yh} 54
2.5 Prediction of New Observation 55
Prediction Interval for Yh(new) when
Parameters Known 56
Prediction Interval for Yh(new) when
Parameters Unknown 57
Prediction of Mean of m New Observations
for Given Xh 60
2.6 Confidence Band for Regression Line 61
2.7 Analysis of Variance Approach
to Regression Analysis 63
Partitioning of Total Sum of Squares 63
Breakdown of Degrees of Freedom 66
Mean Squares 66
Analysis of Variance Table 67
Expected Mean Squares 68
F Test of β1 = 0 versus β1 ≠ 0 69
2.8 General Linear Test Approach 72
Full Model 72
Reduced Model 72
Test Statistic 73
Summary 73
2.9 Descriptive Measures of Linear Association
between X and Y 74
Coefficient of Determination 74
Limitations of R² 75
Coefficient of Correlation 76
2.10 Considerations in Applying Regression
Analysis 77
2.11 Normal Correlation Models 78
Distinction between Regression and
Correlation Model 78
Bivariate Normal Distribution 78
Conditional Inferences 80
Inferences on Correlation Coefficients 83
Spearman Rank Correlation Coefficient 87
Cited References 89
Problems 89
Exercises 97
Projects 98
Chapter 3
Diagnostics and Remedial Measures 100
3.1 Diagnostics for Predictor Variable 100
3.2 Residuals 102
Properties of Residuals 102
Semistudentized Residuals 103
Departures from Model to Be Studied by
Residuals 103
3.3 Diagnostics for Residuals 103
Nonlinearity of Regression Function 104
Nonconstancy of Error Variance 107
Presence of Outliers 108
Nonindependence of Error Terms 108
Nonnormality of Error Terms 110
Omission of Important Predictor
Variables 112
Some Final Comments 114
3.4 Overview of Tests Involving
Residuals 114
Tests for Randomness 114
Tests for Constancy of Variance 115
Tests for Outliers 115
Tests for Normality 115
3.5 Correlation Test for Normality 115
3.6 Tests for Constancy of Error
Variance 116
Brown-Forsythe Test 116
Breusch-Pagan Test 118
3.7 F Test for Lack of Fit 119
Assumptions 119
Notation 121
Full Model 121
Reduced Model 123
Test Statistic 123
ANOVA Table 124
3.8 Overview of Remedial Measures 127
Nonlinearity of Regression
Function 128
Nonconstancy of Error Variance 128
Nonindependence of Error Terms 128
Nonnormality of Error Terms 128
Omission of Important Predictor
Variables 129
Outlying Observations 129
3.9 Transformations 129
Transformations for Nonlinear
Relation Only 129
Transformations for Nonnormality
and Unequal Error Variances 132
Box-Cox Transformations 134
3.10 Exploration of Shape of Regression
Function 137
Lowess Method 138
Use of Smoothed Curves to Confirm Fitted
Regression Function 139
3.11 Case Example: Plutonium
Measurement 141
Cited References 146
Problems 146
Exercises 151
Projects 152
Case Studies 153
Chapter 4
Simultaneous Inferences and Other
Topics in Regression Analysis 154
4.1 Joint Estimation of β0 and β1 154
Need for Joint Estimation 154
Bonferroni Joint Confidence Intervals 155
4.2 Simultaneous Estimation of Mean
Responses 157
Working-Hotelling Procedure 158
Bonferroni Procedure 159
4.3 Simultaneous Prediction Intervals
for New Observations 160
4.4 Regression through Origin 161
Model 161
Inferences 161
Important Cautions for Using Regression
through Origin 164
4.5 Effects of Measurement Errors 165
Measurement Errors in Y 165
Measurement Errors in X 165
Berkson Model 167
4.6 Inverse Predictions 168
4.7 Choice of X Levels 170
Cited References 172
Problems 172
Exercises 175
Projects 175
Chapter 5
Matrix Approach to Simple
Linear Regression Analysis 176
5.1 Matrices 176
Definition of Matrix 176
Square Matrix 178
Vector 178
Transpose 178
Equality of Matrices 179
5.2 Matrix Addition and Subtraction 180
5.3 Matrix Multiplication 182
Multiplication of a Matrix by a Scalar 182
Multiplication of a Matrix by a Matrix 182
5.4 Special Types of Matrices 185
Symmetric Matrix 185
Diagonal Matrix 185
Vector and Matrix with All Elements
Unity 187
Zero Vector 187
5.5 Linear Dependence and Rank
of Matrix 188
Linear Dependence 188
Rank of Matrix 188
5.6 Inverse of a Matrix 189
Finding the Inverse 190
Uses of Inverse Matrix 192
5.7 Some Basic Results for Matrices 193
5.8 Random Vectors and Matrices 193
Expectation of Random Vector or Matrix 193
Variance-Covariance Matrix
of Random Vector 194
Some Basic Results 196
Multivariate Normal Distribution 196
5.9 Simple Linear Regression Model
in Matrix Terms 197
5.10 Least Squares Estimation
of Regression Parameters 199
Normal Equations 199
Estimated Regression Coefficients 200
5.11 Fitted Values and Residuals 202
Fitted Values 202
Residuals 203
5.12 Analysis of Variance Results 204
Sums of Squares 204
Sums of Squares as Quadratic
Forms 205
5.13 Inferences in Regression Analysis 206
Regression Coefficients 207
Mean Response 208
Prediction of New Observation 209
Cited Reference 209
Problems 209
Exercises 212
PART TWO
MULTIPLE LINEAR
REGRESSION 213
Chapter 6
Multiple Regression I 214
6.1 Multiple Regression Models 214
Need for Several Predictor Variables 214
First-Order Model with Two Predictor
Variables 215
First-Order Model with More than Two
Predictor Variables 217
General Linear Regression Model 217
6.2 General Linear Regression Model in Matrix
Terms 222
6.3 Estimation of Regression Coefficients 223
6.4 Fitted Values and Residuals 224
6.5 Analysis of Variance Results 225
Sums of Squares and Mean Squares 225
F Test for Regression Relation 226
Coefficient of Multiple Determination 226
Coefficient of Multiple Correlation 227
6.6 Inferences about Regression
Parameters 227
Interval Estimation of βk 228
Tests for βk 228
Joint Inferences 228
6.7 Estimation of Mean Response and
Prediction of New Observation 229
Interval Estimation of E{Yh} 229
Confidence Region for Regression
Surface 229
Simultaneous Confidence Intervals for Several
Mean Responses 230
Prediction of New Observation Yh(new) 230
Prediction of Mean of m New Observations
at Xh 230
Predictions of g New Observations 231
Caution about Hidden Extrapolations 231
6.8 Diagnostics and Remedial Measures 232
Scatter Plot Matrix 232
Three-Dimensional Scatter Plots 233
Residual Plots 233
Correlation Test for Normality 234
Brown-Forsythe Test for Constancy of Error
Variance 234
Breusch-Pagan Test for Constancy of Error
Variance 234
F Test for Lack of Fit 235
Remedial Measures 236
6.9 An Example: Multiple Regression with
Two Predictor Variables 236
Setting 236
Basic Calculations 237
Estimated Regression Function 240
Fitted Values and Residuals 241
Analysis of Appropriateness of Model 241
Analysis of Variance 243
Estimation of Regression Parameters 245
Estimation of Mean Response 245
Prediction Limits for New Observations 247
Cited Reference 248
Problems 248
Exercises 253
Projects 254
Chapter 7
Multiple Regression II 256
7.1 Extra Sums of Squares 256
Basic Ideas 256
Definitions 259
Decomposition of SSR into Extra Sums
of Squares 260
ANOVA Table Containing Decomposition
of SSR 261
7.2 Uses of Extra Sums of Squares in Tests for
Regression Coefficients 263
Test whether a Single βk = 0 263
Test whether Several βk = 0 264
7.3 Summary of Tests Concerning Regression
Coefficients 266
Test whether All βk = 0 266
Test whether a Single βk = 0 267
Test whether Some βk = 0 267
Other Tests 268
7.4 Coefficients of Partial Determination 268
Two Predictor Variables 269
General Case 269
Coefficients of Partial Correlation 270
7.5 Standardized Multiple Regression
Model 271
Roundoff Errors in Normal Equations
Calculations 271
Lack of Comparability in Regression
Coefficients 272
Correlation Transformation 272
Standardized Regression Model 273
X'X Matrix for Transformed Variables 274
Estimated Standardized Regression
Coefficients 275
7.6 Multicollinearity and Its Effects 278
Uncorrelated Predictor Variables 279
Nature of Problem when Predictor Variables
Are Perfectly Correlated 281
Effects of Multicollinearity 283
Need for More Powerful Diagnostics for
Multicollinearity 289
Cited Reference 289
Problems 289
Exercise 292
Projects 293
Chapter 8
Regression Models for Quantitative
and Qualitative Predictors 294
8.1 Polynomial Regression Models 294
Uses of Polynomial Models 294
One Predictor Variable: Second Order 295
One Predictor Variable: Third Order 296
One Predictor Variable: Higher Orders 296
Two Predictor Variables: Second Order 297
Three Predictor Variables: Second
Order 298
Implementation of Polynomial Regression
Models 298
Case Example 300
Some Further Comments on Polynomial
Regression 305
8.2 Interaction Regression Models 306
Interaction Effects 306
Interpretation of Interaction Regression
Models with Linear Effects 306
Interpretation of Interaction Regression
Models with Curvilinear Effects 309
Implementation of Interaction Regression
Models 311
8.3 Qualitative Predictors 313
Qualitative Predictor with Two
Classes 314
Interpretation of Regression Coefficients 315
Qualitative Predictor with More than Two
Classes 318
Time Series Applications 319
8.4 Some Considerations in Using Indicator
Variables 321
Indicator Variables versus Allocated
Codes 321
Indicator Variables versus Quantitative
Variables 322
Other Codings for Indicator Variables 323
8.5 Modeling Interactions between Quantitative
and Qualitative Predictors 324
Meaning of Regression Coefficients 324
8.6 More Complex Models 327
More than One Qualitative Predictor
Variable 328
Qualitative Predictor Variables Only 329
8.7 Comparison of Two or More Regression
Functions 329
Soap Production Lines Example 330
Instrument Calibration Study Example 334
Cited Reference 335
Problems 335
Exercises 340
Projects 341
Case Study 342
Chapter 9
Building the Regression Model I:
Model Selection and Validation 343
9.1 Overview of Model-Building Process 343
Data Collection 343
Data Preparation 346
Preliminary Model Investigation 346
Reduction of Explanatory Variables 347
Model Refinement and Selection 349
Model Validation 350
9.2 Surgical Unit Example 350
9.3 Criteria for Model Selection 353
R²p or SSEp Criterion 354
R²a,p or MSEp Criterion 355
Mallows' Cp Criterion 357
AICp and SBCp Criteria 359
PRESSp Criterion 360
9.4 Automatic Search Procedures for Model
Selection 361
Best Subsets Algorithm 361
Stepwise Regression Methods 364
Forward Stepwise Regression 364
Other Stepwise Procedures 367
9.5 Some Final Comments on Automatic
Model Selection Procedures 368
9.6 Model Validation 369
Collection of New Data to Check
Model 370
Comparison with Theory, Empirical
Evidence, or Simulation Results 371
Data Splitting 372
Cited References 375
Problems 376
Exercise 380
Projects 381
Case Studies 382
Chapter 10
Building the Regression Model II:
Diagnostics 384
10.1 Model Adequacy for a Predictor
Variable: Added-Variable Plots 384
10.2 Identifying Outlying Y Observations:
Studentized Deleted Residuals 390
Outlying Cases 390
Residuals and Semistudentized
Residuals 392
Hat Matrix 392
Studentized Residuals 394
Deleted Residuals 395
Studentized Deleted Residuals 396
10.3 Identifying Outlying X Observations: Hat
Matrix Leverage Values 398
Use of Hat Matrix for Identifying Outlying
X Observations 398
Use of Hat Matrix to Identify Hidden
Extrapolation 400
10.4 Identifying Influential Cases: DFFITS,
Cook's Distance, and DFBETAS
Measures 400
Influence on Single Fitted
Value: DFFITS 401
Influence on All Fitted Values: Cook's
Distance 402
Influence on the Regression
Coefficients: DFBETAS 404
Influence on Inferences 405
Some Final Comments 406
10.5 Multicollinearity Diagnostics: Variance
Inflation Factor 406
Informal Diagnostics 407
Variance Inflation Factor 408
10.6 Surgical Unit Example: Continued 410
Cited References 414
Problems 414
Exercises 419
Projects 419
Case Studies 420
Chapter 11
Building the Regression Model III:
Remedial Measures 421
11.1 Unequal Error Variances Remedial
Measures: Weighted Least Squares 421
Error Variances Known 422
Error Variances Known up to
Proportionality Constant 424
Error Variances Unknown 424
11.2 Multicollinearity Remedial
Measures: Ridge Regression 431
Some Remedial Measures 431
Ridge Regression 432
11.3 Remedial Measures for Influential
Cases: Robust Regression 437
Robust Regression 438
IRLS Robust Regression 439
11.4 Nonparametric Regression: Lowess
Method and Regression Trees 449
Lowess Method 449
Regression Trees 453
11.5 Remedial Measures for Evaluating
Precision in Nonstandard
Situations: Bootstrapping 458
General Procedure 459
Bootstrap Sampling 459
Bootstrap Confidence Intervals 460
11.6 Case Example: MNDOT Traffic
Estimation 464
The AADT Database 464
Model Development 465
Weighted Least Squares Estimation 468
Cited References 471
Problems 472
Exercises 476
Projects 476
Case Studies 480
Chapter 12
Autocorrelation in Time
Series Data 481
12.1 Problems of Autocorrelation 481
12.2 First-Order Autoregressive Error
Model 484
Simple Linear Regression 484
Multiple Regression 484
Properties of Error Terms 485
12.3 Durbin-Watson Test for
Autocorrelation 487
12.4 Remedial Measures for
Autocorrelation 490
Addition of Predictor Variables 490
Use of Transformed Variables 490
Cochrane-Orcutt Procedure 492
Hildreth-Lu Procedure 495
First Differences Procedure 496
Comparison of Three Methods 498
12.5 Forecasting with Autocorrelated Error
Terms 499
Cited References 502
Problems 502
Exercises 507
Projects 508
Case Studies 508
PART THREE
NONLINEAR REGRESSION 509
Chapter 13
Introduction to Nonlinear Regression
and Neural Networks 510
13.1 Linear and Nonlinear Regression
Models 510
Linear Regression Models 510
Nonlinear Regression Models 511
Estimation of Regression Parameters 514
13.2 Least Squares Estimation in Nonlinear
Regression 515
Solution of Normal Equations 517
Direct Numerical Search: Gauss-Newton
Method 518
Other Direct Search Procedures 525
13.3 Model Building and Diagnostics 526
13.4 Inferences about Nonlinear Regression
Parameters 527
Estimate of Error Term Variance 527
Large-Sample Theory 528
When Is Large-Sample Theory
Applicable? 528
Interval Estimation of a Single γk 531
Simultaneous Interval Estimation
of Several γk 532
Test Concerning a Single γk 532
Test Concerning Several γk 533
13.5 Learning Curve Example 533
13.6 Introduction to Neural Network
Modeling 537
Neural Network Model 537
Network Representation 540
Neural Network as Generalization of Linear
Regression 541
Parameter Estimation: Penalized Least
Squares 542
Example: Ischemic Heart Disease 543
Model Interpretation and
Prediction 546
Some Final Comments on Neural Network
Modeling 547
Cited References 547
Problems 548
Exercises 552
Projects 552
Case Studies 554
Chapter 14
Logistic Regression, Poisson Regression,
and Generalized Linear Models 555
14.1 Regression Models with Binary Response
Variable 555
Meaning of Response Function when
Outcome Variable Is Binary 556
Special Problems when Response Variable
Is Binary 557
14.2 Sigmoidal Response Functions
for Binary Responses 559
Probit Mean Response Function 559
Logistic Mean Response Function 560
Complementary Log-Log Response
Function 562
14.3 Simple Logistic Regression 563
Simple Logistic Regression Model 563
Likelihood Function 564
Maximum Likelihood Estimation 564
Interpretation of b1 567
Use of Probit and Complementary Log-Log
Response Functions 568
Repeat Observations: Binomial
Outcomes 568
14.4 Multiple Logistic Regression 570
Multiple Logistic Regression Model 570
Fitting of Model 571
Polynomial Logistic Regression 575
14.5 Inferences about Regression
Parameters 577
Test Concerning a Single βk: Wald
Test 578
Interval Estimation of a Single βk 579
Test whether Several βk = 0: Likelihood
Ratio Test 580
14.6 Automatic Model Selection
Methods 582
Model Selection Criteria 582
Best Subsets Procedures 583
Stepwise Model Selection 583
14.7 Tests for Goodness of Fit 586
Pearson Chi-Square Goodness
of Fit Test 586
Deviance Goodness of Fit Test 588
Hosmer-Lemeshow Goodness
of Fit Test 589
14.8 Logistic Regression Diagnostics 591
Logistic Regression Residuals 591
Diagnostic Residual Plots 594
Detection of Influential
Observations 598
14.9 Inferences about
Mean Response 602
Point Estimator 602
Interval Estimation 602
Simultaneous Confidence Intervals for
Several Mean Responses 603
14.10 Prediction of a New Observation 604
Choice of Prediction Rule 604
Validation of Prediction Error Rate 607
14.11 Polytomous Logistic Regression for
Nominal Response 608
Pregnancy Duration Data
with Polytomous Response 609
J - 1 Baseline-Category Logits for
Nominal Response 610
Maximum Likelihood Estimation 612
14.12 Polytomous Logistic Regression
for Ordinal Response 614
14.13 Poisson Regression 618
Poisson Distribution 618
Poisson Regression Model 619
Maximum Likelihood Estimation 620
Model Development 620
Inferences 621
14.14 Generalized Linear Models 623
Cited References 624
Problems 625
Exercises 634
Projects 635
Case Studies 640
Appendix A
Some Basic Results in Probability
and Statistics
Appendix B
Tables
Appendix C
Data Sets
Appendix D
Selected Bibliography
Index