Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.^[1] It has been used in many fields including econometrics, chemistry, and engineering.^[2]

The theory was first introduced by Hoerl and Kennard in 1970 in their Technometrics papers “Ridge Regression: Biased Estimation for Nonorthogonal Problems” and “Ridge Regression: Applications to Nonorthogonal Problems”.^[3]^[4]^[1] This was the result of ten years of research into the field of ridge analysis.^[5]

Ridge regression was developed as a possible solution to the imprecision of least squares estimators when linear regression models have some multicollinear (highly correlated) independent variables. The ridge regression (RR) estimator provides more precise estimates of the regression parameters, as its variance and mean squared error are often smaller than those of the least squares estimators previously derived.^[6]^[2]

Mathematical details

In standard linear regression, an ${\textstyle n\times 1}$ column vector ${\textstyle y}$ is to be projected onto the column space of the ${\textstyle n\times p}$ design matrix ${\textstyle X}$ (typically ${\textstyle p\ll n}$) whose columns are highly correlated. The ordinary least squares estimator of the coefficients ${\textstyle \beta \in \mathbb {R} ^{p\times 1}}$ by which the columns are multiplied to get the orthogonal projection ${\textstyle X\beta }$ is

$${\widehat {\beta }}=(X^{T}X)^{-1}X^{T}y$$

(where ${\textstyle X^{T}}$ is the transpose of ${\textstyle X}$).
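As an illustration, the ordinary least squares formula can be sketched in NumPy. This is a minimal sketch, not a reference implementation: the function name `ols_normal_equations`, the random seed, and the toy data with two nearly collinear columns are all assumptions made for the example. The linear system is solved directly rather than forming the inverse, and the condition number of ${\textstyle X^{T}X}$ is computed to show how close the matrix is to singular.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve (X^T X) beta = X^T y without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: n = 50 observations, p = 2 nearly collinear columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-3 * rng.normal(size=50)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

beta_ols = ols_normal_equations(X, y)

# A very large condition number signals that X^T X is close to singular.
cond_xtx = np.linalg.cond(X.T @ X)
print("OLS coefficients:", beta_ols)
print("cond(X^T X):", cond_xtx)
```

With columns this strongly correlated, the individual coefficients tend to be poorly determined even though their sum is estimated well, which is the imprecision that motivates the ridge estimator.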

In situations where the independent variables of the regression problem (columns of $X$) are highly correlated, the matrix ${\textstyle X^{T}X}$ is nearly singular, so the inverse above is numerically unstable and difficult to compute accurately (see Multicollinearity). Ridge regression can then be used, in which the regression coefficients are computed using the alternative formula:

$${\widehat {\beta }}_{\text{ridge}}=(X^{T}X+kI_{p})^{-1}X^{T}y$$

where ${\textstyle I_{p}}$ is the ${\textstyle p\times p}$ identity matrix and ${\textstyle k>0}$ is small. The name 'ridge' refers to the shape along the diagonal of ${\textstyle I_{p}}$.
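The ridge formula can be sketched in the same way. The code below is a hedged illustration only: the function name `ridge_estimator`, the value `k = 0.1`, the seed, and the toy data are assumptions chosen for the example, not part of the original method description. It compares the ridge solution with ordinary least squares on nearly collinear columns and checks how adding ${\textstyle kI_{p}}$ changes the conditioning of the matrix being inverted.

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Return (X^T X + k I_p)^{-1} X^T y for a ridge parameter k > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical toy data: two nearly collinear columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-3 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

k = 0.1  # illustrative choice; in practice k has to be selected for the data
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = ridge_estimator(X, y, k)

# Adding k to the diagonal raises the smallest eigenvalue of the matrix,
# which lowers its condition number.
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + k * np.eye(X.shape[1]))

print("OLS:  ", beta_ols, "cond:", cond_ols)
print("Ridge:", beta_ridge, "cond:", cond_ridge)
```

For any ${\textstyle k>0}$ the ridge coefficient vector has a smaller Euclidean norm than the least squares one whenever ${\textstyle X^{T}X}$ is invertible, which is the shrinkage that trades a small bias for reduced variance.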

References

[1] Hilt, Donald E.; Seegrist, Donald W. (1977). Ridge, a computer program for calculating ridge regression estimates. doi:10.5962/bhl.title.68934.
[2] Gruber, Marvin (1998). Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. CRC Press. p. 2. ISBN 978-0-8247-0156-7.
[3] Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems". Technometrics. 12 (1): 55–67. doi:10.2307/1267351. JSTOR 1267351.
[4] Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge Regression: Applications to Nonorthogonal Problems". Technometrics. 12 (1): 69–82. doi:10.2307/1267352. JSTOR 1267352.
[5] Beck, James Vere; Arnold, Kenneth J. (1977). Parameter Estimation in Engineering and Science. James Beck. p. 287. ISBN 978-0-471-06118-2.
[6] Jolliffe, I. T. (2006). Principal Component Analysis. Springer Science & Business Media. p. 178. ISBN 978-0-387-22440-4.
