Multi-collinearity, Variance Inflation and Orthogonalization in Regression
Chong Ho (Alex) Yu, Ph.D., D. Phil. (2022)
Objectives
This tutorial addresses the following
common confusions and misconceptions:
- In regression we are looking for relationships. Yet statistics
learners are often puzzled by the notion that strong relationships
among predictors are detrimental when the purpose of the regression
model is explanation rather than prediction.
To clarify this confusion, the concepts of multi-collinearity
and the variance inflation factor (VIF) will be explained in
both variable space and subject space.
In both spaces, physical objects supporting a structure will be used
as an analogy for predictors supporting a regression model.
- The problem of multi-collinearity is often caused
by including too many regressors in a regression model. It is a common
misconception that stepwise regression enables a
researcher to select a subset of variables based upon their relative
"importance." In fact, if the variables are correlated, the "importance"
of each variable is tied to the order in which it enters the model.
Other variable selection criteria, such as maximum R-square, root mean
square error, and Mallows' Cp, are recommended instead.
- Another confusion is the distinction between mathematical
dependence and logical dependence. In a
regression model involving interaction terms, the
interaction variable is highly correlated with the other independent
variables. However, this multi-collinearity does not invalidate the
regression model, because the interaction term is only mathematically
dependent, not logically dependent, on the other predictors. Again, the
metaphor of supporting objects will be used to illustrate this
difference.
- A polynomial regression
presents a similar confusion. In a polynomial regression, the quadratic
term (X²), the cubic term (X³),
or the quartic term (X⁴) is typically highly correlated
with the original variable (X). With such a high degree of collinearity,
how can a researcher apply a legitimate polynomial regression? This
tutorial will address this problem.
- Ridge regression, orthogonalization, and centering
of scores can counteract the threat of collinearity. However,
many students do not understand how these methods are related to
multi-collinearity. In this tutorial, vectors in subject space are used
to clarify these concepts.
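As a minimal sketch of the variance inflation factor named in the first objective, the Python code below computes VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square from regressing predictor j on the remaining predictors. The helper name `vif` and the simulated data are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X (n rows x p predictors)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated to both
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs.round(1))  # x1 and x2 are heavily inflated; x3 stays near 1
```

A common rule of thumb treats VIF values above 10 as a warning sign, though that cutoff is a convention rather than a formal test.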
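The point about stepwise selection in the second objective can be illustrated with a small simulation, assuming two correlated predictors that contribute equally to y. This is not a full stepwise procedure; it only compares the R-square gain each predictor receives depending on whether it enters the model first or second. The helper `r_squared` and the data are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R-square of an OLS fit of y on an intercept plus the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Hypothetical data: corr(x1, x2) is about 0.9 and both have the same true effect
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

full = r_squared(np.column_stack([x1, x2]), y)
# R-square gain each predictor receives when it enters first versus second
x1_first, x1_second = r_squared(x1, y), full - r_squared(x2, y)
x2_first, x2_second = r_squared(x2, y), full - r_squared(x1, y)
print(f"x1: {x1_first:.3f} if entered first, {x1_second:.3f} if entered second")
print(f"x2: {x2_first:.3f} if entered first, {x2_second:.3f} if entered second")
```

Whichever predictor enters first absorbs most of the shared variance, so a ranking based on selection order says more about the order than about the variables themselves.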
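For the interaction and polynomial objectives, a short sketch shows that much of the correlation between X and X², or between X1 and X1*X2, comes from the location of the raw scores rather than from any logical dependence. The uniform distributions on [10, 20] are an arbitrary, hypothetical choice; centering here simply subtracts each variable's sample mean before the product terms are formed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Hypothetical predictors whose scores sit far from zero
x1 = rng.uniform(10, 20, size=n)
x2 = rng.uniform(10, 20, size=n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Raw scores: the product terms track the original variable closely
raw_poly = corr(x1, x1**2)
raw_inter = corr(x1, x1 * x2)

# Centered scores: subtract the sample mean, then form the product terms
c1, c2 = x1 - x1.mean(), x2 - x2.mean()
cen_poly = corr(c1, c1**2)
cen_inter = corr(c1, c1 * c2)

print(f"X vs X^2:    raw {raw_poly:.3f}, centered {cen_poly:.3f}")
print(f"X1 vs X1*X2: raw {raw_inter:.3f}, centered {cen_inter:.3f}")
```

When all lower-order terms stay in the model, centering does not change the fitted values; it only re-expresses the lower-order coefficients as effects at the mean rather than at zero.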
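Orthogonalization, mentioned in the last objective, can be sketched with a QR decomposition, which performs a numerically stable Gram-Schmidt orthogonalization of the polynomial design matrix. The cubic data-generating function below is hypothetical. The orthogonalized terms are mutually uncorrelated, yet they span the same column space as the raw terms, so the fitted values do not change.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 10, size=n)
# Hypothetical cubic relationship with unit-variance noise
y = 2 + 0.5 * x - 0.3 * x**2 + 0.02 * x**3 + rng.normal(size=n)

# Raw polynomial design matrix: intercept, X, X^2, X^3
raw = np.column_stack([np.ones(n), x, x**2, x**3])
# Reduced QR: Q has orthonormal columns spanning the same space as raw;
# its first column is proportional to the intercept, so the others have zero mean
Q, _ = np.linalg.qr(raw)

raw_corr = np.corrcoef(raw[:, 1:], rowvar=False)   # correlations among X, X^2, X^3
orth_corr = np.corrcoef(Q[:, 1:], rowvar=False)    # correlations among orthogonal terms
print(raw_corr.round(3))
print(orth_corr.round(3))

# Same column space, hence identical least-squares fitted values
fit_raw = raw @ np.linalg.lstsq(raw, y, rcond=None)[0]
fit_orth = Q @ (Q.T @ y)
print(np.allclose(fit_raw, fit_orth))
```

The trade-off is interpretability: each orthogonal term is a combination of the raw powers, so its coefficient no longer refers to X² or X³ directly.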
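Ridge regression, the remaining remedy in the last objective, can be sketched with its closed form b = (X'X + λI)^(-1) X'y, computed here on centered predictors so that the intercept is not penalized. The simulation is hypothetical, and λ = 10 is an arbitrary illustrative value rather than a recommended setting; in practice λ is usually chosen by cross-validation or a ridge trace, and predictors are usually standardized first.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge coefficients on centered data; the intercept is left unpenalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


rng = np.random.default_rng(4)
n, reps = 50, 500
ols_b, ridge_b = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)   # near-perfect collinearity
    y = x1 + x2 + rng.normal(size=n)      # true coefficients: 1 and 1
    X = np.column_stack([x1, x2])
    ols_b.append(ridge(X, y, 0.0))        # lam = 0 reduces to ordinary least squares
    ridge_b.append(ridge(X, y, 10.0))     # lam = 10 is an arbitrary illustrative choice

ols_sd = np.std(ols_b, axis=0)
ridge_sd = np.std(ridge_b, axis=0)
print("OLS coefficient SD:  ", ols_sd.round(2))
print("Ridge coefficient SD:", ridge_sd.round(2))
```

The price of this stability is bias: ridge shrinks the coefficients toward zero, so their average across replications falls somewhat below the true value of 1.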