Multi-collinearity, Variance Inflation
and Orthogonalization in Regression

Chong Ho (Alex) Yu, Ph.D., D. Phil. (2022)

Introduction
The purpose of a regression model is to find out to what extent the outcome (dependent variable) can be predicted by the independent variables. The strength of the prediction is indicated by R², also known as variance explained or the coefficient of determination.
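As a concrete illustration of what R² measures, the following minimal sketch fits a one-predictor least-squares line and computes R² as 1 minus the ratio of residual to total sum of squares. The function name `r_squared_simple`, the simulated "hours" and "score" data, and the coefficient 0.9 are assumptions made for illustration only; they are not taken from this tutorial.

```python
import random


def r_squared_simple(x, y):
    """R^2 of a one-predictor least-squares fit: 1 - SS_residual / SS_total."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variance left unexplained by the fit versus total variance in y
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: test score depends partly on hours of study
random.seed(0)
hours = [random.gauss(0, 1) for _ in range(500)]
score = [0.9 * h + random.gauss(0, 1) for h in hours]

r2 = r_squared_simple(hours, score)
print(f"R^2 = {r2:.2f}")
```

An R² near 1 means the predictor accounts for almost all of the variance in the outcome; an R² near 0 means it accounts for almost none.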

It is important to note that the value of R² alone cannot tell you how well your model is specified. Take the following four cases as examples. In the Venn diagrams below, the overlapping area between Y and X (X₁, X₂) is the variance explained. In all four cases the overlapping areas between Y and X are almost the same, and numerically you cannot tell much difference when the R²s are .45, .48, .41, and .40. Yet these models are very different.

[Figure: Venn diagrams of the four cases, showing the overlap among Y, X₁, and X₂]

In case 1, X₁ and X₂ are related and X₁ and Y are related, but X₂ and Y have no relationship. For example, the number of hours of study is related to test scores, and the frequency of going to the restroom is related to study time (you drink more coffee to stay up), but going to the restroom is not related to test performance.

In case 2, both X₁ and X₂ contribute some unique variance explained in Y, but they also share some common variance explained. For example, drinking and smoking can both cause cancer, and many smokers are also alcoholics.

In case 3, again both X₁ and X₂ contribute unique variance explained in Y, but X₁ and X₂ are totally unrelated (orthogonal). For instance, mathematical intelligence and verbal intelligence could both predict competence in business, but these two types of intelligence have no relationship. A good speaker may not be able to count from one to ten.

In case 4, although both X₁ and X₂ could predict Y, the variance explained by X₂ is already covered by X₁ because X₁ and X₂ are too highly correlated (collinear).

The above cases are not exhaustive. There are many other possible combinations between Y and the Xs. Without looking at the relationship between regressors, the researcher runs the risk of mis-specifying a regression model even though the R² looks good. This tutorial focuses on the last case: collinearity.
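The contrast between case 3 and case 4 can be sketched with simulated data. The example below is a rough illustration under stated assumptions, not the author's procedure: the sample size, the coefficients, and the helper names `pearson`, `r2_two_predictors`, and `summarize` are all hypothetical. It uses the standard two-predictor identity R² = (r²ᵧ₁ + r²ᵧ₂ − 2·rᵧ₁·rᵧ₂·r₁₂) / (1 − r²₁₂), and it measures each regressor's unique contribution as the drop in R² when that regressor is removed.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def r2_two_predictors(y, x1, x2):
    """R^2 of y regressed on x1 and x2, from the three pairwise correlations."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


def summarize(y, x1, x2):
    """Overall R^2, regressor correlation, and each regressor's unique share."""
    full = r2_two_predictors(y, x1, x2)
    return {
        "r12": pearson(x1, x2),
        "R2": full,
        "unique_x1": full - pearson(y, x2) ** 2,  # R^2 lost if x1 is dropped
        "unique_x2": full - pearson(y, x1) ** 2,  # R^2 lost if x2 is dropped
    }


def gauss_list(n):
    """n independent standard-normal draws."""
    return [random.gauss(0, 1) for _ in range(n)]


random.seed(1)
n = 2000  # hypothetical sample size

# Case 3 (assumed setup): x1 and x2 are unrelated and both predict y
x1, x2 = gauss_list(n), gauss_list(n)
y = [0.6 * a + 0.6 * b + e for a, b, e in zip(x1, x2, gauss_list(n))]
case3 = summarize(y, x1, x2)

# Case 4 (assumed setup): x2 is almost a copy of x1, so it adds little of its own
x1c = gauss_list(n)
x2c = [a + 0.1 * e for a, e in zip(x1c, gauss_list(n))]
yc = [0.85 * a + e for a, e in zip(x1c, gauss_list(n))]
case4 = summarize(yc, x1c, x2c)

for name, result in (("case 3", case3), ("case 4", case4)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these assumed coefficients the two overall R² values should come out close to each other, while the correlation between regressors and the unique contribution of X₂ differ sharply. That is the information R² alone does not reveal.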
