84 The Basics of Financial Econometrics
jth independent variable. Therefore, we need to regress the jth variable on the
remaining k − 1 variables. The resulting regression would look like
$$x_j = c + b_1^{(j)} x_1 + \cdots + b_{j-1}^{(j)} x_{j-1} + b_{j+1}^{(j)} x_{j+1} + \cdots + b_k^{(j)} x_k, \qquad j = 1, 2, \ldots, k$$
Then we obtain the coefficient of determination of this regression, $R_j^2$.
This, again, is used to correct the original variance of the jth regression
coefficient estimate: the variance is divided by $1 - R_j^2$, which is the same
as multiplying it by a correction term. This correction term is called the
variance inflation factor (VIF) and is expressed as
$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \tag{4.2}$$
So, if there is no correlation present between independent variable j and
the other independent variables, then $R_j^2 = 0$ and the VIF equals 1: the
variance of $b_j$ remains the same and the t-test results are unchanged. On the
contrary, in the case of more intense correlation, the variance increases and
the t-test will most likely fail to find variable $x_j$ significant for the
overall regression.
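The auxiliary regression and the resulting VIF can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names (`solve_linear_system`, `r_squared`, `vif`) and the three small data columns are hypothetical, and ordinary least squares is solved through the normal equations only to keep the example free of external libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 of an OLS regression of y on a constant and the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(design[i][r] * design[i][c] for i in range(n))
            for c in range(p)] for r in range(p)]
    xty = [sum(design[i][r] * y[i] for i in range(n)) for r in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns, j):
    """VIF of column j: regress it on the remaining columns, then 1 / (1 - R_j^2)."""
    others = [col for i, col in enumerate(columns) if i != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


# Hypothetical data: x2 is orthogonal to x1, x3 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [2.0, -1.0, -2.0, -1.0, 2.0]
x3 = [-2.1, -0.9, 0.1, 0.9, 2.0]

vif_uncorrelated = vif([x1, x2], 0)  # no correlation, so the VIF is 1
vif_collinear = vif([x1, x3], 0)     # strong correlation, so the VIF is large
print(vif_uncorrelated, vif_collinear)
```

With the uncorrelated pair the variance of the coefficient estimate is left unchanged, while the nearly collinear pair inflates it many times over.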
Consequently, the estimate of the jth regression coefficient becomes less
precise since its confidence interval widens as a result of equation (4.2).^2 The
confidence interval for the regression coefficient at the level α is given by
$$\left[\, b_j - t_{\alpha/2} \cdot s_{b_j},\;\; b_j + t_{\alpha/2} \cdot s_{b_j} \,\right], \tag{4.3}$$
where $t_{\alpha/2}$ is the critical value at level α of the t-distribution with n − k
degrees of freedom. This means that with probability 1 − α, the true
coefficient lies inside this interval.^3 Naturally, any VIF greater than 1
widens the confidence interval given by equation (4.3).
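The widening can be made concrete with a small numerical sketch. Because the variance of the coefficient estimate is multiplied by the VIF, its standard error is multiplied by the square root of the VIF. All numbers below are hypothetical: a coefficient estimate of 0.8, a standard error of 0.2 in the absence of correlation, an auxiliary $R_j^2$ of 0.75, and the tabulated two-sided 5% critical value of the t-distribution with 20 degrees of freedom (2.086), which is passed in as a given rather than computed.

```python
import math


def confidence_interval(b_j, s_bj, t_crit):
    """Interval [b_j - t * s_bj, b_j + t * s_bj] as in equation (4.3)."""
    half_width = t_crit * s_bj
    return (b_j - half_width, b_j + half_width)


def inflated_standard_error(s_bj_uncorrelated, r2_j):
    """Scale the standard error by sqrt(VIF), with VIF = 1 / (1 - R_j^2)."""
    vif = 1.0 / (1.0 - r2_j)
    return s_bj_uncorrelated * math.sqrt(vif)


# Hypothetical inputs (not taken from the text).
b_j = 0.8        # estimated coefficient
s_bj = 0.2       # standard error without correlation among regressors
t_crit = 2.086   # t critical value, 20 degrees of freedom, alpha = 0.05
r2_j = 0.75      # R^2 of the auxiliary regression, so VIF = 4

baseline = confidence_interval(b_j, s_bj, t_crit)
inflated = confidence_interval(b_j, inflated_standard_error(s_bj, r2_j), t_crit)
print("without correlation:", baseline)
print("with VIF = 4:       ", inflated)
```

In this example the interval doubles in width and now contains zero, so a coefficient that appeared significant no longer does.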
As a rule of thumb, a benchmark for the VIF is often given as 10. A VIF
that exceeds 10 indicates a severe impact due to multicollinearity and the
independent variable is best removed from the regression.
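As a brief sketch of how the rule of thumb might be applied, the snippet below converts auxiliary-regression $R_j^2$ values into VIFs and flags those above the benchmark of 10, which corresponds to an $R_j^2$ above 0.9. The function name `flag_high_vif`, the variable names, and the $R_j^2$ values are hypothetical.

```python
VIF_THRESHOLD = 10.0  # rule-of-thumb benchmark from the text


def flag_high_vif(r2_by_variable, threshold=VIF_THRESHOLD):
    """Return {name: VIF} for variables whose VIF exceeds the threshold."""
    flagged = {}
    for name, r2 in r2_by_variable.items():
        vif = 1.0 / (1.0 - r2)
        if vif > threshold:
            flagged[name] = vif
    return flagged


# Hypothetical R_j^2 values from the k auxiliary regressions.
aux_r2 = {"x1": 0.50, "x2": 0.95, "x3": 0.20}
flagged = flag_high_vif(aux_r2)
print(flagged)  # only x2 exceeds the benchmark
```

Flagged variables are candidates for removal; the benchmark itself remains a convention rather than a formal test.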
Model Building Techniques
We now turn our attention to the model building process in the sense that
we attempt to find the independent variables that best explain the variation
in the dependent variable y. At the outset, we do not know how many and
^2 The confidence level is often chosen as 1 − α = 0.99 or 1 − α = 0.95 such that the
parameter is inside of the interval with 0.99 or 0.95 probability, respectively.
^3 This is based on the assumptions stated in the context of estimation.