236 The Basics of Financial Econometrics
Assumptions of Linear Regression
First recall from Chapter 3 that all variables in a linear regression are
observable. In general, the explanatory variables, denoted by $x_i$, are
assumed to be either deterministic or random variables, while the dependent
variable, denoted by $y$, is assumed to be a random variable. This implies
that the observed values of the explanatory variables $x_i$ are assumed to be
the true values of the variables, while the dependent variable $y$ is only
known with some error.
Note that assuming that regressors are random variables is not the same
as assuming that there are errors in the regressors. Assuming random
regressors simply means assuming that the sample data for the regressors are
not fixed but can vary according to a given distribution. This fact leaves
estimation techniques unchanged but affects the accuracy of estimates.
Regression models with errors in the regressors cannot be handled with the
same estimation techniques as models without errors.
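The distinction can be illustrated with a small simulation. The sketch below is not from the text: the sample size, the true slope of 2, the unit error variances, and the helper `ols_slope` are all assumptions chosen for illustration. It fits a one-regressor least squares line twice, once with a random regressor observed exactly and once with the same regressor observed with measurement error.

```python
import random


def ols_slope(x, y):
    """Slope of a one-regressor least squares fit with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the run is reproducible
S = 20000                # sample size (assumed)
true_b = 2.0             # true slope (assumed)

# Random regressor, observed exactly; y is known only with error.
x_true = [rng.gauss(0.0, 1.0) for _ in range(S)]
y = [1.0 + true_b * xi + rng.gauss(0.0, 1.0) for xi in x_true]

# The same regressor observed with measurement error of variance 1 (assumed).
x_noisy = [xi + rng.gauss(0.0, 1.0) for xi in x_true]

slope_exact = ols_slope(x_true, y)
slope_noisy = ols_slope(x_noisy, y)
print(f"slope, regressor observed exactly:    {slope_exact:.3f}")
print(f"slope, regressor observed with error: {slope_noisy:.3f}")
```

Under these assumptions the first slope should land near the true value of 2, while the second is pulled toward zero (roughly half the true value here, because the measurement-error variance equals the regressor variance). This is the sense in which errors in the regressors call for different estimation techniques, whereas randomness of the regressors alone does not.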
A fundamental assumption of linear regression analysis is that the
residuals, denoted by $\varepsilon$, carry no information: residuals are
assumed to be independent draws from the same distribution. A critical
assumption is that residuals and regressors are uncorrelated. Under this
assumption, the same methods can be used to estimate regressions with
deterministic regressors and regressions with random regressors.
The estimation framework for linear regression assumes that we have
a sample formed by $S$ observations $y_s$, $s = 1, \ldots, S$, of the scalar
dependent variable $y$ and by $S$ observations $[x_{1i}, \ldots, x_{pi}]'$,
$i = 1, \ldots, S$, of the $p$-vector of explanatory variables
$[x_1, \ldots, x_p]'$. The regression equation is:
$$y = a + b_1 x_1 + \cdots + b_p x_p + \varepsilon$$
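As a minimal sketch of how such an equation might be estimated from a sample, the code below applies ordinary least squares through the normal equations, using only the Python standard library. The data are simulated, and the helper names (`solve_linear_system`, `ols`), the sample size, and the "true" coefficients are assumptions made for illustration, not part of the text.

```python
import random


def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def ols(X, y):
    """Return [a, b_1, ..., b_p] minimizing the sum of squared residuals."""
    Z = [[1.0] + list(row) for row in X]  # constant column for the intercept a
    k = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(r[i] * ys for r, ys in zip(Z, y)) for i in range(k)]
    return solve_linear_system(ZtZ, Zty)


rng = random.Random(42)            # fixed seed for reproducibility
S = 5000                           # number of observations (assumed)
a_true, b_true = 0.5, [1.5, -2.0]  # assumed coefficients, p = 2

# Simulated regressors and dependent variable: y = a + b_1 x_1 + b_2 x_2 + eps
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(S)]
y = [a_true + sum(bj * xj for bj, xj in zip(b_true, row)) + rng.gauss(0.0, 1.0)
     for row in X]

coef = ols(X, y)
residuals = [ys - coef[0] - sum(c * xj for c, xj in zip(coef[1:], row))
             for row, ys in zip(X, y)]
print("estimated [a, b_1, b_2]:", [round(c, 3) for c in coef])
```

By construction, the fitted residuals of a least squares regression with an intercept sum to zero and are orthogonal to each regressor in the sample. That is an in-sample property of the estimator and should not be confused with the assumption that the true residuals and regressors are uncorrelated.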
Linear regression does not place constraints on the number of observations
that are required. Estimation follows the usual rules of statistics according
to which we make better estimates if we have larger samples. As we will see
later in this chapter, some factor models can be defined only in the limit of
infinite samples.
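The remark that larger samples give better estimates can be checked with a small Monte Carlo experiment. The sketch below is an assumed setup, not taken from the text: a one-regressor model with true slope 1, standard normal regressors and residuals, and 300 replications per sample size. It records how much the estimated slope varies from sample to sample.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a one-regressor least squares fit with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_spread(S, reps, rng, b=1.0):
    """Standard deviation of the estimated slope over `reps` samples of size S."""
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(S)]
        y = [b * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)


rng = random.Random(1)  # fixed seed for reproducibility
spread = {S: slope_spread(S, 300, rng) for S in (25, 100, 400)}
for S, sd in spread.items():
    print(f"S = {S:4d}: std. dev. of estimated slope = {sd:.3f}")
```

In this setup the spread of the estimates should shrink roughly in proportion to the inverse square root of S, so quadrupling the sample size about halves it.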
We can look at a regression equation as a static relationship between
variables without any temporal ordering involved. For example, suppose we
want to investigate whether there is a relationship between the earnings of
a firm as the dependent variable, and its level of capitalization and debt as
independent variables. We collect observations of earnings, capitalization,
and debt of N firms and estimate a regression. There is no temporal order
involved in our observations.
However, in many financial applications there is a temporal ordering
of observations. Suppose we have a sample formed by T observations