Robust Regressions 159
\[
\theta(\beta_0, \beta_1, \ldots, \beta_N) = \sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left( Y_t - \sum_{j=0}^{N} \beta_j X_{tj} \right)^2 \tag{8.2}
\]
or, equivalently, by setting their derivatives equal to zero, which implies solving the
system of N + 1 equations,
\[
\frac{\partial \theta}{\partial \beta_k} = -2 \sum_{t=1}^{T} \left( Y_t - \sum_{j=0}^{N} \beta_j X_{tj} \right) X_{tk} = 0, \qquad k = 0, 1, \ldots, N \tag{8.3}
\]
or, in matrix notation, X′Xβ = X′Y. The solution of this system is
\[
\hat{\beta} = (X'X)^{-1} X'Y \tag{8.4}
\]
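As a concrete illustration, the normal equations X′Xβ = X′Y can be solved numerically. The sketch below, which uses NumPy on simulated data, is illustrative only; the variable names and data are hypothetical, not from the text.

```python
import numpy as np

# Hypothetical simulated data: T observations, N regressors plus an intercept.
rng = np.random.default_rng(0)
T, N = 100, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, N))])  # T x (N + 1) design matrix
beta_true = np.array([1.0, 2.0, -0.5])                      # assumed true coefficients
Y = X @ beta_true + 0.1 * rng.normal(size=T)                # Y = X beta + noise

# Solve the normal equations X'X beta = X'Y, i.e., beta_hat = (X'X)^{-1} X'Y.
# (np.linalg.lstsq is numerically preferable when X'X is ill conditioned.)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

With well-behaved data the solution coincides with the least-squares fit, and the estimated coefficients lie close to the true ones.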
From equation (8.1), the fitted values (i.e., the LS estimates of the expectations) of the Y are
\[
\hat{Y} = X (X'X)^{-1} X'Y = HY \tag{8.5}
\]
The H matrix is called the hat matrix because it "puts a hat on" Y; that is, it
computes the estimate Ŷ of the expectation of Y. The hat matrix H is a symmetric T × T
projection matrix; that is, the following relationship holds: HH = H. The
matrix H has N eigenvalues equal to 1 and T − N eigenvalues equal to 0. Its
diagonal elements, hi ≡ hii satisfy:
0 ≤ hi ≤ 1
and its trace (i.e., the sum of its diagonal elements)^1 is equal to N:
tr(H) = N
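These properties of H (symmetry, idempotency, eigenvalues, trace) are easy to verify numerically. The sketch below uses an arbitrary simulated design matrix; here p denotes the number of columns of X (the N of the text).

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 50, 3  # T observations; p columns in the design matrix (the N of the text)
X = rng.normal(size=(T, p))  # full rank almost surely for Gaussian draws

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

is_symmetric = np.allclose(H, H.T)   # H = H'
is_projection = np.allclose(H @ H, H)  # HH = H
trace_H = np.trace(H)                # equals p, the number of columns of X
eigvals = np.linalg.eigvalsh(H)      # p eigenvalues near 1, T - p near 0
```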
Under the assumption that the errors are independent and identically
distributed with mean zero and variance σ^2 , it can be demonstrated that the
Ŷ are consistent, that is, Ŷ → E(Y) in probability when the sample becomes
infinite if and only if h = max(hi) → 0. Points where the hi have large values
are called leverage points. It can be demonstrated that the presence of lever-
age points signals that there are observations that might have a decisive
influence on the estimation of the regression parameters. A rule of thumb,
reported in Huber,^2 suggests that values hi ≤ 0.2 are safe, values 0.2 < hi ≤
0.5 require careful attention, and higher values are to be avoided.
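Huber's rule of thumb can be applied directly to the diagonal of H. The sketch below plants one hypothetical outlying design point so that its leverage stands out; the data and threshold values as written in code are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
x = rng.normal(size=T)
x[0] = 10.0  # a hypothetical outlying regressor value -> a leverage point
X = np.column_stack([np.ones(T), x])  # intercept plus one regressor

# Diagonal leverages h_i = H_ii, with 0 <= h_i <= 1 and sum(h_i) = tr(H)
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Huber's rule of thumb as boolean masks over the observations
safe = h <= 0.2
attention = (h > 0.2) & (h <= 0.5)
avoid = h > 0.5
```

In this simulation the planted point dominates the fit: its leverage far exceeds 0.5, while the remaining observations fall in the safe range.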
^1 See Appendix D.
^2 Peter J. Huber, Robust Statistics (New York: John Wiley & Sons, 1981).