distributions of the data. If we do not know the distribution, we cannot compute the likelihood function. ML methods can be difficult to apply in practice in models that include hidden variables.^5 We will first give an intuition for the method and then illustrate how MLE works for regressions.
The intuition of choosing those parameters that maximize the likelihood of the sample is simple. For example, suppose we toss a coin 1,000 times and get 950 heads and 50 tails. What can we conclude about the probability of obtaining heads or tails in tossing that coin? Although every sequence is theoretically possible, intuition tells us that the coin is biased and that it is reasonable to assume that the probability of a head is 95% and the probability of a tail is only 5%.
The MLE principle formalizes this intuition. If we let p denote the probability of heads and q = 1 − p the probability of tails, then any particular sequence that contains 950 heads and 50 tails has probability

$$L(p) = p^{950}\,(1-p)^{50}$$
This probability L is the likelihood of the sequence. To maximize the likelihood, we set its derivative with respect to p equal to zero:
$$\frac{dL}{dp} = 950\,p^{949}(1-p)^{50} - 50\,p^{950}(1-p)^{49} = p^{949}(1-p)^{49}\left[950(1-p) - 50p\right] = 0$$
This equation has three solutions:
$$p = 0, \qquad p = 1, \qquad 950(1-p) - 50p = 0 \;\Rightarrow\; 950 - 1000p = 0 \;\Rightarrow\; p = \frac{950}{1000} = 0.95$$
The first two solutions are not feasible and therefore the maximum likelihood is obtained for p = 0.95, as suggested by intuition.
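As a numerical check on this result, the following is a minimal Python sketch, assuming NumPy and SciPy are available; the bounded scalar minimizer and the variable names are our illustrative choices, not part of the original text. It maximizes the log-likelihood of the coin-toss sample (maximizing the log-likelihood is equivalent to maximizing the likelihood itself):

```python
import numpy as np
from scipy.optimize import minimize_scalar

heads, tails = 950, 50  # sample counts from the example above

def neg_log_likelihood(p):
    """Negative log of L(p) = p^950 (1 - p)^50."""
    return -(heads * np.log(p) + tails * np.log(1.0 - p))

# Maximize the likelihood by minimizing its negative on (0, 1).
result = minimize_scalar(neg_log_likelihood,
                         bounds=(1e-9, 1.0 - 1e-9), method="bounded")
print(f"MLE of p: {result.x:.4f}")  # ~0.95, matching the closed form 950/1000
```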
Application of MLE to Regression Models
Let’s now discuss how to apply the MLE principle to regressions, as well as to the factor models described in the previous chapter (a minimal numerical sketch is given after the footnote below). Let’s first see how the MLE
(^5) A hidden variable is a variable that is not observed but can be computed as a function of the data. For example, in the ARCH/GARCH models described in Chapter 11, volatility is a hidden variable computed through the ARCH/GARCH model.
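To give a concrete preview of the regression case, here is a minimal sketch of MLE applied to a simple linear regression with Gaussian errors. The simulated data, the true parameter values (a = 1, b = 2, σ = 0.5), and the use of SciPy's general-purpose minimizer are illustrative assumptions, not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate a sample from y = a + b*x + eps, eps ~ N(0, sigma^2).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

def neg_log_likelihood(theta):
    """Gaussian negative log-likelihood, up to an additive constant."""
    a, b, log_sigma = theta
    sigma = np.exp(log_sigma)  # parameterize sigma by its log to keep it positive
    resid = y - (a + b * x)
    return len(y) * np.log(sigma) + 0.5 * np.sum(resid**2) / sigma**2

result = minimize(neg_log_likelihood, x0=np.zeros(3))
a_hat, b_hat = result.x[:2]
sigma_hat = np.exp(result.x[2])
print(a_hat, b_hat, sigma_hat)  # close to the true values 1.0, 2.0, 0.5
```

Because the errors are Gaussian, maximizing the likelihood over the coefficients is equivalent to minimizing the sum of squared residuals, so the MLE of (a, b) coincides with the ordinary least squares estimates.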