Therefore,

$X^{T}Y = X^{T}XW$. (5.44)

Assuming $X^{T}X$ is invertible (which holds whenever $X$ has full column rank), we can multiply both sides by $(X^{T}X)^{-1}$ to get

$W = (X^{T}X)^{-1}X^{T}Y$. (5.45)
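A minimal sketch (not from the book) of solving Equation (5.45) with NumPy; the data matrix X and target vector Y here are hypothetical, with five instances and two features:

    import numpy as np

    # Hypothetical data: 5 instances, 2 features.
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0],
                  [5.0, 2.5]])
    Y = np.array([3.1, 2.6, 4.4, 6.9, 7.4])

    # Normal-equation solution W = (X^T X)^{-1} X^T Y, Eq. (5.45).
    # Solving the linear system avoids forming the explicit inverse.
    W = np.linalg.solve(X.T @ X, X.T @ Y)
    print(W)

Solving the system $X^{T}XW = X^{T}Y$ directly, rather than computing $(X^{T}X)^{-1}$ explicitly, is numerically safer and is standard practice.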
Alternatively, one can compute the singular value decomposition (SVD) of $X = U\Sigma V^{T}$. Since $U$ and $V$ have orthonormal columns, $U^{T}U = I$ and $V^{T}V = I$, so

$W = (X^{T}X)^{-1}X^{T}Y$
$= (V\Sigma U^{T}U\Sigma V^{T})^{-1}V\Sigma U^{T}Y$
$= (V\Sigma^{2}V^{T})^{-1}V\Sigma U^{T}Y$
$= V\Sigma^{-2}V^{T}V\Sigma U^{T}Y$
$= V\Sigma^{-1}U^{T}Y$. (5.46)
Since $X$ can have zero singular values, $\Sigma^{-1}$ may not exist; in that case we use

$W = V\Sigma^{+}U^{T}Y$, (5.47)

where $\Sigma^{+}$ is the pseudoinverse of $\Sigma$, obtained by inverting its nonzero singular values and leaving the zero entries as they are.
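As an illustrative aside (again not from the book, reusing the hypothetical X and Y from the previous sketch), the SVD route of Equations (5.46) and (5.47) can be coded by inverting only the nonzero singular values:

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5],
                  [4.0, 3.0], [5.0, 2.5]])
    Y = np.array([3.1, 2.6, 4.4, 6.9, 7.4])

    # Reduced SVD: X = U diag(s) V^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Sigma^+ : invert nonzero singular values, keep zeros as zeros.
    tol = max(X.shape) * np.finfo(float).eps * s.max()
    s_plus = np.where(s > tol, 1.0 / s, 0.0)

    # W = V Sigma^+ U^T Y, Eq. (5.47); well defined even when
    # X^T X is singular, unlike the normal-equation solution.
    W = Vt.T @ (s_plus * (U.T @ Y))
    print(W)

For a full-column-rank X this matches the normal-equation solution; when X is rank deficient it returns the minimum-norm least-squares solution.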
Logistic Regression
Logistic regression provides a probabilistic view of regression. For simplicity, let us assume that the class attribute can take only the values 0 and 1. Formally, logistic regression finds a probability $p$ such that

$P(Y = 1 \,|\, X) = p$, (5.48)

where $X$ is the vector of features and $Y$ is the class attribute. We can use linear regression to approximate $p$; in other words, we can assume that probability $p$ depends on $X$, that is,

$p = \beta X$, (5.49)

where $\beta$ is a vector of coefficients. Unfortunately, $\beta X$ can take unbounded values, because $X$ can take on any value and there are no constraints on how $\beta$'s are chosen. However, probability $p$ must lie in the range $[0, 1]$. Since $\beta X$ is unbounded, we can perform a transformation $g(\cdot)$ on $p$ such that it also