Consider a dataset where attributes are represented using $x_1, x_2, \ldots, x_m$ (also known as regressors) and the class attribute is represented using $Y$ (also known as the dependent variable), where the class attribute is a real number. We want to find the relation between $Y$ and the vector $X = (x_1, x_2, \ldots, x_m)$. We discuss two basic regression techniques: linear regression and logistic regression.

Linear Regression
In linear regression, we assume that the class attribute $Y$ has a linear relation with the regressors (feature set) $X$, up to an additive error term $\epsilon$. In other words,

\[
Y = XW + \epsilon, \qquad (5.40)
\]

where $W$ represents the vector of regression coefficients. The regression problem is solved by estimating $W$ from the training dataset and its labels $Y$ such that the fitting error is minimized. A variety of methods have been introduced to solve the linear regression problem, most of which use least squares or maximum-likelihood estimation. We employ the least squares technique here. Interested readers can refer to the bibliographic notes for more detailed analyses. In the least squares method, we find $W$ using regressors $X$ and labels $Y$ such that the square of the fitting error $\epsilon$ is minimized:


\[
\|\epsilon\|^2 = \|Y - XW\|^2. \qquad (5.41)
\]
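
To make the objective concrete, the squared fitting error of Equation 5.41 can be evaluated directly for any candidate $W$. The following is a minimal sketch in Python with NumPy; the data matrix, labels, and candidate coefficients are made-up values for illustration only.

    import numpy as np

    # Made-up toy data: five examples, two regressors.
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0],
                  [5.0, 2.5]])
    Y = np.array([3.1, 2.6, 4.4, 7.2, 7.3])

    W = np.array([1.0, 0.5])       # an arbitrary candidate coefficient vector

    residual = Y - X @ W           # the fitting error, epsilon = Y - XW
    error = residual @ residual    # ||Y - XW||^2, as in Equation 5.41
    print(error)

Least squares chooses the $W$ that makes this quantity as small as possible; the derivation below finds that $W$ in closed form.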

To minimize $\|\epsilon\|^2$, we compute its gradient with respect to $W$ and set it to zero to find the optimal $W$:

\[
\frac{\partial \|Y - XW\|^2}{\partial W} = 0. \qquad (5.42)
\]


We know that for any $X$, $\|X\|^2 = X^T X$; therefore,

\[
\begin{aligned}
\frac{\partial \|Y - XW\|^2}{\partial W}
&= \frac{\partial (Y - XW)^T (Y - XW)}{\partial W} \\
&= \frac{\partial (Y^T - W^T X^T)(Y - XW)}{\partial W} \\
&= \frac{\partial \left( Y^T Y - Y^T X W - W^T X^T Y + W^T X^T X W \right)}{\partial W} \\
&= -2 X^T Y + 2 X^T X W = 0. \qquad (5.43)
\end{aligned}
\]
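
Equation 5.43 is the familiar system of normal equations, $X^T X W = X^T Y$; when $X^T X$ is invertible, the least-squares estimate is $W = (X^T X)^{-1} X^T Y$. The sketch below solves this system for the toy data above and uses NumPy's built-in least-squares routine as an independent check (again, the data values are illustrative only).

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0],
                  [5.0, 2.5]])
    Y = np.array([3.1, 2.6, 4.4, 7.2, 7.3])

    # Solve the normal equations (X^T X) W = X^T Y from Equation 5.43.
    W = np.linalg.solve(X.T @ X, X.T @ Y)

    # Independent check: NumPy's least-squares solver should agree.
    W_check, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(W, W_check)

Solving the normal equations directly is fine for small, well-conditioned problems; np.linalg.lstsq instead uses a singular value decomposition, which is more stable numerically, so it serves as a useful sanity check.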
