Social Media Mining: An Introduction


CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23


5.4 Supervised Learning

Algorithm 5.1 k-Nearest Neighbor Classifier
Require: Instance i, a dataset of real-valued attributes, k (number of
neighbors), distance measure d
1: return Class label for instance i
2: Compute the k nearest neighbors of instance i based on distance
measure d.
3: l = the majority class label among the neighbors of instance i. If there
is more than one majority label, select one randomly.
4: Classify instance i as class l

Since $\frac{4}{63\,P(i_8)} > \frac{1}{28\,P(i_8)}$ for instance $i_8$, and based on NBC calculations,
we have Play Golf = N.

5.4.3 Nearest Neighbor Classifier
As the name suggests, k-nearest neighbor or kNN uses the k nearest
instances, called neighbors, to perform classification. The instance being
classified is assigned the label (class attribute value) that the majority of
its k neighbors are assigned. The algorithm is outlined in Algorithm 5.1.
When k = 1, the closest neighbor’s label is used as the predicted label for
the instance being classified. To determine the neighbors of an instance, we
need to measure its distance to all other instances based on some distance
metric. Commonly, Euclidean distance is employed; however, for higher-
dimensional spaces, Euclidean distance becomes less meaningful and other
distance measures can be used.
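The steps of Algorithm 5.1 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `knn_classify`, the `(attributes, label)` data layout, and the toy points are all assumptions made for the example. The distance measure is passed in as a parameter so that Euclidean distance can be swapped for another measure.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(instance, dataset, k, distance=euclidean, rng=random):
    """Predict a label for `instance` from its k nearest neighbors.

    `dataset` is a list of (attributes, label) pairs; this layout is an
    assumption made for the sketch.
    """
    # Step 2: rank all instances by distance and keep the k closest.
    neighbors = sorted(dataset, key=lambda pair: distance(instance, pair[0]))[:k]
    # Step 3: count labels among the neighbors and collect the majority label(s).
    counts = Counter(label for _, label in neighbors)
    top = max(counts.values())
    majority = sorted(label for label, c in counts.items() if c == top)
    # If several labels tie for the majority, pick one at random.
    return rng.choice(majority)


# Hypothetical toy data: two small clusters in the plane.
train = [
    ((1.0, 1.0), "triangle"),
    ((1.2, 0.8), "triangle"),
    ((0.9, 1.3), "triangle"),
    ((4.0, 4.2), "square"),
    ((4.1, 3.9), "square"),
    ((3.8, 4.0), "square"),
]
prediction = knn_classify((1.1, 1.0), train, k=3)
```

Sorting the whole dataset costs O(n log n) per query, which is fine for a sketch; larger datasets would normally call for a spatial index or a partial sort.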
Example 5.5. Consider the example depicted in Figure 5.4. As shown,
depending on the value of k, different labels can be predicted for the circle.
In our example, k = 5 and k = 9 generate different labels for the instance
(triangle and square, respectively).
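The effect in Example 5.5 can be reproduced with a small constructed dataset. The coordinates below are hypothetical, chosen only so that the vote flips between k = 5 and k = 9; they are not read off Figure 5.4. The helper `vote` is likewise an illustrative name.

```python
from collections import Counter
from math import dist

# Hypothetical coordinates: three triangles close to the circle,
# two squares a little farther out, and four squares farther still.
points = [
    ((1, 0), "triangle"), ((0, 1), "triangle"), ((-1, 0), "triangle"),
    ((2, 0), "square"), ((0, 2), "square"),
    ((3, 0), "square"), ((0, 3), "square"), ((-3, 0), "square"), ((0, -3), "square"),
]
circle = (0, 0)  # the instance being classified


def vote(k):
    """Return the label counts among the k points closest to the circle."""
    nearest = sorted(points, key=lambda p: dist(circle, p[0]))[:k]
    return Counter(label for _, label in nearest)


votes_5 = vote(5)  # the three triangles plus the two nearer squares
votes_9 = vote(9)  # all nine points
label_5 = votes_5.most_common(1)[0][0]
label_9 = votes_9.most_common(1)[0][0]
```

With k = 5 the triangles hold a 3-to-2 majority; widening the neighborhood to k = 9 brings in four more squares and the majority becomes 6-to-3 in favor of square.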
As shown in our example, an important issue with the k-nearest neighbor
algorithm is the choice of k. The choice of k can easily change the label of
the instance being predicted. In general, we are interested in a value of k
that maximizes the performance of the learning algorithm.
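One common way to look for such a value, offered here as an assumption rather than a procedure prescribed by the text, is to score several candidate values of k on held-out data and keep the best one. The sketch below uses leave-one-out accuracy on hypothetical one-dimensional data; `predict`, `loo_accuracy`, and the data points are illustrative. Only odd values of k are tried so that a two-class vote cannot tie, which sidesteps the random tie-breaking of Algorithm 5.1 and keeps the result reproducible.

```python
from collections import Counter
from math import dist


def predict(instance, data, k):
    """Majority label among the k points of `data` nearest to `instance`."""
    nearest = sorted(data, key=lambda p: dist(instance, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each point using all the others."""
    hits = 0
    for idx, (x, y) in enumerate(data):
        rest = data[:idx] + data[idx + 1:]
        hits += predict(x, rest, k) == y
    return hits / len(data)


# Hypothetical 1-D data: class "a" on the left, class "b" on the right,
# with one noisy point of the opposite class inside each region.
data = [
    ((0.0,), "a"), ((0.4,), "a"), ((1.0,), "a"), ((1.3,), "b"), ((1.7,), "a"),
    ((5.0,), "b"), ((5.6,), "b"), ((6.1,), "b"), ((6.4,), "a"), ((7.0,), "b"),
]
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5)}
# max() keeps the first (smallest) k among equally good candidates.
best_k = max(scores, key=scores.get)
```

On this data k = 1 is misled by the noisy points, while larger neighborhoods outvote them; on real data the same loop would typically run over a separate validation set or cross-validation folds.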

5.4.4 Classification with Network Information
Consider a friendship network on social media and a product being marketed
to this network. The product seller wants to know who the potential buyers